Seminar Details
2026-02-05 (13:00): Fair Tabular Data Generation: an Approach using Autoregressive Decision Trees
At Nyquist Maxwell a.164
Organized by Computer Science and Engineering
Speaker:
Benoît Ronval (ICTEAM)
Abstract:
In both research and industry, tabular data is among the most widely used data types. Represented as instances (rows) described by features (columns), such data is easy for humans to interpret and can be used directly by most machine learning algorithms.
Despite its widespread use, tabular data can be difficult to acquire. Data collection may require access to private sources or large-scale surveys, which can be costly and may suffer from low response rates. Moreover, real-world tabular datasets frequently exhibit bias, which leads machine learning models to produce unfair classifications for certain subgroups, particularly with respect to sensitive attributes such as a person's nationality or education level.
In this seminar, I will present our new method, TabFairGDT, which generates synthetic data intended to reduce unfairness in the predictions of machine learning models trained on that data. The approach uses decision trees within an autoregressive generation framework and includes a fairness optimization step. I will also discuss the advantages of decision trees for tabular data generation and present experimental results covering classification performance, fairness metrics, and data quality analyses.
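For readers unfamiliar with autoregressive generation using decision trees, the following is a minimal, generic sketch under stated assumptions, not the TabFairGDT implementation, whose details the abstract does not give. It assumes categorical columns in a fixed order and factorizes the joint distribution as p(x1) p(x2 | x1) ... p(xd | x1, ..., xd-1): the first column is sampled from its empirical marginal, and each later column is sampled from the leaf distribution of a small tree trained to predict it from the earlier columns. The tree learner (Gini impurity, equality splits, depth limit) and all function names (`build_tree`, `sample_from_tree`, `fit_autoregressive`, `generate`) are hypothetical choices made for illustration, and the fairness optimization step is deliberately omitted.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of categorical labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, target, features, depth, min_leaf=2):
    """Hypothetical small classification tree: predicts column `target` from the
    categorical columns in `features` using equality splits and Gini impurity."""
    labels = [r[target] for r in rows]
    best = None
    if depth > 0 and len(set(labels)) > 1:
        base = gini(labels)
        for f in features:
            for v in set(r[f] for r in rows):
                left = [r for r in rows if r[f] == v]
                right = [r for r in rows if r[f] != v]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                # Weighted impurity of the two children.
                score = (len(left) * gini([r[target] for r in left])
                         + len(right) * gini([r[target] for r in right])) / len(rows)
                if score < base and (best is None or score < best[0]):
                    best = (score, f, v, left, right)
    if best is None:
        # Leaf: keep the empirical distribution of the target column.
        return {"leaf": Counter(labels)}
    _, f, v, left, right = best
    return {"feature": f, "value": v,
            "left": build_tree(left, target, features, depth - 1, min_leaf),
            "right": build_tree(right, target, features, depth - 1, min_leaf)}


def sample_from_tree(tree, row, rng):
    """Route a partially generated row to a leaf and sample from its distribution."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] == tree["value"] else tree["right"]
    values, counts = zip(*tree["leaf"].items())
    return rng.choices(values, weights=counts)[0]


def fit_autoregressive(rows, n_cols, depth=3):
    """Column 0 uses its marginal; column j uses a tree on columns 0..j-1."""
    marginal = Counter(r[0] for r in rows)
    trees = [build_tree(rows, j, list(range(j)), depth) for j in range(1, n_cols)]
    return marginal, trees


def generate(model, n_rows, seed=0):
    """Generate synthetic rows one column at a time, left to right."""
    marginal, trees = model
    rng = random.Random(seed)
    values, counts = zip(*marginal.items())
    out = []
    for _ in range(n_rows):
        row = [rng.choices(values, weights=counts)[0]]
        for tree in trees:
            row.append(sample_from_tree(tree, row, rng))
        out.append(tuple(row))
    return out
```

For example, `generate(fit_autoregressive(data, n_cols=3), n_rows=10)` returns ten synthetic 3-column rows built from a toy list of tuples `data`. Because each column is conditioned only on the columns before it, the column order matters. A fairness-aware variant would additionally adjust how values are generated with respect to a sensitive attribute; how TabFairGDT does this is not described here.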
