papers
2025
- [AoS] Strong approximations for empirical processes indexed by Lipschitz functions. Matias D Cattaneo and Ruiqi Rae Yu. Annals of Statistics, 2025
This paper presents new uniform Gaussian strong approximations for empirical processes indexed by classes of functions based on \(d\)-variate random vectors (\(d \geq 1\)). First, a uniform Gaussian strong approximation is established for general empirical processes indexed by possibly Lipschitz functions, improving on previous results in the literature. In the setting considered by [Rio (1994)], and if the function class is Lipschitzian, our result improves the approximation rate \(n^{-1/(2d)}\) to \(n^{-1/\max\{d,2\}}\), up to a \(\mathrm{polylog}(n)\) term, where \(n\) denotes the sample size. Remarkably, we establish a valid uniform Gaussian strong approximation at the rate \(n^{-1/2}\log n\) for \(d=2\), which was previously known to be valid only for univariate (\(d=1\)) empirical processes via the celebrated Hungarian construction [Komlós, Major, Tusnády (1975)]. Second, a uniform Gaussian strong approximation is established for multiplicative separable empirical processes indexed by possibly Lipschitz functions, which addresses some outstanding problems in the literature [Chernozhukov, Chetverikov, Kato (2014)]. Finally, two other uniform Gaussian strong approximation results are presented when the function class is a sequence of Haar bases based on quasi-uniform partitions. Applications to nonparametric density and regression estimation are discussed.
@article{cattaneo2024strong,
  title = {Strong approximations for empirical processes indexed by Lipschitz functions},
  author = {Cattaneo, Matias D and Yu, Ruiqi Rae},
  journal = {Annals of Statistics},
  year = {2025},
  url = {https://projecteuclid.org/journals/annals-of-statistics/volume-53/issue-3/Strong-approximations-for-empirical-processes-indexed-by-Lipschitz-functions/10.1214/25-AOS2500.full},
}
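The rate comparison in the abstract above comes down to arithmetic on exponents. The short snippet below is only a sketch: it evaluates the two stated exponents, \(1/(2d)\) and \(1/\max\{d,2\}\), ignoring the \(\mathrm{polylog}(n)\) factor, and does not construct any approximation. The function names are hypothetical. It shows that both rates equal \(n^{-1/2}\) when \(d=1\), while for every \(d \geq 2\) the exponent doubles, from \(1/(2d)\) to \(1/d\).

```python
from fractions import Fraction


def rio_exponent(d: int) -> Fraction:
    """Exponent a in the rate n^(-a) quoted for Rio (1994) with Lipschitz classes: a = 1/(2d)."""
    return Fraction(1, 2 * d)


def improved_exponent(d: int) -> Fraction:
    """Exponent a in the improved rate n^(-a), a = 1/max{d, 2}, ignoring polylog(n) factors."""
    return Fraction(1, max(d, 2))


if __name__ == "__main__":
    for d in range(1, 5):
        # Fraction prints as e.g. "1/4", so each line reads as n^-(1/4)
        print(f"d={d}: Rio n^-({rio_exponent(d)}), improved n^-({improved_exponent(d)})")
```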
- [arXiv] The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation. Matias D Cattaneo, Jason M Klusowski, and Ruiqi Rae Yu. arXiv preprint arXiv:2509.11381, 2025
This paper studies recursive decision trees for heterogeneous causal treatment effect estimation and inference in experimental and observational settings. These procedures are typically fitted with the CART (Classification and Regression Tree) algorithm [Breiman et al. (1984)] or close variants, and thus are often believed to be “adaptive” to high-dimensional data, sparsity, or other structural features of the data-generating process. Building on the “honest” causal trees proposed by [Athey & Imbens (2016)], which have become standard in academia and industry, we analyze those estimators (and variants) and establish lower bounds on their estimation error. We show that these popular heterogeneous-treatment-effect estimators cannot attain a polynomial-in-\(n\) convergence rate under basic conditions, where \(n\) denotes the sample size. Contrary to common belief, honesty does not remove these limitations and at best yields negligible logarithmic improvements in sample size or dimension. Consequently, these widely used estimators can perform poorly in practice and may even be inconsistent in some settings. Theoretical insights are corroborated with simulation evidence.
@article{cattaneo2025honest,
  title = {The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation},
  author = {Cattaneo, Matias D and Klusowski, Jason M and Yu, Ruiqi Rae},
  journal = {arXiv preprint arXiv:2509.11381},
  year = {2025},
  url = {https://arxiv.org/abs/2509.11381},
}
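To make "honesty" in the causal-trees abstract concrete, here is a minimal sketch of an honest depth-one causal tree on a single covariate, in the spirit of Athey & Imbens (2016). One random half of the sample chooses the split, and the other half estimates the treatment effect in each leaf by a difference in means. The split criterion used here (size-weighted squared leaf effects) is a simplification chosen for illustration, not the CART or causal-tree criterion analyzed in the paper. The helpers `fit_honest_stump` and `diff_in_means`, and the simulated design, are hypothetical. The sketch does not reproduce the paper's lower bounds.

```python
import numpy as np


def diff_in_means(y, t):
    """Mean outcome of treated units (t == 1) minus mean outcome of controls (t == 0)."""
    return y[t == 1].mean() - y[t == 0].mean()


def fit_honest_stump(x, y, t, rng, min_leaf=20):
    """Honest depth-1 causal tree on a scalar covariate x (illustrative sketch only)."""
    n = len(y)
    idx = rng.permutation(n)
    split_idx, est_idx = idx[: n // 2], idx[n // 2 :]  # splitting half, estimation half
    xs, ys, ts = x[split_idx], y[split_idx], t[split_idx]

    best_c, best_score = None, -np.inf
    for c in np.quantile(xs, np.linspace(0.1, 0.9, 17)):
        leaves = (xs <= c, xs > c)
        # each leaf needs at least min_leaf // 2 treated and min_leaf // 2 control units
        if not all(
            (ts[m] == 1).sum() >= min_leaf // 2 and (ts[m] == 0).sum() >= min_leaf // 2
            for m in leaves
        ):
            continue
        # simplified criterion: size-weighted squared leaf effects on the splitting half
        score = sum(m.sum() * diff_in_means(ys[m], ts[m]) ** 2 for m in leaves)
        if score > best_score:
            best_c, best_score = c, score
    if best_c is None:
        raise ValueError("no admissible split")

    # honest step: leaf effects come only from the held-out estimation half
    xe, ye, te = x[est_idx], y[est_idx], t[est_idx]
    tau_left = diff_in_means(ye[xe <= best_c], te[xe <= best_c])
    tau_right = diff_in_means(ye[xe > best_c], te[xe > best_c])
    return best_c, tau_left, tau_right


# Simulated example: the effect is 0 for x <= 0.5 and 2 for x > 0.5
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)
t = rng.integers(0, 2, n)
y = x + np.where(x > 0.5, 2.0, 0.0) * t + rng.normal(0, 1, n)
c, tau_l, tau_r = fit_honest_stump(x, y, t, rng)
print(c, tau_l, tau_r)
```

Because the threshold is selected on one half and the leaf effects are estimated on the other, the leaf estimates do not reuse the data that drove the search over thresholds. The abstract's point is that this separation alone does not remove the accuracy limits of CART-type partitions.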
- [arXiv] Robust Inference for the Direct Average Treatment Effect with Treatment Assignment Interference. Matias D Cattaneo, Yihan He, and Ruiqi Rae Yu. arXiv preprint arXiv:2502.13238, 2025
Uncertainty quantification in causal inference settings with random network interference is a challenging open problem. We study the large-sample distributional properties of the classical difference-in-means Hájek treatment effect estimator, and propose a robust inference procedure for the (conditional) direct average treatment effect, allowing for cross-unit interference in both the outcome and treatment equations. Leveraging ideas from statistical physics, we introduce a novel Ising model capturing interference in the treatment assignment, and then obtain three main results. First, we establish a Berry-Esseen distributional approximation pointwise in the degree of interference generated by the Ising model. Our distributional approximation recovers known results in the literature under no interference in treatment assignment, and also highlights a fundamental fragility of inference procedures developed using such a pointwise approximation. Second, we establish a uniform distributional approximation for the Hájek estimator, and develop robust inference procedures that remain valid regardless of the unknown degree of interference in the Ising model. Third, we propose a novel resampling method for implementing the robust inference procedure. A key technical innovation underlying our work is a new De-Finetti Machine that facilitates conditional i.i.d. Gaussianization, a technique that may be of independent interest in other settings.
@article{cattaneo2025robust,
  title = {Robust Inference for the Direct Average Treatment Effect with Treatment Assignment Interference},
  author = {Cattaneo, Matias D and He, Yihan and Yu, Ruiqi Rae},
  journal = {arXiv preprint arXiv:2502.13238},
  year = {2025},
  url = {https://arxiv.org/abs/2502.13238},
}
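For concreteness, the sketch below writes out the Hájek (self-normalized inverse-probability-weighted) difference-in-means estimator named in the interference abstract. As a stand-in for interference in treatment assignment, it draws treatments from a Curie-Weiss Ising model by Gibbs sampling. The Curie-Weiss form, the parameters `beta` and `h`, and the helper names `hajek_dim` and `curie_weiss_assignment` are assumptions for illustration. The paper's Ising specification, its robust inference procedure, and its resampling method may differ and are not implemented here.

```python
import numpy as np


def hajek_dim(y, t, p=None):
    """Hajek (self-normalized IPW) average treatment effect estimate.

    Any constant propensity cancels, so p=None (treated as 0.5) gives the plain difference in means.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    p = np.full_like(y, 0.5) if p is None else np.asarray(p, dtype=float)
    w1 = t / p              # inverse-probability weights for treated units
    w0 = (1 - t) / (1 - p)  # inverse-probability weights for control units
    return (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()


def curie_weiss_assignment(n, beta, h=0.0, sweeps=200, rng=None):
    """Gibbs-sample treatments from a Curie-Weiss Ising model (hypothetical illustration).

    Spins s_i in {-1, +1} with P(s) proportional to
    exp((beta / n) * sum_{i<j} s_i s_j + h * sum_i s_i); unit i is treated when s_i = +1.
    """
    rng = np.random.default_rng() if rng is None else rng
    s = rng.choice([-1, 1], size=n)
    total = s.sum()
    for _ in range(sweeps):
        for i in range(n):
            # local field from all other spins plus the external field h
            field = beta * (total - s[i]) / n + h
            new = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * field)) else -1
            total += new - s[i]
            s[i] = new
    return (s + 1) // 2  # map {-1, +1} to {0, 1}


# Usage with hypothetical outcomes (no interference in outcomes in this toy example)
rng = np.random.default_rng(0)
t = curie_weiss_assignment(200, beta=0.8, sweeps=50, rng=rng)
y = 1.0 * t + rng.normal(size=200)
print(hajek_dim(y, t))
```

Larger `beta` makes assignments more strongly correlated across units. With this normalization, the Curie-Weiss phase transition occurs at `beta = 1`.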
- [arXiv] rd2d: Causal Inference in Boundary Discontinuity Designs. Matias D Cattaneo, Rocio Titiunik, and Ruiqi Rae Yu. arXiv preprint arXiv:2505.07989, 2025
Boundary discontinuity designs, also known as Multi-Score Regression Discontinuity (RD) designs, are often used in empirical research to learn about causal treatment effects along a continuous assignment boundary defined by a bivariate score; Geographic RD designs are a prominent example. This article introduces the R package rd2d, which implements and extends the methodological results developed in Cattaneo et al. [2025] for boundary discontinuity designs. The package employs local polynomial estimation and inference using either the bivariate score or a univariate distance-to-boundary metric. It features novel data-driven bandwidth selection procedures and offers both pointwise and uniform estimation and inference along the assignment boundary. The numerical performance of the package is demonstrated through a simulation study.
@article{cattaneo2025rd2d,
  title = {rd2d: Causal Inference in Boundary Discontinuity Designs},
  author = {Cattaneo, Matias D and Titiunik, Rocio and Yu, Ruiqi Rae},
  journal = {arXiv preprint arXiv:2505.07989},
  year = {2025},
  url = {https://arxiv.org/abs/2505.07989},
}
- [working paper] Estimation and Inference in Boundary Discontinuity Designs: Location-Based Methods. Matias D Cattaneo, Rocio Titiunik, and Ruiqi Rae Yu. 2025
Boundary discontinuity designs are used to learn about causal treatment effects along a continuous assignment boundary that splits units into control and treatment groups according to a bivariate location score. We analyze the statistical properties of local polynomial treatment effect estimators employing location information for each unit. We develop pointwise and uniform estimation and inference methods both for the conditional treatment effect function at the assignment boundary and for transformations thereof that aggregate information along the boundary. We illustrate our methods with an empirical application. Companion general-purpose software is provided.
@unpublished{cattaneo2025location,
  title = {Estimation and Inference in Boundary Discontinuity Designs: Location-Based Methods},
  author = {Cattaneo, Matias D and Titiunik, Rocio and Yu, Ruiqi Rae},
  note = {Working paper},
  year = {2025},
}
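As a rough illustration of the location-based approach described above, the sketch below estimates the treatment effect at a single boundary point `b` by fitting a kernel-weighted local linear regression on each side of the boundary, using each unit's bivariate location, and differencing the two intercepts. It assumes a triangular kernel on the Euclidean distance to `b`, a fixed bandwidth `h`, and a local linear (not higher-order) fit. It omits the data-driven bandwidth selection, bias correction, uniform inference, and boundary aggregation developed in the paper. The function names and the simulated design are hypothetical and are not the API of the companion software.

```python
import numpy as np


def local_linear_at(points, y, b, h):
    """Intercept of a weighted linear fit of y on (points - b): an estimate of E[Y | X = b]."""
    diff = points - b
    dist = np.linalg.norm(diff, axis=1)
    w = np.clip(1 - dist / h, 0, None)  # triangular kernel on Euclidean distance to b
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), diff[keep]])  # columns: 1, x1 - b1, x2 - b2
    sw = np.sqrt(w[keep])
    # weighted least squares via row scaling by sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return coef[0]


def boundary_effect_location(points, y, treated, b, h):
    """Treated-side fit minus control-side fit, both evaluated at boundary point b."""
    return (local_linear_at(points[treated], y[treated], b, h)
            - local_linear_at(points[~treated], y[~treated], b, h))


# Hypothetical design: boundary is the line x1 = 0, units with x1 >= 0 are treated,
# and the effect 1 + x2 varies along the boundary.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(2000, 2))
treated = pts[:, 0] >= 0
y = 0.5 * pts[:, 0] + pts[:, 1] + (1 + pts[:, 1]) * treated
est = boundary_effect_location(pts, y, treated, b=np.array([0.0, 0.2]), h=0.3)
print(est)
```

The toy outcomes are noise-free and linear on each side, so each local linear fit is exact and the estimate matches the effect \(1 + 0.2 = 1.2\) at `b` up to floating-point error. With noisy or curved regression functions, the bandwidth choice drives a bias-variance trade-off.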
- [working paper] Estimation and Inference in Boundary Discontinuity Designs: Distance-Based Methods. Matias D Cattaneo, Rocio Titiunik, and Ruiqi Rae Yu. 2025
We study the statistical properties of nonparametric distance-based (isotropic) local polynomial regression estimators of the conditional average treatment effect at the boundary, a key causal functional parameter capturing heterogeneous treatment effects in boundary discontinuity designs. We present necessary and/or sufficient conditions for identification, estimation, and inference in large samples, both pointwise and uniformly along the assignment boundary. Our theoretical results highlight the crucial role played by the “regularity” of the boundary (a one-dimensional manifold) over which identification, estimation, and inference are conducted. Our methods are illustrated with simulated and real-world data. Companion general-purpose software is provided.
@unpublished{cattaneo2025distance,
  title = {Estimation and Inference in Boundary Discontinuity Designs: Distance-Based Methods},
  author = {Cattaneo, Matias D and Titiunik, Rocio and Yu, Ruiqi Rae},
  note = {Working paper},
  year = {2025},
}
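The distance-based (isotropic) idea in the abstract above can be sketched as follows. For a boundary point `b`, compute each unit's Euclidean distance to `b`, fit a kernel-weighted local linear regression of the outcome on that scalar distance separately for treated and control units, and difference the intercepts at distance zero. This is a minimal sketch assuming a triangular kernel, a fixed bandwidth `h`, and a straight boundary in a simulated design. The helper names are hypothetical, and none of the paper's bandwidth selection, bias correction, or uniform inference is included.

```python
import numpy as np


def local_linear_intercept(r, y, h):
    """Intercept at r = 0 of a triangular-kernel weighted linear fit of y on distance r >= 0."""
    w = np.clip(1 - r / h, 0, None)
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), r[keep]])
    sw = np.sqrt(w[keep])
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return coef[0]


def boundary_effect_distance(points, y, treated, b, h):
    """Distance-based estimate at b: only the scalar distance ||X_i - b|| enters each fit."""
    r = np.linalg.norm(points - b, axis=1)
    return (local_linear_intercept(r[treated], y[treated], h)
            - local_linear_intercept(r[~treated], y[~treated], h))


# Hypothetical design: straight boundary x1 = 0, effect 1 + x2, small outcome noise
rng = np.random.default_rng(0)
n = 5000
pts = rng.uniform(-1, 1, size=(n, 2))
treated = pts[:, 0] >= 0
y = 0.5 * pts[:, 0] + pts[:, 1] + (1 + pts[:, 1]) * treated + rng.normal(0, 0.1, n)
est = boundary_effect_distance(pts, y, treated, b=np.array([0.0, 0.2]), h=0.3)
print(est)
```

Because only the distance enters the fit, directional information around `b` is discarded. The estimate is therefore noisier than a fit on the full location even in this easy case. This straight-boundary toy example does not probe the role of boundary regularity that the paper emphasizes.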
- [working paper] Identification, Estimation, and Inference for Boundary Average Treatment Effects. Matias D Cattaneo, Rocio Titiunik, and Ruiqi Rae Yu. 2025
Boundary discontinuity designs are used to learn about average treatment effects at a continuous boundary that splits units into control and treatment groups according to a bivariate score variable. This research design is also called the Multi-Score Regression Discontinuity design, a leading special case being the Geographic Regression Discontinuity design.
@unpublished{cattaneo2025boundaryavg,
  title = {Identification, Estimation, and Inference for Boundary Average Treatment Effects},
  author = {Cattaneo, Matias D and Titiunik, Rocio and Yu, Ruiqi Rae},
  note = {Working paper},
  year = {2025},
}