One hopes that statistics will be used to analyze data and make beneficial decisions regarding people's health, finances, and well-being. But the data fed to a statistical analysis may systematically differ from the data where these decisions are ultimately applied. For instance, suppose we analyze data in one country and conclude that microcredit is effective at alleviating poverty; based on this analysis, we decide to distribute microcredit in other locations and in future years. We might then ask: can we trust our conclusion to apply under new conditions? If we found that a very small percentage of the original data was instrumental in determining the original conclusion, we might expect the conclusion to be unstable under new conditions. So we propose a method to assess the sensitivity of data analyses to the removal of a small fraction of the data set. Analyzing all possible data subsets of a certain size is computationally prohibitive, so we provide an approximation. We call our resulting metric the Approximate Maximum Influence Perturbation. Our approximation is automatically computable and works for common estimators—including (but not limited to) OLS, IV, GMM, MLE, MAP, and variational Bayes. We show that any non-robustness our metric finds is conclusive. Empirics demonstrate that while some applications are robust, in others the sign of a treatment effect can be changed by dropping much less than 1% of the data—even in simple models and even when standard errors are small.
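To make the method's core idea concrete, here is a minimal sketch, assuming the special case of OLS with a scalar quantity of interest. It uses the standard first-order influence approximation to estimate how much a regression coefficient would move if the most influential alpha-fraction of points were dropped; the function and variable names are ours, and this illustrates the general technique rather than the authors' implementation.

```python
import numpy as np

def amip_sketch(X, y, direction, alpha=0.01):
    """Approximate the largest achievable decrease in direction @ beta_hat
    from dropping at most an alpha fraction of the n observations, using
    first-order influence scores. A hedged sketch of the idea, not a
    reference implementation."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    # First-order effect on direction @ beta_hat of dropping point i:
    # psi_i ~= -(direction @ (X'X)^{-1} x_i) * e_i  (leverage ignored)
    psi = -(X @ (XtX_inv @ direction)) * resid
    # Dropping the k points with the most negative scores maximally
    # decreases the quantity, to first order.
    k = int(np.floor(alpha * n))
    worst = np.argsort(psi)[:k]
    return beta_hat, psi[worst].sum(), worst

# Usage: can dropping under 1% of points flip a weak "treatment" slope?
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.05 * X[:, 1] + rng.normal(size=n)
beta_hat, change, dropped = amip_sketch(X, y, direction=np.array([0.0, 1.0]))
print(beta_hat[1], beta_hat[1] + change)   # original vs. approximate slope
refit = np.linalg.lstsq(np.delete(X, dropped, 0),
                        np.delete(y, dropped), rcond=None)[0]
print(refit[1])                            # exact refit on the kept data
```

The final refit is why a non-robustness finding is conclusive in the sense described above: if the coefficient's sign actually flips after removing the identified points, no approximation error enters that conclusion.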
Tamara Broderick is an
Associate Professor in the Department of Electrical Engineering and
Computer Science at MIT. She is a member of the MIT Computer Science
and Artificial Intelligence Laboratory (CSAIL), the MIT Statistics and
Data Science Center, and the Institute for Data, Systems, and Society
(IDSS). She completed her Ph.D. in Statistics at the University of
California, Berkeley in 2014. Previously, she received an AB in
Mathematics from Princeton University (2007), a Master of Advanced
Study for completion of Part III of the Mathematical Tripos from the
University of Cambridge (2008), an MPhil by research in Physics from
the University of Cambridge (2009), and an MS in Computer Science from
the University of California, Berkeley (2013). Her recent research has
focused on developing and analyzing models for scalable Bayesian
machine learning. She has been awarded selection to the COPSS
Leadership Academy (2021), an Early Career Grant (ECG) from the Office
of Naval Research (2020), an AISTATS Notable Paper Award (2019), an
NSF CAREER Award (2018), a Sloan Research Fellowship (2018), an Army
Research Office Young Investigator Program (YIP) award (2017), Google
Faculty Research Awards, an Amazon Research Award, the ISBA Lifetime
Members Junior Researcher Award, the Savage Award (for an outstanding
doctoral dissertation in Bayesian theory and methods), the Evelyn Fix
Memorial Medal and Citation (for the Ph.D. student on the Berkeley
campus showing the greatest promise in statistical research), the
Berkeley Fellowship, an NSF Graduate Research Fellowship, a Marshall
Scholarship, and the Phi Beta Kappa Prize (for the graduating
Princeton senior with the highest academic average).
We consider the Bayesian analysis of models in which the unknown distribution of the outcomes is specified up to a set of conditional moment restrictions. The nonparametric exponentially tilted empirical likelihood function is constructed to satisfy a sequence of unconditional moments based on an increasing (in sample size) vector of approximating functions (such as tensor splines based on the splines of each conditioning variable). For any given sample size, results are robust to the number of expanded moments. We derive Bernstein-von Mises theorems for the behavior of the posterior distribution under both correct and incorrect specification of the conditional moments, subject to growth rate conditions (slower under misspecification) on the number of approximating functions. A large-sample theory for comparing different conditional moment models is also developed. The central result is that the marginal likelihood criterion selects the model that is less misspecified. We also introduce sparsity-based model search for high-dimensional conditioning variables and provide efficient MCMC computations for high-dimensional parameters. Along with clarifying examples, the framework is illustrated with real-data applications to risk-factor determination in finance and to causal inference under conditional ignorability.
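For readers unfamiliar with exponentially tilted empirical likelihood (ETEL), the sketch below shows the generic computation of the profile log ETEL from a vector of unconditional moment functions; in the setting above, those would be the expanded moments formed from the approximating functions. This is a textbook-style illustration under our own naming, following the usual two-step convex-dual construction, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def log_etel(theta, moment_fn, data):
    """Log exponentially tilted empirical likelihood at theta.
    moment_fn(theta, data) returns an (n, m) array whose columns are the
    expanded unconditional moment functions g_j(x_i, theta).
    An illustrative sketch, not the talk's implementation."""
    g = moment_fn(theta, data)                 # (n, m)
    n, m = g.shape

    # Inner convex problem:
    # lambda_hat = argmin_lambda (1/n) sum_i exp(lambda' g_i)
    objective = lambda lam: np.mean(np.exp(g @ lam))
    grad = lambda lam: g.T @ np.exp(g @ lam) / n
    lam = minimize(objective, np.zeros(m), jac=grad, method="BFGS").x

    # Tilted probabilities w_i proportional to exp(lambda' g_i);
    # log ETEL(theta) = sum_i log w_i
    logw = g @ lam - np.log(np.sum(np.exp(g @ lam)))
    return np.sum(logw)

# Usage: a single mean restriction g(x, theta) = x - theta.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, size=200)
moments = lambda theta, data: (data - theta).reshape(-1, 1)
for theta in (1.5, 2.0, 2.5):
    print(theta, log_etel(theta, moments, x))  # peaks near the sample mean
```

In a Bayesian analysis like the one above, this log ETEL plays the role of a log likelihood: it is combined with a prior on theta and explored by MCMC.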
Siddhartha Chib is the Harry
C. Hartkopf Professor of Econometrics and Statistics in the Olin
Business School at Washington University in St. Louis. He received his
bachelor's degree from Delhi University in 1979 and his Ph.D. in
Economics from the University of California, Santa Barbara in 1986. He
works in Bayesian statistics, econometrics, and Markov chain Monte
Carlo (MCMC) methods. Professor Chib is a Fellow of the American
Statistical Association, of the Journal of Econometrics, and of the
International Society for Bayesian Analysis. He is an Associate Editor
of the Journal of Computational and Graphical Statistics and of
Statistics and Computing. He also directs the annual NBER-NSF
sponsored Seminar in Bayesian Econometrics and Statistics (SBIES),
which features presentations by young and established researchers
working on the theory and application of Bayesian methods. He teaches
statistics and econometrics to students in the MBA, specialized MS,
and doctoral programs.
Single-case experimental designs hold great promise for enabling participants to create personalized protocols for making individualized treatment decisions. The most common single-case design used in the health sciences is the N-of-1 trial, a multi-crossover withdrawal/reversal design in which participants receive a set of two or more treatments multiple times in a randomized order. Other forms, such as the multiple-baseline (stepped wedge) design, are also common in the behavioral and social sciences. In contrast to traditional group or cluster randomized designs, single-case designs can measure individual treatment efficacy. By combining single-case trials in a multilevel structure, they can also assess average treatment effects in populations and subgroups, as well as treatment effect heterogeneity. Implementation of single-case designs has lagged in healthcare because of a lack of infrastructure for designing and running the trials, analyzing their data, and reporting and interpreting their results. Mobile device applications provide a means to implement such trials on a large scale. We discuss some completed and ongoing N-of-1 studies in which we have combined mobile device applications with server-driven statistical analytics, using an R package to return results to individuals. Issues that arise include defining treatments and sequences of treatments, synthesizing treatment networks, incorporating patient-specific prior information, automating the choice of appropriate statistical models and the assessment of model assumptions, and automating graphical displays and text to facilitate appropriate interpretation by non-technical users. Development of smart tools that solve these problems could help to transform health care research by expanding the settings in which it is carried out and making findings directly applicable to, and interpretable by, individual trial participants.
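As a concrete illustration of the analytic step, here is a minimal Python sketch (the studies described above use an R package running on a server; Python is used here only for illustration). It estimates a per-participant treatment effect from a randomized multi-crossover A/B sequence and then pools the estimates across participants with a simple DerSimonian-Laird-style random-effects summary. The simulation, names, and pooling method are our own simplifying assumptions, not the talk's models.

```python
import numpy as np

def individual_effect(y, treat):
    """Per-participant N-of-1 estimate: difference in mean outcome between
    treatment (1) and control (0) periods, with its sampling variance.
    Ignores carryover and autocorrelation for simplicity."""
    yb, ya = y[treat == 1], y[treat == 0]
    effect = yb.mean() - ya.mean()
    var = yb.var(ddof=1) / len(yb) + ya.var(ddof=1) / len(ya)
    return effect, var

def pool_effects(effects, variances):
    """Random-effects pooling (DerSimonian-Laird-style) of the
    per-participant estimates: returns the population-average effect
    and the between-person heterogeneity tau^2."""
    e, v = np.asarray(effects), np.asarray(variances)
    w = 1.0 / v
    fixed = np.sum(w * e) / np.sum(w)
    q = np.sum(w * (e - fixed) ** 2)
    tau2 = max(0.0, (q - (len(e) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_re = 1.0 / (v + tau2)
    return np.sum(w_re * e) / np.sum(w_re), tau2

# Simulate 20 participants, each with a randomized six-period crossover
# (three A and three B periods, five observations per period); true
# effects vary across people, i.e., treatment effect heterogeneity.
rng = np.random.default_rng(2)
effects, variances = [], []
for true_effect in rng.normal(1.0, 0.5, size=20):
    treat = np.repeat(rng.permutation([0, 1] * 3), 5)
    y = 10 + true_effect * treat + rng.normal(size=treat.size)
    e, v = individual_effect(y, treat)
    effects.append(e)
    variances.append(v)

print(pool_effects(effects, variances))   # average effect and tau^2
```

Each participant keeps an interpretable individual estimate, while the multilevel summary answers the population-level questions mentioned above; a fuller analysis would also model carryover, trends, and autocorrelation.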
Dr. Schmid is Professor
and Chair of Biostatistics at Brown University School of Public Health
where he co-founded the Center for Evidence Synthesis in Health. He
directs the Biostatistics, Epidemiology and Research Design (BERD) Core of
the Rhode Island Center to Advance Translational Science. He is a
Fellow of the American Statistical Association, founding Editor of the
journal Research Synthesis Methods, long-time statistical editor of
the American Journal of Kidney Diseases and former member of the Drug
Safety and Risk Management Committee for the FDA. His research focuses
on Bayesian methods for meta-analysis, on methods for developing and
assessing predictive models using data from multiple sources, and on
the analysis of data from N-of-1 trials. Dr. Schmid graduated from
Haverford College with a BA in Mathematics and received his PhD in
Statistics from Harvard University.