Unless otherwise specified, the morning session of a short course is from 9:00 AM to 12:30 PM EST, and the afternoon session is from 1:30 PM to 5:00 PM EST.
Peter Mueller is
Professor in the Department of Statistics and Data Science and
in the Department of Mathematics at UT Austin. Before coming
to Austin he served on the faculty in the Institute of
Statistics and Decision Science at Duke University, and in the
Department of Biostatistics at M.D. Anderson Cancer Center. He
received his Ph.D. in statistics from Purdue
University in 1991. Dr. Mueller's current major area of
interest is the theory and application of statistics to
biomedical problems. In particular, he has developed methods
for non-parametric Bayesian data analysis, semi-parametric
statistical methods for repeated measurement data, simulation
based approaches to optimal design, innovative clinical trial
designs, model based smoothing methods, and simulation based
methods for posterior inference. He is an elected fellow of
the ASA, the IMS, and ISBA, a recipient of the Zellner Medal, and has served as president of ISBA, chair of the ISBA/BNP section, and chair of the ASA/SBSS section.
We discuss Bayesian approaches to clinical trial design, focusing on early phase studies. We start with a brief review of Bayesian inference to introduce notation and concepts. The discussion includes Bayesian decision problems, which add a formal description of selecting optimal actions in the context of Bayesian inference. We review the setup and elements of basic decision problems. We then proceed with a review of Bayesian approaches to phase I designs, including CRM (O'Quigley et al., 1990), EWOC (Tighiouart and Rogatko, 2010), Bayesian logistic regression (Neuenschwander et al., 2008), mTPI (Ji et al., 2010), and BOIN (Liu and Yuan, 2015). We summarize some recent results (Duan et al., 2022) showing how these designs can be represented as special instances of a more general, unified decision-theoretic formulation of the phase I design problem.
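For readers new to these designs, a minimal sketch of the CRM logic may help fix ideas; everything below (the skeleton, prior variance, target rate, and cohort data) is an illustrative assumption rather than material from the course:

    # Minimal CRM sketch (base R): one-parameter power model p_d = skeleton_d^exp(a), a ~ N(0, 1.34)
    # The skeleton, target, and cohort data below are hypothetical values for illustration only.
    skeleton <- c(0.05, 0.10, 0.20, 0.30, 0.45)  # prior guesses of toxicity probability per dose
    target   <- 0.25                              # target toxicity probability
    n_tox    <- c(0, 1, 1, 0, 0)                  # toxicities observed so far at each dose
    n_pat    <- c(3, 3, 6, 0, 0)                  # patients treated so far at each dose

    # Posterior of a on a grid: normal prior times binomial likelihood
    a_grid   <- seq(-4, 4, length.out = 2001)
    log_post <- dnorm(a_grid, mean = 0, sd = sqrt(1.34), log = TRUE)
    for (d in seq_along(skeleton)) {
      log_post <- log_post + dbinom(n_tox[d], n_pat[d], skeleton[d]^exp(a_grid), log = TRUE)
    }
    post <- exp(log_post - max(log_post))
    post <- post / sum(post)

    # Posterior mean toxicity at each dose; recommend the dose closest to the target
    post_tox  <- sapply(skeleton, function(s) sum(s^exp(a_grid) * post))
    next_dose <- which.min(abs(post_tox - target))
    round(post_tox, 3)
    next_dose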
In the afternoon we will discuss general notions of adaptive Bayesian designs for phase II trials, including in particular (adaptive) sequential stopping. We will review master protocols and underlying hierarchical models for Bayesian inference across related cohorts (sub-models). As a last major theme we discuss challenges and opportunities related to using real world data (RWD) in clinical trial design. We will introduce methods that exploit RWD and historical trials for prior construction. This includes historical data priors (Chen and Ibrahim, 2000), commensurate priors (Hobbs et al., 2011), MAP priors (Neuenschwander et al., 2010) and robust MAP priors (Schmidli et al., 2014). Several methods use propensity scores to adjust for a lack of randomization, including approaches proposed in Liu et al. (2021), Chen et al. (2020) and Wang and Rosner (2019). Finally, we discuss in more detail a recently proposed approach by Chandra et al. (2022).
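As a small taste of the adaptive sequential stopping ideas mentioned above, the following self-contained R sketch monitors a single-arm binary endpoint with a conjugate Beta-Binomial model; the prior, thresholds, interim schedule, and data are all made-up assumptions for illustration:

    # Illustrative Bayesian sequential monitoring of a single-arm binary endpoint.
    # Prior p ~ Beta(a0, b0); stop for efficacy if P(p > p0 | data) > 0.95,
    # stop for futility if it falls below 0.05. All numbers are hypothetical.
    a0 <- 0.5; b0 <- 0.5             # Jeffreys-type prior
    p0 <- 0.20                       # null response rate to beat
    looks     <- c(10, 20, 30, 40)   # cumulative sample sizes at interim looks
    responses <- c(4, 7, 11, 14)     # cumulative responders at each look (fake data)

    for (k in seq_along(looks)) {
      n <- looks[k]; y <- responses[k]
      prob_eff <- 1 - pbeta(p0, a0 + y, b0 + n - y)   # P(p > p0 | data)
      cat(sprintf("n = %2d, y = %2d, P(p > %.2f | data) = %.3f\n", n, y, p0, prob_eff))
      if (prob_eff > 0.95) { cat("Stop for efficacy\n"); break }
      if (prob_eff < 0.05) { cat("Stop for futility\n"); break }
    }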
Dr. Hongtu Zhu is a tenured professor of biostatistics and computer science at the University of North Carolina at Chapel Hill (UNC-CH). He was a DiDi Fellow and Chief Scientist of Statistics at DiDi Chuxing, an IoT company, and held the endowed Bao-Shan Jing Professorship in Diagnostic Imaging at MD Anderson Cancer Center. He is an
internationally recognized expert in statistical learning,
medical image analysis, precision medicine, biostatistics,
artificial intelligence, and big data analytics. He has been an elected Fellow of the American Statistical Association and the Institute of Mathematical Statistics since 2011. He received an established investigator award from the Cancer Prevention Research Institute of Texas in 2016 and the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019. His work has received more than 19,000 Google Scholar citations since 2001. His group has pioneered the joint analysis of imaging,
clinical, and genetic data from large-scale biobank studies,
such as the UK Biobank. He has published more than 300 papers
in top journals including Nature, Science, Cell, Nature
Genetics, PNAS, Biometrika, JASA, AOS, and JRSSB, as well as
more than 45 conference papers at top conferences including NeurIPS, AAAI, KDD, ICDM, MICCAI, and IPMI. He has served, or is currently serving, as an editorial board member of premier international journals including Statistica Sinica, JRSSB, AOS, and JASA.
With modern imaging techniques, massive imaging data can be observed over both time and space. Such techniques include functional magnetic resonance imaging (fMRI), electroencephalography (EEG), diffusion tensor imaging (DTI), positron emission tomography (PET), and single photon emission computed tomography (SPECT), among many others. Medical imaging analysis has evolved from simple algebraic operations on imaging data to advanced statistical and mathematical methods. This course is designed to provide students with advanced topics on statistical learning methods for medical imaging data.
This course is designed for researchers and students who wish to analyze and model medical image data quantitatively. The course material is applicable to a wide variety of medical and biological imaging problems. The topics cover some basic neuroimaging modalities, shape representation, population statistics, manifold-data analysis, big-data integration, imaging genetics, and mapping genetic-imaging-clinical networks.
Yong Chen is an Associate
Professor of Biostatistics in the Department of Biostatistics,
Epidemiology and Informatics at the University of
Pennsylvania. He has a strong interest in statistical theory
with a focus on robust inference, and methodological research
on leveraging large healthcare data (EHR data, administrative
claims data) for evidence-based medicine and personalized
disease prevention/intervention strategies. He is keen on
developing informatics and statistical methods with associated
software, using EHR data, to facilitate evidence extraction
and synthesis for comparative effectiveness studies, as well
as well-calibrated risk prediction models for aiding clinical
decision-making. He has published over 160 peer-reviewed
papers in statistical inference, medical informatics,
comparative effectiveness research, and biomedical sciences.
He has taught short courses at JSM, ENAR, the Deming Conference on Applied Statistics, and the ICSA annual conference, as well as workshops at the University of Pennsylvania.
The widespread adoption of electronic health records (EHR) has created a vast resource for the study of treatments and health outcomes in the general population. The 21st Century Cures Act and the FDA’s subsequent publication of a framework for using real world data (RWD) to generate real world evidence (RWE) have spurred additional interest in using EHR to generate RWE. While there are many benefits to conducting research with RWD, many challenges arise from the complex and messy processes that give rise to EHR data. To make valid inferences, statisticians must be aware of data generation, capture, integration, and availability issues and must use appropriate study designs and statistical analysis methods to account for them. In this half-day short course, we will discuss key issues for research conducted using RWD, including error in covariates and outcomes extracted from EHR data and synthesizing evidence from EHR data across heterogeneous clinical sites. For each issue we will present a motivating case study to focus our discussion and use it to spur thinking about the pros and cons of using RWD for a given research question and about alternative methodological choices that can strengthen inference. The overarching goal is to provide participants with a framework for thinking about the design and analysis of EHR-based studies, to help guide their use of statistical best practices in the conduct of their own research.
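As one very simple, generic illustration of evidence synthesis across sites (not necessarily the specific methods presented in the course), site-specific effect estimates can be pooled by fixed-effect inverse-variance weighting; the numbers below are made up for illustration only:

    # Generic fixed-effect (inverse-variance) pooling of site-specific log odds ratios.
    # The estimates and standard errors are hypothetical.
    log_or <- c(0.35, 0.10, 0.48, 0.22)   # site-specific log odds ratios
    se     <- c(0.20, 0.15, 0.30, 0.18)   # their standard errors

    w         <- 1 / se^2
    pooled    <- sum(w * log_or) / sum(w)
    pooled_se <- sqrt(1 / sum(w))
    ci        <- pooled + c(-1, 1) * 1.96 * pooled_se
    round(c(pooled_logOR = pooled, lower = ci[1], upper = ci[2]), 3)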
Sally Paganin is a research fellow in the Department of Biostatistics at Harvard
T.H. Chan School of Public Health, currently working on
statistical methods for early cancer detection. Previously,
she was a postdoctoral researcher at UC Berkeley, where she
worked on Bayesian methodology and algorithms, contributing to
the NIMBLE project. Her research focuses on latent variable
models and Bayesian nonparametrics, along with the development
of statistical software and algorithms.
NIMBLE is a system for building and sharing methods for statistical models, especially for hierarchical models and computationally intensive methods. NIMBLE is built in R but compiles models and algorithms using C++ for speed. The resulting objects are manipulated from R without any need for analysts to program in C++. NIMBLE provides analysts with a flexible system for using MCMC, sequential Monte Carlo, MCEM, and other techniques, along with the ability to write computationally efficient algorithms in an R-like syntax that can be easily disseminated.
This workshop will introduce the NIMBLE system and demonstrate how one can use NIMBLE to build hierarchical models, run and customize MCMC and other algorithms, and write new algorithms in its R-like syntax.
Participants should have a basic understanding of Bayesian/hierarchical models and of one or more algorithms such as MCMC. Some experience with R is also expected. Please bring a laptop; I’ll give instructions in advance for installing NIMBLE.
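To give a flavor of the workflow before the workshop, here is a minimal sketch of fitting a small hierarchical model with nimbleCode and nimbleMCMC; the model, simulated data, and MCMC settings are illustrative assumptions, not workshop material:

    library(nimble)

    # Minimal hierarchical-model sketch (all choices here are illustrative assumptions)
    code <- nimbleCode({
      mu    ~ dnorm(0, sd = 10)
      tau   ~ dunif(0, 10)
      sigma ~ dunif(0, 10)
      for (j in 1:J) {
        theta[j] ~ dnorm(mu, sd = tau)            # group-level effects
        for (i in 1:n) {
          y[j, i] ~ dnorm(theta[j], sd = sigma)   # observations within group j
        }
      }
    })

    set.seed(1)
    J <- 4; n <- 5
    y <- matrix(rnorm(J * n, mean = rep(1:J, each = n)), nrow = J, byrow = TRUE)

    samples <- nimbleMCMC(code = code,
                          constants = list(J = J, n = n),
                          data = list(y = y),
                          inits = list(mu = 0, tau = 1, sigma = 1, theta = rep(0, J)),
                          monitors = c("mu", "tau", "sigma"),
                          niter = 5000, nburnin = 1000, nchains = 1)
    summary(samples)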
Dr. Sy Han
(Steven) Chiou is an assistant professor in the Department of
Mathematical Sciences at the University of Texas at Dallas
(UTD). Before joining UTD, Dr. Chiou was a postdoctoral
research fellow in the Department of Biostatistics at the
Harvard T.H. Chan School of Public Health during 2015-2017 and
an assistant professor in the Department of Mathematics at the
University of Minnesota Duluth during 2013-2015. Dr. Chiou
received his PhD in Statistics from the University of
Connecticut in 2013. Dr. Chiou's primary research interests
focus on addressing important questions that arise with data
under complicated sampling schemes, dependent truncation, and
recurrent event data. Dr. Chiou has developed several R packages on these topics, including aftgee, reReg, rocTree, and spef. Dr. Chiou is an Elected Member of the International
Statistical Institute.
Dr. Jun Yan is a Professor
in the Department of Statistics at the University of
Connecticut (UConn) and a Research Fellow in the Center for
Population Health at UConn Health. He received his PhD in
Statistics from the University of Wisconsin–Madison in 2003. He
was on the faculty of the Department of Statistics and
Actuarial Science at the University of Iowa for four years
before joining UConn in 2007. Dr. Yan's methodological
research interests include survival analysis, clustered data
analysis, spatial extremes, and statistical computing. His
application domains are public health, environmental sciences,
and sports. With a special interest in making his statistical
methods available via open source software, he and his
coauthors developed and maintain a collection of R packages in
the public domain. Since July 2020, he has been the editor of
the Journal of Data Science and led the reform of the journal.
Dr. Yan is an Elected Member of the International Statistical
Institute and a fellow of the American Statistical
Association.
This course will provide a comprehensive and practical introduction to analyzing data in the form of time-to-event or survival times. We will begin with fundamental survival analysis concepts and techniques, including censoring, truncation, survival functions, hazard functions, Kaplan-Meier curves, and log-rank tests. Regression analysis covers the Cox proportional hazards model and the accelerated failure time (AFT) model. The models will be extended to allow a cure rate and variable selection. Multivariate event times will be analyzed with marginal Cox or AFT models. A special type of multivariate event time data is recurrent events. Standard survival analysis methods that focus only on the time to the first event cannot capture the cumulative experience of the recurrent events and could lead to invalid inferences. Thus, the development of statistical methods that appropriately address the structure of recurrent events has attracted considerable attention. We will introduce visualization tools, nonparametric estimation, and regression analysis for recurrent event data. All statistical analyses will be illustrated with practical applications in R.
This course's intended audience includes researchers who want to gain basic exposure to analyzing time-to-event data with the ultimate goal of incorporating R into their research programs.
Introductory statistics; entry-level R knowledge; a laptop.
survival, survminer
survival, aftgee
intsurv, smcure
reda, reReg
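As a quick taste of the kind of R workflow covered in the course, the following sketch uses the survival package listed above with its built-in lung data set; it is a generic example rather than course material:

    # Kaplan-Meier curves by sex, a log-rank test, and a Cox regression on the lung data
    library(survival)

    fit_km <- survfit(Surv(time, status) ~ sex, data = lung)
    summary(fit_km, times = c(180, 365))            # survival estimates at 6 and 12 months
    survdiff(Surv(time, status) ~ sex, data = lung)  # log-rank test

    fit_cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
    summary(fit_cox)                                 # hazard ratios and tests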
Fei Wang is an Associate Professor in the Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, Cornell University. His major research interests are data mining, machine learning, and their applications in health data science. He has published more than 250 papers in top venues of related areas such as
ICML, KDD, NIPS, CVPR, AAAI, IJCAI, JAMA Internal Medicine,
Annals of Internal Medicine, Lancet Digital Health, etc. His
papers have received over 19,000 citations so far, with an H-index of 67. His (or his students’) papers have won 8 best
paper (or nomination) awards at top international conferences
on data mining and medical informatics. His team won the NIPS/Kaggle Challenge on Classification of Clinically Actionable Genetic Mutations in 2017 and the Parkinson's Progression Markers Initiative data challenge organized by the Michael J. Fox Foundation in 2016. Dr. Wang is
the recipient of the NSF CAREER Award in 2018, as well as the inaugural research leadership award at the IEEE International Conference on Health Informatics (ICHI) in 2019. Dr. Wang’s research has been supported by NSF, NIH, ONR, PCORI, MJFF, AHA, Amazon, and others. He is the past chair of the Knowledge Discovery and Data Mining working group of the American Medical Informatics Association (AMIA), a fellow of AMIA, and a Distinguished Member of ACM.
Machine learning algorithms, especially deep learning, have achieved great success in a number of application domains in recent years. These approaches typically need a large data set for effective training (large n) due to their high model complexity. In healthcare and medicine, the study problems are typically complicated due to the complexity of the diseases, and the size of the patient sample available for model training is typically limited (small n). At the same time, with the rapid development of computer software and hardware technologies and initiatives such as precision medicine, richer and more heterogeneous information is now captured for each individual patient (large p, such as electronic health records, multi-omics, biomedical images, etc.). These trends and characteristics of patient health data make machine learning model development promising but challenging. In this short course, I will present experiences (successes and failures) from recent years on developing machine learning models in such scenarios, covering a diverse set of topics including multi-modal learning, algorithmic fairness, model interpretability, and federated and transfer learning. I will demonstrate that all these topics can be naturally unified and understood within such a small-n, large-p learning framework. I will also discuss its implications for future research directions.
Joseph C.
Cappelleri, PhD, MPH, MS is an executive director in the
Statistical Research and Data Science Center at Pfizer Inc. He
earned his M.S. in statistics from the City University of New
York (Baruch College), Ph.D. in psychometrics from Cornell
University, and M.P.H. in epidemiology from Harvard
University. As an adjunct professor, Dr. Cappelleri has served
on the faculties of Brown University, University of
Connecticut, and Tufts Medical Center. He has delivered
numerous conference presentations and has published
extensively on clinical and methodological topics, including
on regression-discontinuity designs, meta-analyses, and health
measurement scales. He is the lead author of the book
Patient-Reported Outcomes: Measurement, Implementation and
Interpretation and has co-authored or co-edited three other
books (Phase II Clinical Development of New Drugs, Statistical
Topics in Health Economics and Outcomes Research, Design and
Analysis of Subgroups with Biopharmaceutical Applications).
Dr. Cappelleri is a fellow of the American Statistical
Association and president of the New England Statistical
Society.
Thomas Mathew, PhD, is a Professor in the Department of Mathematics & Statistics at the University of Maryland, Baltimore County (UMBC). He earned his
PhD in statistics from the Indian Statistical Institute in
1983, and has been a faculty member at UMBC since 1985. He has
delivered numerous conference presentations, nationally and
internationally, and has published extensively on
methodological and applied topics, including
cost-effectiveness analysis, bioequivalence testing, exposure
data analysis, meta-analysis, mixed and random effects
models, and tolerance intervals. He is the co-author of two
books, Statistical Tests in Mixed Linear Models and Statistical
Tolerance Regions: Theory, Applications and Computation, both
published by Wiley. He has served on the Editorial Boards of
several journals, and is currently an Associate Editor of the
Journal of the American Statistical Association, Journal of
Multivariate Analysis, and Sankhya. Dr. Mathew is a Fellow of
the American Statistical Association, and a Fellow of the
Institute of Mathematical Statistics. He has also been
appointed as Presidential Research Professor at his
campus.
Based in part on the co-edited volume “Statistical Topics in Health Economics and Outcomes Research” (Alemayehu et al.), this four-hour short course recognizes that, with ever-rising healthcare costs, evidence generation through health economics and outcomes research (HEOR) plays an increasingly important role in decision-making about the allocation of resources. This course highlights three major topics related to HEOR, with objectives to learn about 1) patient-reported outcomes, 2) analysis of aggregate data, and 3) methodological issues in health economic analysis. Key themes on patient-reported outcomes are presented regarding their development and validation: content validity, construct validity, and reliability. Regarding analysis of aggregate data, several areas are elucidated: traditional meta-analysis, network meta-analysis, assumptions, and best practices for the conduct and reporting of aggregated data. For methodological issues on health economic analysis, cost-effectiveness criteria are covered: traditional measures of cost-effectiveness, the cost-effectiveness acceptability curve, statistical inference for cost-effectiveness measures, the fiducial approach (or generalized pivotal quantity approach), and a probabilistic measure of cost-effectiveness. Illustrative examples are used throughout the course to complement the concepts. Attendees are expected to have taken at least one graduate-level course in statistics.
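For orientation, the traditional cost-effectiveness quantities mentioned above can be written compactly; the notation (incremental cost ΔC, incremental effectiveness ΔE, and willingness-to-pay threshold λ) is chosen here for illustration:

    \[
    \mathrm{ICER} = \frac{\Delta C}{\Delta E}, \qquad
    \mathrm{NMB}(\lambda) = \lambda\,\Delta E - \Delta C, \qquad
    \mathrm{CEAC}(\lambda) = \Pr\{\mathrm{NMB}(\lambda) > 0\},
    \]

so the cost-effectiveness acceptability curve traces, over a range of willingness-to-pay values, the probability that the new intervention is cost-effective.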
To understand and critique the major methodological issues in outcomes research on the development and validation of patient-reported outcomes, traditional meta-analysis and network meta-analysis, and health economic analysis.