Short Courses (June 3-4, 2023)


Schedule

  • 8:30 AM - 5:00 PM: Registration
  • 9:00 AM - 12:30 PM: AM Session
  • 12:30 PM - 1:30 PM: Lunch
  • 1:30 PM - 5:00 PM: PM Session

There will be a 20-minute coffee break in each session, scheduled between 10:30 and 11:30 AM in the morning and between 3:00 and 4:00 PM in the afternoon.


Bayesian Clinical Trial Designs and Their Implementation

Instructors: Dr. Ying Yuan and Dr. Yong Zang

Ying Yuan is the Bettyann Asche Murray Distinguished Professor and Deputy Chair in the Department of Biostatistics at The University of Texas MD Anderson Cancer Center. Dr. Yuan is an internationally renowned researcher in innovative Bayesian adaptive designs, with over 140 statistical methodology papers published on early phase trials, seamless trials, biomarker-guided trials, and basket and platform trials. The designs and software developed by Dr. Yuan's lab (www.trialdesign.org) have been widely used in medical research institutes and pharmaceutical companies. The BOIN design, developed by Dr. Yuan's team, is a groundbreaking oncology dose-finding design that has been recognized by the FDA as a fit-for-purpose drug development tool. Dr. Yuan is an elected Fellow of the American Statistical Association and the lead author of two books, "Bayesian Designs for Phase I-II Clinical Trials" and "Model-Assisted Bayesian Designs for Dose Finding and Optimization," both published by Chapman & Hall/CRC.

Dr. Yong Zang is an Associate Professor in the Department of Biostatistics and Health Data Science at Indiana University. He also serves as the Co-Director of Clinical Research for the Biostatistics and Data Management Core, IU Simon Comprehensive Cancer Center. He received his Ph.D. in Statistics from the University of Hong Kong and completed his postdoctoral training at The University of Texas MD Anderson Cancer Center. His research interests are clinical trial design and statistical genetics. He has published over sixty papers in peer-reviewed statistical, genetics, and medical journals such as Biometrics, Biostatistics, JRSS-C, Statistics in Medicine, Journal of Statistical Software, American Journal of Human Genetics, Genome Research, and Cancer Research. His research is supported by the National Institutes of Health, the Showalter Trust, and the Indiana CTSI.

Abstract: In this short course, we will delve into Bayesian clinical trial designs and their implementation, with a focus on early phase trials. We will begin by providing a brief review of Bayesian inference to introduce relevant notation and concepts.

Next, we will examine phase I dose-finding trial designs, encompassing single-agent, drug-combination, and late-onset outcome trials. Our focus will be on model-assisted designs (e.g., BOIN designs), which offer simplicity, flexibility, and excellent operating characteristics. In response to the growing interest in dose optimization, we will also discuss dose-finding designs for dose optimization. We will use real-world trial examples to illustrate how to design dose-finding trials using the freely available software at www.trialdesign.org.
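To give a concrete flavour of why model-assisted designs are simple to use at the bedside, the sketch below computes the default BOIN escalation/de-escalation boundaries (Liu and Yuan, 2015) and applies the resulting decision rule. It is an illustrative sketch only, assuming the default sub-therapeutic and overly toxic rates of 0.6φ and 1.4φ; the full design also includes overdose-control and dose-elimination rules, and final MTD selection uses isotonic regression.

```python
import math

def boin_boundaries(phi, phi1=None, phi2=None):
    """Default BOIN escalation/de-escalation boundaries (Liu & Yuan, 2015).

    phi  : target toxicity probability
    phi1 : highest rate considered sub-therapeutic (default 0.6 * phi)
    phi2 : lowest rate considered overly toxic (default 1.4 * phi)
    """
    phi1 = 0.6 * phi if phi1 is None else phi1
    phi2 = 1.4 * phi if phi2 is None else phi2
    lam_e = (math.log((1 - phi1) / (1 - phi))
             / math.log(phi * (1 - phi1) / (phi1 * (1 - phi))))
    lam_d = (math.log((1 - phi) / (1 - phi2))
             / math.log(phi2 * (1 - phi) / (phi * (1 - phi2))))
    return lam_e, lam_d

def boin_decision(n_tox, n_treated, phi):
    """Compare the observed toxicity rate at the current dose to the boundaries."""
    lam_e, lam_d = boin_boundaries(phi)
    p_hat = n_tox / n_treated
    if p_hat <= lam_e:
        return "escalate"
    if p_hat >= lam_d:
        return "de-escalate"
    return "stay"

# For a target toxicity rate of 0.30, the boundaries are about 0.236 and 0.358
print(boin_boundaries(0.3))
print(boin_decision(1, 3, 0.3))  # 1/3 = 0.33 falls between them -> "stay"
```

The appeal is that the clinician only compares an observed toxicity rate against two pre-tabulated numbers, while the boundaries themselves are derived from a Bayesian model.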

Moving on to phase II trial design, we will introduce the Bayesian optimal phase II design and demonstrate its practical application. Additionally, we will cover biomarker-based designs, such as enrichment and biomarker-stratified designs. Finally, we will explore basket and platform trial designs.

The focus of this short course is to bridge the gap between theoretical understanding and practical application, based in part on the book "Model-Assisted Bayesian Designs for Dose Finding and Optimization" by Ying Yuan, Ruitao Lin, and Jack J. Lee (2022, Chapman & Hall/CRC). Together, we will work through real-world examples and case studies, allowing participants to gain hands-on experience with designing adaptive trials. By the end of the course, attendees will have a solid understanding of how to implement Bayesian clinical trial designs in their own research.

Prerequisites: Basic knowledge of clinical trials and completion of first-year graduate-level statistical inference courses.


Spatial Data Science Using R

Instructor: Dr. Paula Moraga

Paula Moraga is an Assistant Professor of Statistics at King Abdullah University of Science and Technology (KAUST) and the Principal Investigator of the GeoHealth group. Prior to KAUST, she held academic statistics positions at Lancaster University, Harvard School of Public Health, the London School of Hygiene & Tropical Medicine, Queensland University of Technology, and the University of Bath. She received her Ph.D. in Mathematics from the University of Valencia and her Master's in Biostatistics from Harvard University. Paula's research focuses on the development of innovative statistical methods and computational tools for geospatial data analysis and health surveillance. She develops spatial and spatio-temporal statistical methods to understand the geographic and temporal patterns of diseases, assess their relationship with potential risk factors, detect clusters, measure inequalities, and evaluate the impact of interventions. She also works on statistical software and interactive visualization applications for reproducible research and communication, and her work has directly informed strategic policy for reducing the burden of diseases such as malaria and cancer in several countries. She has published extensively in leading journals and is the author of the book "Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny" (2019, Chapman & Hall/CRC).

Abstract: Spatial data arise in many fields including health, ecology, environment, and business. In this course, we will learn statistical methods, modeling approaches, and visualization techniques to analyze spatial data using R. We will also learn how to create interactive dashboards and Shiny web applications that facilitate the communication of insights to collaborators and policymakers. We will work through several fully reproducible data science examples using real-world data, including disease risk mapping, air pollution prediction, species distribution modeling, crime mapping, and real estate analysis. We will cover the following topics:

  • Spatial data including areal, geostatistical and point patterns.
  • R packages for retrieval, manipulation and visualization of spatial data.
  • Statistical methods to describe, analyze, and simulate spatial data.
  • Fitting and interpreting Bayesian spatial models using the integrated nested Laplace approximation (INLA) and stochastic partial differential equation (SPDE) approaches.
  • Communicating results with interactive dashboards and Shiny web applications.
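To illustrate the kind of descriptive statistic covered under the third bullet, the sketch below computes Moran's I, a classical measure of spatial autocorrelation for areal data. The course itself works in R; this NumPy version is purely illustrative and not taken from the course materials.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I spatial autocorrelation statistic for areal data.

    values  : (n,) one observation per areal unit
    weights : (n, n) spatial weight matrix, e.g. 1 when two regions share a border
    """
    z = values - values.mean()
    s0 = weights.sum()
    return len(values) / s0 * (z @ weights @ z) / (z @ z)

# Four regions arranged in a row; neighbours share an edge
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])   # spatially clustered values
print(morans_i(x, w))                # positive I: similar values are neighbours
```

Values near +1 indicate clustering of similar values, values near 0 indicate no spatial structure, and negative values indicate a checkerboard-like pattern.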

The course materials are based on the book "Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny" by Paula Moraga (2019, Chapman & Hall/CRC) which is freely available at https://paula-moraga.github.io/book-geospatial/.

Prerequisites: Participants are assumed to be familiar with R, and a working knowledge of generalized linear models is recommended. Participants should bring their laptops with R and RStudio installed.


Introduction to the Analysis of Neural Electrophysiology Data

Instructors: Dr. Uri Eden and Dr. Mark Kramer

Uri Eden is a Professor of Mathematics and Statistics and the director of the Statistics program at Boston University. He received his Ph.D. in Engineering Sciences with an emphasis in Medical Engineering/Medical Physics from Harvard University in 2005. He received an NSF CAREER award in 2007. His research focuses on developing mathematical and statistical methods to analyze neural spiking activity, integrating methodologies related to model identification, statistical inference, signal processing, and stochastic estimation and control. He has co-written two textbooks on neural data analysis: Analysis of Neural Data, published by Springer in 2014, and Case Studies in Neural Data Analysis, published by MIT Press in 2016.

Mark Kramer is a Professor of Mathematics and Statistics and the associate director of the Center for Systems Neuroscience at Boston University. He received his Ph.D. in Applied Physics from the University of California, Berkeley in 2005. He is a recipient of a Burroughs Wellcome Fund Career Award at the Scientific Interface and an NSF CAREER award. His research focuses on developing applied mathematical methods to analyze neural time series data and on building mathematical models of neural rhythms. He has co-written with Prof. Eden a textbook on neural data analysis: Case Studies in Neural Data Analysis, published by MIT Press in 2016.

Abstract: Neural electrophysiology data analysis can provide profound insights into information processing in the brain, but presents a number of unique statistical challenges. Individual neurons represent information and communicate through sequences of sudden electrical impulses, which are often modeled using temporal point processes. The combined activity of large neural populations produces electric fields that exhibit multiple physiological rhythms, and brain areas are thought to communicate through coordination of these rhythms. Analysis of this field data often uses spectral analysis and time series modeling methods.

This short course provides an overview of the fundamental concepts and techniques for analyzing electrophysiological data from the brain, including spike train and local field potential (LFP) data. Specific topics include spike sorting, receptive field modeling, latent process models, neural decoding, spectral estimation, and coherence analysis. The course will include lecture and tutorial elements, designed to provide a comprehensive introduction to the basic model structures used for neural electrophysiology data and hands-on experience fitting and interpreting these models. The course is designed for students, researchers, and practitioners who are interested in understanding the statistical challenges underlying neural data analysis and how fundamental neural modeling approaches meet these challenges. This course also provides the modeling foundations for the succeeding short course, Advanced Topics in the Analysis of Neural Electrophysiology Data.
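As a small taste of the spectral-estimation topic, the sketch below computes a periodogram of a simulated LFP-like signal and recovers its dominant rhythm. It is illustrative only and not the course's own code; in practice one would typically use tapering or multitaper methods to reduce the bias and variance of this raw estimate.

```python
import numpy as np

fs = 1000.0                        # sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)      # two seconds of simulated data
rng = np.random.default_rng(0)

# A toy "LFP": a 10 Hz rhythm buried in white noise
lfp = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

# Periodogram: squared FFT magnitude, normalised by fs * N
n = lfp.size
freqs = np.fft.rfftfreq(n, 1 / fs)
power = np.abs(np.fft.rfft(lfp)) ** 2 / (fs * n)

peak_freq = freqs[1:][np.argmax(power[1:])]  # skip the DC bin
print(peak_freq)  # the spectral peak sits at the 10 Hz rhythm
```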

Prerequisites: Basic linear algebra, some familiarity with generalized linear models and spectral analysis.


Advanced Topics in the Analysis of Neural Electrophysiology Data: Decomposing Rhythmic and Broadband Components

Instructors: Dr. Emily P. Stephen and Dr. Thomas Donoghue

Emily Stephen is an Assistant Professor of Statistical Neuroscience in the Department of Mathematics and Statistics at Boston University. She received her PhD in Computational Neuroscience from Boston University and has worked in statistical and computational neuroscience at MIT and UCSF. She has developed methods for time-series modeling and signal processing of neural voltage recordings across scales, from spiking data to noninvasive electroencephalography. Her current interest is in bridging scales in statistical models of neural data using stochastic generative models of extracellular voltage recordings.

Thomas Donoghue is a postdoctoral research scientist in the Department of Biomedical Engineering at Columbia University. His work focuses on developing open-source tools for analyzing neuro-electrophysiological data in order to better understand how patterns of neural activity relate to cognition and disease. In his current postdoctoral work, he is analyzing single-neuron activity recorded from human neurosurgical patients to investigate neural mechanisms of spatial navigation and memory. Prior to this, he completed his PhD at the University of California, San Diego, where he worked on developing methods for parameterizing periodic and aperiodic activity in neural recordings.

Abstract: Neural electrophysiological signals reflect complex combinations of multiple underlying sources, so traditional approaches from time-series analysis and digital signal processing can conflate overlapping features and complicate accurate interpretation of the underlying physiology. In particular, recent methodological work has shown that the complexity of neural data (which can include multiple rhythmic features, transient events, and aperiodic activity, as well as interactions between these features and traversals between discrete states) requires dedicated methods to accurately measure features of interest. This short course will introduce statistical tools to model and decompose neural electrophysiological signals into physiologically informed features of interest, including rhythmic and broadband components. The instructors will give brief lectures on (1) using frequency-domain spectral decomposition to estimate and separate rhythmic peaks from broadband power spectral signatures, and (2) using state space models to capture time-domain rhythms and their interactions. Following each lecture, attendees will use interactive notebooks to explore the methods in hands-on tutorials.
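To make the idea of separating a rhythmic peak from broadband activity concrete, the toy sketch below fits the aperiodic 1/f exponent of a simulated power spectrum by excluding the band around the peak. This is a deliberately crude illustration in the spirit of spectral parameterization, not the algorithm taught in the course; the simulated spectrum and band limits are made up for the example.

```python
import numpy as np

freqs = np.arange(1.0, 101.0)        # 1-100 Hz
chi = 2.0                            # true aperiodic exponent

aperiodic = freqs ** -chi                                  # broadband 1/f^chi
peak = 0.05 * np.exp(-0.5 * ((freqs - 10.0) / 1.5) ** 2)   # 10 Hz rhythmic peak
spectrum = aperiodic + peak

# Crude aperiodic fit: a straight line in log-log space,
# excluding the band around the rhythmic peak
mask = (freqs < 7) | (freqs > 14)
slope, intercept = np.polyfit(np.log10(freqs[mask]), np.log10(spectrum[mask]), 1)
print(-slope)   # recovered exponent, close to chi = 2
```

Fitting the broadband slope without masking the peak would bias the exponent, which is one reason dedicated decomposition methods are needed.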

Attendees who are new to neural data analysis may benefit from attending the NESS short course “Introduction to the analysis of neural electrophysiology data” by Uri Eden and Mark Kramer, which will be offered prior to this course.

Special thanks to:

  • David W. Zhou, PhD, Carney Institute for Brain Science, Brown University, Providence, RI, USA
  • Matteo Fecchio, PhD, Center for Neurotechnology and Neurorecovery, Massachusetts General Hospital, Boston, MA, USA

Prerequisites: Some familiarity with neural electrophysiology data, spectral analysis, and state space modeling (Short course “Introduction to the analysis of neural electrophysiology data” is sufficient).


Causal Mediation Analysis: The Old and the New

Instructors: Dr. Judith Lok and Dr. Ilya Shpitser

Judith Lok is an Associate Professor of Mathematics and Statistics at Boston University, where she teaches a causal inference course for undergraduate and graduate students. Her research program mostly focuses on causal inference methods and survival analysis with applications in HIV/AIDS. Beyond HIV/AIDS, her application areas include bacterial infections, COVID-19, HCV, overdoses, and mother and child health. In causal mediation analysis, she has proposed "organic" direct and indirect effects, an intervention-based approach which obviates the need to be able to "set" the mediator to a specific value.

Ilya Shpitser is a John C. Malone Associate Professor in Computer Science at Johns Hopkins University, working on causal inference, missing data, and algorithmic fairness. His prior work on mediation analysis developed interventionist approaches, identification theory in the presence of unobserved confounding, semi-parametric estimation theory, methods for handling unmeasured confounding using proxy variables, and fairness criteria based on direct, indirect, and path-specific effects.

Abstract: Mediation analysis, which started in the mid-1980s with Baron and Kenny (1986), is used extensively by applied researchers. Indirect and direct effects are the part of a treatment effect that is mediated by a post-treatment variable and the part that is not. Subsequent work on natural indirect and direct effects provided a formal causal interpretation, based on cross-worlds counterfactuals: outcomes under treatment with the mediator set to its value without treatment. Randomized, separable, and organic indirect and direct effects avoid cross-worlds counterfactuals. Organic indirect and direct effects also avoid having to be able to set the mediator.
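In the simplest linear setting with a randomised treatment and no treatment-mediator interaction, the indirect and direct effects reduce to the familiar Baron-Kenny product-of-coefficients decomposition. The hypothetical simulation below illustrates that decomposition; all coefficients are invented for the example, and none of the cross-worlds subtleties discussed in the course arise in this easy case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
a, b, c = 0.8, 0.5, 0.3   # hypothetical path coefficients

treat = rng.integers(0, 2, n).astype(float)          # randomised treatment A
med = a * treat + rng.standard_normal(n)             # mediator M
out = c * treat + b * med + rng.standard_normal(n)   # outcome Y

# Product-of-coefficients estimates (valid here: linear models, randomised A,
# no treatment-mediator interaction, no mediator-outcome confounding)
a_hat = np.polyfit(treat, med, 1)[0]                 # effect of A on M
X = np.column_stack([np.ones(n), treat, med])
_, c_hat, b_hat = np.linalg.lstsq(X, out, rcond=None)[0]

nie = a_hat * b_hat   # indirect effect, truth a * b = 0.40
nde = c_hat           # direct effect,   truth c     = 0.30
print(nie, nde)
```

The formal counterfactual definitions in the course explain when such regression quantities do, and do not, carry a causal mediation interpretation.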

In this short course, we will introduce and compare these different approaches to causal mediation analysis. We will also argue that pure indirect effects and organic indirect effects relative to "no treatment" are very relevant for drug development. We illustrate the benefits of these approaches by estimating the pure indirect effect or organic indirect effect of curative HIV treatments mediated by two HIV persistence measures, using data on interruption of antiretroviral therapy without curative HIV treatments combined with an estimated or hypothesized effect of the curative HIV treatments on these mediators. In another illustration, we consider the promise of COVID-19 treatments in the ICU which target despr+ neutrophil nets. We will also cover general identification of direct, indirect, and path-specific effects, and present estimation methods, including influence function-based methods that achieve semiparametric efficiency. We will conclude with a discussion of the application of causal mediation analysis to algorithmic fairness.

Prerequisites: At least one statistical inference class including regression.


Geometric Methods for Functional and Shape Data Analysis

Instructors: Dr. Karthik Bharath and Dr. Sebastian Kurtek

Karthik Bharath is a Professor of Statistics at the University of Nottingham. He received his PhD in Statistics from the University of Connecticut in 2012. His research interests are in statistics on manifolds, shape and functional data analysis, and stochastic processes. He is an Associate Editor of Biometrika, JRSS-B, and Sankhya (Series A), and is the Deputy Secretary of the Research Section of the Royal Statistical Society (RSS), which handles the discussion papers of JRSS-B. His research is supported by the NSF, the NIH, and the EPSRC (UK).

Sebastian Kurtek is a Professor of Statistics at The Ohio State University. He received his PhD in Biostatistics from Florida State University in 2012. His research interests include functional data analysis, statistical shape analysis and statistics on manifolds, with applications in medical imaging, biology and environmental science. He is an Associate Editor of the Annals of Applied Statistics and the Journal of Computational and Graphical Statistics. His research is supported by the NSF and the NIH.

Abstract: Functional data is a generic term for samples of univariate or multivariate data observed over some ordered index set (e.g., time, space), since such data can mathematically be represented as values of suitably defined functions. It has long been recognised that the notion of shape is fundamental to analysing and modelling the two sources of variability in a dataset of functions: variation in each function's range and variation in its domain. In the univariate setting, shape is related to the function's amplitude (y-axis variation). In the multivariate setting, shape is a quantity derived from the data by accounting for nuisance transformations including translation, scale, rotation, and reparameterization. This perspective naturally points towards a geometric approach to functional data analysis.

With a focus on decoupling and modelling different sources of variation, this course will present an overview of the use of geometry-driven methods to carry out metric-based statistical analysis of functional data. To demonstrate the broad applicability of the geometric tools, we will ground the mathematical descriptions in concrete statistical tasks arising from various application settings (e.g., biomedical, environmental); these will include amplitude-phase separation and modelling of univariate functions under sparse and dense sampling settings; mean computation, PCA and visualisation of variations of shapes of 2D and 3D curves; classification and regression with functions and curves.

In the morning we will focus on the setting of univariate functional data observed over time; in the afternoon we will move to the multivariate or curve setting. We will conclude with an overview of the current state-of-the-art, novel uses of functional data techniques in other areas (e.g., Topological Data Analysis), future directions and open problems. Accompanying computing code in R and Matlab will be provided and used during the course.

Prerequisites: The course is designed for researchers and students with a basic understanding of multivariate statistics with some experience in R/Matlab.


Conformal Inference Methods in Deep Learning

Instructor: Dr. Matteo Sesia

Dr. Matteo Sesia is an Assistant Professor of Data Sciences and Operations at the University of Southern California (USC) Marshall School of Business, and an Assistant Professor (by courtesy) of Computer Science at the USC Viterbi School of Engineering. Prior to joining USC, he earned a PhD in Statistics from Stanford University in 2020, advised by Emmanuel Candès. Dr. Sesia's research focuses on distribution-free statistical inference and uncertainty-aware machine learning. In particular, he develops powerful non-parametric methods to precisely estimate the uncertainty of predictions computed by deep neural networks or other sophisticated machine learning models, as well as assumption-lean methods to extract from high-dimensional data reliable knowledge about how a potentially complex outcome depends on a large number of explanatory variables. Dr. Sesia's research is partially funded by the NSF and by Amazon.

Abstract: This short course provides a hands-on introduction to modern techniques for uncertainty estimation in deep learning, with a focus on conformal inference. Assuming only basic prior knowledge of probability, statistics, and machine learning, the course begins with an overview of the key concepts of data exchangeability and univariate model-free prediction, which form the foundation of conformal inference. Participants will learn how to leverage these conformal inference ideas to construct reliable and interpretable uncertainty estimates for the predictions of deep neural network models in both multi-class classification and regression problems. The course also covers advanced topics of practical relevance, including techniques for computing conformal inferences that can automatically adapt to possible heteroscedasticity and skewness in the data, methods for obtaining conformal inferences with conditional validity properties to address issues of algorithmic fairness, and techniques for mitigating the over-confidence of deep neural networks. The course includes hands-on coding exercises and real-data demonstrations.
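To preview the core construction, the sketch below implements split-conformal prediction intervals around a simple least-squares fit. This is a minimal illustration of the generic recipe, not the course's own code; any predictive model (including a deep network) can replace the linear fit, since the coverage guarantee comes from the calibration step, not the model.

```python
import numpy as np

def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_test, alpha=0.1):
    """Split-conformal prediction intervals around a least-squares line.

    The held-out calibration residuals determine the interval width, which
    guarantees marginal coverage of at least 1 - alpha under exchangeability.
    """
    slope, intercept = np.polyfit(x_train, y_train, 1)
    predict = lambda x: slope * x + intercept

    # Nonconformity scores: absolute residuals on the calibration split
    scores = np.sort(np.abs(y_cal - predict(x_cal)))
    n = len(scores)
    # Conformal quantile with the finite-sample (n + 1) correction
    k = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
    q = scores[k]
    pred = predict(x_test)
    return pred - q, pred + q

# Simulated regression data: y = 2x + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 3000)
y = 2 * x + 0.1 * rng.standard_normal(3000)

lo, hi = split_conformal_interval(x[:1000], y[:1000],          # training split
                                  x[1000:2000], y[1000:2000],  # calibration split
                                  x[2000:], alpha=0.1)         # test points
coverage = np.mean((y[2000:] >= lo) & (y[2000:] <= hi))
print(coverage)  # empirically close to the nominal 90% level
```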

References:

  • "Classification with Valid and Adaptive Coverage", Yaniv Romano, Matteo Sesia, Emmanuel J. Candès. NeurIPS (2020).
  • "Training Uncertainty-Aware Classifiers with Conformalized Deep Learning", Bat-Sheva Einbinder, Yaniv Romano, Matteo Sesia, Yanfei Zhou. NeurIPS (2022).
  • "Conformal inference is (almost) free for neural networks trained with early stopping", Ziyi Liang, Yanfei Zhou, Matteo Sesia. arXiv preprint (2023).
  • "Conformalized Quantile Regression", Yaniv Romano, Evan Patterson, Emmanuel J. Candès. NeurIPS (2019).
  • "A comparison of some conformal quantile regression methods", Matteo Sesia, Emmanuel J. Candès. Stat (2020).
  • "Conformal Prediction using Conditional Histograms", Matteo Sesia, Yaniv Romano. NeurIPS (2021).

Prerequisites:

  • Basic knowledge of probability and statistics
  • Working knowledge of the Python programming language
  • Basic knowledge of deep learning with PyTorch (optional but recommended)
  • A laptop with Python installed, along with key machine-learning packages: Jupyter notebooks, PyTorch, scikit-learn, and NumPy.