Short Courses
Two full-day short courses and two half-day short courses will be held on Friday, April 13, 2018, from 8:00 AM to 5:00 PM.
Short courses will be held in the Life Sciences Laboratory (LSL) and the Integrated Sciences Building (ISB).
The short courses are:
Course 1 (full day): Introduction to Bayesian Inference with Stan
Location: LSL 330
Michael Betancourt, Symplectomorphic, LLC
Sean Talts, Columbia University
Course 2 (full day): Introductory and advanced methods for causal inference
Location: LSL 340
Laura Balzer, University of Massachusetts-Amherst
Course 3 (half day, 8AM - 12PM): Statistical Inference: A Tidy Approach using R
Location: ISB 145
Chester Ismay, DataCamp
Course 4 (half day, 1PM - 5PM): Patient-Reported Outcomes: Measurement, Implementation and Interpretation
Location: ISB 145
Joseph C. Cappelleri, Pfizer Inc.
Course 1: Introduction to Bayesian Inference with Stan
Location: LSL 330
Instructors
Michael Betancourt, Symplectomorphic, LLC
Michael Betancourt is a researcher with Symplectomorphic, LLC, where he develops theoretical and methodological tools to support practical Bayesian inference. He is also a core developer of Stan, where he implements and tests these tools. In addition to hosting tutorials and workshops on Bayesian inference with Stan, he collaborates on analyses in epidemiology, pharmacology, and physics, among other fields. Before moving into statistics, Michael earned a B.S. from the California Institute of Technology and a Ph.D. from the Massachusetts Institute of Technology, both in physics.
Sean Talts, Columbia University
Sean comes from an industry background where he last worked with Watson creator Dave Ferrucci at the Collaborative Intelligent Systems Lab at a New York area hedge fund. He is now one of the core developers of Stan, a probabilistic programming language that scientists, researchers, and even economists can use to do statistical inference. Interests include programming language design, compiler optimization, epistemology, Bayesian data analysis, and helping others do better science.
Course Outline
Despite the promise of big data, inferences are often limited not by the size of data but rather by its systematic structure. Only by carefully modeling this structure can we take full advantage of the data – big data must be complemented with big models and the algorithms that can fit them. Stan is a platform for facilitating this modeling, providing an expressive modeling language for specifying bespoke models and implementing state-of-the-art algorithms to draw subsequent Bayesian inferences.
In this course we will introduce how to implement a robust Bayesian workflow in Stan, from constructing models to analyzing inferences and validating the underlying modeling assumptions. The course will emphasize interactive exercises run through RStan, the R interface to Stan.
Prerequisites
The course will assume familiarity with the basics of calculus, probability, and statistics, but the core concepts will be reviewed.
In order to participate in the interactive exercises, attendees must provide a laptop with the latest version of RStan (https://cran.r-project.org/web/packages/rstan/index.html) installed. Users are encouraged to report any issues at http://discourse.mc-stan.org.
Course 2: Introductory and advanced methods for causal inference
Location: LSL 340
Instructor
Laura Balzer, University of Massachusetts-Amherst
Laura is an Assistant Professor of Biostatistics at the University of Massachusetts-Amherst. She earned her PhD from the University of California-Berkeley and completed her post-doctoral studies at the Harvard T.H. Chan School of Public Health. Laura’s areas of expertise include causal inference and machine learning. She is the lead statistician for two ongoing cluster randomized trials: the SEARCH study for HIV prevention and treatment in East Africa and the SPIRIT study for TB prevention in Uganda. Laura is passionate about teaching causal inference and was awarded the ASA’s Causality in Statistics Education Award as well as the Gertrude M. Cox Scholarship.
Course Outline
With the recent and ongoing ‘data explosion’, methods to delineate causation from correlation are perhaps more pressing now than ever. This course will introduce a general framework for causal inference: 1) clear statement of the scientific question, 2) definition of the causal model and parameter of interest, 3) assessment of identifiability - that is, linking the causal effect to a parameter estimable from the observed data distribution, 4) choice and implementation of estimators including parametric and semi-parametric methods, and 5) interpretation of findings. The focus is on effect estimation for exposures occurring at a single time point, and extensions for longitudinal effects are also presented. The estimation methods include G-computation, inverse probability of treatment weighting (IPTW), and targeted maximum likelihood estimation (TMLE) with Super Learner. Participants gain practical experience with an applied example and implement these estimators in R.
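As a rough illustration of two of the estimators named above, the sketch below applies nonparametric G-computation and IPTW to simulated data with one binary confounder. It is written in plain Python rather than R, it is not course material, and everything in it is an assumption made for illustration: the simulated data-generating process (true risk difference of 0.1), the function names, and the sample size.

```python
import random


def simulate(n=20000, seed=1):
    """Hypothetical data: binary confounder W, exposure A, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.4 else 0
        # W affects the chance of exposure (confounding)
        a = 1 if rng.random() < (0.7 if w else 0.3) else 0
        # True causal risk difference of A on Y is 0.1 in this simulation
        y = 1 if rng.random() < 0.2 + 0.1 * a + 0.3 * w else 0
        rows.append((w, a, y))
    return rows


def unadjusted(rows):
    """Naive difference in mean outcome, ignoring the confounder."""
    y1 = [y for _, a, y in rows if a == 1]
    y0 = [y for _, a, y in rows if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def g_computation(rows):
    """Average the within-stratum outcome differences over the distribution of W."""
    n = len(rows)
    effect = 0.0
    for w in (0, 1):
        stratum = [r for r in rows if r[0] == w]
        y1 = [y for _, a, y in stratum if a == 1]
        y0 = [y for _, a, y in stratum if a == 0]
        effect += (len(stratum) / n) * (sum(y1) / len(y1) - sum(y0) / len(y0))
    return effect


def iptw(rows):
    """Weight each outcome by the inverse of the estimated P(A = a | W)."""
    n = len(rows)
    g = {}  # estimated propensity score P(A = 1 | W = w)
    for w in (0, 1):
        stratum = [r for r in rows if r[0] == w]
        g[w] = sum(a for _, a, _ in stratum) / len(stratum)
    treated = sum(y / g[w] for w, a, y in rows if a == 1) / n
    control = sum(y / (1 - g[w]) for w, a, y in rows if a == 0) / n
    return treated - control


if __name__ == "__main__":
    data = simulate()
    print("unadjusted:   ", round(unadjusted(data), 3))
    print("G-computation:", round(g_computation(data), 3))
    print("IPTW:         ", round(iptw(data), 3))
```

In this simulation the unadjusted difference is pulled away from 0.1 by confounding, while the two adjusted estimators agree with each other because both the outcome and exposure models are fully saturated. That agreement would not generally hold with parametric models.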
Prerequisites
The course assumes a basic knowledge of statistics (notions of probability and applied regression models).
Course 3: Statistical Inference: A Tidy Approach using R
Location: ISB 145
Instructor
Chester Ismay, DataCamp
Chester Ismay is a Data Science Curriculum Lead at DataCamp, where he builds (and helps instructors build) R, Python, and SQL courses. He was formerly an Adjunct Professor of Sociology at Pacific University and an Instructional Technologist and Consultant for Data Science, Statistics, and R at Reed College. He obtained his PhD in statistics from Arizona State University and has taught courses and led workshops in statistics, data science, mathematics, computer science, and sociology. He is a co-author of the fivethirtyeight R data package and of the infer and moderndive R packages, and is the creator and maintainer of the thesisdown R package. He is also a co-author of an open source, free textbook entitled ModernDive: An Introduction to Statistical and Data Sciences via R.
Course Outline
How do you code up a permutation test in R? What about an ANOVA or a chi-square test? Have you ever been uncertain as to exactly which type of test you should run given the data and questions asked? The infer R package was created to unite common statistical inference tasks into an expressive and intuitive framework to alleviate some of these struggles and make inference more intuitive. This workshop will focus on developing an understanding of the design principles of the package, which are firmly motivated by Hadley Wickham’s tidy tools manifesto. It will also discuss the implementation, centered on the common conceptual threads that link a surprising range of hypothesis tests and confidence intervals. Lastly, we’ll dive into some examples of how to implement the code of the infer package via different data sets and variable scenarios. The package aims to be useful to new students of statistics as well as seasoned practitioners.
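For readers curious about the first question in the outline, a two-sample permutation test for a difference in means can be sketched in a few lines. The sketch below is in plain Python rather than R and does not use the infer package. The function name, the example measurements, and the number of permutations are all illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values to mimic the null hypothesis of no group difference
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical measurements for two groups
    a = [5.1, 4.9, 5.6, 5.8, 6.0]
    b = [4.1, 3.9, 4.4, 4.0, 4.6]
    obs, p = permutation_test(a, b)
    print("observed difference:", round(obs, 2))
    print("approximate p-value:", round(p, 4))
```

The same shuffle-and-recompute loop works for other test statistics, such as a difference in medians or proportions, by swapping out the statistic.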
Prerequisites
None.
Course 4: Patient-Reported Outcomes: Measurement, Implementation and Interpretation
Location: ISB 145
Instructor
Joseph C. Cappelleri, Pfizer Inc.
Joseph C. Cappelleri earned his M.S. in statistics from the City University of New York, his Ph.D. in psychometrics from Cornell University, and his M.P.H. in epidemiology from Harvard University. He is an executive director of biostatistics at Pfizer Inc. As an adjunct professor, Dr. Cappelleri has served on the faculties of Brown University, University of Connecticut, and Tufts Medical Center. He has delivered numerous conference presentations and published extensively on clinical and methodological topics, including regression-discontinuity designs, meta-analysis, and health measurement scales. He is the lead author of the book Patient-Reported Outcomes: Measurement, Implementation and Interpretation. Dr. Cappelleri is a Fellow of the American Statistical Association.
Course Outline
This short course will provide an exposition on health measurement scales – specifically, on patient-reported outcomes. Some key elements in the development of a patient-reported outcome (PRO) measure will be noted. The core topics of validity and reliability of a PRO measure will be discussed. Approaches to interpreting PRO results will be elucidated in order to make results useful and meaningful. Exploratory factor analysis and confirmatory factor analysis, mediation modeling, item response theory, longitudinal analysis, and missing data will be among the topics considered. Illustrations will be provided through real-life examples and also through simulated examples using SAS.
Prerequisites
Attendees are expected to have at least basic quantitative knowledge.