Statathon will be held jointly by the Department of Biostatistics and the Department of Mathematics and Statistics at Boston University and the New England Statistical Society (NESS) at the 36th NESS Symposium (June 3–6, 2023). Statathon is a statistical data science invention marathon. Anyone interested in data science can attend Statathon to approach real-world data science problems, some of them local, in new and innovative ways. It emphasizes the statistical aspects of data science problems (insight, interpretation, significance, etc.) that many hackathons overlook.
Online registration has started! Fill out this form to sign up. (Data sets will be released in early April.)
Registration deadline (May 8, 2023): Teams or individual participants should register by this deadline. Online registration will close at 11:59 pm EDT.
Submission deadline (May 29, 2023): Teams must submit their work for the review panel by this deadline. Submission will close at 11:59 pm EDT.
Finalist teams are selected and notified.
Finalist teams present virtually to the review panel at the 36th NESS Symposium. The presentations are tentatively scheduled for 2:00–5:00 pm EDT on Monday, June 5, 2023.
Awards are presented to the winning teams at the closing ceremony.
Imagine you work as a modeler in the fraud detection department of Travelers Insurance Company. Your colleagues, who are unfamiliar with statistics, would like you to build a predictive model based on historical insurance claim data. Your team cares about fraud detection accuracy as well as the key drivers of fraud. For this case competition, your group is tasked with identifying first-party physical damage fraud and explaining the indicators of fraudulent claims.
For more details about this theme, please register as a team or register to join a team for the Statathon, and we will send you a link to work on this challenge through Kaggle. The top 5 teams will be invited to give a virtual presentation of their solution (15 minutes) and answer questions from the judges, who will determine the winning teams.
(The data sets are synthetic and provided by Travelers.)
More information is available here.
HSB offers equipment breakdown insurance and provides Internet of Things (IoT) sensor solutions to help customers avoid or reduce damage to their buildings, equipment, or machinery. These sensors monitor essential health and performance variables and alert customers to take corrective action before a critical system failure occurs.
Using these sensors, HSB has created a customer alert program in which an alert (by text, email, etc.) is sent to a customer when abnormal sensor activity is detected. To determine the success and usefulness of this alert program, HSB must record whether a customer took corrective action after receiving an alert. Unfortunately, direct follow-up with every customer is infeasible because of the large number of deployed sensors and potential customer unresponsiveness. An automated system is therefore needed to infer whether a customer took meaningful corrective action.
In this challenge, your team will be given time series data for a custom system health metric, covering 25 alerts in total. Using whatever methodology you see fit, develop a model that determines whether an insured took action to mitigate poor operating conditions after receiving an alert.
For a full description of the dataset, please see this link.
A brief data tutorial can also be found here.
All teams should register online. If you already have a team or want to participate as an individual, please register using the following link.
Registration form for teams or individual participants.
Each team may have up to four members, and each team should submit only one registration form listing the names of all team members.
All teams should submit their work by the deadline (May 29, 2023, 11:59 pm EDT). Teams are encouraged to create a Git repository (e.g., on Bitbucket, GitHub, or GitLab) to host their source code and data documentation. However, this is not a judging criterion in the competition.
Ten teams (five from each theme) will be selected as finalists and invited to give a team presentation to the review panels in the afternoon or evening of June 5, 2023. Each team will have 20 minutes to present their findings and products.
Students from universities and high schools can participate. We will not distinguish between high school, undergraduate, and graduate students among participants.
No. Participation in Statathon is free. We will select five finalist teams from each theme to come and present the day before the 36th NESS Symposium.
To be determined. Presentations may be hybrid this year.
Each team can have up to 4 participants.
Participants can form teams (or work individually) with peers who share common interests and/or have complementary expertise. Remember that a participant can be a member of only one team.
You can start working on the problem once the data sets are released. We are working to release the data sets in early April.
You can use any programming language or software packages.
Yes! There will be cash prizes ranging from $100 to $300 for the 1st, 2nd, and 3rd place teams in both themes.
We will let you know once the data sets are made available.
Teams must be finalized no later than May 8, when registration closes.
Yes, a professor or another professional can act as a team mentor. However, a mentor is not a participant and therefore cannot do any of the team's work. In addition, no member of the organizing committee or the refereeing committee may supervise any participating team or individual.
Patrick Buckley, Travelers
Nathan Lally, Hartford Steam Boiler (Munich Re Group)
Kelly Li, Travelers
Daeyoung Lim (Co-Chair), University of Connecticut
Peng Xiao, Hartford Steam Boiler (Munich Re Group)
Adnan Smajic, Travelers
Masanao Yajima (Chair), Boston University