- 2011
- Article
Scalable Detection of Anomalous Patterns With Connectivity Constraints
By: Skyler Speakman, Edward McFowland III and Daniel B. Neill
We present GraphScan, a novel method for detecting arbitrarily shaped connected clusters in graph or network data. Given a graph structure, data observed at each node, and a score function defining the anomalousness of a set of nodes, GraphScan can efficiently and... View Details
- August 2020
- Technical Note
Comparing Two Groups: Sampling and t-Testing
This note describes sampling and t-tests, two fundamental statistical concepts. View Details
Keywords: Statistics; Econometric Analyses; Experimental Methods; Data Analysis; Data Analytics; Analytics and Data Science; Analysis; Surveys; Mathematical Methods
Bojinov, Iavor I., Chiara Farronato, Yael Grushka-Cockayne, Willy C. Shih, and Michael W. Toffel. "Comparing Two Groups: Sampling and t-Testing." Harvard Business School Technical Note 621-044, August 2020.
- 2013
- Book
Keeping Up with the Quants: Your Guide to Understanding and Using Analytics
By: Thomas H. Davenport and Jinho Kim
Managers today need to be able to analyze and make sense of data. They need to be conversant with analytical technology and methods and to make decisions on quantitative analysis. This book offers a variety of practical tools and examples to improve a manager's... View Details
- 2015
- Article
Scalable Detection of Anomalous Patterns With Connectivity Constraints
By: Skyler Speakman, Edward McFowland III and Daniel B. Neill
We present GraphScan, a novel method for detecting arbitrarily shaped connected clusters in graph or network data. Given a graph structure, data observed at each node, and a score function defining the anomalousness of a set of nodes, GraphScan can efficiently and... View Details
Speakman, Skyler, Edward McFowland III, and Daniel B. Neill. "Scalable Detection of Anomalous Patterns With Connectivity Constraints." Journal of Computational and Graphical Statistics 24, no. 4 (2015): 1014–1033.
- Article
Fast Generalized Subset Scan for Anomalous Pattern Detection
By: Edward McFowland III, Skyler Speakman and Daniel B. Neill
We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic... View Details
Keywords: Pattern Detection; Anomaly Detection; Knowledge Discovery; Bayesian Networks; Scan Statistics; Analytics and Data Science
McFowland III, Edward, Skyler Speakman, and Daniel B. Neill. "Fast Generalized Subset Scan for Anomalous Pattern Detection." Art. 12. Journal of Machine Learning Research 14 (2013): 1533–1561.
- 10 Dec 2014
- News
Yes, A/B Testing Is Still Necessary
- 2024
- Working Paper
Empirical Guidance: Data Processing and Analysis with Applications in Stata, R, and Python
By: Melissa Ouellet and Michael W. Toffel
This paper describes a range of best practices to compile and analyze datasets, and includes some examples in Stata, R, and Python. It is meant to serve as a reference for those getting started in econometrics, and especially those seeking to conduct data analyses in... View Details
Keywords: Empirical Methods; Empirical Operations; Statistical Methods And Machine Learning; Statistical Interferences; Research Analysts; Analytics and Data Science; Mathematical Methods
Ouellet, Melissa, and Michael W. Toffel. "Empirical Guidance: Data Processing and Analysis with Applications in Stata, R, and Python." Harvard Business School Working Paper, No. 25-010, August 2024.
- Article
Fast Subset Scan for Multivariate Spatial Biosurveillance
By: Daniel B. Neill, Edward McFowland III and Huanian Zheng
We present new subset scan methods for multivariate event detection in massive space-time datasets. We extend the recently proposed 'fast subset scan' framework from univariate to multivariate data, enabling computationally efficient detection of irregular space-time... View Details
Neill, Daniel B., Edward McFowland III, and Huanian Zheng. "Fast Subset Scan for Multivariate Spatial Biosurveillance." Statistics in Medicine 32, no. 13 (June 15, 2013): 2185–2208.
- March 2022 (Revised January 2025)
- Technical Note
Exploratory Data Analysis
This module note provides an overview of exploratory data analysis for an introduction to data science course. It begins by defining the term "data", and then describes the different types of data that companies work with (structured v. unstructured, categorical v.... View Details
Keywords: Data Analysis; Data Science; Statistics; Data Visualization; Exploratory Data Analysis; Analytics and Data Science; Analysis
Bojinov, Iavor I., Michael Parzen, and Paul Hamilton. "Exploratory Data Analysis." Harvard Business School Technical Note 622-098, March 2022. (Revised January 2025.)
- 2016
- Article
Does volunteering improve well-being?
By: A.V. Whillans, Scott C. Seider, Lihan Chen, Ryan J. Dwyer, Sarah Novick, Kathryn J. Gramigna, Brittany A. Mitchell, Victoria Savalei, Sally S. Dickerson and Elizabeth W. Dunn
Does volunteering causally improve well-being? To empirically test this question, we examined one instantiation of volunteering that is common at post-secondary institutions across North America: community service learning (CSL). CSL is a form of experiential learning... View Details
Whillans, A.V., Scott C. Seider, Lihan Chen, Ryan J. Dwyer, Sarah Novick, Kathryn J. Gramigna, Brittany A. Mitchell, Victoria Savalei, Sally S. Dickerson, and Elizabeth W. Dunn. "Does volunteering improve well-being?" Comprehensive Results in Social Psychology 1, nos. 1-3 (2016): 35–50.
- 2022
- Article
Nonparametric Subset Scanning for Detection of Heteroscedasticity
By: Charles R. Doss and Edward McFowland III
We propose Heteroscedastic Subset Scan (HSS), a novel method for identifying covariates that are responsible for violations of the homoscedasticity assumption in regression settings. Viewing the problem as one of anomalous pattern detection, we use subset scanning... View Details
Doss, Charles R., and Edward McFowland III. "Nonparametric Subset Scanning for Detection of Heteroscedasticity." Journal of Computational and Graphical Statistics 31, no. 3 (2022): 813–823.
- March 2020
- Article
Is This My Group or Not? The Role of Ensemble Coding of Emotional Expressions in Group Categorization
By: Amit Goldenberg, Timothy D. Sweeny, Emmanuel Shpigel and James J. Gross
When exposed to others’ emotional responses, people often make rapid decisions as to whether these others are members of their group or not. These group categorization decisions have been shown to be extremely important to understanding group behavior. Yet, despite... View Details
Keywords: Categorization; Ensemble Coding; Summary Statistical Perception; Social Cognition; Emotions; Perception; Groups and Teams
Goldenberg, Amit, Timothy D. Sweeny, Emmanuel Shpigel, and James J. Gross. "Is This My Group or Not? The Role of Ensemble Coding of Emotional Expressions in Group Categorization." Journal of Experimental Psychology: General 149, no. 3 (March 2020).
- August 2021
- Article
Multiple Imputation Using Gaussian Copulas
By: F.M. Hollenbach, I. Bojinov, S. Minhas, N.W. Metternich, M.D. Ward and A. Volfovsky
Missing observations are pervasive throughout empirical research, especially in the social sciences. Despite multiple approaches to dealing adequately with missing data, many scholars still fail to address this vital issue. In this paper, we present a simple-to-use... View Details
Hollenbach, F.M., I. Bojinov, S. Minhas, N.W. Metternich, M.D. Ward, and A. Volfovsky. "Multiple Imputation Using Gaussian Copulas." Special Issue on New Quantitative Approaches to Studying Social Inequality. Sociological Methods & Research 50, no. 3 (August 2021): 1259–1283. (0049124118799381.)
- July–August 2024
- Article
Doing More with Less: Overcoming Ineffective Long-Term Targeting Using Short-Term Signals
By: Ta-Wei Huang and Eva Ascarza
Firms are increasingly interested in developing targeted interventions for customers with the best response, which requires identifying differences in customer sensitivity, typically through the conditional average treatment effect (CATE) estimation. In theory, to... View Details
Keywords: Long-run Targeting; Heterogeneous Treatment Effect; Statistical Surrogacy; Customer Churn; Field Experiments; Consumer Behavior; Customer Focus and Relationships; AI and Machine Learning; Marketing Strategy
Huang, Ta-Wei, and Eva Ascarza. "Doing More with Less: Overcoming Ineffective Long-Term Targeting Using Short-Term Signals." Marketing Science 43, no. 4 (July–August 2024): 863–884.
- August 2020 (Revised September 2020)
- Technical Note
Assessing Prediction Accuracy of Machine Learning Models
The note introduces a variety of methods to assess the accuracy of machine learning prediction models. The note begins by briefly introducing machine learning, overfitting, training versus test datasets, and cross validation. The following accuracy metrics and tools... View Details
Keywords: Machine Learning; Statistics; Econometric Analyses; Experimental Methods; Data Analysis; Data Analytics; Forecasting and Prediction; Analytics and Data Science; Analysis; Mathematical Methods
Toffel, Michael W., Natalie Epstein, Kris Ferreira, and Yael Grushka-Cockayne. "Assessing Prediction Accuracy of Machine Learning Models." Harvard Business School Technical Note 621-045, August 2020. (Revised September 2020.)
- Article
Detecting Adversarial Attacks via Subset Scanning of Autoencoder Activations and Reconstruction Error
By: Celia Cintas, Skyler Speakman, Victor Akinwande, William Ogallo, Komminist Weldemariam, Srihari Sridharan and Edward McFowland III
Reliably detecting attacks in a given set of inputs is of high practical relevance because of the vulnerability of neural networks to adversarial examples. These altered inputs create a security risk in applications with real-world consequences, such as self-driving... View Details
Keywords: Autoencoder Networks; Pattern Detection; Subset Scanning; Computer Vision; Statistical Methods And Machine Learning; Machine Learning; Deep Learning; Data Mining; Big Data; Large-scale Systems; Mathematical Methods; Analytics and Data Science
Cintas, Celia, Skyler Speakman, Victor Akinwande, William Ogallo, Komminist Weldemariam, Srihari Sridharan, and Edward McFowland III. "Detecting Adversarial Attacks via Subset Scanning of Autoencoder Activations and Reconstruction Error." Proceedings of the International Joint Conference on Artificial Intelligence 29th (2020).
Iavor I. Bojinov
Iavor Bojinov is an Assistant Professor of Business Administration and the Richard Hodgson Fellow at Harvard Business School. He is the co-PI of the AI and Data Science Operations Lab and a faculty affiliate in the Department of Statistics at Harvard University and... View Details
Michael I. Parzen
Michael Parzen is a Senior Lecturer in the Technology and Operations Management unit at Harvard Business School. He is an applied statistician with extensive experience in data science education and currently teaches Applied Business Analytics as an MBA elective... View Details
- 29 Aug 2022
- Op-Ed
Income Inequality Is Rising. Are We Even Measuring It Correctly?
finding ways to reduce inequality to create a more just and equal society for all. In making decisions on how to best intervene, policymakers commonly rely on the Gini coefficient, a statistical measure of resource distribution, including... View Details
- November 2018
- Case
Sportradar (A): From Data to Storytelling
By: Ramon Casadesus-Masanell, Karen Elterman and Oliver Gassmann
In 2013, the Swiss sports data company Sportradar debated whether to expand from its core business of data provision to bookmakers into sports media products. Sports data was becoming a commodity, and in the future, sports leagues might reduce their dependence on... View Details
Keywords: Sports Data; Data; Sport; Sportradar; Football; Soccer; Gambling; Betting; Betting Markets; Statistics; Odds; Live Data; Bookmakers; Betradar; Visualization; Integrity; Monitoring; Gaming; Streaming; 2013; St.Gallen; Algorithm; Mathematical Modeling; Carsten Koerl; Betandwin; Bwin; Wagering; Probability; Sports; Analytics and Data Science; Mathematical Methods; Games, Gaming, and Gambling; Transition; Strategy; Media; Sports Industry; Technology Industry; Information Technology Industry; Media and Broadcasting Industry; Europe; Switzerland; Asia; Austria; Germany; England
Casadesus-Masanell, Ramon, Karen Elterman, and Oliver Gassmann. "Sportradar (A): From Data to Storytelling." Harvard Business School Case 719-429, November 2018.