These are materials for a machine learning course at Anahuac, February 5-17, 2020. If you have any questions or comments, please email me at hersh [at] chapman [dot] edu.
Syllabus: Anahuac_Machine_Learning.docx
Schedule
- February 5, 13:00-16:00, SDCAIDE3
- Introduction
- Data Visualizations and R Markdown
- Code: Class1-1_Intro_DataViz.r
- In-class code: Feb5_firsthalf.r
- Introduction to R, Basic Data Transformations and Exploratory Data Analysis
- Slides: Class_1_Anahuac_ML_Introduction.pptx
- Homework for Friday: read R for Data Science, Chapters 1-7
- February 7, 13:00-16:00, SCAD6
- More exploratory data analysis: group_by(), arrange()
- Bias-Variance Tradeoff
- Linear Models in R
- Slides: Class_2_Anahuac_ML_Bias_Variance_OLS.pptx
- Code: class_2_Test_Training_Split_Bias_Variance.r
- In-class code: Feb7_inclass.r
- Homework: read ISLR Chapters 1-2; Problem Set 1: Anahuac_ML_pset1.pdf
- February 10, 13:00-16:00, SDCAIDE1 - Classification
- Logistic Regression
- Interpreting logistic regression and estimation in R
- Classification diagnostics: ROC curves, AUC, calibration, confusion matrices, true/false positives and negatives, lift charts
- Severe class imbalance
- Slides: Class_3_Classification.pptx
- Code: class_3_classification.r
- In-class code: Feb10_inclass.r
- February 12, 13:00-16:00, SDCAIDE1 - Regularized Regression (Lasso, Ridge, ElasticNet)
- Cross-validation
- Forward and backward stepwise selection
- Ridge
- Lasso
- Slides: Class_4_Cross_Validation_Regularization.pptx
- Code: class_4_Cross_Validation_Ridge_Lasso.r
- February 14, 13:00-16:00, SDCAIDE1 - ElasticNet, Decision Trees and Random Forest
- ElasticNet
- Decision Trees
- Random Forests
- Slides: Class_5_ElasticNet_Tree_Methods.pptx
- Code: class_5_ElasticNet_Trees.r
- February 17, 13:00-16:00, SDCAIDE1 - Unsupervised learning
- K-Means clustering
- Hierarchical Clustering
- Principal Component Analysis (PCA)
- Slides: Class_6_Unsupervised.pptx
- Code: class_6_Unsupervised.r
- Fun with R - gganimate:
Textbooks
- Wickham, H., & Grolemund, G. R for Data Science. https://r4ds.had.co.nz/
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R (Vol. 103). Springer Science & Business Media.
- Varian, H. R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives, 28(2), 3-28.
- Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer Science & Business Media.
Problem Sets
- Problem Set 1
- Problem Set 2
Datasets
- Movie Metadata
- Housing Prices
- Wholesale Customers
- Bike Sharing Usage
Additional References
- Afzal, M., Hersh, J., & Newhouse, D. (2015). Building a better model: Variable selection to predict poverty in Pakistan and Sri Lanka. Working paper.
- Athey, S. (2017). Beyond prediction: Using big data for policy problems. Science, 355(6324), 483-485.
- Athey, S., & Imbens, G. W. Machine learning methods for causal effects. http://www.nasonline.org/programs/sackler-colloquia/documents/athey.pdf
- Andini, M., Ciani, E., de Blasio, G., & D'Ignazio, A. Effective policy targeting with machine learning.
- Belloni, A., & Chernozhukov, V. (2009). Least squares after model selection in high-dimensional sparse models.
- Celiku, B., & Kraay, A. (2017). Predicting conflict. The World Bank.
- Blumenstock, J. E. (2016). Fighting poverty with data. Science, 353(6301), 753-754.
- Diamond, A., Gill, M., Rebolledo Dellepiane, M. A., Skoufias, E., Vinha, K., & Xu, Y. (2016). Estimating poverty rates in target populations: An assessment of the simple poverty scorecard and alternative approaches (Policy Research Working Paper No. WPS 7793). Washington, DC: World Bank Group. http://documents.worldbank.org/curated/en/801751471268674333/Estimating-poverty-rates-in-target-populations-an-assessment-of-the-simple-poverty-scorecard-and-alternative-approaches
- Einav, L., & Levin, J. (2014). Economics in the age of big data. Science, 346(6210), 1243089.
- Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790-794.
- Engstrom, R., Hersh, J., & Newhouse, D. (2016). Poverty from Space: Using high resolution satellite imagery for estimating economic well-being.
- Hersh, J., & Harding, M. (2018). Big Data in economics. IZA World of Labor, 451.
- Harding, M., & Lovenheim, M. (2017). The effect of prices on nutrition: Comparing the impact of product- and nutrient-specific taxes. Journal of Health Economics, 53, 53-71.
- Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems. The American Economic Review, 105(5), 491-495.
- McBride, L., & Nichols, A. (2016). Retooling poverty targeting using out-of-sample validation and machine learning. The World Bank Economic Review, 32(3), 531-550.
- Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106.
- Wager, S., & Athey, S. (2017). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association (just accepted).