This is the official webpage for COSC/MATH 4931: Introduction to Data Science at Marquette University for Spring 2017.
Location: 412 Cudahy
Time: TueTh 11:00-12:15
Office hours: TueTh 12:30-2:30 (1 extra hour every week!)
You will find the syllabus here.
- field guide to data science (Booz Allen version; really helpful for understanding modern data science philosophy) (download)
- jupyter master website (url)
- anaconda python distribution (url)
- github website (url) and desktop app (url)
- jupyter notebook best practices for data science (url)
- building a data science portfolio (url)
- install R (url), RStudio (url) and Shiny (url)
- class github repo (url)
- data science/computational social science conference papers
  - cscw acm dl (url)
  - cscw ea acm dl (url) (this is the expected format and content for your final project)
  - icwsm aaai repo (url)
  - www acm dl (url)
- programming resources
  - python data science handbook (amazon)
  - R for data science (amazon)
  - foundations of data science (free!)
- project topic presentation list and rules (pdf)
Jan 17: introduction to the course (slides); reading: 50 years of data science
Jan 19: introduction to jupyter, matplotlib, pandas; start exploring fisher’s dataset.
Jan 24: data exploration with jupyter; reading: tukey’s exploratory data analysis [please follow, fork, star or watch the class github repo]
Jan 26: unsupervised learning; cluster analysis; k-means (centroid based) and DBSCAN (density based); reading: algorithms for clustering data [this is a free book just on clustering; please read for more background and math]
Jan 31: more clustering algorithms; reading: cluster analysis for dummies; k-means data science tutorial series
Feb 2: data cleaning – the tidy data concept; reading: Hadley Wickham – Tidy Data – Journal of Statistical Software – 2014 (we’ll do this in R)
Feb 7: data wrangling with the tidyverse (R)
Feb 9: Carolyn Olsen (Northwestern Mutual) guest lecture
Feb 14: frequentist and bayesian statistics; hypothesis testing; naive bayes introduction; readings: naive bayes explanation, bayes theorem
Feb 16: project topic presentations part I
Feb 21: project topic presentations part II
Feb 23: practical naive bayes algorithm
Feb 28: no class, away at cscw
Mar 2: heather bort guest lecture
Mar 7: introduction to binary logistic regression; readings: simple logistic regression intuition, UCLA logistic regression explanation
Mar 9: in-class work session for the mid-term project submission; project due at 11:59 pm.
Mar 14: spring break, no class.
Mar 16: spring break, no class.
Mar 21: how do we report statistical results? readings: APA guide, calculating different metrics 1, 2, 3
Mar 23: continue discussion on binary logistic regression; readings: simple logistic regression intuition, UCLA logistic regression explanation
Mar 28: introduction to ordinal logit regressions; readings: ucla, princeton, wiki
Mar 30: introduction to support vector machines; readings: kdnuggets, columbia, stackoverflow
Apr 4: recap svms; introduction to decision trees; introduction to boosting; readings: decision trees 1, 2, 3; boosting vs svm; boosting 1, 2
Apr 6: more discussion around boosting, specifically adaboost.
Apr 11: ensemble learning – random forests; readings: breiman and cutler’s intuition, gentle introduction to random forests 1, 2.
Apr 18: introduction to machine learning diagnostics and k-fold cross-validation; readings: quora (lolz), hyndman blog
Apr 20: introduction to text processing and topic modeling; readings: joy of topic modeling, jdh mimno talk
Apr 25: demo of topic modeling using MALLET; readings: mallet, graham et al. mallet tutorial
Apr 27: introduction to social networks; watch videos on six degrees of separation and social influence; readings: freeman’s centrality, centrality metrics as predictors
May 2: network centrality; glms using centrality metrics as predictors; readings: strength of weak ties, strength of strong ties
May 4: in-class discussion on networks, semester wrap-up, time for in-class evaluations; readings: collective dynamics of small world networks, life in the network: computational social science