COSC/MATH 4931: Introduction to Data Science

This is the official webpage for COSC/MATH 4931: Introduction to Data Science at Marquette University for Spring 2017.

Location: 412 Cudahy

Time: TueTh 11:00-12:15

Office hours: TueTh 12:30-2:30 (1 extra hour every week!)

You will find the syllabus here.

Resources:

  • field guide to data science (Booz Allen Hamilton edition; really helpful for understanding modern data science philosophy) (download)
  • jupyter master website (url)
  • anaconda python distribution (url)
  • github website (url) and desktop app (url)
  • jupyter notebook best practices for data science (url)
  • building a data science portfolio (url)
  • install R (url), R Studio (url) and Shiny (url)
  • class github repo (url)
  • data science/computational social science conference papers
    • cscw acm dl (url)
    • cscw ea acm dl (url) (this is the expected format and content for your final project)
    • icwsm aaai repo (url)
    • www acm dl (url)
  • programming resources
    • python data science handbook (amazon)
    • R for data science (amazon)
    • foundations of data science (free!)
  • project topic presentation list and rules (pdf)

Timeline:

Jan 17: introduction to the course (slides); reading: 50 years of data science

Jan 19: introduction to jupyter, matplotlib, pandas; start exploring fisher’s dataset.

Jan 24: data exploration with jupyter; reading: tukey’s exploratory data analysis [please follow, fork, star or watch the class github repo]

Jan 26: unsupervised learning; cluster analysis; k-means (centroid based) and DBSCAN (density based); reading: algorithms for clustering data [this is a free book just on clustering; please read for more background and math]

Jan 31: more clustering algorithms; reading: cluster analysis for dummies; k-means data science tutorial series

Feb 2: data cleaning – the tidy data concept; reading: Hadley Wickham – Tidy Data – Journal of Statistical Software – 2014 (we’ll do this in R)

Feb 7: data wrangling with the tidyverse (R)

Feb 9: Carolyn Olsen (Northwestern Mutual) guest lecture

Feb 14: frequentist and bayesian statistics; hypothesis testing; naive bayes introduction; readings: naive bayes explanation, bayes theorem

Feb 16: project topic presentations part I

Feb 21: project topic presentations part II

Feb 23: practical naive bayes algorithm

Feb 28: no class (instructor away at cscw)

Mar 2: Heather Bort guest lecture

Mar 7: introduction to binary logistic regression; readings: simple logistic regression intuition, UCLA logistic regression explanation

Mar 9: in-class work session for the mid-term project submission; project due at 11:59 pm.

Mar 14: spring break, no class.

Mar 16: spring break, no class.

Mar 21: how do we report statistical results? readings: APA guide, calculating different metrics 1, 2, 3

Mar 23: continue discussion on binary logistic regression; readings: simple logistic regression intuition, UCLA logistic regression explanation

Mar 28: introduction to ordinal logit regressions; readings: ucla, princeton, wiki

Mar 30: introduction to support vector machines; readings: kdnuggets, columbia, stackoverflow

Apr 4: recap svms; introduction to decision trees; introduction to boosting; readings: decision trees 1, 2, 3; boosting vs svm; boosting 1, 2

Apr 6: more discussion around boosting, specifically adaboost.

Apr 11: ensemble learning – random forests; readings: breiman and cutler’s intuition, gentle introduction to random forests 1, 2.

Apr 18: introduction to machine learning diagnostics and k-fold cross-validation; readings: quora (lolz), hyndman blog

Apr 20: introduction to text processing and topic modeling; readings: joy of topic modeling, jdh mimno talk

Apr 25: demo of topic modeling using MALLET; readings: mallet, graham et al. mallet tutorial

Apr 27: introduction to social networks; watch videos on six degrees of separation and social influence; readings: freeman’s centrality, centrality metrics as predictors

May 02: network centrality; glms using centrality metrics as predictors; readings: strength of weak ties, strength of strong ties

May 04: in-class discussion on networks, semester wrap-up, time for in-class evaluations; readings: collective dynamics of small world networks, life in the network: computational social science