This is the official webpage for COSC/MATH 4931: Introduction to Data Science at Marquette University for Spring 2017.

**Location:** 412 Cudahy

**Time:** TueTh 11:00-12:15

**Office hours:** TueTh 12:30-2:30 (1 extra hour every week!)

You will find the syllabus here.

**Resources:**

- field guide to data science (**Booz Allen version, really cool for understanding modern data science philosophy**) (download)
- jupyter master website (url)
- anaconda python distribution (url)
- github website (url) and desktop app (url)
- jupyter notebook best practices for data science (url)
- building a data science portfolio (url)
- install R (url), R Studio (url) and Shiny (url)
- class github repo (url)
- data science/computation social science conference papers
- programming resources
- project topic presentation list and rules (pdf)

**Timeline:**

**Jan 17:** introduction to the course (slides); **reading:** 50 years of data science

**Jan 19:** introduction to jupyter, matplotlib, pandas; start exploring fisher’s dataset.

**Jan 24:** data exploration with jupyter; **reading:** tukey’s exploratory data analysis [please follow, fork, star or watch the class github repo]

**Jan 26:** unsupervised learning; cluster analysis; k-means (centroid based) and DBSCAN (density based); **reading:** algorithms for clustering data [this is a free book just on clustering; please read for more background and math]

**Jan 31:** more clustering algorithms; **reading:** cluster analysis for dummies; k-means data science tutorial series

**Feb 2:** data cleaning – the tidy data concept; **reading:** Hadley Wickham – Tidy Data – Journal of Statistical Software – 2014 (**we’ll do this in R**)

**Feb 7:** data wrangling with the tidyverse (R)

**Feb 9:** Carolyn Olsen (Northwestern Mutual) guest lecture

**Feb 14:** frequentist and bayesian statistics; hypothesis testing; naive bayes introduction; **readings:** naive bayes explanation, bayes theorem

**Feb 16:** project topic presentations part I

**Feb 21:** project topic presentations part II

**Feb 23:** practical naive bayes algorithm

**Feb 28:** no class; instructor away at CSCW

**Mar 2:** Heather Bort guest lecture

**Mar 7:** introduction to binary logistic regression; **readings:** simple logistic regression intuition, UCLA logistic regression explanation

**Mar 9:** in-class work session for the mid-term project submission; project due at 11:59 pm.

**Mar 14:** spring break, no class.

**Mar 16:** spring break, no class.

**Mar 21:** how do we report statistical results? **readings:** APA guide, calculating different metrics 1, 2, 3

**Mar 23:** continue discussion on binary logistic regression; **readings:** simple logistic regression intuition, UCLA logistic regression explanation

**Mar 28:** introduction to ordinal logit regressions; **readings:** UCLA, Princeton, wiki

**Mar 30:** introduction to support vector machines; **readings:** kdnuggets, columbia, stackoverflow

**Apr 4:** recap svms; introduction to decision trees; introduction to boosting; **readings:** decision trees 1, 2, 3; boosting vs svm; boosting 1, 2

**Apr 6:** more discussion around boosting, specifically adaboost.

**Apr 11:** ensemble learning – random forests; **readings:** breiman and cutler’s intuition, gentle introduction to random forests 1, 2.

**Apr 18:** introduction to machine learning diagnostics and k-fold cross-validation; **readings:** quora (lolz), hyndman blog

**Apr 20:** introduction to text processing and topic modeling; **readings:** joy of topic modeling, jdh mimno talk

**Apr 25:** demo of topic modeling using MALLET; **readings:** mallet, graham et al. mallet tutorial

**Apr 27:** introduction to social networks; watch videos on six degrees of separation and social influence; **readings:** freeman’s centrality, centrality metrics as predictors

**May 02:** network centrality; GLMs using centrality metrics as predictors; **readings:** strength of weak ties, strength of strong ties

**May 04:** in-class discussion on networks, semester wrap-up, time for in-class evaluations; **readings:** collective dynamics of small world networks, life in the network: computational social science