






Syllabus for PS 919: Introduction to Machine Learning, taught by Adeline Lo in Spring 2020 at the University of Wisconsin-Madison. The course is a graduate-level introduction to machine/statistical learning and covers common techniques to collect, analyze and utilize large and unstructured data for social science questions. The course does not utilize Python or Julia, and all computation work is conducted in R. The syllabus includes course details, an overview of the course, prerequisites, and class and lab information.







Class meets Wednesdays 3:30pm-5:25pm, Ogg Room 422, North Hall
Professor: Adeline Lo
Office: 322C, North Hall
Contact: [email protected]
Office hours: Wednesdays 2:25pm-3:25pm, 5:30pm-6:30pm

This course serves as a graduate-level introduction to machine/statistical learning. It will cover some, but not all, common techniques to collect, analyze and utilize large and unstructured data for social science questions. The general goal of this course is to introduce students to modern machine learning techniques and provide the skills necessary to apply the methods widely. Note, this course does not utilize Python or Julia. All computation work is conducted in R.
The following are the broad and specific learning outcomes for the course:
Formally, students should have taken PS 812, 813 and 818 and be comfortable working in the R language. Instructor approval may override course requirements on a case-by-case basis. Auditors should note that assessments will not be provided to them. Informally, and even more importantly, the most essential prerequisite is a willingness to work hard on possibly unfamiliar material. Learning statistics and programming can be like learning new languages, which require time and dedication. Similar to studying languages, fluency and comfort come from daily practice and consistent effort. Students should arrive to the first day of class having completed the following:
Instruction for this course is conducted via two avenues: class and lab. Class lectures make up the first hour and ten minutes (3:30-4:40pm) and will typically focus on statistical material. Lab follows directly after for the remainder of the class time (4:40pm-5:25pm) and will typically focus on practical problem solving and/or computational skills. Both are essential to the learning process.
like to analyze in groups of 3 or 4 and schedule meetings with me to discuss feasibility of your proposed topics in the first couple weeks of class.
Learning by watching and not doing is hard when learning how to use any new tool, so in order to do more and practice with regularity you will have four problem sets. The assignments will be a mix of analytic problems, computer simulations and data analysis. Assignments must be completed in R Markdown, which allows you to show both your answers and the code you used to arrive at them. You will need to submit both the .rmd file and the knitted html/pdf. Problem sets will be made available on Canvas starting Wednesday immediately after class and are due Monday the following week by 9am Central. Solutions will be made available two days after, through Canvas. Working through the problem sets includes looking at the solutions key, so please remember to do this portion!

Problem sets are graded on a scale of 0-5 points. I reserve the right to add bonus points for aesthetics including presentable graphs, clear code, nice formatting and well-written answers. There will be 4 problem sets in total, constituting 40% of your grade. Late problem sets lose 1 point (out of the total 5) for each late day, with a maximum of 2 late days, after which I will no longer accept the problem set. I do not want to hold up the class and will not wait for everyone to submit their problem sets in order to post the solutions key.
Code Conventions: Throughout the course, students will receive feedback on their code from me and from other students. Therefore, consistent code conventions are critical. Good coding style is an important way to increase the readability of your code (even by a future you!). I strongly recommend you follow the code conventions developed by Hadley Wickham and implemented in the package lintr, which integrates with RStudio.
Collaboration Policy: Unless otherwise stated, I encourage students to work together on the assignments, but you should write your own solutions (this includes code). That is, no copy-and-paste from other people’s code. You would not copy-and-paste from someone’s paper, and you should treat code the same way. However, I strongly suggest that you make a solo effort at all the problems before consulting others. Finally, you should credit your colleagues’ help when appropriate!
There will be a take-home quiz in the first few weeks of the course. It, and the problem sets due before it, are designed to help you assess your progress in the course as early as possible; if you are having trouble with the first two weeks of material and/or do poorly on the quiz, I would recommend you come speak to me immediately in office hours and consider deferring taking the course until after you’ve spent more time on statistics and/or programming in R. The quiz will be posted immediately after class and is due two days later by midnight. It is “open-book” in the sense that you can use the slides, your notes, books, and internet resources to answer the questions. However, the quiz must be completed by yourself. It will be the approximate length of a problem set (although I caution that it might take longer if you are used to collaborating on the problem sets). I encourage you to start early.
There will be an in-class midterm in week 8 of the class. It will be a traditional closed-book exam focusing on the theoretical aspects of the material you have covered up to that point.
You will work in groups of 3 or 4 to ask a social science question of a dataset, culminating in a final presentation at the end of the semester. I will provide more details on the project later in the semester.
classmates and I. Unless the question is of a personal nature or completely specific to you, you should not resort to emailing me directly; instead, you should post your questions on Canvas. I will be monitoring the discussion board, but I encourage you to help your classmates as well. There will likely be significant overlap between the things some people want to know more about and the things others have just figured out. This is particularly true given the heavy emphasis on programming in this class.
6 Class Schedule
Note, this WILL change as we roll through the semester, though no exam/final project presentation dates will change. Please check Canvas regularly for updates.
January 22, 2020 Suggested readings prior to class: ESL Ch 2
January 29, 2020 Suggested readings prior to class: ESL Ch 3.2, 4.
February 5, 2020 Suggested readings prior to class: ESL Ch 3. Fun/advanced reading: Bien, Jacob, Jonathan Taylor, and Robert Tibshirani. (2013) “A lasso for hierarchical interactions”. The Annals of Statistics.
February 12, 2020 Suggested readings prior to class: ESL Ch 7.2-5, 7.10-7.
February 19, 2020 Suggested readings prior to class: ESL Ch 9.2, 15.1-15. Fun/advanced reading: Athey, Susan and Guido Imbens. (2016) “Recursive partitioning for heterogeneous causal effects”. PNAS.
February 26, 2020 Suggested readings prior to class: ESL Ch 8.2-8.7, ESL Ch 11.3- Fun/advanced reading: Dempster, Art, Nan Laird and Don Rubin. (1977) “Maximum Likelihood from Incomplete Data via the EM Algorithm”. JRSS.
April 15, 2020
April 22, 2020 Fun/advanced reading: Zhang, Yan, A.J. Friend, Amanda Traud, Mason Porter, James Fowler and Peter Mucha. (2008). “Community Structure in Congressional Cosponsorship Networks”. Physica A.
April 29, 2020
Acknowledgements
This course was developed on the shoulders of giants, in some cases borrowing directly from materials developed by amazing colleagues in political science, economics, and statistics. I am extremely grateful to everyone who has contributed directly, or indirectly. Lecture slides and related circulated materials should have appropriate citations – please send me an email if you believe they are incorrectly citing or lacking in citation rigour. Individuals include but are not limited to: Ines Levin, Peter Orbanz, Rochelle Terman, Michelle Torres, Tian Zheng. All errors that remain are my own.
ACADEMIC INTEGRITY
By enrolling in this course, each student assumes the responsibilities of an active participant in UW-Madison’s community of scholars in which everyone’s academic work and behavior are held to the highest academic integrity standards. Academic misconduct compromises the integrity of the university. Cheating, fabrication, plagiarism, unauthorized collaboration, and helping others commit these acts are examples of academic misconduct, which can result in disciplinary action. This includes but is not limited to failure on the assignment/course, disciplinary probation, or suspension. Substantial or repeated cases of misconduct will be forwarded to the Office of Student Conduct & Community Standards for additional review. For more information, refer to https://conduct.students.wisc.edu/academic-integrity/.

(1) Shortened class: class will end at 5pm.
ACCOMMODATIONS FOR STUDENTS WITH DISABILITIES
The University of Wisconsin-Madison supports the right of all enrolled students to a full and equal educational opportunity. The Americans with Disabilities Act (ADA), Wisconsin State Statute (36.12), and UW-Madison policy (Faculty Document 1071) require that students with disabilities be reasonably accommodated in instruction and campus life. Reasonable accommodations for students with disabilities is a shared faculty and student responsibility. Students are expected to inform faculty [me] of their need for instructional accommodations by the end of the third week of the semester, or as soon as possible after a disability has been incurred or recognized. Faculty [I], will work either directly with the student [you] or in coordination with the McBurney Center to identify and provide reasonable instructional accommodations. Disability information, including instructional accommodations as part of a student’s educational record, is confidential and protected under FERPA. http://mcburney.wisc.edu/facstaffother/faculty/syllabus.php.
DIVERSITY & INCLUSION
Diversity is a source of strength, creativity, and innovation for UW-Madison. We value the contributions of each person and respect the profound ways their identity, culture, background, experience, status, abilities, and opinion enrich the university community. We commit ourselves to the pursuit of excellence in teaching, research, outreach, and diversity as inextricably linked goals. The University of Wisconsin-Madison fulfills its public mission by creating a welcoming and inclusive community for people from every background – people who as students, faculty, and staff serve Wisconsin and the world. https://diversity.wisc.edu/