Assignments
All problem sets assigned each week and the data needed to complete them can be found here. Complete the assignment in a Word document with your answers numbered following the problems in the assignment. For problems done by hand, show your work. For problems done in Stata, show your code and the relevant output from your log file. Turn in your assignment on Blackboard.
Assignment 1
Assignment 1 Data for Assignment 1
About This Data: The data in this homework comes from the Student-Teacher Achievement Ratio (STAR) study in Tennessee. In 1985, a randomized control trial (RCT) was conducted in Tennessee to examine whether students assigned to small classrooms (in this case, 13-17 students) learned more than students assigned to regular-sized classes (in this case, 22-25 students). The small classroom is considered the treatment group. As we cover later in the course, RCTs are the gold standard for establishing the causal effect of a treatment (here, smaller classes) on an outcome of interest (here, student learning as measured by standardized scores). Causal inference generally refers to using statistical methods to approximate RCTs. Since students were randomized into classrooms, they were also, by extension, randomly assigned to teachers. Researchers have also used this data to examine the impacts of teacher demographics on student outcomes, finding potential benefits from increasing diversity in the teacher workforce.
Assignment 2
Assignment 2 Data for Assignment 2
About This Data: The data in this homework comes from the American Community Survey (ACS). The ACS is an annual survey of a nationally representative sample of households in the United States. It provides rich information about households’ earnings, education, use of services, commuting patterns, and various other policy-relevant characteristics. Researchers and policymakers look to data from the ACS to investigate all sorts of questions, often using the geographic information to learn about the effects of neighborhood quality on individuals’ well-being. For instance, a team of researchers used data from a randomized control trial involving access to affordable housing in conjunction with neighborhood-level data from the ACS to estimate the effect of various neighborhood characteristics on long-term outcomes for low-income people.
Assignment 3
Assignment 3 Data for Assignment 3
About This Data: The data in this homework comes from the Police-Public Contact Survey (PPCS), conducted and maintained periodically by the Bureau of Justice Statistics. The PPCS collects data from a nationally representative sample of people who had contact with the police, either through reporting a crime or during a traffic stop, and collects information about the respondent, the stop, the outcome from the stop, and how the respondent felt about various aspects of their interaction with the police. Researchers have used this survey in the past to examine how contacts with police vary by race and gender. The extra credit for this assignment uses data from the Education Longitudinal Study of 2002 (ELS). I have worked with the ELS extensively, using it to document racial disparities in teacher expectations and how experience might change them, examine the role of base of motivation in workers’ sector preferences, study how schools shape prosocial motivation, and examine the link between intergroup contact and prosocial motivation. Obviously, panel data can be extremely useful for researchers in answering a wide range of questions.
Assignment 4 - Extra Credit
Assignment 4 Data for Assignment 4
About This Data: The data in this homework comes from the Cooperative Congressional Election Study (CCES), a nationally representative survey of voting-age adults that is fielded before and after every federal election. The data provides a lot of information about voters, voting-age adults, and how they think about the country, policy, ideology, and party preferences. Political scientists often rely on this data to get a sense of how the electorate has changed over time, how state policies or legislative priorities shift voting coalitions, and a wide range of other important insights about our democracy and its voters. Recent work has used the CCES to study the sources of polarization, the extent to which geographic sorting contributes to polarization, framing effects on support for automatic voter registration, and racial attitudes in the electorate. I have used data from the CCES to consider whether altruism or public service motivation better explains volunteerism and how public service motivation might correspond with policy preferences.
Assignment 5
Assignment 5 Data for Assignment 5
About This Data: The data in this homework returns to the world of time-use data from the American Time Use Survey (ATUS) and the world of politics from the Cooperative Congressional Election Study (CCES). The ATUS includes data from retrospective time diaries from a nationally representative sample. It allows researchers and policymakers to get a sense of how Americans spend their time on a typical day. I have used the ATUS in my research to examine gender gaps in time spent on homework among high school students, gaps between public and private sector workers in time spent volunteering, and inequality in time spent waiting for routine services. Other researchers have used the ATUS to study labor and leisure trade-offs by income, gender differences in caretaking and childcare obligations, and a wide range of other policy-relevant issues. The CCES and its applications to research were covered in Assignment 4.
Midterm Practice Set
Assignment 6
Assignment 6 Data for Assignment 6
About This Data: The data in this homework again comes from the ATUS dataset, this time focusing on time spent working. The assignment is structured very similarly to the class exercise we did to practice working with confidence intervals and sampling distributions. You can find the solutions to those exercises here.
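As a reminder of the arithmetic behind a confidence interval for a mean, here is a minimal sketch. It is written in Python rather than Stata purely to show the calculation; it is not the code you should turn in. The function name `mean_confidence_interval` and the `hours_worked` values are made up for illustration, and the interval uses the large-sample normal approximation rather than the t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Return (lower, upper) bounds for the population mean.

    Uses the normal approximation: mean +/- z * (s / sqrt(n)).
    """
    n = len(values)
    sample_mean = mean(values)
    standard_error = stdev(values) / sqrt(n)  # sample SD divided by sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # critical value, about 1.96 for 95%
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical daily hours worked; NOT values from the ATUS.
hours_worked = [7.5, 8.0, 6.0, 9.0, 8.5, 7.0, 8.0, 10.0, 6.5, 7.5]

lower, upper = mean_confidence_interval(hours_worked)
print(f"mean = {mean(hours_worked):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With a sample this small, a t critical value would give a somewhat wider interval; the sketch only shows how the standard error and the critical value combine.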
Assignment 7
Assignment 7 Data for Assignment 7
About This Data: The data in this homework comes from the Census of State and Federal Adult Correctional Facilities (CCF). The CCF is collected by the Bureau of Justice Statistics every 5 to 7 years and provides a comprehensive look at the staffing, conditions, programs, and inmate populations at all corrections facilities in the country. Data on corrections facilities in a standardized and comparable format is often hard to come by, and the CCF provides a useful means for researchers and policymakers to assess the conditions of the nation’s prisons. Questions about the efficacy of various programs in improving conditions in prisons, the relative size and density of prisons, and the staff-to-inmate ratios in different kinds of prisons are potential applications for the data. In my own work, I have used this data to compare public and private prison conditions and examine the link between staff demographics and prison violence.
Assignment 8
Assignment 8 Data for Assignment 8
About This Data: The data in this homework comes from a randomized control trial (RCT) in India. The RCT was carried out by Nobel laureates Esther Duflo and Abhijit Banerjee (with co-authors). They received their Nobel in Economics for increasing the use of randomized control trials in testing theoretically derived propositions in economics, international development, and political economy. This particular RCT was implemented to test how parents might use performance information on their child’s academic progress to ensure their schooling improves. One common theory is that when parents have more information about their child’s academic performance, they will either engage with the school more to demand improvement, make more effort to teach the child at home to complement their schooling, or enroll the student in additional schooling. However, after randomly assigning parents to receiving more information about school performance, tools to test their child to monitor progress, or access to remedial schooling, they found no effects, which suggests that providing performance information alone is insufficient to improve public services. You will be using a cleaned version of the data used in this study to implement two-sample t-tests.
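To make the two-sample t-test concrete, the sketch below computes the test statistic by hand. It is in Python only to expose the arithmetic, not as a substitute for your Stata code. The function name and the scores are hypothetical, and I am assuming the unequal-variance (Welch) form of the standard error; a pooled-variance test uses a different standard error.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(group_a, group_b):
    """Return the two-sample t statistic with an unequal-variance standard error.

    t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)
    """
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Made-up test scores for illustration; NOT values from the study.
treatment_scores = [52, 48, 55, 60, 47, 51]
control_scores = [50, 49, 53, 58, 46, 52]

t_stat = welch_t_statistic(treatment_scores, control_scores)
print(f"t = {t_stat:.3f}")
```

A t statistic near zero, as you would expect when a treatment has no effect, means the difference in group means is small relative to its standard error.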
Assignment 9
Assignment 9 Data for Assignment 9
About This Data: The data in this homework comes from the National Health Interview Survey (NHIS). The NHIS is run by the Centers for Disease Control and Prevention (CDC) and collects health information from a nationally representative sample of the U.S. every year. The NHIS provides a rich set of information about respondents’ demographics, socioeconomic status, access to healthcare, use of various health programs, self-reported health information, and various measures of habits and behaviors related to health. Public health researchers often use data like the NHIS to monitor population-level trends in disease and various dimensions of health. For instance, recent work has examined how cardiovascular disease in the U.S. has changed in recent decades, the linkages between psychological distress and death, and racial gaps in the health benefits of educational attainment. You will be using a small random sample from the NHIS to implement a simple, bivariate linear regression that mirrors what we did in Class Lab 7.
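For reference, a bivariate regression boils down to two formulas: the slope is the covariance of x and y divided by the variance of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below works through them in Python, purely to illustrate the calculation. The function name `simple_ols` and the data are invented for this example and are not NHIS variables.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) from a least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations over sum of squared deviations in x.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up predictor and outcome values for illustration only.
years_of_education = [10, 12, 12, 14, 16, 18]
distress_score = [9, 8, 7, 6, 5, 3]

intercept, slope = simple_ols(years_of_education, distress_score)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The slope is read as the expected change in the outcome for a one-unit increase in the predictor.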
Assignment 10
Assignment 10 Data for Assignment 10
About This Data: The data in this homework returns to the National Health Interview Survey (NHIS) to expand on some of the predictors of mental health we examined in Assignment 9. You will implement a simple bivariate linear regression and then add controls to the regression to examine how accounting for additional factors alters our understanding of the relationship in which we are most interested.
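One way to see what adding a control does is to "partial out" the control by hand: regress both the outcome and the predictor on the control, then regress the outcome residuals on the predictor residuals. The resulting slope equals the predictor's coefficient in the regression that includes the control. The sketch below is a Python illustration of that idea with made-up data and hypothetical names; it is not the Stata code for the assignment.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) from a least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope


def residuals(x, y):
    """Return the part of y left over after a regression of y on x."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def controlled_slope(predictor, control, outcome):
    """Slope on the predictor after partialling the control out of both sides."""
    _, slope = simple_ols(residuals(control, predictor), residuals(control, outcome))
    return slope


# Made-up data in which the outcome is exactly 1 + 2*predictor + 3*control.
predictor = [1, 2, 3, 4, 5, 6]
control = [0, 1, 0, 1, 1, 2]
outcome = [1 + 2 * p + 3 * c for p, c in zip(predictor, control)]

_, naive = simple_ols(predictor, outcome)
adjusted = controlled_slope(predictor, control, outcome)
print(f"bivariate slope = {naive:.3f}, slope with control = {adjusted:.3f}")
```

Because the control is positively correlated with the predictor here, the bivariate slope absorbs part of the control's effect and comes out larger than the slope once the control is accounted for.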