Means, Deviations, and Standard Errors

For this activity, we are going to be working with data from the Census of Adult Correctional Facilities collected and maintained by the Bureau of Prisons. For this activity, we are going to be looking at the use of disciplinary citations in different prison contexts. Disciplinary citations are used by prison staff to maintain order and control, but are sometimes overused, abused, or used extensively when inmates react to persistent mistreatment. In short, they can sometimes be considered a proxy for the environment in a prison.

Let’s start by summarizing the variable discipline in the dataset:

sum discipline

Which should look like:

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
  discipline |      3,311    725.2921    1633.571          0      53988

So correction facilities have an average of 725 disciplinary citations given to inmates per year. Note that the standard deviation is 1633.57 and there are 3,311 observations in our sample. If we apply our formula for measuring the spread of the sampling distribution for samples of 3311 prisons (i.e., calculate the standard error), we would get something like:

\[s.e. = \dfrac{\sigma}{\sqrt{n}} \rightarrow s.e. = \dfrac{1633.57}{\sqrt{3311}} \rightarrow s.e. = \dfrac{1633.57}{\pm 57.54} \rightarrow s.e. = \pm28.39\]

Confidence Intervals

Extending this further, if we wanted to know the 95% confidence interval of disciplinary citations used in our sample of prisons, we would use the z-score associated with 95% of observations falling between it in a normal distribution (see a z-table to confirm). Recall that the z-score for 95% will have 0.025 observations below it in the z-table because there are two tales in the distribution and 2 * 0.025 is 0.05 or 5%. We would calculate out confidence interval as:

\[C.I. = \overline{x} \pm (Z * s.e.) \rightarrow C.I. = 725.2921 \pm (1.96 * 28.39) \rightarrow C.I. = 725.2921 \pm 55.6444\]

Resolving equation (2) gives us a 95% confidence interval that ranges from 669.6477 to 780.9365. Of course, the reason we want to use tools like Stata when doing data analysis is because we do not want to have to calculate these statistics by hand for every analysis we run. Stata can apply the formulas for us with a simple command: mean. Using mean instead of sum like this:

mean discipline

Mean estimation                          Number of obs = 3,311

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
  discipline |   725.2921   28.38954      669.6292    780.9549
--------------------------------------------------------------

Note that the standard error and 95% confidence interval are provided for us! Of course, because Stata does not round and we did, our hand calculation is slightly different at the second decimal level. Still, Stata is making the same calculation we did, only without rounding.

Of course, sometimes we might be interested in a more relaxed or more restrictive confidence interval. Stata defaults to the 95% confidence interval because in most scientific analyses, 95% confidence has become a conventional threshold for acceptable confidence in an estimate. Using the , level(#) option in Stata can allow us to estimate a different confidence interval if needed. For example, the more relaxed 90% confidence interval can be estimated with this code:

. mean discipline, level(90)

Mean estimation                          Number of obs = 3,311

--------------------------------------------------------------
             |       Mean   Std. err.     [90% conf. interval]
-------------+------------------------------------------------
  discipline |   725.2921   28.38954      678.5823    772.0018
--------------------------------------------------------------

…and here we can see the more restrictive 99% confidence interval:

. mean discipline, level(99)

Mean estimation                          Number of obs = 3,311

--------------------------------------------------------------
             |       Mean   Std. err.     [99% conf. interval]
-------------+------------------------------------------------
  discipline |   725.2921   28.38954      652.1233    798.4609
--------------------------------------------------------------

Note that the range for the 90% confidence interval contains fewer values and the range for the 99% confidence interval contains more values. That’s because when we include a wider range of possible estimates of X-bar, we can be more confident that the true value of mu falls within that range of possible estimates.

Means and Confidence Intervals by Categories

We care about the average number of disciplinary citations (discipline) because the more citations used by guards might suggest a more turbulent environment in the prison. It could be that the guards are being excessively punitive with inmates or that inmates are simply more likely to violate rules, but either way, extensive use of citations might suggest something is amiss in the prison. As policy analysts, we might be interested in looking at how different programs change the number of citations a prison hands out to inmates. First, let’s see if the average number of citations is higher at higher security prisons. If citations capture something about the prison environment, we might expect maximum security prisons to have more citations handed to inmates, on average, than minimum security prisons. Let’s use the over() option of the mean command to see how the average number of disciplinary citations varies by prison security level:

. mean discipline, over(security_level)

Mean estimation                                         Number of obs = 3,311

-----------------------------------------------------------------------------
                            |       Mean   Std. err.     [95% conf. interval]
----------------------------+------------------------------------------------
c.discipline@security_level |
                   Sup Max  |   1473.957   609.1273      279.6531    2668.262
                       Max  |   1614.596    109.623       1399.66    1829.532
                       Med  |   953.3715   38.76677      877.3622    1029.381
                       Min  |    217.296   15.33997      187.2192    247.3728
-----------------------------------------------------------------------------

This looks about how we might expect. Higher security level prisons, on average, give inmates a lot more disciplinary citations than lower security level prisons.

Hypothesis Testing

Single Sample T-Test

Now, let’s say after a lawsuit against some prisons that excessively cite inmates, a judge has capped their use of citations at 300. We have been tasked with examining prisons with a court order and seeing if they are adhering to the cap. In the dataset, order_discipline is an indicator variable that is equal to 1 if the judge has ordered them to change their citation practices. Let’s look at the mean and standard deviation of these prisons:

. tabstat discipline if order_discipline == 1, statistics(mean sd)

    Variable |      Mean        SD
-------------+--------------------
  discipline |  925.5283  1191.809
----------------------------------

Our sample of prisons with a court order gave out an average of 925 citations, which certainly seems like prisons are ignoring court orders. But, do we think this is a systematic problem or did we just get a fluke sample? Here is where hypothesis testing can be handy. To assess how confident we are that we did not just get a fluke sample, we can assume that the average prison with a court order follows the order and uses 300 citations. We can then use the properties of a normal distribution to assess the likelihood of getting a sample with an average of 925 citations if the true average number of citations is 300. In place of doing this by hand, we will let Stata calculate the relevant statistics for us using the ttest command.

ttest discipline==300 if order_discipline == 1

In the above code, we are telling to conduct a hypothesis test that the average number of disciplinary citations among prisons with a court order is 300 (i.e., we are testing the hypothesis that the prisons are following the court order, on average). The discipline==300 part of the code tells Stata to test whether the average of discipline is equal to 300 citations. The if order_discipline == 1 part of the code sets the condition that only the sample of prisons with a court order is included in our hypothesis test. Using that code will give us output that looks like this:

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
discip~e |      53    925.5283    163.7075    1191.809    597.0252    1254.031
------------------------------------------------------------------------------
    mean = mean(discipline)                                       t =   3.8210
H0: mean = 300                                   Degrees of freedom =       52

   Ha: mean < 300               Ha: mean != 300               Ha: mean > 300
 Pr(T < t) = 0.9998         Pr(|T| > |t|) = 0.0004          Pr(T > t) = 0.0002

In the above output, Stata gives us a table with the N, mean, standard error, standard deviation, and 95\% confidence interval for the sample. Below the table, you can see that Stata restates the null hypothesis you tested. In this case, it was H0: mean = 300, which is another way of saying that the null hypothesis we tested was that the average number of citations at prisons in the sample was 300. Of course, as the table should, the average we actually see in our sample is 925 citations. At the bottom of the output, Stata calculates the relevant test statistics for three alternative hypotheses: that the mean is actually less than 300, that the mean is not equal (!= is Stata’s way of saying “not equal”) to 300, and that the mean is actually greater than 300. Looking at the middle hypothesis, we can see that the p-value for the alternative hypothesis that the mean is not equal to 300 is 0.0004. This tells us that if the mean actually was 300 (i.e., that the null hypothesis really was true), only 0.04% of samples of 53 prisons would have 925 citations given out on average. Put another way, we can be very confident that the null hypothesis is not true and we have strong evidence that prisons are not following the court orders on average.

Note that the 95% confidence interval also provides some evidence that we can reject the null hypothesis even if we didn’t have the p-value immediately handy. The confidence interval suggests that 95% of samples of 53 prisons with court orders will have a mean between 597 citations and 1254 citations. In other words, most samples will have a mean greater than the 300 citation court ordered limit, and we can confidently reject the null hypothesis with that information alone. The p-value allows us to be more precise, but the confidence interval can still be informative.

Two-Sample T-Test

Of course, we are often interested in alleviating problems and assessing whether our attempts to alleviate problems are successful. Let’s say we are interested in assessing whether work release programs, programs where inmates are allowed to leave the jail or prison and work at a job, improve the climate at the prison. Here, we might calculate the average number of citations for prisons with work release programs and prisons without work release programs, like so:

. mean discipline, over(workrelease)

Mean estimation                                      Number of obs = 3,263

--------------------------------------------------------------------------
                         |       Mean   Std. err.     [95% conf. interval]
-------------------------+------------------------------------------------
c.discipline@workrelease |
                      0  |   951.4821   38.91204      875.1876    1027.777
                      1  |    207.597   19.06668      170.2132    244.9809
--------------------------------------------------------------------------

Looks promising for our program! Just a comparison of averages seems to suggest that prisons with work release programs, on average, have far fewer citations given to inmates than prisons without work release programs. Of course, we might be worried that such programs are less common in higher security level prisons. Let’s restrict our analysis to only minimum security level prisons:

. mean discipline if security_level == 4, over(workrelease)

Mean estimation                                      Number of obs = 1,592

--------------------------------------------------------------------------
                         |       Mean   Std. err.     [95% conf. interval]
-------------------------+------------------------------------------------
c.discipline@workrelease |
                      0  |   336.5699   31.20597      275.3608    397.7791
                      1  |   116.8658   7.503047      102.1489    131.5827
--------------------------------------------------------------------------

Our work release program still looks promising, but the gap between the two groups of prisons is certainly much smaller. Now, we want to be confident that this difference we are seeing is a real difference and not just statistical noise. Put another way, we want to test the null hypothesis that the average difference in citations between prisons with and without work release is actually 0.

Again, we will return to our t-test command, just like we did before, only this time, we will tell Stata to test the difference in means between two groups: prisons without work release and prisons with work release. We will keep our restriction to minimum security prisons. By using the , by() option in Stata, we can tell Stata to run the t-test between all categories in the the variable work release. Stata will use the null hypothesis that the difference between the categories is actually 0 and compute the test statistics needed for testing that hypothesis. The code looks like this:

. ttest discipline if security_level == 4, by(workrelease)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       0 |     765    336.5699    31.20597    863.1145    275.3103    397.8296
       1 |     827    116.8658    7.503047    215.7697    102.1385    131.5931
---------+--------------------------------------------------------------------
Combined |   1,592    222.4397    15.73096    627.6634    191.5841    253.2953
---------+--------------------------------------------------------------------
    diff |            219.7042    31.01002                158.8793     280.529
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   7.0849
H0: diff = 0                                     Degrees of freedom =     1590

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

Here, Stata now computes a table with the N, mean, standard error and deviation, and 95% confidence interval for both groups; the same statistics for the two groups combined; and the difference between the two group averages. Again, the difference seems quite large: prisons without work release, on average, hand out 219 more citations than prisons with work release.

Below the table, you can see that Stata has provided the null and alternative hypotheses. The null is that the average of both groups is actually the same, or that there is no average difference in citations between prisons with and without work release (Stata puts this as H0: diff = 0). Again, looking to the alternative hypothesis that the difference between these two groups of prisons is not 0, we can see that there is a 0.0000 probability of a sample of 1,592 prisons showing an average difference of 219 citations if the null hypothesis of no difference was true. In other words, we can be very confident in rejecting the null hypothesis that work release programs make no difference in citation rates!