Introduction

    Going to college has an opportunity cost, and there are many reasons why someone might not pursue or complete higher education. One of them could be that the cost of school exceeds what a household can afford. Studying residents’ ability to afford education may provide insight into why some do not graduate from college. This knowledge would be valuable in understanding the challenges behind education and could lead to approaches that target the issue. Therefore, I would like to investigate the relationship between household income, student spending, and the percentage of residents who are college graduates. The question explored in this research is: How are mean household income and student spending related to the percentage of residents who graduated from college in each state? A higher mean household income in a state would imply that its residents overall have more financial power to afford education, so mean household income was hypothesized to be positively related to the percentage of residents who graduated from college in each state. At the same time, state average student spending was hypothesized to be negatively related to the percentage of residents who were college graduates in each state, as higher mean student spending in a state would suggest that residents overall had more difficulty affording college. Partly aligning with the hypothesis, the results showed that both mean household income and student spending were positively related to the percentage of residents who were college graduates in a state, controlling for the other variable. The analysis also found an unexpected positive correlation between mean household income and student spending.

Methods

    The investigation used part of the data from the U.S. Census Bureau American Community Survey (ACS) 5-year estimates for 2013 to 2017. The data were collected from residents of the 50 states in the United States, and this study used the following variables from the data set: household income, student spending, and percentage of residents who are college graduates. Household income was measured as the mean for each state from 2013 to 2017, in units of $1000. Student spending was the average school spending per pupil in each state in 2013, also in units of $1000. Lastly, the percentage of residents who are college graduates was the proportion of residents between ages 25 and 34 who were college graduates in each state from 2013 to 2017. I used a multiple linear (multilinear) regression model to investigate whether mean household income and student spending were related to the percentage of residents who are college graduates. Specifically, the model used mean household income and student spending as explanatory variables and the percentage of residents who are college graduates as the response variable. Although the Census Bureau randomly selects participants to respond to the survey each year, so that each state’s average varies with the random selection of individuals, the 50 state-level values are not themselves a random sample. Therefore, the analysis should be careful in assuming independence between data points.
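
    Written out, the model described above takes the following general form. The symbols \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\varepsilon_i\) are notation assumed here for illustration, with \(i\) indexing the 50 states:

\[ Percent\ Graduated\ from\ College_i = \beta_0 + \beta_1 \times HouseholdIncome_i + \beta_2 \times StudentSpending_i + \varepsilon_i \]

    Here \(\beta_1\) and \(\beta_2\) are the slopes for household income and student spending (both in $1000), and \(\varepsilon_i\) is the error term whose independence across states is the assumption questioned above.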

Analysis and Results

    The two explanatory variables, mean household income and student spending, were correlated with each other. As shown in Table 1, the t-test for correlation between them resulted in a correlation of 0.61 (t = 5.33, df = 48) with a p-value of 2.6e-6. This was strong evidence of a positive correlation between household income and student spending: states with higher student spending tended to have higher mean household income. Therefore, the interpretation of the multilinear model should account for this collinearity.
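
    As a rough check on the correlation test, the t statistic and confidence interval can be recomputed from the reported correlation alone. The sketch below assumes n = 50 states and assumes the interval was built with the Fisher z-transformation, which is a common method for Pearson correlations but is not stated in the analysis. Because it starts from the rounded value 0.61, it can only match the table to rounding.

```python
import math

# Reported values: correlation from the table, n = 50 states from the Methods.
r = 0.61
n = 50
df = n - 2  # degrees of freedom for the correlation t-test (48)

# t statistic for testing that the population correlation is zero.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r**2)

# Approximate 95% confidence interval via the Fisher z-transformation
# (assumed method; the analysis does not say how the interval was computed).
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
z_crit = 1.959964  # 97.5th percentile of the standard normal distribution
conf_low = math.tanh(z - z_crit * se)
conf_high = math.tanh(z + z_crit * se)

print(f"t = {t_stat:.2f} on {df} df")
print(f"95% CI: ({conf_low:.3f}, {conf_high:.3f})")
```

    The recomputed values should agree closely with the statistic, conf.low, and conf.high columns of the correlation table.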

Table 1. Correlation between Household Income and Student Spending

estimate  statistic  p.value  parameter  conf.low  conf.high  method                                alternative
0.61      5.33       0        48         0.4       0.759      Pearson’s product-moment correlation  two.sided

Table 2. Effects of Household Income and Student Spending on Percent of College Graduates in Each State

term             estimate  std.error  statistic  p.value
(Intercept)      6.649     3.773      1.76       0.085
HouseholdIncome  0.330     0.081      4.06       0.000
StudentSpending  0.616     0.230      2.68       0.010

    The analysis partly aligned with the hypothesis. While I hypothesized a negative relationship between student spending and the percentage of residents in a state who were college graduates, the percentage of residents who were college graduates was positively related to both mean household income and student spending per pupil. The fitted model for the percentage of college graduates was:

\[ \widehat{Percent\ Graduated\ from\ College} = 6.65 + 0.33 \times HouseholdIncome + 0.62 \times StudentSpending \]
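
    To make the fitted equation concrete, the following minimal sketch evaluates it with the unrounded coefficients from the regression table. The function name and the example state (mean household income of $70,000 and student spending of $10,000 per pupil) are hypothetical and are not taken from the data set.

```python
# Coefficients from the regression table (income and spending are in $1000).
INTERCEPT = 6.649
B_INCOME = 0.330
B_SPENDING = 0.616


def predict_percent_graduates(household_income, student_spending):
    """Predicted percentage of college graduates for one state."""
    return INTERCEPT + B_INCOME * household_income + B_SPENDING * student_spending


# Hypothetical state, chosen only for illustration.
base = predict_percent_graduates(70.0, 10.0)

# Raise one predictor by $1000 while holding the other fixed.
higher_income = predict_percent_graduates(71.0, 10.0)
higher_spending = predict_percent_graduates(70.0, 11.0)

print(round(base, 3))
print(round(higher_income - base, 3))
print(round(higher_spending - base, 3))
```

    The two differences equal the slopes themselves, which is what each coefficient means: the change in the predicted percentage for a $1000 increase in one variable while the other is held fixed.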

    The F-test for this model was significant (p = 5.22e-9). Household income and student spending together explained 55.58% of the variance in the percentage of college graduates (multiple \(R^2\) = 0.5558). The coefficients for both household income and student spending were significant. The coefficient of household income was 0.33 with a p-value of 0.00018, and the coefficient of student spending was 0.62 with a p-value of 0.01. These individual slope tests suggested that the predicted percentage of college graduates in a state increases by 0.33 percentage points for each $1000 increase in state mean household income, holding student spending constant. Similarly, the predicted percentage of college graduates in a state increases by 0.62 percentage points for each $1000 increase in student spending, holding mean household income constant. The intercept of 6.65, which is the predicted percentage of college graduates when household income and student spending are both zero, was not significantly different from zero at the 0.05 level (p = 0.085). Because no state has values near zero for either variable, the intercept is an extrapolation and has little practical meaning.
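
    The overall F-test can be cross-checked from the reported \(R^2\). This sketch assumes n = 50 states and k = 2 predictors, and it uses a closed-form expression for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here.

```python
# Reported multiple R-squared; n and k follow from the Methods section.
r_squared = 0.5558
n = 50  # states
k = 2   # predictors: household income and student spending
df_resid = n - k - 1  # residual degrees of freedom (47)

# Overall F statistic expressed in terms of R-squared.
f_stat = (r_squared / k) / ((1 - r_squared) / df_resid)

# Upper-tail probability of F(2, df_resid). This closed form is valid
# only because the numerator degrees of freedom are exactly 2.
p_value = (1 + k * f_stat / df_resid) ** (-df_resid / 2)

print(f"F = {f_stat:.2f} on {k} and {df_resid} df")
print(f"p = {p_value:.3g}")
```

    The resulting p-value should agree with the reported 5.22e-9 up to the rounding of \(R^2\).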

    A linear model was somewhat sufficient to fit this data, based on the diagnostic plots in the Appendix. The relationship was mostly linear, but the residuals had a larger variance when the fitted percentage of college graduates was higher than when it was lower. This indicated that household income and student spending predicted the percentage of college graduates less precisely in states with more college graduates than in states with fewer. The residuals appeared to have a mean of zero but did not align well with the normal distribution, which limits the reliability of the model’s tests and predictions. There was one high-leverage point, but it did not have a large influence on the fitted model.

    Alaska, Massachusetts, and Wyoming were outliers whose extreme values may have skewed the fitted equation. Alaska had a relatively high mean household income and moderately high student spending among the states but had a comparatively low college graduate percentage. Similarly, Wyoming had a moderate mean household income and relatively high student spending, but it also had a comparatively low college graduate percentage. In contrast, Massachusetts had a high household income and slightly lower student spending than Alaska and Wyoming, but it had the highest college graduate percentage among all states. In sum, a multilinear model performed acceptably at describing the positive relationship between the percentage of residents who were college graduates in each state and both household income and student spending.

Discussion

    The results partly support the hypothesis: the mean household income of a state was positively related to the percentage of its residents who were college graduates. However, contrary to the hypothesis, student spending was also positively related to the percentage of residents who were college graduates in each state. Unexpectedly, there was also a positive relationship between household income and student spending, the two predictors of the percentage of college graduates. This may be because states where residents have higher incomes also have higher costs of living, including the cost of school. Additionally, because the results are based on state-level averages, they may be used to predict the percentage of college graduates in a state, but they cannot be used to predict whether an individual will graduate from college based on that individual’s household income and student spending.

    A limitation of this analysis is the multicollinearity between household income and student spending, which inflates the variance of the coefficient estimates and makes it difficult to separate the individual relationship that each explanatory variable has with the response variable. In addition, the state-level data are not a fully independent sample, which weakens the reliability of the multilinear model. Finally, the study does not show that mean household income and student spending have a causal effect on the percentage of residents who are college graduates in a state.
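
    One way to gauge how much the collinearity inflates the coefficient variances is the variance inflation factor (VIF), which is not reported in the analysis above. With only two predictors, the VIF of each depends only on the correlation between them, so it can be sketched from the reported correlation of 0.61:

```python
# Reported correlation between household income and student spending.
r = 0.61

# With two predictors, regressing one on the other gives R-squared = r**2,
# so both predictors share the same variance inflation factor.
r_squared_between_predictors = r**2
vif = 1 / (1 - r_squared_between_predictors)

# Standard errors are inflated by the square root of the VIF.
se_inflation = vif**0.5

print(f"VIF = {vif:.2f}")
print(f"Standard errors inflated by a factor of {se_inflation:.2f}")
```

    A VIF of 1 means no inflation, and common rules of thumb flag values above 5 or 10, so this check may help judge how serious the limitation is in this model.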

    To address these limitations, further analysis can include more statistical tests to identify the more effective predictor and remove the less predictive one for a better model fit. Another way to reduce the effect of multicollinearity is to collect more data points, which is not possible at the state level since all 50 US states are already represented in the current data set. To work around this sample limitation, future studies can focus on a smaller scale and look at the mean household income, student spending, and percentage of college graduates among residents of counties or major cities in each state. This more detailed approach would allow the inclusion of more data points to better understand the relationship between these variables. Although it is difficult to test a causal relationship between household income, student spending, and the percentage of college graduates in a state, future research can explore other relevant factors, such as the availability of school resources, incentives for schooling, and reasons for dropping out of college, for a more comprehensive understanding of what influences the percentage of college graduates in an area.

Reference

U.S. Census Bureau. (n.d.). 2013-2017 ACS 5-year Estimates. U.S. Department of Commerce. https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2017/5-year.html

Appendix

Diagnostic plots for the linear model of Percentage of Residents Graduated from College