Yuning Wang (yw3438); Aiming Liu (al3998); Qi Lu (ql2370); Yucong Jiang (yj2581); Weiran Zhang (wz2506) 12/02/2019
Here’s the map of bad mental health prevalence
The prevalence of mental illness among U.S. adults(2017)
Understanding the risk factors for bad mental health
Visualizing correlation of the risk factors and bad mental health
visualizing the distribution of risk factors and bad mental health prevalence across United States.
Giving suggestions about reducing prevelance of bad mental health
We tried to figure out the possible causes of bad mental health or mental health illness. We were also interested in visualizing the distribution of risk factors and mental illness prevalence prevalence in the United states, by state and by cities. In addition, we also sought to figure out the correlations between bad mental health and chronic disease and bad living behaviors. As a result, the questions were as followed:
What are the risk factors of bad mental health? What are the correlations between the outcome and the factors? How to visualize them?
What are the distributions of bad mental health and risk factors in different states and cities in the United States? How to visualize them?
Are there any ways for us to reduce the prevalence of bad mental health?
In order to do the analysis, we utilized the 500 Cities: Local Data for Better Health, 2018 release dataset. This dataset includes 2016, 2015 model-based small area estimates for 27 measures of chronic disease related to unhealthy behaviors (5), health outcomes (13), and use of preventive services (9). Data were provided by the Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch. The project was funded by the Robert Wood Johnson Foundation (RWJF) in conjunction with the CDC Foundation. Other datas about U.S. states and territories’ income, U.S. states and territories’ educational attainment and U.S. population race were also used in analyzing the cause of mental health problems.
We downloaded the datasets and filtered the variables that we were interested in and established a new dataset. The risk factors that we are interested in include:
Outcome that we were interested in is:
First, we read in the dataset, named it as total_data
and filtered the variables that we need.
To make inferences of the relationships between mental health and five variables including obesity, diabetes, drinking, cancer and sleeping which represent the mean prevalence in each cities in the United States in our dataset, we build five linear regression models and plot them. Based on the observations from the five regression models, diabetes, obesity and sleeping have positive association with mental health. drinking and cancer have negative association with mental health.
library(flexdashboard)
library(tidyverse)
library(patchwork)
library(ggridges)
library(viridis)
library(plotly)
# read in the data
mental = read.csv("./data/health data.csv")%>%
janitor::clean_names() %>%
as_tibble() %>%
filter(measure == "mental health" | measure == "obesity" | measure == "diabetes" | measure == "drink" | measure == "cancer" | measure == "sleeping")
mental_data = mental %>%
filter(measure == "mental health")
# read in income, education and ethnicity data
state_stat <- read_csv("data/state_stat.csv")
#deal with data
total_data = mental %>%
pivot_wider(names_from = measure, city_name, values_from = data_value) %>%
unnest() %>%
drop_na() %>%
group_by(city_name) %>%
summarize(obesity = mean(obesity), drink = mean(drink), diabetes = mean(diabetes), mental_health = mean(`mental health`), cancer = mean(cancer), sleeping = mean(sleeping))
# Drink
drink_lm = lm(total_data$mental_health ~ total_data$drink, total_data)
drink_plot =
plot_ly(x = total_data$drink, y = total_data$mental_health,
type = "scatter",
mode = "markers",alpha = .5, name = "distribution") %>%
add_lines(x = total_data$drink, y = fitted(drink_lm), name = "linear model") %>%
layout(title = 'Binge Drink', colorway = c('#F5B7B1', '#B03A2E'))
drink_plot
From the plots of drinking, we can see a slightly negative relationship. To test if the negative relationship between mental health and drinking is significant or not, we perform a t-test: By using 0.05 significance level, p-value is less than 2e-16, so we can conclude that the negative relationship between mental health and drinking is significant and the regression model is \(Y = -0.30765X + 0.03355\).
# Diabetes
diabetes_lm = lm(total_data$mental_health ~ total_data$diabetes)
diabetes_plot =
plot_ly(x = total_data$diabetes, y = total_data$mental_health,
type = "scatter",
mode = "markers",alpha = .5) %>%
add_lines(x = total_data$diabetes, y = fitted(diabetes_lm), name = "distribution") %>%
layout(title = 'Diabetes', colorway = c('#BB8FCE', '#7D3C98'), name = "linear model")
diabetes_plot
For diabetes, we can see from the plot that there is an positive linear relationship and the regression model is \(Y = 0.5985X + 6.7551\). The positive relationship implies people with diabetes will be more likely to have mental health issues.
# Cancer
cancer_lm = lm(total_data$mental_health ~ total_data$cancer)
cancer_plot =
plot_ly(x = total_data$cancer, y = total_data$mental_health,
type = "scatter",
mode = "markers",alpha = .5, name = "distribution") %>%
add_lines(x = total_data$cancer, y = fitted(cancer_lm)) %>%
layout(title = 'Cancer', colorway = c('#A3E4D7', '#148F77'), name = "linear model")
cancer_plot
From the plot, we can also see a slightly negative association. For the relationship between mental health and cancer, p-value is 0.00149 < 0.05 so we can also conclude that the negative relationship is significant. The regression model is \(Y_i = -0.3275X + 14.7374\)
# Obesity
obe_lm = lm(total_data$mental_health ~ total_data$obesity)
obe_plot =
plot_ly(x = total_data$obesity, y = total_data$mental_health,
type = "scatter",
mode = "markers",alpha = .5, name = "distribution") %>%
add_lines(x = total_data$obesity, y = fitted(obe_lm)) %>%
layout(title = 'Obesity',colorway = c('#F9E79F', '#D4AC0D'), name = "linear model")
obe_plot
As for the obesity, we can also see a salient positive relationship and the linear regression model is \(Y = 0.26779X + 4.97108\). The positive relationship implies people with obesity will be more likely to have mental health issues.
#sleeping
sleep_lm = lm(total_data$mental_health ~ total_data$sleeping)
sleep_plot =
plot_ly(x = total_data$sleeping, y = total_data$mental_health,
type = "scatter",
mode = "markers",alpha = .5, name = "distribution") %>%
add_lines(x = total_data$sleeping, y = fitted(sleep_lm)) %>%
layout(title = 'Sleeping', colorway = c('#AED6F1', '#2E86C1'), name = "linear model")
sleep_plot
The sleeping variable in our model and plot represents the prevalence of sleeping less than 7 hours, we can see a positive linear relationship between mental health and sleeping which means the cities with more people sleeping less than 7 hours will be more likely to have mental health issues. And the model is \(Y_i = 0.33366X + 1.00842\)
From this plot we can see that the association between median household income and mental health is significant, i.e. there’s a trend that the higher the household income a state has, the less likely will people in this state have mental illness.
From this plot we can see that there is a significant association between penetration rate of high-school/undergraduate education and the prevalence of bad mental health. Higher penetration rate of high-school/undergraduate education indicates less probability to acquire mental disease. However, there is no significant association between proportion of advanced degress and mental health. We guess that it is because this proportion is too low to affect the statistics of whole population.
The most concering finding in this plot is that the proportion of black or African-American population is highly and positively related with the prevalence of bad mental health, which corresponds the fact of discrimination. In addition, the proportion of Indian or Asian is negatively related with the prevalence of bad mental health. These facts seem confusing. We guess that it might have been biased by other factors, such as income and education.
We used heatmap
and plotly
to draw the distribution of bad mental health in U.S. states and cities. The distribution of mental health was in similar pattern with that of sleepin less than 7 hours.
From the analysis, we gained some ideas about the cause of bad mental health.
In the risk factors that we were interested in, diabetes, obesity and sleeping less than 7 hours have positive relationship with bad mental health. This indicates that chronic diseas are more likely to cause mental health disease. The economic pressure and poor physical conditions caused by chronic diseas might lead to bad mood of the patients and eventually cause a bad mental health and even mental health disease.
Binge drinking, cancer and income are inversely associated with bad mental health. Drinking might be able to release stress, which can enable people to keep a good mood. Individuals who have high income may have less socioeconomic pressure and are more cared about their mental health conditions.
As for level of education,we can see from the results that a high rate of high-school and college education are inversely associated with bad mental health. Thus a higher education level indicates that the person has a better way to adjust the mental condition.
In the aspect of race. It can be discovered that asian and indian have negative relationship with bad mental health while the black has a positive relationship with bad mental health prevalence. This shows that racial discrimination experience that different populations have been through have an effect on the mental health conditions of people.