wenjzh / Blood-Pressure-by-Gender-using-LASSO

R and Stata - Which predictor variables for blood pressure differ the greatest between males and females? - Computational Methods and Tools in Statistics: Final Project - Umich STATS 506 - 2019 Fall

STATS 506 Group Project F19

Author: Group 2

Diana Liang (STATA)

Sijun Zhang (dplyr_glmnet)

Wenjing Zhou (data.table_customized_cross-validation)

Navigation

Report

The completed report is stored in the current folder as .html and .Rmd files.

Data Sets

We use a combination of four datasets (Demographics, Blood Pressure, Total Nutrients Day 1, Total Nutrients Day 2) from the 2015-2016 NHANES, which are stored in the DATA folder.
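As a rough illustration of how four NHANES tables can be combined, the sketch below merges toy data frames on SEQN, the respondent identifier shared by NHANES files. The values are made up, the columns shown (RIAGENDR, BPXSY1, DR1TSODI, DR2TSODI) are only a small sample of NHANES variables, and the project's own scripts may combine the data differently.

```r
# Toy stand-ins for the four NHANES 2015-2016 tables (values are made up).
# SEQN is the respondent sequence number shared across NHANES files.
demo <- data.frame(SEQN = c(1, 2, 3, 4), RIAGENDR = c(1, 2, 1, 2))
bp   <- data.frame(SEQN = c(1, 2, 3),    BPXSY1   = c(118, 126, 110))
day1 <- data.frame(SEQN = c(1, 2, 3, 4), DR1TSODI = c(3200, 2900, 3500, 2500))
day2 <- data.frame(SEQN = c(1, 3, 4),    DR2TSODI = c(3000, 3600, 2400))

# Inner-join all tables on SEQN, keeping only respondents
# who appear in every one of the four tables.
combined <- Reduce(function(a, b) merge(a, b, by = "SEQN"),
                   list(demo, bp, day1, day2))

print(combined)
```

An inner join is one possible choice here; it drops respondents who are missing from any single table, which keeps the later model matrix free of rows that lack a blood pressure or intake value.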

Code Reviews

We carry out code reviews on both issues and commits. Some examples of these code reviews are README.md, STATA_LASSO_penalty, README.md_formula_editing and Interactions_glmnet.

Scripts

The paths to the final scripts are linked in the Author section, and running guidance can be found in each folder's readme.md.

Overview

We chose to investigate whether consumption habits affect blood pressure, a known symptom of various chronic diseases, in the same way for both genders. Are the factors that matter most in determining blood pressure different between males and females?

We will show in the following analysis that there are certain foods that affect one gender more than the other, and that these foods change depending on the type of blood pressure being measured.

Method: LASSO with customized penalty.factor

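Since our purpose is to find how gender affects the relationship between nutrient intake and the blood pressure measurement, we penalize only the interaction terms: the penalty.factor of every interaction term is set to the same positive value, and that of all other terms is set to zero. In glmnet's notation, the explicit form of the LASSO minimization objective becomes

$$\min_{\beta_0, \beta} \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{T} \beta \right)^2 + \lambda \sum_{j=1}^{p} v_j \, |\beta_j|$$

where $v_j$ is the penalty factor we used for each term: $v_j > 0$ for the interaction terms and $v_j = 0$ for all other terms.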
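The penalty scheme described above can be sketched with glmnet roughly as follows. This is a minimal illustration on simulated data, not the project's actual script: the variable names (gender, sodium, fiber), the simulated effect sizes, and the choice of 1 as the positive penalty value are all assumptions made for the example.

```r
library(glmnet)

set.seed(506)
n <- 200

# Simulated stand-ins (hypothetical names): a 0/1 gender indicator and
# two nutrient intakes; only the sodium effect differs by gender.
dat <- data.frame(
  gender = rbinom(n, 1, 0.5),
  sodium = rnorm(n),
  fiber  = rnorm(n)
)
y <- 120 + 2 * dat$sodium - 1 * dat$fiber +
  3 * dat$gender * dat$sodium + rnorm(n)

# Design matrix with main effects and gender-by-nutrient interactions;
# drop the intercept column because glmnet fits its own intercept.
x <- model.matrix(~ gender * (sodium + fiber), data = dat)[, -1]

# Penalty factor: 1 for interaction columns (names contain ":"), 0 otherwise,
# so main effects are never shrunk and only interactions can be dropped.
pf <- as.numeric(grepl(":", colnames(x), fixed = TRUE))

# Choose lambda by cross-validation, then inspect the coefficients.
cv_fit <- cv.glmnet(x, y, alpha = 1, penalty.factor = pf)
print(coef(cv_fit, s = "lambda.min"))
```

A penalty factor of 0 means the term is always kept in the model, so any interaction that survives the shrinkage points to a nutrient whose association with blood pressure differs by gender. Note that glmnet internally rescales the penalty factors to sum to the number of variables, so only their relative sizes matter.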

Requirements

To run the group_2_final.Rmd file in the Draft folder, the following packages must be installed beforehand.

Version Package
3.6.1 dplyr
3.6.1 ggplot2
2.1.3 tibble
1.3.1 readr
1.0.0 tidyr
1.4.0 stringr
0.4.0 forcats
4.3-0 Hmisc
1.6.0 SASxport
2.0-18 glmnet
1.12.2 data.table
0.4.0 Statamarkdown
1.1.0 kableExtra
x.x.x doMC

doMC is only available on UNIX-like systems. The installation command is embedded in the group_2_final.Rmd file, so users on UNIX-like systems can knit group_2_final.Rmd directly once the requirements other than doMC are met.
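Before knitting, it can help to check which of the listed packages are missing and whether doMC is usable on the current system. The snippet below is a sketch of such a check and is not taken from group_2_final.Rmd; the variable names and the core count are arbitrary choices for illustration.

```r
# Packages listed in the requirements table (doMC is handled separately).
required <- c("dplyr", "ggplot2", "tibble", "readr", "tidyr", "stringr",
              "forcats", "Hmisc", "SASxport", "glmnet", "data.table",
              "Statamarkdown", "kableExtra")

# Report which required packages are not installed yet.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}

# doMC only works on UNIX-like systems, so enable it conditionally.
use_parallel <- .Platform$OS.type == "unix" &&
  requireNamespace("doMC", quietly = TRUE)
if (use_parallel) {
  doMC::registerDoMC(cores = 2)  # the core count is an arbitrary choice
}
```

The resulting use_parallel flag could then be passed on, for example as cv.glmnet(..., parallel = use_parallel), so the same code runs on Windows without doMC.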

Install Statamarkdown

As Statamarkdown doesn't support direct downloading using install.packages(), we can use the devtools package to install it from github.com.

library(devtools) # before this you may need to install devtools
install_github("hemken/Statamarkdown")

If that gives you problems for some reason, you can also install it from this website:

# For Windows
install.packages("https://www.ssc.wisc.edu/~hemken/Stataworkshops/Stata%20and%20R%20Markdown/Statamarkdown_0.3.9.zip", repos=NULL)

# For Linux or Mac
install.packages("https://www.ssc.wisc.edu/~hemken/Stataworkshops/Stata%20and%20R%20Markdown/Statamarkdown_0.3.9.tar.gz", type="source", repos=NULL)
