AndreaHobby / Lung-Cancer-Screening-Project

Lung Cancer Screening Utilization Project

Objective

This is a project to analyze lung cancer screening utilization.

Hypothesis

Step 1 Obtaining the data

The data came in six separate files. I used prewritten code to merge the six files into one data set. Once the final data set was ready, I removed variables that would not be used. (To do: note whether anything was removed due to missing data.) Then, I checked the variable types in SAS 9.4.
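The merge-and-trim step can be illustrated with a minimal sketch. This is not the prewritten SAS code used in the project; it is a Python illustration that assumes the files share one layout and are stacked row-wise. The column names and the in-memory stand-in files are hypothetical.

```python
import csv
import io


def merge_files(file_objects, keep_columns):
    """Stack rows from several CSV files, keeping only the columns of interest."""
    merged = []
    for handle in file_objects:
        for row in csv.DictReader(handle):
            # Dropping unused variables happens here: only keep_columns survive.
            merged.append({col: row.get(col, "") for col in keep_columns})
    return merged


# Hypothetical stand-ins for the source files (two shown instead of six).
part_a = io.StringIO("id,sex,income,unused\n1,F,3,x\n2,M,5,y\n")
part_b = io.StringIO("id,sex,income,unused\n3,F,2,z\n")

dataset = merge_files([part_a, part_b], keep_columns=["id", "sex", "income"])
```

If the six files instead hold different variables for the same respondents, a key-based join would be needed rather than stacking.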

Step 2 Data Cleaning and Manipulation

I checked the character variables for invalid values and missing data, then did the same for the numeric variables. I checked whether any data types needed to be converted and looked for out-of-range values. Last, I looked for duplicate records and repeating values. The race variable had a significant level of missing data, but I kept it in the data set. I also created a format library and stored my format for the labeled crace variable.
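The checks described above can be sketched as follows. This is a Python illustration rather than the SAS steps actually run, and the column names, valid codes, and age range are assumptions made up for the example.

```python
from collections import Counter


def audit_column(rows, column, valid_values=None, valid_range=None):
    """Count missing, invalid, and out-of-range entries for one column."""
    report = {"missing": 0, "invalid": 0, "out_of_range": 0}
    for row in rows:
        value = row.get(column)
        if value is None or str(value).strip() == "":
            report["missing"] += 1
        elif valid_values is not None:
            # Character variable: compare against the allowed codes.
            if value not in valid_values:
                report["invalid"] += 1
        elif valid_range is not None:
            # Numeric variable: must parse as a number and fall in range.
            try:
                number = float(value)
            except ValueError:
                report["invalid"] += 1
                continue
            low, high = valid_range
            if not low <= number <= high:
                report["out_of_range"] += 1
    return report


def find_duplicates(rows, key):
    """Return the key values that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical records with one problem of each kind.
rows = [
    {"id": "1", "sex": "F", "age": "61"},
    {"id": "2", "sex": "X", "age": "250"},
    {"id": "2", "sex": "", "age": "abc"},
]

sex_report = audit_column(rows, "sex", valid_values={"F", "M"})
age_report = audit_column(rows, "age", valid_range=(18, 99))
duplicate_ids = find_duplicates(rows, "id")
```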

Step 3 Data Modeling (need to update)

The continuous variables used for this study.....

The categorical variables used were sex, income, race, and education.

I wanted to learn the spread of the variables of interest that will be used in the model, so I created histograms that show their skewness.
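As a rough illustration of what the histograms convey, the sketch below bins a variable and computes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). The values and bin width are invented for the example, and SAS may report a bias-adjusted variant of this statistic.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Histogram counts keyed by the lower edge of each bin."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical values, not project data.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
histogram = bin_counts([1, 2, 11, 12, 13, 25], bin_width=10)
```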

I also ran descriptive statistics for my variables of interest: the mean, median, mode, and n.
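The same four summaries can be reproduced with Python's standard `statistics` module. This is a sketch on invented values, with missing entries (represented here as `None`, an assumption) excluded from n.

```python
import statistics


def describe(values):
    """Return n, mean, median, and mode(s) for the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        # multimode returns every most-common value, so ties are not hidden.
        "mode": statistics.multimode(present),
    }


# Hypothetical ages with one missing entry.
summary = describe([55, 60, 60, None, 71])
```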

Significance tests were used to identify the associations between each variable and the outcome. For categorical variables, the chi-square test was used, or Fisher's exact test where appropriate. For continuous variables, the Mann-Whitney U test was used.
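To make the two test statistics concrete, here is a minimal sketch of how each is computed. It stops at the statistic itself, since p-values would come from the statistical software. The counts and samples are invented. The smallest expected cell count is returned because a common rule of thumb (an assumption here, not stated in this project) prefers Fisher's exact test when an expected count falls below 5.

```python
def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom, and smallest expected count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, min_expected


def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U statistic (smaller of the two U values), with average ranks for ties."""
    pooled = sorted(list(sample_a) + list(sample_b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = sum(ranks[v] for v in sample_a) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical 2x2 table: rows = sex, columns = screened yes/no.
stat, dof, min_expected = chi_square([[10, 20], [20, 10]])
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_with_ties = mann_whitney_u([1, 2], [2, 3])
```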

I chose to do a restricted cubic spline.

Conclusion

........

Source:

The data were obtained from the Behavioral Risk Factor Surveillance System (BRFSS). Here are some of the references that I used to help me complete this project.
