DMRL_THA
This repository contains all the class assignments completed for the course Data Management and Research Cycle, Spring 2019.
Files in the repo
- THA1 contains sample data compiled from three source datasets, which mainly cover open access journals and journal rankings. For THA1, I also created a notebook, THA1.ipynb, documenting the profiles of these datasets and the sample compilation process.
- THA2 contains a tentative proposal for the final project. The proposal is still evolving.
- THA3 contains three files: an input .csv data file, a data dictionary file (in .csv format), and an output Word document.
- THA4 contains a folder and 4 files. The folder input includes 2 original data files in csv format. The workflow is implemented in the notebook file THA4_dcf_apr26.ipynb. An abstract visual depiction of the workflow can be found in THA4_Workflow.png. One output of the workflow, a merged dataset, is contained in a csv file named THA4_output.csv. The variables in the output dataset are described in detail in THA4_DataDict.csv.
- FinalPaper contains all the input, intermediate, and output files for the final project. Here is the detailed file list, with nested bullet points indicating the tree structure of the files:
  - 0_data_preprocessing includes all the files used for Step 0: data preprocessing.
    - All the raw datasets can be found in 0_raw.
    - The code for data preprocessing can be found in 1_code.
    - 2 preprocessed datasets and a data dictionary sit in 2_output.
  - 1_analysis contains an R script for Step 1: data analysis.
  - 2_draft contains 2 deliverables for the final project: the paper proposal and the final paper.
  - 3_presentation contains slides for the final presentation.
  - A workflow chart can also be found in the FinalPaper folder.