caifand / DMRL_THA

DMRL_THA

This repository contains all the class assignments completed for the course Data Management and Research Cycle, Spring 2019.

Files in the repo

  1. THA1 contains sample data compiled from three source datasets. These datasets are mainly about open access journals and journal rankings. For THA1, I also created a notebook file documenting the profiles of these datasets and the sample compilation process. See THA1.ipynb.

  2. THA2 contains a tentative proposal for the final project. The proposal is still evolving.

  3. THA3 contains three files: an input .csv data file, a data dictionary file (in .csv format), and an output Word document.

  4. THA4 contains a folder and 4 files. The folder input includes 2 original data files in CSV format. The workflow is instantiated in the notebook file THA4_dcf_apr26.ipynb. An abstract visual depiction of the workflow can be found in THA4_Workflow.png. One output of the workflow, a merged dataset, is contained in a CSV file named THA4_output.csv. The variables in the output dataset are described in detail in THA4_DataDict.csv.

  5. FinalPaper contains all the input, intermediate, and output files for the final project. Here is the detailed file list, with nested bullet points indicating the tree structure of the files:

    • 0_data_preprocessing includes all the files used for Step 0: data preprocessing.
      All the raw datasets can be found in 0_raw.
      The code for data preprocessing can be found in 1_code.
      2 preprocessed datasets and a data dictionary sit in 2_output.
    • 1_analysis contains an R script for Step 1: data analysis.
    • 2_draft contains 2 deliverables for the final project: paper proposal & final paper.
    • 3_presentation contains slides for the final presentation.
    • A workflow chart can also be found in the FinalPaper folder.
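
For readers unfamiliar with the kind of merge step described under THA4 (two input CSV files combined into one output dataset), here is a minimal sketch using only the Python standard library. It is not the code from THA4_dcf_apr26.ipynb: the column names (issn, title, rank), the sample rows, and the choice of an outer join are all assumptions made for illustration, since this README does not state the actual schema or join logic.

```python
import csv
import io

# Hypothetical stand-ins for the two CSV files in THA4's input folder.
# The real file names, columns, and rows are not given in this README.
journals_csv = io.StringIO("issn,title\n1234-5678,Journal A\n2345-6789,Journal B\n")
rankings_csv = io.StringIO("issn,rank\n1234-5678,1\n9999-0000,2\n")


def read_rows(handle, key):
    """Index the rows of a CSV file by the value in the `key` column."""
    return {row[key]: row for row in csv.DictReader(handle)}


journals = read_rows(journals_csv, "issn")
rankings = read_rows(rankings_csv, "issn")

# Outer join (an assumption): keep every key that appears in either input,
# leaving a field empty when one of the inputs has no matching row.
merged = []
for issn in sorted(journals.keys() | rankings.keys()):
    row = {"issn": issn, "title": "", "rank": ""}
    row.update(journals.get(issn, {}))
    row.update(rankings.get(issn, {}))
    merged.append(row)

# Write the merged rows as CSV text (to an in-memory buffer here;
# a real workflow would write to a file such as the merged output).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["issn", "title", "rank"])
writer.writeheader()
writer.writerows(merged)
```

Empty strings mark values missing from one of the inputs, which makes unmatched records easy to spot when documenting the merged variables in a data dictionary.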
