Gulsah-G / dataprep-py-r

The same data cleaning and wrangling task is executed in both Python and R to demonstrate the equivalent code structures (Data: public-use PIAAC)

Data pre-processing: Python vs R dplyr

This repository contains a Python notebook and an R script that perform the same data cleaning and wrangling task, to demonstrate the equivalent code structures in the two languages. The pre-processing task includes, but is not limited to:

  • Reading in .sav data files
  • Dealing with labelled data and value labels
  • Basic frequency tables
  • Filtering by group
  • Removing missing data
  • Creating new variables or recoding existing ones in place
  • Calculating group-centered/scaled variables
  • Removing outliers based on within-group quartiles
  • Replacing missing values with group means
  • Exporting data into csv
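
As a rough illustration of several of the steps above on the Python side, here is a minimal pandas sketch. It is not taken from the repository's notebook: the toy data frame and the column names (`group`, `score`, `age`, `age_band`) are hypothetical stand-ins for the PIAAC variables, and the 1.5 × IQR rule is only one common way to define within-group outliers. A real .sav file could be read with `pd.read_spss`, which requires the `pyreadstat` package.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the PIAAC file (column names are invented).
# A real .sav file could be loaded with pd.read_spss(path), which needs pyreadstat.
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "c"],
    "score": [10.0, 12.0, 11.0, 13.0, 100.0, 20.0, np.nan, 22.0, 24.0, 5.0],
    "age": [25, 31, np.nan, 40, 38, 29, 33, 45, 51, 60],
})

# Basic frequency table of the grouping variable
freq = df["group"].value_counts()

# Filtering by group: keep only groups "a" and "b"
df = df[df["group"].isin(["a", "b"])].copy()

# Removing missing data: drop rows where age is missing
df = df.dropna(subset=["age"])

# Creating a new variable by recoding age into two bands
df["age_band"] = pd.cut(df["age"], bins=[0, 34, 120], labels=["under35", "35plus"])

# Removing outliers based on within-group quartiles (assumed rule: 1.5 * IQR)
grouped = df.groupby("group")["score"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
in_range = df["score"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
# Rows with a missing score are kept here so they can be imputed next
df = df[in_range | df["score"].isna()].copy()

# Replacing missing values with group means
df["score"] = df["score"].fillna(df.groupby("group")["score"].transform("mean"))

# Calculating group-centered and group-scaled variables
g = df.groupby("group")["score"]
df["score_centered"] = df["score"] - g.transform("mean")
df["score_z"] = df["score_centered"] / g.transform("std")

# Exporting to csv (returned as a string here; pass a path to write a file)
csv_text = df.to_csv(index=False)
```

The rough dplyr counterparts would be `group_by()` combined with `mutate()` and `filter()`, which is the kind of side-by-side comparison this repository is about.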

Data used: The U.S. public-use PIAAC data (2012-2014) (https://nces.ed.gov/surveys/piaac/datafiles.asp)

Languages

Jupyter Notebook 97.3%, R 2.7%