CenterForSpatialResearch / hnyc_street_dictionary


Instructions

Before running the scripts

  • Create a new folder "Data" under this "script" folder.
  • Place all dataset inputs and outputs in the "Data" folder.
  • The latest versions of the datasets can be found in the shared Google Drive folder.

Raw dataset inputs

  • Morse dictionary
    • MN_MORSE_EDstreet_dict_1910.csv
    • BK_MORSE_EDstreet_dict_1910.csv
  • Geo-coder segment info
    • mn_segments_1910export.csv
    • bk_segments_1910export.csv
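Before running the scripts, it can help to confirm that the "Data" folder exists and already holds the four raw inputs listed above. The snippet below is a minimal Python sketch, not part of this repository (the pipeline itself is written in R). The helper name prepare_data_dir is hypothetical, and it assumes you call it with the path of the "script" folder.

```python
from pathlib import Path

# Raw input file names, copied from the list above.
RAW_INPUTS = [
    "MN_MORSE_EDstreet_dict_1910.csv",
    "BK_MORSE_EDstreet_dict_1910.csv",
    "mn_segments_1910export.csv",
    "bk_segments_1910export.csv",
]


def prepare_data_dir(script_dir):
    """Hypothetical helper: create <script_dir>/Data if it does not exist yet,
    then return the raw input files that are still missing from it."""
    data_dir = Path(script_dir) / "Data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return [name for name in RAW_INPUTS if not (data_dir / name).is_file()]
```

Running prepare_data_dir(".") from the "script" folder returns an empty list once every raw input is in place. Any names it returns still need to be downloaded from the shared Google Drive folder.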

Clean function

clean_function.R, in the "lib" folder, holds the shared cleaning code used by the scripts.

Scripts (in the order they run in the pipeline)

  • clean_morse.rmd: clean the Morse dictionary
  • clean_segment_create_geo_dict.rmd: clean the segment data and create geo_dict
  • Dictionary
    • full_dict.rmd: merge the Morse dictionary and geo_dict
    • missing_EDs.rmd: check for EDs in the Morse dictionary that are missing from geo_dict
  • Segment info
    • combine_morse_segment.rmd: merge Morse ED and segment info into the segment datasets
    • house_range_segment_flags.rmd: summarize house ranges per ED per segment, and flag inconsistent and incomplete house ranges
    • Nested data: the combined Morse and segment data nested by ED (also available in the Google Drive folder)
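Two of the steps above come down to simple operations. The missing-ED check is a set difference, and the house-range step is a group-by on (ED, segment) followed by per-group checks. The Python sketch below illustrates both. It is not the logic of missing_EDs.rmd or house_range_segment_flags.rmd, which are R scripts. The function names and the column keys "ed", "segment_id", "low" and "high" are hypothetical, and so are the two flag rules: "incomplete" here means a missing bound, and "inconsistent" means a low house number greater than the high one.

```python
from collections import defaultdict


def missing_eds(morse_eds, geo_dict_eds):
    """EDs present in the Morse dictionary but absent from geo_dict, sorted."""
    return sorted(set(morse_eds) - set(geo_dict_eds))


def house_range_flags(rows):
    """Summarize house numbers per (ED, segment) and flag problem ranges.

    rows: iterable of dicts with hypothetical keys "ed", "segment_id",
    "low" and "high" (house numbers, or None when missing).
    """
    # Group every (low, high) pair by its ED and segment.
    groups = defaultdict(list)
    for r in rows:
        groups[(r["ed"], r["segment_id"])].append((r["low"], r["high"]))

    summary = {}
    for key, ranges in groups.items():
        # Assumed rule: incomplete if any range is missing a bound.
        incomplete = any(lo is None or hi is None for lo, hi in ranges)
        # Assumed rule: inconsistent if any complete range runs backwards.
        inconsistent = any(
            lo is not None and hi is not None and lo > hi for lo, hi in ranges
        )
        # Overall span of all known house numbers in this ED/segment group.
        known = [n for pair in ranges for n in pair if n is not None]
        summary[key] = {
            "min": min(known) if known else None,
            "max": max(known) if known else None,
            "incomplete": incomplete,
            "inconsistent": inconsistent,
        }
    return summary
```

For example, missing_eds(["1", "2"], ["2"]) returns ["1"]. Keying the summary on (ED, segment) mirrors the "per ED per segment" grouping described above. The actual flag definitions should be taken from house_range_segment_flags.rmd.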

Output datasets (available in the Google Drive folder)

https://drive.google.com/drive/folders/1N7S7l0XX8y4eYUTmI3AZlwO6XpCPhAXn

  • Morse dictionary
    • morse_mn1910.csv
    • morse_bk1910.csv
  • Cleaned segment data
    • segment_mn.csv
    • segment_bk.csv
  • geo_dict
    • geo_dict_mn.csv
    • geo_dict_bk.csv
  • full_dict
    • full_dict_mn.csv
    • full_dict_bk.csv
  • Missing EDs
    • missing_EDs_mn.csv
    • missing_EDs_bk.csv
  • Segment info combined with Morse
    • combine_mn.csv
    • combine_bk.csv
  • Exported nested data
    • mn_nest.rds
    • bk_nest.rds
  • Segment house ranges
    • mn_seg_add_range.csv
    • bk_seg_add_range.csv
  • Flagged house-number errors
    • mn_seg_flag.csv
    • bk_seg_flag.csv

Languages

HTML 99.5%, R 0.5%