adamj9431 / parquet-from-object-storage

Repository from GitHub: https://github.com/adamj9431/parquet-from-object-storage

## Access a Parquet Table From Object Storage in IBM Data Science Experience

This repository contains the supporting data and notebooks for this tutorial.

  • ParquetFromObjectStorage.ipynb is the notebook used in the tutorial.
  • csvToParquet.ipynb is a Python notebook that converts a subset of the GTFS-formatted transit data into a Parquet table. The resulting Parquet table is used as the example in the tutorial.

The input data I used is not included in the repository. It is GTFS-formatted transit data from the MBTA (Massachusetts Bay Transportation Authority). I used the Spring 2016, Version 4D data, available here.
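For readers who want to reproduce a similar table without the original input, here is a minimal sketch of a CSV-to-Parquet conversion. It is not the code from csvToParquet.ipynb. The function names, the choice of `stops.txt`, and the selected columns are assumptions made for illustration (the file and field names come from the GTFS specification, not from this repository), and it relies on the `pyarrow` package being installed.

```python
import csv
from itertools import islice


def read_gtfs_subset(csv_path, columns, max_rows=None):
    """Read selected columns (and optionally only the first rows) of a GTFS file.

    GTFS files are plain CSV with a header row. 'utf-8-sig' drops a leading
    byte-order mark if the feed includes one.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        # islice with max_rows=None reads every row.
        return [{c: row[c] for c in columns} for row in islice(reader, max_rows)]


def write_parquet(rows, columns, parquet_path):
    """Write the rows to a Parquet file; every column is stored as a string."""
    # Imported here so the CSV helper above works without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({c: [r[c] for r in rows] for c in columns})
    pq.write_table(table, parquet_path)
```

A typical call, assuming the GTFS archive has been unzipped into the working directory, would be `rows = read_gtfs_subset("stops.txt", ["stop_id", "stop_name", "stop_lat", "stop_lon"])` followed by `write_parquet(rows, ["stop_id", "stop_name", "stop_lat", "stop_lon"], "stops.parquet")`. All values are kept as strings here for simplicity; casting coordinates to floats before writing would be a reasonable refinement.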
