raskle / getting-started

Code and Data for Getting Started documentation

Getting Started

This repository contains the dataset and sample code for the Getting Started section of the Pilosa documentation.

The Dataset

The sample dataset contains stargazer and language data for GitHub projects, retrieved by searching for the keyword "Go". To create other datasets, see the Generating the Dataset section below.

  • language.txt: Maps language names to languageIDs. The line number of each name is its languageID.
  • language.csv: languageID, projectID
  • stargazer.csv: stargazerID, projectID, timestamp (when the project was starred)
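
As a rough illustration of how these files fit together, the sketch below parses data in the column order listed above and groups language names by project. It is not code from this repository. The helper names (load_language_names, load_pairs) are hypothetical and the sample rows are made up. It also assumes that the CSV files have no header row and that language line numbers start at 0; the first_id parameter is there because the starting index is not stated here.

```python
import csv
import io


def load_language_names(lines, first_id=0):
    """Map languageID -> language name, one name per line.

    first_id is an assumption: the line number is the languageID,
    but whether counting starts at 0 or 1 is not stated.
    """
    return {first_id + i: line.strip() for i, line in enumerate(lines)}


def load_pairs(lines):
    """Parse rows whose first two columns are integer IDs.

    Works for language.csv (languageID, projectID) and for
    stargazer.csv, where the third column (the timestamp) is
    kept as an unparsed string. Assumes there is no header row.
    """
    rows = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        ids = (int(row[0]), int(row[1]))
        rows.append(ids + tuple(col.strip() for col in row[2:]))
    return rows


# Made-up sample data in the documented column order, not real repository data.
names = load_language_names(io.StringIO("Go\nPython\nJava\n"))
language_rows = load_pairs(io.StringIO("0,10\n1,10\n0,11\n"))

# Group language names by projectID.
languages_by_project = {}
for language_id, project_id in language_rows:
    languages_by_project.setdefault(project_id, []).append(names[language_id])

print(languages_by_project)  # {10: ['Go', 'Python'], 11: ['Go']}
```

To read the real files, pass open file objects instead of the io.StringIO samples, for example load_pairs(open("stargazer.csv")).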

Usage

  1. The Pilosa server must be running (see Starting Pilosa).
  2. The appropriate schema must be initialized (see Create the Schema).
  3. Finally, the data can be imported (see Import Some Data).

Sample Projects

Generating the Dataset

The fetch.py script searches GitHub for a given keyword and creates the dataset described in The Dataset section.

Using a GitHub token is strongly recommended to avoid throttling. If you don't already have a token for the GitHub API, see Creating a personal access token for the command line.

A recent version of Python is required. We test the script with 2.7 and 3.5.

Below are the steps to run fetch.py:

  1. Create a virtual env:
    • Using Python 2.7: virtualenv getting-started
    • Using Python 3.5: python3 -m venv getting-started
  2. Activate the virtual env:
    • On Linux, macOS, and other UNIX systems: source getting-started/bin/activate
    • On Windows: getting-started\Scripts\activate
  3. Install requirements: pip install -r requirements.txt
  4. If you have a GitHub token, save it in a file named token in the root directory of the project.
  5. Run the script: python fetch.py KEYWORD
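
To show why the token matters, here is a minimal sketch of an authenticated GitHub repository search request. It is not the code in fetch.py, whose internals are not shown here. The helper names (read_token, build_search_request) are hypothetical, and the sketch assumes the token is stored as plain text in a file named token. It uses the public GitHub search endpoint and the Authorization header for personal access tokens, and it only builds the request without sending it. It is written for Python 3.

```python
import os
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def read_token(path="token"):
    """Return the token stored in `path`, or None if the file is missing or empty."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip() or None


def build_search_request(keyword, token=None):
    """Build (but do not send) a GitHub repository search request."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": keyword})
    request = urllib.request.Request(url)
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        # Authenticated requests are subject to a higher rate limit.
        request.add_header("Authorization", "token " + token)
    return request


request = build_search_request("Go", read_token())
print(request.full_url)  # https://api.github.com/search/repositories?q=Go
```

Passing the request to urllib.request.urlopen would perform the search. Without a token the same request still works, but it counts against the lower unauthenticated rate limit.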

License: BSD 3-Clause "New" or "Revised" License


Languages

Python 40.8%, Java 35.8%, Go 23.3%