countkisg / pystock-data

US stock market data since 2009

Home Page:http://data.pystock.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

US Stock Data

Historical data of US stocks, updated daily.

How Data are Collected

The data are collected using pystock-crawler and pystock-github. Every day before the US stock exchanges open at 9:30 EST/EDT, the crawler collects the stock prices and financial reports, and pushes the data to this repository. Thus, the latest data is of yesterday. The data are day-based, meaning that you won't find hourly or minute-level data here.

pystock-crawler crawls data from the following sources:

Directory Layout

All data are stored in CSV and TXT files, archived with gzip. The files are categorized and named by their created dates. For example, a file collected on 2015-03-23 is named 20150323.tar.gz placed under 2015 directory.

Initial data are the first batch of collected data, whose date range spans from 2009-01-01 to 2015-03-20. They are split into three files: 0001_initial.tar.gz to 0003_initial.tar.gz, under 2015 directory.

Every gzip archive file may or may not contain the following files:

  • symbols.txt
  • prices.csv
  • reports.csv

symbols.txt is a list of companies, line by line. For example:

AAPL	Apple Inc.
FB	Facebook
GOOGL	Google Inc.

prices.csv contains daily prices in CSV format. Normally, in a daily-generated file, two trading days of prices are included. This is because when a company splits its shares, you will need to compare the close price (close) and the adjusted close price (adj_close) of the previous trading day to detect the split. See Yahoo's explaination for more details.

Here's a sample of prices.csv:

symbol,date,open,high,low,close,volume,adj_close
AAPL,2015-03-23,127.12,127.85,126.52,127.21,36761000,127.21
AAPL,2015-03-20,128.25,128.40,125.16,125.90,67941100,125.90

reports.csv contains several financial metrics extracted from 10-Q or 10-K reports. 10-Q is a quarterly report. 10-K is an annual report. The CSV file looks like:

symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0
GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0

The financial metrics included are:

  • end_date: the end date of the period of the filing report
  • amend: is the filing an amendment?
  • period_focus: Q1, Q2, Q3 for quarterly reports, or FY for annual reports
  • fiscal_year: fiscal year of the company
  • doc_type: 10-Q or 10-K

Income statement

  • revenues: revenues or sales
  • op_income: operating income
  • net_income: net income or net earnings
  • eps_basic: basic earnings per share
  • eps_diluted: diluted earnings per share
  • dividend

Balance sheet

  • assets: total assets
  • cur_assets: current assets
  • cur_liab: current liabilities
  • cash: cash and cash equivalents
  • equity: total equity

Cash flow

  • cash_flow_op: cash from operating activities
  • cash_flow_inv: cash from investing activities
  • cash_flow_fin: cash from financing activities

About

US stock market data since 2009

http://data.pystock.com

License:Other


Languages

Language:HTML 100.0%