rozim / ChessData

PGN Mirror

ChessData

PGN Mirror. There will be dups, dirty data, errors, GM draws, etc. -- the data will probably need to be post-processed, filtered, deduped, etc.
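As a rough illustration of that post-processing, here is a minimal Python sketch that splits a PGN file into games, drops exact duplicates by comparing move text only, and filters out very short draws. It is not part of this repository. The function names (`split_games`, `movetext_key`, `clean`) and the 30-ply cutoff for a "GM draw" are assumptions chosen for the example, and the parsing is deliberately naive: it assumes every game starts with an `[Event ...]` tag and does not handle nested variations or NAGs.

```python
import hashlib
import re


def split_games(pgn_text):
    """Split PGN text into games, assuming each game starts with an [Event ...] tag."""
    parts = re.split(r"(?m)^(?=\[Event )", pgn_text)
    return [p.strip() for p in parts if p.strip()]


def movetext_key(game):
    """Return (digest, ply_count) for the moves only, ignoring tags, comments,
    move numbers, and the trailing result token."""
    body = "\n".join(line for line in game.splitlines() if not line.startswith("["))
    body = re.sub(r"\{[^}]*\}", " ", body)       # strip {brace comments}
    body = re.sub(r"\d+\.(\.\.)?", " ", body)    # strip move numbers: "1." or "1..."
    body = re.sub(r"(1-0|0-1|1/2-1/2|\*)\s*$", " ", body.strip())  # strip result
    moves = body.split()
    digest = hashlib.sha1(" ".join(moves).encode("utf-8")).hexdigest()
    return digest, len(moves)


def clean(pgn_text, max_draw_plies=30):
    """Keep the first copy of each distinct game and drop short draws.

    max_draw_plies is an arbitrary threshold picked for this sketch.
    """
    seen = set()
    kept = []
    for game in split_games(pgn_text):
        key, plies = movetext_key(game)
        short_draw = '[Result "1/2-1/2"]' in game and plies <= max_draw_plies
        if short_draw or key in seen:
            continue
        seen.add(key)
        kept.append(game)
    return kept
```

Calling `clean(text)` on the contents of a PGN file returns the list of surviving games as strings. Hashing only the normalized move list means two copies of the same game with different headers or spacing count as duplicates, which may or may not be what you want if the headers carry information you care about.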

In the news:

Command-line tools can be 235x faster than your Hadoop cluster

The first thing to do is get a lot of game data. This proved more difficult than I thought it would be, but after some looking around online I found a git repository on GitHub from rozim that had plenty of games. I used this to compile a set of 3.46GB of data, which is about twice what Tom used in his test. The next step is to get all that data into our pipeline.

Languages

C++ 72.5%, Python 16.4%, C 7.9%, Shell 2.7%, Makefile 0.4%