Git Regional Languages
More famous languages may be written everywhere in the world, yet the languages with lesser users may probably tend to be epidemic or popular in some particular regions. Let's find out.
Getting started
Prerequisites:
- Github API access token
- GoogleMap API key
- MongoDB
- Python 3.4+
- Node.js 4.x+
Since it uses github API,
you'll need to supply your access token.
Create a text file named API-TOKEN
in the root directory
of this repository and put your Github login ID
and your access token in the following line like so:
johndoeagain
f00ba478293cbbae0a6552
Also, GoogleMap API key is also required for geocoding
API access. You'll need to generate your own API key
can place it in a text file named GOOGLE-API-KEY
,
put it in the root directory of the repo.
Prepare your MongoDB
To set your data storage, create a DB named gitlang
.
Don't need to create any initial collection because
the script will do it for you.
CAVEAT It really requires a huge capacity to store the data and you can make up to 5000 requests in a span of one hour.
Download GitHub data
Run the following script to download a batch of repository data via Github API:
$ python3 download.py
NOTE: Adjust the number of batch, the range of repos to download
right in the file download.py
yourself.
The script will download the batch repository data and
store them in the MongoDB.
DB/Collection | data storage |
---|---|
gitlang/repos | Github repositories data with the location of owners |
Process the repositories
We have a script to process the entire bulk of downloaded repository data in MongoDB, generate the geospatial distribution of languages written in each repository. Execute the following script:
$ node process.js
Why Node?
In case a question has popped up in your mind, Node.js natively communicates with
MongoDB
and JavaScript works well withGoogleMAP API
. That's why.
What process.js does?
Following tasks are sequentially run:
-
Maps repository data from the collection
repos
to distributions of languages by regions, store them in the collectiondistLangByRegion
and list all regions inregions
. -
Fetches Geolocations of all regions of repos.
-
Saves all Geolocations to MongoDB: collection
geo
. -
Aligns all languages available in
distLangByRegion
with the geospatial distribution data. Maps the output in to the JS file underhtml\js\dist.js
. -
Saves the GoogleAPI Key to
html/js/gapi.js
.