For a while now Google Colab has been providing a free Tesla K80 GPU to anyone interested in deep learning. Again, a GPU for free, yay!
Though Jupyter Notebooks are now familiar to most people programming in Python, the workflow on Colab may be unfamiliar to those who are not used to working on virtual machines.
My first mistake was to assume that if the notebook is created on my Google Drive, I can access the files on the drive in the same manner as I do when working locally:
# List files from current working directory
!ls
See? A brand new and clean current working directory. With Google Colab you get a clean virtual machine for 12 hours, and it knows nothing about either your local files or the Google Drive where the notebook is saved.
Moreover, Google Drive does not have a hierarchical folder structure by default. From https://developers.google.com/drive/v3/web/about-files :
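You can verify this from the notebook itself; a couple of shell commands (the exact output depends on the VM you are assigned) show a fresh machine with a GPU attached:
# Show the GPU attached to this VM (a Tesla K80 at the time of writing)
!nvidia-smi
# Inspect the filesystem: none of your local or drive files are here
!df -h /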
Each file is identified by a unique opaque ID. File IDs are stable throughout the life of the file, even if the file name changes.
Files in Drive can not be directly addressed by their path. Search expressions are used to locate files by name, type, content, parent container, owner, or other metadata.
If you want the old-fashioned way of working with folders on the drive (e.g. to access data or to save models/submissions), do the following routine each time you start a notebook on Google Colab (taken from https://www.kaggle.com/getting-started/47096#273889):
from google.colab import drive
drive.mount("/content/drive")
print('Files in Drive:')
!ls /content/drive/'My Drive'
The newly mounted drive folder now sits beside any other folders you might create during this session on this virtual machine. After 12 hours the VM turns into a pumpkin: everything outside the mounted drive is deleted.
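As a quick sanity check (the file name here is just an example), write a file through the mount; it should show up in the Drive web UI and survive the VM reset:
# Write a test file through the mount; it appears in Google Drive
with open('/content/drive/My Drive/colab_test.txt', 'w') as f:
    f.write('Hello from Colab!')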
If it's a new project, you can now start creating folders for it, e.g.:
# Create directories for the new project (-p also creates the parents)
!mkdir -p "/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/input/train"
!mkdir -p "/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/input/test"
!mkdir -p "/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/input/valid"
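With the folders in place, anything saved under the mounted path persists across sessions; a minimal sketch with pandas (the DataFrame is made up just to show the save path):
import pandas as pd

# Hypothetical submission, only to demonstrate saving to the drive folder
submission = pd.DataFrame({'click_id': [0, 1], 'is_attributed': [0.1, 0.9]})
submission.to_csv('/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/submission.csv', index=False)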
If you already have an existing project on GitHub, you can clone it to Colab (here you also need to decide whether you want it just on your VM or on your drive as well):
# Clone into a folder on the drive to keep it after 12 hours
%cd "/content/drive/My Drive/kaggle"
!git clone https://github.com/wxs/keras-mnist-tutorial.git
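Since the clone lives on the drive, in a later session you only need to cd into it and pull the latest changes:
# The repository survives the VM reset; just update it next time
%cd "/content/drive/My Drive/kaggle/keras-mnist-tutorial"
!git pull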
- Installed packages can be imported as usual, e.g.:
import pandas as pd
- If you need to load a helper script (a *.py file with a bunch of useful functions for the project), it can be done with the following snippet:
import imp

# Create an empty module and run the script's code inside its namespace
helper = imp.new_module('helper')
exec(open("/content/drive/My Drive/path/to/helper.py").read(), helper.__dict__)

fc_model = imp.new_module('fc_model')
exec(open("pytorch-challenge/deep-learning-v2-pytorch/intro-to-pytorch/fc_model.py").read(), fc_model.__dict__)
You can replace the helper name with any other, but keep it consistent.
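Note that imp is deprecated in recent Python 3; an equivalent sketch with importlib (using the same hypothetical path as above):
import importlib.util

# Load the same helper.py as a module via importlib instead of imp
spec = importlib.util.spec_from_file_location('helper', '/content/drive/My Drive/path/to/helper.py')
helper = importlib.util.module_from_spec(spec)
spec.loader.exec_module(helper)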
- A dataset can be downloaded and unpacked right from the notebook with shell commands:
!wget https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip
!unzip Cat_Dog_data.zip
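If the archive is large, unpacking to the VM's local disk is usually faster than unpacking straight onto the mounted drive; the destination folder below is arbitrary:
# Quiet unzip to the VM's local disk: fast, but gone after 12 hours
!unzip -q Cat_Dog_data.zip -d /content/data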
- Some packages need a platform-specific build. For instance, this snippet from http://pytorch.org/ installs a PyTorch 0.4.0 wheel matching the VM:
# http://pytorch.org/
from os import path
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag

# Build the wheel's platform tag and pick a CUDA or CPU build
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
accelerator = 'cu80' if path.exists('/opt/bin/nvidia-smi') else 'cpu'
!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.0-{platform}-linux_x86_64.whl torchvision
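After the install it is worth checking that the wheel matches the runtime; on a GPU-backed VM this should print True:
import torch

# Confirm the installed version and that CUDA is visible
print(torch.__version__, torch.cuda.is_available())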
Installing Kaggle API in Colab
Spoiler: pip install kaggle is not enough, though you have to start with it:
!pip install kaggle
- Sign up or sign in to your Kaggle account at https://www.kaggle.com.
- Go to 'Account' and click 'Create API Token' to download kaggle.json with your credentials.
- Drag and drop it to your Google Drive and run the following script:
import io, os
from google.colab import auth
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

# Authenticate and build the Drive API client
auth.authenticate_user()
drive_service = build('drive', 'v3')

# Locate kaggle.json on the drive by name (a Drive search expression)
results = drive_service.files().list(
    q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])

# Download it to the place the Kaggle API expects
filename = "/content/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)
request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
os.chmod(filename, 0o600)
Now the Kaggle API can be used, e.g. to download competition data:
!kaggle competitions download -c [name-of-the-competition]
In this case the datasets won't appear in your Google Drive: they will only be on the VM (and removed after 12 hours), in the .kaggle/competitions/[name-of-the-competition] folder. One can specify a folder on the mounted drive with the -p option to still have the data after 12 hours.
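For example, reusing the competition and the project folder from earlier (the path is just where we created it above):
# Download straight into the project folder on the mounted drive
!kaggle competitions download -c talkingdata-adtracking-fraud-detection -p "/content/drive/My Drive/kaggle/talkingdata-adtracking-fraud-detection/input"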
To be continued...