pervrosen / fastai-projects

Jupyter notebooks that use the Fastai library

fastai-projects

fastai v2.0

Faster than training from scratch — Fine-tuning the English GPT-2 in any language with Hugging Face and fastai v2 (practical case with Portuguese)

In this notebook (nbviewer version), instead of training from scratch, we will see how to fine-tune an English pre-trained transformer-based language model to any other language in just over a day, on one GPU and with a little more than 1 GB of training data. As a practical case, we fine-tune the English pre-trained GPT-2 to Portuguese by wrapping the Hugging Face Transformers and Tokenizers libraries into fastai v2. We thus create a new language model: GPorTuguese-2, a language model for Portuguese text generation (and more NLP tasks...).

Note: as the full notebook is very detailed, use this fast notebook (nbviewer version) if you just want to run the code without explanation.

GPorTuguese-2 (Portuguese GPT-2 small), a language model for Portuguese text generation (and more NLP tasks...)

Byte-level BPE, a universal tokenizer but...

In this study, we will see that a BBPE tokenizer (Byte-level Byte-Pair-Encoding) trained on a huge monolingual corpus can indeed tokenize any word of any language (there is no unknown token), but that it needs on average almost 70% more tokens when applied to text in a language other than the one used for its training. This is key information when choosing a tokenizer to train a natural language model such as a Transformer model.
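
A minimal sketch of why a byte-level vocabulary never needs an unknown token: any string can be encoded as UTF-8 bytes, and every byte value lies between 0 and 255. The sketch does not train any BPE merges and does not reproduce the 70% figure; the helper `byte_ids` and the two sample phrases are assumptions made for illustration only.

```python
from typing import List

def byte_ids(text: str) -> List[int]:
    """Map text to its UTF-8 byte values, the base symbols of a byte-level BPE."""
    return list(text.encode("utf-8"))

english = "the language model"
portuguese = "a geração de texto"

for sample in (english, portuguese):
    ids = byte_ids(sample)
    # Every id fits in the 256-entry base vocabulary, so nothing is "unknown".
    assert all(0 <= i < 256 for i in ids)
    print(f"{sample!r}: {len(sample)} characters -> {len(ids)} base symbols")
```

Accented characters such as "ç" and "ã" take two bytes in UTF-8, so a Portuguese phrase already starts from more base symbols than an English phrase with the same number of characters, before any learned merges are applied.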

Distributed Data Parallel (DDP)

The script 05_pet_breeds_DDP.py contains the code to train a Deep Learning model in Distributed Data Parallel (DDP) mode with fastai v2. It is inspired by the notebook 05_pet_breeds.ipynb from the fastbook (fastai v2), the Distributed and parallel training fastai v2 documentation and the script train_imagenette.py.

To run it, launch the following command from a terminal, inside a fastai v2 virtual environment, on a server with at least 2 GPUs:

python -m fastai2.launch 05_pet_breeds_DDP.py
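
As a rough illustration of what DDP mode does with the data, the sketch below shards dataset indices across processes in the round-robin style of PyTorch's `DistributedSampler`. It is a simplified pure-Python version written under assumptions (no shuffling, no GPUs, small padding only); `shard_indices` is a hypothetical helper and not part of fastai or PyTorch.

```python
import math
from typing import List

def shard_indices(n_items: int, world_size: int, rank: int) -> List[int]:
    """Return the dataset indices that process `rank` would see in one epoch."""
    indices = list(range(n_items))
    # Pad by repeating the first indices so every process gets the same count.
    per_rank = math.ceil(n_items / world_size)
    total = per_rank * world_size
    indices += indices[: total - n_items]
    # Each process takes every `world_size`-th index, starting at its rank.
    return indices[rank:total:world_size]

for rank in range(2):  # two processes, one per GPU
    print(rank, shard_indices(n_items=7, world_size=2, rank=rank))
```

Each process then trains its own model replica on its shard, and gradients are averaged across processes at every step, which is why the launcher starts one Python process per GPU.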

Data Parallel (DP)

The notebook 05_pet_breeds_DataParallel.ipynb (nbviewer version) contains the code to train a Deep Learning model in Data Parallel (DP) mode with PyTorch and fastai v2. It is inspired by the notebook 05_pet_breeds.ipynb from the fastbook (fastai v2), the Distributed and parallel training fastai v2 documentation and the script train_imagenette.py.
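
Data Parallel mode runs in a single process: each batch is split along the batch dimension, one chunk per GPU, and the outputs are gathered back into one batch. The pure-Python sketch below mimics that scatter/gather flow under simplifying assumptions; `scatter`, `gather`, `data_parallel` and the toy `model` are hypothetical stand-ins, not PyTorch functions.

```python
import math
from typing import Callable, List

def scatter(batch: List[float], n_devices: int) -> List[List[float]]:
    """Split a batch into contiguous chunks, at most one per device."""
    size = max(1, math.ceil(len(batch) / n_devices))
    return [batch[i:i + size] for i in range(0, len(batch), size)]

def gather(outputs: List[List[float]]) -> List[float]:
    """Concatenate per-device outputs back into one batch."""
    return [y for chunk in outputs for y in chunk]

def data_parallel(model: Callable[[List[float]], List[float]],
                  batch: List[float], n_devices: int) -> List[float]:
    # On real hardware each chunk would run on its own GPU replica at once.
    return gather([model(chunk) for chunk in scatter(batch, n_devices)])

def model(xs: List[float]) -> List[float]:
    """Toy "network" that doubles every input."""
    return [2 * x for x in xs]

print(data_parallel(model, [1.0, 2.0, 3.0, 4.0, 5.0], n_devices=2))
```

In PyTorch itself, this pattern is provided by wrapping a module in `torch.nn.DataParallel`.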

How to create groups of layers, each with a different learning rate?

The objective of this notebook (nbviewer version) is to explain how to create parameter groups for a model with fastai v2 so that each group is trained with a different learning rate, how to pass the list of learning rates, and how to check the learning rates actually used by the Optimizer during training.
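
To make the idea of one learning rate per parameter group concrete, here is a pure-Python sketch that spreads learning rates geometrically between a lowest and a highest value, one per group. It is assumed to mirror what fastai does when given a range such as `slice(1e-5, 1e-3)`; the helper `spread_lrs` and the group names are hypothetical and not taken from the notebook.

```python
from typing import List

def spread_lrs(lr_min: float, lr_max: float, n_groups: int) -> List[float]:
    """Return n_groups learning rates, evenly spaced on a log scale."""
    if n_groups == 1:
        return [lr_max]
    step = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * step ** i for i in range(n_groups)]

groups = ["early layers", "middle layers", "head"]  # hypothetical groups
lrs = spread_lrs(1e-5, 1e-3, len(groups))
for name, lr in zip(groups, lrs):
    # Earlier (more generic) layers get smaller updates than the new head.
    print(f"{name}: lr = {lr:.1e}")
```

An explicit list with one value per group can be used in the same way when the spacing should not be geometric.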

How fastai v2 deals with batch sizes for the training and validation datasets

The objective of this notebook is to explain how fastai v2 deals with batch sizes for the training and validation datasets.

Comparison of sizes of learn.export() files by batch size

The objective of this notebook is to show that the sizes of pkl files created by learn.export() of fastai v2 are different depending on the batch size used. This is odd, no?

fastai v1.0

Template application to deploy fastai models to a Web App

This repository can be used as a starting point to deploy fastai models on Heroku.

The simple app described here is at https://glasses-or-not.herokuapp.com/. Test it with pictures of yourself with and without glasses!

This is a quick tutorial to deploy your trained models on Heroku with just a few clicks. It comes with this template repository, which uses Jeremy Howard's Bear Classification model from lesson 2.

Images | Reduction of images channels to 3 in order to use the normal fastai Transfer Learning techniques

This notebook, lesson1-pets_essential_with_xc_to_3c.ipynb (nbviewer), shows how to modify learner.py into a new file, learner_xc_to_3c.py (learner x channels to 3 channels), which places a ConvNet in a fastai cnn_learner() before the pre-trained model such as resnet (followed by a normalization with imagenet_stats).

Used as the first layer, this ConvNet transforms any image from the dataloader with n channels into an image with 3 channels. Its filters are learnt during training. As a result, the fastai Transfer Learning functions can still be used with images that have more than the 3 RGB channels, such as satellite images.

Warning: As the Oxford IIIT Pet dataset already has 3 channels per image, there is no need to change the number of channels here. We only used this dataset to develop our code. It would be more interesting to apply this code to images with more than 3 channels, such as the 16-channel images of the Dstl Satellite Imagery Feature Detection.
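
The simplest layer that maps n channels to 3 is a 1x1 convolution, that is, a learned linear mix of the input channels at each pixel. The pure-Python sketch below shows that computation for one tiny image. It is an assumption chosen for illustration: the notebook's actual ConvNet may use a different kernel size or more layers, and `conv1x1` with its fixed weights is hypothetical (in training, the weights would be learnt).

```python
from typing import List

Image = List[List[List[float]]]  # indexed as [channel][row][col]

def conv1x1(image: Image, weights: List[List[float]]) -> Image:
    """Mix input channels per pixel; weights has shape [out_ch][in_ch]."""
    height, width = len(image[0]), len(image[0][0])
    return [
        [
            [sum(w * image[c][y][x] for c, w in enumerate(row_w))
             for x in range(width)]
            for y in range(height)
        ]
        for row_w in weights
    ]

# A 4-channel, 1x2 "image" reduced to 3 channels.
image = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]]
weights = [
    [1.0, 0.0, 0.0, 0.0],  # output 0 copies input channel 0
    [0.0, 0.5, 0.5, 0.0],  # output 1 averages input channels 1 and 2
    [0.0, 0.0, 0.0, 1.0],  # output 2 copies input channel 3
]
out = conv1x1(image, weights)
print(len(out), out)
```

In PyTorch, a layer of this kind could be written as `torch.nn.Conv2d(n, 3, kernel_size=1)` and placed before the pre-trained backbone.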

NLP | Platform independent python scripts for fastai NLP course

Following our publication of a platform-independent WikiExtractor.py file (i.e. one that runs on all platforms, especially Windows), we publish our nlputils2.py file, the platform-independent version of the nlputils.py file of the fastai NLP course. In addition, we have split the original methods into smaller ones that can be used separately, and we have added one that cleans a text file.

[ EDIT 09/23/2019 ]

NLP | Platform independent python script for Wikipedia text extraction

The extraction script WikiExtractor.py does not work when running fastai on Windows 10 because the current code of the file relies on the platform-dependent default encoding instead of forcing 'utf-8'.

Albert Villanova del Moral submitted the pull request "Force 'utf-8' encoding without relying on platform-dependent default", which shows how to change the code, although as of 31 August 2019 it has not been merged by the script author, Giuseppe Attardi. Thanks to both of them!
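
The idea behind the fix is to pass the encoding explicitly whenever a text file is opened, instead of relying on the platform default (which on Windows is often not UTF-8). A minimal sketch of the pattern follows; the file name and sample text are assumptions, and this is not the actual patch to WikiExtractor.py.

```python
import os
import tempfile

text = "Brasília é a capital do Brasil."

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file
    # Explicit encoding: the same bytes are written on Linux, macOS and Windows.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Reading with the same explicit encoding restores the text unchanged.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == text)
```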

Links:

Vendedor IA | Helping the salespeople of Brasal Veículos (text in Portuguese)

The Hackathon Brasal/PCTec-UnB 2019 was a data marathon (9 and 10 May 2019) that brought together students, professionals and the community, with the challenge of carrying out, in two days, a Business Intelligence project for a real client: Brasal Veículos. It took place at the CDT of the University of Brasília (UnB) in Brazil. In this context, my team developed the "Vendedor IA" (VIA) project, a set of Artificial Intelligence (AI) models using Deep Learning, whose principle is described in the 2 jupyter notebooks that were created:

  1. Data clean (vendas_veiculos_brasal_data_clean.ipynb): the notebook that prepares the sales data table used to train the VIA models.
  2. Regression (vendedor_IA_vendas_veiculos_brasal_REGRESSAO.ipynb): the notebook that trains the model that estimates the budget the customer is willing to spend on buying a vehicle.

MURA abnormality detection

The objective of the jupyter notebook MURA | Abnormality detection is to show how the fastai v1 techniques and code make it possible to build a top-level classifier in the healthcare domain. [ NEW ] We managed to increase our kappa score in this notebook (part 2).

ImageNet Classifier Web App

[ EDIT 06/11/2019 ] This Web app is not online anymore. If you want to deploy it on Render, check the "Deploying on Render" fastai guide.

It is an image classifier that uses the Deep Learning model resnet (the resnet50 version), which won the ImageNet competition in 2015 (ILSVRC2015). It classifies an image into 1000 categories.

Pretrained ImageNet Classifier by fastai v1

The objective of the jupyter notebook pretrained-imagenet-classifier-fastai-v1.ipynb is to use fastai v1 instead of PyTorch code to classify images into 1000 classes with an ImageNet winner model.

Data Augmentation by fastai v1

The jupyter notebook data-augmentation-by-fastai-v1.ipynb presents the code to apply transformations on images with fastai v1.

fastai version BEFORE v1.0

Lesson 1 (part 1) : CatsDogs, the quick way

The jupyter notebook lesson1-quick.ipynb is an exercise that was proposed on 17/04/2018 & 21/04/2018 to the participants of the Deep Learning study group of Brasilia (Brazil). Link to the thread : http://forums.fast.ai/t/deep-learning-brasilia-revisao-licoes-1-2-3-e-4/14993

Lesson 1 (part 1) : DogBreeds

The jupyter notebook lesson1-DogBreed.ipynb is an exercise that was proposed on 17/04/2018 & 21/04/2018 to the participants of the Deep Learning study group of Brasilia (Brazil). Link to the thread : http://forums.fast.ai/t/deep-learning-brasilia-revisao-licoes-1-2-3-e-4/14993

How to make predictions on the test set when it was not initially given to the data object

https://github.com/piegu/fastai-projects/blob/master/howto_make_predictions_on_test_set

Mastercard or Visa ? Image classification with Transfer Learning and Fastai

https://github.com/piegu/fastai-projects/blob/master/mastercard_visa_classifier_fastai_resnet34_PierreGuillou_16july2018.ipynb
