radekosmulski / nvt_op_examples

NVT OP examples

This repository will contain examples of using nvtabular ops to apply preprocessing to your data.

Ops with examples right now:

  1. Categorify (twitter thread)
  2. JoinExternal (twitter thread)
  3. HashBucket (twitter thread)
  4. Clip (twitter thread)
  5. LogOp (twitter thread)
  6. TargetEncoding (twitter thread)
  7. Filter (twitter thread)
  8. ReduceDtypeSize (twitter thread)
  9. AddMetadata (twitter thread)
  10. Bucketize (twitter thread)
  11. DifferenceLag (twitter thread)
  12. FillMissing (twitter thread)
  13. FillMedian (twitter thread)
  14. HashedCross (twitter thread)
  15. Rename (twitter thread)
  16. LambdaOp
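
To give a feel for what a few of these ops do to a column, here is a minimal plain-Python sketch of the semantics of FillMissing, Clip, LogOp and Bucketize as I understand them. This is not nvtabular's API: the function names, parameters and sample values below are hypothetical and only for illustration, and the real ops run on dataframes (ideally on the GPU) rather than on Python lists.

```python
import bisect
import math


def fill_missing(values, fill_val=0):
    """Replace missing entries (None here) with a constant."""
    return [fill_val if v is None else v for v in values]


def clip(values, min_value=None, max_value=None):
    """Clamp each value into [min_value, max_value]; a bound of None is skipped."""
    out = []
    for v in values:
        if min_value is not None and v < min_value:
            v = min_value
        if max_value is not None and v > max_value:
            v = max_value
        out.append(v)
    return out


def log_op(values):
    """Compress a long-tailed column with log(1 + x)."""
    return [math.log1p(v) for v in values]


def bucketize(values, boundaries):
    """Map each value to a bin index: the number of boundaries <= value.

    Assumes `boundaries` is sorted in ascending order.
    """
    return [bisect.bisect_right(boundaries, v) for v in values]


# Hypothetical sample column with one missing value and one outlier.
prices = [3.0, None, 250.0, 12.5]

filled = fill_missing(prices, fill_val=0.0)
clipped = clip(filled, min_value=0.0, max_value=100.0)
logged = log_op(clipped)
buckets = bucketize(clipped, boundaries=[5.0, 50.0])

print(filled)   # [3.0, 0.0, 250.0, 12.5]
print(clipped)  # [3.0, 0.0, 100.0, 12.5]
print(buckets)  # [0, 0, 2, 1]
```

In nvtabular itself you would not write these loops; you would chain the corresponding ops onto column names and let the library run them for you. The notebooks in this repository show the real thing.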

To me, nvtabular is the toolset of the future. It:

  • abstracts away your hardware (you can process your data on equipment with varying amounts of CPU and GPU RAM, and you can read your data from various sources)
  • speeds up the processing pipeline (GPUs 🔥🔥🔥)
  • has a lot of functionality expertly coded to include best practices (some of the ops are really powerful and unlike anything you will find in other libraries)

This repository will document my journey as I learn nvtabular.

Running the examples

  1. Have docker installed (see "Getting started with docker" below).
  2. From the root of the repository, run ./start_docker_container.
  3. Navigate to http://localhost:8888 in your browser.

Getting started with docker

Whether we like it or not, docker is becoming a big piece of data science work. My history of using docker is riddled with suffering, but with time there are actually aspects of docker that I am starting to enjoy.

If you follow along with the work in this repository, you will get up to speed with using docker the way I feel it can be used for cutting-edge data science work.

Below are instructions to get started.

  1. Install docker. (I use docker on Ubuntu Server and on Windows Subsystem for Linux. Native GPU support is really nice!)
  2. You might want to be able to use docker as a non-root user. Do note: this comes with security risks.
  3. The docker tutorial is exquisite. It only tells you a part of the story, though, but the part it tells, it tells really well.
  4. Here are the missing pieces
  5. Equipped with all this information, the only other missing piece of the puzzle is the command to start a docker container. The start_docker_container script in this repository contains the most commonly required bits.

Misc
