google-research / reverse-engineering-neural-networks

A collection of tools for reverse engineering neural networks.

Simplify dataset code

nirum opened this issue

Currently, the datasets.py module has a number of helper functions for loading data. However, each of these functions mixes several responsibilities:

  • Loading raw data, from either TFDS or csv files
  • Tokenization
  • Getting inputs / labels / index into a standard format
  • Applying custom filters/transformations
  • Batching
  • Caching / Shuffling

It might make sense to refactor the code so that each stage of this pipeline is expressed separately, in a way that lets users customize it efficiently. That would also make it easier to reason about what custom filters/transformations are doing.
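
To make the proposal concrete, here is a minimal sketch of one way the stages listed above could be split into small, composable functions. It is not the code in datasets.py: every name here (`load_csv`, `tokenize`, `keep_if`, `shuffle`, `batch`, `build_pipeline`) is hypothetical, the whitespace tokenizer and toy vocabulary are stand-ins, and treating `index` as the sequence length is an assumption. It uses plain Python generators rather than TFDS so that it runs on its own; caching and padding are left out.

```python
import csv
import io
import random
from typing import Any, Callable, Dict, Iterable, Iterator, List

Example = Dict[str, Any]
Stage = Callable[[Iterable[Example]], Iterator[Example]]


def load_csv(text: str) -> Iterator[Example]:
    """Loading: read raw rows (a stand-in for a TFDS or csv file loader)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"text": row["text"], "label": int(row["label"])}


def tokenize(vocab: Dict[str, int]) -> Stage:
    """Tokenization + standard format: emit inputs / labels / index."""
    def stage(examples: Iterable[Example]) -> Iterator[Example]:
        for ex in examples:
            # Unknown words map to id 0 (an assumption of this sketch).
            tokens = [vocab.get(word, 0) for word in ex["text"].lower().split()]
            # Assumption: "index" holds the sequence length.
            yield {"inputs": tokens, "labels": ex["label"], "index": len(tokens)}
    return stage


def keep_if(predicate: Callable[[Example], bool]) -> Stage:
    """Custom filter: keep only examples for which predicate is true."""
    def stage(examples: Iterable[Example]) -> Iterator[Example]:
        return (ex for ex in examples if predicate(ex))
    return stage


def shuffle(seed: int) -> Stage:
    """Shuffling: materialize the stream and shuffle it reproducibly."""
    def stage(examples: Iterable[Example]) -> Iterator[Example]:
        items = list(examples)
        random.Random(seed).shuffle(items)
        yield from items
    return stage


def batch(size: int) -> Stage:
    """Batching: group examples into dicts of lists (no padding)."""
    def collate(chunk: List[Example]) -> Example:
        return {key: [ex[key] for ex in chunk] for key in chunk[0]}

    def stage(examples: Iterable[Example]) -> Iterator[Example]:
        chunk: List[Example] = []
        for ex in examples:
            chunk.append(ex)
            if len(chunk) == size:
                yield collate(chunk)
                chunk = []
        if chunk:  # emit the final, possibly smaller, batch
            yield collate(chunk)
    return stage


def build_pipeline(source: Iterable[Example], *stages: Stage) -> Iterable[Example]:
    """Apply each stage in order to the output of the previous one."""
    data = source
    for stage in stages:
        data = stage(data)
    return data


# Usage: each responsibility is one explicit, swappable step.
RAW = "text,label\ngood movie,1\nbad,0\nreally good fun movie,1\n"
VOCAB = {"good": 1, "bad": 2, "movie": 3}

batches = list(build_pipeline(
    load_csv(RAW),
    tokenize(VOCAB),
    keep_if(lambda ex: ex["index"] >= 2),  # custom filter: drop one-word texts
    batch(2),
))
```

Because every stage has the same signature, a user can insert, remove, or reorder a custom filter or transformation without touching the loading or batching code, which is the kind of customization the issue asks for.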