huggingface / jat

General multi-task deep RL Agent

Add image tokenizer and patch position encoding

qgallouedec opened this issue

Images are first transformed into sequences of non-overlapping 16 × 16 patches in raster order, as done in ViT (Dosovitskiy et al., 2020). Each pixel in the image patches is then normalized between [−1, 1] and divided by the square-root of the patch size (i.e. √16 = 4).
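The patching and normalization described above can be sketched in NumPy as follows. This is only an illustration, not the tokenizer proposed in this issue: the function name `image_to_patches`, the channel-first `(C, H, W)` layout, and the assumption that inputs are `uint8` pixels in [0, 255] are all hypothetical choices made for the example.

```python
import numpy as np

PATCH_SIZE = 16  # patch side length, as in the description above


def image_to_patches(image: np.ndarray, patch_size: int = PATCH_SIZE) -> np.ndarray:
    """Split a (C, H, W) uint8 image into normalized, flattened patches in raster order.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, C * patch_size * patch_size).
    """
    channels, height, width = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size

    # Assumption: pixels are uint8 in [0, 255]; map them to [-1, 1].
    pixels = image.astype(np.float32) / 255.0 * 2.0 - 1.0
    # Divide by the square root of the patch size (sqrt(16) = 4).
    pixels = pixels / patch_size**0.5

    # (C, rows, P, cols, P) -> (rows, cols, C, P, P), so patches follow raster order.
    patches = pixels.reshape(channels, rows, patch_size, cols, patch_size)
    patches = patches.transpose(1, 3, 0, 2, 4)
    return patches.reshape(rows * cols, channels * patch_size * patch_size)


image = np.random.default_rng(0).integers(0, 256, size=(3, 32, 48), dtype=np.uint8)
patches = image_to_patches(image)
print(patches.shape)  # (6, 768)
```

With a 16 × 16 patch, every value ends up in [−0.25, 0.25]. Images whose sides are not multiples of 16 would need resizing or padding first, which this sketch leaves out.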

Usage should look like:

import numpy as np
import torch

from gia.model.embedding import Embeddings
from gia.model.tokenization import Tokenizer

# Define tokenizer and embedding layer
tokenizer = Tokenizer()
embedding_layer = Embeddings(embedding_dim=32)

# Load dataset (100k samples)
# First, clone it with `git clone https://huggingface.co/datasets/edbeeching/prj_gia_dataset_atari_2B_atari_yarsrevenge_1111`
dataset = np.load("prj_gia_dataset_atari_2B_atari_yarsrevenge_1111/dataset.npy", allow_pickle=True)

# Convert numpy object to dict. Keys are ['observations', 'actions', 'dones', 'rewards']
dataset = dataset.item()
observations = torch.from_numpy(dataset["observations"]).squeeze(1)  # TODO: remove squeeze when fixed in dataset
actions = torch.from_numpy(dataset["actions"]).to(torch.int64)  # TODO: remove int64 cast when fixed in dataset

# Tokenize and embed
tokens = tokenizer(images=observations, actions=actions)
print(tokens.shape)  # torch.Size([100000, K])
embeddings = embedding_layer(tokens)
print(embeddings.shape)  # torch.Size([100000, K, 32])