pyg-team / pytorch-frame

Tabular Deep Learning Library for PyTorch

Home Page: https://pytorch-frame.readthedocs.io

Error in Handling Heterogeneous Semantic Types (in documentation)

preritt opened this issue

The following code from the example documentation raises the error below. The traceback says the arrays are not all the same length. Can you please look into this?

import random

import numpy as np
import pandas as pd

# Numerical column
numerical = np.random.randint(0, 100, size=10)

# Categorical column
simple_categories = ['Type 1', 'Type 2', 'Type 3']
categorical = np.random.choice(simple_categories, size=100)

# Timestamp column
time = pd.date_range(start='2023-01-01', periods=100, freq='D')

# Multicategorical column
categories = ['Category A', 'Category B', 'Category C', 'Category D']
multicategorical = [
    random.sample(categories, k=random.randint(0, len(categories)))
    for _ in range(100)
]

# Embedding column (assuming an embedding size of 5 for simplicity)
embedding_size = 5
embedding = np.random.rand(100, embedding_size)

# Create the DataFrame
df = pd.DataFrame({
    'Numerical': numerical,
    'Categorical': categorical,
    'Time': time,
    'Multicategorical': multicategorical,
    'Embedding': list(embedding)
})

Error:

 "Mixing dicts with non-Series may lead to ambiguous ordering."
ValueError: All arrays must be of the same length

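I found a minor error in the documentation example: the numerical column is created with size=10, while every other column has 100 rows, so pandas cannot align them into one DataFrame. The fix is to change

numerical = np.random.randint(0, 100, size=10) ->
numerical = np.random.randint(0, 100, size=100)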
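
For anyone hitting the same error, here is a minimal sketch of the corrected example. It drives every column from a single row count and checks the lengths before building the DataFrame. The names `num_rows`, `columns`, and `lengths` are my own for illustration and are not part of the documentation example.

```python
import random

import numpy as np
import pandas as pd

# Hypothetical shared row count (not in the original example); using one
# constant keeps every column the same length.
num_rows = 100

categories = ['Category A', 'Category B', 'Category C', 'Category D']

columns = {
    # Numerical column (size=num_rows instead of the mismatched size=10)
    'Numerical': np.random.randint(0, 100, size=num_rows),
    # Categorical column
    'Categorical': np.random.choice(['Type 1', 'Type 2', 'Type 3'],
                                    size=num_rows),
    # Timestamp column
    'Time': pd.date_range(start='2023-01-01', periods=num_rows, freq='D'),
    # Multicategorical column: each row holds a random subset of categories
    'Multicategorical': [
        random.sample(categories, k=random.randint(0, len(categories)))
        for _ in range(num_rows)
    ],
    # Embedding column (embedding size of 5, as in the original example)
    'Embedding': list(np.random.rand(num_rows, 5)),
}

# Report per-column lengths so a mismatch names the offending column
# instead of surfacing as pandas' generic ValueError.
lengths = {name: len(values) for name, values in columns.items()}
if len(set(lengths.values())) != 1:
    raise ValueError(f"Column lengths differ: {lengths}")

df = pd.DataFrame(columns)
```

With the original size=10, the `lengths` dictionary would show `'Numerical': 10` next to 100 for the other columns, which points straight at the line to fix.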