Error in Handling Heterogeneous Semantic Types (in documentation)
preritt opened this issue
preritt commented
The following code from the example documentation raises the error below. It looks like the arrays passed to the DataFrame are not all the same length. Could you please look into this?
import random
import numpy as np
import pandas as pd
# Numerical column
numerical = np.random.randint(0, 100, size=10)
# Categorical column
simple_categories = ['Type 1', 'Type 2', 'Type 3']
categorical = np.random.choice(simple_categories, size=100)
# Timestamp column
time = pd.date_range(start='2023-01-01', periods=100, freq='D')
# Multicategorical column
categories = ['Category A', 'Category B', 'Category C', 'Category D']
multicategorical = [
    random.sample(categories, k=random.randint(0, len(categories)))
    for _ in range(100)
]
# Embedding column (assuming an embedding size of 5 for simplicity)
embedding_size = 5
embedding = np.random.rand(100, embedding_size)
# Create the DataFrame
df = pd.DataFrame({
    'Numerical': numerical,
    'Categorical': categorical,
    'Time': time,
    'Multicategorical': multicategorical,
    'Embedding': list(embedding)
})
Error:
"Mixing dicts with non-Series may lead to ambiguous ordering."
ValueError: All arrays must be of the same length
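For anyone hitting the same ValueError, a quick way to find the offending column is to compare the lengths of the inputs before building the DataFrame. This is only a minimal sketch: the `columns` dict below is a hypothetical stand-in that mirrors the sizes used in the snippet above, not part of the documentation example.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical stand-in columns with the same lengths as in the snippet above.
columns = {
    'Numerical': np.random.randint(0, 100, size=10),
    'Categorical': np.random.choice(['Type 1', 'Type 2', 'Type 3'], size=100),
    'Time': pd.date_range(start='2023-01-01', periods=100, freq='D'),
}

# Length of each column, keyed by column name.
lengths = {name: len(values) for name, values in columns.items()}
print(lengths)

# Treat the most frequent length as the expected one and flag the rest.
expected_length, _ = Counter(lengths.values()).most_common(1)[0]
mismatched = [name for name, n in lengths.items() if n != expected_length]
print(mismatched)
```

Here the check flags only the 'Numerical' column, which has 10 entries while the others have 100.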
preritt commented
I found a minor error in the documentation example: numerical is created with size=10, while every other column has 100 rows. The fix is:
numerical = np.random.randint(0, 100, size=10)
->
numerical = np.random.randint(0, 100, size=100)
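To keep this kind of mismatch from creeping back in, one option is to drive every column from a single row count. The sketch below rewrites the example that way; `n_rows` is a name introduced here for illustration and does not appear in the documentation.

```python
import random

import numpy as np
import pandas as pd

n_rows = 100  # hypothetical shared row count, not part of the original example

# Numerical column
numerical = np.random.randint(0, 100, size=n_rows)

# Categorical column
simple_categories = ['Type 1', 'Type 2', 'Type 3']
categorical = np.random.choice(simple_categories, size=n_rows)

# Timestamp column
time = pd.date_range(start='2023-01-01', periods=n_rows, freq='D')

# Multicategorical column: each row holds a random subset of the categories
categories = ['Category A', 'Category B', 'Category C', 'Category D']
multicategorical = [
    random.sample(categories, k=random.randint(0, len(categories)))
    for _ in range(n_rows)
]

# Embedding column (embedding size of 5, as in the example)
embedding_size = 5
embedding = np.random.rand(n_rows, embedding_size)

# Every column now has n_rows entries, so the constructor no longer raises.
df = pd.DataFrame({
    'Numerical': numerical,
    'Categorical': categorical,
    'Time': time,
    'Multicategorical': multicategorical,
    'Embedding': list(embedding),
})
print(df.shape)
```

With all five columns sized from the same constant, the resulting DataFrame has 100 rows and 5 columns.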