delip / PyTorchNLPBook

Code and data accompanying Natural Language Processing with PyTorch published by O'Reilly Media https://amzn.to/3JUgR2L

Chapter 03 Yelp Dataset has a Typo

amancioandre opened this issue

Hi everyone,

Chapter 3 fails to load the Yelp data because of a typo on the last line of the training CSV. The review on that line is cut off and its closing quote is missing:

Line Review
73357: "1","Capital City Transfer han

Passing the nrows argument with the number of rows minus one, so that the broken last line is never read, fixed it for me:

train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None, names=['rating', 'review'], nrows=73356)

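Or, by telling pandas to skip malformed lines (error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; newer versions take on_bad_lines='skip' instead):

train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None, names=['rating', 'review'], error_bad_lines=False)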

Or just append a closing " to the end of that line.
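
The quote-appending workaround can also be scripted instead of done by hand. Below is a minimal sketch, assuming the CSV escapes literal quotes by doubling them (so a well-formed file contains an even number of `"` characters) and that the file is small enough to read into memory. `close_unterminated_quote` is a hypothetical helper name, not part of the book's code.

```python
from pathlib import Path


def close_unterminated_quote(csv_path, quotechar='"'):
    """Append a closing quote if the file has an odd number of quote characters.

    Assumption: literal quotes inside fields are escaped by doubling them,
    so a well-formed file always contains an even number of quote characters.
    Returns True if the file was modified, False if it was left untouched.
    """
    path = Path(csv_path)
    text = path.read_text(encoding="utf-8")

    # Even count: every quoted field is already closed, nothing to repair.
    if text.count(quotechar) % 2 == 0:
        return False

    # Odd count: strip trailing newlines, close the open field,
    # and finish the file with a single newline.
    path.write_text(text.rstrip("\r\n") + quotechar + "\n", encoding="utf-8")
    return True
```

Running it once on the path stored in `args.raw_train_dataset_csv` should let the plain `pd.read_csv` call (without `nrows`) parse the last row. This sketch has not been tested against the actual dataset file. Calling it a second time is a no-op, because the quote count is then even.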

Still, it would be nice to fix this typo in the dataset itself.