arne-cl / pandas-gotchas

List of gotchas in Pandas (the Python data analysis library).

pandas-gotchas

This is a list of gotchas I found in Pandas (the Python data analysis library).

grouping / aggregation

membership in series

filter series / column by substring

To check which elements of a column start with the prefix field_,
run df.my_column.str.startswith('field_'). If the column contains missing
values, the result contains NA / NaN entries, and using it as a boolean mask
raises ValueError: Cannot mask with non-boolean array containing NA / NaN values.
To avoid the error, add na=False (which treats NA values as non-matching):

df.my_column.str.startswith('field_', na=False)
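
A minimal sketch of the filter in context. The DataFrame contents are made up for illustration; only the column name my_column and the prefix field_ come from the text above. Whether the unfiltered mask actually raises depends on the column's dtype and the Pandas version, so the sketch only shows the safe variant.

```python
import pandas as pd

# Hypothetical data: one missing value in the column we want to filter.
df = pd.DataFrame({"my_column": ["field_a", "other", None, "field_b"]})

# na=False makes missing values count as "does not start with the prefix",
# so the result is a clean boolean mask.
mask = df.my_column.str.startswith("field_", na=False)

# The mask can now be used for boolean indexing without a ValueError.
filtered = df[mask]
```

Here filtered keeps only the rows whose value starts with field_; the row with the missing value is dropped along with the non-matching one.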

joining / merging

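  • values in a Pandas index column do not have to be unique (unlike values in a PRIMARY KEY column in SQL)
    • If you do a LEFT JOIN of two tables on a unique key, you expect the result to have as many rows as the left table.
    • In Pandas, for a .join() or .merge() to work the same way, the join key in the right table must be unique. Remove rows with duplicate keys first, e.g. by calling df_right.drop_duplicates(subset='common_column_name') before pd.merge(df_left, df_right, on='common_column_name', how='left'). Note that a plain df_right.drop_duplicates() only drops rows that are identical in every column, so rows that share a key but differ elsewhere still multiply the result.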
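
The row-count effect can be sketched as follows. The two tables and the columns left_value and right_value are hypothetical; only common_column_name, df_left and df_right come from the text above. The validate argument is an optional extra check and not part of the original recipe.

```python
import pandas as pd

# Hypothetical tables: key 1 appears twice in the right table.
df_left = pd.DataFrame(
    {"common_column_name": [1, 2, 3], "left_value": ["a", "b", "c"]}
)
df_right = pd.DataFrame(
    {"common_column_name": [1, 1, 2], "right_value": ["x", "y", "z"]}
)

# Naive left merge: the left row with key 1 matches two right rows,
# so the result has more rows than df_left.
naive = pd.merge(df_left, df_right, on="common_column_name", how="left")

# Keep only the first right row per key, then merge again.
df_right_unique = df_right.drop_duplicates(subset="common_column_name")
deduped = pd.merge(
    df_left,
    df_right_unique,
    on="common_column_name",
    how="left",
    validate="many_to_one",  # raises MergeError if right keys are not unique
)
```

Which duplicate to keep is a data decision: drop_duplicates keeps the first occurrence by default, which may or may not be the row you want.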

See also

Prabhant Singh. Gotchas of Pandas (PyData Delhi).
