This is a list of gotchas I found in Pandas (the Python data analysis library).
- Dropping nuisance columns in groupby is a nuisance #21664
- Pandas versions before 2.0 silently drop a column if the chosen aggregation method doesn't work on it (since 2.0 this raises a `TypeError` instead).
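One way to sidestep this, whatever your pandas version does with such columns, is to say explicitly which columns should be aggregated. A minimal sketch with made-up data (the `team`, `score` and `note` columns are assumptions for illustration only):

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b"],
    "score": [1.0, 3.0, 5.0],
    "note": ["x", "y", "z"],  # text column: mean() is not defined for it
})

# Option 1: select the columns to aggregate, so nothing can vanish unnoticed.
explicit = df.groupby("team")[["score"]].mean()

# Option 2: ask pandas to restrict the aggregation to numeric columns.
numeric = df.groupby("team").mean(numeric_only=True)
```

Both results contain only the `score` column, and that is now a visible choice in the code rather than a side effect.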
- pandas GroupBy columns with NaN (missing) values
- Pandas silently drops rows when grouping by a column that contains `NaN`.
- You can avoid this behaviour by using `groupby(..., dropna=False)`.
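A small sketch of the difference, using made-up data (the `key` and `value` columns are illustrative assumptions). Note that the `dropna` argument needs pandas 1.1 or newer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": ["a", np.nan, "a", "b"],
    "value": [1, 2, 3, 4],
})

# Default: the row whose key is NaN is left out of every group.
default = df.groupby("key")["value"].sum()

# dropna=False keeps NaN as a group of its own.
with_nan = df.groupby("key", dropna=False)["value"].sum()

print(default.sum(), with_nan.sum())
```

Comparing the grand totals like this is a cheap sanity check that no rows went missing during grouping.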
- How to determine whether a Pandas Column contains a particular value
- `x in series` tells you if `x` is in the index of `series`.
- Use `x in series.values` to check if `x` is in the actual values of `series`.
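A quick sketch of the difference, with a made-up series (the labels and values are assumptions for illustration). `Series.isin` is shown as an alternative for the value check:

```python
import pandas as pd

series = pd.Series(["apple", "banana"], index=[10, 20])

label_in_series = 10 in series               # checks the index labels
value_in_series = "apple" in series          # still checks the index labels
value_in_values = "apple" in series.values   # checks the stored values

# Alternative that stays inside pandas: element-wise membership test.
value_via_isin = series.isin(["apple"]).any()

print(label_in_series, value_in_series, value_in_values, value_via_isin)
```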
- To check which elements of a column start with the prefix `field_`, run `df.my_column.str.startswith('field_')`. To avoid the error `ValueError: Cannot mask with non-boolean array containing NA / NaN values` when using the result as a mask, simply add `na=False` (which treats NA values as non-matches): `df.my_column.str.startswith('field_', na=False)`
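A minimal sketch of the prefix check on made-up data (the column contents are assumptions for illustration). Whether the version without `na=False` actually fails as a mask depends on the column dtype and the pandas version, so only the safe variant is used for filtering here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"my_column": ["field_a", "other", np.nan, "field_b"]})

# Without na=..., the entry for the missing value may itself come back as missing.
raw_mask = df.my_column.str.startswith("field_")

# With na=False, missing values simply count as "does not start with the prefix".
mask = df.my_column.str.startswith("field_", na=False)

selected = df[mask]
print(selected)
```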
- Values in a Pandas index do not have to be unique (unlike values in a `PRIMARY KEY` column in SQL).
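A short sketch of what a non-unique index looks like in practice, using a made-up frame (labels and values are assumptions for illustration). Note how the type returned by `.loc` changes depending on how many rows share the label:

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "a", "b"])

# Pandas accepts the duplicate label "a" without complaint.
index_is_unique = df.index.is_unique

rows_for_a = df.loc["a"]  # two matching rows: a DataFrame
row_for_b = df.loc["b"]   # one matching row: a Series

print(index_is_unique, type(rows_for_a).__name__, type(row_for_b).__name__)
```

Checking `df.index.is_unique` up front is a cheap guard if later code assumes one row per label.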
- If you do a LEFT JOIN on two tables, you expect the result to have as many rows as the left table.
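- In Pandas, for a `.join()` or `.merge()` to work the same way, you have to remove duplicate rows from the right table first, e.g. by calling `df_right.drop_duplicates()` (or `df_right.drop_duplicates(subset='common_column_name')` if only the key is repeated) before `pd.merge(df_left, df_right, on='common_column_name', how='left')`.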
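A sketch of the row-count surprise on made-up tables (the `left_value` and `right_value` columns are assumptions for illustration). The `validate` argument of `pd.merge` is shown as an optional guard that raises instead of silently adding rows:

```python
import pandas as pd

df_left = pd.DataFrame({
    "common_column_name": [1, 2, 3],
    "left_value": ["a", "b", "c"],
})
df_right = pd.DataFrame({
    "common_column_name": [1, 1, 2],  # key 1 appears twice
    "right_value": ["x", "x", "y"],
})

# Key 1 matches two right-hand rows, so the result has more rows than df_left.
naive = pd.merge(df_left, df_right, on="common_column_name", how="left")

# After removing the duplicate rows, the row count matches df_left again.
deduped = pd.merge(
    df_left, df_right.drop_duplicates(), on="common_column_name", how="left"
)

# Optional guard: fail loudly if the right-hand keys are not unique.
try:
    pd.merge(df_left, df_right, on="common_column_name", how="left",
             validate="many_to_one")
    duplicates_detected = False
except pd.errors.MergeError:
    duplicates_detected = True

print(len(df_left), len(naive), len(deduped), duplicates_detected)
```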