Data-Science-Notes
Listing my Data Science Notes
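- We can reference Python variables inside pd.DataFrame().query() by prefixing them with @, as below
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
BIGGER_FILTER = 2
df.query("a > @BIGGER_FILTER")  # keeps the rows where a is greater than 2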
- Prefer .query() where it fits. It is simpler to read than complex boolean filters.
- For columns in datetime format, pass parse_dates=['date_column_here'] to pd.read_csv().
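
A minimal sketch of the parse_dates note above. The CSV content and the value column are made up for illustration; only the column name date_column_here comes from the note.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example is self-contained.
csv_text = "date_column_here,value\n2021-01-05,10\n2021-02-17,20\n"

# Without parse_dates the date column is read as plain text.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to a datetime dtype while reading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_column_here"])

print(raw["date_column_here"].dtype)
print(parsed["date_column_here"].dtype)

# Datetime accessors such as .dt.month only work on the parsed column.
print(parsed["date_column_here"].dt.month.tolist())
```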
- Prefer dumping via to_parquet, to_feather, or to_pickle instead of to_csv. These formats preserve data types and typically take less space on disk.
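
A rough sketch of the dtype point above. It uses to_pickle only because to_parquet and to_feather need an optional engine such as pyarrow; the column names and the keeps_dtypes helper are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Small frame with a datetime column and a categorical column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2021-01-05", "2021-02-17"]),
    "label": pd.Series(["x", "y"], dtype="category"),
})


def keeps_dtypes(frame):
    """Return True if both columns still have their original dtypes."""
    return (
        pd.api.types.is_datetime64_any_dtype(frame["when"])
        and isinstance(frame["label"].dtype, pd.CategoricalDtype)
    )


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pkl_path = os.path.join(tmp, "data.pkl")

    # Write the same frame in both formats, then read each one back.
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)
    from_pickle = pd.read_pickle(pkl_path)

print("CSV keeps dtypes:", keeps_dtypes(from_csv))
print("Pickle keeps dtypes:", keeps_dtypes(from_pickle))
```

Pickle files should only be loaded from trusted sources, so Parquet is usually the safer choice when files are shared.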
- We can use pd.DataFrame().style instead of MS Excel for formatting tables.
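- Use the validate option of pd.DataFrame().merge(validate=) to check that the merge keys have the expected relationship, for example 'one_to_one' or 'many_to_one'. pandas raises a MergeError if the check fails.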
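
A small sketch of the validate option above. The orders and customers tables are hypothetical.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical tables: many orders per customer, one row per customer.
orders = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})

# Passes: customer_id is unique on the right side.
merged = orders.merge(customers, on="customer_id", validate="many_to_one")

# Add a duplicate customer row so the right-side keys are no longer unique.
customers_dup = pd.concat([customers, customers.iloc[[1]]], ignore_index=True)

try:
    orders.merge(customers_dup, on="customer_id", validate="many_to_one")
    validation_failed = False
except MergeError:
    # The check catches the duplicate before it silently multiplies rows.
    validation_failed = True

print(len(merged), validation_failed)
```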
- Converting string columns which are categorical to category type is a best practice. We can do this via .astype('category').
- Yellowbrick is a Python library that has useful visualizations for ML.
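
The category conversion mentioned above can be sketched as follows. The city column is a made-up example of a string column with few distinct values.

```python
import pandas as pd

# Hypothetical string column with only two distinct values, repeated many times.
df = pd.DataFrame({"city": ["Ankara", "Izmir", "Ankara", "Izmir"] * 1000})

as_category = df["city"].astype("category")

# Compare memory use before and after the conversion.
before = df["city"].memory_usage(deep=True)
after = as_category.memory_usage(deep=True)

print(list(as_category.cat.categories))
print(before, after)
```

The saving comes from storing each distinct string once and keeping small integer codes per row, so it is largest when the column has few unique values.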
