MuhammedBuyukkinaci / Data-Science-Notes

Listing my Data Science Notes

Repository from GitHub: https://github.com/MuhammedBuyukkinaci/Data-Science-Notes

Data-Science-Notes

Inside pd.DataFrame().query(), we can reference Python variables by prefixing them with @, like below:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
BIGGER_FILTER = 2
df.query("a > @BIGGER_FILTER")  # keeps the rows where column a is greater than 2

Using .query() is encouraged. It is simpler to read than complex boolean filters.
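
To make the comparison concrete, here is a minimal sketch of the same filter written both ways. The column values and the LOWER and UPPER thresholds are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "z", "x"]})
LOWER, UPPER = 1, 4  # hypothetical thresholds, not from the original notes

# Boolean-mask version: each condition needs its own parentheses and df[...] lookup.
masked = df[(df["a"] > LOWER) & (df["a"] < UPPER) & (df["b"] != "y")]

# Equivalent .query() version: one readable expression.
queried = df.query("a > @LOWER and a < @UPPER and b != 'y'")

print(masked.equals(queried))
```

Both versions select the same rows; the .query() form just keeps the condition in one string.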
For columns that hold dates, pass parse_dates=['date_column_here'] to pd.read_csv() so they are loaded as datetime values instead of strings.
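
A small sketch of the effect of parse_dates, using an in-memory CSV so it runs without any file on disk. The CSV content is invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "date_column_here,value\n2021-01-01,10\n2021-01-02,20\n"

# Without parse_dates the date column is read as plain text.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to a datetime dtype while reading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_column_here"])

print(raw["date_column_here"].dtype)
print(parsed["date_column_here"].dtype)

# Datetime accessors such as .dt.day only work on the parsed column.
print(parsed["date_column_here"].dt.day.tolist())
```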
Prefer dumping via to_parquet, to_feather or to_pickle instead of to_csv. These formats preserve our data types and usually consume less space on the hard disk.
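
The data type point can be checked with a quick round trip. This sketch uses to_pickle because it needs no extra dependency; to_parquet and to_feather typically require an additional engine such as pyarrow to be installed. The file names and sample data are made up.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "kind": pd.Categorical(["x", "y"]),
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "df.csv")  # hypothetical file names
    pkl_path = os.path.join(tmp, "df.pkl")

    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    from_csv = pd.read_csv(csv_path)
    from_pickle = pd.read_pickle(pkl_path)

# CSV stores only text, so the datetime and category dtypes are lost on reload.
csv_kept = list(from_csv.dtypes) == list(df.dtypes)
# Pickle stores the DataFrame object itself, so the dtypes come back unchanged.
pickle_kept = list(from_pickle.dtypes) == list(df.dtypes)

print(csv_kept, pickle_kept)
```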
We can use pd.DataFrame().style to format tables directly in Python instead of formatting them in MS Excel.
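The validate option of pd.DataFrame().merge(validate=) checks that the merge keys have the expected relationship ('one_to_one', 'one_to_many', 'many_to_one' or 'many_to_many') and raises a MergeError if they do not.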
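
A minimal sketch of validate catching an unexpected duplicate key. The customers and orders tables are hypothetical.

```python
import pandas as pd

# Hypothetical tables: customer_id 2 appears twice in customers by mistake.
customers = pd.DataFrame({"customer_id": [1, 2, 2], "name": ["Ann", "Bob", "Bob again"]})
orders = pd.DataFrame({"customer_id": [1, 2], "amount": [10, 20]})

# "many_to_one" requires the keys of the right table to be unique.
try:
    orders.merge(customers, on="customer_id", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True

# After removing the duplicate key the same merge passes validation.
clean = customers.drop_duplicates("customer_id")
merged = orders.merge(clean, on="customer_id", validate="many_to_one")

print(duplicate_detected, len(merged))
```

Without validate, the first merge would silently produce an extra row for customer_id 2.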

Converting string columns that hold a small set of repeated values to the category type is a best practice, since it usually reduces memory usage. We can do this via .astype('category')
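
A quick sketch of the conversion and its memory effect. The column content is invented, and the exact byte counts depend on the pandas version, so none are shown.

```python
import pandas as pd

# Hypothetical column: three distinct strings repeated many times.
colors = pd.Series(["red", "green", "blue"] * 10_000)

as_category = colors.astype("category")

# deep=True counts the memory of the strings themselves, not just the pointers.
string_bytes = colors.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(as_category.dtype)
print(category_bytes < string_bytes)
```

The category version stores each distinct string once plus a small integer code per row, which is why it is smaller here.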
Yellowbrick is a Python library that has useful visualizations for ML.
