MuhammedBuyukkinaci / Data-Science-Notes

Listing my Data Science Notes

Repository from Github https://github.com/MuhammedBuyukkinaci/Data-Science-Notes

Data-Science-Notes

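  1. We can reference Python variables inside pd.DataFrame().query() by prefixing them with @, like below
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':['x','y','z']})
BIGGER_FILTER = 2
# @BIGGER_FILTER refers to the Python variable defined above
df.query("a > @BIGGER_FILTER")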
  1. Using .query() is encouraged. It is simpler to read than complex boolean filters.
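
To make the comparison concrete, here is a minimal sketch that writes the same filter twice, once as a boolean mask and once with .query(). The column values and the names LOWER and ALLOWED are made up for illustration.

```python
import pandas as pd

# Illustrative data; the values and variable names are assumptions
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "z", "x"]})
LOWER = 1
ALLOWED = ["x", "z"]

# Boolean-mask version: needs parentheses and repeated df[...] lookups
masked = df[(df["a"] > LOWER) & (df["b"].isin(ALLOWED))]

# Equivalent .query() version: reads closer to plain language
queried = df.query("a > @LOWER and b in @ALLOWED")

print(masked.equals(queried))
```

Both expressions select the same rows, so the choice is mostly about readability.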

  2. For columns in the datetime format, use parse_dates=['date_column_here'] in pd.read_csv().
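
A small sketch of the difference, using an in-memory CSV so it runs without any file on disk. The column name date_column_here is taken from the note above; the sample rows are invented.

```python
import io
import pandas as pd

# In-memory CSV standing in for a real file (sample rows are assumptions)
csv_text = "date_column_here,value\n2023-01-15,10\n2023-02-20,20\n"

# Without parse_dates the column is read as plain text
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column becomes a datetime column
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_column_here"])

print(raw["date_column_here"].dtype)
print(parsed["date_column_here"].dtype)

# Datetime accessors such as .dt.month only work on the parsed column
print(parsed["date_column_here"].dt.month.tolist())
```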

  3. Prefer saving via to_parquet, to_feather, or to_pickle instead of to_csv. These formats preserve data types and usually take less space on disk.
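
The data type point can be checked with a quick round trip. This sketch uses to_pickle because it needs no extra dependency; to_parquet and to_feather behave similarly for these columns but require an optional engine such as pyarrow. The column names and file names are illustrative.

```python
import os
import tempfile
import pandas as pd

# Illustrative frame with a datetime column and a category column
df = pd.DataFrame({
    "when": pd.to_datetime(["2023-01-15", "2023-02-20"]),
    "kind": pd.Series(["x", "y"], dtype="category"),
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pkl_path = os.path.join(tmp, "data.pkl")
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)
    from_pickle = pd.read_pickle(pkl_path)

# CSV stores only text, so datetime and category types are lost on reload
csv_keeps_dtypes = list(from_csv.dtypes) == list(df.dtypes)
pickle_keeps_dtypes = list(from_pickle.dtypes) == list(df.dtypes)
print(csv_keeps_dtypes, pickle_keeps_dtypes)
```

One caveat worth keeping in mind: pickle files should only be loaded from trusted sources.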

  4. We can use pd.DataFrame().style to format tables instead of formatting them in MS Excel.

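  5. The validate option of pd.DataFrame().merge(validate=) checks that the merge keys have the expected relationship (for example 'one_to_one' or 'many_to_one') and raises an error if they do not.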
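
A minimal sketch of the check in action. The customers and orders frames are hypothetical; the duplicate customer_id is there on purpose so that the many_to_one check fails.

```python
import pandas as pd

# Hypothetical tables; customer_id 2 is duplicated on purpose
customers = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "name": ["Ann", "Bob", "Bob (dup)"],
})
orders = pd.DataFrame({"order_id": [10, 11], "customer_id": [1, 2]})

# many_to_one requires the keys in the right-hand frame to be unique
try:
    orders.merge(customers, on="customer_id", validate="many_to_one")
    merge_failed = False
except pd.errors.MergeError as exc:
    merge_failed = True
    print(f"Merge rejected: {exc}")

# After removing the duplicate key, the same merge passes validation
deduped = customers.drop_duplicates(subset="customer_id")
merged = orders.merge(deduped, on="customer_id", validate="many_to_one")
print(merged)
```

Without validate, the first merge would silently produce an extra row for the duplicated key.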

  1. Converting string columns that hold categorical values to the category type is a good practice, since it typically reduces memory usage when there are few distinct values. We can do this via .astype('category')
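
A quick way to see the effect is to compare memory usage before and after the conversion. The city column below is made-up data with only three distinct values.

```python
import pandas as pd

# Illustrative column: many rows, few distinct values
df = pd.DataFrame({"city": ["Ankara", "Istanbul", "Izmir"] * 10_000})

# Memory used by the column stored as strings
as_text = df["city"].memory_usage(deep=True)

# Memory used by the same column stored as category
city_cat = df["city"].astype("category")
as_category = city_cat.memory_usage(deep=True)

print(as_text, as_category)
```

The saving shrinks as the number of distinct values grows, so the conversion is most useful for low-cardinality columns.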

  2. Yellowbrick is a Python library that has useful visualizations for ML.
