pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Home Page: https://pandas.pydata.org


Annotate API-exposed Items from pandas.core.api

WillAyd opened this issue

We currently expose all of the following items from pandas.core.api via the API:

from pandas.core.api import (
    # dtype
    Int8Dtype, Int16Dtype, Int32Dtype, Int64Dtype, UInt8Dtype,
    UInt16Dtype, UInt32Dtype, UInt64Dtype, CategoricalDtype,
    PeriodDtype, IntervalDtype, DatetimeTZDtype,

    # missing
    isna, isnull, notna, notnull,

    # indexes
    Index, CategoricalIndex, Int64Index, UInt64Index, RangeIndex,
    Float64Index, MultiIndex, IntervalIndex, TimedeltaIndex,
    DatetimeIndex, PeriodIndex, IndexSlice,

    # tseries
    NaT, Period, period_range, Timedelta, timedelta_range,
    Timestamp, date_range, bdate_range, Interval, interval_range,
    DateOffset,

    # conversion
    to_numeric, to_datetime, to_timedelta,

    # misc
    np, Grouper, factorize, unique, value_counts, NamedAgg,
    array, Categorical, set_eng_float_format, Series, DataFrame,
    Panel)

Below is a pseudo-prioritized list of the annotations I think we need. I'm open to suggestions on how to prioritize, and community PRs are very welcome!

  • DataFrame
  • Series
  • Index
  • MultiIndex
  • Categorical
  • CategoricalIndex
  • Datetimelike indices
  • Numeric indices

...

These don't necessarily need to be completed in order. I will continue to expand the checklist as we tackle more items, so if you see something you'd like to tackle, feel free to call it out.

@WillAyd I am not sure what exactly needs to be done. Do we need to annotate every attribute/method of the DataFrame, Index, etc. classes? Can you provide an example (mypy docs or an SO link, maybe)?

Yes, that's correct. We would want to add annotations to the methods of these objects (and to attributes where inference may not work).
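As a rough illustration of what annotating methods and attributes can look like, here is a minimal sketch. The `LabeledArray` class and its methods are hypothetical stand-ins invented for this example; they are not pandas code and do not reflect how pandas is actually annotated.

```python
from typing import List, Optional


class LabeledArray:
    """Hypothetical container used only to illustrate annotations."""

    # Explicit attribute annotation: useful where a type checker
    # could not infer the type from the assignment alone.
    name: Optional[str]

    def __init__(self, values: List[float], name: Optional[str] = None) -> None:
        self.values = values
        self.name = name

    def head(self, n: int = 5) -> "LabeledArray":
        # Annotated parameter and return type; the return type is quoted
        # because the class is not fully defined at this point.
        return LabeledArray(self.values[:n], name=self.name)

    def total(self) -> float:
        # float() keeps the return type exact even for an empty list.
        return float(sum(self.values))
```

With annotations like these in place, a checker such as mypy can flag a call like `LabeledArray([1.0]).head("2")`, since `n` is declared as `int`.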

I'm unpinning this to make room for #31879. Will re-pin tomorrow.

@WillAyd I am not quite sure that I understand the meaning of this issue.
Is it about creating something like this?

import datetime

import numpy as np
import pandas as pd

class MyDataFrame(pd.DataFrame):
    col_foo: datetime.datetime

def func(df: MyDataFrame):
    df['col_foo'].dt

func(MyDataFrame())  # mypy passes
func(pd.DataFrame(columns=['col_foo'], dtype=np.datetime64))  # mypy passes?
func(pd.DataFrame(columns=['col_foo']))  # mypy raises error?

I was looking for something that imitates the dataclass / NamedTuple usage API.

We can close this issue; I haven't tracked it in quite some time

@WillAyd which issue tracks development of something similar to the type annotations I have mentioned above?

I'd be interested in something like this. Obviously it would be very hard to type everything properly and conveniently (considering how dynamic pandas can be), but for simple use cases, helpers that exploit Literal and TypedDict could cover many cases where typing is desirable.
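One way such helpers might look is sketched below. It uses only the standard library, with a list of dicts standing in for a DataFrame. The names `ColumnName`, `Row`, and `column` are hypothetical and exist only for this illustration; nothing here is an existing pandas API.

```python
from typing import List, Literal, TypedDict

# Hypothetical schema: the only column names a checker should accept.
ColumnName = Literal["col_foo", "col_bar"]


class Row(TypedDict):
    # Hypothetical per-row schema; keys mirror ColumnName.
    col_foo: str
    col_bar: int


def column(rows: List[Row], name: ColumnName) -> List[object]:
    # A type checker can reject column(rows, "col_baz") because
    # "col_baz" is not one of the Literal values in ColumnName.
    return [row[name] for row in rows]


rows: List[Row] = [
    {"col_foo": "a", "col_bar": 1},
    {"col_foo": "b", "col_bar": 2},
]
bar_values = column(rows, "col_bar")
```

The checks here are purely static: at runtime a misspelled column name would still raise `KeyError`, so this covers only the simple cases mentioned above.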

Sounds good. Feel free to submit PRs to improve annotations - they are always welcome

I think this has served its purpose. Closing.