Version 2.0 Compatibility Tracker
Dr-Irv opened this issue
This is the list of things in the pandas 2.0 release notes that need to be addressed in pandas-stubs. PRs welcome. If you do a PR, check off the item and add a link to the PR that closed it. One PR can address multiple items.
Some of these may already have been taken care of. If so, check them off and add a comment such as "previously complete".
- Int64Index, UInt64Index & Float64Index were deprecated in pandas version 1.4 and have now been removed. Instead Index should be used directly, and it can now take all numpy numeric dtypes, i.e. int8/int16/int32/int64/uint8/uint16/uint32/uint64/float32/float64 dtypes: #626
- The various numeric datetime attributes of DatetimeIndex (day, month, year etc.) were previously of dtype int64, while they were int32 for arrays.DatetimeArray. They are now int32 on DatetimeIndex also #632
- Level dtypes on Indexes from Series.sparse.from_coo() are now of dtype int32, the same as they are on the rows/cols on a scipy sparse matrix. Previously they were of dtype int64 #632
- Index cannot be instantiated using a float16 dtype. Previously instantiating an Index using dtype float16 resulted in a Float64Index with a float64 dtype. #632
- The following functions gained a new keyword dtype_backend (GH36712) #655
- Copy-on-Write can be enabled through one of:
  - pd.set_option("mode.copy_on_write", True)
  - pd.options.mode.copy_on_write = True
  - with pd.option_context("mode.copy_on_write", True):
- Added support for str accessor methods when using ArrowDtype with a pyarrow.string type
- Added support for dt accessor methods when using ArrowDtype with a pyarrow.timestamp type
- read_sas() now supports using encoding='infer' to correctly read and use the encoding specified by the sas file.
- Series.add_suffix(), DataFrame.add_suffix(), Series.add_prefix() and DataFrame.add_prefix() support an axis argument. #638
- Added index parameter to DataFrame.to_dict() #638
- Added cumsum, cumprod, cummin and cummax to the ExtensionArray interface via _accumulate
- CategoricalConversionWarning, InvalidComparison, InvalidVersion, LossySetitemError, and NoBufferPresent are now exposed in pandas.errors
- date_range() now supports a unit keyword (“s”, “ms”, “us”, or “ns”) to specify the desired resolution of the output index #734
- DataFrame.to_json() now supports a mode keyword with supported inputs 'w' and 'a'. It defaults to 'w'; 'a' can be used when lines=True and orient='records' to append record-oriented JSON lines to an existing JSON file.
- Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays() and IntervalIndex.from_tuples()
- Added Index.infer_objects() analogous to Series.infer_objects()
- DataFrame.plot.hist() now recognizes xlabel and ylabel arguments
- Series.drop_duplicates() has gained ignore_index keyword to reset index
- Series.dropna() and DataFrame.dropna() have gained ignore_index keyword to reset index
- Added DatetimeIndex.as_unit() and TimedeltaIndex.as_unit() to convert to different resolutions; supported resolutions are “s”, “ms”, “us”, and “ns”
- Added Series.dt.unit() and Series.dt.as_unit() to convert to different resolutions; supported resolutions are “s”, “ms”, “us”, and “ns”
- Added new argument dtype to read_sql() to be consistent with read_sql_query() #649
- read_csv() now accepts date_format #650
- read_table(), read_fwf() and read_excel() now accept date_format #695
- to_datetime() now accepts "ISO8601" as an argument to format, which will match any ISO8601 string (but possibly not identically-formatted)
- to_datetime() now accepts "mixed" as an argument to format, which will infer the format for each element individually
- Added new argument engine to read_json() to support parsing JSON with pyarrow by specifying engine="pyarrow"
- Added support for decimal parameter when engine="pyarrow" in read_csv()
- Index set operations Index.union(), Index.intersection(), Index.difference(), and Index.symmetric_difference() now support sort=True, which will always return a sorted result, unlike the default sort=None which does not sort in some cases
- Construction with datetime64 or timedelta64 dtype with unsupported resolution (check allowable resolutions for pd.Series())
- Disallow astype conversion to non-supported datetime64/timedelta64 dtypes
- The pandas latex options below are no longer used and have been removed. The generic max rows and columns arguments remain but for this functionality should be replaced by the Styler equivalents. The alternative options giving similar functionality are indicated below:
  - display.latex.escape: replaced with styler.format.escape,
  - display.latex.longtable: replaced with styler.latex.environment,
  - display.latex.multicolumn, display.latex.multicolumn_format and display.latex.multirow: replaced with styler.sparse.rows, styler.sparse.columns, styler.latex.multirow_align and styler.latex.multicol_align,
  - display.latex.repr: replaced with styler.render.repr,
  - display.max_rows and display.max_columns: replaced with styler.render.max_rows, styler.render.max_columns and styler.render.max_elements.
- The freq, tz, nanosecond, and unit keywords in the Timestamp constructor are now keyword-only
- Default value of dtype in get_dummies() is changed to bool from uint8
- DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype(), Series.astype(), DataFrame.astype() with datetime64, timedelta64 or PeriodDtype dtypes no longer allow converting to integer dtypes other than “int64”; do obj.astype('int64', copy=False).astype(dtype) instead
- The other argument in DataFrame.mask() and Series.mask() now defaults to no_default instead of np.nan consistent with DataFrame.where() and Series.where().
- Series.unique() with dtype “timedelta64[ns]” or “datetime64[ns]” now returns TimedeltaArray or DatetimeArray instead of numpy.ndarray
- to_datetime() and DatetimeIndex now allow sequences containing both datetime objects and numeric entries, matching Series behavior
- Added "None" to default na_values in read_csv() #715
- Disallow computing cumprod for Timedelta object; previously this returned incorrect values
- The levels of the index of the Series returned from Series.sparse.from_coo now always have dtype int32. Previously they had dtype int64
- Added pandas.api.types.is_any_real_numeric_dtype() to check for real numeric dtypes #715
- Deprecated argument infer_datetime_format in to_datetime() and read_csv(), as a strict version of it is now the default
- Deprecated pandas.io.sql.execute()
- Index.is_boolean() has been deprecated
- Index.is_integer() has been deprecated.
- Index.is_floating() has been deprecated.
- Index.holds_integer() has been deprecated.
- Index.is_numeric() has been deprecated.
- Index.is_categorical() has been deprecated.
- Index.is_object() has been deprecated.
- Index.is_interval() has been deprecated.
- Deprecated argument date_parser in read_csv(), read_table(), read_fwf(), and read_excel() in favour of date_format #695
- Deprecated unused arguments *args and **kwargs in Resampler
- Deprecated Grouper.groups()
- Deprecated Grouper.grouper()
- Deprecated Grouper.obj()
- Deprecated Grouper.indexer()
- Deprecated Grouper.ax()
- Deprecated keyword use_nullable_dtypes in read_parquet()
- Deprecated Series.pad()
- Deprecated Series.backfill()
- Deprecated DataFrame.pad()
- Deprecated DataFrame.backfill()
- Deprecated close(). Use StataReader as a context manager instead
- Removed Int64Index, UInt64Index and Float64Index.
- Removed deprecated Timestamp.freq, Timestamp.freqstr and argument freq from the Timestamp constructor and Timestamp.fromordinal()
- Removed deprecated CategoricalBlock, Block.is_categorical(), require datetime64 and timedelta64 values to be wrapped in DatetimeArray or TimedeltaArray before passing to Block.make_block_same_class(), require DatetimeTZBlock.values to have the correct ndim when passing to the BlockManager constructor, and removed the “fastpath” keyword from the SingleBlockManager constructor
- Removed deprecated global option use_inf_as_null in favor of use_inf_as_na
- Removed deprecated module pandas.core.index
- Removed deprecated alias pandas.core.tools.datetimes.to_time
- Removed deprecated alias pandas.io.json.json_normalize
- Removed deprecated Categorical.to_dense() (#932)
- Removed deprecated Categorical.take_nd() (#931) (#932)
- Removed deprecated Categorical.mode() (#932)
- Removed deprecated Categorical.is_dtype_equal() and CategoricalIndex.is_dtype_equal() (#932)
- Removed deprecated CategoricalIndex.take_nd() (#931)
- Removed deprecated Index.is_type_compatible() (#931)
- Removed deprecated Index.is_mixed() (#931)
- Removed deprecated pandas.api.types.is_categorical()
- Removed deprecated Index.asi8() (#931)
- Removed deprecated DataFrame._AXIS_NUMBERS(), DataFrame._AXIS_NAMES(), Series._AXIS_NUMBERS(), Series._AXIS_NAMES()
- Removed deprecated Index.to_native_types() (#931)
- Removed deprecated Series.iteritems(), DataFrame.iteritems()
- Removed deprecated DataFrame.lookup() (#930)
- Removed deprecated Series.append(), DataFrame.append()
- Removed deprecated Series.iteritems(), DataFrame.iteritems() and HDFStore.iteritems()
- Removed deprecated DatetimeIndex.union_many()
- Removed deprecated weekofyear and week attributes of DatetimeArray, DatetimeIndex and dt accessor (#932)
- Removed deprecated RangeIndex._start(), RangeIndex._stop(), RangeIndex._step()
- Removed deprecated DatetimeIndex.to_perioddelta() (#931)
- Removed deprecated Styler.hide_index() and Styler.hide_columns()
- Removed deprecated Styler.set_na_rep() and Styler.set_precision()
- Removed deprecated Styler.where()
- Removed deprecated Styler.render()
- Removed deprecated argument col_space in DataFrame.to_latex() (#930)
- Removed deprecated argument null_color in Styler.highlight_null()
- Removed deprecated argument check_less_precise in testing.assert_frame_equal(), testing.assert_extension_array_equal(), testing.assert_series_equal(), testing.assert_index_equal() (#933)
- Removed deprecated null_counts argument in DataFrame.info()
- Removed deprecated Index.is_monotonic(), and Series.is_monotonic()
- Removed deprecated Index.is_all_dates() (#931)
- Enforced deprecation disallowing unit-less “datetime64” dtype in Series.astype() and DataFrame.astype()
- Enforced deprecation disallowing passing non boolean argument to sort in concat()
- Removed Date parser functions parse_date_time(), parse_date_fields(), parse_all_fields() and generic_parser()
- Removed argument index from the core.arrays.SparseArray constructor
- Remove argument squeeze from DataFrame.groupby() and Series.groupby()
- Removed deprecated apply, apply_index, __call__, onOffset, and isAnchored attributes from DateOffset
- Removed keep_tz argument in DatetimeIndex.to_series() #802
- Remove arguments names and dtype from Index.copy() and levels and codes from MultiIndex.copy()
- Remove argument inplace from MultiIndex.set_levels() and MultiIndex.set_codes()
- Removed arguments verbose and encoding from DataFrame.to_excel() and Series.to_excel()
- Removed argument line_terminator from DataFrame.to_csv() and Series.to_csv()
- Removed argument inplace from DataFrame.set_axis() and Series.set_axis()
- Disallow passing positional arguments to MultiIndex.set_levels() and MultiIndex.set_codes()
- Disallow parsing to Timedelta strings with components with units “Y”, “y”, or “M”, as these do not represent unambiguous durations
- Removed MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth()
- Removed argument how from PeriodIndex.astype()
- Removed argument try_cast from DataFrame.mask(), DataFrame.where(), Series.mask() and Series.where() (#930)
- Removed argument tz from Period.to_timestamp()
- Removed argument sort_columns in DataFrame.plot() and Series.plot()
- Removed argument is_copy from DataFrame.take() and Series.take() (#930)
- Removed argument kind from Index.get_slice_bound(), Index.slice_indexer() and Index.slice_locs()
- Removed arguments prefix, squeeze, error_bad_lines and warn_bad_lines from read_csv()
- Removed argument squeeze from read_excel()
- Removed argument datetime_is_numeric from DataFrame.describe() and Series.describe() (#930)
- Disallow passing list key to Series.xs() and DataFrame.xs()
- Disallow subclass-specific keywords (e.g. “freq”, “tz”, “names”, “closed”) in the Index constructor
- Removed argument inplace from Categorical.remove_unused_categories() (#932)
- Remove keywords convert_float and mangle_dupe_cols from read_excel()
- Remove keyword mangle_dupe_cols from read_csv() and read_table()
- Removed errors keyword from DataFrame.where(), Series.where(), DataFrame.mask() and Series.mask()
- Disallow passing non-keyword arguments to [read_excel()](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas.read_excel) except io and sheet_name
- Disallow passing non-keyword arguments to DataFrame.drop() and Series.drop() except labels
- Disallow passing non-keyword arguments to DataFrame.fillna() and Series.fillna() except value
- Disallow passing non-keyword arguments to StringMethods.split() and StringMethods.rsplit() except for pat
- Disallow passing non-keyword arguments to DataFrame.set_index() except keys
- Disallow passing non-keyword arguments to Resampler.interpolate() except method
- Disallow passing non-keyword arguments to DataFrame.reset_index() and Series.reset_index() except level
- Disallow passing non-keyword arguments to DataFrame.dropna() and Series.dropna()
- Disallow passing non-keyword arguments to ExtensionArray.argsort()
- Disallow passing non-keyword arguments to Categorical.sort_values()
- Disallow passing non-keyword arguments to Index.drop_duplicates() and Series.drop_duplicates()
- Disallow passing non-keyword arguments to DataFrame.drop_duplicates() except for subset
- Disallow passing non-keyword arguments to DataFrame.sort_index() and Series.sort_index()
- Disallow passing non-keyword arguments to DataFrame.interpolate() and Series.interpolate() except for method
- Disallow passing non-keyword arguments to DataFrame.any() and Series.any()
- Disallow passing non-keyword arguments to Index.set_names() except for names
- Disallow passing non-keyword arguments to Index.join() except for other
- Disallow passing non-keyword arguments to concat() except for objs
- Disallow passing non-keyword arguments to pivot() except for data
- Disallow passing non-keyword arguments to DataFrame.pivot()
- Disallow passing non-keyword arguments to read_html() except for io
- Disallow passing non-keyword arguments to read_json() except for path_or_buf
- Disallow passing non-keyword arguments to read_sas() except for filepath_or_buffer
- Disallow passing non-keyword arguments to read_stata() except for filepath_or_buffer
- Disallow passing non-keyword arguments to read_csv() except filepath_or_buffer
- Disallow passing non-keyword arguments to read_table() except filepath_or_buffer
- Disallow passing non-keyword arguments to read_fwf() except filepath_or_buffer
- Disallow passing non-keyword arguments to read_xml() except for path_or_buffer
- Disallow passing non-keyword arguments to Series.mask() and DataFrame.mask() except cond and other
- Disallow passing non-keyword arguments to DataFrame.to_stata() except for path
- Disallow passing non-keyword arguments to DataFrame.where() and Series.where() except for cond and other
- Disallow passing non-keyword arguments to Series.set_axis() and DataFrame.set_axis() except for labels
- Disallow passing non-keyword arguments to Series.rename_axis() and DataFrame.rename_axis() except for mapper
- Disallow passing non-keyword arguments to Series.clip() and DataFrame.clip()
- Disallow passing non-keyword arguments to Series.bfill(), Series.ffill(), DataFrame.bfill() and DataFrame.ffill()
- Disallow passing non-keyword arguments to DataFrame.replace(), Series.replace() except for to_replace and value
- Disallow passing non-keyword arguments to DataFrame.sort_values() except for by
- Disallow passing non-keyword arguments to Series.sort_values()
- Disallow passing non-keyword arguments to DataFrame.reindex() except for labels
- Disallowed constructing Categorical with scalar data
- Removed Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate()
- Removed Rolling.win_type returning "freq"
- Removed Rolling.is_datetimelike
- Removed the level keyword in DataFrame and Series aggregations
- Removed deprecated Timedelta.delta(), Timedelta.is_populated(), and Timedelta.freq
- Removed deprecated NaT.freq
- Removed deprecated Categorical.replace()
- Removed the numeric_only keyword from Categorical.min() and Categorical.max()
- Removed is_extension_type() in favor of is_extension_array_dtype()
- Removed ExponentialMovingWindow.vol
- Removed Index.get_value() and Index.set_value() (#931)
- Removed Series.slice_shift() and DataFrame.slice_shift() (#930)
- Remove DataFrameGroupBy.pad() and DataFrameGroupBy.backfill()
- Remove numpy argument from read_json()
- Disallow passing abbreviations for orient in DataFrame.to_dict() #638
- Removed get_offset
- Removed the warn keyword in infer_freq()
- Removed the include_start and include_end arguments in DataFrame.between_time()
- Removed the closed argument in date_range() and bdate_range()
- Removed the center keyword in DataFrame.expanding()
- Removed the truediv keyword from eval()
- Removed the method and tolerance arguments in Index.get_loc()
- Removed the pandas.datetime submodule
- Removed the pandas.np submodule
- Removed pandas.util.testing
- Removed Series.str.iter()
- Removed pandas.SparseArray
- Removed pandas.SparseSeries and pandas.SparseDataFrame
- Enforced disallowing passing an integer fill_value to DataFrame.shift() and Series.shift() with datetime64, timedelta64, or period dtypes
- Enforced disallowing a string column label into times in DataFrame.ewm()
- Enforced disallowing passing True and False into inclusive in Series.between() in favor of "both" and "neither" respectively
- Enforced disallowing the use of **kwargs in ExcelWriter
- Enforced disallowing a tuple of column labels into DataFrameGroupBy.__getitem__()
- Enforced disallowing set or dict indexers in __getitem__ and __setitem__ methods
- Enforced disallowing dict or set objects in suffixes in merge()
- Removed setting Categorical._codes directly
- Removed setting Categorical.categories directly
- Removed argument inplace from Categorical.add_categories(), Categorical.remove_categories(), Categorical.set_categories(), Categorical.rename_categories(), Categorical.reorder_categories() (#932)
- Removed argument inplace from Categorical.set_ordered(), Categorical.as_ordered(), Categorical.as_unordered()
- Renamed fname to path in DataFrame.to_parquet(), DataFrame.to_stata() and DataFrame.to_feather()
- Removed the display.column_space option
- Removed the deprecated method mad from pandas classes
- Removed the deprecated method tshift from pandas classes (#930)
- Changed behavior of Index.ravel() to return a view on the original Index instead of a np.ndarray
- Removed the deprecated base and loffset arguments from pandas.DataFrame.resample(), pandas.Series.resample() and pandas.Grouper
- Change the default argument of regex for Series.str.replace() from True to False
- Changed behavior of comparison of a Timestamp with a datetime.date object; these now compare as un-equal and raise on inequality comparisons
- Changed behavior of comparison of NaT with a datetime.date object; these now raise on inequality comparisons
- Removed na_sentinel argument from factorize(), Index.factorize(), and ExtensionArray.factorize()
- Enforced deprecation disallowing passing numeric_only=True to Series reductions (rank, any, all, …) with non-numeric dtype
- Changed default of numeric_only to False in all DataFrame methods with that argument
- Changed default of numeric_only to False in Series.rank()
- Changed default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False
- Changed default of numeric_only to False in Resampler methods
- Removed deprecated methods ExcelWriter.write_cells(), ExcelWriter.save(), ExcelWriter.cur_sheet(), ExcelWriter.handles(), ExcelWriter.path()
- The ExcelWriter attribute book can no longer be set
- Removed unused *args and **kwargs in Rolling, Expanding, and ExponentialMovingWindow ops
- Removed the deprecated argument line_terminator from DataFrame.to_csv()
- Removed the deprecated argument label from lreshape()
- Arguments after expr in DataFrame.eval() and DataFrame.query() are keyword-only
- Removed Index._get_attributes_dict()
- Removed Series.__array_wrap__()
Is the cut-off for removal of support for 1.5 effective immediately? For instance, bugs are constantly being fixed in the annotations, so I currently stay up-to-date on the version. However, I don't know if I'll be able to easily upgrade to 2.0 without breaking anything, so I'd probably defer that upgrade. It seems 2.0 brings a fair number of backwards-incompatible changes.
Yes. The idea is that we want to encourage people to use the 2.0 API. I think that the 2.0 stubs (when released) will still work with 1.5 code, and, if not, they are telling you places in your code that you should change in order to be compatible with 2.0 in the future.
As an example, anything that was deprecated in 1.5 was already removed from the stubs.
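To illustrate the kind of change the 2.0 stubs would point to, here is a minimal sketch based on two removals from the list above (Series.append() and Series.iteritems()). The variable names are made up for illustration. The replacement calls, pd.concat() and Series.items(), also exist in 1.5, so code written this way should type-check against the 2.0 stubs while still running on 1.5.

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])

# Series.append() was removed in 2.0; pd.concat() does the same job
# and is available in 1.5 as well.
combined = pd.concat([s1, s2], ignore_index=True)

# Series.iteritems() was removed in 2.0; Series.items() is the replacement.
pairs = [(label, int(value)) for label, value in combined.items()]

print(combined.tolist())
print(pairs)
```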
@Dr-Irv I think you can mark "Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays() and IntervalIndex.from_tuples()" as completed, since the name parameter has already been added to these.
Thanks. I've made that update.
What actually needs to be done to disallow non-keyword arguments? Most of the stubs already have the * in the overloads; is it just a matter of bringing in the function def too for those overloaded ones? Let's just use drop as an example.
If so, I can take those over.
It's a mix of using * and /. See https://peps.python.org/pep-0570/#syntax-and-semantics
PRs are welcome. Maybe do one function so we can agree on the approach, then you can expand it to others.
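As a rough sketch of what the two markers do, the plain-Python functions below show how * makes the following parameters keyword-only and how / makes the preceding parameters positional-only. The names drop_like and join_like are hypothetical stand-ins with simplified parameter lists, not the actual pandas-stubs signatures.

```python
import inspect


def drop_like(labels=None, *, axis=0, inplace=False):
    """Only `labels` may be passed positionally; everything after `*` is keyword-only."""
    return (labels, axis, inplace)


def join_like(other, /, *, how="left"):
    """`other` is positional-only (before `/`); `how` is keyword-only (after `*`)."""
    return (other, how)


def parameter_kinds(func):
    """Map each parameter name to its kind as reported by inspect.signature."""
    return {name: p.kind.name for name, p in inspect.signature(func).parameters.items()}


def rejects(func, *args, **kwargs):
    """Return True if the call is rejected with TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False


print(parameter_kinds(drop_like))
print(parameter_kinds(join_like))
print(rejects(drop_like, ["a"], 1))       # axis passed positionally
print(rejects(drop_like, ["a"], axis=1))  # axis passed by keyword
```

A type checker applies the same rules statically to a stub written this way, so a call such as drop_like(["a"], 1) would be reported before the code runs.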