Version 2.0 Compatibility Tracker

Question

Dr-Irv opened this issue a year ago

This is the list of things in the pandas 2.0 release notes that need to be addressed in pandas-stubs. PRs welcome. If you do a PR, check off the item and put a link to the PR that closed it. One PR can address multiple issues.

Some of these may already have been taken care of. If so, check them off and indicate that with a comment such as "previously complete".

Int64Index, UInt64Index & Float64Index were deprecated in pandas version 1.4 and have now been removed. Instead Index should be used directly, and it can now take all numpy numeric dtypes, i.e. int8/int16/int32/int64/uint8/uint16/uint32/uint64/float32/float64 dtypes: #626
The various numeric datetime attributes of DatetimeIndex (day, month, year etc.) were previously of dtype int64, while they were int32 for arrays.DatetimeArray. They are now int32 on DatetimeIndex also #632
Level dtypes on Indexes from Series.sparse.from_coo() are now of dtype int32, the same as they are on the rows/cols on a scipy sparse matrix. Previously they were of dtype int64 #632
Index cannot be instantiated using a float16 dtype. Previously instantiating an Index using dtype float16 resulted in a Float64Index with a float64 dtype. #632
The following functions gained a new keyword dtype_backend (GH36712) #655
- read_csv()
- read_clipboard()
- read_fwf()
- read_excel()
- read_html()
- read_xml()
- read_json()
- read_sql() #649
- read_sql_query()
- read_sql_table()
- read_orc()
- read_feather()
- read_spss()
- to_numeric()
- DataFrame.convert_dtypes()
- Series.convert_dtypes()
Copy-on-Write can be enabled through one of
- pd.set_option("mode.copy_on_write", True)
- pd.options.mode.copy_on_write = True
- with pd.option_context("mode.copy_on_write", True):
~~Added support for str accessor methods when using ArrowDtype with a pyarrow.string type~~
~~Added support for dt accessor methods when using ArrowDtype with a pyarrow.timestamp type~~
read_sas() now supports using encoding='infer' to correctly read and use the encoding specified by the sas file.
Series.add_suffix(), DataFrame.add_suffix(), Series.add_prefix() and DataFrame.add_prefix() support an axis argument. #638
Added index parameter to DataFrame.to_dict() #638
Added cumsum, cumprod, cummin and cummax to the ExtensionArray interface via _accumulate
CategoricalConversionWarning, InvalidComparison, InvalidVersion, LossySetitemError, and NoBufferPresent are now exposed in pandas.errors
date_range() now supports a unit keyword (“s”, “ms”, “us”, or “ns”) to specify the desired resolution of the output index #734
DataFrame.to_json() now supports a mode keyword with supported inputs ‘w’ and ‘a’. Defaulting to ‘w’, ‘a’ can be used when lines=True and orient='records' to append record oriented json lines to an existing json file.
Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays() and IntervalIndex.from_tuples()
Added Index.infer_objects() analogous to Series.infer_objects()
DataFrame.plot.hist() now recognizes xlabel and ylabel arguments
Series.drop_duplicates() has gained ignore_index keyword to reset index
Series.dropna() and DataFrame.dropna() have gained ignore_index keyword to reset index
Added DatetimeIndex.as_unit() and TimedeltaIndex.as_unit() to convert to different resolutions; supported resolutions are “s”, “ms”, “us”, and “ns”
Added Series.dt.unit() and Series.dt.as_unit() to convert to different resolutions; supported resolutions are “s”, “ms”, “us”, and “ns”
Added new argument dtype to read_sql() to be consistent with read_sql_query() #649
read_csv() now accepts date_format #650
read_table(), read_fwf() and read_excel() now accept date_format #695
to_datetime() now accepts "ISO8601" as an argument to format, which will match any ISO8601 string (but possibly not identically-formatted)
to_datetime() now accepts "mixed" as an argument to format, which will infer the format for each element individually
Added new argument engine to read_json() to support parsing JSON with pyarrow by specifying engine="pyarrow"
Added support for decimal parameter when engine="pyarrow" in read_csv()
Index set operations Index.union(), Index.intersection(), Index.difference(), and Index.symmetric_difference() now support sort=True, which will always return a sorted result, unlike the default sort=None which does not sort in some cases
Construction with datetime64 or timedelta64 dtype with unsupported resolution (check allowable resolutions for pd.Series())
Disallow astype conversion to non-supported datetime64/timedelta64 dtypes
The pandas latex options below are no longer used and have been removed. The generic max rows and columns arguments remain but for this functionality should be replaced by the Styler equivalents. The alternative options giving similar functionality are indicated below:
- display.latex.escape: replaced with styler.format.escape,
- display.latex.longtable: replaced with styler.latex.environment,
- display.latex.multicolumn, display.latex.multicolumn_format and display.latex.multirow: replaced with styler.sparse.rows, styler.sparse.columns, styler.latex.multirow_align and styler.latex.multicol_align,
- display.latex.repr: replaced with styler.render.repr,
- display.max_rows and display.max_columns: replace with styler.render.max_rows, styler.render.max_columns and styler.render.max_elements.
The freq, tz, nanosecond, and unit keywords in the Timestamp constructor are now keyword-only
Default value of dtype in get_dummies() is changed to bool from uint8
DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype(), Series.astype(), DataFrame.astype() with datetime64, timedelta64 or PeriodDtype dtypes no longer allow converting to integer dtypes other than “int64”; do obj.astype('int64', copy=False).astype(dtype) instead
The other argument in DataFrame.mask() and Series.mask() now defaults to no_default instead of np.nan consistent with DataFrame.where() and Series.where().
Series.unique() with dtype “timedelta64[ns]” or “datetime64[ns]” now returns TimedeltaArray or DatetimeArray instead of numpy.ndarray
to_datetime() and DatetimeIndex now allow sequences containing both datetime objects and numeric entries, matching Series behavior
~~Added "None" to default na_values in read_csv()~~ #715
Disallow computing cumprod for Timedelta object; previously this returned incorrect values
The levels of the index of the Series returned from Series.sparse.from_coo now always have dtype int32. Previously they had dtype int64
Added pandas.api.types.is_any_real_numeric_dtype() to check for real numeric dtypes #715
Deprecated argument infer_datetime_format in to_datetime() and read_csv(), as a strict version of it is now the default
Deprecated pandas.io.sql.execute()
Index.is_boolean() has been deprecated
Index.is_integer() has been deprecated.
Index.is_floating() has been deprecated.
Index.holds_integer() has been deprecated.
Index.is_numeric() has been deprecated.
Index.is_categorical() has been deprecated.
Index.is_object() has been deprecated.
Index.is_interval() has been deprecated.
Deprecated argument date_parser in read_csv(), read_table(), read_fwf(), and read_excel() in favour of date_format #695
Deprecated unused arguments *args and **kwargs in Resampler
Deprecated Grouper.groups()
Deprecated Grouper.grouper()
Deprecated Grouper.obj()
Deprecated Grouper.indexer()
Deprecated Grouper.ax()
Deprecated keyword use_nullable_dtypes in read_parquet()
Deprecated Series.pad()
Deprecated Series.backfill()
Deprecated DataFrame.pad()
Deprecated DataFrame.backfill()
Deprecated close(). Use StataReader as a context manager instead
Removed Int64Index, UInt64Index and Float64Index.
Removed deprecated Timestamp.freq, Timestamp.freqstr and argument freq from the Timestamp constructor and Timestamp.fromordinal()
Removed deprecated CategoricalBlock, Block.is_categorical(), require datetime64 and timedelta64 values to be wrapped in DatetimeArray or TimedeltaArray before passing to Block.make_block_same_class(), require DatetimeTZBlock.values to have the correct ndim when passing to the BlockManager constructor, and removed the “fastpath” keyword from the SingleBlockManager constructor
Removed deprecated global option use_inf_as_null in favor of use_inf_as_na
Removed deprecated module pandas.core.index
Removed deprecated alias pandas.core.tools.datetimes.to_time
Removed deprecated alias pandas.io.json.json_normalize
Removed deprecated Categorical.to_dense() (#932)
Removed deprecated Categorical.take_nd() (#931) (#932)
Removed deprecated Categorical.mode() (#932)
Removed deprecated Categorical.is_dtype_equal() and CategoricalIndex.is_dtype_equal() (#932)
Removed deprecated CategoricalIndex.take_nd() (#931)
Removed deprecated Index.is_type_compatible() (#931)
Removed deprecated Index.is_mixed() (#931)
Removed deprecated pandas.api.types.is_categorical()
Removed deprecated Index.asi8() (#931)
Removed deprecated DataFrame._AXIS_NUMBERS(), DataFrame._AXIS_NAMES(), Series._AXIS_NUMBERS(), Series._AXIS_NAMES()
Removed deprecated Index.to_native_types() (#931)
Removed deprecated Series.iteritems(), DataFrame.iteritems()
Removed deprecated DataFrame.lookup() (#930)
Removed deprecated Series.append(), DataFrame.append()
Removed deprecated Series.iteritems(), DataFrame.iteritems() and HDFStore.iteritems()
Removed deprecated DatetimeIndex.union_many()
Removed deprecated weekofyear and week attributes of DatetimeArray, DatetimeIndex and dt accessor (#932)
Removed deprecated RangeIndex._start(), RangeIndex._stop(), RangeIndex._step()
Removed deprecated DatetimeIndex.to_perioddelta() (#931)
Removed deprecated Styler.hide_index() and Styler.hide_columns()
Removed deprecated Styler.set_na_rep() and Styler.set_precision()
Removed deprecated Styler.where()
Removed deprecated Styler.render()
Removed deprecated argument col_space in DataFrame.to_latex() (#930)
Removed deprecated argument null_color in Styler.highlight_null()
Removed deprecated argument check_less_precise in testing.assert_frame_equal(), testing.assert_extension_array_equal(), testing.assert_series_equal(), testing.assert_index_equal() (#933)
Removed deprecated null_counts argument in DataFrame.info()
Removed deprecated Index.is_monotonic(), and Series.is_monotonic()
Removed deprecated Index.is_all_dates() (#931)
Enforced deprecation disallowing unit-less “datetime64” dtype in Series.astype() and DataFrame.astype()
Enforced deprecation disallowing passing non boolean argument to sort in concat()
Removed Date parser functions parse_date_time(), parse_date_fields(), parse_all_fields() and generic_parser()
Removed argument index from the core.arrays.SparseArray constructor
Remove argument squeeze from DataFrame.groupby() and Series.groupby()
Removed deprecated apply, apply_index, call, onOffset, and isAnchored attributes from DateOffset
Removed keep_tz argument in DatetimeIndex.to_series() #802
Remove arguments names and dtype from Index.copy() and levels and codes from MultiIndex.copy()
Remove argument inplace from MultiIndex.set_levels() and MultiIndex.set_codes()
Removed arguments verbose and encoding from DataFrame.to_excel() and Series.to_excel()
Removed argument line_terminator from DataFrame.to_csv() and Series.to_csv()
Removed argument inplace from DataFrame.set_axis() and Series.set_axis()
Disallow passing positional arguments to MultiIndex.set_levels() and MultiIndex.set_codes()
Disallow parsing to Timedelta strings with components with units “Y”, “y”, or “M”, as these do not represent unambiguous durations
Removed MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth()
Removed argument how from PeriodIndex.astype()
Removed argument try_cast from DataFrame.mask(), DataFrame.where(), Series.mask() and Series.where() (#930)
Removed argument tz from Period.to_timestamp()
Removed argument sort_columns in DataFrame.plot() and Series.plot()
Removed argument is_copy from DataFrame.take() and Series.take() (#930)
Removed argument kind from Index.get_slice_bound(), Index.slice_indexer() and Index.slice_locs()
Removed arguments prefix, squeeze, error_bad_lines and warn_bad_lines from read_csv()
Removed argument squeeze from read_excel()
Removed argument datetime_is_numeric from DataFrame.describe() and Series.describe() (#930)
Disallow passing list key to Series.xs() and DataFrame.xs()
Disallow subclass-specific keywords (e.g. “freq”, “tz”, “names”, “closed”) in the Index constructor
Removed argument inplace from Categorical.remove_unused_categories() (#932)
Remove keywords convert_float and mangle_dupe_cols from read_excel()
Remove keyword mangle_dupe_cols from read_csv() and read_table()
Removed errors keyword from DataFrame.where(), Series.where(), DataFrame.mask() and Series.mask()
Disallow passing non-keyword arguments to [read_excel()](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas.read_excel) except io and sheet_name
Disallow passing non-keyword arguments to DataFrame.drop() and Series.drop() except labels
Disallow passing non-keyword arguments to DataFrame.fillna() and Series.fillna() except value
Disallow passing non-keyword arguments to StringMethods.split() and StringMethods.rsplit() except for pat
Disallow passing non-keyword arguments to DataFrame.set_index() except keys
Disallow passing non-keyword arguments to Resampler.interpolate() except method
Disallow passing non-keyword arguments to DataFrame.reset_index() and Series.reset_index() except level
Disallow passing non-keyword arguments to DataFrame.dropna() and Series.dropna()
Disallow passing non-keyword arguments to ExtensionArray.argsort()
Disallow passing non-keyword arguments to Categorical.sort_values()
Disallow passing non-keyword arguments to Index.drop_duplicates() and Series.drop_duplicates()
Disallow passing non-keyword arguments to DataFrame.drop_duplicates() except for subset
Disallow passing non-keyword arguments to DataFrame.sort_index() and Series.sort_index()
Disallow passing non-keyword arguments to DataFrame.interpolate() and Series.interpolate() except for method
Disallow passing non-keyword arguments to DataFrame.any() and Series.any()
Disallow passing non-keyword arguments to Index.set_names() except for names
Disallow passing non-keyword arguments to Index.join() except for other
Disallow passing non-keyword arguments to concat() except for objs
Disallow passing non-keyword arguments to pivot() except for data
Disallow passing non-keyword arguments to DataFrame.pivot()
Disallow passing non-keyword arguments to read_html() except for io
Disallow passing non-keyword arguments to read_json() except for path_or_buf
Disallow passing non-keyword arguments to read_sas() except for filepath_or_buffer
Disallow passing non-keyword arguments to read_stata() except for filepath_or_buffer
Disallow passing non-keyword arguments to read_csv() except filepath_or_buffer
Disallow passing non-keyword arguments to read_table() except filepath_or_buffer
Disallow passing non-keyword arguments to read_fwf() except filepath_or_buffer
Disallow passing non-keyword arguments to read_xml() except for path_or_buffer
Disallow passing non-keyword arguments to Series.mask() and DataFrame.mask() except cond and other
Disallow passing non-keyword arguments to DataFrame.to_stata() except for path
Disallow passing non-keyword arguments to DataFrame.where() and Series.where() except for cond and other
Disallow passing non-keyword arguments to Series.set_axis() and DataFrame.set_axis() except for labels
Disallow passing non-keyword arguments to Series.rename_axis() and DataFrame.rename_axis() except for mapper
Disallow passing non-keyword arguments to Series.clip() and DataFrame.clip()
Disallow passing non-keyword arguments to Series.bfill(), Series.ffill(), DataFrame.bfill() and DataFrame.ffill()
Disallow passing non-keyword arguments to DataFrame.replace(), Series.replace() except for to_replace and value
Disallow passing non-keyword arguments to DataFrame.sort_values() except for by
Disallow passing non-keyword arguments to Series.sort_values()
Disallow passing non-keyword arguments to DataFrame.reindex() except for labels
Disallowed constructing Categorical with scalar data
Removed Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate()
Removed Rolling.win_type returning "freq"
Removed Rolling.is_datetimelike
Removed the level keyword in DataFrame and Series aggregations
Removed deprecated Timedelta.delta(), Timedelta.is_populated(), and Timedelta.freq
Removed deprecated NaT.freq
Removed deprecated Categorical.replace()
Removed the numeric_only keyword from Categorical.min() and Categorical.max()
Removed is_extension_type() in favor of is_extension_array_dtype()
Removed ExponentialMovingWindow.vol
Removed Index.get_value() and Index.set_value() (#931)
Removed Series.slice_shift() and DataFrame.slice_shift() (#930)
Remove DataFrameGroupBy.pad() and DataFrameGroupBy.backfill()
Remove numpy argument from read_json()
Disallow passing abbreviations for orient in DataFrame.to_dict() #638
Removed get_offset
Removed the warn keyword in infer_freq()
Removed the include_start and include_end arguments in DataFrame.between_time()
Removed the closed argument in date_range() and bdate_range()
Removed the center keyword in DataFrame.expanding()
Removed the truediv keyword from eval()
Removed the method and tolerance arguments in Index.get_loc()
Removed the pandas.datetime submodule
Removed the pandas.np submodule
Removed pandas.util.testing
Removed Series.str.iter()
Removed pandas.SparseArray
Removed pandas.SparseSeries and pandas.SparseDataFrame
Enforced disallowing passing an integer fill_value to DataFrame.shift() and Series.shift() with datetime64, timedelta64, or period dtypes
Enforced disallowing a string column label into times in DataFrame.ewm()
Enforced disallowing passing True and False into inclusive in Series.between() in favor of "both" and "neither" respectively
Enforced disallowing the use of **kwargs in ExcelWriter
Enforced disallowing a tuple of column labels into DataFrameGroupBy.__getitem__()
Enforced disallowing set or dict indexers in __getitem__ and __setitem__ methods
Enforced disallowing dict or set objects in suffixes in merge()
Removed setting Categorical._codes directly
Removed setting Categorical.categories directly
Removed argument inplace from Categorical.add_categories(), Categorical.remove_categories(), Categorical.set_categories(), Categorical.rename_categories(), Categorical.reorder_categories() (#932)
Removed argument inplace from Categorical.set_ordered(), Categorical.as_ordered(), Categorical.as_unordered()
Renamed fname to path in DataFrame.to_parquet(), DataFrame.to_stata() and DataFrame.to_feather()
Removed the display.column_space option
Removed the deprecated method mad from pandas classes
Removed the deprecated method tshift from pandas classes (#930)
Changed behavior of Index.ravel() to return a view on the original Index instead of a np.ndarray
Removed the deprecated base and loffset arguments from pandas.DataFrame.resample(), pandas.Series.resample() and pandas.Grouper
Change the default argument of regex for Series.str.replace() from True to False
Changed behavior of comparison of a Timestamp with a datetime.date object; these now compare as un-equal and raise on inequality comparisons
Changed behavior of comparison of NaT with a datetime.date object; these now raise on inequality comparisons
Removed na_sentinel argument from factorize(), Index.factorize(), and ExtensionArray.factorize()
Enforced deprecation disallowing passing numeric_only=True to Series reductions (rank, any, all, …) with non-numeric dtype
Changed default of numeric_only to False in all DataFrame methods with that argument
Changed default of numeric_only to False in Series.rank()
Changed default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False
Changed default of numeric_only to False in Resampler methods
Removed deprecated methods ExcelWriter.write_cells(), ExcelWriter.save(), ExcelWriter.cur_sheet(), ExcelWriter.handles(), ExcelWriter.path()
The ExcelWriter attribute book can no longer be set
Removed unused *args and **kwargs in Rolling, Expanding, and ExponentialMovingWindow ops
Removed the deprecated argument line_terminator from DataFrame.to_csv()
Removed the deprecated argument label from lreshape()
Arguments after expr in DataFrame.eval() and DataFrame.query() are keyword-only
Removed Index._get_attributes_dict()
Removed Series.__array_wrap__()
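
As an illustration of the kind of change the dtype_backend item above calls for, here is a minimal sketch of a keyword annotated with a Literal type. The alias name DtypeBackend and the function read_csv_sketch are hypothetical and are not taken from pandas or pandas-stubs. The pandas 2.0 documentation lists "numpy_nullable" and "pyarrow" as the accepted values; the None default here is a simplification.

```python
from __future__ import annotations

from typing import Literal, get_args

# Hypothetical alias name; the two strings are the values pandas 2.0 documents.
DtypeBackend = Literal["numpy_nullable", "pyarrow"]


def read_csv_sketch(
    path: str, *, dtype_backend: DtypeBackend | None = None
) -> dict[str, object]:
    """Hypothetical reader: only shows how the keyword could be typed and checked."""
    allowed = get_args(DtypeBackend)
    if dtype_backend is not None and dtype_backend not in allowed:
        raise ValueError(f"dtype_backend must be one of {allowed}, got {dtype_backend!r}")
    # A real reader would parse the file; this sketch just echoes its arguments.
    return {"path": path, "dtype_backend": dtype_backend}
```

With an annotation like this, a type checker would be expected to reject a call such as read_csv_sketch("x.csv", dtype_backend="arrow") before the code ever runs.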

Siddhartha Gandhi · Answer 1 · Tue Apr 04 2023 05:19:53 GMT+0800 (China Standard Time)

Is the cut-off for removal of support for 1.5 effectively immediately? For instance, there are bugs constantly being fixed in the annotations, and so I currently stay up-to-date on the version. However I don't know if I'll be able to easily upgrade to 2.0 without breaking anything, and so I'd probably defer that upgrade. It seems 2.0 brings a fair amount of backwards-incompatible changes.

Irv Lustig · Answer 2 · Tue Apr 04 2023 05:23:07 GMT+0800 (China Standard Time)

> Is the cut-off for removal of support for 1.5 effectively immediately? For instance, there are bugs constantly being fixed in the annotations, and so I currently stay up-to-date on the version. However I don't know if I'll be able to easily upgrade to 2.0 without breaking anything, and so I'd probably defer that upgrade. It seems 2.0 brings a fair amount of backwards-incompatible changes.

Yes. The idea is that we want to encourage people to use the 2.0 API. I think that the 2.0 stubs (when released) will still work with 1.5 code, and, if not, they are telling you places in your code that you should change in order to be compatible with 2.0 in the future.

As an example, anything that was deprecated in 1.5 was already removed from the stubs.

Ram Vikram Singh · Answer 3 · Wed Apr 26 2023 23:07:30 GMT+0800 (China Standard Time)

@Dr-Irv I think I have added dtype_backend in Series.convert_dtypes, which is covered in #655

Irv Lustig · Answer 4 · Thu Apr 27 2023 03:31:12 GMT+0800 (China Standard Time)

> @Dr-Irv I think I have added dtype_backend in Series.convert_dtypes, which is covered in #655

Thanks. I updated the list.

Ram Vikram Singh · Answer 5 · Sat Jun 17 2023 00:49:14 GMT+0800 (China Standard Time)

@Dr-Irv I think you can mark "Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays(), IntervalIndex.from_tuples()" as completed, as the name parameter is already added in these.

Irv Lustig · Answer 6 · Sat Jun 17 2023 01:04:58 GMT+0800 (China Standard Time)

> @Dr-Irv I think you can mark "Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays(), IntervalIndex.from_tuples()" as completed, as the name parameter is already added in these.

Thanks. I've done that update.

caneff · Answer 7 · Sat Sep 09 2023 03:23:12 GMT+0800 (China Standard Time)

What actually needs to be done to disallow non-keyword arguments? Most of the stubs already have the * in the overloads; is it just a matter of bringing in the function def too for those overloaded ones? Let's just use drop as an example.

If so I can take those over.

Irv Lustig · Answer 8 · Tue Sep 12 2023 03:27:11 GMT+0800 (China Standard Time)

> What actually needs to be done to disallow non-keyword arguments? Most of the stubs already have the * in the overloads; is it just a matter of bringing in the function def too for those overloaded ones? Let's just use drop as an example.
>
> If so I can take those over.

It's a mix of using * and /. See https://peps.python.org/pep-0570/#syntax-and-semantics

PRs are welcome. Maybe do one function so we can agree on the approach, then you can expand it to others.
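
For reference, here is a minimal sketch of how the two PEP 570 markers behave at runtime. The functions drop_like and argsort_like are hypothetical stand-ins, not the pandas or pandas-stubs signatures; in a stub file the bodies would simply be `...`.

```python
def drop_like(labels=None, *, axis=0, errors="raise"):
    """Hypothetical: everything after `*` must be passed by keyword."""
    return {"labels": labels, "axis": axis, "errors": errors}


def argsort_like(values, /, *, ascending=True):
    """Hypothetical: `values` is positional-only (before `/`), `ascending` is keyword-only."""
    return sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)


def raises_type_error(func, *args, **kwargs):
    """Return True if calling func with the given arguments raises TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False


# Allowed: `labels` positionally, the rest by keyword.
drop_like("a", axis=1)

# Rejected: `axis` passed positionally, and `values` passed by keyword.
assert raises_type_error(drop_like, "a", 1)
assert raises_type_error(argsort_like, values=[3, 1, 2])
```

A type checker reading a stub with the same markers should report the two rejected calls statically, which is the effect the "Disallow passing non-keyword arguments" items are after.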