What’s New¶
These are new features and improvements of note in each release.
v0.20.3 (July 7, 2017)¶
This is a minor bug-fix release in the 0.20.x series and includes some small regression fixes and bug fixes. We recommend that all users upgrade to this version.
What’s new in v0.20.3
Bug Fixes¶
- Fixed a bug where rolling computations failed on a DataFrame with MultiIndex columns (GH16789, GH16825)
- Fixed a pytest marker that was failing downstream packages’ test suites (GH16680)
Conversion¶
- Bug in pickle compat prior to the v0.20.x series, when UTC is a timezone in a Series/DataFrame/Index (GH16608)
- Bug in Series construction when passing a Series with dtype='category' (GH16524).
- Bug in DataFrame.astype() when passing a Series as the dtype kwarg (GH16717).
Indexing¶
- Bug in Float64Index causing an empty array instead of None to be returned from .get(np.nan) on a Series whose index did not contain any NaNs (GH8569)
- Bug in MultiIndex.isin causing an error when passing an empty iterable (GH16777)
- Fixed a bug in slicing a DataFrame/Series that has a TimedeltaIndex (GH16637)
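The first two indexing fixes above can be illustrated with a minimal sketch. The Series and MultiIndex here are made-up illustrative data, and the results shown in the comments describe the fixed behavior, not the behavior of earlier releases.

```python
import numpy as np
import pandas as pd

# A Series with a float index that contains no NaN labels (illustrative data).
s = pd.Series([10, 20], index=[1.0, 2.0])

# The label is absent, so .get() falls back to its default, None,
# rather than returning an empty array.
missing = s.get(np.nan)

# MultiIndex.isin with an empty iterable: nothing can match,
# so every position is False instead of an error being raised.
mi = pd.MultiIndex.from_tuples([(1, "a"), (2, "b")])
mask = mi.isin([])

print(missing)        # None
print(mask.tolist())  # [False, False]
```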
I/O¶
- Bug in read_csv() in which files weren’t opened as binary files by the C engine on Windows, causing EOF characters mid-field, which would fail (GH16039, GH16559, GH16675)
- Bug in read_hdf() in which reading a Series saved to an HDF file in ‘fixed’ format fails when an explicit mode='r' argument is supplied (GH16583)
- Bug in DataFrame.to_latex() where bold_rows was wrongly documented as True by default, whereas in reality row labels remained non-bold regardless of the parameter provided (GH16707)
- Fixed an issue with DataFrame.style() where generated element ids were not unique (GH16780)
- Fixed loading a DataFrame with a PeriodIndex, from a format='fixed' HDFStore, in Python 3, that was written in Python 2 (GH16781)
Plotting¶
- Fixed regression that prevented RGB and RGBA tuples from being used as color arguments (GH16233)
- Fixed an issue with DataFrame.plot.scatter() that incorrectly raised a KeyError when categorical data is used for plotting (GH16199)
Reshaping¶
v0.20.2 (June 4, 2017)¶
This is a minor bug-fix release in the 0.20.x series and includes some small regression fixes, bug fixes and performance improvements. We recommend that all users upgrade to this version.
What’s new in v0.20.2
Enhancements¶
- Unblocked access to additional compression types supported in pytables: ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’ (GH14478)
- Series provides a to_latex method (GH16180)
- A new groupby method ngroup(), parallel to the existing cumcount(), has been added to return the group order (GH11642); see here.
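As a minimal sketch of how ngroup() relates to cumcount(), the example below uses a small made-up frame with a single hypothetical column named key. ngroup() numbers the groups themselves, while cumcount() numbers the rows within each group.

```python
import pandas as pd

# Illustrative data: a single grouping column.
df = pd.DataFrame({"key": ["a", "a", "b", "a", "b"]})
grouped = df.groupby("key")

# ngroup() labels every row with the number of the group it belongs to
# (groups are numbered in sorted key order by default).
group_number = grouped.ngroup()

# cumcount() labels every row with its position inside its own group.
row_in_group = grouped.cumcount()

print(group_number.tolist())  # [0, 0, 1, 0, 1]
print(row_in_group.tolist())  # [0, 1, 0, 2, 1]
```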
Performance Improvements¶
- Performance regression fix when indexing with a list-like (GH16285)
- Performance regression fix for MultiIndexes (GH16319, GH16346)
- Improved performance of .clip() with scalar arguments (GH15400)
- Improved performance of groupby with categorical groupers (GH16413)
- Improved performance of MultiIndex.remove_unused_levels() (GH16556)
Bug Fixes¶
- Silenced a warning on some Windows environments about “tput: terminal attributes: No such device or address” when detecting the terminal size. This fix only applies to Python 3 (GH16496)
- Bug in using pathlib.Path or py.path.local objects with io functions (GH16291)
- Bug in Index.symmetric_difference() on two equal MultiIndex objects, resulting in a TypeError (GH13490)
- Bug in DataFrame.update() with overwrite=False and NaN values (GH15593)
- Passing an invalid engine to read_csv() now raises an informative ValueError rather than UnboundLocalError. (GH16511)
- Bug in unique() on an array of tuples (GH16519)
- Bug in cut() when labels are set, resulting in incorrect label ordering (GH16459)
- Fixed a compatibility issue with IPython 6.0’s tab completion showing deprecation warnings on Categoricals (GH16409)
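The read_csv() engine fix above can be sketched as follows. The CSV text and the engine name "no-such-engine" are made up for illustration; the only point is that an unknown engine surfaces as a ValueError.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b\n1,2\n"  # illustrative input

try:
    # "no-such-engine" is a deliberately invalid, hypothetical engine name.
    pd.read_csv(StringIO(csv_text), engine="no-such-engine")
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
```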
Conversion¶
- Bug in to_numeric() in which empty data inputs were causing a segfault of the interpreter (GH16302)
- Silence numpy warnings when broadcasting DataFrame to Series with comparison ops (GH16378, GH16306)
Indexing¶
- Bug in DataFrame.reset_index(level=) with single level index (GH16263)
- Bug in partial string indexing with a monotonic, but not strictly-monotonic, index incorrectly reversing the slice bounds (GH16515)
- Bug in MultiIndex.remove_unused_levels() that would not return a MultiIndex equal to the original. (GH16556)
I/O¶
- Bug in read_csv() when comment is passed in a space delimited text file (GH16472)
- Bug in read_csv() not raising an exception with nonexistent columns in usecols when it had the correct length (GH14671)
- Bug that would force importing of the clipboard routines unnecessarily, potentially causing an import error on startup (GH16288)
- Bug that raised IndexError when HTML-rendering an empty DataFrame (GH15953)
- Bug in read_csv() in which tarfile object inputs were raising an error in Python 2.x for the C engine (GH16530)
- Bug where DataFrame.to_html() ignored the index_names parameter (GH16493)
- Bug where pd.read_hdf() returns numpy strings for index names (GH13492)
- Bug in HDFStore.select_as_multiple()where start/stop arguments were not respected (GH16209)
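The usecols fix in the list above can be sketched as follows. The CSV text and the column name "z" are made up for illustration: usecols has the same length as the header but names a column that does not exist, which is the case the fix addresses.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b\n1,2\n"  # illustrative two-column file

try:
    # usecols has the right length (2) but "z" is not a column in the file.
    pd.read_csv(StringIO(csv_text), usecols=["a", "z"])
    usecols_error = None
except ValueError as exc:
    usecols_error = exc

print(type(usecols_error).__name__)  # ValueError
```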
Plotting¶
Groupby/Resample/Rolling¶
Reshaping¶
- Bug in DataFrame.stack with unsorted levels in MultiIndex columns (GH16323)
- Bug in pd.wide_to_long() where no error was raised when i was not a unique identifier (GH16382)
- Bug in Series.isin(..) with a list of tuples (GH16394)
- Bug in construction of a DataFrame with mixed dtypes including an all-NaT column. (GH16395)
- Bug in DataFrame.agg() and Series.agg() with aggregating on non-callable attributes (GH16405)
Numeric¶
- Bug in .interpolate(), where limit_direction was not respected when limit=None (default) was passed (GH16282)
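A minimal sketch of the fixed .interpolate() behavior, using a made-up Series with a leading and a trailing NaN. With the default limit=None, limit_direction decides which of the edge gaps is filled.

```python
import numpy as np
import pandas as pd

# Illustrative data: NaN at the start, in the middle, and at the end.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Forward fills gaps after a valid value, so the leading NaN is kept.
forward = s.interpolate(limit_direction="forward")

# Backward fills gaps before a valid value, so the trailing NaN is kept.
backward = s.interpolate(limit_direction="backward")

print(forward.tolist())   # [nan, 1.0, 2.0, 3.0, 3.0]
print(backward.tolist())  # [1.0, 1.0, 2.0, 3.0, nan]
```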
v0.20.1 (May 5, 2017)¶
This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- New .agg() API for Series/DataFrame similar to the groupby-rolling-resample APIs, see here
- Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see here.
- The .ix indexer has been deprecated, see here
- Panel has been deprecated, see here
- Addition of an IntervalIndex and Interval scalar type, see here
- Improved user API when grouping by index levels in .groupby(), see here
- Improved support for UInt64 dtypes, see here
- A new orient for JSON serialization, orient='table', that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see here
- Experimental support for exporting styled DataFrames (DataFrame.style) to Excel, see here
- Window binary corr/cov operations now return a MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated, see here
- Support for S3 handling now uses s3fs, see here
- Google BigQuery support now uses the pandas-gbq library, see here
Warning
Pandas has changed the internal structure and layout of the codebase.
This can affect imports that are not from the top-level pandas.* namespace; please see the changes here.
Check the API Changes and deprecations before updating.
Note
This is a combined release for 0.20.0 and 0.20.1.
Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas’ utils routines. (GH16250)
What’s new in v0.20.0
- New features
  - agg API for DataFrame/Series
  - dtype keyword for data IO
  - .to_datetime() has gained an origin parameter
  - Groupby Enhancements
  - Better support for compressed URLs in read_csv
  - Pickle file I/O now supports compression
  - UInt64 Support Improved
  - GroupBy on Categoricals
  - Table Schema Output
  - SciPy sparse matrix from/to SparseDataFrame
  - Excel output for styled DataFrames
  - IntervalIndex
  - Other Enhancements
- Backwards incompatible API changes
  - Possible incompatibility for HDF5 formats created with pandas < 0.13.0
  - Map on Index types now return other Index types
  - Accessing datetime fields of Index now return Index
  - pd.unique will now be consistent with extension types
  - S3 File Handling
  - Partial String Indexing Changes
  - Concat of different float dtypes will not automatically upcast
  - Pandas Google BigQuery support has moved
  - Memory Usage for Index is more Accurate
  - DataFrame.sort_index changes
  - Groupby Describe Formatting
  - Window Binary Corr/Cov operations return a MultiIndex DataFrame
  - HDFStore where string comparison
  - Index.intersection and inner join now preserve the order of the left Index
  - Pivot Table always returns a DataFrame
  - Other API Changes
- Reorganization of the library: Privacy Changes
- Deprecations
- Removal of prior version deprecations/changes
- Performance Improvements
- Bug Fixes
New features¶
agg API for DataFrame/Series¶
Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API
from groupby, window operations, and resampling. This allows aggregation operations in a concise way
by using agg() and transform(). The full documentation
is here (GH1623).
Here is a sample:
In [1]: df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
   ...:                  index=pd.date_range('1/1/2000', periods=10))
   ...: 
In [2]: df.iloc[3:7] = np.nan
In [3]: df
Out[3]: 
                   A         B         C
2000-01-01  1.474071 -0.064034 -1.282782
2000-01-02  0.781836 -1.071357  0.441153
2000-01-03  2.353925  0.583787  0.221471
2000-01-04       NaN       NaN       NaN
2000-01-05       NaN       NaN       NaN
2000-01-06       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN
2000-01-08  0.901805  1.171216  0.520260
2000-01-09 -1.197071 -1.066969 -0.303421
2000-01-10 -0.858447  0.306996 -0.028665
One can operate using string function names, callables, lists, or dictionaries of these.
Using a single function is equivalent to .apply.
In [4]: df.agg('sum')
Out[4]: 
A    3.456119
B   -0.140361
C   -0.431984
dtype: float64
Multiple aggregations with a list of functions.
In [5]: df.agg(['sum', 'min'])
Out[5]: 
            A         B         C
sum  3.456119 -0.140361 -0.431984
min -1.197071 -1.071357 -1.282782
Using a dict provides the ability to apply specific aggregations per column.
You will get a matrix-like output of all of the aggregators. The output has one row
per unique function. Functions that were not applied to a particular column will be NaN:
In [6]: df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
Out[6]: 
            A         B
max       NaN  1.171216
min -1.197071 -1.071357
sum  3.456119       NaN
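The NaN placement in the dict form can be checked on a small deterministic frame. This is a minimal sketch with made-up data. Cells are looked up by label because the row order of the result is not assumed here.

```python
import pandas as pd

# Illustrative data with known sums, minima and maxima.
small = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# 'sum' is requested only for A, 'max' only for B, 'min' for both.
result = small.agg({"A": ["sum", "min"], "B": ["min", "max"]})

print(result.loc["sum", "A"])            # 6.0
print(result.loc["min", "B"])            # 4.0
print(pd.isna(result.loc["sum", "B"]))   # True: 'sum' was not applied to B
print(pd.isna(result.loc["max", "A"]))   # True: 'max' was not applied to A
```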
The API also supports a .transform() function for broadcasting results.
In [7]: df.transform(['abs', lambda x: x - x.min()])
Out[7]: 
                   A                   B                   C          
                 abs  <lambda>       abs  <lambda>       abs  <lambda>
2000-01-01  1.474071  2.671143  0.064034  1.007322  1.282782  0.000000
2000-01-02  0.781836  1.978907  1.071357  0.000000  0.441153  1.723935
2000-01-03  2.353925  3.550996  0.583787  1.655143  0.221471  1.504252
2000-01-04       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-05       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-06       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-08  0.901805  2.098877  1.171216  2.242573  0.520260  1.803042
2000-01-09  1.197071  0.000000  1.066969  0.004388  0.303421  0.979361
2000-01-10  0.858447  0.338624  0.306996  1.378353  0.028665  1.254117
When presented with mixed dtypes that cannot be aggregated, .agg() will only take the valid
aggregations. This is similar to how groupby .agg() works. (GH15015)
In [8]: df = pd.DataFrame({'A': [1, 2, 3],
   ...:                    'B': [1., 2., 3.],
   ...:                    'C': ['foo', 'bar', 'baz'],
   ...:                    'D': pd.date_range('20130101', periods=3)})
   ...: 
In [9]: df.dtypes
Out[9]: 
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object
In [10]: df.agg(['min', 'sum'])
Out[10]: 
     A    B          C          D
min  1  1.0        bar 2013-01-01
sum  6  6.0  foobarbaz        NaT
dtype keyword for data IO¶
The 'python' engine for read_csv(), as well as the read_fwf() function for parsing
fixed-width text files and read_excel() for parsing Excel files, now accept the dtype keyword argument for specifying the types of specific columns (GH14295). See the io docs for more information.
In [11]: data = "a  b\n1  2\n3  4"
In [12]: pd.read_fwf(StringIO(data)).dtypes
Out[12]: 
a    int64
b    int64
dtype: object
In [13]: pd.read_fwf(StringIO(data), dtype={'a':'float64', 'b':'object'}).dtypes