pydata
Here are 72 public repositories matching this topic...
Series.reindex
Implement Series.reindex.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html
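Since the issue is about implementing reindex semantics, the core behavior can be sketched in plain Python as a reference for implementers (the function name and dict-based series representation here are illustrative, not the pandas API):

```python
import math

def reindex(data, new_index, fill_value=math.nan):
    """Conform a label -> value mapping to new_index.

    Labels already present keep their values; labels missing from
    data get fill_value (NaN by default), mirroring pandas semantics.
    """
    return {label: data.get(label, fill_value) for label in new_index}

s = {"a": 1.0, "b": 2.0, "c": 3.0}
r = reindex(s, ["b", "c", "d"])
# r["b"] == 2.0; r["d"] is NaN because "d" was not in the original index
```

Note that reindex is label-based, not positional: the result follows the order of the new index, and labels dropped from it simply disappear.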
It would be nice to have some general developer documentation for potential contributors, to help in cases such as #510, etc.
What are the best steps toward accomplishing this? Maybe something similar (albeit not with all the details needed) to the pandas developer docs?
I've begun an implementation of this on my fork, basically…
User @codyschank noticed that for small datasets, stumpy.stomp._stomp is faster than stumpy.stump. Here are some very rough timings (in seconds) from my 2-core laptop:

   length     stomp     stump  stomp/stump  stump/stomp
0     128  0.006628  0.018066     0.366867     2.725782
1     256  0.
In some workloads with highly compressible data, we would like to automatically trade off some computation time for more in-memory storage. Dask workers store data in a MutableMapping (the abstract base class that dict implements). So in principle all we would need to do is make a MutableMapping subclass that overrides the __getitem__ and __setitem__ methods to compress and decompress data on demand.
This would be an i
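A minimal sketch of that idea, assuming pickle-able values and zlib for compression (the class name and the eager compress-on-write policy are illustrative; a real Dask storage plugin would also need size accounting and spill-to-disk integration):

```python
import pickle
import zlib
from collections.abc import MutableMapping

class CompressedDict(MutableMapping):
    """Dict-like store that compresses values on write and
    decompresses them on read, trading CPU for memory."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        # Serialize then compress before storing.
        self._data[key] = zlib.compress(pickle.dumps(value))

    def __getitem__(self, key):
        # Decompress then deserialize on access.
        return pickle.loads(zlib.decompress(self._data[key]))

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Because MutableMapping derives the remaining dict methods (get, items, update, …) from these five, the subclass is a drop-in replacement anywhere a plain dict-like store is expected.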
janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the FASTA header.
Strawman implementation below (completed here as a sketch; the parameter names are illustrative):
import pandas_flavor as pf
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

@pf.register_dataframe_method
def to_fasta(df, filename, id_col, seq_col):
    # One record per row; id_col supplies the FASTA header line.
    records = (SeqRecord(Seq(row[seq_col]), id=str(row[id_col]), description="")
               for _, row in df.iterrows())
    SeqIO.write(records, filename, "fasta")
    return df  # return df so the method chains like other janitor functions
Now that airspeed-velocity/asv#449 is fixed, we could do a proper test of our benchmarks during CI.
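A rough sketch of what that CI step might look like (the specific flags and the quick-run trade-off are assumptions to adapt to the project's actual CI config):

```yaml
# hypothetical CI steps -- adjust to the project's CI system
- pip install asv
- asv machine --yes                            # register the CI machine non-interactively
- asv run --quick --show-stderr --python=same  # one sample per benchmark: verifies they run, not their timings
```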
In trying to write tests for #189, I'm finding it very difficult to add columns to existing tests: in some cases, such as the all_types table, the table is defined in a separate file from the tests, and multiple tests try to write to the same table.
Additionally, our test suite doesn't prove that the data uploaded are the same as the data downloaded for all types.
We should consider m
The Wikipedia page is terrible for this.
- used in PCA
- fraction of explained variance = R^2
- used in ANOVA
Overview: http://www-ist.massey.ac.nz/dstirlin/CAST/CAST/Hvariation/variation_b4.html
Some examples: http://onlinestatbook.com/2/effect_size/variance_explained.html
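To make the PCA connection concrete, here is a small numpy sketch (the toy data and shapes are made up): the fraction of variance explained by each principal component is that component's covariance eigenvalue divided by the sum of all eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
# Anisotropic toy data: three features with different spreads.
X = rng.normal(size=(200, 3)) * np.array([2.0, 1.0, 0.5])

Xc = X - X.mean(axis=0)  # center the data
# Eigenvalues of the covariance matrix, sorted descending.
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
# Fraction of total variance explained by each component; sums to 1.
explained = eigvals / eigvals.sum()
```

The first entry of `explained` is the share of variance captured by the leading principal component, which is the quantity the "fraction of explained variance" bullet refers to.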
If you join a Dask DataFrame on a categorical column, the resulting Dask DataFrame column is still category dtype. However, the moment you .compute() the resulting Dask DataFrame, the column has the wrong dtype: it is no longer categorical. Tested on Dask 2.14.0 and pandas 1.0.3.
In this example the category values look like floats, so after .compute() the dtype is float.