pydata

If you join Dask DataFrames on a categorical column, the resulting Dask DataFrame column still has category dtype. However, as soon as you call .compute() on the result, the column has the wrong dtype and is no longer categorical.

Tested on Dask 2.14.0 and Pandas 1.0.3
In this example the categories look like floats, so after .compute() the dtype is float.

import dask.d

Implement Series.reindex.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html

It would be nice to have some general developer documentation for potential contributors to help in cases such as #510, etc.

What are the best steps to take towards accomplishing this? Maybe something similar (albeit not all details needed) to the Pandas developer docs?

I've begun an implementation of this on my fork, basicall

@codyschank

User @codyschank noticed that for small datasets, stumpy.stomp._stomp is faster than stumpy.stump. Here are some very rough timings from my 2-core laptop:

    length       stomp       stump  stomp/stump   stump/stomp
0      128    0.006628    0.018066     0.366867      2.725782
1      256    0.

In some workloads with highly compressible data we would like to automatically trade some computation time for more in-memory storage. Dask workers store data in a MutableMapping (the abstract base class that dict implements). So in principle all we would need to do is make a MutableMapping subclass that overrides the __getitem__ and __setitem__ methods to compress and decompress data on demand.
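A minimal sketch of that idea, using only the standard library. The class name CompressedDict, the helper stored_nbytes, and the choice of pickle plus zlib are assumptions made for illustration; this is not the Dask implementation, and a real version would likely pick a faster codec and skip values that do not compress well.

```python
import pickle
import zlib
from collections.abc import MutableMapping


class CompressedDict(MutableMapping):
    """Hypothetical mapping that keeps values as zlib-compressed pickles."""

    def __init__(self, level=6):
        self._store = {}  # key -> compressed bytes
        self._level = level  # zlib compression level (0-9)

    def __setitem__(self, key, value):
        # Serialize, then compress on the way in.
        self._store[key] = zlib.compress(pickle.dumps(value), self._level)

    def __getitem__(self, key):
        # Decompress, then deserialize on the way out.
        return pickle.loads(zlib.decompress(self._store[key]))

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def stored_nbytes(self):
        # Total size of the compressed payloads currently held.
        return sum(len(blob) for blob in self._store.values())


data = CompressedDict()
data["x"] = [0] * 100_000  # highly compressible value
```

Every read pays the decompression cost, so each access returns a fresh copy of the value rather than the original object; that is the computation-for-memory trade described above.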

This would be an i

janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the FASTA header.

Strawman implementation below:

import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO

@pf.register_dataframe_method
def to_fasta(d

Now that airspeed-velocity/asv#449 is fixed, we could do a proper test of our benchmarks during CI.

In trying to write tests for #189, I'm finding it very difficult to add columns to existing tests. In some cases, such as the all_types table, the table is defined in a separate file from the tests, and multiple tests try to write to the same table.

Additionally, our test suite doesn't prove that the data that are uploaded are the same as the data downloaded for all types.

We should consider m

The Wikipedia page is terrible for this.

used in PCA
fraction of explained variance = R^2
used in ANOVA
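As a small illustration of the "fraction of explained variance = R^2" note, here is a sketch using the usual regression definition, R^2 = 1 - SS_res / SS_tot. The function name r_squared and the sample numbers are made up for the example.

```python
def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    # Total sum of squares: variation of y around its mean.
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    # Residual sum of squares: variation left unexplained by y_hat.
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
score = r_squared(y, y_hat)
```

Predicting the mean of y for every point gives a score of 0, and a perfect fit gives 1.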

Just an overview: http://www-ist.massey.ac.nz/dstirlin/CAST/CAST/Hvariation/variation_b4.html

Some examples:
http://onlinestatbook.com/2/effect_size/variance_explained.html


Here are 72 public repositories matching this topic...

dask / dask

databricks / koalas

pydata / pandas-datareader

TDAmeritrade / stumpy

dask / distributed

ericmjl / pyjanitor

DataTau / datascience-anthology-pydata

rasbt / pydata-chicago2016-ml-tutorial

JasonKessler / Scattertext-PyData

JDASoftwareGroup / kartothek

omnisci / pymapd

WinVector / pyvtreat

mattilyra / pydataberlin-2017

martinapugliese / tales-science-data

gcampanella / pydata-london-2018

pydataberlin / meetup-slides

yinleon / pydata2017

josephofiowa / pydata-dc-2018

dimgold / pycon_social_networkx

PyDataKR / pydata.kr

cytora / clickbait-workshop

bweigel / ml_at_awslambda_pydatabln2018

Shinichi-Nakagawa / scrapy-sample-baseball

sachin-kmr / Neural-Image-Captioning

AlexIoannides / lime-interpretable-ml

quasiben / kubernetes-pydata-parallel

TwentyBN / 20bn-video-data-loading-talk

GapData / PyDataBratislava

koaning / kadro

pydatacharlotte / effortless_rest_flask
