The Wayback Machine - https://web.archive.org/web/20200510044611/https://github.com/topics/pydata
#

pydata

Here are 72 public repositories matching this topic...

eugeneh101 commented Apr 24, 2020

If you join Dask DataFrames on a categorical column, the resulting Dask DataFrame column still has category dtype. However, as soon as you call .compute() on the result, the column has the wrong dtype and is no longer categorical.

Tested on Dask 2.14.0 and Pandas 1.0.3
In this example the categories look like floats, so after .compute() the dtype is float.

import dask.d
addisonlynch commented Apr 14, 2018

It would be nice to have some general developer documentation for potential contributors to help in cases such as #510, etc.

What are the best steps to take towards accomplishing this? Maybe something similar to the Pandas developer docs, though not all of that detail is needed?

I've begun an implementation of this on my fork, basicall

mrocklin commented Mar 28, 2020

In some workloads with highly compressible data, we would like to automatically trade some computation time for more in-memory storage. Dask workers store data in a MutableMapping (the abstract base class whose interface dict implements). So in principle all we would need to do is write a MutableMapping subclass whose __setitem__ and __getitem__ methods compress and decompress data on demand.

This would be an i
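
The MutableMapping idea in this comment can be sketched with the standard library alone. This is a minimal illustration, not Dask's implementation: the class name `CompressedDict`, the `stored_bytes` helper, and the choice of pickle plus zlib are all assumptions made for the example.

```python
import pickle
import zlib
from collections.abc import MutableMapping


class CompressedDict(MutableMapping):
    """Hypothetical mapping that keeps values as zlib-compressed pickles."""

    def __init__(self, level=6):
        self._store = {}     # key -> compressed bytes
        self._level = level  # zlib compression level (assumed default)

    def __setitem__(self, key, value):
        # Serialize, then compress, before the value is stored.
        self._store[key] = zlib.compress(pickle.dumps(value), self._level)

    def __getitem__(self, key):
        # Decompress and deserialize on every read; a missing key raises KeyError.
        return pickle.loads(zlib.decompress(self._store[key]))

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def stored_bytes(self):
        # Total size of the compressed payloads currently held.
        return sum(len(blob) for blob in self._store.values())


data = CompressedDict()
data["x"] = [0] * 100_000  # highly compressible value
```

Every read pays the decompression cost, which is the computation-for-memory trade described above. A real worker store would also need to decide which values are worth compressing at all.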

ericmjl commented Mar 12, 2020

janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the FASTA header.

strawman implementation below:

import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO

@pf.register_dataframe_method
def to_fasta(d
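
The excerpt above is cut off, so the original strawman is not reproduced here. As a separate, hedged illustration of the export being described, the sketch below writes FASTA records from plain dictionaries using only the standard library. It deliberately swaps out pandas_flavor and Biopython; the function name `write_fasta`, its parameters, and the 60-character line width are assumptions for the example.

```python
import io


def write_fasta(rows, header_key, sequence_key, handle, width=60):
    """Write each row as one FASTA record; return the number of records written.

    rows: iterable of dict-like records (hypothetical input shape)
    header_key / sequence_key: which fields supply the header and the sequence
    """
    count = 0
    for row in rows:
        # A FASTA record starts with a '>' header line.
        handle.write(f">{row[header_key]}\n")
        seq = str(row[sequence_key])
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


rows = [
    {"name": "seq1", "sequence": "ACGTACGT"},
    {"name": "seq2", "sequence": "GGCC"},
]
buffer = io.StringIO()
n_written = write_fasta(rows, "name", "sequence", buffer)
fasta_text = buffer.getvalue()
```

With a pandas DataFrame, the rows could be obtained from `df.to_dict("records")`, so that one column supplies the header and another the sequence.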
randyzwitch commented Mar 28, 2019

In trying to write tests for #189, I'm finding it very difficult to add columns to existing tests, because in some cases, such as the all_types table, the table is defined in a separate file from the tests and multiple tests try to write to the same table.

Additionally, our test suite doesn't prove that the data that are uploaded are the same as the data downloaded for all types.

We should consider m
