The Wayback Machine - https://web.archive.org/web/20200702225918/https://github.com/topics/dask

dask

Here are 152 public repositories matching this topic...

eugeneh101 commented Apr 24, 2020

If you join a Dask DataFrame on a categorical column, the resulting Dask DataFrame column still has category dtype. However, as soon as you call .compute() on the result, the column has the wrong dtype: it is no longer categorical.

Tested on Dask 2.14.0 and pandas 1.0.3.
In this example the categories look like floats, so after .compute() the dtype is float.

import dask.d
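The reproducer in the issue is truncated above. As a pandas-only sketch (not the issue's code, and not a confirmed diagnosis of the Dask bug), the block below merges frames whose float-valued categorical keys have either identical or different categories, then reports the key's dtype. pandas documents that merge keeps the category dtype only when both sides' categorical dtypes match, and otherwise coerces the key to the categories' own dtype, which for float categories means float64. The helper `merged_key_is_categorical` and the sample frames are hypothetical.

```python
import pandas as pd


def merged_key_is_categorical(left: pd.DataFrame, right: pd.DataFrame, key: str) -> bool:
    """Hypothetical helper: inner-merge on `key` and report whether it stays categorical."""
    merged = left.merge(right, on=key)
    return isinstance(merged[key].dtype, pd.CategoricalDtype)


# Float-valued categories, mirroring the "looks like a float" case in the issue.
full = pd.CategoricalDtype([1.0, 2.0, 3.0])
left = pd.DataFrame({"key": pd.Series([1.0, 2.0, 3.0], dtype=full), "a": [10, 20, 30]})

# Same categorical dtype on both sides: the merged key should stay categorical.
right_same = pd.DataFrame({"key": pd.Series([1.0, 2.0], dtype=full), "b": ["x", "y"]})

# Categories inferred from the data ([1.0, 2.0]) differ from the left side's.
right_diff = pd.DataFrame({"key": pd.Series([1.0, 2.0], dtype="category"), "b": ["x", "y"]})

print(merged_key_is_categorical(left, right_same, "key"))
print(merged_key_is_categorical(left, right_diff, "key"))
print(left.merge(right_diff, on="key")["key"].dtype)
```

On the Dask side, comparing `ddf.dtypes` with `ddf.compute().dtypes` for the merged frame is a direct way to see the lazy-versus-computed mismatch the issue describes.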
mrocklin commented Mar 28, 2020

In some workloads with highly compressible data, we would like to automatically trade some computation time for more in-memory storage. Dask workers store data in a MutableMapping (the abstract base class that dict implements). So in principle, all we would need is a MutableMapping subclass whose __setitem__ compresses data on write and whose __getitem__ decompresses it on read.

This would be an i
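The idea above can be sketched with the standard library alone. This is a minimal illustration, not Dask's implementation: the class name `CompressedDict`, the use of `pickle` for serialization, `zlib` for compression, and the default compression level are all assumptions.

```python
import pickle
import zlib
from collections.abc import MutableMapping
from typing import Any, Iterator


class CompressedDict(MutableMapping):
    """Hypothetical mapping that keeps every value pickled and zlib-compressed."""

    def __init__(self, level: int = 6) -> None:
        self._store = {}  # key -> compressed pickle bytes
        self._level = level  # zlib compression level (0-9)

    def __setitem__(self, key: Any, value: Any) -> None:
        # Serialize the value, then compress it before storing.
        self._store[key] = zlib.compress(pickle.dumps(value), self._level)

    def __getitem__(self, key: Any) -> Any:
        # Decompress, then deserialize; raises KeyError for missing keys.
        return pickle.loads(zlib.decompress(self._store[key]))

    def __delitem__(self, key: Any) -> None:
        del self._store[key]

    def __iter__(self) -> Iterator[Any]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)

    def compressed_nbytes(self) -> int:
        """Total bytes held by the compressed payloads (excluding dict overhead)."""
        return sum(len(blob) for blob in self._store.values())


# Usage: a highly compressible 1 MB payload shrinks to a small fraction in memory.
data = CompressedDict()
data["zeros"] = b"\x00" * 1_000_000
assert data["zeros"] == b"\x00" * 1_000_000
print(len(data), data.compressed_nbytes())
```

Because only compressed bytes stay resident, every read pays for decompression and unpickling; a small cache of recently used decompressed values would be a natural refinement but is not shown here.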

RoscoeTheDog commented Sep 4, 2019

Hello. I am trying to migrate my project from basic logging to something more advanced, and someone on Reddit recommended this module. I have been through the quick-start guide and the other available documentation and have some very basic questions about the API.

How can I parse the logs and format them for the stdout?

Is there a way to stream what's being written to the log, just like the

pystore
yohplala commented Jan 6, 2020

Hello,

I haven't tested append() yet, and I was wondering whether duplicates are removed when appending.
I had a look at the collection.py script, and the following Dask calls (which mirror the pandas API) are used:
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")

Having looked at the pandas documentation, I understand that duplicate rows are removed and only the last occurrence is kept.
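That reading is right, with one caveat worth checking: pandas `DataFrame.drop_duplicates` compares column values only and ignores the index. In a time-indexed store, two rows with identical values at different timestamps therefore count as duplicates, while two rows at the same timestamp with different values do not. The sketch below mimics the concat-then-dedupe step with plain pandas on made-up data (`current` and `new` are hypothetical frames); that Dask's `drop_duplicates` follows the same semantics is assumed, not verified here.

```python
import pandas as pd

# Hypothetical existing data and a new batch to append, indexed by timestamp.
current = pd.DataFrame(
    {"price": [1.0, 2.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02"]),
)
new = pd.DataFrame(
    {"price": [2.5, 1.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)

# Same pattern as the append logic quoted above, with pandas instead of Dask.
# Rows are compared by column values only, so the 2020-01-01 row is dropped
# (its price matches a later row) and 2020-01-02 appears twice.
combined = pd.concat([current, new]).drop_duplicates(keep="last")
print(combined)

# Deduplicating on the index instead keeps the newest row per timestamp.
by_index = pd.concat([current, new])
by_index = by_index[~by_index.index.duplicated(keep="last")]
print(by_index)
```

Which variant is correct depends on whether "duplicate" should mean "same values" or "same timestamp"; the index-based version is shown only as an alternative, not as pystore's behavior.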

Plantain commented Sep 30, 2019

I'm using nearest_s2d in combination with the add_matrix_NaNs 'hack' documented in #15, but with nearest_s2d the nearest value gets 'smeared' all the way to the border, whereas I would expect it to behave like bilinear, with the 'outside' values left missing instead.
Is this the intended behaviour? Can I work around this somehow by masking the output again?

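One way to try the re-masking idea, sketched with NumPy under stated assumptions: `bilinear_out` and `nearest_out` stand in for the two regridded fields on the same destination grid (made-up arrays here, not real xESMF output), and the NaN footprint of the bilinear result is taken as the mask of points outside the source domain. The helper `mask_like` is hypothetical.

```python
import numpy as np


def mask_like(reference: np.ndarray, field: np.ndarray) -> np.ndarray:
    """Hypothetical helper: copy `field`, setting NaN wherever `reference` is NaN."""
    if reference.shape != field.shape:
        raise ValueError("reference and field must share the destination grid shape")
    return np.where(np.isnan(reference), np.nan, field)


# Stand-in results on a 3x4 destination grid; the last column lies outside
# the source domain, so bilinear leaves it NaN while nearest "smears" into it.
bilinear_out = np.array(
    [[1.0, 1.5, 2.0, np.nan],
     [1.2, 1.7, 2.2, np.nan],
     [1.4, 1.9, 2.4, np.nan]]
)
nearest_out = np.array(
    [[1.0, 2.0, 2.0, 2.0],
     [1.0, 2.0, 2.0, 2.0],
     [2.0, 2.0, 3.0, 3.0]]
)

masked_nearest = mask_like(bilinear_out, nearest_out)
print(masked_nearest)
```

With xarray DataArrays, the same idea would be `nearest_out.where(bilinear_out.notnull())`. This relies on computing the bilinear result as well, so it doubles the regridding work.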

rabernat commented Mar 27, 2019

Over in ECCO-GROUP/ECCOv4-py#6, @ifenty reported some difficulty using open_mdsdataset to read data from ECCO. Some of this is likely due to our lousy error messages (see #126), but it is also likely related to overall deficiencies in our documentation.

This is what I wrote in that thread.

I think a big source of confusion is that the user-facing part of xmitgcm is designed not to read j
