dask

What happened:

When reading an empty parquet file with the chunksize argument, the error "IndexError: list index out of range" is raised. While using chunksize may seem irrelevant for an empty file, the use case here is reading files from an external source where it is not known a priori whether the file is empty (or really large).

What you expected to happen:

An empty dataframe

pydata/xarray#5865 (reply in thread)

I wonder if it's possible to implement a built-in function like:
da.str.format("%.2f") or xr.string_format(da, "%.2f")

To wrap:

import xarray as xr

da = xr.DataArray([5., 6., 7.])
das = xr.DataArray("%.2f")
das.str % da

<xarray.DataArray (dim_0: 3)>
array(['5.00', '6.00', '7.00'], dtype='<U4')
Dim
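
The workaround quoted above could be wrapped in a small helper along these lines. This is only a sketch: `string_format` is a hypothetical name taken from the request, not an existing xarray function, and it relies solely on the `das.str % da` pattern shown in the excerpt.

```python
import xarray as xr


def string_format(da, fmt):
    """Hypothetical helper: apply a %-style format string to every element.

    Wraps the pattern from the excerpt, xr.DataArray(fmt).str % da.
    """
    return xr.DataArray(fmt).str % da


da = xr.DataArray([5.0, 6.0, 7.0])
formatted = string_format(da, "%.2f")
print(formatted.values)
```

The helper keeps the format string as the first operand, so the element-wise formatting is still done by xarray's string accessor.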

Describe the bug

Failed to execute Series.drop_duplicates.

In [75]: a = md.DataFrame(np.random.rand(10, 2), columns=['a', 'b'], chunk_size=2)                  

In [76]: a['a'].drop_duplicates().execute()

The stumpy.snippets feature is now completed in #283 which follows this work:

We have a rough notebook t

Our coverage badge is a bit misleading showing coverage below 90%. This is due to us not collecting coverage in a few places. Also, we simply have a few modules which are only there for debugging and/or historical reasons

The most relevant parts (scheduler, worker, etc.) do have quite good coverage. I believe the <90% badge doesn't reflect well on the project and the wrong configuration creates

Describe the bug
According to the multiscene documentation, the property all_same_area does:

Determine if all contained Scenes have the same ‘area’.

However, I have created a multiscene where all scenes have the same area (they just differ between datasets), yet the property returns Fa

In sklearn's cross-validation functions, we can pass a groups parameter. I am looking for this option here,
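
For reference, the scikit-learn behaviour the request alludes to can be sketched as follows. The data and group labels are made up for illustration; only `GroupKFold` and its `split(X, y, groups=...)` method come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data (assumed for illustration): six samples in three groups.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3])

cv = GroupKFold(n_splits=3)
folds = list(cv.split(X, y, groups=groups))

for train_idx, test_idx in folds:
    # Samples from one group never appear on both sides of a split.
    shared = set(groups[train_idx]) & set(groups[test_idx])
    print(sorted(set(groups[test_idx])), "held out; shared groups:", shared)
```

Keeping groups intact matters when samples within a group are correlated, since a random split would otherwise leak information between training and test folds.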

Is your feature request related to a problem? Please describe.
Look at here

If we take just one row with our sorting, we can use GROUP BY and FIRST to solve this problem, which can be a lot faster. Let's add this special handling.
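
The idea can be illustrated with pandas as a stand-in (an assumption; the excerpt does not say which backend is meant). The column names and data are invented. Note that pandas' `first()` returns the first non-null value per column, so the two approaches only agree when the data has no nulls, as here.

```python
import pandas as pd

# Toy data (assumed for illustration).
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b", "b"],
        "score": [1, 3, 2, 5, 4],
        "label": ["x", "y", "z", "w", "v"],
    }
)
ordered = df.sort_values("score", ascending=False)

# General approach: keep the top n rows of each group (here n = 1).
top_by_head = (
    ordered.groupby("key").head(1).sort_values("key").reset_index(drop=True)
)

# Special case for n = 1: a single FIRST-style aggregation per group.
top_by_first = ordered.groupby("key").first().reset_index()

print(top_by_head)
print(top_by_first)
```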

Code Sample, a minimal, complete, and verifiable piece of code

from pyresample.boundary import Boundary
b = Boundary(my_lons, my_lats)
print(b.contour_poly.area())

Problem description

The above code doesn't fail if the provided lons/lats are 2D (not sure on 3D+), but the class and all functions/utilities underneath it assume 1D arrays. The end results are incor
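
One possible guard against this silent failure is to validate the input dimensions up front. The sketch below is an assumption, not pyresample code: `as_1d_boundary` is a hypothetical helper that would run before the coordinates are handed to `Boundary`.

```python
import numpy as np


def as_1d_boundary(lons, lats):
    """Hypothetical guard: reject anything that is not a pair of 1D arrays."""
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    if lons.ndim != 1 or lats.ndim != 1:
        raise ValueError(
            f"expected 1D lons/lats, got shapes {lons.shape} and {lats.shape}"
        )
    if lons.shape != lats.shape:
        raise ValueError("lons and lats must have the same length")
    return lons, lats


lons, lats = as_1d_boundary([0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0])

try:
    as_1d_boundary(np.zeros((2, 2)), np.zeros((2, 2)))
except ValueError as err:
    print("rejected:", err)
```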

The ML implementation is still a bit experimental - we can improve on this:

SHOW MODELS and DESCRIBE MODEL
Hyperparameter optimizations, AutoML-like behaviour
@romainr brought up the idea of exporting models (#191, still missing: onnx - see discussion in the PR by @rajagurunath)
and some more showcases and examples

from dask_jobqueue import SLURMCluster 
cluster = SLURMCluster(cores=1, memory='1GB') 
print(cluster.job_script())

#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=954M
#SBATCH -t 00:30:00

/home/lesteve/miniconda3/bin/python -m distributed.cli.dask_worker tcp://192.168.0.11:44065 --nthreads 1 --memory-limit 1000.00MB -
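
The two memory figures in the script above (`--mem=954M` and `--memory-limit 1000.00MB`) are consistent with the same 1 GB request expressed in different units. The arithmetic below assumes that the SLURM value is in mebibytes (2**20 bytes), rounded up, and that the worker limit is in decimal megabytes; it is not taken from the dask-jobqueue source.

```python
import math

MIB = 2**20  # bytes in one mebibyte
MB = 10**6   # bytes in one decimal megabyte

requested_bytes = 10**9  # memory='1GB', read as a decimal gigabyte

# Round up so the scheduler never grants less than what was requested.
slurm_mem = math.ceil(requested_bytes / MIB)
worker_limit = requested_bytes / MB

print(f"#SBATCH --mem={slurm_mem}M")
print(f"--memory-limit {worker_limit:.2f}MB")
```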

Problem description

Reading a dataset with eager's read functionality raises a ValueError when providing columns.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data

Helping

In determining the correct reader for the file provided we currently have two options (as of #224).

Providing the reader param to AICSImage (i.e. img = AICSImage("s3://some-file.ext", reader=readers.lif_reader.LifReader))
Not providing a reader, and AICSImage looping over all SUPPORTED_READERS.
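
The two options can be sketched with toy classes. Everything below is hypothetical and does not reproduce the aicsimageio API: `ToyReader`, its subclasses, `TOY_SUPPORTED_READERS`, and `select_reader` are stand-ins that only illustrate the control flow (explicit reader first, otherwise loop over candidates).

```python
class ToyReader:
    """Hypothetical base class; decides support from the file extension."""

    extensions = ()

    def __init__(self, path):
        self.path = path

    @classmethod
    def is_supported(cls, path):
        return path.lower().endswith(cls.extensions)


class ToyLifReader(ToyReader):
    extensions = (".lif",)


class ToyTiffReader(ToyReader):
    extensions = (".tif", ".tiff")


TOY_SUPPORTED_READERS = [ToyLifReader, ToyTiffReader]


def select_reader(path, reader=None):
    # Option 1: the caller names the reader, so no probing is needed.
    if reader is not None:
        return reader(path)
    # Option 2: try each known reader in order until one accepts the file.
    for candidate in TOY_SUPPORTED_READERS:
        if candidate.is_supported(path):
            return candidate(path)
    raise ValueError(f"no supported reader found for {path!r}")


print(type(select_reader("s3://some-file.lif")).__name__)
print(type(select_reader("s3://some-file.ext", reader=ToyLifReader)).__name__)
```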

Option 1 is the fastest + safest method for loading a file into AICSImage (without using

Currently all of the metrics computed are independent of a target variable or column, but if lens.summarise took the name of a column as the target variable, the output of some metrics could be more interpretable even if the target variable is not used in any kind of predictive modelling.

A good example of this could be PCA (see #14), which could plot the different categories of the target va

Without thinking I put resampling="bilinear" and got an error when I called .compute()

Traceback (most recent call last):
  File "carajas.py", line 92, in <module>
    band_medianNP = band_median.compute()
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/dataarray.py", line 899, in compute
    return new.load(**kwargs)
  File "/home/ubuntu/anaco

As per @krfricke, we should create a single RayCommunication actor class that holds both an asyncio Queue and asyncio Event. This way we only need one remote actor for all communication between training workers and the driver, and we don't have to be dependent on ray.util.Queue
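
A minimal sketch of such a class is shown below in plain asyncio. It is an assumption about the shape of the proposal, not the project's implementation: the class name `Communication` and its methods are hypothetical, and in Ray the class would presumably be turned into an actor (for example with `@ray.remote`), which is left out here so the example runs without Ray.

```python
import asyncio


class Communication:
    """Hypothetical single holder for both the queue and the event."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self._event = asyncio.Event()

    async def put(self, item):
        await self._queue.put(item)

    async def drain(self):
        # Return everything queued so far without blocking.
        items = []
        while not self._queue.empty():
            items.append(self._queue.get_nowait())
        return items

    async def set_event(self):
        self._event.set()

    async def is_set(self):
        return self._event.is_set()


async def demo():
    comm = Communication()  # created inside the running event loop
    await comm.put({"worker": 0, "iteration": 1})
    await comm.put({"worker": 1, "iteration": 1})
    await comm.set_event()
    return await comm.drain(), await comm.is_set()


results, stopped = asyncio.run(demo())
print(results, stopped)
```

With both primitives behind one object, workers and the driver only need a single handle to exchange results and signal a stop.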

Here are 276 public repositories matching this topic...

dask / dask

pydata / xarray

mars-project / mars

TDAmeritrade / stumpy

jmcarpenter2 / swifter

dask / distributed

hi-primus / optimus

itamarst / eliot

pytroll / satpy

DataCanvasIO / HyperGBM

fugue-project / fugue

ranaroussi / pystore

mouradmourafiq / datatile

timkpaine / paperboy

pytroll / pyresample

JiaweiZhuang / xESMF

dask-contrib / dask-sql

dask / dask-jobqueue

JDASoftwareGroup / kartothek

pangeo-data / climpred

hi-primus / bumblebee

dask / dask-ec2

AllenCellModeling / aicsimageio

LDO-CERT / orochi

facultyai / lens

gjoseph92 / stackstac

dymaxionlabs / dask-rasterio

polyaxon / mloperator

ray-project / xgboost_ray

chmp / framequery
