[FEA] Expand cuIO benchmarks coverage #5999

vuule · 2020-08-17T05:01:21Z

Depends on #6000

The current cuIO benchmarks use only random data with little repetition. However, encode/decode performance for some formats (ORC, Parquet, ...) varies significantly depending on the data profile.
Also, newly supported data types (such as lists) are not covered.
As of now, the missing cases are:

Reading RLE encoded ORC files.
Reading RLE encoded Parquet files.
Writing RLE-friendly data to ORC files.
Writing RLE-friendly data to Parquet files.
Reading/writing ORC and Parquet files with low-cardinality data.
Reading Parquet files with list columns of varying nesting levels.
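To illustrate why the data profile matters for the RLE cases above, here is a minimal sketch in plain Python (not the cuIO/libcudf API). `generate_column` is a hypothetical generator that controls cardinality and average run length, and `run_length_encode` is a textbook RLE, not the encoding that ORC or Parquet actually use. The point is only that RLE-friendly or low-cardinality data collapses into far fewer runs than high-cardinality random data, so the encoders and decoders do very different amounts of work.

```python
import random


def generate_column(num_rows, cardinality, avg_run_length, seed=0):
    """Hypothetical generator: ints drawn from `cardinality` distinct values,
    emitted in runs whose lengths average roughly `avg_run_length` (>= 1)."""
    rng = random.Random(seed)
    values = []
    while len(values) < num_rows:
        value = rng.randrange(cardinality)
        # Uniform run length in [1, 2 * avg_run_length - 1], so the mean is avg_run_length.
        run = rng.randint(1, 2 * avg_run_length - 1)
        values.extend([value] * run)
    return values[:num_rows]


def run_length_encode(values):
    """Textbook RLE: a list of (value, count) pairs for consecutive equal values."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, c) for v, c in runs]


# Profile similar to the current benchmarks: high cardinality, no repetition.
random_data = generate_column(10_000, cardinality=1_000, avg_run_length=1)
# RLE-friendly profile: same size, long runs of repeated values.
rle_friendly = generate_column(10_000, cardinality=1_000, avg_run_length=32)

print(len(run_length_encode(random_data)), len(run_length_encode(rle_friendly)))
```

The real formats use more elaborate schemes (for example, Parquet's RLE/bit-packing hybrid and dictionary encoding), so this sketch only shows the trend the new benchmark cases are meant to capture, not actual cuIO performance.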

vuule · 2020-09-01T23:39:04Z

The data generator API has changed so much that I have to update all cuIO benchmarks. So I might as well add these cases.

vuule added the feature request and tech debt labels Aug 17, 2020

harrism added the cuIO label Aug 17, 2020

harrism added this to Issue-Needs prioritizing in v0.16 Release via automation Aug 17, 2020

vuule added the good first issue label Aug 17, 2020

vuule self-assigned this Sep 1, 2020

vuule added the 2 - In Progress label Sep 4, 2020

vuule mentioned this issue Sep 8, 2020

[REVIEW] Data profile support in random data generator; Expand cuIO benchmarks #6174

Merged

harrism added this to Issue-Needs prioritizing in v0.17 Release via automation Oct 1, 2020

harrism removed this from Issue-Needs prioritizing in v0.16 Release Oct 1, 2020
