The Wayback Machine - https://web.archive.org/web/20201001235032/https://github.com/rapidsai/cudf/issues/5999
[FEA] Expand cuIO benchmarks coverage #5999

Open
vuule opened this issue Aug 17, 2020 · 1 comment

vuule commented Aug 17, 2020

Depends on #6000

Current cuIO benchmarks only use random data with low repetition. Encode/decode performance for some formats (ORC, Parquet, ...) varies significantly with the data profile.
Also, newly supported data types (like lists) are not covered.
As of now, the missing cases are:

  • Reading RLE encoded ORC files.
  • Reading RLE encoded Parquet files.
  • Writing RLE-friendly data to ORC files.
  • Writing RLE-friendly data to Parquet files.
  • Reading/writing ORC and Parquet files with low-cardinality data.
  • Reading Parquet files with list columns of varying nesting levels.
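
To make the data profiles in the list above concrete, here is a minimal sketch in plain Python. It is not the cuDF data generator API; every function name and parameter below (`high_entropy`, `rle_friendly`, `low_cardinality`, `nested_list`, `avg_run`, `cardinality`, `max_len`) is hypothetical and chosen only for illustration. It generates one integer column per profile and reports how many runs and distinct values each contains, which are the properties that RLE and dictionary encoders are sensitive to.

```python
import random
from itertools import groupby


def high_entropy(n, rng):
    """Random values with low repetition (the profile the issue says is already used)."""
    return [rng.randrange(2**31) for _ in range(n)]


def rle_friendly(n, rng, avg_run=64):
    """Long runs of identical values; run lengths are uniform in [1, 2 * avg_run]."""
    out = []
    while len(out) < n:
        value = rng.randrange(2**31)
        out.extend([value] * rng.randint(1, 2 * avg_run))
    return out[:n]


def low_cardinality(n, rng, cardinality=8):
    """At most `cardinality` distinct values, drawn in random order."""
    domain = [rng.randrange(2**31) for _ in range(cardinality)]
    return [rng.choice(domain) for _ in range(n)]


def nested_list(depth, rng, max_len=3):
    """One list value nested up to `depth` levels, with integer leaves.

    Inner lists may be empty, so `depth` is an upper bound on the nesting.
    """
    if depth == 0:
        return rng.randrange(100)
    return [nested_list(depth - 1, rng, max_len) for _ in range(rng.randint(0, max_len))]


def run_count(values):
    """Number of maximal runs of equal adjacent values."""
    return sum(1 for _ in groupby(values))


rng = random.Random(0)  # fixed seed so the generated columns are reproducible
n = 10_000
profiles = {
    "high_entropy": high_entropy(n, rng),
    "rle_friendly": rle_friendly(n, rng),
    "low_cardinality": low_cardinality(n, rng),
}
for name, col in profiles.items():
    print(f"{name}: runs={run_count(col)}, distinct={len(set(col))}")

# A list column with rows of varying nesting depth (1 to 3 levels).
list_column = [nested_list(depth, rng) for depth in (1, 2, 3)]
```

In a real benchmark, each column would be written to and read back from ORC or Parquet with the profile as a benchmark parameter; that part is omitted here because it depends on the data generator rework the issue refers to.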
harrism added the cuIO label Aug 17, 2020
harrism added this to Issue-Needs prioritizing in v0.16 Release via automation Aug 17, 2020
vuule self-assigned this Sep 1, 2020

vuule commented Sep 1, 2020

The data generator API has changed so much that I have to update all cuIO benchmarks. So I might as well add these cases.

harrism added this to Issue-Needs prioritizing in v0.17 Release via automation Oct 1, 2020
harrism removed this from Issue-Needs prioritizing in v0.16 Release Oct 1, 2020