
Issue with cubestore TableImport during preAggregation with unload: support more CSV quote options #7071


Description

@igorcalabria

Hi, I'm implementing unload support for the Presto/Trino driver and I'm seeing an odd issue with cubestore's import behavior. I've followed the same strategy as the Athena driver, but using CREATE TABLE instead of UNLOAD (since Trino doesn't implement UNLOAD).

What I'm seeing is that, sometimes, cubestore drops an arbitrary number of records from each file it imports. I've double-checked its logs and all files are reported as complete:

<pid:1> Running job completed (14.535796782s): IdRow { ...

To give you more context, the unload strategy is as follows (there's a rough sketch of the flow after the list):

  • Execute a CREATE TABLE ... AS ... query on Trino that writes the data in TEXTFILE format with gzip compression
  • Get the columns from the created table
  • List the files in the table's directory and generate signed S3 URLs
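
For concreteness, here is a minimal TypeScript sketch of that flow. It is not the actual driver code: the `queryFn` helper, the `hive.unload` catalog/schema, the `external_location` table property, and the returned shape are all illustrative assumptions; only the general CTAS-then-presign approach comes from the list above. The S3 part assumes AWS SDK v3.

```typescript
import { S3Client, ListObjectsV2Command, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

// queryFn stands in for the driver's own query method (assumption).
async function unloadPreAggregation(
  queryFn: (sql: string) => Promise<any[]>,
  selectSql: string,
  tableName: string,
  bucket: string,
  prefix: string,
) {
  // 1. Write the pre-aggregation as TEXTFILE data via CTAS. Gzip compression is
  //    assumed to come from the Hive connector's compression settings rather
  //    than a table property; pinning the S3 location via external_location is
  //    also an assumption about the setup.
  await queryFn(`
    CREATE TABLE hive.unload.${tableName}
    WITH (format = 'TEXTFILE', external_location = 's3://${bucket}/${prefix}/${tableName}')
    AS ${selectSql}
  `);

  // 2. Read the column names/types back from the created table.
  const columns = await queryFn(`DESCRIBE hive.unload.${tableName}`);

  // 3. List the files Trino wrote and pre-sign a GET URL for each one,
  //    so cubestore can import them straight from S3.
  const s3 = new S3Client({});
  const listed = await s3.send(new ListObjectsV2Command({
    Bucket: bucket,
    Prefix: `${prefix}/${tableName}/`,
  }));
  const csvFiles = await Promise.all(
    (listed.Contents ?? []).map((obj) =>
      getSignedUrl(s3, new GetObjectCommand({ Bucket: bucket, Key: obj.Key! }), { expiresIn: 3600 }),
    ),
  );

  return { columns, csvFiles };
}
```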

This is basically the Athena driver's strategy, just with CREATE TABLE instead of UNLOAD. I've compared the data Cube imported with the data in the table, but haven't found anything obvious yet. When the issue happens, about 50% of the records are missing in Cube overall, but broken down per imported file it varies a lot (from about 20% to almost 80%).

I realize this issue is basically impossible to reproduce, but I'm looking for advice on where cubestore might report that it successfully imported a file when the import was only partially successful.


Labels: cube store, help wanted, pre-aggregations
