Rewrite statistics during OPTIMIZE in Iceberg#20885

Merged
ebyhr merged 4 commits into
trinodb:master from
pajaks:pajaks/rewrite_stats_optimize
Mar 11, 2024

Conversation

@pajaks
Member

@pajaks pajaks commented Feb 29, 2024

Description

Fixes #19992

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot Bot added the cla-signed label Feb 29, 2024
@pajaks pajaks changed the title Rewrite statistics during OPTIMIZE @pajaks Rewrite statistics during OPTIMIZE Feb 29, 2024
@pajaks pajaks changed the title Rewrite statistics during OPTIMIZE Rewrite statistics during OPTIMIZE in Iceberg Feb 29, 2024
@findinpath findinpath self-requested a review February 29, 2024 15:07
@pajaks pajaks requested review from ebyhr and findepi March 1, 2024 08:15

// For optimize we need to set task_min_writer_count to 1, otherwise it will create more than one file.
computeActual(withSingleWriterPerTask(getSession()), "ALTER TABLE " + tableName + " EXECUTE OPTIMIZE WHERE regionkey = 4");
computeActual(withSingleWriterPerTask(getSession()), "ALTER TABLE " + tableName + " EXECUTE OPTIMIZE WHERE regionkey = 3");
Member Author

Row deleted has regionkey = 3

@findepi findepi requested a review from alexjo2144 March 5, 2024 11:55
Comment thread plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java Outdated
Contributor

Can this happen in case of a rollback_to_snapshot ?

Member Author

I think rollback_to_snapshot can cause that in some special case involving a race condition. I followed what is done for finishInsert, where we also write the statistics file.

// TODO (https://github.com/trinodb/trino/issues/15439): it would be good to publish data and stats atomically
beforeWriteSnapshotId.ifPresent(previous ->
verify(previous != newSnapshotId, "Failed to get new snapshot ID "));
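For readers unfamiliar with the guard quoted above, here is a minimal standalone sketch of the same check. It is an illustration only, not the code in IcebergMetadata: the class name `SnapshotGuard` and the method `verifyAdvanced` are hypothetical, and a plain `IllegalStateException` stands in for the `verify` call so the example needs no extra dependencies.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of Trino.
final class SnapshotGuard
{
    private SnapshotGuard() {}

    // Returns newSnapshotId if it differs from the snapshot ID observed before the write.
    // When there was no snapshot before the write (empty Optional), nothing is checked.
    static long verifyAdvanced(Optional<Long> beforeWriteSnapshotId, long newSnapshotId)
    {
        beforeWriteSnapshotId.ifPresent(previous -> {
            // previous is unboxed here, so this compares numeric values, not references
            if (previous == newSnapshotId) {
                throw new IllegalStateException("Failed to get new snapshot ID");
            }
        });
        return newSnapshotId;
    }
}
```

The check only catches the case where the commit did not advance the snapshot at all. It does not make the data commit and the statistics write atomic, which is what the TODO above refers to.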

Comment thread plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java Outdated
@pajaks pajaks force-pushed the pajaks/rewrite_stats_optimize branch from eddabb0 to d2338f4 Compare March 6, 2024 12:24
@github-actions github-actions Bot added the iceberg Iceberg connector label Mar 6, 2024
@pajaks pajaks force-pushed the pajaks/rewrite_stats_optimize branch from d2338f4 to 7345d65 Compare March 6, 2024 13:43
@pajaks pajaks requested review from alexjo2144 and findinpath March 8, 2024 11:32
Comment thread plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java Outdated
@pajaks pajaks force-pushed the pajaks/rewrite_stats_optimize branch from 53eccd9 to e424326 Compare March 11, 2024 09:18
@ebyhr ebyhr merged commit 947f972 into trinodb:master Mar 11, 2024
@github-actions github-actions Bot added this to the 441 milestone Mar 11, 2024

Labels

cla-signed iceberg Iceberg connector

5 participants