Skip to content

Fix hudi connector gets stuck #19506#20027

Merged
electrum merged 2 commits into
trinodb:masterfrom
willzgw:fix_issue_19506
Mar 7, 2025
Merged

Fix hudi connector gets stuck #19506#20027
electrum merged 2 commits into
trinodb:masterfrom
willzgw:fix_issue_19506

Conversation

@willzgw
Copy link
Copy Markdown
Contributor

@willzgw willzgw commented Dec 5, 2023

Description

  • Fix HUDI connector gets stuck when an exception occurs
    • Solve HUDI connector stuck by using ExceptionCallback to ensure the asyncQueue is closed when an exception occurs.
    • Optimize the handling of multiple Futures with Futures.whenAllComplete instead of for_loop.
  • Add a new HUDI session property: ignore_absent_partitions
    • Boolean property.
    • When set to false, the Trino HUDI connector will report an error if the partition does not exist; when set to true, non-existing partitions will be ignored.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix: Trino Hudi connector gets stuck while attempting to read empty partitions of a partitioned Hudi table. ({issue}`19506 `)
* New HUDI session property: ignore_absent_partitions
@cla-bot cla-bot Bot added the cla-signed label Dec 5, 2023
@github-actions github-actions Bot added the hudi Hudi connector label Dec 5, 2023
Copy link
Copy Markdown
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test?

Copy link
Copy Markdown
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a test for this?

Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java Outdated
@willzgw
Copy link
Copy Markdown
Contributor Author

willzgw commented Dec 13, 2023

Hi @electrum
I'm not quite sure what to do when we get an empty partition, ignore it or throw an exception?

@uroell
Copy link
Copy Markdown

uroell commented Jan 3, 2024

Hi @electrum I'm not quite sure what to do when we get an empty partition, ignore it or throw an exception?

Hi @willzgw, thank you very much for your contribution! This helps a lot. Just a suggestion for your question: In the hive connector there is this property called "hive.ignore-absent-partitions" with default "false" to switch the behavior. Maybe you could implement the same for the hudi connector?

@codesorcery
Copy link
Copy Markdown
Contributor

codesorcery commented Jan 23, 2024

#20151 fixes the underlying problem of Trino failing on empty Hudi partitions. So when this PR gets merged, there should be no reason to handle it here.
So it should be enough to reduce the scope of this PR to something like

--- a/plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/split/HudiBackgroundSplitLoader.java
+++ b/plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/split/HudiBackgroundSplitLoader.java
@@ -14,6 +14,7 @@
 package io.trino.plugin.hudi.split;
 
 import com.google.common.util.concurrent.Futures;
+import com.google.common.util.concurrent.ListenableFuture;
 import io.trino.plugin.hive.util.AsyncQueue;
 import io.trino.plugin.hudi.HudiTableHandle;
 import io.trino.plugin.hudi.partition.HudiPartitionInfoLoader;
@@ -28,8 +29,8 @@ import java.util.List;
 import java.util.concurrent.ConcurrentLinkedDeque;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.Executor;
-import java.util.concurrent.Future;
 
+import static com.google.common.util.concurrent.MoreExecutors.directExecutor;
 import static io.trino.plugin.hudi.HudiErrorCode.HUDI_CANNOT_OPEN_SPLIT;
 import static io.trino.plugin.hudi.HudiSessionProperties.getSplitGeneratorParallelism;
 import static java.util.Objects.requireNonNull;
@@ -66,7 +67,7 @@ public class HudiBackgroundSplitLoader
     {
         Deque<String> partitionQueue = new ConcurrentLinkedDeque<>(partitions);
         List<HudiPartitionInfoLoader> splitGeneratorList = new ArrayList<>();
-        List<Future> splitGeneratorFutures = new ArrayList<>();
+        List<ListenableFuture<Void>> splitGeneratorFutures = new ArrayList<>();
 
         // Start a number of partition split generators to generate the splits in parallel
         for (int i = 0; i < splitGeneratorNumThreads; i++) {
@@ -79,16 +80,18 @@ public class HudiBackgroundSplitLoader
             // Let the split generator stop once the partition queue is empty
             generator.stopRunning();
         }
-
-        // Wait for all split generators to finish
-        for (Future future : splitGeneratorFutures) {
-            try {
-                future.get();
-            }
-            catch (InterruptedException | ExecutionException e) {
-                throw new TrinoException(HUDI_CANNOT_OPEN_SPLIT, "Error generating Hudi split", e);
-            }
+        try {
+            // Wait for all split generators to finish
+            Futures.whenAllComplete(splitGeneratorFutures) // also succeeds when some tasks fail
+                    .run(asyncQueue::finish, directExecutor())
+                    .get(); // will throw an ExecutionException when one of the tasks failed
+        }
+        catch (ExecutionException e) {
+            throw new TrinoException(HUDI_CANNOT_OPEN_SPLIT, "Error generating Hudi split", e);
+        }
+        catch (InterruptedException e) {
+            Thread.currentThread().interrupt();
+            throw new TrinoException(HUDI_CANNOT_OPEN_SPLIT, "Error generating Hudi split", e);
         }
-        asyncQueue.finish();
     }
 }
@willzgw willzgw requested a review from codope January 24, 2024 15:22
@willzgw
Copy link
Copy Markdown
Contributor Author

willzgw commented Jan 25, 2024

Hi @ebyhr @electrum
I tried to add a test to TestHudiSmokeTest but Trino will throw exception when creating table due to partition not exists.
It seems not convenient to add test cases.
Do you have any suggestions?

@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions Bot added the stale label Feb 15, 2024
@wendigo wendigo requested a review from ebyhr February 16, 2024 17:11
@github-actions github-actions Bot removed the stale label Feb 16, 2024
@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Mar 14, 2024

I tried to add a test to TestHudiSmokeTest but Trino will throw exception when creating table due to partition not exists.

@willzgw Sorry for my late reply. We could create the table on Spark and add the contents under resources directory. You can refer to resources/hudi-testing-data/hudi_cow_pt_tbl.

@willzgw
Copy link
Copy Markdown
Contributor Author

willzgw commented Mar 14, 2024

I tried to add a test to TestHudiSmokeTest but Trino will throw exception when creating table due to partition not exists.

@willzgw Sorry for my late reply. We could create the table on Spark and add the contents under resources directory. You can refer to resources/hudi-testing-data/hudi_cow_pt_tbl.

Hi @ebyhr
Thanks for your advice.
I've actually tried this approach before. We want to test the case where the Partition does not exist, which requires the metadata to contain files that do not actually exist (in resource). However, it will report an error by prompting that the corresponding file is missing from the Resource.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions Bot added the stale label Apr 4, 2024
@mosabua
Copy link
Copy Markdown
Member

mosabua commented Apr 6, 2024

Should we merge this PR given that you approved @electrum ?

Copy link
Copy Markdown
Contributor

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. One thing is we also need to update the huddi connector page for the new config introuced by this PR. I am ok if it's done in this PR or as a followup. Also, please rebase.

@codesorcery
Copy link
Copy Markdown
Contributor

Do I see it right, that one then has to explicitly set hudi.ignore-absent-partitions=false to get the current behavior? IMHO, that should then definitely be highlighted in the release notes, since it's a breaking change.

@github-actions github-actions Bot removed the stale label Apr 8, 2024
@github-actions github-actions Bot added the docs label Apr 11, 2024
Copy link
Copy Markdown
Contributor

@chenjian2664 chenjian2664 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rearrange the commits and squash to two, one is fixing the stuck issue, another is adding the session property.

@willzgw willzgw force-pushed the fix_issue_19506 branch 2 times, most recently from 43b214a to ec9a0e2 Compare June 21, 2024 15:07
@chenjian2664 chenjian2664 requested a review from electrum August 16, 2024 03:12
@mosabua
Copy link
Copy Markdown
Member

mosabua commented Aug 26, 2024

@electrum can we merge this?

@ebyhr ebyhr force-pushed the fix_issue_19506 branch 2 times, most recently from a4bf784 to 7f7d329 Compare October 7, 2024 08:46
Copy link
Copy Markdown
Contributor

@chenjian2664 chenjian2664 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % commit message

Revisit the first commit changes and think the commit message could be improved.
The changes is actual Use Futures.whenAllComplete to handle splitGeneratorFutures (in HudiBackgroundSplitLoader)
Put the commit body something like:

This commit improved async handling using `Futures.whenAllComplete` 
to wait/handle all the splitGeneratorFutures instead of using 
for-loop to handle the future one by one, the change guarantee the 
`asyncQueue.finish()` always be called to avoid the split leak
@willzgw
Copy link
Copy Markdown
Contributor Author

willzgw commented Feb 27, 2025

@ebyhr @mosabua
Hi master, Could you please help review this PR.
I just rebased the branch and updated the description.
This modification has been validated at Shopee, and we've been running it in our production environment for over a year.

@willzgw willzgw force-pushed the fix_issue_19506 branch 3 times, most recently from 83cd249 to f634ff2 Compare March 4, 2025 09:15
willzgw added 2 commits March 5, 2025 11:57
This commit solves the problem of HUDI connector stucks by
using ExceptionCallback to ensure the asyncQueue is closed when an exception occurs,
and optimizes the handling of multiple Futures with Futures.whenAllComplete instead of for_loop.
@willzgw willzgw requested a review from yihua March 6, 2025 02:23
Copy link
Copy Markdown
Member

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@ebyhr @mosabua @electrum could you help merge the PR if no more revision is required?

@electrum electrum merged commit d46bb32 into trinodb:master Mar 7, 2025
@github-actions github-actions Bot added this to the 473 milestone Mar 7, 2025
@willzgw willzgw deleted the fix_issue_19506 branch March 7, 2025 02:36
@mosabua
Copy link
Copy Markdown
Member

mosabua commented Mar 7, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed docs hudi Hudi connector stale stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.

10 participants