Skip to content

Stop RecordingApmServer message processing before returning from tests #130007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 26, 2025

Conversation

ldematte
Copy link
Contributor

The RecordingApmServer needs to be started before the ElasticsearchCluster; however, it needs to be stopped (or, at least, it needs stopping processing messages) before the ElasticsearchCluster.
JUnit rules do process before/after in reverse order at startup/shutdown; using a RuleChain slightly improves the situation (as opposed to today's ClassRule + Rule), but does not solve the issue.
Here I am introducing an additional "stop" method; the method needs to be called explicitly at the end of the test, but this way we can make RecordingApmServer stop processing requests before the ElasticsearchCluster is gone and communication is interrupted exceptionally.

Fixes #129651

@ldematte ldematte requested a review from a team June 25, 2025 12:44
@ldematte ldematte added >test Issues or PRs that are addressing/adding tests :Core/Infra/Metrics Metrics and metering infrastructure v9.1.0 labels Jun 25, 2025
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team v9.2.0 labels Jun 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Contributor

@mosche mosche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, though wondering if it wouldn't be easier to reason if instead handling IOExceptions

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused how this addresses the linked failure. I thought the problem was the apm agent continues to run in the background, so requests to the recording server begin failing. Can you explain how this change fixes the issue?

@ldematte
Copy link
Contributor Author

Turned out that the issue is that the APM agent is in the middle of writing, ES gets shut down, the connection breaks, the mock APM server throws.
Shutting down the APM server before ES brings things down cleanly.
But thinking again, this is still brittle and an implementation detail; probably catching all exceptions like we were discussing is still better (more robust and simpler)

@ldematte
Copy link
Contributor Author

@mosche I've decided to go with your suggestion; I tried to refactor the RecordingApmServer to better coordinate the APM mock server and the ES cluster, but it's error prone and I was not satisfied. So better to go with the minimalist and more robust solution, at least until we stay with the APM agent.

Copy link
Contributor

@mosche mosche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

@ldematte ldematte enabled auto-merge (squash) June 26, 2025 13:22
@ldematte ldematte merged commit d6e2b57 into elastic:main Jun 26, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Metrics Metrics and metering infrastructure Team:Core/Infra Meta label for core/infra team >test Issues or PRs that are addressing/adding tests v9.2.0
4 participants