pageserver: thrashing on compaction and eviction loop #12123

@trungda

Description

Steps to reproduce

In our ingestion workload:

  1. There is a constant amount of eviction (around 1-2 files per second) due to disk usage;
  2. After running the workload for a couple of hours, we ran into a thrashing loop in which:
  • Compaction needs to download a file (or files) to create an image layer;
  • Eviction evicts the file shortly after it is downloaded (within 10-20 seconds). Since the file is covered, it sits at the top of the eviction candidate list;
  • As a result, image creation gets stuck in this loop (downloaded -> evicted -> downloaded -> evicted), and the only way to recover is to stop the workload to reduce disk pressure.
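To make the loop concrete, here is a toy model of what we think is happening. It is only a sketch of our observation, not pageserver code: the `Layer` type, the "covered first, then least recently accessed" ranking, and the timings (30 s to create an image, one eviction every 15 s) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    covered: bool        # assumed: covered layers are ranked first for eviction
    last_access: float   # seconds on a toy clock
    resident: bool = True


def eviction_order(layers):
    """Hypothetical ranking: covered layers first, then least recently accessed."""
    resident = [layer for layer in layers if layer.resident]
    return sorted(resident, key=lambda layer: (not layer.covered, layer.last_access))


def simulate(rounds, image_creation_secs=30.0, evict_interval_secs=15.0):
    """Return (downloads, completed) for the one covered layer compaction needs."""
    needed = Layer("covered-delta", covered=True, last_access=0.0, resident=False)
    hot = [Layer(f"hot-{i}", covered=False, last_access=float(i)) for i in range(3)]
    layers = [needed] + hot
    now, downloads = 0.0, 0
    for _ in range(rounds):
        # Compaction downloads the layer it needs and starts creating the image.
        needed.resident = True
        needed.last_access = now
        downloads += 1
        # Disk pressure evicts this many top candidates before the image is done.
        evictions = int(image_creation_secs // evict_interval_secs)
        victims = eviction_order(layers)[:evictions]
        for victim in victims:
            victim.resident = False
        if not any(victim is needed for victim in victims):
            return downloads, True   # the layer survived long enough
        now += evict_interval_secs   # evicted mid-compaction, so start over
    return downloads, False


if __name__ == "__main__":
    print(simulate(rounds=5))                            # slow image creation
    print(simulate(rounds=5, image_creation_secs=10.0))  # fast image creation
```

In this model the freshly downloaded layer is always the top candidate because it is covered, so as long as image creation takes longer than one eviction interval, every round ends with the layer evicted and downloaded again.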

I have a couple of questions:

  1. Is our observation accurate?
  2. Have you run into this situation?
  3. Is there a way to get out of this loop more gracefully?

Expected result

PS should be able to recover from this situation on its own, for example by applying backpressure.
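As one illustration of what "more graceful" could look like, the sketch below skips layers that were downloaded very recently when picking eviction victims. This is a hypothetical mitigation, not an existing pageserver option: `Candidate`, `pick_victims`, and `min_residence_secs` are names invented for this example, and we do not know whether such a grace period fits the real eviction design.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    covered: bool
    downloaded_at: float  # seconds on a toy clock


def pick_victims(candidates, now, count, min_residence_secs=60.0):
    """Rank covered layers first, but skip anything still inside the grace period."""
    eligible = [c for c in candidates if now - c.downloaded_at >= min_residence_secs]
    eligible.sort(key=lambda c: (not c.covered, c.downloaded_at))
    return eligible[:count]


if __name__ == "__main__":
    layers = [
        Candidate("covered-delta", covered=True, downloaded_at=100.0),
        Candidate("old-hot", covered=False, downloaded_at=0.0),
    ]
    # 15 s after the download, the covered layer is protected by the grace period.
    print([c.name for c in pick_victims(layers, now=115.0, count=1)])
    # Once the grace period has passed, it is the top candidate again.
    print([c.name for c in pick_victims(layers, now=200.0, count=1)])
```

The trade-off is that under heavy disk pressure the protected layers still occupy space, so something else has to be evicted or ingestion has to slow down, which is where backpressure would come in.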

Actual result

PS is stuck in a download -> evict -> download -> evict loop and L0 layers pile up. The only way out of this situation is to stop the traffic.

Environment

Logs, links

Labels: external, t/bug