The Wayback Machine - https://web.archive.org/web/20201210051146/https://github.com/thanos-io/thanos/issues/3545
rule: Rules with YAML `|2` indentation cannot be parsed #3545

Open
matthiasr opened this issue Dec 8, 2020 · 4 comments

@matthiasr matthiasr commented Dec 8, 2020

Thanos, Prometheus and Golang version used:

thanos, version  (branch: , revision: )
  build user:
  build date:
  go version:       go1.15.5
  platform:         darwin/amd64

Built at revision d616214. We saw the same effect in production with v0.15.0; we haven't tried again with v0.17.2.

Prometheus version N/A.

Object Storage Provider: N/A

What happened:

A rule file like

groups:
- name: indentation
  rules:
  - record: indented:two
    expr: |2
        vector(1)
      OR
        vector(0)

passes thanos tools rules-check but cannot be loaded correctly:

level=error ts=2020-12-08T15:25:39.955496Z caller=manager.go:946 component=rules msg="loading groups failed" err="data/.tmp-rules/ABORT/yaml: yaml: line 3: did not find expected key"
level=error ts=2020-12-08T15:25:39.955528Z caller=manager.go:946 component=rules msg="loading groups failed" err="data/.tmp-rules/ABORT/yaml: yaml: line 3: did not find expected key"

What you expected to happen:

This YAML is valid, so it should be loaded correctly. This is a moderately popular PromQL formatting style and we use it a lot.
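For readers unfamiliar with the `|2` indentation indicator, here is a minimal Go sketch of how the expression above is read. It is a toy, not a YAML parser: the function name `decodeLiteral` and its parameters are made up for illustration, and it assumes default "clip" chomping with no trailing blank lines. The indicator is relative to the parent node, so with the `expr` key at column 4, the content starts at column 4+2 = 6 and any spaces beyond that belong to the string.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeLiteral is a toy decoder for a YAML literal block scalar written with
// an explicit indentation indicator ("|N"). It is NOT a YAML parser: it only
// strips parentIndent+indicator leading spaces from every line and appends a
// single trailing newline (default "clip" chomping, no trailing blank lines).
func decodeLiteral(lines []string, parentIndent, indicator int) (string, error) {
	strip := parentIndent + indicator
	out := make([]string, 0, len(lines))
	for i, l := range lines {
		// Simplification: whitespace-only lines are treated as empty.
		if strings.TrimSpace(l) == "" {
			out = append(out, "")
			continue
		}
		// Every content line must carry at least `strip` leading spaces.
		if len(l) < strip || strings.TrimLeft(l[:strip], " ") != "" {
			return "", fmt.Errorf("line %d is indented less than %d spaces", i+1, strip)
		}
		out = append(out, l[strip:])
	}
	return strings.Join(out, "\n") + "\n", nil
}

func main() {
	// The three lines that follow "expr: |2" in the rule file.
	body := []string{
		"        vector(1)",
		"      OR",
		"        vector(0)",
	}
	// "expr" sits at column 4, so content starts at column 4+2 = 6.
	expr, err := decodeLiteral(body, 4, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", expr)
}
```

Without the indicator, a parser would have to guess the indentation from the first content line (8 spaces here), and the `OR` line at 6 spaces would no longer fit inside the scalar.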

How to reproduce it (as minimally and precisely as possible):

With the rules file above,

thanos rule --rule-file tmp/rules.yaml --query https://prometheus.demo.do.prometheus.io/graph

Full logs to relevant components:

full example
[I] ~/s/g/t/thanos (master|✔) [1] $ git rev
d6162144
[I] ~/s/g/t/thanos (master|✔) $ go build ./cmd/thanos/
[I] ~/s/g/t/thanos (master|✔) $ cat tmp/rules.yaml
groups:
- name: indentation
  rules:
  - record: indented:two
    expr: |2
        vector(1)
      OR
        vector(0)
[I] ~/s/g/t/thanos (master|✔) $ ./thanos tools rules-check --rules tmp/rules.yaml
level=info ts=2020-12-08T15:46:54.04793Z caller=tools.go:41 msg=checking filename=tmp/rules.yaml
level=info ts=2020-12-08T15:46:54.048234Z caller=tools.go:61 result=SUCCESS rulesfound=1
level=info ts=2020-12-08T15:46:54.048456Z caller=main.go:154 msg=exiting
[I] ~/s/g/t/thanos (master|✔) $ ./thanos rule --rule-file tmp/rules.yaml --query https://prometheus.demo.do.prometheus.io/graph
level=info ts=2020-12-08T15:46:59.028325Z caller=head.go:645 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-12-08T15:46:59.028384Z caller=head.go:659 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=5.611µs
level=info ts=2020-12-08T15:46:59.028393Z caller=head.go:665 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-12-08T15:46:59.028703Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=6
level=info ts=2020-12-08T15:46:59.02889Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=1 maxSegment=6
level=info ts=2020-12-08T15:46:59.029042Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=2 maxSegment=6
level=info ts=2020-12-08T15:46:59.029186Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=3 maxSegment=6
level=info ts=2020-12-08T15:46:59.029437Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4 maxSegment=6
level=info ts=2020-12-08T15:46:59.029605Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=5 maxSegment=6
level=info ts=2020-12-08T15:46:59.029751Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=6 maxSegment=6
level=info ts=2020-12-08T15:46:59.029773Z caller=head.go:722 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=61.953µs wal_replay_duration=1.311464ms total_replay_duration=1.386628ms
level=info ts=2020-12-08T15:46:59.032123Z caller=rule.go:684 msg="a leftover lockfile found and removed"
level=warn ts=2020-12-08T15:46:59.032161Z caller=rule.go:402 msg="no alertmanager configured"
level=info ts=2020-12-08T15:46:59.032749Z caller=options.go:23 component=rules protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2020-12-08T15:46:59.03403Z caller=rule.go:666 component=rules msg="no supported bucket was configured, uploads will be disabled"
level=info ts=2020-12-08T15:46:59.034073Z caller=rule.go:669 component=rules msg="starting rule node"
level=info ts=2020-12-08T15:46:59.034396Z caller=rule.go:836 component=rules msg="reload rule files" numFiles=1
level=info ts=2020-12-08T15:46:59.034535Z caller=intrumentation.go:48 component=rules msg="changing probe status" status=ready
level=info ts=2020-12-08T15:46:59.034713Z caller=grpc.go:116 component=rules service=gRPC/server component=rule msg="listening for serving gRPC" address=0.0.0.0:10901
level=info ts=2020-12-08T15:46:59.034766Z caller=intrumentation.go:60 component=rules msg="changing probe status" status=healthy
level=info ts=2020-12-08T15:46:59.034778Z caller=http.go:58 component=rules service=http/server component=rule msg="listening for requests and metrics" address=0.0.0.0:10902
level=error ts=2020-12-08T15:46:59.039119Z caller=manager.go:946 component=rules msg="loading groups failed" err="data/.tmp-rules/ABORT/yaml: yaml: line 3: did not find expected key"
level=error ts=2020-12-08T15:46:59.039199Z caller=manager.go:946 component=rules msg="loading groups failed" err="data/.tmp-rules/ABORT/yaml: yaml: line 3: did not find expected key"
level=error ts=2020-12-08T15:46:59.039232Z caller=rule.go:516 component=rules msg="initialize rules failed" err="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=info ts=2020-12-08T15:46:59.039274Z caller=manager.go:924 component=rules msg="Stopping rule manager..."
level=info ts=2020-12-08T15:46:59.039633Z caller=manager.go:934 component=rules msg="Rule manager stopped"
level=info ts=2020-12-08T15:46:59.039785Z caller=manager.go:924 component=rules msg="Stopping rule manager..."
level=info ts=2020-12-08T15:46:59.039801Z caller=manager.go:934 component=rules msg="Rule manager stopped"
level=warn ts=2020-12-08T15:46:59.039818Z caller=intrumentation.go:54 component=rules msg="changing probe status" status=not-ready reason="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=info ts=2020-12-08T15:46:59.040514Z caller=grpc.go:123 component=rules service=gRPC/server component=rule msg="internal server is shutting down" err="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=info ts=2020-12-08T15:46:59.040588Z caller=grpc.go:136 component=rules service=gRPC/server component=rule msg="gracefully stopping internal server"
level=info ts=2020-12-08T15:46:59.0407Z caller=grpc.go:149 component=rules service=gRPC/server component=rule msg="internal server is shutdown gracefully" err="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=warn ts=2020-12-08T15:46:59.040741Z caller=intrumentation.go:54 component=rules msg="changing probe status" status=not-ready reason="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=info ts=2020-12-08T15:46:59.04076Z caller=http.go:65 component=rules service=http/server component=rule msg="internal server is shutting down" err="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=info ts=2020-12-08T15:46:59.549083Z caller=http.go:84 component=rules service=http/server component=rule msg="internal server is shutdown gracefully" err="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=info ts=2020-12-08T15:46:59.549264Z caller=intrumentation.go:66 component=rules msg="changing probe status" status=not-healthy reason="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored"
level=error ts=2020-12-08T15:46:59.549374Z caller=main.go:151 err="reloading rules failed: strategy ABORT, update rules: error loading rules, previous rule set restored\nrule command failed\nmain.main\n\t/Users/mr/src/github.com/thanos-io/thanos/cmd/thanos/main.go:151\nruntime.main\n\t/usr/local/Cellar/go/1.15.5/libexec/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/Cellar/go/1.15.5/libexec/src/runtime/asm_amd64.s:1374"
[I] ~/s/g/t/thanos (master|✔) [1] $

Anything else we need to know:

I don't know where the error is being produced – pkg/rules/manager.go does not have 946 lines!?

Member

@GiedriusS GiedriusS commented Dec 8, 2020

The error seems to happen in the rule.Manager of Prometheus. We reuse that code. I was able to reproduce the same issue with vanilla Prometheus:

level=error ts=2020-12-08T20:07:03.552Z caller=manager.go:946 component="rule manager" msg="loading groups failed" err="data/prom0/rules.yml: yaml: line 3: did not find expected key"

Care to file an issue in upstream Prometheus? Then we could pull the fix into Thanos once it is fixed there.

Author

@matthiasr matthiasr commented Dec 9, 2020

Prometheus 2.22.1 (and all previous versions going back to 1.x) can load the same rules just fine. Something happens along the way: note that the rule file that fails to load is given as data/.tmp-rules/ABORT/yaml, not the original file name.

Author

@matthiasr matthiasr commented Dec 9, 2020

How can I get at this intermediate rule file?

Member

@bwplotka bwplotka commented Dec 9, 2020

We parse the rule file, remove a field, and generate a new file; Prometheus then parses the generated file and fails, so it's a ruler bug.

I don't think getting at the intermediate file is required; I think this issue has everything needed to investigate and fix it. Thanks for reporting 🤗

Help wanted.

The easiest way is to just write a unit test and adjust the marshaller or the field-removal step.
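As a rough illustration of the property such a unit test could check, here is a self-contained Go sketch of a round trip through a literal block scalar. Everything in it is hypothetical: `renderBlock` and `parseBlock` are toy stand-ins written for this example, not the Thanos or go-yaml marshaller, and the thread does not establish that a dropped indicator is the actual cause of the failure. The point is only that an expression whose first line starts with spaces survives the parse, regenerate, parse cycle when the regenerated file carries an explicit indentation indicator.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBlock (hypothetical) renders s as a literal block scalar nested under
// a key at column parentIndent. If the first line starts with a space, the
// indentation cannot be auto-detected, so an explicit indicator is emitted.
func renderBlock(s string, parentIndent int) (header string, body []string) {
	const step = 2 // spaces added below the parent key; an arbitrary choice
	header = "|"
	if strings.HasPrefix(s, " ") {
		header = fmt.Sprintf("|%d", step)
	}
	pad := strings.Repeat(" ", parentIndent+step)
	for _, l := range strings.Split(strings.TrimSuffix(s, "\n"), "\n") {
		body = append(body, pad+l)
	}
	return header, body
}

// parseBlock (hypothetical) reverses renderBlock. With an indicator it strips
// parentIndent+indicator spaces; without one it auto-detects the indentation
// from the first line, as a parser has to when no indicator is given.
func parseBlock(header string, body []string, parentIndent int) string {
	strip := 0
	if len(header) > 1 {
		strip = parentIndent + int(header[1]-'0')
	} else if len(body) > 0 {
		strip = len(body[0]) - len(strings.TrimLeft(body[0], " "))
	}
	out := make([]string, len(body))
	for i, l := range body {
		if len(l) >= strip {
			out[i] = l[strip:]
		} // shorter lines become empty; a real parser may reject them instead
	}
	return strings.Join(out, "\n") + "\n"
}

func main() {
	expr := "  vector(1)\nOR\n  vector(0)\n"
	header, body := renderBlock(expr, 4)

	// With the indicator preserved, the expression round-trips unchanged.
	fmt.Println("with indicator:", parseBlock(header, body, 4) == expr)

	// If the indicator is dropped, auto-detection picks the wrong indentation
	// and the result no longer matches the original expression.
	fmt.Println("without indicator:", parseBlock("|", body, 4) == expr)
}
```

A real test would run the same comparison through the ruler's actual parse and regenerate code path, using the rule file from this issue as the fixture.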
