The use case is rather simple. I have a text file named eg.txt with the following content:

'simple_example': 345, 'to_demonstrate': 232,
'regex': 'is not easy to use'

I am trying to capture the keys:

grep -oP (?<=')[a-zA-Z_0-9]+(?=':) eg.txt

It gives me an error:

-bash: syntax error near unexpected token `('

Escaping the single quote does not help either:

grep -oP (?<=\')[a-zA-Z_0-9]+(?=\':) eg.txt

Nor does using extended grep help:

grep -oE (?<=')[a-zA-Z_0-9]+(?=':) eg.txt

What is happening here? I am using Bash on Linux under WSL on Windows 10.

1 Answer

The error message is about the parentheses, not the single quotes. Unquoted parentheses are special to the shell, and their meaning depends on where they appear on the command line. The unquoted single quotes and the input redirection operator < would also cause problems, so it is better to quote the whole regular expression to prevent the shell from interpreting any of it as shell syntax:

grep -P -o "(?<=')[a-zA-Z_0-9]+(?=':)" eg.txt

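Since your expression contains single quotes, and a single-quoted string can't contain single quotes, I'm using double quotes around the entire expression. Note that quoting alone would not make the -E variant work: look-behind and look-ahead assertions are a Perl-compatible (PCRE) feature, so -P is needed.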
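As a minimal sketch of the quoting point above, the following recreates the sample file from the question and runs the quoted pattern against it. The second grep call is an alternative that is not part of the original answer: it assumes GNU grep built with PCRE support and uses the PCRE hex escape \x27 for the single quote, so the pattern can sit inside single quotes.

```shell
# Recreate the sample input from the question.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > eg.txt <<'EOF'
'simple_example': 345, 'to_demonstrate': 232,
'regex': 'is not easy to use'
EOF

# Double-quoted pattern: the shell passes (, ), < and ' through to grep untouched.
grep -oP "(?<=')[a-zA-Z_0-9]+(?=':)" eg.txt

# Alternative (assumption, not from the answer): single-quoted pattern in which
# \x27 is the PCRE hex escape for a single quote character.
grep -oP '(?<=\x27)[a-zA-Z_0-9]+(?=\x27:)' eg.txt
```

Both commands should print the three keys, one per line, in the order they appear in the file. The double-quoted form is fine here because the pattern contains no $, backtick, backslash, or ! characters, which remain special inside double quotes in an interactive Bash session.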


If your input is a well-formed JSON document (with double-quoted keys and values), it would be easier to get the top-level keys with a JSON parser such as jq:

$ cat file
{
  "simple_example": 345,
  "to_demonstrate": 232,
  "regex": "is not easy to use"
}
$ jq -r 'keys[]' file
regex
simple_example
to_demonstrate

This extracts the top-level keys into an array (with keys, which also sorts them) and then expands that array into a stream of separate values (with []), which are then output decoded (i.e. as "raw" strings rather than as encoded JSON strings, due to -r).
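To make the individual steps of that filter visible, here is a small sketch that runs them one at a time on the same sample document. It assumes jq 1.5 or later is installed; the keys_unsorted variant is an addition for illustration and is not part of the original answer.

```shell
# Recreate the sample JSON document used above.
cat > file <<'EOF'
{
  "simple_example": 345,
  "to_demonstrate": 232,
  "regex": "is not easy to use"
}
EOF

# Step 1: keys alone yields a single sorted array (-c prints it on one line).
jq -c 'keys' file

# Step 2: .[] expands the array into separate values; -r prints them
# as raw strings. This is equivalent to the shorter 'keys[]'.
jq -r 'keys | .[]' file

# Variant (assumption, not from the answer): keys_unsorted keeps the
# order in which the keys appear in the document.
jq -r 'keys_unsorted[]' file
```

The last form may be preferable when the output should follow the order of the input file, as the grep approach does.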
