added 30 characters in body

edited Jun 18, 2019 at 11:58

I'm new to regex and I'm trying to extract all the unique occurrences of each competition. So it should return FA Cup, but only once, no matter how many FA Cup games are in the file.

<Date="2014-02-15" Competition="FA Cup" Home="West Bromwich Albion">

Based on the accepted solution from:

grep all strings that start with a certain char, and finish with another char

I tried this:

grep -o 'Competition=".*"' Soccer_Data.xml | sort --unique

But it returns everything on the line after Competition=", whereas I only want everything up to the first occurrence of a double quotation mark, i.e. "FA Cup". It also returns the same competition multiple times!

To prevent the multiple returns I tried using .*? as suggested in the answer below, but that gave me the opposite problem: it did not return anything!

https://stackoverflow.com/questions/22444/my-regex-is-matching-too-much-how-do-i-make-it-stop

Can someone please tell me what the correct regular expression to use is?

I'm new to regex and I'm trying to extract all the unique occurrences of each competition. So it should return FA Cup, but only once, no matter how many FA Cup games are in the file.

<Date="2014-02-15" Competition="FA Cup" Home="West Bromwich Albion">

Based on the accepted solution from this question here, I tried this:

grep -o 'Competition=".*\"' Soccer_Data.xml | sort --unique

But it returns everything on the line after Competition=", whereas I only want everything up to the first occurrence of a double quotation mark, i.e. "FA Cup". It also returns the same competition multiple times!

To prevent the multiple returns I tried using .*? as suggested in the answer below, but that gave me the opposite problem: it did not return anything!

https://stackoverflow.com/questions/22444/my-regex-is-matching-too-much-how-do-i-make-it-stop

Can someone please tell me what the correct regular expression to use is?

asked Jun 18, 2019 at 11:17

Bazman

grep all strings that start with a certain sub string, and finish with the first quotation mark

I'm new to regex and I'm trying to extract all the unique occurrences of each competition. So it should return FA Cup, but only once, no matter how many FA Cup games are in the file.

<Date="2014-02-15" Competition="FA Cup" Home="West Bromwich Albion">

Based on the accepted solution from:

grep all strings that start with a certain char, and finish with another char

I tried this:

grep -o 'Competition=".*"' Soccer_Data.xml | sort --unique

To prevent the multiple returns I tried using .*? as suggested in the answer below, but that gave me the opposite problem: it did not return anything!

https://stackoverflow.com/questions/22444/my-regex-is-matching-too-much-how-do-i-make-it-stop

Can someone please tell me what the correct regular expression to use is?
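
For reference, here is a minimal sketch of one common way to stop the match at the first closing quote: replace the greedy .* with the negated character class [^"]*. The temporary file and its three lines are hypothetical stand-ins for Soccer_Data.xml, and the sketch assumes each tag sits on its own line as in the sample above. The likely reason .*? matched nothing is that grep defaults to basic regular expressions, where ? is an ordinary character, so the pattern was looking for a literal question mark before the quote.

```shell
# Hypothetical sample data standing in for Soccer_Data.xml (assumed: one tag per line).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Date="2014-02-15" Competition="FA Cup" Home="West Bromwich Albion">
<Date="2014-02-16" Competition="FA Cup" Home="Arsenal">
<Date="2014-02-22" Competition="Premier League" Home="Chelsea">
EOF

# Greedy: .* runs to the LAST double quote on the line, so each match
# also contains the Home attribute and no two matches are identical.
greedy=$(grep -o 'Competition=".*"' "$sample" | sort -u)

# Negated class: [^"]* cannot cross a double quote, so the match ends
# at the FIRST closing quote and duplicates collapse under sort -u.
competitions=$(grep -o 'Competition="[^"]*"' "$sample" | sort -u)

# Optional: keep only the value between the quotes (field 2 when splitting on ").
names=$(printf '%s\n' "$competitions" | cut -d '"' -f 2)

printf '%s\n' "$names"
rm -f "$sample"
```

Here sort -u is the short form of sort --unique. If the installed grep is GNU grep built with PCRE support, grep -oP 'Competition=".*?"' should also work, since -P enables the lazy .*? quantifier, but that option is not available in every grep.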