added 30 characters in body
Source Link
pLumo

I'm new to regex and I'm trying to extract all the unique occurrences of each competition. So it should return FA Cup, but only once, no matter how many FA Cup games are in the file.

<Date="2014-02-15" Competition="FA Cup" Home="West Bromwich Albion">

Based on the accepted solution from this question ("grep all strings that start with a certain char, and finish with another char"), I tried this:

grep -o 'Competition=".*\"' Soccer_Data.xml | sort --unique

But it returns everything on the line after Competition=", whereas I only want everything up to the first occurrence of a double quotation mark, i.e. "FA Cup". It is also returning the same competition multiple times!

To prevent this I tried using .*? as suggested in the solution linked below, but that gave me the opposite problem: it did not return anything!

https://stackoverflow.com/questions/22444/my-regex-is-matching-too-much-how-do-i-make-it-stop

Can someone please tell me what the correct regular expression to use is?

Source Link
Bazman

grep all strings that start with a certain sub string, and finish with the first quotation mark

I'm new to regex and I'm trying to extract all the unique occurrences of each competition. So it should return FA Cup, but only once, no matter how many FA Cup games are in the file.

<Date="2014-02-15" Competition="FA Cup" Home="West Bromwich Albion">

Based on the accepted solution from:

grep all strings that start with a certain char, and finish with another char

I tried this:

grep -o 'Competition=".*"' Soccer_Data.xml | sort --unique

But it returns everything on the line after Competition=", whereas I only want everything up to the first occurrence of a double quotation mark, i.e. "FA Cup". It is also returning the same competition multiple times!

To prevent the multiple returns I tried using .*? as suggested in the solution below but that gave me the opposite problem as it did not return anything!

https://stackoverflow.com/questions/22444/my-regex-is-matching-too-much-how-do-i-make-it-stop

Can someone please tell me what the correct regular expression to use is?
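
One way the over-matching described above could be avoided, offered as a minimal sketch rather than a definitive answer: replace .* with the negated character class [^"]*, which cannot run past the next double quote. The sample file contents below are hypothetical, made up to stand in for Soccer_Data.xml.

```shell
# Hypothetical sample data standing in for Soccer_Data.xml.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<Date="2014-02-15" Competition="FA Cup" Home="West Bromwich Albion">
<Date="2014-02-16" Competition="FA Cup" Home="Arsenal">
<Date="2014-02-22" Competition="Premier League" Home="Chelsea">
EOF

# [^"]* matches any run of characters that are not a double quote,
# so each match ends at the first closing quote after Competition=".
# sort --unique then collapses the repeated competitions.
result=$(grep -o 'Competition="[^"]*"' "$tmp" | sort --unique)
printf '%s\n' "$result"
# prints:
# Competition="FA Cup"
# Competition="Premier League"

rm -f "$tmp"
```

The .*? attempt most likely printed nothing because grep's default (basic) syntax has no lazy quantifiers, so the ? is treated as a literal character. With GNU grep built with PCRE support, grep -oP 'Competition=".*?"' should accept the lazy form, but that depends on the local grep build.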