Update manipulate.py re.search #2338
re.search returns only the first match. Adding an option to use re.findall instead lets multiple matches be returned to the user through the same option that currently uses re.search.
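The description above contrasts the two functions. Here is a minimal illustration using only the standard `re` module. The sample text and pattern are invented for the example:

```python
import re

# Hypothetical field value containing several candidate matches.
text = "ep01 ep02 ep03"
pattern = r"ep\d+"

# re.search stops at the first match and returns a Match object (or None).
first = re.search(pattern, text)
print(first.group(0))  # ep01

# re.findall scans the whole string and returns every non-overlapping match.
print(re.findall(pattern, text))  # ['ep01', 'ep02', 'ep03']
```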
I suggested fixes for a few minor issues; otherwise it looks fine. @gazpachoking, any thoughts on this?
I've updated all instances in manipulate.py with the minor corrections that @cvium provided.
I'm good with this. Should we put an underscore in the option name, though?
Also, it looks like whether only the capture groups are returned differs depending on whether findall is specified. Shouldn't that behavior stay consistent?
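One source of the inconsistency is that `re.findall` changes its return shape depending on how many capture groups the pattern contains, while `re.search` exposes the whole match and the groups separately. A short standard-library illustration (the sample strings are invented):

```python
import re

text = "S01E02 S03E04"

# No capture groups: findall returns the whole matches.
print(re.findall(r"S\d+E\d+", text))  # ['S01E02', 'S03E04']

# One capture group: findall returns only that group's text.
print(re.findall(r"S(\d+)E\d+", text))  # ['01', '03']

# Several capture groups: findall returns one tuple per match.
print(re.findall(r"S(\d+)E(\d+)", text))  # [('01', '02'), ('03', '04')]

# re.search: groups() holds the groups of the first match only,
# and group(0) is still available for the whole match.
m = re.search(r"S(\d+)E(\d+)", text)
print(m.groups())  # ('01', '02')
print(m.group(0))  # S01E02
```

So a findall-based extract returns whole matches or groups purely as a consequence of the pattern, and whatever rule the search-based extract applies to groups would need an explicit counterpart in the findall path to keep the two modes consistent.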
In fact, it looks like this new config:

    findall: yes
    extract: anything

would be equivalent to this already possible config:

    extract: "(?:(anything).*)+"

Is that true? I agree the syntax of the already possible one is more complicated; I'm just trying to figure out the best and simplest way to allow this feature.

I'm not 100% sure, but I just had a quick try, and the regex I use to capture links only finds the first instance when I do this:

    (?:href\s*=\s*\"((?:[\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|[^[:punct:]\s]|))\".*)+

whereas findall with this:

    href\s*=\s*\"((?:[\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|[^[:punct:]\s]|))\"

does find them all. But then maybe my regex is wrong, haha. Personally, I find specifying it this way easier, but I'm happy to be wrong.
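The equivalence question can be checked directly with Python's `re` module. After a match, a capture group inside a repeated construct holds only one value (from its last repetition), so a pattern shaped like `(?:(anything).*)+` can yield at most one captured item, whereas `re.findall` returns every occurrence. A small illustration with made-up input:

```python
import re

text = "a1 a2 a3"

# The "already possible" shape: a repeated group followed by a greedy .*.
# The .* swallows the rest of the string on the first pass, so the group
# is filled once, with the first item only.
greedy = re.search(r"(?:(a\d).*)+", text)
print(greedy.group(1))  # a1

# Even when the group genuinely repeats, Python's re keeps only the value
# from the last repetition, so the earlier items are lost.
repeated = re.search(r"(?:(a\d)\s?)+", text)
print(repeated.group(0))  # a1 a2 a3
print(repeated.group(1))  # a3

# findall returns every occurrence of the group.
print(re.findall(r"(a\d)", text))  # ['a1', 'a2', 'a3']
```

This is consistent with the observation that the repeated href pattern only returned the first link.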
Aha, seems like python's
(I'm leaning towards 'yes' on both of these questions.)
I'm not 100% sure what you mean by 1. As I understand it, findall should extract only the capture groups, but if a group matches more than once, every occurrence is captured.
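With a single capture group, `re.findall` returns that group's text once per match. Note that Python's `re` has no POSIX character classes, so `[:punct:]` in the link patterns quoted earlier is not interpreted as a punctuation class. The sketch below therefore uses a simplified, hypothetical link pattern and an invented HTML snippet:

```python
import re

# Invented HTML snippet containing two links.
html = '<a href="https://example.com/a">A</a> <a href = "www.example.org/b">B</a>'

# Simplified, hypothetical link pattern with exactly one capture group
# (no POSIX classes, which Python's re does not support).
link_pattern = r'href\s*=\s*"((?:[\w-]+://?|www[.])[^\s"<>]+)"'

# With one group, findall returns the captured URL for every match,
# not the surrounding href="..." text.
print(re.findall(link_pattern, html))
# ['https://example.com/a', 'www.example.org/b']
```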
I didn't realise this was never finished. Can we finalise this pull request, please?
The files have been updated to the latest development version.


chaosmaker commented on Feb 12, 2019
Motivation for changes:
re.search only allows a single match to be returned during extract.
Detailed changes:
Addressed issues:
Implemented feature requests:
Config usage if relevant (new plugin or updated schema):
Log and/or tests output (preferably both):
To Do:
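For illustration only, here is one way an extract step could honour a findall option while treating capture groups identically in both modes. The function name `apply_extract`, its parameters, the case-insensitive flag and the choice to return a list in findall mode are all assumptions; this is not the code in manipulate.py:

```python
import re
from typing import List, Optional, Union


def _value(match):
    """Hypothetical rule: first non-empty group, else the whole match."""
    groups = [g for g in match.groups() if g]
    return groups[0] if groups else match.group(0)


def apply_extract(value: str, pattern: str,
                  findall: bool = False) -> Optional[Union[str, List[str]]]:
    """Hypothetical extract step; assumes case-insensitive matching."""
    regex = re.compile(pattern, re.IGNORECASE)
    if not findall:
        match = regex.search(value)
        return _value(match) if match else None
    # finditer yields Match objects, so the same group rule applies per match.
    return [_value(m) for m in regex.finditer(value)]


print(apply_extract("S01E02 S03E04", r"S(\d+)E\d+"))                # 01
print(apply_extract("S01E02 S03E04", r"S(\d+)E\d+", findall=True))  # ['01', '03']
print(apply_extract("S01E02 S03E04", r"S\d+E\d+", findall=True))    # ['S01E02', 'S03E04']
```

Iterating with `re.finditer` rather than calling `re.findall` sidesteps findall's pattern-dependent return shape, so both modes pick groups versus the whole match by the same rule.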