Let's say I have the following string:

KEYWORD_1 OR "KEYWORD COMPOSITE_2" NOT "KEYWORD COMPOSITE_3" NOT "KEYWORD_4" AND "KEYWORD_5" KEYWORD_6 KEYWORD_7 KEYWORD_8 -KEYWORD_9

With my regex(es), I want to split the string into the following three arrays of keywords (this is not JSON, just a visual layout to explain the output). Each array corresponds to a delimiter (AND, OR, NOT) and contains all the words that follow each occurrence of that delimiter. Think of it like the Google search field syntax :)

final_result = {
    {
        OR: [KEYWORD_COMPOSITE_2]
    },
    {
        AND: [
            KEYWORD_1, 
            KEYWORD_5, 
            KEYWORD_6, 
            KEYWORD_7, 
            KEYWORD_8
        ]
    },
    {
        NOT: [
            KEYWORD_COMPOSITE_3, 
            KEYWORD_4, 
            KEYWORD_9
        ]
    }
}

I am trying to do this in JavaScript with one or more regexes.

Any ideas? Any help? Thank you.

  • This is not a good job for a regex. Commented Jul 18, 2013 at 13:16
  • Well, where do you need help? What exactly is not working for you? Commented Jul 18, 2013 at 13:17
  • Why is there a - in front of KEYWORD_9? Commented Jul 18, 2013 at 13:18
  • Are you sure that you provided the correct expected output? Commented Jul 18, 2013 at 13:19
  • Is AND: [ KEYWORD_1, ... correct? I still do not understand how it should work. I know what you need, but the mapping from your string to the output has no clear logic; the problem is understanding that logic. Commented Jul 18, 2013 at 13:32

1 Answer

You cannot do that with regexes alone; some programming is still required:

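function parse(input) {
    var keywords = ['AND', 'OR', 'NOT'];
    // Terms seen before any operator are collected under AND.
    var result = {}, kw = 'AND';
    // replace() is used only to walk over the matches; its return value is ignored.
    // $1 is the content of a quoted phrase, $2 is a bare word.
    input.replace(/"(.+?)"|(\w+)/g, function($0, $1, $2) {
        if(keywords.indexOf($2) >= 0) {
            // A bare operator switches the list that the following terms go into.
            kw = $2;
        } else {
            if(!(kw in result))
                result[kw] = [];
            result[kw].push($1 || $2);
        }
    });
    return result;
}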
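
The function above skips the - in front of KEYWORD_9, so that term lands in whichever list is active at that point. A minimal sketch of one way to handle it follows. It assumes that a leading - negates only the single term it is attached to. The name parseQuery and that rule are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical variant of parse(); the name and the "-" rule are assumptions.
function parseQuery(input) {
    var operators = ['AND', 'OR', 'NOT'];
    var result = {};
    var current = 'AND'; // terms before any operator are treated as AND terms
    // Group 1: optional "-" prefix, group 2: quoted phrase, group 3: bare word.
    var tokenPattern = /(-)?(?:"([^"]+)"|(\w+))/g;
    var match;
    while ((match = tokenPattern.exec(input)) !== null) {
        var negated = match[1] === '-';
        var phrase = match[2];
        var word = match[3];
        if (!negated && phrase === undefined && operators.indexOf(word) >= 0) {
            current = word; // an unquoted operator switches the active list
            continue;
        }
        // A "-" prefix sends this one term to NOT without changing the active list.
        var bucket = negated ? 'NOT' : current;
        if (!(bucket in result)) {
            result[bucket] = [];
        }
        result[bucket].push(phrase !== undefined ? phrase : word);
    }
    return result;
}
```

Note that a hyphen inside a word (for example foo-bar) would also be read as a prefix by this pattern, so it only suits input where - appears at the start of a term.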
6 Comments

Do not use (.+?); there are better ways.
Well, this is pretty nice and close to what I want, but it still doesn't quite get there :) It gives me a great start! I think I can improve it ;)
@WissemGarouachi: glad to hear it was helpful - edited to meet your specs more closely.
@thg435 Replace "(.+?)" with ["][^"]+["], e.g.: /["][^"]+["]|(\w+)/g
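
As written, the pattern in the comment above has no capturing group around the quoted phrase, while the answer's callback reads the phrase from $1. A small sketch of the same negated-character-class idea with the group kept is shown here. The helper name tokenize is hypothetical and only illustrates the pattern.

```javascript
// Hypothetical helper, not part of the answer: lists the terms found in the input.
function tokenize(input) {
    var tokens = [];
    // "([^"]+)" captures a quoted phrase without its quotes; (\w+) captures a bare word.
    input.replace(/"([^"]+)"|(\w+)/g, function (match, phrase, word) {
        // The group that did not take part in the match is undefined.
        tokens.push(phrase !== undefined ? phrase : word);
    });
    return tokens;
}
```

The class [^"]+ cannot run past a closing quote, so it needs no lazy quantifier and avoids the backtracking that (.+?) relies on.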
Thank you, this is useful. Sorry, I don't have enough reputation to upvote your answer :) I have one last question: don't you think I could split on the three delimiters at the same time, then loop over the result in pairs of matches rather than one by one? That way each iteration would give me a pair (matched_keyword, delimiter). Something like this: (\sOR\s)|(\sAND\s)|(\sNOT\s), but this looks ugly :p
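
One possible reading of that idea, as a rough sketch: String.prototype.split keeps the text matched by a capturing group in its output, so a single alternation is enough to get operands and delimiters in alternating positions. The function name pairTermsWithOperators and the choice to label the leading text as AND are assumptions made for illustration.

```javascript
// Hypothetical helper: pairs each chunk of text with the operator that precedes it.
function pairTermsWithOperators(input) {
    // The capturing group makes split() include the matched operator in the result,
    // so parts alternates: text, operator, text, operator, ...
    var parts = input.split(/\s+(AND|OR|NOT)\s+/);
    // Text before the first operator is labeled AND (an assumption).
    var pairs = [{ operator: 'AND', text: parts[0] }];
    for (var i = 1; i < parts.length; i += 2) {
        pairs.push({ operator: parts[i], text: parts[i + 1] });
    }
    return pairs;
}
```

Each text chunk may still hold several terms and quoted phrases, so it would need a second tokenizing pass, and an operator word inside a quoted phrase would be split incorrectly. That is one reason the token-by-token loop in the answer may be the simpler route.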