
I'm using the parsec Haskell library.

I want to parse strings of the following kind:

[[v1]][[v2]]

xyz[[v1]][[v2]]

[[v1]]xyz[[v2]]

etc.

I'm interested in collecting only the values v1 and v2, and storing them in a data structure.

I tried with the following code:

import Text.ParserCombinators.Parsec

quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))

parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input

main = do {
   c <- getContents;
   case parse quantifiedVars "(stdin)" c of {
      Left e -> do { putStrLn "Error parsing input:"; print e; };
      Right r -> do{ putStrLn "ok"; mapM_ print r; };
   }
}

In this way, if the input is "[[v1]][[v2]]" the program works fine, returning the following output:

"v1"

"v2"

If the input is "xyz[[v1]][[v2]]" the program doesn't work. In particular, I want only what is contained in [[...]], ignoring "xyz".

Also, I want to store the content of [[...]] in a data structure.

How do you solve this problem?

  • So you want to skip anything not delimited by [[ and ]]? "xyz[[v1]][[v2]]" and "[[v1]]xyz[[v2]]" both should yield ["v1","v2"]? Commented Feb 14, 2012 at 14:43
  • This looks like an easy task for a regex. Something like \\[\\[([^]]+)\\]\\] Commented Feb 14, 2012 at 22:14

1 Answer


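You need to restructure your parser. The combinators are in the wrong places: sepEndBy var (string "]]") treats the closing "]]" as a separator between vars, between closes with the empty string, and noneOf "]]" is the same as noneOf "]", because noneOf takes a set of characters. Nothing in the parser skips text outside the brackets.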

A var is a varName between "[[" and "]]". So, write that:

var = between (string "[[") (string "]]") varName

A varName should have some kind of format (I don't think that you want to accept "%A¤%&", do you?), so you should make a parser for that; but in case it really can be anything, just do this:

varName = many $ noneOf "]"

Then a text containing vars is a sequence of vars separated by non-vars.

varText = someText *> var `sepEndBy` someText

... where someText is anything except a '[':

someText = many $ noneOf "["
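The pieces so far can be assembled into one module. This is a minimal sketch rather than a definitive implementation: it imports Text.Parsec (which exports the same combinators as Text.ParserCombinators.Parsec), adds type signatures, and introduces a hypothetical helper parseVars that is not part of the answer. The result is a plain list of strings, which may already be enough of a data structure for the values.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Text between variables: any run of characters other than '['.
someText :: Parser String
someText = many (noneOf "[")

-- A variable name: any run of characters other than ']'.
varName :: Parser String
varName = many (noneOf "]")

-- A variable is a name wrapped in "[[" and "]]".
var :: Parser String
var = between (string "[[") (string "]]") varName

-- Skip leading text, then collect variables that are separated
-- (and optionally followed) by more text.
varText :: Parser [String]
varText = someText *> (var `sepEndBy` someText)

-- Hypothetical helper (an assumption, not from the answer):
-- run varText over a whole input string.
parseVars :: String -> Either ParseError [String]
parseVars = parse (varText <* eof) "(input)"
```

In GHCi, parseVars "xyz[[v1]][[v2]]" and parseVars "[[v1]]xyz[[v2]]" should both give Right ["v1","v2"]. A single '[' in the surrounding text still makes this version fail.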

Things get more complicated if you want this to be parseable:

bla bla [ bla bla [[somevar]blabla]]

Then you need a better parser for varName and someText:

varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))

-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"

someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))

-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["

PS. (<*>), (*>) and (<$>) are from Control.Applicative.
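The more tolerant varName and someText can be combined with var and varText in the same way. The following is a sketch under the same assumptions as before: Text.Parsec imports, added type signatures, and a hypothetical parseVars helper that the answer does not define.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parses e.g. "]a": a ']' that does not start the "]]" terminator.
incompleteTerminator :: Parser String
incompleteTerminator = (\a b -> [a, b]) <$> char ']' <*> noneOf "]"

-- Parses e.g. "[b": a '[' that does not start the "[[" initiator.
incompleteInitiator :: Parser String
incompleteInitiator = (\a b -> [a, b]) <$> char '[' <*> noneOf "["

-- A name may contain single ']' characters, but not "]]".
varName :: Parser String
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))

-- Surrounding text may contain single '[' characters, but not "[[".
someText :: Parser String
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))

var :: Parser String
var = between (string "[[") (string "]]") varName

varText :: Parser [String]
varText = someText *> (var `sepEndBy` someText)

-- Hypothetical helper (an assumption, not from the answer).
parseVars :: String -> Either ParseError [String]
parseVars = parse (varText <* eof) "(input)"
```

With these definitions, parseVars "bla bla [ bla bla [[somevar]blabla]]" should give Right ["somevar]blabla"]. The try wrappers matter here: without them, a failed incompleteInitiator or incompleteTerminator would already have consumed the bracket, and the alternative would never be attempted.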


2 Comments

Dear dflemstr, thanks for the very detailed answer. I tried to implement your solution, but I get ambiguity errors when importing Control.Applicative together with Text.ParserCombinators.Parsec: "Ambiguous occurrence `many'..." and "Ambiguous occurrence `<|>'...". I tried adding "hiding ((<|>), many)", but ghc then reports a new error. How do you solve this problem? Thanks!
That's how I do it; you can try doing import Control.Applicative ((<*>), (*>), (<$>)) instead.
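As a small illustration of that import style, the sketch below brings in only the three operators, so the many and (<|>) exported by Control.Applicative never come into scope and cannot clash with Parsec's. The someText parser is reused from the answer purely as an example. On GHC 7.10 and later the Prelude already exports these three operators, so the explicit import can probably be dropped altogether.

```haskell
import Text.ParserCombinators.Parsec
-- Import only the operators that are needed; Control.Applicative's own
-- `many` and `(<|>)` stay out of scope, so Parsec's versions are unambiguous.
import Control.Applicative ((<$>), (<*>), (*>))

-- Parses e.g. "[b" (from the answer above).
incompleteInitiator :: Parser String
incompleteInitiator = (\a b -> [a, b]) <$> char '[' <*> noneOf "["

-- Uses Parsec's `many`, `many1`, `try` and `(<|>)` together with `(<$>)`.
someText :: Parser String
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))
```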
