| Safe Haskell | Safe-Inferred |
|---|
Text.Luthor.Combinator
Contents
Description
Parsec combinators are not composable by default. The idea is that this increases efficiency
by eliminating unnecessary backtracking. The practical problem with this approach is that it
is difficult to determine exactly where backtracking really is necessary and insert the needed
try combinators. Any mistakes are almost always subtle and difficult to diagnose and fix.
Compute power is cheap and programmers are expensive, so it only makes sense to make it easy on the programmer first and on the computer second. This module is mostly full of composable drop-in replacements for Parsec combinators. There's also some renaming, to make the API more idiomatic. Also, some additional combinators are exported.
When migrating to this API, it is recommended to import Parsec modules qualified so that your code will be composable by default. Where efficiency really is needed, you can carefully but easily fall back to non-composable Parsec combinators by namespacing the appropriate combinators.
One place where we could not provide a replacement was the <|> operator. This is exported by
the Alternative typeclass, which we really don't want to mess with. The composable alternative
is spelled <||>.
The re-named parsers are option, optional, optional_, many_, and many1_. It may not
be easiest to switch old habits, but these names are more idiomatic and also reduce the
amount of typing necessary. Also, we've altered the semantics of manyTill to make room for the
new manyThru combinator. This gives the user an easy choice whether to consume the terminating
element. Finally, we've changed the type of notFollowedBy to allow writing
x `notfollowedBy` y in place of x <* notFollowedBy y.
Below are some selected examples where this library is more intuitive:
- In Parsec,
string "aaa" <|> string "aa", will fail on input"aa". Using this module's<||>will succeed. - In Parsec,
(char 'a' `sepBy` char ' ') *> optional (char ' ') *> eofwill fail on input"a a ". Using this module'ssepBywill succeed. Similar results hold forsepBy1,chainl,chainl1,chainr, andchainr1. - In Parsec,
anyChar `manyTill` string "-->" *> eofwill fail on input"part1 -- part2-->". Using this module'smanyThruwill succeed with the same semantics. This modulesmanyTillwill not consume the"-->".
While we're at it, we've also re-exported applicative parsing functions and defined some of our
own combinators that have been found useful. Applicative parsing is recommended over monadic
parsing where it will suffice, so we'd rather eliminate the extra Control.Applicative import.
Among the additional combinators defined here are dispatch, count, atLeast, atMost, manyNM,
chomp, and between2.
- (<$>) :: Functor f => (a -> b) -> f a -> f b
- (<$$>) :: Functor f => f a -> (a -> b) -> f b
- (<*>) :: Applicative f => forall a b. f (a -> b) -> f a -> f b
- (*>) :: Applicative f => forall a b. f a -> f b -> f b
- (<*) :: Applicative f => forall a b. f a -> f b -> f a
- (<**>) :: Applicative f => f a -> f (a -> b) -> f b
- (<$) :: Functor f => forall a b. a -> f b -> f a
- pure :: Applicative f => forall a. a -> f a
- (<||>) :: Stream s m t => ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
- choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a
- dispatch :: Stream s m t => [(ParsecT s u m a, ParsecT s u m b)] -> ParsecT s u m b
- longestOf :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a
- option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m a
- optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a)
- optional_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
- many :: ParsecT s u m a -> ParsecT s u m [a]
- many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a]
- many_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
- many1_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
- count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]
- atLeast :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]
- atMost :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]
- manyNM :: Stream s m t => Int -> Int -> ParsecT s u m a -> ParsecT s u m [a]
- manyOf :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m [a]
- manyOf_ :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m ()
- manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]
- manyThru :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]
- chomp :: Stream s m t => ParsecT s u m a -> ParsecT s u m trash -> ParsecT s u m a
- between :: Stream s m t => ParsecT s u m open -> ParsecT s u m close -> ParsecT s u m a -> ParsecT s u m a
- between2 :: Stream s m t => ParsecT s u m around -> ParsecT s u m a -> ParsecT s u m a
- sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- sepBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- sepAroundBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- sepAroundBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
- chainl :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a
- chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
- chainr :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a
- chainr1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
- lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m a
- notFollowedBy :: (Stream s m t, Show trash) => ParsecT s u m a -> ParsecT s u m trash -> ParsecT s u m a
- atEndOfInput :: (Stream s m t, Show t) => ParsecT s u m Bool
- endOfInput :: (Stream s m t, Show t) => ParsecT s u m ()
- allInput :: (Stream s m t, Show t) => ParsecT s u m a -> ParsecT s u m a
- withRemainingInput :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, s)
- (<?>) :: ParsecT s u m a -> String -> ParsecT s u m a
- expect :: Stream s m t => String -> ParsecT s u m a -> ParsecT s u m a
- withPosition :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, SourcePos)
- withPositionEnd :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, SourcePos)
- withPositions :: Stream s m t => ParsecT s u m a -> ParsecT s u m (SourcePos, a, SourcePos)
- try :: ParsecT s u m a -> ParsecT s u m a
- (<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
- unexpected :: Stream s m t => String -> ParsecT s u m a
- void :: Functor f => f a -> f ()
Applicative Parsing
(<*>) :: Applicative f => forall a b. f (a -> b) -> f a -> f b
Sequential application.
(*>) :: Applicative f => forall a b. f a -> f b -> f b
Sequence actions, discarding the value of the first argument.
(<*) :: Applicative f => forall a b. f a -> f b -> f a
Sequence actions, discarding the value of the second argument.
(<**>) :: Applicative f => f a -> f (a -> b) -> f b
A variant of <*> with the arguments reversed.
pure :: Applicative f => forall a. a -> f a
Lift a value.
Choices
(<||>) :: Stream s m t => ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m aSource
p <||> q tries to parse p, but if it fails, parses q.
Unlike the Alternative instance for ParsecT, backtracking will occur if p fails.
That is, a parser such as string "flange" <||> string "fly" will succeed on
the input "fly", whereas its Parsec counterpart will unintuitively fail.
choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m aSource
choice ps tries to apply the parsers in the list ps in order,
until one of them succeeds. Returns the value of the succeeding
parser. Unlike the Parsec version, this one ensures that parsers
do not consume input if they fail.
dispatch :: Stream s m t => [(ParsecT s u m a, ParsecT s u m b)] -> ParsecT s u m bSource
Given a map of parsers to parsers, attempt each key parser until one succeeds, then perform the value parser. Return the result of the value parser.
longestOf :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m aSource
Attempt all of the passed parsers under the current conditions and return the value of the parser which makes it furthest into the input stream (and updates the parser's internals as if that were the only parser parsed).
longestOf [string "do", string "don't"]
Zero or One
option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m aSource
option x p tries to apply parser p. If p fails, no input is consumed and x
is returned.
optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a)Source
optional p tries to parse p, but does not fail or consume input of p fails.
This is like optionMaybe, but is easier to type. See optional_.
optional_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()Source
optional_ p tries to parse p, but does not fail or consume input if p fails.
This is like optional, but the use of underscore is more idiomatic
for actions whose results are ignored.
Many
many :: ParsecT s u m a -> ParsecT s u m [a]
many p applies the parser p zero or more times. Returns a
list of the returned values of p.
identifier = do{ c <- letter
; cs <- many (alphaNum <|> char '_')
; return (c:cs)
}
many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a]
many1 p applies the parser p one or more times. Returns a
list of the returned values of p.
word = many1 letter
many_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()Source
many1_ p applies the parser p zero or more times, skipping
its result.
many1_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()Source
many1_ p applies the parser p one or more times, skipping
its result.
count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]
count n p parses n occurrences of p. If n is smaller or
equal to zero, the parser equals to return []. Returns a list of
n values returned by p.
atLeast :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]Source
atLeast n p applies the parser p n or more times.
atMost :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]Source
atMost n p applies the parser p up to n times.
manyNM :: Stream s m t => Int -> Int -> ParsecT s u m a -> ParsecT s u m [a]Source
manyNM n m p applies the parser p n or more times up to m times.
manyOf :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m [a]Source
Parse zero or more of any mix of the passed parsers.
manyOf_ :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m ()Source
As manyOf, but ignoring the results.
Common Structures
Terminate
manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]Source
manyTill p end applies parser p zero or more times until
parser end succeeds. Returns the list of values returned by p.
The end parse does not consume input, c.f. manyThru.
This parser can be used to scan comments:
simpleComment = do{ string "//"
; anyChar `manyThru` char '\n'
}
Note that despite the overlapping parsers anyChar and char '\n',
there is never a need to add a try: the end parser does not consume
input on failure.
manyThru :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]Source
manyThru p end applies parser p zero or more times until
parser end succeeds. Returns the list of values returned by p.
The end parse does consume input, c.f. manyTill.
This parser can be used to scan comments:
simpleComment = do{ string "<!--"
; anyChar `manyThru` string "-->"
}
Note that despite the overlapping parsers anyChar and string "-->",
there is no need to add a try: the end parser does not consume input
on failure.
chomp :: Stream s m t => ParsecT s u m a -> ParsecT s u m trash -> ParsecT s u m aSource
chomp p x will parse p, then throw away a subsequent
parse by x provided it succeeds.
This combinator will only fail when p fails, not when x does.
Surround
between :: Stream s m t => ParsecT s u m open -> ParsecT s u m close -> ParsecT s u m a -> ParsecT s u m a
between open close p parses open, followed by p and close.
Returns the value returned by p.
braces = between (symbol "{") (symbol "}")
between2 :: Stream s m t => ParsecT s u m around -> ParsecT s u m a -> ParsecT s u m aSource
between2 p q is equivalent to between p p q
Intercalate
sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
sepBy p sep parses zero or more occurrences of p, separated
by sep. Returns a list of values returned by p.
commaSep p = p `sepBy` (symbol ",")
sepBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
sepBy1 p sep parses one or more occurrences of p, separated
by sep. Returns a list of values returned by p.
sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
sepEndBy p sep parses zero or more occurrences of p,
separated and optionally ended by sep, ie. haskell style
statements. Returns a list of values returned by p.
haskellStatements = haskellStatement `sepEndBy` semi
sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
sepEndBy1 p sep parses one or more occurrences of p,
separated and optionally ended by sep. Returns a list of values
returned by p.
endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
endBy p sep parses zero or more occurrences of p, seperated
and ended by sep. Returns a list of values returned by p.
cStatements = cStatement `endBy` semi
endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
endBy1 p sep parses one or more occurrences of p, seperated
and ended by sep. Returns a list of values returned by p.
sepAroundBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
sepAroundBy p sep parses zero or more occurrences of p,
separated and optionally starting with and ended by sep. Returns
a list of values returned by p.
sepAroundBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]Source
sepAroundBy1 p sep parses one or more occurrences of p,
separated and optionally starting with and ended by sep. Returns
a list of values returned by p.
Chaining
chainl :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m aSource
chainl p op x parses zero or more occurrences of p,
separated by op. Returns a value obtained by a left associative
application of all functions returned by op to the values returned
by p. If there are zero occurrences of p, the value x is
returned.
C.f. chainr
chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m aSource
chainl1 p op x parses one or more occurrences of p,
separated by op Returns a value obtained by a left associative
application of all functions returned by op to the values returned
by p. This parser can for example be used to eliminate left
recursion which typically occurs in expression grammars.
expr = term `chainl1` mulop
term = factor `chainl1` addop
factor = parens expr <|> integer
mulop = do{ symbol "*"; return (*) }
<|> do{ symbol "/"; return (div) }
addop = do{ symbol "+"; return (+) }
<|> do{ symbol "-"; return (-) }
C.f. chainr1
chainr :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m aSource
chainr p op x parses zero or more occurrences of p,
separated by op Returns a value obtained by a right associative
application of all functions returned by op to the values returned
by p. If there are no occurrences of p, the value x is
returned.
C.f. chainl
chainr1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m aSource
chainr1 p op x parses one or more occurrences of p,
separated by op Returns a value obtained by a right associative
application of all functions returned by op to the values returned
by p.
C.f. chainl1
Lookahead
lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m aSource
lookAhead p parses p without consuming any input, even if p fails.
notFollowedBy :: (Stream s m t, Show trash) => ParsecT s u m a -> ParsecT s u m trash -> ParsecT s u m aSource
notFollowedBy p q parses p, but only when q will fail immediately
after parsing p. Parsing q never consumes input, and if this
combinator fails, no input is consumed.
This combinator can be used to implement the 'longest match' rule.
For example, when recognizing keywords (for example let), we want
to make sure that a keyword is not followed by a legal identifier
character, in which case the keyword is actually an identifier
(for example lets). We can program this behavior as follows:
keywordLet = string "let" `notFollowedBy` alphaNum
endOfInput :: (Stream s m t, Show t) => ParsecT s u m ()Source
Succeed only when at the end of the input stream.
Input Stream
allInput :: (Stream s m t, Show t) => ParsecT s u m a -> ParsecT s u m aSource
Uses the passed parser, but succeeds only if it consumes all of the input.
withRemainingInput :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, s)Source
Parse using the passed parser, but also return the input that was not consumed.
Additional Data
(<?>) :: ParsecT s u m a -> String -> ParsecT s u m a
The parser p ? msg behaves as parser p, but whenever the
parser p fails without consuming any input, it replaces expect
error messages with the expect error message msg.
This is normally used at the end of a set alternatives where we want
to return an error message in terms of a higher level construct
rather than returning all possible characters. For example, if the
expr parser from the try example would fail, the error
message is: '...: expecting expression'. Without the (<?>)
combinator, the message would be like '...: expecting "let" or
letter', which is less friendly.
withPosition :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, SourcePos)Source
Annotate the return value of the passed parser with the position just before parsing.
withPositionEnd :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, SourcePos)Source
Annotate the return value of the passed parser with the position just after parsing.
withPositions :: Stream s m t => ParsecT s u m a -> ParsecT s u m (SourcePos, a, SourcePos)Source
Annotate the return value of the passed parser with the position just before and after parsing respectively.
Re-exports
try :: ParsecT s u m a -> ParsecT s u m a
The parser try p behaves like parser p, except that it
pretends that it hasn't consumed any input when an error occurs.
This combinator is used whenever arbitrary look ahead is needed.
Since it pretends that it hasn't consumed any input when p fails,
the (<|>) combinator will try its second alternative even when the
first parser failed while consuming input.
The try combinator can for example be used to distinguish
identifiers and reserved words. Both reserved words and identifiers
are a sequence of letters. Whenever we expect a certain reserved
word where we can also expect an identifier we have to use the try
combinator. Suppose we write:
expr = letExpr <|> identifier <?> "expression"
letExpr = do{ string "let"; ... }
identifier = many1 letter
If the user writes "lexical", the parser fails with: unexpected
'x', expecting 't' in "let". Indeed, since the (<|>) combinator
only tries alternatives when the first alternative hasn't consumed
input, the identifier parser is never tried (because the prefix
"le" of the string "let" parser is already consumed). The
right behaviour can be obtained by adding the try combinator:
expr = letExpr <|> identifier <?> "expression"
letExpr = do{ try (string "let"); ... }
identifier = many1 letter
(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
This combinator implements choice. The parser p <|> q first
applies p. If it succeeds, the value of p is returned. If p
fails without consuming any input, parser q is tried. This
combinator is defined equal to the mplus member of the MonadPlus
class and the (<|>) member of Alternative.
The parser is called predictive since q is only tried when
parser p didn't consume any input (i.e.. the look ahead is 1).
This non-backtracking behaviour allows for both an efficient
implementation of the parser combinators and the generation of good
error messages.
unexpected :: Stream s m t => String -> ParsecT s u m a
The parser unexpected msg always fails with an unexpected error
message msg without consuming any input.
The parsers fail, (<?>) and unexpected are the three parsers
used to generate error messages. Of these, only (<?>) is commonly
used. For an example of the use of unexpected, see the definition
of notFollowedBy.