| Portability | non-portable (regex-base needs MPTC+FD) |
|---|---|
| Stability | experimental |
| Maintainer | [email protected], [email protected] |
Text.Regex.Posix.Wrap
Contents
Description
WrapPosix.hsc exports a wrapped version of the ffi imports. To
increase type safety, the flags are newtype'd. The other important
export is a Regex type that is specific to the Posix library
backend. The flags are documented in Text.Regex.Posix. The
defaultCompOpt is (compExtended .|. compNewline).
The Regex, CompOption, and ExecOption types and their RegexOptions
instance is declared. The =~ and =~~ convenience functions are
defined.
The exported symbols are the same whether 1 is defined, but
when it is not defined then getVersion == Nothing and all other
exported values will call error or fail.
This module will fail or error only if allocation fails or a nullPtr is passed in.
- data Regex
- type RegOffset = Int64
- (=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> target
- (=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, Monad m) => source1 -> source -> m target
- type WrapError = (ReturnCode, String)
- wrapCompile :: CompOption -> ExecOption -> CString -> IO (Either WrapError Regex)
- wrapTest :: Regex -> CString -> IO (Either WrapError Bool)
- wrapMatch :: Regex -> CString -> IO (Either WrapError (Maybe [(RegOffset, RegOffset)]))
- wrapMatchAll :: Regex -> CString -> IO (Either WrapError [MatchArray])
- wrapCount :: Regex -> CString -> IO (Either WrapError Int)
- unusedRegOffset :: RegOffset
- newtype CompOption = CompOption CInt
- compBlank :: CompOption
- compExtended :: CompOption
- compIgnoreCase :: CompOption
- compNoSub :: CompOption
- compNewline :: CompOption
- newtype ExecOption = ExecOption CInt
- execBlank :: ExecOption
- execNotBOL :: ExecOption
- execNotEOL :: ExecOption
- newtype ReturnCode = ReturnCode CInt
- retBadbr :: ReturnCode
- retBadpat :: ReturnCode
- retBadrpt :: ReturnCode
- retEcollate :: ReturnCode
- retEctype :: ReturnCode
- retEescape :: ReturnCode
- retEsubreg :: ReturnCode
- retEbrack :: ReturnCode
- retEparen :: ReturnCode
- retEbrace :: ReturnCode
- retErange :: ReturnCode
- retEspace :: ReturnCode
High-level API
A compiled regular expression.
RegOffset is typedef int regoff_t on Linux and ultimately typedef long long __int64_t on Max OS X. So rather than saying 2,147,483,647 is all the length you need, I'll take the larger: 9,223,372,036,854,775,807 should be enough bytes for anyone, no need for Integer. The alternative is to compile to different sizes in a platform dependent manner with type RegOffset = (, which I do not want to do.
There is also a special value unusedRegOffset :: RegOffset which is
(-1) and as a starting index means that the subgroup capture was
unused. Otherwise the RegOffset indicates a character boundary that
is before the character at that index offset, with the first
character at index offset 0. So starting at 1 and ending at 2 means
to take only the second character.
(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> targetSource
(=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, Monad m) => source1 -> source -> m targetSource
Low-level API
type WrapError = (ReturnCode, String)Source
The return code will be retOk when it is the Haskell wrapper and not the underlying library generating the error message.
Arguments
| :: CompOption | Flags (bitmapped) |
| -> ExecOption | Flags (bitmapped) |
| -> CString | The regular expression to compile (ASCII only, no null bytes) |
| -> IO (Either WrapError Regex) | Returns: the compiled regular expression |
wrapMatch :: Regex -> CString -> IO (Either WrapError (Maybe [(RegOffset, RegOffset)]))Source
wrapMatch returns offsets for the begin and end of each capture. Unused captures have offsets of unusedRegOffset which is (-1)
wrapMatchAll :: Regex -> CString -> IO (Either WrapError [MatchArray])Source
wrapMatchAll returns the offset and length of each capture. Unused captures have an offset of unusedRegOffset which is (-1) and length of 0.
Miscellaneous
Compilation options
newtype CompOption Source
A bitmapped CInt containing options for compilation of regular
expressions. Option values (and their man 3 regcomp names) are
-
compBlankwhich is a completely zero value for all the flags. This is also theblankCompOptvalue. -
compExtended(REG_EXTENDED) which can be set to use extended instead of basic regular expressions. This is set in thedefaultCompOptvalue. -
compNewline(REG_NEWLINE) turns on newline sensitivity: The dot (.) and inverted set[^ ]never match newline, and ^ and $ anchors do match after and before newlines. This is set in thedefaultCompOptvalue. -
compIgnoreCase(REG_ICASE) which can be set to match ignoring upper and lower distinctions. -
compNoSub(REG_NOSUB) which turns off all information from matching except whether a match exists.
Constructors
| CompOption CInt |
A completely zero value for all the flags.
This is also the blankCompOpt value.
Execution options
newtype ExecOption Source
A bitmapped CInt containing options for execution of compiled
regular expressions. Option values (and their man 3 regexec names) are
-
execBlankwhich is a complete zero value for all the flags. This is the blankExecOpt value. -
execNotBOL(REG_NOTBOL) can be set to prevent ^ from matching at the start of the input. -
execNotEOL(REG_NOTEOL) can be set to prevent $ from matching at the end of the input (before the terminating NUL).
Constructors
| ExecOption CInt |
A completely zero value for all the flags.
This is also the blankExecOpt value.
Return codes
newtype ReturnCode Source
ReturnCode is an enumerated CInt, corresponding to the error codes
from man 3 regex:
-
retBadbr(REG_BADBR) invalid repetition count(s) in{ } -
retBadpat(REG_BADPAT) invalid regular expression -
retBadrpt(REG_BADRPT)?,*, or+operand invalid -
retEcollate(REG_ECOLLATE) invalid collating element -
retEctype(REG_ECTYPE) invalid character class -
retEescape(REG_EESCAPE)\applied to unescapable character -
retEsubreg(REG_ESUBREG) invalid backreference number -
retEbrack(REG_EBRACK) brackets[ ]not balanced -
retEparen(REG_EPAREN) parentheses( )not balanced -
retEbrace(REG_EBRACE) braces{ }not balanced -
retErange(REG_ERANGE) invalid character range in[ ] -
retEspace(REG_ESPACE) ran out of memory -
retNoMatch(REG_NOMATCH) The regexec() function failed to match
Constructors
| ReturnCode CInt |
Instances