Parsing.String
- Package: purescript-parsing
- Repository: purescript-contrib/purescript-parsing
Primitive parsers for working with an input stream of type String.
All of these primitive parsers will consume their input when they succeed.
All of these primitive parsers will consume no input when they fail.
The behavior of these primitive parsers is based on the behavior of the
Data.String module in the strings package.
In most JavaScript runtime environments, the String
is little-endian UTF-16.
The primitive parsers which return Char will only succeed when the character
being parsed is a code point in the
Basic Multilingual Plane
(the “BMP”). These parsers can be convenient because of the good support
that PureScript has for writing Char literals like 'あ', 'β', 'C'.
The other primitive parsers, which return CodePoint and String types,
can parse the full Unicode character set. All of the primitive parsers
in this module can be used together.
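The difference between the Char parsers and the CodePoint parsers can be sketched as follows. This is a minimal illustration, assuming the anyChar parser exported by this module and runParser from the Parsing module; the module name and the value names are hypothetical.

```purescript
module Example.CharVsCodePoint where

import Data.Either (Either)
import Data.String.CodePoints (CodePoint)
import Parsing (ParseError, runParser)
import Parsing.String (anyChar, anyCodePoint)

-- U+1F600 lies outside the BMP, so it occupies two UTF-16 code units.
astralInput :: String
astralInput = "😀"

-- Expected to be a Left, because anyChar only accepts BMP characters.
asChar :: Either ParseError Char
asChar = runParser astralInput anyChar

-- Expected to be a Right, because anyCodePoint accepts any code point.
asCodePoint :: Either ParseError CodePoint
asCodePoint = runParser astralInput anyCodePoint
```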
Position
In a String parser, the Position {index} counts the number of
Unicode CodePoints since the beginning of the input string.
Each tab character (0x09) encountered in a String parser will advance
the Position {column} by 8.
These patterns will advance the Position {line} by 1 and reset
the Position {column} to 1:
- newline (0x0A)
- carriage-return (0x0D)
- carriage-return-newline (0x0D 0x0A)
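As a rough sketch of the rules above, the position can be observed after consuming a newline. This assumes the position parser and the Position type from the Parsing module and the string parser from this module; the module name and the value name are hypothetical.

```purescript
module Example.Position where

import Prelude

import Data.Either (Either)
import Parsing (ParseError, Position, position, runParser)
import Parsing.String (string)

-- Consume "ab\n" (three code points, ending in a newline),
-- then ask the parser where it currently is.
positionAfterNewline :: Either ParseError Position
positionAfterNewline = runParser "ab\ncd" (string "ab\n" *> position)
```

Under the rules described above, the resulting Position should have index 3, line 2, and column 1.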
#anyCodePoint
anyCodePoint :: forall m. ParserT String m CodePoint
Match any Unicode character. Always succeeds when any input remains.
#match
match :: forall m a. ParserT String m a -> ParserT String m (Tuple String a)
Combinator which returns both the result of a parse and the slice of the input that was consumed while it was being parsed.
Because Strings are not Char arrays in PureScript, many and some
on Char parsers need to
be used with Data.String.CodeUnits.fromCharArray to
construct a String.
fromCharArray <$> Data.Array.many (char 'x')
It’s more efficient to achieve the same result by using this match combinator
instead of fromCharArray.
fst <$> match (Combinators.skipMany (char 'x'))
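When the inner parser also produces a useful value, both parts of the Tuple are available. The following is a small sketch, assuming skipMany from Parsing.Combinators and runParser from Parsing; the module name and the value name are hypothetical.

```purescript
module Example.Match where

import Prelude

import Data.Either (Either)
import Data.Tuple (Tuple)
import Parsing (ParseError, runParser)
import Parsing.Combinators (skipMany)
import Parsing.String (char, match)

-- Skip a run of 'x', then parse a single 'y'.
-- The parsed value is the final Char; the String is the consumed slice.
xsThenY :: Either ParseError (Tuple String Char)
xsThenY = runParser "xxxyZ" (match (skipMany (char 'x') *> char 'y'))
```

Following the description above, this should yield the consumed slice "xxxy" together with the value 'y', leaving "Z" unconsumed.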
#regex
regex :: forall m. String -> RegexFlags -> Either String (ParserT String m String)
Compile a regular expression string into a regular expression parser.
This function will use the Data.String.Regex.regex function to compile and return a parser which can be used
in a ParserT String m monad.
This parser will try to match the regular expression pattern starting at the current parser position. On success, it will return the matched substring.
See the MDN Regular Expressions Cheatsheet for the pattern syntax.
This function should be called outside the context of a ParserT String m monad, because this function might
fail with a Left RegExp compilation error message.
If you call this function inside of the ParserT String m monad and then fail the parse when the compilation fails,
then that could be confusing because a parser failure is supposed to indicate an invalid input string.
If the compilation failure occurs in an alt then the compilation failure might not be reported at all and instead
the input string would be parsed incorrectly.
This parser may be useful for quickly consuming a large section of the
input String, because in a JavaScript runtime environment the RegExp
runtime is a lot faster than primitive parsers.
Example
This example shows how to compile and run the xMany parser which will
capture the regular expression pattern x*.
case regex "x*" noFlags of
  Left compileError -> unsafeCrashWith $ "xMany failed to compile: " <> compileError
  Right xMany -> runParser "xxxZ" do
    xMany
Flags
Set RegexFlags with the Semigroup instance like this.
regex "x*" (dotAll <> ignoreCase)
The dotAll, unicode, and ignoreCase flags might make sense for a regex parser. The other flags will
probably cause surprising behavior and you should avoid them.
#anyTill
anyTill :: forall m a. Monad m => ParserT String m a -> ParserT String m (Tuple String a)
Combinator which finds the first position in the input String where the
phrase can parse. Returns both the
parsed result and the unparsable input section searched before the parse.
Will fail if no section of the input is parseable. To backtrack the input
stream on failure, combine with tryRethrow.
This combinator works like Data.String.takeWhile or Data.String.Regex.search and it allows using a parser for the pattern search.
This combinator is equivalent to manyTill_ anyCodePoint, but it will be
faster because it returns a slice of the input String for the
section preceding the parse instead of a List CodePoint.
Be careful not to look too far
ahead; if the phrase parser looks to the end of the input then anyTill
could be O(n²).
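A minimal sketch of searching for a phrase with anyTill, assuming the string parser from this module and runParser from Parsing; the module name and the value name are hypothetical.

```purescript
module Example.AnyTill where

import Data.Either (Either)
import Data.Tuple (Tuple)
import Parsing (ParseError, runParser)
import Parsing.String (anyTill, string)

-- Search forward for the first "->".
-- The first Tuple field is the skipped input, the second is the phrase result.
findArrow :: Either ParseError (Tuple String String)
findArrow = runParser "key -> value" (anyTill (string "->"))
```

Following the description above, the skipped section should be "key " and the parsed phrase "->". The phrase parser here inspects only two characters at each position, so the search stays linear in the input length.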
#consumeWith
consumeWith :: forall m a. (String -> Either String { consumed :: String, remainder :: String, value :: a }) -> ParserT String m a
Consume a portion of the input string while yielding a value.
Takes a consumption function which takes the remaining input String
as its argument and returns either an error message, or three fields:
- value is the value to return.
- consumed is the input String that was consumed. It is used to update the parser position.
- remainder is the new remaining input String.
This function is used internally to construct primitive String parsers.
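As an illustration, a literal-prefix parser could be built on consumeWith roughly like this. The keyword function and the module name are hypothetical and are not part of this module; the sketch assumes stripPrefix and Pattern from Data.String.

```purescript
module Example.ConsumeWith where

import Prelude

import Data.Either (Either(..))
import Data.Maybe (Maybe(..))
import Data.String (Pattern(..), stripPrefix)
import Parsing (ParserT)
import Parsing.String (consumeWith)

-- Hypothetical helper: succeed only when the remaining input starts with kw.
keyword :: forall m. String -> ParserT String m String
keyword kw = consumeWith \input ->
  case stripPrefix (Pattern kw) input of
    -- On success, report what was consumed and what input is left over.
    Just remainder -> Right { value: kw, consumed: kw, remainder }
    -- On failure, return an error message; no input is consumed.
    Nothing -> Left ("Expected " <> show kw)
```

Returning the consumed slice separately from the remainder lets the parser advance its Position without the consumption function having to track line and column itself.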