Text.Parsing.Parser.String
- Package: purescript-parsing
- Repository: purescript-contrib/purescript-parsing
Primitive parsers for working with an input stream of type String.
All of these primitive parsers will consume their input when they succeed.
All of these primitive parsers will consume no input when they fail.
The behavior of these primitive parsers is based on the behavior of the
Data.String module in the strings package.
In most JavaScript runtime environments, the String
is little-endian UTF-16.
The primitive parsers which return Char will only succeed when the character
being parsed is a code point in the
Basic Multilingual Plane
(the “BMP”). These parsers can be convenient because of the good support
that PureScript has for writing Char literals like 'あ', 'β', 'C'.
The other primitive parsers, which return CodePoint and String types,
can parse the full Unicode character set. All of the primitive parsers
in this module can be used together.
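The difference between the Char parsers and the CodePoint parsers can be sketched as follows. This is a minimal illustration, not part of the library documentation: the module name Example.CharVsCodePoint and the value names are hypothetical, and it assumes that anyChar is exported from this module alongside anyCodePoint. The input "𝐀" (U+1D400) lies outside the BMP.

```purescript
module Example.CharVsCodePoint where

import Prelude

import Data.Either (Either, isLeft, isRight)
import Data.String.CodePoints (CodePoint)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (anyChar, anyCodePoint)

-- A Char parser on a BMP character: expected to succeed.
bmpChar :: Either ParseError Char
bmpChar = runParser "あ" anyChar

-- A Char parser on a character outside the BMP: expected to fail,
-- because the character does not fit in a single Char.
astralChar :: Either ParseError Char
astralChar = runParser "𝐀" anyChar

-- A CodePoint parser on the same input: expected to succeed.
astralCodePoint :: Either ParseError CodePoint
astralCodePoint = runParser "𝐀" anyCodePoint

-- Summary of the three expectations above (hypothetical helper).
expectations :: Boolean
expectations = isRight bmpChar && isLeft astralChar && isRight astralCodePoint
```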
#anyCodePoint
anyCodePoint :: forall m. Monad m => ParserT String m CodePoint
Match any Unicode character. Always succeeds when any input remains.
#whiteSpace
whiteSpace :: forall m. Monad m => ParserT String m String
Match zero or more whitespace characters satisfying
Data.CodePoint.Unicode.isSpace. Always succeeds.
#skipSpaces
skipSpaces :: forall m. Monad m => ParserT String m Unit
Skip whitespace characters and throw them away. Always succeeds.
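As a rough sketch of how the two whitespace parsers behave, the following hypothetical module (Example.Whitespace, with illustrative value names) runs whiteSpace and skipSpaces on small inputs. It assumes that char is exported from this module.

```purescript
module Example.Whitespace where

import Prelude

import Data.Either (Either)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (char, skipSpaces, whiteSpace)

-- whiteSpace returns the whitespace it consumed.
-- Expected: Right "  \n "
leading :: Either ParseError String
leading = runParser "  \n x" whiteSpace

-- With no leading whitespace it still succeeds, with an empty String.
-- Expected: Right ""
none :: Either ParseError String
none = runParser "x" whiteSpace

-- skipSpaces discards the whitespace, so the next parser sees 'x'.
-- Expected: Right 'x'
afterSpaces :: Either ParseError Char
afterSpaces = runParser "   x" (skipSpaces *> char 'x')
```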
#match
match :: forall m a. Monad m => ParserT String m a -> ParserT String m (Tuple String a)
Combinator which returns both the result of a parse and the slice of the input that was consumed while it was being parsed.
Because Strings are not Char arrays in PureScript, many and some
on Char parsers need to
be used with Data.String.CodeUnits.fromCharArray to
construct a String.
fromCharArray <$> Data.Array.many (char 'x')
It’s more efficient to achieve the same result by using this match combinator
instead of fromCharArray.
fst <$> match (Combinators.skipMany (char 'x'))
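The two snippets above can be placed side by side in a complete module. This is only a sketch: the module name Example.Match and the value names are hypothetical, and the expected results in the comments follow from the descriptions of match and char given here.

```purescript
module Example.Match where

import Prelude

import Data.Array (many)
import Data.Either (Either)
import Data.String.CodeUnits (fromCharArray)
import Data.Tuple (Tuple, fst)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.Combinators (skipMany)
import Text.Parsing.Parser.String (char, match)

-- Build the String from an Array of Char.
-- Expected: Right "xxx"
viaArray :: Either ParseError String
viaArray = runParser "xxxy" (fromCharArray <$> many (char 'x'))

-- Take the consumed slice of the input directly, with no intermediate Array.
-- Expected: Right "xxx"
viaMatch :: Either ParseError String
viaMatch = runParser "xxxy" (fst <$> match (skipMany (char 'x')))

-- match also keeps the result of the inner parser, here Unit.
-- Expected: Right (Tuple "xxx" unit)
sliceAndResult :: Either ParseError (Tuple String Unit)
sliceAndResult = runParser "xxxy" (match (skipMany (char 'x')))
```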
#regex
regex :: forall m flags f_. Monad m => Union flags RegexFlagsRow f_ => Nub f_ RegexFlagsRow => Record flags -> String -> ParserT String m String
Parser which uses the Data.String.Regex module to match the regular
expression pattern passed as the String
argument to the parser.
This parser will try to match the regular expression pattern starting at the current parser position. On success, it will return the matched substring.
If the Regex pattern string fails to compile then this parser will fail.
(Note: It’s not possible to use a precompiled Regex because this parser
must set flags and make adjustments to the Regex pattern string.)
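A small sketch of these rules, under the assumption that string is exported from this module; the module name Example.RegexPosition and the value names are hypothetical. The pattern "(" is used as an example of a pattern that does not compile.

```purescript
module Example.RegexPosition where

import Prelude

import Data.Either (Either, isLeft)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (regex, string)

-- The pattern is matched at the current position, which here is
-- just after the "xx" consumed by the preceding parser.
-- Expected: Right "abab"
atPosition :: Either ParseError String
atPosition = runParser "xxababXX" (string "xx" *> regex {} "(ab)+")

-- A match further along the input does not count, because the input
-- at the current position starts with "XX". Expected to fail.
notAtPosition :: Either ParseError String
notAtPosition = runParser "XXabab" (regex {} "(ab)+")

-- An unterminated group is not a valid pattern, so the parser
-- is expected to fail rather than throw.
badPattern :: Either ParseError String
badPattern = runParser "abc" (regex {} "(")

-- Both failure cases should be Left values (hypothetical helper).
failures :: Boolean
failures = isLeft notAtPosition && isLeft badPattern
```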
This parser may be useful for quickly consuming a large section of the
input String, because in a JavaScript runtime environment the RegExp
runtime is a lot faster than primitive parsers.
MDN Regular Expressions Cheatsheet
Flags
The Record flags argument to the parser is for Regex flags. Here are
the default flags.
{ dotAll: true
, ignoreCase: false
, unicode: true
}
To use the defaults, pass
{} as the flags argument. For case-insensitive pattern matching, pass
{ignoreCase: true} as the flags argument.
The other Data.String.Regex.Flags.RegexFlagsRec fields are mostly
meaningless in the context of parsing,
and using them may cause strange behavior in the parser.
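The effect of the flags argument can be sketched like this. The module name Example.RegexFlags and the value names are hypothetical; the expected results in the comments assume the default flags listed above.

```purescript
module Example.RegexFlags where

import Prelude

import Data.Either (Either)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (regex)

-- Default flags: ignoreCase is false, so "hello" does not match "HELLO".
-- Expected to fail.
caseSensitive :: Either ParseError String
caseSensitive = runParser "HELLO world" (regex {} "hello")

-- Override only ignoreCase; the other flags keep their defaults.
-- Expected: Right "HELLO"
caseInsensitive :: Either ParseError String
caseInsensitive = runParser "HELLO world" (regex { ignoreCase: true } "hello")

-- Default flags: dotAll is true, so '.' also matches the newline.
-- Expected: Right "a\nb"
acrossLines :: Either ParseError String
acrossLines = runParser "a\nb" (regex {} ".*")
```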
MDN Advanced searching with flags
Example
runParser "ababXX" (regex {} "(ab)+")
(Right "abab")