Text.Parsing.Parser.String
Package: purescript-parsing
Repository: purescript-contrib/purescript-parsing
Primitive parsers for working with an input stream of type String.
All of these primitive parsers will consume their input when they succeed.
All of these primitive parsers will consume no input when they fail.
The behavior of these primitive parsers is based on the behavior of the Data.String module in the strings package. In most JavaScript runtime environments, the String is little-endian UTF-16.
The primitive parsers which return Char will only succeed when the character being parsed is a code point in the Basic Multilingual Plane (the “BMP”). These parsers can be convenient because of the good support that PureScript has for writing Char literals like 'あ', 'β', 'C'.
The other primitive parsers, which return CodePoint and String types, can parse the full Unicode character set. All of the primitive parsers in this module can be used together.
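The difference between Char and CodePoint parsers can be illustrated with a minimal sketch. It assumes that anyChar is exported from this module alongside anyCodePoint (anyChar is not documented in this section); the module name Example.CharVsCodePoint is hypothetical.

```purescript
module Example.CharVsCodePoint where

import Data.Either (Either)
import Data.String.CodePoints (CodePoint)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (anyChar, anyCodePoint)

-- 'β' is in the BMP, so a Char-returning parser can match it.
bmpChar :: Either ParseError Char
bmpChar = runParser "β" anyChar

-- U+1F600 lies outside the BMP, so anyChar is expected to fail here.
astralChar :: Either ParseError Char
astralChar = runParser "😀" anyChar

-- anyCodePoint reads the same input as a single CodePoint.
astralCodePoint :: Either ParseError CodePoint
astralCodePoint = runParser "😀" anyCodePoint
```

If the input may contain characters outside the BMP, the CodePoint and String parsers are the safer choice.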
#anyCodePoint
anyCodePoint :: forall m. Monad m => ParserT String m CodePoint
Match any Unicode character. Always succeeds when any input remains.
#whiteSpace
whiteSpace :: forall m. Monad m => ParserT String m String
Match zero or more whitespace characters satisfying Data.CodePoint.Unicode.isSpace. Always succeeds.
#skipSpaces
skipSpaces :: forall m. Monad m => ParserT String m Unit
Skip whitespace characters and throw them away. Always succeeds.
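A common use of skipSpaces is to discard the whitespace that follows a token. The sketch below assumes a hypothetical helper named lexeme (not part of this module) and uses char, which is assumed to be exported from this module; the module name is also hypothetical.

```purescript
module Example.Lexeme where

import Prelude
import Data.Either (Either)
import Data.Tuple (Tuple(..))
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.String (char, skipSpaces, whiteSpace)

-- Hypothetical helper: run a parser, then discard trailing whitespace.
lexeme :: forall a. Parser String a -> Parser String a
lexeme p = p <* skipSpaces

-- Parses 'a' and 'b' separated by any amount of whitespace.
pair :: Either ParseError (Tuple Char Char)
pair = runParser "a   b" (Tuple <$> lexeme (char 'a') <*> lexeme (char 'b'))

-- whiteSpace returns what it consumed, here the two leading spaces.
leading :: Either ParseError String
leading = runParser "  x" whiteSpace
```

Use whiteSpace when the consumed whitespace itself is needed, and skipSpaces when it can be thrown away.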
#match
match :: forall m a. Monad m => ParserT String m a -> ParserT String m (Tuple String a)
Combinator which returns both the result of a parse and the slice of the input that was consumed while it was being parsed.
Because Strings are not Char arrays in PureScript, many and some on Char parsers need to be used with Data.String.CodeUnits.fromCharArray to construct a String.
fromCharArray <$> Data.Array.many (char 'x')
It’s more efficient to achieve the same result by using this match combinator instead of fromCharArray.
fst <$> match (Combinators.skipMany (char 'x'))
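As a slightly larger sketch, match can wrap a composite parser and return the whole slice that the composite consumed. The module name is hypothetical, and char is assumed to be exported from this module.

```purescript
module Example.Match where

import Prelude
import Data.Either (Either)
import Data.Tuple (Tuple)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.Combinators (skipMany)
import Text.Parsing.Parser.String (char, match)

-- Consume a run of 'x' followed by a run of 'y'. match pairs the
-- consumed slice with the inner parser's result (Unit here).
runs :: Either ParseError (Tuple String Unit)
runs = runParser "xxyyz" (match (skipMany (char 'x') *> skipMany (char 'y')))
```

Under these assumptions, runs should evaluate to Right (Tuple "xxyy" unit), leaving the trailing "z" unconsumed.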
#regex
regex :: forall m flags f_. Monad m => Union flags RegexFlagsRow f_ => Nub f_ RegexFlagsRow => Record flags -> String -> ParserT String m String
Parser which uses the Data.String.Regex module to match the regular expression pattern passed as the String argument to the parser.
This parser will try to match the regular expression pattern starting at the current parser position. On success, it will return the matched substring.
If the Regex pattern string fails to compile then this parser will fail. (Note: It’s not possible to use a precompiled Regex because this parser must set flags and make adjustments to the Regex pattern string.)
This parser may be useful for quickly consuming a large section of the input String, because in a JavaScript runtime environment the RegExp runtime is a lot faster than primitive parsers.
MDN Regular Expressions Cheatsheet
Flags
The Record flags argument to the parser is for Regex flags. Here are the default flags.
{ dotAll: true
, ignoreCase: false
, unicode: true
}
To use the defaults, pass {} as the flags argument. For case-insensitive pattern matching, pass {ignoreCase: true} as the flags argument.
The other Data.String.Regex.Flags.RegexFlagsRec fields are mostly nonsense in the context of parsing, and use of the other flags may cause strange behavior in the parser.
MDN Advanced searching with flags
Example
runParser "ababXX" (regex {} "(ab)+")
(Right "abab")
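The following sketch shows the ignoreCase flag together with the default flags, sequenced with skipSpaces from this module. The module name and the input string are hypothetical.

```purescript
module Example.Regex where

import Prelude
import Data.Either (Either)
import Data.Tuple (Tuple(..))
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (regex, skipSpaces)

-- A case-insensitive keyword followed by a lowercase word.
selectStmt :: Either ParseError (Tuple String String)
selectStmt = runParser "Select name" do
  kw <- regex { ignoreCase: true } "select" -- matches "Select"
  skipSpaces
  ident <- regex {} "[a-z]+" -- default flags
  pure (Tuple kw ident)
```

Because regex returns the matched substring rather than the pattern, kw should hold the input's original casing.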