Module Text.Parsing.Parser.String

Package purescript-parsing
Repository purescript-contrib/purescript-parsing

Primitive parsers for working with an input stream of type String.

All of these primitive parsers will consume their input when they succeed.

All of these primitive parsers will consume no input when they fail.

The behavior of these primitive parsers is based on the behavior of the Data.String module in the strings package. In most JavaScript runtime environments, the String is little-endian UTF-16.

The primitive parsers which return Char will only succeed when the character being parsed is a code point in the Basic Multilingual Plane (the “BMP”). These parsers can be convenient because of the good support that PureScript has for writing Char literals like 'あ', 'β', 'C'.

The other primitive parsers, which return CodePoint and String types, can parse the full Unicode character set. All of the primitive parsers in this module can be used together.
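As a rough illustration of the Char versus CodePoint distinction described above, the sketch below runs anyChar and anyCodePoint over a BMP character and over a character outside the BMP. The module name and the binding names are hypothetical, and the results noted in the comments are what the descriptions in this module lead one to expect rather than captured output.

```purescript
module Example.CharVsCodePoint where

import Data.Either (Either)
import Data.String.CodePoints (CodePoint)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (anyChar, anyCodePoint)

-- 'β' is in the Basic Multilingual Plane, so a Char parser can return it.
bmpChar :: Either ParseError Char
bmpChar = runParser "β" anyChar

-- "🍔" (U+1F354) lies outside the BMP, so anyChar is expected to fail here.
astralChar :: Either ParseError Char
astralChar = runParser "🍔" anyChar

-- anyCodePoint is expected to return the same character as one CodePoint.
astralCodePoint :: Either ParseError CodePoint
astralCodePoint = runParser "🍔" anyCodePoint
```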

#string Source

string :: forall m. Monad m => String -> ParserT String m String

Match the specified string.

#eof Source

eof :: forall m. Monad m => ParserT String m Unit

Match “end-of-file,” the end of the input stream.

#rest Source

rest :: forall m. Monad m => ParserT String m String

Match the entire rest of the input stream. Always succeeds.

#anyChar Source

anyChar :: forall m. Monad m => ParserT String m Char

Match any BMP Char. Parser will fail if the character is not in the Basic Multilingual Plane.

#anyCodePoint Source

anyCodePoint :: forall m. Monad m => ParserT String m CodePoint

Match any Unicode character. Always succeeds when any input remains; fails only at the end of the input stream.

#satisfy Source

satisfy :: forall m. Monad m => (Char -> Boolean) -> ParserT String m Char

Match a BMP Char satisfying the predicate.

#satisfyCodePoint Source

satisfyCodePoint :: forall m. Monad m => (CodePoint -> Boolean) -> ParserT String m CodePoint

Match a Unicode character satisfying the predicate.
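A minimal sketch of the two predicate parsers, assuming the purescript-unicode package is available for Data.CodePoint.Unicode.isUpper. The module name and the parser names asciiDigit and upper are made up for illustration.

```purescript
module Example.Satisfy where

import Prelude

import Data.CodePoint.Unicode (isUpper)
import Data.Either (Either)
import Data.String.CodePoints (CodePoint)
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.String (satisfy, satisfyCodePoint)

-- A Char predicate: accept one ASCII decimal digit.
asciiDigit :: Parser String Char
asciiDigit = satisfy (\c -> c >= '0' && c <= '9')

-- A CodePoint predicate: accept one uppercase Unicode character.
upper :: Parser String CodePoint
upper = satisfyCodePoint isUpper

-- Should succeed with '7', leaving "up" unconsumed.
digitResult :: Either ParseError Char
digitResult = runParser "7up" asciiDigit

-- Should succeed with the code point for 'Ü'.
upperResult :: Either ParseError CodePoint
upperResult = runParser "Ünïcode" upper
```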

#char Source

char :: forall m. Monad m => Char -> ParserT String m Char

Match the specified BMP Char.

#takeN Source

takeN :: forall m. Monad m => Int -> ParserT String m String

Match a String exactly N characters long.

#whiteSpace Source

whiteSpace :: forall m. Monad m => ParserT String m String

Match zero or more whitespace characters satisfying Data.CodePoint.Unicode.isSpace. Always succeeds.

#skipSpaces Source

skipSpaces :: forall m. Monad m => ParserT String m Unit

Skip whitespace characters and throw them away. Always succeeds.
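The parsers documented so far compose in the usual applicative style. The following sketch (module and binding names are hypothetical) accepts the word "hello" with optional surrounding whitespace and nothing else.

```purescript
module Example.Greeting where

import Prelude

import Data.Either (Either)
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.String (eof, skipSpaces, string)

-- Skip leading whitespace, match the literal, skip trailing whitespace,
-- then require the end of the input. Only the result of `string` is kept.
greeting :: Parser String String
greeting = skipSpaces *> string "hello" <* skipSpaces <* eof

-- Should evaluate to (Right "hello").
ok :: Either ParseError String
ok = runParser "  hello  " greeting

-- Should evaluate to a Left, because eof does not match at "!".
notOk :: Either ParseError String
notOk = runParser "hello!" greeting
```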

#oneOf Source

oneOf :: forall m. Monad m => Array Char -> ParserT String m Char

Match one of the BMP Chars in the array.

#oneOfCodePoints Source

oneOfCodePoints :: forall m. Monad m => Array CodePoint -> ParserT String m CodePoint

Match one of the Unicode characters in the array.

#noneOf Source

noneOf :: forall m. Monad m => Array Char -> ParserT String m Char

Match any BMP Char not in the array.

#noneOfCodePoints Source

noneOfCodePoints :: forall m. Monad m => Array CodePoint -> ParserT String m CodePoint

Match any Unicode character not in the array.
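One common use of oneOf and noneOf is to delimit a token by the characters it may or may not contain. The sketch below is an illustration under assumed names (sign, quoted), not part of the library.

```purescript
module Example.OneOfNoneOf where

import Prelude

import Data.Array (many)
import Data.Either (Either)
import Data.String.CodeUnits (fromCharArray)
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.String (char, noneOf, oneOf)

-- Accept a single '+' or '-'.
sign :: Parser String Char
sign = oneOf [ '+', '-' ]

-- Accept a double-quoted run of BMP characters with no escape handling:
-- everything up to the next '"' is collected into a String.
quoted :: Parser String String
quoted = char '"' *> (fromCharArray <$> many (noneOf [ '"' ])) <* char '"'

-- Should evaluate to (Right "abc").
quotedResult :: Either ParseError String
quotedResult = runParser "\"abc\" rest" quoted
```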

#match Source

match :: forall m a. Monad m => ParserT String m a -> ParserT String m (Tuple String a)

Combinator which returns both the result of a parse and the slice of the input that was consumed while it was being parsed.

Because Strings are not Char arrays in PureScript, many and some on Char parsers need to be used with Data.String.CodeUnits.fromCharArray to construct a String.

fromCharArray <$> Data.Array.many (char 'x')

It’s more efficient to achieve the same result by using this match combinator instead of fromCharArray.

fst <$> match (Combinators.skipMany (char 'x'))
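To show both halves of the Tuple that match returns, here is a small sketch with hypothetical names. The inner parser returns only the last two characters it reads, while match also reports everything that was consumed.

```purescript
module Example.Match where

import Prelude

import Data.Either (Either)
import Data.Tuple (Tuple)
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.String (match, string, takeN)

-- The inner parser consumes "ab" and then two more characters,
-- but its own result is only those last two characters.
prefixed :: Parser String (Tuple String String)
prefixed = match (string "ab" *> takeN 2)

-- Should evaluate to (Right (Tuple "abcd" "cd")):
-- the consumed slice first, the inner parser's result second.
result :: Either ParseError (Tuple String String)
result = runParser "abcdef" prefixed
```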

#regex Source

regex :: forall m flags f_. Monad m => Union flags RegexFlagsRow f_ => Nub f_ RegexFlagsRow => Record flags -> String -> ParserT String m String

Parser which uses the Data.String.Regex module to match the regular expression pattern passed as the String argument to the parser.

This parser will try to match the regular expression pattern starting at the current parser position. On success, it will return the matched substring.

If the Regex pattern string fails to compile then this parser will fail. (Note: It’s not possible to use a precompiled Regex because this parser must set flags and make adjustments to the Regex pattern string.)

This parser may be useful for quickly consuming a large section of the input String, because in a JavaScript runtime environment the RegExp runtime is a lot faster than primitive parsers.

MDN Regular Expressions Cheatsheet

Flags

The Record flags argument to the parser is for Regex flags. Here are the default flags.

{ dotAll: true
, ignoreCase: false
, unicode: true
}

To use the defaults, pass {} as the flags argument. For case-insensitive pattern matching, pass {ignoreCase: true} as the flags argument.

The other Data.String.Regex.Flags.RegexFlagsRec fields are mostly meaningless in the context of parsing, and setting them may cause strange behavior in the parser.

MDN Advanced searching with flags

Example

runParser "ababXX" (regex {} "(ab)+")
(Right "abab")
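As a companion to the example above, this sketch contrasts the default flags with {ignoreCase: true} on the same input. The module and binding names are hypothetical, and the outcomes in the comments follow from the documented behavior (matching starts at the current parser position) rather than from captured output.

```purescript
module Example.RegexFlags where

import Data.Either (Either)
import Text.Parsing.Parser (ParseError, runParser)
import Text.Parsing.Parser.String (regex)

-- With the default flags the pattern is case-sensitive, so it should
-- fail: the input starts with "AB", which "(ab)+" does not match.
caseSensitive :: Either ParseError String
caseSensitive = runParser "ABabXX" (regex {} "(ab)+")

-- With ignoreCase set, the same pattern should match "ABab".
caseInsensitive :: Either ParseError String
caseInsensitive = runParser "ABabXX" (regex { ignoreCase: true } "(ab)+")
```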

#RegexFlagsRow Source

type RegexFlagsRow :: Row Type
type RegexFlagsRow = (dotAll :: Boolean, global :: Boolean, ignoreCase :: Boolean, multiline :: Boolean, sticky :: Boolean, unicode :: Boolean)

The fields from Data.String.Regex.Flags.RegexFlagsRec.