A parsing combinator library with a simple definition.
Error messages:
> parse digit "This is not a digit."
Left ("Expected a digit but found 'T' when trying to parse the string \"This \"...")
Backtracking by default:
> parse (lower <|> upper <|> digit) "9"
Right ('9')
With a `Semigroup` instance (plus appropriate combinators) that does not backtrack:
> parse (string' "hello" <> string' "hell0") "hell0"
Left ("Expecting \"hell0\" but found \"\"")
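The difference between the two behaviours can be illustrated with a toy model. The sketch below is not the library's implementation: `Toy`, `literal`, `literal'` and `orElse` are hypothetical names invented for illustration, and the assumption that a failing primed parser consumes as many characters as its target is only a guess made to mirror the error message above.

```purescript
module BacktrackSketch where

import Prelude

import Data.Either (Either(..))
import Data.String as String

-- Hypothetical toy parser (NOT the library's Parser type): a function from
-- input to the remaining input plus either an error message or a result.
type Toy a = String -> { rest :: String, result :: Either String a }

-- Non-backtracking literal: the input is consumed even when the match fails.
literal' :: String -> Toy String
literal' target input =
  let
    n = String.length target
    taken = String.take n input
    rest = String.drop n input
  in
    if taken == target
      then { rest, result: Right target }
      else { rest, result: Left ("Expecting " <> show target <> " but found " <> show taken) }

-- Backtracking literal: on failure, the original input is handed back.
literal :: String -> Toy String
literal target input =
  let
    r = literal' target input
  in
    case r.result of
      Left _ -> r { rest = input }
      Right _ -> r

-- Try p; if it fails, run q on whatever input p left behind.
orElse :: forall a. Toy a -> Toy a -> Toy a
orElse p q input =
  let
    r = p input
  in
    case r.result of
      Right _ -> r
      Left _ -> q r.rest

-- The second alternative still sees "hell0", so this succeeds.
backtracked :: Either String String
backtracked = (literal "hello" `orElse` literal "hell0") "hell0" # _.result

-- The first alternative has already consumed the input, so this fails.
committed :: Either String String
committed = (literal' "hello" `orElse` literal' "hell0") "hell0" # _.result
```

In this model, `backtracked` is a `Right` while `committed` is a `Left`, because the only thing that differs is what the failed first alternative leaves behind for the second one.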
Brand-new combinators (such as `atLeast`, `atMost`, `exactly`, `suchThat`):
> parse (fromCharList <$> atLeast 3 (item `suchThat` (_ < 'c'))) "abcd"
Left ("Predicate failed on `suchThat` when trying to parse the string \"cd\"...")
Specialized versions of many typeclass functions, to avoid passing class dictionaries:
Typeclass function | Typeclassless version |
---|---|
`<$>` | `\|->` |
`pure` | `pureP` |
`<*>` | `~` |
`*>` | `>>` |
`<*` | `<<` |
`bind` | `bindP` |
`>>=` | `>>-` |
`=<<` | `-<<` |
`>=>` | `>->` |
`<=<` | `<-<` |
`<\|>` | `<\|` |
`flip alt` | `\|>` |
Versions of common combinators optimized for speed, including a fast, stack-safe implementation of `many`:
> length longstring
41405
> parse (fromCharList <$> many item) longstring
Right ('asjdf89...
> import Text.Parsing.StringParser as SP
> SP.runParser (fromCharList <$> SP.many SP.anyChar) longstring
RangeError: Maximum call stack size exceeded
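One common way to make a repetition combinator stack-safe in PureScript is to write the loop with `tailRec` from `purescript-tailrec`, so that it compiles to iteration instead of nested calls. The following is a minimal sketch of that idea on a toy parser type. It is an assumption about the general technique, not a description of how this library implements `many`; `Toy`, `item` and `manyToy` are hypothetical names.

```purescript
module ManySketch where

import Prelude

import Control.Monad.Rec.Class (Step(..), tailRec)
import Data.Either (Either(..))
import Data.List (List(..), reverse)
import Data.String as String

-- Hypothetical toy parser (NOT the library's Parser type).
type Toy a = String -> { rest :: String, result :: Either String a }

-- Consume a single character, returned as a one-character String.
item :: Toy String
item input =
  if input == ""
    then { rest: input, result: Left "Unexpected end of input" }
    else { rest: String.drop 1 input, result: Right (String.take 1 input) }

-- Apply p until it fails, collecting the results in order.
-- The loop runs inside tailRec, so its stack depth does not grow with the
-- number of matches. Caveat: it never terminates if p succeeds without
-- consuming any input.
manyToy :: forall a. Toy a -> Toy (List a)
manyToy p input = tailRec go { acc: Nil, rest: input }
  where
  go state =
    let
      r = p state.rest
    in
      case r.result of
        -- Stop at the first failure and keep the input from before it.
        Left _ -> Done { rest: state.rest, result: Right (reverse state.acc) }
        -- Otherwise record the result and continue with the remaining input.
        Right x -> Loop { acc: Cons x state.acc, rest: r.rest }
```

For example, `(manyToy item "abc").result` collects the three characters into a list. The accumulator is built in reverse and flipped once at the end, which keeps each step constant-time on a linked list.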
Polymorphism in the input type:
data Token = LParen | RParen | Digits Int | Comma
type Lexer a = Parser String a
lparen :: Lexer Token
lparen = LParen <$ (char '(' << skipSpaces)
rparen :: Lexer Token
rparen = RParen <$ (char ')' << skipSpaces)
comma :: Lexer Token
comma = Comma <$ (char ',' << skipSpaces)
digits :: Lexer Token
digits = Digits |-> (int << skipSpaces)
lex :: String -> Either ParseError (List Token)
lex = parse (many token)
  where
  token = lparen <| rparen <| comma <| digits
type P a = Parser (List Token) a
popToken :: P Token
popToken = first List.uncons
data Expr = Integer Int | Pair Expr Expr
integer :: P Expr
integer = popToken >>- case _ of
  Digits n -> pure (Integer n)
  t -> fail ("Expected digits but found " <> show t)
open :: P Unit
open = popToken >>- case _ of
  LParen -> pure unit
  t -> fail ("Expected '(' but found " <> show t)
close :: P Unit
close = popToken >>- case _ of
  RParen -> pure unit
  t -> fail ("Expected ')' but found " <> show t)
sep :: P Unit
sep = popToken >>- case _ of
  Comma -> pure unit
  t -> fail ("Expected ',' but found " <> show t)
simplePair :: P Expr
simplePair = do
  open
  d1 <- integer
  sep
  d2 <- integer
  close
  case d1, d2 of
    Integer n, Integer m -> pure (Pair (Integer n) (Integer m))
    x, y -> fail ("Expected integers but found " <> show x <> ", " <> show y)
runParser :: String -> Either ParseError Expr
runParser = parse simplePair <=< lex
The module `Text.Parsing.Combinators` includes general combinators for use with any parser that has instances for `Alt`, `MonadPlus`, etc.
The `Text.Parsing.Simple` module has combinators made specifically for its `Parser` data type.
Combinators that don't backtrack by default are denoted by a prime, for example `alphanum` vs. `alphanum'`. These are meant to be used in combination with the `Semigroup` instance instead of `Alt`.
Example:
import Text.Parsing.Simple
import Text.Parsing.Combinators as C
data PositiveTupleInt = PositiveTupleInt Int Int
parseTupleIntA :: Parser String PositiveTupleInt
parseTupleIntA = PositiveTupleInt <$> (char '(' *> int |= (_ > 0) <* char ',') <*> (int |= (_ > 0) <* char ')')
parseTupleIntM :: Parser String PositiveTupleInt
parseTupleIntM = fail "Expected TupleInt of the form (x,y)" |> do
  char '('
  fst <- int `suchThat` (_ > 0)
  char ','
  snd <- int `suchThat` (_ > 0)
  char ')'
  pure $ PositiveTupleInt fst snd
fromString :: String -> PositiveTupleInt
fromString = either (const $ PositiveTupleInt 1 1) id <<< parse parseTupleIntA
Also see the test folder.
Install with Bower: `bower i purescript-simple-parser`