Parsing.String.Replace
- Package: purescript-parsing
- Repository: purescript-contrib/purescript-parsing
This module is for finding patterns in a String, and also replacing or splitting on the found patterns.
This activity is traditionally done with Regex, but this module uses parsers instead for the pattern matching.
Functions in this module are ways to run a parser on an input String, like runParser or runParserT.
Why would we want to do pattern matching and substitution with parsers instead of regular expressions?
Monadic parsers have a nicer syntax than regular expressions, which are notoriously difficult to read. With monadic parsers we can perform textual pattern-matching in plain PureScript rather than using a special regex domain-specific programming language.
Regular expressions can do “group capture” on sections of the matched pattern, but they can only return stringy lists of the capture groups. Parsers can construct typed data structures based on the capture groups, guaranteeing no disagreement between the pattern rules and the rules that we're using to build data structures based on the pattern matches.
For example, consider scanning a string for numbers. A lot of different things can look like a number, and can have leading plus or minus signs, or be in scientific notation, or have commas, or whatever. If we try to parse all of the numbers out of a string using regular expressions, then we have to make sure that the regular expression and the string-to-number conversion function agree about exactly what is and what isn't a numeric string. We can get into an awkward situation in which the regular expression says it has found a numeric string but the string-to-number conversion function fails. A typed parser will perform both the pattern match and the conversion, so it will never be in that situation. Parse, don't validate.
Regular expressions are only able to pattern-match regular grammars. Monadic parsers are able to pattern-match context-free (by recursion) or context-sensitive (by monad transformer) grammars.
The replacement expression for a traditional regular expression-based substitution command is usually just a string template in which the Nth “capture group” can be inserted with the syntax \N. With this library, instead of a template, we get an editor function which can perform any computation, including Effects.
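As a minimal sketch of the typed-capture point above, the parser below matches coordinate pairs such as (1,2) and builds a record in the same step, so the pattern and the conversion cannot disagree. The module name, the Point type, and the names point and points are hypothetical, chosen only for this illustration.

```purescript
module Example.TypedCapture where

import Prelude

import Data.Either (Either)
import Data.List.NonEmpty (NonEmptyList)
import Parsing (Parser)
import Parsing.String (char)
import Parsing.String.Basic (intDecimal)
import Parsing.String.Replace (splitCap)

-- Hypothetical record type for this illustration.
type Point = { x :: Int, y :: Int }

-- Matches text such as "(1,2)" and constructs a Point while matching.
point :: Parser String Point
point = do
  _ <- char '('
  x <- intDecimal
  _ <- char ','
  y <- intDecimal
  _ <- char ')'
  pure { x, y }

-- Matching sections arrive as Right Point; everything else as Left String.
points :: NonEmptyList (Either String Point)
points = splitCap "from (1,2) to (3,4)" point
```

A regex-based version would return the capture groups as strings and would need a separate conversion step for each coordinate.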
Implementation Notes
All of the functions in this module work by calling runParserT with the anyTill combinator.
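To make that concrete, here is a small sketch that calls anyTill directly through runParser. It is only an illustration of the combinator, not a copy of this module's internals; the module name and the name firstNeedle are hypothetical.

```purescript
module Example.AnyTill where

import Data.Either (Either)
import Data.Tuple (Tuple)
import Parsing (ParseError, runParser)
import Parsing.String (anyTill, string)

-- anyTill skips forward until the given parser succeeds. It returns the
-- skipped text paired with the parser's result.
firstNeedle :: Either ParseError (Tuple String String)
firstNeedle = runParser "hay needle hay" (anyTill (string "needle"))
```

Here the skipped text should be "hay " and the parse result "needle". The input after the match is left unconsumed, which is why functions such as breakCap also have to collect the suffix.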
We can expect the speed of parser-based pattern matching to be
about 10× worse than regex-based pattern matching in a JavaScript
runtime environment.
This module is based on the Haskell packages replace-megaparsec and replace-attoparsec.
#breakCap
breakCap :: forall a. String -> Parser String a -> Maybe (T3 String a String)
Break on and capture one pattern
Find the first occurrence of a pattern in a text stream, capture the found pattern, and break the input text stream on the found pattern.
This function can be used instead of Data.String.indexOf or Data.String.Regex.search or Data.String.Regex.replace and it allows using a parser for the pattern search.
This function can be used instead of Data.String.takeWhile or Data.String.dropWhile, and its predicate can depend on more than just the next single CodePoint.
Output
- Nothing when no pattern match was found.
- Just (prefix /\ parse_result /\ suffix) for the result of parsing the pattern match, and the prefix string before and the suffix string after the pattern match. prefix and suffix may be zero-length strings.
Access the matched section of text
If you want to capture the matched string, then combine the pattern parser sep with match.
With the matched string, we can reconstruct the input string.
For all input, sep, if
let (Just (prefix /\ (infix /\ _) /\ suffix)) =
      breakCap input (match sep)
then
input == prefix <> infix <> suffix
Example
Find the first pattern match and break the input string on the pattern.
breakCap "hay needle hay" (string "needle")
Result:
Just ("hay " /\ "needle" /\ " hay")
Example
Find the first pattern match, capture the matched text and the parsed result.
breakCap "abc 123 def" (match intDecimal)
Result:
Just ("abc " /\ ("123" /\ 123) /\ " def")
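In application code the Maybe result usually has to be handled explicitly. The following sketch, which assumes a hypothetical helper named splitKeyValue and a hypothetical module name, breaks a string on the first "=" and discards the separator.

```purescript
module Example.BreakCap where

import Data.Maybe (Maybe(..))
import Data.Tuple.Nested ((/\))
import Parsing.String (string)
import Parsing.String.Replace (breakCap)

-- Hypothetical helper: split "key=value" on the first "=".
-- Returns Nothing when the input contains no "=".
splitKeyValue :: String -> Maybe { key :: String, value :: String }
splitKeyValue input = case breakCap input (string "=") of
  Nothing -> Nothing
  Just (key /\ _ /\ value) -> Just { key, value }
```

Because only the first match breaks the input, any later "=" characters remain in value.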
#splitCap
splitCap :: forall a. String -> Parser String a -> NonEmptyList (Either String a)
Split on and capture all patterns
Find all occurrences of the pattern parser sep, split the input string, capture all the patterns and the splits.
This function can be used instead of Data.String.Common.split or Data.String.Regex.split or Data.String.Regex.match or Data.String.Regex.search.
The input string will be split on every leftmost non-overlapping occurrence of the pattern sep. The output list will contain the parsed result of input string sections which match the sep pattern in Right, and non-matching sections in Left.
Access the matched section of text
If you want to capture the matched strings, then combine the pattern parser sep with the match combinator.
With the matched strings, we can reconstruct the input string.
For all input, sep, if
let output = splitCap input (match sep)
then
input == fold (either identity fst <$> output)
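The reconstruction property can be written as an ordinary function, which is convenient for property tests. This is a sketch only; the module name and the name roundTrips are hypothetical.

```purescript
module Example.RoundTrip where

import Prelude

import Data.Either (either)
import Data.Foldable (fold)
import Data.Tuple (fst)
import Parsing (Parser)
import Parsing.String (match)
import Parsing.String.Replace (splitCap)

-- True when concatenating the unmatched sections (Left) and the matched
-- text (first component of each Right) gives back the original input.
roundTrips :: forall a. String -> Parser String a -> Boolean
roundTrips input sep =
  input == fold (either identity fst <$> splitCap input (match sep))
```

According to the property stated above, roundTrips should hold for every input and every sep.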
Example
Split the input string on all Int pattern matches.
splitCap "hay 1 straw 2 hay" intDecimal
Result:
[Left "hay ", Right 1, Left " straw ", Right 2, Left " hay"]
Example
Find the beginning positions of all pattern matches in the input.
catMaybes $ hush <$> splitCap ".𝝺...\n...𝝺." (position <* string "𝝺")
Result:
[ Position { index: 1, line: 1, column: 2 }
, Position { index: 9, line: 2, column: 4 }
]
Example
Find groups of balanced nested parentheses. This pattern is an example of a “context-free” grammar, a pattern that can't be expressed by a regular expression. We can express the pattern with a recursive parser.
balancedParens :: Parser String Unit
balancedParens = do
  void $ char '('
  void $ manyTill (balancedParens <|> void anyCodePoint) (char ')')

rmap fst <$> splitCap "((🌼)) (()())" (match balancedParens)
Result:
[Right "((🌼))", Left " ", Right "(()())"]
#splitCapT
splitCapT :: forall m a. Monad m => MonadRec m => String -> ParserT String m a -> m (NonEmptyList (Either String a))
Monad transformer version of splitCap. The sep parser will run in the monad context.
Example
Count the pattern matches.
Parse in a State monad to remember state in the parser. This stateful letterCount parser counts the number of pattern matches which occur in the input, and also tags each match with its index.
letterCount :: ParserT String (State Int) (Tuple Char Int)
letterCount = do
  x <- letter
  i <- modify (_+1)
  pure (x /\ i)

flip runState 0 $ splitCapT "A B" letterCount
Result:
[Right ('A' /\ 1), Left " ", Right ('B' /\ 2)] /\ 2
#replace
replace :: String -> Parser String String -> String
Find-and-replace
Also called “match-and-substitute”. Find all of the leftmost non-overlapping sections of the input string which match the pattern parser sep, and replace them with the result of the parser.
The sep parser must return a result of type String.
This function can be used instead of Data.String.replaceAll or Data.String.Regex.replace'.
Access the matched section of text in the editor
To get access to the matched string for the replacement, combine the pattern parser sep with match.
This allows us to write a sep parser which can choose not to edit the match and just leave it as it is.
So, for all sep:
replace input (fst <$> match sep) == input
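As a sketch of a parser that edits some matches and leaves others untouched, the example below doubles integers under 100 and returns the original matched text for larger ones. The module name, the threshold, and the names doubleSmall and result are hypothetical.

```purescript
module Example.SelectiveReplace where

import Prelude

import Data.Tuple (Tuple(..))
import Parsing (Parser)
import Parsing.String (match)
import Parsing.String.Basic (intDecimal)
import Parsing.String.Replace (replace)

-- match pairs the matched text with the parsed Int, so the parser can
-- return either an edited string or the text exactly as it was written.
doubleSmall :: Parser String String
doubleSmall = do
  Tuple original n <- match intDecimal
  pure (if n < 100 then show (n * 2) else original)

result :: String
result = replace "1 6 21 107" doubleSmall
```

With this input, result should be "2 12 42 107": the first three integers are doubled and 107 is left as written.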
Example
Find and uppercase the "needle" pattern.
replace "hay needle hay" (toUpper <$> string "needle")
Result:
"hay NEEDLE hay"
Example
Find integers and double them.
replace "1 6 21 107" (show <$> (_*2) <$> intDecimal)
Result:
"2 12 42 214"
#replaceT
replaceT :: forall m. Monad m => MonadRec m => String -> ParserT String m String -> m String
Monad transformer version of replace.
Example
Find an environment variable in curly braces and replace it with its value from the environment.
We can read from the environment with lookupEnv because replaceT is running the sep parser in Effect.
replaceT "◀ {HOME} ▶" do
  _ <- string "{"
  Tuple home _ <- anyTill (string "}")
  lift (lookupEnv home) >>= maybe empty pure
Result:
"◀ /home/jbrock ▶"