Module

Data.String.Utils

Package: purescript-stringutils
Repository: menelaos/purescript-stringutils

#NormalizationForm Source

data NormalizationForm

Possible Unicode Normalization Forms

Constructors

NFC
NFD
NFKC
NFKD

Instances

Show NormalizationForm

#charAt Source

charAt :: Int -> String -> Maybe String

Return the character at the given index, if the index is within bounds. Note that this function handles Unicode as you would expect. If you want a simple wrapper around JavaScript's String.prototype.charAt method, you should use the Data.String.CodeUnits.charAt function from purescript-strings. This function returns a String instead of a Char because PureScript Chars must be UTF-16 code units and hence cannot represent all Unicode code points.

Example:

-- Data.String.Utils.charAt
charAt 2 "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†" == Just "𝕣"
-- Data.String.CodeUnits.charAt
charAt 2 "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†" == Just '�'

#codePointAt Source

codePointAt :: Warn (Text "DEPRECATED: `Data.String.Utils.codePointAt`") => Int -> String -> Maybe Int

DEPRECATED: This function is now available in purescript-strings as Data.String.CodePoints.codePointAt, which returns a CodePoint instead of an Int.

Return the Unicode code point value of the character at the given index, if the index is within bounds. Note that this function handles Unicode as you would expect. If you want a simple wrapper around JavaScript's String.prototype.codePointAt method, you should use codePointAt'.

Example:

codePointAt   0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120792
codePointAt   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120793
codePointAt   2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120794
codePointAt  19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Nothing

codePointAt'  0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120792
codePointAt'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 57304   -- Surrogate code point
codePointAt'  2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120793
codePointAt' 19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 57313   -- Surrogate code point
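Because this function is deprecated, code that still needs an Int code point can be migrated to purescript-strings. The sketch below assumes the purescript-strings, purescript-enums (for fromEnum), purescript-effect and purescript-console packages are installed; codePointValueAt is a hypothetical helper name and is not part of either library.

```purescript
module Main where

import Prelude

import Data.Enum (fromEnum)
import Data.Maybe (Maybe(..))
import Data.String.CodePoints (codePointAt)
import Effect (Effect)
import Effect.Console (logShow)

-- Hypothetical helper: code-point-indexed lookup that returns the numeric
-- code point, mirroring the deprecated Data.String.Utils.codePointAt.
codePointValueAt :: Int -> String -> Maybe Int
codePointValueAt i s = map fromEnum (codePointAt i s)

main :: Effect Unit
main = do
  logShow (codePointValueAt 0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡")   -- (Just 120792)
  logShow (codePointValueAt 1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡")   -- (Just 120793)
  logShow (codePointValueAt 19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡")  -- Nothing
```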

#codePointAt' Source

codePointAt' :: Int -> String -> Maybe Int

Return the Unicode code point value of the character at the given index, if the index is within bounds. This function is a simple wrapper around JavaScript's String.prototype.codePointAt method. This means that if the index does not point to the beginning of a valid surrogate pair, the code unit at the index (i.e. the Unicode code point of the surrogate pair half) is returned instead. If you want to treat a string as an array of Unicode Code Points, use codePointAt from purescript-strings instead.

Example:

codePointAt'  0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120792
codePointAt'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 57304   -- Surrogate code point
codePointAt'  2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120793
codePointAt' 19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 57313   -- Surrogate code point

codePointAt   0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120792
codePointAt   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120793
codePointAt   2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Just 120794
codePointAt  19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == Nothing

#endsWith Source

endsWith :: String -> String -> Boolean

Determine whether the second string ends with the first one.

#endsWith' Source

endsWith' :: String -> Int -> String -> Boolean

Determine whether the second string ends with the first one, searching as if the second string were only as long as the given Int argument.
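A short sketch of endsWith and endsWith' on an ASCII string, where code points and code units coincide. It assumes purescript-effect and purescript-console for output; the position 12 cuts the input down to "The Merchant".

```purescript
module Main where

import Prelude

import Data.String.Utils (endsWith, endsWith')
import Effect (Effect)
import Effect.Console (logShow)

main :: Effect Unit
main = do
  -- endsWith needle haystack
  logShow (endsWith "Venice" "The Merchant of Venice")        -- true
  logShow (endsWith "Merchant" "The Merchant of Venice")      -- false
  -- endsWith' only considers the first 12 characters: "The Merchant"
  logShow (endsWith' "Merchant" 12 "The Merchant of Venice")  -- true
  logShow (endsWith' "Venice" 12 "The Merchant of Venice")    -- false
```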

#escapeRegex Source

escapeRegex :: String -> String

Escape a string so that it can be used as a literal string within a regular expression.
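escapeRegex pairs naturally with Data.String.Regex from purescript-strings. Without escaping, the pattern 1+1=2 would match the text "11=2" but not the literal text "1+1=2". In this sketch, containsLiteral is a hypothetical helper that treats a regex compilation failure as "no match"; purescript-effect and purescript-console are assumed for output.

```purescript
module Main where

import Prelude

import Data.Either (either)
import Data.String.Regex (regex, test)
import Data.String.Regex.Flags (noFlags)
import Data.String.Utils (escapeRegex)
import Effect (Effect)
import Effect.Console (logShow)

-- Hypothetical helper: does `needle` occur literally in `haystack`?
-- The needle is escaped first, so "+" is matched as a plain character.
containsLiteral :: String -> String -> Boolean
containsLiteral needle haystack =
  either (const false) (\re -> test re haystack) (regex (escapeRegex needle) noFlags)

main :: Effect Unit
main = do
  logShow (containsLiteral "1+1=2" "Is 1+1=2?")  -- true
  logShow (containsLiteral "1+1=2" "Is 11=2?")   -- false
```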

#filter Source

filter :: (String -> Boolean) -> String -> String

Keep only those characters that satisfy the predicate. This function uses String instead of Char because PureScript Chars must be UTF-16 code units and hence cannot represent all Unicode code points.
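A minimal sketch, assuming purescript-effect and purescript-console for output. The predicate sees each code point as a one-character String, so an astral symbol such as "𝟙" is compared whole rather than as two surrogate halves.

```purescript
module Main where

import Prelude

import Data.String.Utils (filter)
import Effect (Effect)
import Effect.Console (log)

main :: Effect Unit
main = do
  -- Drop one double-struck digit (a surrogate pair in UTF-16)
  log (filter (_ /= "𝟙") "𝟘𝟙𝟚𝟛")                   -- 𝟘𝟚𝟛
  -- Drop all spaces
  log (filter (_ /= " ") "Action is eloquence.")   -- Actioniseloquence.
```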

#fromCharArray Source

fromCharArray :: Array String -> String

Convert an array of characters into a String. This function uses String instead of Char because PureScript Chars must be UTF-16 code units and hence cannot represent all Unicode code points.

Example:

fromCharArray ["ℙ", "∪", "𝕣", "ⅇ", "Ⴝ", "𝚌", "𝕣", "ⅈ", "𝚙", "†"]
  == "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†"

#includes Source

includes :: String -> String -> Boolean

Determine whether the second argument contains the first one.

Example:

includes "Merchant" "The Merchant of Venice" == true
includes "Duncan"   "The Merchant of Venice" == false

#includes' Source

includes' :: String -> Int -> String -> Boolean

Determine whether the second string argument contains the first one, beginning the search at the given position. Note that this function handles Unicode as you would expect. Negative position values result in a search from the beginning of the string.

Example:

includes' "𝟙"  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == true
includes' "𝟙"  2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == false
includes' "𝟡" 10 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == false
-- This behaviour is different from `String.prototype.includes`:
-- "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡".includes("𝟡", 10) == true

#length Source

length :: Warn (Text "DEPRECATED: `Data.String.Utils.length`") => String -> Int

DEPRECATED: This function is now available in purescript-strings as Data.String.CodePoints.length.

Return the number of Unicode code points in a string. Note that this function correctly accounts for Unicode symbols that are made up of surrogate pairs. If you want a simple wrapper around JavaScript's string.length property, you should use the Data.String.CodeUnits.length function from purescript-strings.

length "PureScript" == 10
length "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†" == 10    -- 14 with `Data.String.CodeUnits.length`
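Code using the deprecated function can switch to the two length functions from purescript-strings. The sketch below assumes purescript-strings, purescript-effect and purescript-console are installed; the four astral symbols (𝕣, 𝚌, 𝕣, 𝚙) account for the difference of 4.

```purescript
module Main where

import Prelude

import Data.String.CodePoints as CodePoints
import Data.String.CodeUnits as CodeUnits
import Effect (Effect)
import Effect.Console (logShow)

main :: Effect Unit
main = do
  -- Unicode code points: what Data.String.Utils.length returned
  logShow (CodePoints.length "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†")  -- 10
  -- UTF-16 code units: JavaScript's string.length
  logShow (CodeUnits.length "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†")   -- 14
```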

#lines Source

lines :: String -> Array String

Split a string into an array of strings which were delimited by newline characters.

Example:

lines "Action\nis\neloquence." == ["Action", "is", "eloquence."]

#mapChars Source

mapChars :: (String -> String) -> String -> String

Return the string obtained by applying the mapping function to each character (i.e. Unicode code point) of the input string. Note that this is probably not what you want as Unicode code points are not necessarily the same as user-perceived characters (grapheme clusters). Only use this function if you know what you are doing. This function uses Strings instead of Chars because PureScript Chars must be UTF-16 code units and hence cannot represent all Unicode code points.

Example:

-- Mapping over what appears to be six characters...
mapChars (const "x") "Åström" == "xxxxxxxx" -- See? Don't use this!

#normalize Source

normalize :: String -> String

Return the Normalization Form C of a given string. This is the form that is recommended by the W3C.

#normalize' Source

normalize' :: NormalizationForm -> String -> String

Return the given Unicode Normalization Form of a string.
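A sketch contrasting normalize (NFC) with normalize' for other forms, assuming purescript-strings, purescript-effect and purescript-console are installed. NFD splits "é" into "e" plus a combining acute accent, NFC recomposes it, and NFKC additionally folds compatibility characters such as the double-struck digits used elsewhere on this page.

```purescript
module Main where

import Prelude

import Data.String.CodePoints (length)
import Data.String.Utils (NormalizationForm(..), normalize, normalize')
import Effect (Effect)
import Effect.Console (log, logShow)

main :: Effect Unit
main = do
  -- Decomposed: base letter plus combining mark, 2 code points
  logShow (length (normalize' NFD "é"))               -- 2
  -- Recomposed by NFC: 1 code point
  logShow (length (normalize (normalize' NFD "é")))   -- 1
  -- NFKC maps double-struck digits to ASCII digits
  log (normalize' NFKC "𝟘𝟙𝟚")                          -- 012
  -- NFC leaves them unchanged (compatibility mapping only)
  log (normalize "𝟘𝟙𝟚")                                -- 𝟘𝟙𝟚
```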

#padEnd Source

padEnd :: Int -> String -> String

Pad the given string with spaces at the end until the resulting string reaches the given length. Note that this function handles Unicode as you would expect. If you want a simple wrapper around JavaScript's String.prototype.padEnd method, you should use padEnd'.

Example:

-- Treats strings as a sequence of Unicode code points
padEnd   1 "0123456789" == "0123456789"
padEnd   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padEnd  11 "0123456789" == "0123456789 "
padEnd  11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 "
padEnd  21 "0123456789" == "0123456789           "
padEnd  21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡           "

-- Treats strings as a sequence of Unicode code units
padEnd'  1 "0123456789" == "0123456789"
padEnd'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padEnd' 11 "0123456789" == "0123456789 "
padEnd' 11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padEnd' 21 "0123456789" == "0123456789           "
padEnd' 21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 "

#padEnd' Source

padEnd' :: Int -> String -> String

Wrapper around JavaScript's String.prototype.padEnd method. Note that this function treats strings as a sequence of Unicode code units. You will probably want to use padEnd instead.

Example:

-- Treats strings as a sequence of Unicode code points
padEnd   1 "0123456789" == "0123456789"
padEnd   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padEnd  11 "0123456789" == "0123456789 "
padEnd  11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 "
padEnd  21 "0123456789" == "0123456789           "
padEnd  21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡           "

-- Treats strings as a sequence of Unicode code units
padEnd'  1 "0123456789" == "0123456789"
padEnd'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padEnd' 11 "0123456789" == "0123456789 "
padEnd' 11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padEnd' 21 "0123456789" == "0123456789           "
padEnd' 21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 "

#padStart Source

padStart :: Int -> String -> String

Pad the given string with spaces at the start until the resulting string reaches the given length. Note that this function handles Unicode as you would expect. If you want a simple wrapper around JavaScript's String.prototype.padStart method, you should use padStart'.

Example:

-- Treats strings as a sequence of Unicode code points
padStart   1 "0123456789" == "0123456789"
padStart   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart  11 "0123456789" == " 0123456789"
padStart  11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == " 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart  21 "0123456789" == "           0123456789"
padStart  21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "           𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"

-- Treats strings as a sequence of Unicode code units
padStart'  1 "0123456789" == "0123456789"
padStart'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart' 11 "0123456789" == " 0123456789"
padStart' 11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart' 21 "0123456789" == "           0123456789"
padStart' 21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == " 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"

#padStart' Source

padStart' :: Int -> String -> String

Wrapper around JavaScript's String.prototype.padStart method. Note that this function treats strings as a sequence of Unicode code units. You will probably want to use padStart instead.

Example:

-- Treats strings as a sequence of Unicode code points
padStart   1 "0123456789" == "0123456789"
padStart   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart  11 "0123456789" == " 0123456789"
padStart  11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == " 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart  21 "0123456789" == "           0123456789"
padStart  21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "           𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"

-- Treats strings as a sequence of Unicode code units
padStart'  1 "0123456789" == "0123456789"
padStart'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart' 11 "0123456789" == " 0123456789"
padStart' 11 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"
padStart' 21 "0123456789" == "           0123456789"
padStart' 21 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == " 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"

#repeat Source

repeat :: Int -> String -> Maybe String

Return a string that contains the specified number of copies of the input string concatenated together. Return Nothing if the repeat count is negative or if the resulting string would overflow the maximum string size.

Example:

repeat 3 "𝟞" == Just "𝟞𝟞𝟞"
repeat (-1) "PureScript" == Nothing
repeat 2147483647 "PureScript" == Nothing

#replaceAll Source

replaceAll :: Warn (Text "DEPRECATED: `Data.String.Utils.replaceAll`") => String -> String -> String -> String

DEPRECATED: This function is now available in purescript-strings as Data.String.replaceAll, which takes Pattern and Replacement arguments.

Replace all occurrences of the first argument with the second argument.
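A migration sketch using replaceAll from purescript-strings, where the search and replacement strings are wrapped in the Pattern and Replacement newtypes. It assumes purescript-strings, purescript-effect and purescript-console are installed.

```purescript
module Main where

import Prelude

import Data.String (Pattern(..), Replacement(..), replaceAll)
import Effect (Effect)
import Effect.Console (log)

main :: Effect Unit
main = do
  log (replaceAll (Pattern "Venice") (Replacement "Verona") "The Merchant of Venice")
  -- The Merchant of Verona
  log (replaceAll (Pattern "a") (Replacement "o") "banana")  -- bonono
```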

#startsWith Source

startsWith :: String -> String -> Boolean

Determine whether the second argument starts with the first one.

#startsWith' Source

startsWith' :: String -> Int -> String -> Boolean

Determine whether the second string argument starts with the first one at the given position.
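A sketch of startsWith and startsWith' on ASCII input, assuming startsWith' takes its arguments in the same order as includes' (search string, position, string) and that purescript-effect and purescript-console are available for output.

```purescript
module Main where

import Prelude

import Data.String.Utils (startsWith, startsWith')
import Effect (Effect)
import Effect.Console (logShow)

main :: Effect Unit
main = do
  logShow (startsWith "The" "The Merchant of Venice")          -- true
  logShow (startsWith "Merchant" "The Merchant of Venice")     -- false
  -- "Merchant" begins at index 4, right after "The "
  logShow (startsWith' "Merchant" 4 "The Merchant of Venice")  -- true
  logShow (startsWith' "Merchant" 0 "The Merchant of Venice")  -- false
```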

#stripChars Source

stripChars :: String -> String -> String

Strip a set of characters from a string. This function is case-sensitive.

Example:

stripChars "aeiou" "PureScript" == "PrScrpt"
stripChars "AEIOU" "PureScript" == "PureScript"

#stripDiacritics Source

stripDiacritics :: String -> String

Strip diacritics from a string.

Example:

stripDiacritics "Ångström"        == "Angstrom"
stripDiacritics "Crème Brulée"    == "Creme Brulee"
stripDiacritics "Götterdämmerung" == "Gotterdammerung"
stripDiacritics "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†"      == "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†"
stripDiacritics "Raison d'être"   == "Raison d'etre"
stripDiacritics "Týr"             == "Tyr"
stripDiacritics "Zürich"          == "Zurich"

#stripMargin Source

stripMargin :: String -> String

Remove leading whitespace and the pipe character from each line. Useful for dedenting strings enclosed in triple double quotation marks. Inspired by Scala's stripMargin method. Does not preserve original line endings.

Example:

stripMargin
  """
  |Line 1
  |Line 2
  |Line 3
  """
== "Line 1\nLine 2\nLine 3"

#stripMarginWith Source

stripMarginWith :: String -> String -> String

Same as stripMargin except with the option to use any given string to delimit the margin. Does not preserve original line endings.

Example:

stripMarginWith ">> "
  """
  >> Line 1
  >> Line 2
  >> Line 3
  """
== "Line 1\nLine 2\nLine 3"

#toCharArray Source

toCharArray :: String -> Array String

Convert a string to an array of Unicode code points. Note that this function is different from Data.String.CodeUnits.toCharArray in purescript-strings which converts a string to an array of 16-bit code units. The difference becomes apparent when converting strings that contain characters which are internally represented as surrogate pairs. This function uses Strings instead of Chars because PureScript Chars must be UTF-16 code units and hence cannot represent all Unicode code points.

Example:

-- Data.String.Utils
toCharArray "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†"
  == ["ℙ", "∪", "𝕣", "ⅇ", "Ⴝ", "𝚌", "𝕣", "ⅈ", "𝚙", "†"]

-- Data.String.CodeUnits
toCharArray "ℙ∪𝕣ⅇႽ𝚌𝕣ⅈ𝚙†" ==
  ['ℙ', '∪', '�', '�', 'ⅇ', 'Ⴝ', '�', '�', '�', '�', 'ⅈ', '�', '�', '†']

#trimEnd Source

trimEnd :: String -> String

Remove whitespace from the end of a string. Wrapper around JavaScript's String.prototype.trimEnd method.

#trimStart Source

trimStart :: String -> String

Remove whitespace from the beginning of a string. Wrapper around JavaScript's String.prototype.trimStart method.
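A minimal sketch of trimStart and trimEnd side by side, assuming purescript-effect and purescript-console for output. logShow prints the results with quotes so the remaining spaces are visible.

```purescript
module Main where

import Prelude

import Data.String.Utils (trimEnd, trimStart)
import Effect (Effect)
import Effect.Console (logShow)

main :: Effect Unit
main = do
  logShow (trimStart "  Action  ")  -- "Action  "
  logShow (trimEnd "  Action  ")    -- "  Action"
```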

#unsafeCodePointAt Source

unsafeCodePointAt :: Int -> String -> Int

Return the Unicode code point value of the character at the given index, if the index is within bounds. Note that this function handles Unicode as you would expect. If you want a simple (unsafe) wrapper around JavaScript's String.prototype.codePointAt method, you should use unsafeCodePointAt'.

Unsafe: Throws runtime exception if the index is not within bounds.

Example:

unsafeCodePointAt   0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120792
unsafeCodePointAt   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120793
unsafeCodePointAt   2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120794
unsafeCodePointAt  19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" -- Error

unsafeCodePointAt'  0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120792
unsafeCodePointAt'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 57304   -- Surrogate code point
unsafeCodePointAt'  2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120793
unsafeCodePointAt' 19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 57313   -- Surrogate code point

#unsafeCodePointAt' Source

unsafeCodePointAt' :: Int -> String -> Int

Return the Unicode code point value of the character at the given index, if the index is within bounds. This function is a simple (unsafe) wrapper around JavaScript's String.prototype.codePointAt method. This means that if the index does not point to the beginning of a valid surrogate pair, the code unit at the index (i.e. the Unicode code point of the surrogate pair half) is returned instead. If you want to treat a string as an array of Unicode Code Points, use unsafeCodePointAt instead.

Example:

unsafeCodePointAt'  0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120792
unsafeCodePointAt'  1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 57304   -- Surrogate code point
unsafeCodePointAt'  2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120793
unsafeCodePointAt' 19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 57313   -- Surrogate code point

unsafeCodePointAt   0 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120792
unsafeCodePointAt   1 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120793
unsafeCodePointAt   2 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" == 120794
unsafeCodePointAt  19 "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" -- Error

#unsafeRepeat Source

unsafeRepeat :: Int -> String -> String

Return a string that contains the specified number of copies of the input string concatenated together.

Unsafe: Throws runtime exception if the repeat count is negative or if the resulting string would overflow the maximum string size.
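A sketch contrasting unsafeRepeat with the total repeat, assuming purescript-effect and purescript-console for output. unsafeRepeat is only called with a count known to be valid; for counts that may be negative, repeat returns Nothing instead of throwing.

```purescript
module Main where

import Prelude

import Data.String.Utils (repeat, unsafeRepeat)
import Effect (Effect)
import Effect.Console (log, logShow)

main :: Effect Unit
main = do
  -- Safe here: the count is a non-negative literal
  log (unsafeRepeat 3 "𝟞")                  -- 𝟞𝟞𝟞
  -- Possibly invalid counts: use repeat and handle Nothing
  logShow (repeat (-1) "PureScript")        -- Nothing
```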

#words Source

words :: String -> Array String

Split a string into an array of strings which were delimited by white space characters.

Example:

words "Action is eloquence." == ["Action", "is", "eloquence."]
