normalize_string
normalize_string(s, normalform::Symbol)
Normalize the string s
according to one of the four "normal
forms" of the Unicode standard: normalform
can be :NFC
,
:NFD
, :NFKC
, or :NFKD
. Normal forms C (canonical
composition) and D (canonical decomposition) convert different
visually identical representations of the same abstract string into
a single canonical form, with form C being more compact. Normal
forms KC and KD additionally canonicalize "compatibility
equivalents": they convert characters that are abstractly similar
but visually distinct into a single canonical choice (e.g. they expand
ligatures into the individual characters), with form KC being more compact.
Alternatively, finer control and additional transformations may be
be obtained by calling normalize_string(s; keywords...)
, where
any number of the following boolean keywords options (which all default
to false
except for compose
) are specified:
compose=false
: do not perform canonical compositiondecompose=true
: do canonical decomposition instead of canonical composition (compose=true
is ignored if present)compat=true
: compatibility equivalents are canonicalizedcasefold=true
: perform Unicode case folding, e.g. for case-insensitive string comparisonnewline2lf=true
,newline2ls=true
, ornewline2ps=true
: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectivelystripmark=true
: strip diacritical marks (e.g. accents)stripignore=true
: strip Unicode's "default ignorable" characters (e.g. the soft hyphen or the left-to-right marker)stripcc=true
: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specifiedrejectna=true
: throw an error if unassigned code points are foundstable=true
: enforce Unicode Versioning Stability
For example, NFKC corresponds to the options compose=true, compat=true, stable=true
.
Examples
See Also
ascii, base64decode, Base64DecodePipe, base64encode, Base64EncodePipe, bin, bits, bytestring, charwidth, chomp, chop, chr2ind, contains, endswith, escape_string, graphemes, ind2chr, iscntrl, istext, isupper, isvalid, join, lcfirst, lowercase, lpad, lstrip, normalize_string, num2hex, parseip, randstring, readuntil, replace, repr, rpad, rsplit, rstrip, search, searchindex, split, startswith, string, stringmime, strip, strwidth, summary, takebuf_string, ucfirst, unescape_string, uppercase, utf16, utf32, utf8, wstring,User Contributed Notes
Add a Note
The format of note supported is markdown, use triple backtick to start and end a code block.