The normalize task converts a string to a Unicode normalization form. It returns a string whose binary representation is in one of the following forms:
|NFC|Canonical Decomposition, followed by Canonical Composition|
|NFD|Canonical Decomposition|
|NFKC|Compatibility Decomposition, followed by Canonical Composition|
|NFKD|Compatibility Decomposition|
For additional information, please refer to the Unicode Normalization Forms standard.
In simple terms, normalization ensures that two strings that represent the same characters with different binary sequences end up with the same binary value.
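The difference between the forms can be sketched with Python's standard `unicodedata` module, used here only as a stand-in for the normalize task. The sample string and the helper names `code_points` and `normalized_variants` are illustrative assumptions, not part of the task itself.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text: str) -> list[str]:
    """Return each character of text as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in text]


def normalized_variants(text: str) -> dict[str, str]:
    """Normalize text into each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


# Hypothetical sample: the "fi" ligature (U+FB01) followed by "anc" and a precomposed é (U+00E9).
sample = "\ufb01anc\u00e9"

for form, text in normalized_variants(sample).items():
    print(form, len(text), code_points(text))
# NFC keeps 5 code points, NFD has 6, NFKC has 6, NFKD has 7.
```

The canonical forms (NFC, NFD) leave the ligature alone and only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) also replace the ligature with the plain letters "f" and "i".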
Potential Use Cases
Unicode sometimes has multiple representations of the same character. For example, the letter "e" with the acute accent (é) can be represented in Unicode using either U+00E9 (single code point), or U+0065 and U+0301 together (two code points). This can cause unexpected errors, such as password mismatches that prevent user authentication or the inability to search and sort email addresses in a database. To ensure data is stored and accessed in a consistent manner, use normalize whenever you need to compose or decompose characters with diacritical marks, decompose ligatures, or convert between half-width and full-width characters (the last two require the compatibility forms NFKC or NFKD). In short, normalize input whenever you accept it from users so that characters keep a consistent representation.
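The mismatch described above can be reproduced in a few lines. This is a minimal sketch using Python's `unicodedata` module rather than the normalize task; the helper `same_text` and the sample values are hypothetical.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


stored = "caf\u00e9"   # é stored as a single code point (U+00E9)
typed = "cafe\u0301"   # é typed as "e" plus a combining acute accent (U+0065 U+0301)

print(stored == typed)           # False: the raw code point sequences differ
print(same_text(stored, typed))  # True: both normalize to the same NFC string

# Half-width and full-width katakana "ka" only match under a compatibility form.
print(same_text("\uff76", "\u30ab"))               # False with NFC
print(same_text("\uff76", "\u30ab", form="NFKC"))  # True with NFKC
```

Normalizing both sides at comparison time works, but normalizing once when the input is accepted keeps stored values consistent for later searching and sorting.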
Input and output properties are shown below.
|str|String|Required. The string to normalize.|
|form|String|Optional. One of the specified forms for Unicode Normalization: NFC, NFD, NFKC, or NFKD.|
|normalizedString|String|A string containing the Unicode Normalization Form of the given string.|
In this example, the incoming str variable is "Chloé O'Leary" and the form value is NFC. Note the use of an acute accent in the first name and an apostrophe in the last name.
The task creates a normalizedString upon output.
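The behavior in this example can be approximated outside the task as follows. This is a sketch under stated assumptions: the function `normalize_task` is hypothetical, Python's `unicodedata` module stands in for the task's implementation, and the NFC default for an omitted form is an assumption, since the default is not stated here.

```python
import unicodedata

ALLOWED_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_task(str_value: str, form: str = "NFC") -> dict[str, str]:
    """Hypothetical stand-in for the normalize task.

    The NFC default is an assumption made for this sketch.
    """
    if form not in ALLOWED_FORMS:
        raise ValueError(f"form must be one of {ALLOWED_FORMS}, got {form!r}")
    return {"normalizedString": unicodedata.normalize(form, str_value)}


# The incoming name with é written as "e" plus a combining acute accent.
incoming = "Chloe\u0301 O'Leary"
result = normalize_task(incoming, "NFC")

print(len(incoming), len(result["normalizedString"]))  # 14 13
print(result["normalizedString"] == "Chlo\u00e9 O'Leary")  # True
```

NFC composes the accent into the single code point U+00E9, while the apostrophe is left unchanged because it has no decomposition.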