🔤 Unicode Normalizer (NFC/NFD/NFKC/NFKD)

Professional Unicode normalization tool supporting all standard forms (NFC, NFD, NFKC, NFKD). Features character-level diff visualization, Unicode code point analysis, and byte-level comparison to debug text processing and search issues in international applications.

Enter Unicode text to normalize. Try accented characters like café, naïve, or résumé
Select the Unicode normalization form to apply
Highlight character-level differences between original and normalized text
Include Unicode code points, byte representations, and character count analysis

Unicode Normalization Results:

🔤 UNICODE NORMALIZED

Text Normalized to NFC Form

Character differences highlighted for debugging

📝 Input Analysis

Original: café (c + a + f + é)
✓ 4 characters detected

NFC (Composed)

café

4 characters

NFD (Decomposed)

cafe + U+0301 (renders as café)

5 characters

🔍 Character-Level Analysis

Original: c(U+0063) a(U+0061) f(U+0066) é(U+00E9)
NFC: c(U+0063) a(U+0061) f(U+0066) é(U+00E9)
NFD: c(U+0063) a(U+0061) f(U+0066) e(U+0065) combining acute accent(U+0301)

💡 Analysis:

The accented character 'é' is a single precomposed code point (U+00E9) in NFC form but is decomposed into 'e' (U+0065) + combining acute accent (U+0301) in NFD form.
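
A minimal sketch of how a breakdown like the one above could be produced in JavaScript. The helper name `codePoints` is an assumption made for illustration and is not taken from the tool's source; only the standard `String.prototype.normalize()` and `codePointAt()` methods are used.

```javascript
// Hypothetical helper: list each code point of a string as "U+XXXX".
function codePoints(text) {
  // Array.from iterates by code point, so characters outside the BMP stay whole.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const cafe = "caf\u00E9"; // "café" with a precomposed é

console.log(codePoints(cafe.normalize("NFC")).join(" "));
// U+0063 U+0061 U+0066 U+00E9
console.log(codePoints(cafe.normalize("NFD")).join(" "));
// U+0063 U+0061 U+0066 U+0065 U+0301
```

Writing the input with `\u` escapes keeps the example unambiguous, since an editor or clipboard may silently normalize a pasted 'é'.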

🔢 UTF-8 Byte Analysis

/* UTF-8 Byte Representation */
NFC: [0x63, 0x61, 0x66, 0xC3, 0xA9] (5 bytes)
NFD: [0x63, 0x61, 0x66, 0x65, 0xCC, 0x81] (6 bytes)
Difference: +1 byte in NFD form
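
The byte counts above can be reproduced with the standard `TextEncoder` API, which always encodes to UTF-8 and is available as a global in browsers and recent Node.js versions. This is a sketch only; the helper name `utf8Bytes` is hypothetical.

```javascript
// Hypothetical helper: return the UTF-8 bytes of a string as "0xNN" labels.
function utf8Bytes(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) =>
    "0x" + b.toString(16).toUpperCase().padStart(2, "0")
  );
}

const word = "caf\u00E9";
const nfcBytes = utf8Bytes(word.normalize("NFC"));
const nfdBytes = utf8Bytes(word.normalize("NFD"));

console.log(nfcBytes.join(", "), `(${nfcBytes.length} bytes)`);
// 0x63, 0x61, 0x66, 0xC3, 0xA9 (5 bytes)
console.log(nfdBytes.join(", "), `(${nfdBytes.length} bytes)`);
// 0x63, 0x61, 0x66, 0x65, 0xCC, 0x81 (6 bytes)
```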

How to Use This Unicode Normalizer (NFC/NFD/NFKC/NFKD)

  1. Enter Text: Paste or type Unicode text containing accented characters, ligatures, or special symbols
  2. Choose Form: Select the normalization form (NFC is most common for general use)
  3. Enable Analysis: Check options to show character differences and detailed Unicode analysis
  4. Normalize: Click to process and see normalized results with visual differences highlighted

Pro Tips:

  • Use NFC for web applications and general text processing
  • Use NFD when you need to separate accents for advanced text analysis
  • Enable "Show differences" to debug why two similar-looking texts aren't matching
  • Try "All Forms" to compare how each normalization affects your text
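
As an illustration of the NFD tip, one common pattern is to decompose the text and then drop the combining marks, for example to build an accent-insensitive search key. This is a sketch under that assumption, not a description of how the tool itself works; `stripAccents` is a hypothetical name.

```javascript
// Hypothetical helper: remove accents by decomposing, then deleting
// nonspacing combining marks (Unicode general category Mn).
function stripAccents(text) {
  // The "u" flag is required for \p{...} property escapes.
  return text.normalize("NFD").replace(/\p{Mn}/gu, "");
}

console.log(stripAccents("r\u00E9sum\u00E9")); // resume
console.log(stripAccents("na\u00EFve"));       // naive
```

Stripping marks is lossy and not linguistically neutral (in some languages the accented and unaccented letters are distinct), so it tends to suit search keys better than stored or displayed text.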

How Unicode Normalization Works

Unicode normalization standardizes text representation to ensure consistent processing:

NFC (Canonical Composition): Canonically decomposes the text, then recombines base characters and combining marks into precomposed characters where they exist. Usually the most compact canonical form.

NFD (Canonical Decomposition): Separates precomposed characters into a base character plus combining marks, placed in a canonical order. Useful when marks need to be inspected or removed separately.

NFKC (Compatibility Composition): Like NFC, but also applies compatibility mappings first (for example, the ligature ﬁ (U+FB01) becomes the two letters fi).

NFKD (Compatibility Decomposition): Like NFD, but with compatibility mappings applied as well.

The tool uses JavaScript's native String.prototype.normalize() method and provides detailed analysis of character-level changes, Unicode code points, and UTF-8 byte representations.
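
A short sketch of calling `normalize()` with each of the four form names. The sample string and the `normalizeAll` helper are assumptions for illustration; the form names themselves are the ones the standard method accepts.

```javascript
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

// Hypothetical helper: apply every normalization form to one input.
function normalizeAll(text) {
  const result = {};
  for (const form of FORMS) {
    result[form] = text.normalize(form);
  }
  return result;
}

// "ﬁancé" written with the ﬁ ligature (U+FB01) and a precomposed é (U+00E9).
const sample = "\uFB01anc\u00E9";

for (const [form, value] of Object.entries(normalizeAll(sample))) {
  console.log(form, value.length);
}
// NFC 5   (unchanged: the ligature has no canonical decomposition)
// NFD 6   (é becomes e + U+0301)
// NFKC 6  (ﬁ becomes f + i, é stays precomposed)
// NFKD 7  (both changes apply)
```

Calling `normalize()` with no argument is equivalent to passing "NFC", and any other form name throws a `RangeError`.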

Frequently Asked Questions

What's the difference between NFC and NFD normalization?

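NFC (Canonical Composition) combines a base character and its combining marks into a single precomposed code point where one exists (é = U+00E9), while NFD (Canonical Decomposition) separates them into a base character plus combining marks (é = e U+0065 + combining acute accent U+0301). NFC is usually more compact and is the more common choice for storage and interchange, while NFD is useful when marks need to be examined or stripped separately.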

When should I use NFKC vs NFKD normalization forms?

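NFKC and NFKD are compatibility forms: in addition to canonical normalization, they replace compatibility characters with their plain equivalents (for example, the ligature ﬁ (U+FB01) → fi). Use NFKC when you want composed output with compatibility mappings applied, and NFKD when you want decomposed output. These forms are useful for search and comparison, but they discard typographic distinctions such as ligatures, superscripts, and full-width forms, so they are usually a poor fit for text you intend to display unchanged.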
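
A sketch of the search-and-comparison use case: building a lookup key with NFKC so that compatibility variants match their plain equivalents. The `compatKey` helper and the added lowercasing step are assumptions for illustration, not part of the normalization forms themselves.

```javascript
// Hypothetical helper: a comparison key that folds compatibility variants
// (ligatures, full-width letters, circled digits) and letter case.
function compatKey(text) {
  return text.normalize("NFKC").toLowerCase();
}

console.log(compatKey("\uFB01le"));    // "file"  (ﬁ ligature expanded)
console.log(compatKey("\uFF21BC"));    // "abc"   (full-width Ａ folded to A)
console.log("\u2460".normalize("NFKC")); // "1"   (circled digit one)

// The mapping is lossy: a superscript two becomes an ordinary digit.
console.log("x\u00B2".normalize("NFKC")); // "x2"
```

Keeping the original string alongside the key preserves the typographic detail that the key throws away.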

Why do some characters look the same but fail equality tests?

The same visible character can have more than one encoding in Unicode: 'é' might be a single precomposed character (U+00E9) or the base letter 'e' (U+0065) followed by a combining acute accent (U+0301). The two render identically but are different code point sequences with different byte representations, so a plain equality test fails. Normalizing both strings to the same form before comparing fixes this.
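
The mismatch is easy to reproduce. In this sketch, `canonicallyEqual` is a hypothetical helper name; it compares two strings after normalizing both to NFC (any single form would do, as long as both sides use the same one).

```javascript
const precomposed = "caf\u00E9";  // é as one code point
const decomposed = "cafe\u0301";  // e followed by a combining acute accent

console.log(precomposed === decomposed); // false

// Hypothetical helper: compare after normalizing both sides to one form.
function canonicallyEqual(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(canonicallyEqual(precomposed, decomposed)); // true
```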

How does Unicode normalization affect text length and file size?

Normalization can change both character count and byte size. NFD typically increases length by decomposing accented characters, while NFC is usually more compact. NFKC and NFKD can move in either direction: expanding the ligature ﬁ (U+FB01) into fi raises the character count from 1 to 2 but lowers the UTF-8 size from 3 bytes to 2. The tool shows exact character counts and UTF-8 byte sizes for each form.
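
A small sketch for measuring these effects directly. The `measure` helper is hypothetical; it reports UTF-16 code units (what `length` returns in JavaScript), code points, and UTF-8 bytes, since the three can differ.

```javascript
// Hypothetical helper: report three different notions of "length".
function measure(text) {
  return {
    codeUnits: text.length,                           // UTF-16 code units
    codePoints: Array.from(text).length,              // Unicode code points
    utf8Bytes: new TextEncoder().encode(text).length, // encoded size in UTF-8
  };
}

console.log(measure("caf\u00E9".normalize("NFC")));
// { codeUnits: 4, codePoints: 4, utf8Bytes: 5 }
console.log(measure("caf\u00E9".normalize("NFD")));
// { codeUnits: 5, codePoints: 5, utf8Bytes: 6 }

console.log(measure("\uFB01"));
// { codeUnits: 1, codePoints: 1, utf8Bytes: 3 }
console.log(measure("\uFB01".normalize("NFKC")));
// { codeUnits: 2, codePoints: 2, utf8Bytes: 2 }
```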

Should I normalize text before storing it in databases?

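Usually, yes. Normalizing to NFC before storage keeps the data consistent and prevents duplicate entries where a precomposed 'café' (with U+00E9) and a decomposed 'café' (e + U+0301) are treated as different strings. Many databases and search engines compare strings by code point or byte unless configured otherwise, so they may not reconcile the two forms for you; check how your database's collation handles equivalent sequences.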
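
A sketch of normalizing at the storage boundary, using an in-memory `Set` as a stand-in for a database table with a unique column. The function name `dedupeNormalized` and the sample data are assumptions for illustration.

```javascript
// Hypothetical helper: normalize every value to NFC before "storing" it,
// so canonically equivalent spellings collapse into one entry.
function dedupeNormalized(values) {
  const stored = new Set();
  for (const value of values) {
    stored.add(value.normalize("NFC"));
  }
  return [...stored];
}

const incoming = [
  "caf\u00E9",   // precomposed é
  "cafe\u0301",  // e + combining acute accent
  "cafe",        // genuinely different string
];

console.log(new Set(incoming).size);            // 3 (raw values all differ)
console.log(dedupeNormalized(incoming).length); // 2 (the two café spellings merge)
```

Applying the same normalization to lookup queries as to stored values matters just as much; normalizing only one side leaves the mismatch in place.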