
Internationalized Domain Names in Applications (IDNA)

UnicodeChecker implements the Internationalized Domain Names in Applications (IDNA) protocol in both the 2008 version and the (now obsolete) 2003 version. IDNA conversions are available as a UnicodeChecker Utility, and to other applications through AppleScript or as a Service. Both the utility and AppleScript let you choose between the 2008 and 2003 protocol versions, while the Service always uses the later (2008) protocol.

IDNA 2008

The 2008 protocol is specified in RFCs 5890, 5891, 5892, 5893, and 5894.

This version specifies conversions between “A-Labels” (an ASCII-Compatible-Encoding of an IDNA-valid string) and “U-Labels” (an IDNA-valid string of Unicode characters). The conversion may fail in either direction. When it does, the UnicodeChecker IDNA utility displays the reason for the failure, and the corresponding AppleScript command returns the missing value constant.
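The relationship between the two label forms can be illustrated with a minimal Python sketch. It is not UnicodeChecker's implementation: it performs only the Punycode encoding step using Python's built-in "punycode" codec and does none of the IDNA 2008 validity checks. The function names to_a_label and to_u_label are hypothetical, and returning None on failure is merely an analogy to the missing value constant.

```python
ACE_PREFIX = "xn--"  # prefix that marks an ASCII-Compatible-Encoding label


def to_a_label(u_label: str) -> str:
    """Encode a Unicode label with Punycode and add the ACE prefix.

    Assumption: the input is already a valid, lowercase U-Label.
    No IDNA 2008 validity checks are performed here.
    """
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str):
    """Decode an ACE label, or return None if it cannot be decoded."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return None  # not an ACE label at all
    try:
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return None  # malformed Punycode or non-ASCII input


print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
```

A real IDNA 2008 conversion would additionally verify that the decoded string contains only permitted code points and that it round-trips to the same A-Label.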

UnicodeChecker implements the stricter “Registration Protocol” of RFC 5891, which among other things restricts input strings to lowercase (more specifically, it disallows “unstable” characters as defined in Section 2.2 of RFC 5892). UnicodeChecker does not perform any mappings on the input strings, so you must convert the input string to lowercase yourself. (IDNA 2003, by contrast, specified a normative input mapping that included a conversion to lowercase.)

When converting to ASCII representation (U-Label → A-Label), UnicodeChecker leaves LDH-Labels unchanged. LDH-Labels are the “traditional” labels: they may contain only letters, digits and hyphens (hence LDH), may not start or end with a hyphen, and may be at most 63 characters long. When converting to Unicode representation (A-Label → U-Label), UnicodeChecker leaves NR-LDH-Labels unchanged. To qualify as an NR-LDH-Label (non-reserved LDH-Label), a label may not contain hyphens in both the third and fourth character positions.
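The label classes described above can be expressed as two small predicates. This is a Python sketch for illustration, not UnicodeChecker's code. The function names are hypothetical, and rejecting empty labels is an assumption that the text above does not state.

```python
import string

# Letters, digits and hyphen: the characters allowed in an LDH-Label.
LDH_CHARS = set(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """True if the label uses only LDH characters, has no leading or
    trailing hyphen, and is between 1 and 63 characters long."""
    return (
        1 <= len(label) <= 63  # lower bound of 1 is an assumption
        and all(ch in LDH_CHARS for ch in label)
        and not label.startswith("-")
        and not label.endswith("-")
    )


def is_nr_ldh_label(label: str) -> bool:
    """True if the label is an LDH-Label without hyphens in both the
    third and fourth positions (so it is not a reserved label)."""
    return is_ldh_label(label) and label[2:4] != "--"


print(is_nr_ldh_label("my-site"))        # True
print(is_ldh_label("xn--bcher-kva"))     # True
print(is_nr_ldh_label("xn--bcher-kva"))  # False
```

Note that an A-Label such as "xn--bcher-kva" is itself an LDH-Label, but a reserved one, which is why it is not an NR-LDH-Label.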

The requirements for labels (i.e. the parts between the dots of a domain name) that contain characters from right-to-left (RTL) scripts, as specified in Section 4.2.3.4 (Labels Containing Characters Written Right to Left) of RFC 5891, are tested individually for each label containing RTL characters. RFC 5893 also sets up joint requirements for all labels of a domain name that has at least one RTL label, but UnicodeChecker currently does not test these.

See RFC 5890 for definitions of the terms A-Label, U-Label, LDH-Label, and NR-LDH-Label.

IDNA 2003

The 2003 protocol is specified in RFC 3490 and has been obsoleted by IDNA 2008.

This version specifies two conversions: “ToUnicode”, which always succeeds, and “ToASCII”, which may fail for some input strings. The IDNA utility tells you whether the conversion succeeded. If the conversion fails, the exact reason for the failure is shown. The AppleScript command returns the missing value constant on failure.
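The difference between the two operations can be tried out with Python's built-in "idna" codec, which implements RFC 3490. The following is a sketch for experimentation, not UnicodeChecker's code. The wrapper names are hypothetical, None stands in for the missing value constant, and the fallback in to_unicode_2003 models the rule that ToUnicode returns its input unchanged rather than failing.

```python
def to_ascii_2003(domain: str):
    """Apply IDNA 2003 ToASCII to each label; return None on failure."""
    try:
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        return None  # e.g. a label that is empty or longer than 63 characters


def to_unicode_2003(domain: str) -> str:
    """Apply IDNA 2003 ToUnicode; fall back to the unchanged input."""
    try:
        return domain.encode("ascii").decode("idna")
    except UnicodeError:
        return domain  # ToUnicode never fails; the input is returned as-is


# The IDNA 2003 input mapping lowercases "Bücher" before encoding.
print(to_ascii_2003("Bücher.example"))           # xn--bcher-kva.example
print(to_unicode_2003("xn--bcher-kva.example"))  # bücher.example
print(to_ascii_2003("a" * 64 + ".example"))      # None
```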