draft-ietf-precis-framework-11.txt   draft-ietf-precis-framework-12.txt 
PRECIS P. Saint-Andre PRECIS P. Saint-Andre
Internet-Draft Cisco Systems, Inc. Internet-Draft Cisco Systems, Inc.
Obsoletes: 3454 (if approved) M. Blanchet Obsoletes: 3454 (if approved) M. Blanchet
Intended status: Standards Track Viagenie Intended status: Standards Track Viagenie
Expires: April 21, 2014 October 18, 2013 Expires: May 25, 2014 November 21, 2013
PRECIS Framework: Preparation and Comparison of Internationalized PRECIS Framework: Preparation and Comparison of Internationalized
Strings in Application Protocols Strings in Application Protocols
draft-ietf-precis-framework-11 draft-ietf-precis-framework-12
Abstract Abstract
Application protocols using Unicode code points in protocol strings Application protocols using Unicode characters in protocol strings
need to properly prepare such strings in order to perform valid need to properly prepare such strings in order to perform valid
comparison operations (e.g., for purposes of authentication or comparison operations (e.g., for purposes of authentication or
authorization). This document defines a framework enabling authorization). This document defines a framework enabling
application protocols to perform the preparation and comparison of application protocols to perform the preparation and comparison of
internationalized strings (a.k.a. "PRECIS") in a way that depends on internationalized strings ("PRECIS") in a way that depends on the
the properties of Unicode code points and thus is agile with respect properties of Unicode characters and thus is agile with respect to
to versions of Unicode. As a result, this framework provides a more versions of Unicode. As a result, this framework provides a more
sustainable approach to the handling of internationalized strings sustainable approach to the handling of internationalized strings
than the previous framework, known as Stringprep (RFC 3454). This than the previous framework, known as Stringprep (RFC 3454). This
document obsoletes RFC 3454. document obsoletes RFC 3454.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 21, 2014. This Internet-Draft will expire on May 25, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 20 skipping to change at page 2, line 20
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. String Classes . . . . . . . . . . . . . . . . . . . . . . . . 6 3. String Classes . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2. IdentifierClass . . . . . . . . . . . . . . . . . . . . . 7 3.2. IdentifierClass . . . . . . . . . . . . . . . . . . . . . 7
3.3. FreeformClass . . . . . . . . . . . . . . . . . . . . . . 8 3.3. FreeformClass . . . . . . . . . . . . . . . . . . . . . . 9
4. Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4. Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.1. Principles . . . . . . . . . . . . . . . . . . . . . . . . 10 4.1. Principles . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2. Building Application-Layer Constructs . . . . . . . . . . 12 4.2. Building Application-Layer Constructs . . . . . . . . . . 12
4.3. A Note about Spaces . . . . . . . . . . . . . . . . . . . 12 4.3. A Note about Spaces . . . . . . . . . . . . . . . . . . . 13
5. Order of Operations . . . . . . . . . . . . . . . . . . . . . 13 5. Order of Operations . . . . . . . . . . . . . . . . . . . . . 13
6. Code Point Properties . . . . . . . . . . . . . . . . . . . . 13 6. Code Point Properties . . . . . . . . . . . . . . . . . . . . 14
7. Category Definitions Used to Calculate Derived Property . . . 15 7. Category Definitions Used to Calculate Derived Property . . . 16
7.1. LetterDigits (A) . . . . . . . . . . . . . . . . . . . . . 16 7.1. LetterDigits (A) . . . . . . . . . . . . . . . . . . . . . 16
7.2. Unstable (B) . . . . . . . . . . . . . . . . . . . . . . . 16 7.2. Unstable (B) . . . . . . . . . . . . . . . . . . . . . . . 17
7.3. IgnorableProperties (C) . . . . . . . . . . . . . . . . . 16 7.3. IgnorableProperties (C) . . . . . . . . . . . . . . . . . 17
7.4. IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . . 16 7.4. IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . . 17
7.5. LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . . 16 7.5. LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.6. Exceptions (F) . . . . . . . . . . . . . . . . . . . . . . 17 7.6. Exceptions (F) . . . . . . . . . . . . . . . . . . . . . . 17
7.7. BackwardCompatible (G) . . . . . . . . . . . . . . . . . . 18 7.7. BackwardCompatible (G) . . . . . . . . . . . . . . . . . . 19
7.8. JoinControl (H) . . . . . . . . . . . . . . . . . . . . . 18 7.8. JoinControl (H) . . . . . . . . . . . . . . . . . . . . . 19
7.9. OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . . 19 7.9. OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . . 19
7.10. Unassigned (J) . . . . . . . . . . . . . . . . . . . . . . 19 7.10. Unassigned (J) . . . . . . . . . . . . . . . . . . . . . . 20
7.11. ASCII7 (K) . . . . . . . . . . . . . . . . . . . . . . . . 19 7.11. ASCII7 (K) . . . . . . . . . . . . . . . . . . . . . . . . 20
7.12. Controls (L) . . . . . . . . . . . . . . . . . . . . . . . 20 7.12. Controls (L) . . . . . . . . . . . . . . . . . . . . . . . 20
7.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . . 20 7.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . . 20
7.14. Spaces (N) . . . . . . . . . . . . . . . . . . . . . . . . 20 7.14. Spaces (N) . . . . . . . . . . . . . . . . . . . . . . . . 21
7.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . . 20 7.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . . 21
7.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . . 20 7.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . . 21
7.17. HasCompat (Q) . . . . . . . . . . . . . . . . . . . . . . 21 7.17. HasCompat (Q) . . . . . . . . . . . . . . . . . . . . . . 21
7.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . . 21 7.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . . 21
8. Calculation of the Derived Property . . . . . . . . . . . . . 21 8. Calculation of the Derived Property . . . . . . . . . . . . . 21
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23
9.1. PRECIS Derived Property Value Registry . . . . . . . . . . 22 9.1. PRECIS Derived Property Value Registry . . . . . . . . . . 23
9.2. PRECIS Base Classes Registry . . . . . . . . . . . . . . . 22 9.2. PRECIS Base Classes Registry . . . . . . . . . . . . . . . 23
9.3. PRECIS Profiles Registry . . . . . . . . . . . . . . . . . 23 9.3. PRECIS Profiles Registry . . . . . . . . . . . . . . . . . 24
10. Security Considerations . . . . . . . . . . . . . . . . . . . 25 10. Security Considerations . . . . . . . . . . . . . . . . . . . 25
10.1. General Issues . . . . . . . . . . . . . . . . . . . . . . 25 10.1. General Issues . . . . . . . . . . . . . . . . . . . . . . 25
10.2. Use of the IdentifierClass . . . . . . . . . . . . . . . . 25 10.2. Use of the IdentifierClass . . . . . . . . . . . . . . . . 25
10.3. Use of the FreeformClass . . . . . . . . . . . . . . . . . 25 10.3. Use of the FreeformClass . . . . . . . . . . . . . . . . . 26
10.4. Local Character Set Issues . . . . . . . . . . . . . . . . 26 10.4. Local Character Set Issues . . . . . . . . . . . . . . . . 26
10.5. Visually Similar Characters . . . . . . . . . . . . . . . 26 10.5. Visually Similar Characters . . . . . . . . . . . . . . . 26
10.6. Security of Passwords . . . . . . . . . . . . . . . . . . 28 10.6. Security of Passwords . . . . . . . . . . . . . . . . . . 28
11. Interoperability Considerations . . . . . . . . . . . . . . . 28 11. Interoperability Considerations . . . . . . . . . . . . . . . 29
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29
12.1. Normative References . . . . . . . . . . . . . . . . . . . 29 12.1. Normative References . . . . . . . . . . . . . . . . . . . 29
12.2. Informative References . . . . . . . . . . . . . . . . . . 29 12.2. Informative References . . . . . . . . . . . . . . . . . . 30
Appendix A. Codepoint Table . . . . . . . . . . . . . . . . . . . 32 Appendix A. Codepoint Table . . . . . . . . . . . . . . . . . . . 32
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 62 Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 63
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 62 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 63
1. Introduction 1. Introduction
As described in the problem statement for the preparation and As described in the problem statement for the preparation and
comparison of internationalized strings ("PRECIS") [RFC6885], many comparison of internationalized strings ("PRECIS") [RFC6885], many
IETF protocols have used the Stringprep framework [RFC3454] as the IETF protocols have used the Stringprep framework [RFC3454] as the
basis for preparing and comparing protocol strings that contain basis for preparing and comparing protocol strings that contain
Unicode code points [UNICODE] outside the ASCII range [RFC20]. The Unicode characters [UNICODE] outside the ASCII range [RFC20]. The
Stringprep framework was developed during work on the original Stringprep framework was developed during work on the original
technology for internationalized domain names (IDNs), here called technology for internationalized domain names (IDNs), here called
"IDNA2003" [RFC3490], and Nameprep [RFC3491] was the Stringprep "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the Stringprep
profile for IDNs. At the time, Stringprep was designed as a general profile for IDNs. At the time, Stringprep was designed as a general
framework so that other application protocols could define their own framework so that other application protocols could define their own
Stringprep profiles for the preparation and comparison of strings and Stringprep profiles for the preparation and comparison of strings and
identifiers. Indeed, a number of application protocols defined such identifiers. Indeed, a number of application protocols defined such
profiles. profiles.
After the publication of [RFC3454] in 2002, several significant After the publication of [RFC3454] in 2002, several significant
issues arose with the use of Stringprep in the IDN case, as issues arose with the use of Stringprep in the IDN case, as
documented in the IAB's recommendations regarding IDNs [RFC4690] documented in the IAB's recommendations regarding IDNs [RFC4690]
(most significantly, Stringprep was tied to Unicode version 3.2). (most significantly, Stringprep was tied to Unicode version 3.2).
Therefore, the newer IDNA specifications, here called "IDNA2008" Therefore, the newer IDNA specifications, here called "IDNA2008"
([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), no longer
use Stringprep and Nameprep. This migration away from Stringprep for use Stringprep and Nameprep. This migration away from Stringprep for
IDNs has prompted other "customers" of Stringprep to consider new IDNs has prompted other "customers" of Stringprep to consider new
approaches to the preparation and comparison of internationalized approaches to the preparation and comparison of internationalized
strings (a.k.a. "PRECIS"), as described in [RFC6885]. strings, as described in [RFC6885].
This document defines a framework for a post-Stringprep approach to This document defines a framework for a post-Stringprep approach to
the preparation and comparison of internationalized strings in the preparation and comparison of internationalized strings in
application protocols, based on several principles: application protocols, based on several principles:
1. Define a small set of string classes that specify the code points 1. Define a small set of string classes that specify the Unicode
appropriate for common application protocol constructs. characters (i.e., specific "code points") appropriate for common
application protocol constructs.
2. Define each PRECIS string class in terms of Unicode code points 2. Define each PRECIS string class in terms of Unicode code points
and their properties so that an algorithm can be used to and their properties so that an algorithm can be used to
determine whether each code point or character category is valid, determine whether each code point or character category is (a)
contextual rule required, disallowed, or unassigned. valid, (b) allowed in certain contexts, (c) disallowed, or (d)
3. Define string classes in terms of allowable code points, so that unassigned.
3. Use an "inclusion model" such that a string class consists only
of code points that are explicitly allowed, with the result that
any code point not explicitly allowed is forbidden. any code point not explicitly allowed is forbidden.
4. Enable application protocols to define profiles of the PRECIS 4. Enable application protocols to define profiles of the PRECIS
string classes, addressing matters such as width mapping, case string classes, addressing matters such as width mapping, case
folding and other forms of character mapping, Unicode folding and other forms of character mapping, Unicode
normalization, directionality, and further excluded code points normalization, directionality, and further excluded code points
or character categories. or character categories.
Whereas the string classes define the "baseline" code points for a Whereas the string classes define the "baseline" code points for a
range of applications, profiling enables application protocols to range of applications, profiling enables application protocols to
further restrict the allowable code points beyond those specified for further restrict the allowable code points beyond those specified for
skipping to change at page 5, line 20 skipping to change at page 5, line 24
[I-D.ietf-xmpp-6122bis], and free-form strings [I-D.ietf-xmpp-6122bis], and free-form strings
[I-D.ietf-xmpp-6122bis]. Profiles are responsible for defining the [I-D.ietf-xmpp-6122bis]. Profiles are responsible for defining the
handling of right-to-left characters as well as various mapping handling of right-to-left characters as well as various mapping
operations of the kind also discussed for IDNs in [RFC5895], such as operations of the kind also discussed for IDNs in [RFC5895], such as
case preservation or lowercasing, Unicode normalization, mapping of case preservation or lowercasing, Unicode normalization, mapping of
certain characters to other characters or to nothing, and mapping of certain characters to other characters or to nothing, and mapping of
full-width and half-width characters. full-width and half-width characters.
It is expected that this framework will yield the following benefits: It is expected that this framework will yield the following benefits:
o Application protocols will be more version-agile with regard to o Application protocols will be agile with regard to Unicode
the Unicode database. versions.
o Implementers will be able to share code point tables and software o Implementers will be able to share code point tables and software
code across application protocols, most likely by means of code across application protocols, most likely by means of
software libraries. software libraries.
o End users will be able to acquire more accurate expectations about o End users will be able to acquire more accurate expectations about
the code points that are acceptable in various contexts. Given the characters that are acceptable in various contexts. Given
this more uniform set of string classes, it is also expected that this more uniform set of string classes, it is also expected that
copy/paste operations between software implementing different copy/paste operations between software implementing different
application protocols will be more predictable and coherent. application protocols will be more predictable and coherent.
Although this framework is similar to IDNA2008 and borrows some of Although this framework is similar to IDNA2008 and borrows some of
the character categories defined in [RFC5892], it defines additional the character categories defined in [RFC5892], it defines additional
character categories to meet the needs of common application character categories to meet the needs of common application
protocols. protocols.
The character categories and calculation rules defined under The character categories and calculation rules defined under
Section 7 and Section 8 are normative and apply to all Unicode code Section 7 and Section 8 are normative and apply to all Unicode code
points. The list of code points provided under Appendix A are non- points. The code point table provided under Appendix A is non-
normative and merely show, for illustrative purposes, the normative and merely shows, for illustrative purposes, the
consequences of the character categories and calculation rules, and consequences of the character categories and calculation rules, as
the resulting property values. well as the resulting property values.
2. Terminology 2. Terminology
Many important terms used in this document are defined in [RFC5890], Many important terms used in this document are defined in [RFC5890],
[RFC6365], [RFC6885], and [UNICODE]. The terms "left-to-right" (LTR) [RFC6365], [RFC6885], and [UNICODE]. The terms "left-to-right" (LTR)
and "right-to-left" (RTL) are defined in [UAX9]. and "right-to-left" (RTL) are defined in Unicode Standard Annex #9
[UAX9].
As of the date of writing, the version of Unicode published by the
Unicode Consortium is 6.3; however, PRECIS is not tied to a specific
version of Unicode.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
[RFC2119]. [RFC2119].
3. String Classes 3. String Classes
3.1. Overview 3.1. Overview
skipping to change at page 6, line 28 skipping to change at page 6, line 37
for non-DNS applications. for non-DNS applications.
Starting in 2010, various "customers" of Stringprep began to discuss Starting in 2010, various "customers" of Stringprep began to discuss
the need to define a post-Stringprep approach to the preparation and the need to define a post-Stringprep approach to the preparation and
comparison of internationalized strings other than IDNs. This comparison of internationalized strings other than IDNs. This
community analyzed the existing Stringprep profiles and also weighed community analyzed the existing Stringprep profiles and also weighed
the costs and benefits of defining a relatively small set of Unicode the costs and benefits of defining a relatively small set of Unicode
characters that would minimize the potential for user confusion characters that would minimize the potential for user confusion
caused by visually similar characters (and thus be relatively "safe") caused by visually similar characters (and thus be relatively "safe")
vs. defining a much larger set of Unicode characters that would vs. defining a much larger set of Unicode characters that would
maximize the potential for user expressiveness (and thus be maximize the potential for user creativity (and thus be relatively
relatively inclusive). As a result, the community concluded that "expressive"). As a result, the community concluded that most
most existing uses could be addressed by two string classes: existing uses could be addressed by two string classes:
IdentifierClass: a sequence of letters, numbers, and some symbols IdentifierClass: a sequence of letters, numbers, and some symbols
that is used to identify or address a network entity such as a that is used to identify or address a network entity such as a
user account, a venue (e.g., a chatroom), an information source user account, a venue (e.g., a chatroom), an information source
(e.g., a data feed), or a collection of data (e.g., a file); the (e.g., a data feed), or a collection of data (e.g., a file); the
intent is that this class will be very safe for use in a wide intent is that this class will minimize user confusion in a wide
variety of application protocols, with the result that safety has variety of application protocols, with the result that safety has
been prioritized over inclusiveness for this class. been prioritized over expressiveness for this class.
FreeformClass: a sequence of letters, numbers, symbols, spaces, and FreeformClass: a sequence of letters, numbers, symbols, spaces, and
other code points that is used for free-form strings, including other characters that is used for free-form strings, including
passwords as well as display elements such as human-friendly passwords as well as display elements such as human-friendly
nicknames in chatrooms; the intent is that this class will allow nicknames in chatrooms; the intent is that this class will allow
nearly any Unicode character, with the result that inclusiveness nearly any Unicode character, with the result that expressiveness
has been prioritized over safety for this class (e.g., protocol has been prioritized over safety for this class (e.g., protocol
designers, application developers, service providers, and end designers, application developers, service providers, and end
users might not understand or be able to enter all of the users might not understand or be able to enter all of the
characters that can be included in the FreeformClass). characters that can be included in the FreeformClass).
Future specifications might define additional PRECIS string classes, Future specifications might define additional PRECIS string classes,
such as a class that falls somewhere between the IdentifierClass and such as a class that falls somewhere between the IdentifierClass and
the FreeformClass. At this time, it is not clear how useful such a the FreeformClass. At this time, it is not clear how useful such a
class would be. In any case, because application developers are able class would be. In any case, because application developers are able
to define profiles of PRECIS string classes, a protocol needing a to define profiles of PRECIS string classes, a protocol needing a
construct between the IdentifierClass and the FreeformClass could of construct between the IdentifierClass and the FreeformClass could
course define a restricted profile of the FreeformClass if needed. define a restricted profile of the FreeformClass if needed.
The following subsections discuss the IdentifierClass and The following subsections discuss the IdentifierClass and
FreeformClass in more detail, with reference to the dimensions FreeformClass in more detail, with reference to the dimensions
described in Section 3 of [RFC6885]. (Naturally, future documents described in Section 3 of [RFC6885]. Each string class is defined by
can define PRECIS string classes beyond the IdentifierClass and the following behavioral rules:
FreeformClass; see Section 9.2.) Each string class is defined by the
following behavioral rules:
Valid: Defines which code points and character categories are Valid: Defines which code points and character categories are
treated as valid input to the string. treated as valid input to the string.
Contextual Rule Required: Defines which code points and character Contextual Rule Required: Defines which code points and character
categories are treated as requiring a contextual rule (i.e., categories are treated as allowed only if the requirements of a
either CONTEXTJ or CONTEXTO). contextual rule are met (i.e., either CONTEXTJ or CONTEXTO).
Disallowed: Defines which code points and character categories need Disallowed: Defines which code points and character categories need
to be excluded from the string. to be excluded from the string.
Unassigned: Defines application behavior in the presence of code Unassigned: Defines application behavior in the presence of code
points that are unassigned, i.e. unknown for the version of points that are unknown (i.e., not yet designated) for the version
Unicode the application is built upon. of Unicode used by the application.
This document defines the valid, contextual rule required, This document defines the valid, contextual rule required,
disallowed, and unassigned rules for the IdentifierClass and disallowed, and unassigned rules for the IdentifierClass and
FreeformClass. As described under Section 4, profiles of these FreeformClass. As described under Section 4, profiles of these
string classes are responsible for defining the width mapping, string classes are responsible for defining the width mapping,
additional mapping, case mapping, normalization, directionality, and additional mapping, case mapping, normalization, directionality, and
exclusion rules. exclusion rules.
3.2. IdentifierClass 3.2. IdentifierClass
skipping to change at page 8, line 6 skipping to change at page 8, line 19
o Code points traditionally used as letters and numbers in writing o Code points traditionally used as letters and numbers in writing
systems, i.e., the LetterDigits ("A") category first defined in systems, i.e., the LetterDigits ("A") category first defined in
[RFC5892] and listed here under Section 7.1. [RFC5892] and listed here under Section 7.1.
o Code points in the range U+0021 through U+007E, i.e., the o Code points in the range U+0021 through U+007E, i.e., the
(printable) ASCII7 ("K") rule defined under Section 7.11. These (printable) ASCII7 ("K") rule defined under Section 7.11. These
code points are "grandfathered" into PRECIS and thus are valid code points are "grandfathered" into PRECIS and thus are valid
even if they would otherwise be disallowed according to the even if they would otherwise be disallowed according to the
property-based rules specified in the next section. property-based rules specified in the next section.
Although the PRECIS IdentifierClass re-uses the LetterDigits category Note: Although the PRECIS IdentifierClass re-uses the LetterDigits
from IDNA2008, the range of characters allowed in the IdentifierClass category from IDNA2008, the range of characters allowed in the
is wider than the range of characters allowed in IDNA2008. The main IdentifierClass is wider than the range of characters allowed in
reason is that IDNA2008 applies the Unstable category before the IDNA2008. The main reason is that IDNA2008 applies the Unstable
LetterDigits category, thus disallowing uppercase characters, whereas category before the LetterDigits category, thus disallowing uppercase
the IdentifierClass does not apply the Unstable category. characters, whereas the IdentifierClass does not apply the Unstable
category.
3.2.2. Contextual Rule Required 3.2.2. Contextual Rule Required
o A number of characters from the Exceptions ("F") category defined o A number of characters from the Exceptions ("F") category defined
under Section 7.6 (see Section 7.6 for a full list). under Section 7.6 (see Section 7.6 for a full list).
o Joining characters, i.e., the JoinControl ("H") category defined o Joining characters, i.e., the JoinControl ("H") category defined
under Section 7.8. under Section 7.8.
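As an informal illustration of the inclusion model, the rules quoted so far for the IdentifierClass can be sketched in Python. This is a deliberately partial sketch, not the normative calculation of Section 8: it covers only ASCII7 (U+0021 through U+007E), JoinControl (assumed here to be U+200C and U+200D), Unassigned, and LetterDigits (assumed to be the general categories listed in [RFC5892]). The Exceptions, HasCompat, and other categories are omitted, so results can differ from the real derived property. The names Rule and classify_identifier_code_point are hypothetical, and the outcome depends on the Unicode version bundled with the Python runtime.

```python
import unicodedata
from enum import Enum


class Rule(Enum):
    VALID = "valid"
    CONTEXTUAL = "contextual rule required"
    DISALLOWED = "disallowed"
    UNASSIGNED = "unassigned"


# Assumption: LetterDigits uses the general categories given in RFC 5892.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
# Assumption: JoinControl is ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER.
JOIN_CONTROLS = {0x200C, 0x200D}


def classify_identifier_code_point(ch: str) -> Rule:
    """Partial, illustrative classification of one code point."""
    cp = ord(ch)
    category = unicodedata.category(ch)
    # Simplification: Python reports noncharacters as Cn too, and this
    # sketch does not separate them from truly unassigned code points.
    if category == "Cn":
        return Rule.UNASSIGNED
    if 0x21 <= cp <= 0x7E:          # printable ASCII7 is grandfathered in
        return Rule.VALID
    if cp in JOIN_CONTROLS:         # allowed only if a contextual rule is met
        return Rule.CONTEXTUAL
    if category in LETTER_DIGIT_CATEGORIES:
        return Rule.VALID
    # Inclusion model: anything not explicitly allowed is forbidden.
    return Rule.DISALLOWED


def string_is_plainly_valid(text: str) -> bool:
    """True only if every code point is VALID without contextual checks."""
    return all(classify_identifier_code_point(ch) is Rule.VALID for ch in text)
```

Note that the final "return" is what makes this an inclusion model: a space (U+0020) or a symbol such as U+2603 is rejected simply because no rule admits it, not because a rule names it.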
3.2.3. Disallowed 3.2.3. Disallowed
skipping to change at page 10, line 37 skipping to change at page 11, line 6
[I-D.ietf-xmpp-6122bis]. [I-D.ietf-xmpp-6122bis].
4.1.1. Width Mapping 4.1.1. Width Mapping
The width mapping rule of a profile specifies whether width mapping The width mapping rule of a profile specifies whether width mapping
is performed on fullwidth and halfwidth characters, and how the is performed on fullwidth and halfwidth characters, and how the
mapping is done (e.g., mapping fullwidth and halfwidth characters to mapping is done (e.g., mapping fullwidth and halfwidth characters to
their decomposition equivalents). their decomposition equivalents).
The normalization form specified by a profile (see below) has an The normalization form specified by a profile (see below) has an
impact on the need for width mapping. Because one aspect of Unicode impact on the need for width mapping. Because width mapping is
normalization form KC (NFKC) is width mapping, a profile that uses performed as a part of compatibility decomposition, a profile
NFKC does not need to specify width mapping. However, if Unicode employing either normalization form KD (NFKD) or normalization form
normalization form C (NFC) is used then the profile needs to specify KC (NFKC) does not need to specify width mapping. However, if
whether to apply width mapping; in this case, width mapping is in Unicode normalization form C (NFC) is used then the profile needs to
general RECOMMENDED because allowing fullwidth and halfwidth specify whether to apply width mapping; in this case, width mapping
is in general RECOMMENDED because allowing fullwidth and halfwidth
characters to remain unmapped to their decomposition equivalents characters to remain unmapped to their decomposition equivalents
would violate the principle of least user surprise. For more would violate the principle of least user surprise. For more
information about the concept of width in East Asian scripts within information about the concept of width in East Asian scripts within
Unicode, see for instance [UAX11]. Unicode, see Unicode Standard Annex #11 [UAX11].
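The interaction between width mapping and the normalization form can be illustrated with a short sketch. It assumes Python's standard unicodedata module; the helper name width_map is hypothetical and is only one possible way for an NFC-based profile to apply width mapping, not a procedure defined by this document.

```python
import unicodedata

def width_map(s: str) -> str:
    """Map fullwidth and halfwidth characters to their decomposition mappings.

    Only characters whose decomposition is tagged <wide> or <narrow> are
    touched; other compatibility characters are left unchanged.
    """
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            # Drop the tag and rebuild the mapped character(s).
            out.append("".join(chr(int(h, 16)) for h in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)

fullwidth = "\uff21\uff22\uff23"  # FULLWIDTH LATIN CAPITAL LETTERS A, B, C

# NFC leaves fullwidth characters alone, so a separate width mapping is needed.
nfc_result = unicodedata.normalize("NFC", fullwidth)
# NFKC already folds them as part of compatibility decomposition.
nfkc_result = unicodedata.normalize("NFKC", fullwidth)
mapped = width_map(fullwidth)
```

With NFC the string is returned unchanged, whereas both NFKC and the explicit width_map helper yield "ABC".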
4.1.2. Additional Mappings 4.1.2. Additional Mappings
The additional mappings rule of a profile specifies whether The additional mappings rule of a profile specifies whether
additional mappings are to be applied, such as mapping of delimiter additional mappings are to be applied, such as mapping of delimiter
characters, mapping of special characters (e.g., non-ASCII space characters, mapping of special characters (e.g., non-ASCII space
characters to ASCII space or certain characters to nothing), and case characters to ASCII space or certain characters to nothing), and case
mapping based on language and local context (see mapping based on locale or on locale and context (see
[I-D.ietf-precis-mappings]). [I-D.ietf-precis-mappings]).
4.1.3. Case Mapping 4.1.3. Case Mapping
The case mapping rule of a profile specifies whether case mapping is The case mapping rule of a profile specifies whether case mapping is
performed (instead of case preservation) on uppercase and titlecase performed (instead of case preservation) on uppercase and titlecase
characters, and how the mapping is done (e.g., mapping uppercase and characters, and how the mapping is done (e.g., mapping uppercase and
titlecase characters to their lowercase equivalents). titlecase characters to their lowercase equivalents).
Use of the Unicode Default Case Folding algorithm is RECOMMENDED. If case preservation is not desired, it is RECOMMENDED to use Unicode
Default Case Folding as defined in Chapter 3 of the Unicode Standard
[UNICODE].
In order to maximize entropy and minimize the potential for false In order to maximize entropy and minimize the potential for false
positives, it is NOT RECOMMENDED for application protocols to map positives, it is NOT RECOMMENDED for application protocols to map
uppercase and titlecase code points to their lowercase equivalents uppercase and titlecase code points to their lowercase equivalents
when strings conforming to the FreeformClass, or a profile thereof, when strings conforming to the FreeformClass, or a profile thereof,
are used in passwords; instead, it is RECOMMENDED to preserve the are used in passwords; instead, it is RECOMMENDED to preserve the
case of all code points contained in such strings and then perform case of all code points contained in such strings and then perform
case-sensitive comparison. See also the related discussion in case-sensitive comparison. See also the related discussion in
[I-D.ietf-precis-saslprepbis]. [I-D.ietf-precis-saslprepbis].
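As a rough illustration of the two recommendations above, the following sketch uses Python's built-in str.casefold(), which applies Unicode full case folding. The function names are hypothetical, and a real profile would also apply the other rules (such as normalization) before comparing.

```python
def identifiers_match(a: str, b: str) -> bool:
    """Compare two identifiers after case folding (case preservation not desired)."""
    return a.casefold() == b.casefold()

def passwords_match(a: str, b: str) -> bool:
    """Compare two passwords while preserving case (case-sensitive comparison)."""
    return a == b
```

Folding discards information: "Stra\u00dfe" and "STRASSE" fold to the same string, which is acceptable for an identifier but would reduce entropy and risk false positives if applied to a password.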
4.1.4. Normalization 4.1.4. Normalization
The normalization rule of a profile specifies which Unicode The normalization rule of a profile specifies which Unicode
normalization form (D, KD, C, or KC) is to be applied (see [UAX15] normalization form (D, KD, C, or KC) is to be applied (see Unicode
for background information). Standard Annex #15 [UAX15] for background information).
In accordance with [RFC5198], normalization form C (NFC) is In accordance with [RFC5198], normalization form C (NFC) is
RECOMMENDED. RECOMMENDED.
4.1.5. Directionality 4.1.5. Directionality
The directionality rule of a profile specifies which strings are to The directionality rule of a profile specifies which strings are to
be considered left-to-right (LTR) and right-to-left (RTL), and the be considered left-to-right (LTR) and right-to-left (RTL), and the
allowable sequences of characters in LTR and RTL strings (see allowable sequences of characters in LTR and RTL strings (see Unicode
[UAX9]); note that mixed-direction strings are not supported, since Standard Annex #9 [UAX9]); note that mixed-direction strings are not
there is currently no widely accepted and implemented solution for supported, since there is currently no widely accepted and
the processing and display of mixed-direction strings. Possible implemented solution for the processing and display of mixed-
rules include, but are not limited to, (a) considering any string direction strings. Possible rules include, but are not limited to,
that contains a right-to-left code point to be a right-to-left (a) considering any string that contains a right-to-left code point
string, or (b) applying the "Bidi Rule" from [RFC5893]. to be a right-to-left string, or (b) applying the "Bidi Rule" from
[RFC5893].
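Possible rule (a) above could be sketched as follows. This is an assumption-laden illustration: it treats a code point as right-to-left when its Bidi_Class, as reported by Python's unicodedata.bidirectional(), is R or AL, and the function name is hypothetical. The "Bidi Rule" of [RFC5893] is considerably more detailed and is not implemented here.

```python
import unicodedata

# Assumption: "right-to-left code point" means Bidi_Class R or AL.
RTL_BIDI_CLASSES = frozenset({"R", "AL"})

def is_rtl_string(s: str) -> bool:
    """Return True if any code point in the string is right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in s)
```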
4.1.6. Exclusions 4.1.6. Exclusions
The exclusions rule of a profile specifies whether the profile The exclusions rule of a profile specifies whether the profile
excludes additional code points or character categories above and excludes additional code points or character categories above and
beyond those excluded by the string class being profiled. That is, a beyond those excluded by the string class being profiled. That is, a
profile MAY do either of the following: profile MAY do either of the following:
1. Exclude specific code points that are allowed by the relevant 1. Exclude specific code points that are allowed by the relevant
string class. string class.
skipping to change at page 12, line 47 skipping to change at page 13, line 21
Similar techniques could be used to define many application-layer Similar techniques could be used to define many application-layer
constructs, say of the form "user@domain" or "/path/to/file". constructs, say of the form "user@domain" or "/path/to/file".
4.3. A Note about Spaces 4.3. A Note about Spaces
With regard to the IdentifierClass, the consensus of the PRECIS With regard to the IdentifierClass, the consensus of the PRECIS
Working Group was that spaces are problematic for many reasons, Working Group was that spaces are problematic for many reasons,
including: including:
o Many Unicode characters are confusable with ASCII space. o Many Unicode characters are confusable with ASCII space.
o Even if non-ASCII space characters are mapped to ASCII space o Even if non-ASCII space characters are mapped to ASCII space
(U+0020), space characters are often not rendered in user (U+0020), space characters are often not rendered in user
interfaces, leading to the possibility that human user might interfaces, leading to the possibility that a human user might
consider a string containing spaces to be equivalent to the same consider a string containing spaces to be equivalent to the same
string without spaces. string without spaces.
o In some locales, some devices are known to generate a character o In some locales, some devices are known to generate a character
other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
user performs an action like hitting the space bar on a keyboard. user performs an action like hitting the space bar on a keyboard.
One consequence of disallowing space characters in the One consequence of disallowing space characters in the
IdentifierClass might be to effectively discourage the use of ASCII IdentifierClass might be to effectively discourage the use of ASCII
space (or, even more problematically, non-ASCII space characters) space (or, even more problematically, non-ASCII space characters)
within identifiers created in newer application protocols; given the within identifiers created in newer application protocols; given the
challenges involved in properly handling space characters in challenges involved in properly handling space characters in
identifiers and other protocol strings, the Working Group considered identifiers and other protocol strings, the Working Group considered
this to be a feature, not a bug. this to be a feature, not a bug.
However, the FreeformClass does allow spaces, which enables However, the FreeformClass does allow spaces, which enables
application protocols to define profiles of the FreeformClass that application protocols to define profiles of the FreeformClass that
are more flexible than any profiles of the IdentifierClass. are more flexible than any profiles of the IdentifierClass. In
addition, as explained in the previous section, application protocols
can also define application-layer constructs containing spaces.
5. Order of Operations 5. Order of Operations
To ensure proper comparison, the following order of operations is To ensure proper comparison, the following order of operations is
REQUIRED: REQUIRED:
1. Width mapping 1. Width mapping
2. Additional mappings as specified in [I-D.ietf-precis-mappings]: 2. Optionally, additional mappings such as those specified in
[I-D.ietf-precis-mappings]:
1. Delimiter mapping 1. Delimiter mapping
2. Special mapping 2. Special mapping
3. Local case mapping 3. Local case mapping
3. Non-local case mapping 3. Non-local case mapping
4. Normalization 4. Normalization
5. Behavioral rules for determining whether a code point is valid, 5. Behavioral rules for determining whether a code point is valid,
allowed under a contextual rule, disallowed, or unassigned allowed under a contextual rule, disallowed, or unassigned
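The required ordering can be sketched as a small pipeline. Everything about the interface is an assumption made for illustration: the function name prepare, its parameters, and the choice to pass each profile-specific step as a callable are hypothetical. Only the sequence of steps is taken from the list above.

```python
import unicodedata

def prepare(s, width_mapping=None, additional_mappings=(),
            case_mapping=str.casefold, normalization="NFC", is_allowed=None):
    """Apply profile rules in the order: width, additional, case, normalization, validity."""
    if width_mapping is not None:            # 1. Width mapping
        s = width_mapping(s)
    for mapping in additional_mappings:      # 2. Additional mappings (optional)
        s = mapping(s)
    if case_mapping is not None:             # 3. Non-local case mapping
        s = case_mapping(s)
    if normalization is not None:            # 4. Normalization
        s = unicodedata.normalize(normalization, s)
    if is_allowed is not None:               # 5. Behavioral rules per code point
        for ch in s:
            if not is_allowed(ch):
                raise ValueError(f"code point U+{ord(ch):04X} is not allowed")
    return s
```

For example, prepare("Stra\u00dfe") returns "strasse" with the defaults shown, and passing is_allowed=lambda ch: ch != " " causes any string containing ASCII space to be rejected.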
As already described, the width mapping, additional mapping, non- As already described, the width mapping, additional mapping, non-
local case mapping, and normalization operations are specified for local case mapping, and normalization operations are specified for
skipping to change at page 14, line 17 skipping to change at page 14, line 42
relevant application protocol. relevant application protocol.
This document is not intended to specify precisely how derived This document is not intended to specify precisely how derived
property values are to be applied in protocol strings. That property values are to be applied in protocol strings. That
information is the responsibility of the protocol specification that information is the responsibility of the protocol specification that
uses or profiles a PRECIS string class from this document. uses or profiles a PRECIS string class from this document.
The value of the property is to be interpreted as follows. The value of the property is to be interpreted as follows.
PROTOCOL VALID Those code points that are allowed to be used in any PROTOCOL VALID Those code points that are allowed to be used in any
PRECIS string class (IdentifierClass and FreeformClass). Code PRECIS string class (currently, IdentifierClass and
points with this property value are permitted for general use in FreeformClass). Code points with this property value are
any string class. The abbreviated term PVALID is used to refer to permitted for general use in any string class. The abbreviated
this value in the remainder of this document. term "PVALID" is used to refer to this value in the remainder of
this document.
SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to
be used in specific string classes. Code points with this be used in specific string classes. Code points with this
property value are permitted for use in specific string classes. property value are permitted for use in specific string classes.
In the remainder of this document, the abbreviated term *_PVAL is In the remainder of this document, the abbreviated term *_PVAL is
used, where * = (ID | FREE), i.e., either FREE_PVAL or ID_PVAL. used, where * = (ID | FREE), i.e., either "FREE_PVAL" or
"ID_PVAL".
CONTEXTUAL RULE REQUIRED Some characteristics of the character, such CONTEXTUAL RULE REQUIRED Some characteristics of the character, such
as its being invisible in certain contexts or problematic in as its being invisible in certain contexts or problematic in
others, require that it not be used in labels unless specific others, require that it not be used in labels unless specific
other characters or properties are present. The abbreviated term other characters or properties are present. As in IDNA2008, there
CONTEXT is used to refer to this value in the remainder of this are two subdivisions of CONTEXTUAL RULE REQUIRED, the first for
document. As in IDNA2008, there are two subdivisions of Join_controls (called "CONTEXTJ") and the second for other
CONTEXTUAL RULE REQUIRED, the first for Join_controls (called characters (called "CONTEXTO"). A character with the derived
CONTEXTJ) and the second for other characters (called CONTEXTO). property value CONTEXTJ or CONTEXTO MUST NOT be used unless an
A character with the derived property value CONTEXTJ or CONTEXTO appropriate rule has been established and the context of the
(CONTEXTUAL RULE REQUIRED) MUST NOT be used unless an appropriate character is consistent with that rule. The most notable of the
rule has been established and the context of the character is CONTEXTUAL RULE REQUIRED characters are the Join Control
consistent with that rule. characters U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-
DISALLOWED Those code points that must not permitted in any PRECIS JOINER, which have a derived property value of CONTEXTJ. See
Appendix A of [RFC5892] for more information.
DISALLOWED Those code points that are not permitted in any PRECIS
string class. string class.
SPECIFIC CLASS DISALLOWED Those code points that are not to be SPECIFIC CLASS DISALLOWED Those code points that are not to be
included in a specific string class. Code points with this included in a specific string class. Code points with this
property value are not permitted in one of the string classes but property value are not permitted in one of the string classes but
might be permitted in others. In the remainder of this document, might be permitted in others. In the remainder of this document,
the abbreviated term *_DIS is used, where * = (ID | FREE), i.e., the abbreviated term *_DIS is used, where * = (ID | FREE), i.e.,
either FREE_DIS or ID_DIS. either "FREE_DIS" or "ID_DIS".
UNASSIGNED Those code points that are not designated (i.e., are UNASSIGNED Those code points that are not designated (i.e., are
unassigned) in the Unicode Standard. unassigned) in the Unicode Standard.
The mechanisms described here allow determination of the value of the The mechanisms described here allow determination of the value of the
property for future versions of Unicode (including characters added property for future versions of Unicode (including characters added
after Unicode 5.2 or 6.2 depending on the category, since some after Unicode 5.2 or 6.3 depending on the category, since some
categories in this document are reused from IDNA2008 and therefore categories in this document are reused from IDNA2008 and therefore
were defined at the time of Unicode 5.2). Changes in Unicode were defined at the time of Unicode 5.2). Changes in Unicode
properties that do not affect the outcome of this process do not properties that do not affect the outcome of this process therefore
affect this framework. For example, a character can have its Unicode do not affect this framework. For example, a character can have its
General_Category value [UNICODE] change from So to Sm, or from Lo to Unicode General_Category value [UNICODE] change from So to Sm, or
Ll, without affecting the algorithm results. Moreover, even if such from Lo to Ll, without affecting the algorithm results. Moreover,
changes were to result, the BackwardCompatible list (Section 7.7) can even if such changes were to result, the BackwardCompatible list
be adjusted to ensure the stability of the results. (Section 7.7) can be adjusted to ensure the stability of the results.
Some code points need to be allowed in exceptional circumstances, but
ought to be excluded in all other cases; these rules are also
described in other documents. The most notable of these are the Join
Control characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
NON-JOINER. Both of them have the derived property value CONTEXTJ.
A character with the derived property value CONTEXTJ or CONTEXTO
(CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate
rule has been established and the context of the character is
consistent with that rule. It is invalid to generate a string
containing these characters unless such a contextual rule is found
and satisfied. PRECIS does not define its own contextual rules, but
instead re-uses the contextual rules defined for IDNA2008; please see
Appendix A of [RFC5892] for more information.
7. Category Definitions Used to Calculate Derived Property 7. Category Definitions Used to Calculate Derived Property
The derived property obtains its value based on a two-step procedure: The derived property obtains its value based on a two-step procedure:
1. Characters are placed in one or more character categories either 1. Characters are placed in one or more character categories either
(1) based on core properties defined by the Unicode Standard or (1) based on core properties defined by the Unicode Standard or
(2) by treating the code point as an exception and addressing the (2) by treating the code point as an exception and addressing the
code point based on its code point value. These categories are code point based on its code point value. These categories are
not mutually exclusive. not mutually exclusive.
2. Set operations are used with these categories to determine the 2. Set operations are used with these categories to determine the
values for a property that is specific to a given string class. values for a property specific to a given string class. These
These operations are specified under Section 8. operations are specified under Section 8.
(Note: Unicode property names and property value names might have (Note: Unicode property names and property value names might have
short abbreviations, such as "gc" for the General_Category property short abbreviations, such as "gc" for the General_Category property
and "Ll" for the Lowercase_Letter property value of the gc property.) and "Ll" for the Lowercase_Letter property value of the gc property.)
In the following specification of character categories, the operation In the following specification of character categories, the operation
that returns the value of a particular Unicode character property for that returns the value of a particular Unicode character property for
a code point is designated by using the formal name of that property a code point is designated by using the formal name of that property
(from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for (from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for
"code point". For example, the value of the General_Category "code point". For example, the value of the General_Category
skipping to change at page 16, line 20 skipping to change at page 16, line 46
7.1. LetterDigits (A) 7.1. LetterDigits (A)
Note: This category is defined in [RFC5892] and copied here for use Note: This category is defined in [RFC5892] and copied here for use
in PRECIS. in PRECIS.
A: General_Category(cp) is in {Ll, Lu, Lm, Lo, Mn, Mc, Nd} A: General_Category(cp) is in {Ll, Lu, Lm, Lo, Mn, Mc, Nd}
These rules identify characters commonly used in mnemonics and often These rules identify characters commonly used in mnemonics and often
informally described as "language characters". informally described as "language characters".
For more information, see section 4.5 of [UNICODE]. For more information, see Chapter 4 of the Unicode Standard
[UNICODE].
The categories used in this rule are: The categories used in this rule are:
o Ll - Lowercase_Letter o Ll - Lowercase_Letter
o Lu - Uppercase_Letter o Lu - Uppercase_Letter
o Lm - Modifier_Letter o Lm - Modifier_Letter
o Lo - Other_Letter o Lo - Other_Letter
o Mn - Nonspacing_Mark o Mn - Nonspacing_Mark
o Mc - Spacing_Mark o Mc - Spacing_Mark
o Nd - Decimal_Number o Nd - Decimal_Number
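A membership test for this category can be sketched with Python's unicodedata.category(), which returns the two-letter General_Category abbreviation. The function name is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

LETTER_DIGITS = frozenset({"Ll", "Lu", "Lm", "Lo", "Mn", "Mc", "Nd"})

def is_letter_digit(cp: int) -> bool:
    """Return True if the code point falls in the LetterDigits (A) category."""
    return unicodedata.category(chr(cp)) in LETTER_DIGITS
```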
skipping to change at page 17, line 19 skipping to change at page 17, line 45
F: cp is in {00B7, 00DF, 0375, 03C2, 05F3, 05F4, 0640, 0660, F: cp is in {00B7, 00DF, 0375, 03C2, 05F3, 05F4, 0640, 0660,
0661, 0662, 0663, 0664, 0665, 0666, 0667, 0668, 0661, 0662, 0663, 0664, 0665, 0666, 0667, 0668,
0669, 06F0, 06F1, 06F2, 06F3, 06F4, 06F5, 06F6, 0669, 06F0, 06F1, 06F2, 06F3, 06F4, 06F5, 06F6,
06F7, 06F8, 06F9, 06FD, 06FE, 07FA, 0F0B, 3007, 06F7, 06F8, 06F9, 06FD, 06FE, 07FA, 0F0B, 3007,
302E, 302F, 3031, 3032, 3033, 3034, 3035, 303B, 302E, 302F, 3031, 3032, 3033, 3034, 3035, 303B,
30FB} 30FB}
This category explicitly lists code points for which the category This category explicitly lists code points for which the category
cannot be assigned using only the core property values that exist in cannot be assigned using only the core property values that exist in
the Unicode standard. The values are according to the table below: the Unicode Standard. The values are according to the table below:
PVALID -- Would otherwise have been DISALLOWED PVALID -- Would otherwise have been DISALLOWED
00DF; PVALID # LATIN SMALL LETTER SHARP S 00DF; PVALID # LATIN SMALL LETTER SHARP S
03C2; PVALID # GREEK SMALL LETTER FINAL SIGMA 03C2; PVALID # GREEK SMALL LETTER FINAL SIGMA
06FD; PVALID # ARABIC SIGN SINDHI AMPERSAND 06FD; PVALID # ARABIC SIGN SINDHI AMPERSAND
06FE; PVALID # ARABIC SIGN SINDHI POSTPOSITION MEN 06FE; PVALID # ARABIC SIGN SINDHI POSTPOSITION MEN
0F0B; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG 0F0B; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG
3007; PVALID # IDEOGRAPHIC NUMBER ZERO 3007; PVALID # IDEOGRAPHIC NUMBER ZERO
skipping to change at page 19, line 21 skipping to change at page 19, line 49
in PRECIS. in PRECIS.
I: Hangul_Syllable_Type(cp) is in {L, V, T} I: Hangul_Syllable_Type(cp) is in {L, V, T}
This category consists of all conjoining Hangul Jamo (Leading Jamo, This category consists of all conjoining Hangul Jamo (Leading Jamo,
Vowel Jamo, and Trailing Jamo). Vowel Jamo, and Trailing Jamo).
Elimination of conjoining Hangul Jamos from the set of PVALID Elimination of conjoining Hangul Jamos from the set of PVALID
characters results in restricting the set of Korean PVALID characters characters results in restricting the set of Korean PVALID characters
just to preformed, modern Hangul syllable characters. Old Hangul just to preformed, modern Hangul syllable characters. Old Hangul
syllables, which must be spelled with sequences of conjoining Hangul syllables, which are spelled with sequences of conjoining Hangul
Jamos, are not PVALID for string classes. Jamos, are not PVALID for string classes.
7.10. Unassigned (J) 7.10. Unassigned (J)
Note: This category is defined in [RFC5892] and copied here for use Note: This category is defined in [RFC5892] and copied here for use
in PRECIS. in PRECIS.
J: General_Category(cp) is in {Cn} and J: General_Category(cp) is in {Cn} and
Noncharacter_Code_Point(cp) = False Noncharacter_Code_Point(cp) = False
This category consists of code points in the Unicode character set This category consists of code points in the Unicode character set
that are not (yet) designated. It should be noted that Unicode that are not (yet) designated. Implementers might want to keep in
distinguishes between 'unassigned code points' and 'unassigned mind that the Unicode Standard distinguishes between 'unassigned code
characters'. The unassigned code points are all but (Cn - points' and 'unassigned characters'. The unassigned code points are
Noncharacters), while the unassigned characters are all but (Cn + all but (Cn - Noncharacters), whereas the unassigned characters are
Cs). all but (Cn + Cs).
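The definition of J can be sketched as below. Python's unicodedata module does not expose Noncharacter_Code_Point directly, so the helper is_noncharacter hard-codes the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each plane); both function names are hypothetical.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """Return True for the permanently reserved noncharacter code points."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_unassigned(cp: int) -> bool:
    """Return True if General_Category is Cn and the code point is not a noncharacter."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)
```

Noncharacters also carry General_Category Cn, which is why the second condition is needed to keep them out of this category.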
7.11. ASCII7 (K) 7.11. ASCII7 (K)
This PRECIS-specific category exempts most characters in the This PRECIS-specific category consists of all printable, non-space
(printable) ASCII-7 range from other rules that might be applied characters from the 7-bit ASCII range. By applying this category,
during PRECIS processing, on the assumption that these code points the algorithm specified under Section 8 exempts these characters from
are in such wide use that disallowing them would be counter- other rules that might be applied during PRECIS processing, on the
productive. assumption that these code points are in such wide use that
disallowing them would be counter-productive.
K: cp is in {0021..007E} K: cp is in {0021..007E}
7.12. Controls (L) 7.12. Controls (L)
L: Control(cp) = True L: Control(cp) = True
7.13. PrecisIgnorableProperties (M) 7.13. PrecisIgnorableProperties (M)
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
not recommended for use in PRECIS string classes. discouraged from use in PRECIS string classes.
M: Default_Ignorable_Code_Point(cp) = True or M: Default_Ignorable_Code_Point(cp) = True or
Noncharacter_Code_Point(cp) = True Noncharacter_Code_Point(cp) = True
The definition for Default_Ignorable_Code_Point can be found in the The definition for Default_Ignorable_Code_Point can be found in the
DerivedCoreProperties.txt [2] file, and at the time of Unicode 6.2 is DerivedCoreProperties.txt [2] file, and at the time of Unicode 6.3 is
as follows: as follows:
Other_Default_Ignorable_Code_Point Other_Default_Ignorable_Code_Point
+ Cf (Format characters) + Cf (Format characters)
+ Variation_Selector + Variation_Selector
- White_Space - White_Space
- FFF9..FFFB (Annotation Characters) - FFF9..FFFB (Annotation Characters)
- 0600..0604, 06DD, 070F, 110BD (exceptional Cf characters - 0600..0604, 06DD, 070F, 110BD (exceptional Cf characters
that should be visible) that should be visible)
skipping to change at page 21, line 9 skipping to change at page 21, line 31
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
punctuation characters. punctuation characters.
P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po} P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}
7.17. HasCompat (Q) 7.17. HasCompat (Q)
This PRECIS-specific category is used to group code points that have This PRECIS-specific category is used to group code points that have
compatibility equivalents as explained in Chapter 2 and Chapter 3 of compatibility equivalents as explained in Chapter 2 and Chapter 3 of
[UNICODE]. the Unicode Standard [UNICODE].
Q: toNFKC(cp) != cp Q: toNFKC(cp) != cp
The toNFKC() operation returns the code point in normalization form The toNFKC() operation returns the code point in normalization form
KC. For more information, see Section 5 of [UAX15]. KC. For more information, see Section 5 of Unicode Standard Annex
#15 [UAX15].
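The test Q can be expressed directly with Python's unicodedata.normalize(); the function name has_compat is hypothetical.

```python
import unicodedata

def has_compat(cp: int) -> bool:
    """Return True if normalizing the code point to NFKC changes it."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) != ch
```

For instance, U+FB01 LATIN SMALL LIGATURE FI and U+FF21 FULLWIDTH LATIN CAPITAL LETTER A satisfy the test, whereas U+0061 LATIN SMALL LETTER A does not.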
7.18. OtherLetterDigits (R) 7.18. OtherLetterDigits (R)
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
letters and digits other than the "traditional" letters and digits letters and digits other than the "traditional" letters and digits
grouped under the LetterDigits (A) class (see Section 7.1). grouped under the LetterDigits (A) class (see Section 7.1).
R: General_Category(cp) is in {Lt, Nl, No, Me} R: General_Category(cp) is in {Lt, Nl, No, Me}
8. Calculation of the Derived Property 8. Calculation of the Derived Property
skipping to change at page 21, line 51 skipping to change at page 22, line 28
space character such as U+0020 would be assigned to ID_DIS, whereas space character such as U+0020 would be assigned to ID_DIS, whereas
if an identifier is defined as profiling the PRECIS FreeformClass if an identifier is defined as profiling the PRECIS FreeformClass
then the character would be assigned to FREE_PVAL. For the sake of then the character would be assigned to FREE_PVAL. For the sake of
brevity, the designation "FREE_PVAL" is used in the code point brevity, the designation "FREE_PVAL" is used in the code point
tables, instead of the longer designation "ID_DIS or FREE_PVAL". In tables, instead of the longer designation "ID_DIS or FREE_PVAL". In
practice, the derived properties ID_PVAL and FREE_DIS are not used in practice, the derived properties ID_PVAL and FREE_DIS are not used in
this specification, since every ID_PVAL code point is PVALID and this specification, since every ID_PVAL code point is PVALID and
every FREE_DIS code point is DISALLOWED. every FREE_DIS code point is DISALLOWED.
The algorithm to calculate the value of the derived property is as The algorithm to calculate the value of the derived property is as
follows. (Note: Use of the name of a rule (such as "Exception") follows:
implies the set of code points that the rule defines, whereas the
same name as a function call (such as "Exception(cp)") implies the
value that the code point has in the Exceptions table.)
If .cp. .in. Exceptions Then Exceptions(cp); If .cp. .in. Exceptions Then Exceptions(cp);
Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp); Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
Else If .cp. .in. Unassigned Then UNASSIGNED; Else If .cp. .in. Unassigned Then UNASSIGNED;
Else If .cp. .in. ASCII7 Then PVALID; Else If .cp. .in. ASCII7 Then PVALID;
Else If .cp. .in. JoinControl Then CONTEXTJ; Else If .cp. .in. JoinControl Then CONTEXTJ;
Else If .cp. .in. OldHangulJamo Then DISALLOWED; Else If .cp. .in. OldHangulJamo Then DISALLOWED;
Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED; Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
Else If .cp. .in. Controls Then DISALLOWED; Else If .cp. .in. Controls Then DISALLOWED;
Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL; Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
Else If .cp. .in. LetterDigits Then PVALID; Else If .cp. .in. LetterDigits Then PVALID;
Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL; Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL; Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL; Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL; Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
Else DISALLOWED; Else DISALLOWED;
Note: Use of the name of a rule (such as "Exceptions") implies the
set of code points that the rule defines, whereas the same name as a
function call (such as "Exceptions(cp)") implies the value that the
code point has in the Exceptions table.
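The algorithm can be approximated in a few lines of Python. This is a non-normative sketch under several assumptions: the Exceptions and BackwardCompatible tables are supplied by the caller as dictionaries; Spaces is taken to be General_Category Zs and Symbols to be {Sm, Sc, Sk, So}, since their definitions are not shown in this excerpt; the Hangul Jamo ranges are hard-coded; and Default_Ignorable_Code_Point, which the unicodedata module does not expose, is approximated by General_Category Cf, so default-ignorable marks such as variation selectors are not caught. All names are hypothetical. The string "FREE_PVAL" stands for "ID_DIS or FREE_PVAL", as in the code point tables.

```python
import unicodedata

LETTER_DIGITS = {"Ll", "Lu", "Lm", "Lo", "Mn", "Mc", "Nd"}
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}
SPACES = {"Zs"}                        # assumption, see lead-in
SYMBOLS = {"Sm", "Sc", "Sk", "So"}     # assumption, see lead-in
# Assumption: code points with Hangul_Syllable_Type L, V, or T.
JAMO_RANGES = ((0x1100, 0x11FF), (0xA960, 0xA97C),
               (0xD7B0, 0xD7C6), (0xD7CB, 0xD7FB))

def is_noncharacter(cp):
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def derived_property(cp, exceptions=None, backward_compatible=None):
    """Return the derived property value for one code point (approximate)."""
    ch = chr(cp)
    gc = unicodedata.category(ch)
    if exceptions and cp in exceptions:
        return exceptions[cp]
    if backward_compatible and cp in backward_compatible:
        return backward_compatible[cp]
    if gc == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if 0x21 <= cp <= 0x7E:                               # ASCII7
        return "PVALID"
    if cp in (0x200C, 0x200D):                           # JoinControl
        return "CONTEXTJ"
    if any(lo <= cp <= hi for lo, hi in JAMO_RANGES):    # OldHangulJamo
        return "DISALLOWED"
    if is_noncharacter(cp) or gc == "Cf":                # PrecisIgnorableProperties (approx.)
        return "DISALLOWED"
    if gc == "Cc":                                       # Controls
        return "DISALLOWED"
    if unicodedata.normalize("NFKC", ch) != ch:          # HasCompat
        return "FREE_PVAL"
    if gc in LETTER_DIGITS:
        return "PVALID"
    if gc in OTHER_LETTER_DIGITS | SPACES | SYMBOLS | PUNCTUATION:
        return "FREE_PVAL"
    return "DISALLOWED"
```

Because the checks are evaluated in order, U+0020 SPACE falls through to the Spaces test and receives FREE_PVAL, while U+200D is classified as CONTEXTJ before the approximate ignorable test would disallow it.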
9. IANA Considerations 9. IANA Considerations
9.1. PRECIS Derived Property Value Registry 9.1. PRECIS Derived Property Value Registry
IANA is requested to create a PRECIS-specific registry with the IANA is requested to create a PRECIS-specific registry with the
Derived Properties for the versions of Unicode that are released Derived Properties for the versions of Unicode that are released
after (and including) version 6.2. The derived property value is to after (and including) version 6.3. The derived property value is to
be calculated in cooperation with a designated expert [RFC5226] be calculated in cooperation with a designated expert [RFC5226]
according to the rules specified under Section 7 and Section 8, not according to the rules specified under Section 7 and Section 8, not
by copying the non-normative table found under Appendix A. by copying the non-normative table found under Appendix A.
The IESG is to be notified if backward-incompatible changes to the The IESG is to be notified if backward-incompatible changes to the
table of derived properties are discovered or if other problems arise table of derived properties are discovered or if other problems arise
during the process of creating the table of derived property values during the process of creating the table of derived property values
or during expert review. Changes to the rules defined under or during expert review. Changes to the rules defined under
Section 7 and Section 8 require IETF Review. Section 7 and Section 8 require IETF Review.
skipping to change at page 23, line 30 skipping to change at page 24, line 10
Description: A sequence of letters, numbers, and symbols that is Description: A sequence of letters, numbers, and symbols that is
used to identify or address a network entity. used to identify or address a network entity.
Specification: RFC XXXX. [Note to RFC Editor: please change XXXX to Specification: RFC XXXX. [Note to RFC Editor: please change XXXX to
the number issued for this specification.] the number issued for this specification.]
9.3. PRECIS Profiles Registry 9.3. PRECIS Profiles Registry
IANA is requested to create a registry of profiles that use the IANA is requested to create a registry of profiles that use the
PRECIS string classes. In accordance with [RFC5226], the PRECIS string classes. In accordance with [RFC5226], the
registration policy is "Expert Review". This policy was chosen in registration policy is "Expert Review". This policy was chosen in
order to ensure that "customers" of PRECIS receive appropriate order to ease the burden of registration while ensuring that
guidance regarding the sometimes complex and subtle "customers" of PRECIS receive appropriate guidance regarding the
internationalization issues related to profiles of PRECIS string sometimes complex and subtle internationalization issues related to
classes. profiles of PRECIS string classes.
The registration template is as follows: The registration template is as follows:
Name: [the name of the profile] Name: [the name of the profile]
Applicability: [the specific protocol elements to which this profile Applicability: [the specific protocol elements to which this profile
applies, e.g., "Localparts in XMPP addresses."] applies, e.g., "Localparts in XMPP addresses."]
Base Class: [which PRECIS string class is being profiled] Base Class: [which PRECIS string class is being profiled]
Replaces: [the Stringprep profile that this PRECIS profile replaces, Replaces: [the Stringprep profile that this PRECIS profile replaces,
if any] if any]
Width Mapping: [the behavioral rule for handling of width, e.g., Width Mapping: [the behavioral rule for handling of width, e.g.,
skipping to change at page 24, line 23 skipping to change at page 24, line 47
in the ASCII range" or "Any character that has a compatibility in the ASCII range" or "Any character that has a compatibility
equivalent, i.e., the HasCompat category"] equivalent, i.e., the HasCompat category"]
Enforcement: [which entities enforce the rules, and when that Enforcement: [which entities enforce the rules, and when that
enforcement occurs during protocol operations] enforcement occurs during protocol operations]
Specification: [a pointer to relevant documentation, such as an RFC Specification: [a pointer to relevant documentation, such as an RFC
or Internet-Draft] or Internet-Draft]
In order to request a review, the registrant shall send a completed In order to request a review, the registrant shall send a completed
template to the precis@ietf.org list or its designated successor. template to the precis@ietf.org list or its designated successor.
Factors to focus on while reviewing profile registrations include the Factors to focus on while defining profiles and reviewing profile
following: registrations include the following:
o Is the problem well-defined? o Is the problem being addressed by this profile well-defined?
o Does the specification define what kinds of applications are o Does the specification define what kinds of applications are
involved and the protocol elements to which this profile applies? involved and the protocol elements to which this profile applies?
o Would an existing PRECIS string class or profile solve the o Would an existing PRECIS string class or profile solve the
problem? problem?
o Are the defined exclusions a reasonable solution to the problem
for the relevant applications?
o Is the profile clearly defined? o Is the profile clearly defined?
o Does the profile reduce the degree to which human users could be o Does the profile reduce the degree to which human users could be
surprised by application behavior (the "principle of least user surprised by application behavior (the "principle of least user
surprise")? surprise")?
o Is the profile based on an appropriate dividing line between user o Is the profile based on an appropriate dividing line between user
interface (culture, context, intent, locale, device limitations, interface (culture, context, intent, locale, device limitations,
etc.) and the use of conformant strings in protocol elements? etc.) and the use of conformant strings in protocol elements?
o Are the normalization, case mapping, width mapping, additional o Are the width mapping, case mapping, additional mapping,
mapping, and directionality rules appropriate for the intended normalization, exclusion, and directionality rules appropriate for
use? the intended use?
o Does the profile explain which entities enforce the rules of the o Does the profile explain which entities enforce the rules, and
profile, and when such enforcement occurs during protocol when such enforcement occurs during protocol operations?
operations?
o Does the profile reduce the degree to which human users could be o Does the profile reduce the degree to which human users could be
surprised or confused by application behavior (the "principle of surprised or confused by application behavior (the "principle of
least user surprise")? least user surprise")?
o Does the profile introduce any new security concerns (e.g., false o Does the profile introduce any new security concerns such as those
positives for authentication or authorization)? described under Section 10 of this document (e.g., false positives
for authentication or authorization)?
10. Security Considerations 10. Security Considerations
10.1. General Issues 10.1. General Issues
The security of applications that use this framework can depend in The security of applications that use this framework can depend in
part on the proper preparation and comparison of internationalized part on the proper preparation and comparison of internationalized
strings. For example, such strings can be used to make strings. For example, such strings can be used to make
authentication and authorization decisions, and the security of an authentication and authorization decisions, and the security of an
application could be compromised if an entity providing a given application could be compromised if an entity providing a given
skipping to change at page 25, line 33 skipping to change at page 26, line 8
10.2. Use of the IdentifierClass 10.2. Use of the IdentifierClass
Strings that conform to the IdentifierClass and any profile thereof Strings that conform to the IdentifierClass and any profile thereof
are intended to be relatively safe for use in a broad range of are intended to be relatively safe for use in a broad range of
applications, primarily because they include only letters, digits, applications, primarily because they include only letters, digits,
and "grandfathered" non-space characters from the ASCII range; thus and "grandfathered" non-space characters from the ASCII range; thus
they exclude spaces, characters with compatibility equivalents, and they exclude spaces, characters with compatibility equivalents, and
almost all symbols and punctuation marks. However, because such almost all symbols and punctuation marks. However, because such
strings can still include so-called confusable characters (see strings can still include so-called confusable characters (see
Section 10.5, protocol designers and implementers are encouraged to Section 10.5), protocol designers and implementers are encouraged to
pay close attention to the security considerations described pay close attention to the security considerations described
elsewhere in this document. elsewhere in this document.
10.3. Use of the FreeformClass 10.3. Use of the FreeformClass
Strings that conform to the FreeformClass and many profiles thereof Strings that conform to the FreeformClass and many profiles thereof
can include virtually any Unicode character. This makes the can include virtually any Unicode character. This makes the
FreeformClass quite expressive, but also problematic from the FreeformClass quite expressive, but also problematic from the
perspective of possible user confusion. Protocol designers are perspective of possible user confusion. Protocol designers are
hereby warned that the FreeformClass contains codepoints they might hereby warned that the FreeformClass contains codepoints they might
not understand, and are encouraged to profile the IdentifierClass not understand, and are encouraged to profile the IdentifierClass
wherever feasible; however, if an application protocol requires more wherever feasible; however, if an application protocol requires more
code points than are allowed by the IdentifierClass, protocol code points than are allowed by the IdentifierClass, protocol
designers are encouraged to define a profile of the FreeformClass designers are encouraged to define a profile of the FreeformClass
that restricts the allowable code points as tightly as possible. that restricts the allowable code points as tightly as possible.
(The working group considered the option of allowing superclasses as (The PRECIS Working Group considered the option of allowing
well as profiles of PRECIS string classes, but decided against superclasses as well as profiles of PRECIS string classes, but
allowing superclasses to reduce the likelihood of security and decided against allowing superclasses to reduce the likelihood of
interoperability problems.) security and interoperability problems.)
10.4. Local Character Set Issues 10.4. Local Character Set Issues
When systems use local character sets other than ASCII and Unicode, When systems use local character sets other than ASCII and Unicode,
these specifications leave the problem of converting between the this specification leaves the problem of converting between the local
local character set and Unicode up to the application or local character set and Unicode up to the application or local system. If
system. If different applications (or different versions of one different applications (or different versions of one application)
application) implement different rules for conversions among coded implement different rules for conversions among coded character sets,
character sets, they could interpret the same name differently and they could interpret the same name differently and contact different
contact different application servers or other network entities. application servers or other network entities. This problem is not
This problem is not solved by security protocols, such as Transport solved by security protocols, such as Transport Layer Security (TLS)
Layer Security (TLS) [RFC5246] and the Simple Authentication and [RFC5246] and the Simple Authentication and Security Layer (SASL)
Security Layer (SASL) [RFC4422], that do not take local character [RFC4422], that do not take local character sets into account.
sets into account.
10.5. Visually Similar Characters 10.5. Visually Similar Characters
Some characters are visually similar and thus can cause confusion Some characters are visually similar and thus can cause confusion
among humans. Such characters are often called "confusable among humans. Such characters are often called "confusable
characters" or "confusables". characters" or "confusables".
The problem of confusable characters is not necessarily caused by the The problem of confusable characters is not necessarily caused by the
use of Unicode code points outside the ASCII range. For example, in use of Unicode code points outside the ASCII range. For example, in
some presentations and to some individuals the string "ju1iet" some presentations and to some individuals the string "ju1iet"
(spelled with the Arabic numeral one as the third character) might (spelled with DIGIT ONE, U+0031, as the third character) might appear
appear to be the same as "juliet" (spelled with the lowercase version to be the same as "juliet" (spelled with LATIN SMALL LETTER L,
of the letter "L"), especially on casual visual inspection. This U+006C), especially on casual visual inspection. This phenomenon is
phenomenon is sometimes called "typejacking". sometimes called "typejacking".
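The "ju1iet" versus "juliet" case can be made concrete with a short sketch. The following Python fragment is illustrative only and is not part of the PRECIS framework; the helper name describe() is hypothetical. It shows that two strings which can look alike to a human differ at exactly one position, and that the difference lies entirely within the ASCII range.

```python
import unicodedata


def describe(s):
    """Return (character, code point, Unicode name) for each character in s."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]


real = "juliet"  # third character is LATIN SMALL LETTER L
fake = "ju1iet"  # third character is DIGIT ONE

# Indexes at which the two strings differ.
diff = [i for i, (a, b) in enumerate(zip(real, fake)) if a != b]

print(diff)
print(describe(real)[2])
print(describe(fake)[2])
```

A code-point comparison such as this distinguishes the strings trivially; the difficulty described in this section is that a human reader might not.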
However, the problem is made more serious by introducing the full However, the problem is made more serious by introducing the full
range of Unicode code points into protocol strings. For example, the range of Unicode code points into protocol strings. For example, the
characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
Cherokee block look similar to the ASCII characters "STPETER" as they Cherokee block look similar to the ASCII characters "STPETER" as they
might look when presented using a "creative" font family. might appear when presented using a "creative" font family.
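As a rough illustration of the Cherokee example, the following Python sketch (an assumption-laden aid, not part of the framework) inspects the seven code points listed above using the standard unicodedata module. It checks that every character is a Cherokee letter, that the string is not equal to the ASCII string "STPETER", and that compatibility normalization (NFKC is used here purely as an example) leaves the Cherokee string unchanged, so normalization by itself does not bring the two strings together.

```python
import unicodedata

# The code points given in the text, in order.
cherokee = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"
ascii_lookalike = "STPETER"

names = [unicodedata.name(ch) for ch in cherokee]

# Every character belongs to the Cherokee script.
all_cherokee = all(n.startswith("CHEROKEE LETTER") for n in names)

# Same number of characters, but no code point in common.
same_length = len(cherokee) == len(ascii_lookalike)
equal = cherokee == ascii_lookalike

# Normalization does not map these letters to anything else.
unchanged_by_nfkc = unicodedata.normalize("NFKC", cherokee) == cherokee

for ch, name in zip(cherokee, names):
    print(f"U+{ord(ch):04X} {name}")
```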
In some examples of confusable characters, it is unlikely that the In some examples of confusable characters, it is unlikely that the
average human could tell the difference between the real string and average human could tell the difference between the real string and
the fake string. (Indeed, there is no programmatic way to the fake string. (Indeed, there is no programmatic way to
distinguish with full certainty which is the fake string and which is distinguish with full certainty which is the fake string and which is
the real string; in some contexts, the string formed of Cherokee the real string; in some contexts, the string formed of Cherokee
characters might be the real string and the string formed of ASCII characters might be the real string and the string formed of ASCII
characters might be the fake string.) Because PRECIS-compliant characters might be the fake string.) Because PRECIS-compliant
strings can contain almost any properly-encoded Unicode code point, strings can contain almost any properly-encoded Unicode code point,
it can be relatively easy to fake or mimic some strings in systems it can be relatively easy to fake or mimic some strings in systems
skipping to change at page 27, line 17 skipping to change at page 27, line 40
Considerations [UTR36] and the Unicode Security Mechanisms [UTS39], Considerations [UTR36] and the Unicode Security Mechanisms [UTS39],
it is also true (as noted in [RFC5890]) that "there are no it is also true (as noted in [RFC5890]) that "there are no
comprehensive technical solutions to the problems of confusable comprehensive technical solutions to the problems of confusable
characters". Because it is impossible to map visually similar characters". Because it is impossible to map visually similar
characters without a great deal of context (such as knowing the font characters without a great deal of context (such as knowing the font
families used), the PRECIS framework does nothing to map similar- families used), the PRECIS framework does nothing to map similar-
looking characters together, nor does it prohibit some characters looking characters together, nor does it prohibit some characters
because they look like others. because they look like others.
Nevertheless, specifications for application protocols that use this Nevertheless, specifications for application protocols that use this
framework MUST describe how confusable characters can be used to framework MUST describe how confusable characters can be abused to
compromise the security of systems that use the protocol in question, compromise the security of systems that use the protocol in question,
along with any protocol-specific suggestions for overcoming those along with any protocol-specific suggestions for overcoming those
threats. In particular, software implementations and service threats. In particular, software implementations and service
deployments that use PRECIS-based technologies are strongly deployments that use PRECIS-based technologies are strongly
encouraged to define and implement consistent policies regarding the encouraged to define and implement consistent policies regarding the
registration, storage, and presentation of visually similar registration, storage, and presentation of visually similar
characters. The following recommendations are appropriate: characters. The following recommendations are appropriate:
1. An application service SHOULD define a policy that specifies the 1. An application service SHOULD define a policy that specifies the
scripts or blocks of characters that the service will allow to be scripts or blocks of characters that the service will allow to be
skipping to change at page 29, line 9 skipping to change at page 29, line 33
Although strings that are consumed in PRECIS-based application Although strings that are consumed in PRECIS-based application
protocols are often encoded using UTF-8 [RFC3629], the exact encoding protocols are often encoded using UTF-8 [RFC3629], the exact encoding
is a matter for the application protocol that uses PRECIS, not for is a matter for the application protocol that uses PRECIS, not for
the PRECIS framework. the PRECIS framework.
It is known that some existing systems are unable to support the full It is known that some existing systems are unable to support the full
Unicode character set, or even any characters outside the ASCII Unicode character set, or even any characters outside the ASCII
range. If two (or more) applications need to interoperate when range. If two (or more) applications need to interoperate when
exchanging data (e.g., for the purpose of authenticating a username exchanging data (e.g., for the purpose of authenticating a username
or password), they will naturally need have in common at least one or password), they will naturally need to have in common at least one
coded character set (as defined by [RFC6365]). Establishing such a coded character set (as defined by [RFC6365]). Establishing such a
baseline is a matter for the application protocol that uses PRECIS, baseline is a matter for the application protocol that uses PRECIS,
not for the PRECIS framework. not for the PRECIS framework.
The PRECIS framework, which is defined in terms of the latest version The PRECIS framework, which is defined in terms of the latest version
of Unicode as of the time of this writing (6.2), treats the character of Unicode as of the time of this writing (6.3), treats the character
U+19DA NEW TAI LUE THAM as DISALLOWED. Implementers need to be aware U+19DA NEW TAI LUE THAM as DISALLOWED. Implementers need to be aware
that this treatment is different from IDNA2008 (originally defined in that this treatment is different from IDNA2008 (originally defined in
terms of Unicode 5.2), which treats U+19DA as PVALID. terms of Unicode 5.2), which treats U+19DA as PVALID.
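Implementers who want to see how their own runtime classifies U+19DA can inspect it directly. The sketch below uses Python's standard unicodedata module and is only an illustration; it reports the data of whatever Unicode version the interpreter bundles, which is not necessarily 6.3. In Unicode 6.0 and later the General_Category of this code point is "No" (it was "Nd" in Unicode 5.2), which is consistent with the differing treatment noted above.

```python
import unicodedata

ch = "\u19DA"

# Values come from the Unicode version bundled with this Python build
# (see unicodedata.unidata_version), not from the PRECIS tables.
name = unicodedata.name(ch)
category = unicodedata.category(ch)

print(unicodedata.unidata_version, name, category)
```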
12. References 12. References
12.1. Normative References 12.1. Normative References
[I-D.ietf-precis-mappings] [I-D.ietf-precis-mappings]
Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS
classes", draft-ietf-precis-mappings-04 (work in classes", draft-ietf-precis-mappings-05 (work in
progress), October 2013. progress), October 2013.
[RFC20] Cerf, V., "ASCII format for network interchange", RFC 20, [RFC20] Cerf, V., "ASCII format for network interchange", RFC 20,
October 1969. October 1969.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network
Interchange", RFC 5198, March 2008. Interchange", RFC 5198, March 2008.
[UNICODE] The Unicode Consortium, "The Unicode Standard, Version [UNICODE] The Unicode Consortium, "The Unicode Standard", 2013,
6.2", 2012, <http://www.unicode.org/versions/latest/>.
<http://www.unicode.org/versions/Unicode6.2.0/>.
12.2. Informative References 12.2. Informative References
[I-D.ietf-precis-nickname] [I-D.ietf-precis-nickname]
Saint-Andre, P., "Preparation and Comparison of Saint-Andre, P., "Preparation and Comparison of
Nicknames", draft-ietf-precis-nickname-06 (work in Nicknames", draft-ietf-precis-nickname-07 (work in
progress), July 2013. progress), October 2013.
[I-D.ietf-precis-saslprepbis] [I-D.ietf-precis-saslprepbis]
Saint-Andre, P. and A. Melnikov, "Username and Password Saint-Andre, P. and A. Melnikov, "Username and Password
Preparation Algorithms", draft-ietf-precis-saslprepbis-04 Preparation Algorithms", draft-ietf-precis-saslprepbis-05
(work in progress), August 2013. (work in progress), October 2013.
[I-D.ietf-xmpp-6122bis] [I-D.ietf-xmpp-6122bis]
Saint-Andre, P., "Extensible Messaging and Presence Saint-Andre, P., "Extensible Messaging and Presence
Protocol (XMPP): Address Format", Protocol (XMPP): Address Format",
draft-ietf-xmpp-6122bis-07 (work in progress), April 2013. draft-ietf-xmpp-6122bis-09 (work in progress),
November 2013.
[RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson, [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson,
"Remote Authentication Dial In User Service (RADIUS)", "Remote Authentication Dial In User Service (RADIUS)",
RFC 2865, June 2000. RFC 2865, June 2000.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454, Internationalized Strings ("stringprep")", RFC 3454,
December 2002. December 2002.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
skipping to change at page 32, line 18 skipping to change at page 32, line 43
URIs URIs
[1] <http://unicode.org/Public/UNIDATA/PropertyAliases.txt> [1] <http://unicode.org/Public/UNIDATA/PropertyAliases.txt>
[2] <http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt> [2] <http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
Appendix A. Codepoint Table Appendix A. Codepoint Table
If one applies the property calculation rules from Section 8 to the If one applies the property calculation rules from Section 8 to the
code points 0x0000 to 0x10FFFF in Unicode 6.2, the result is as shown code points 0x0000 to 0x10FFFF in Unicode 6.3, the result is as shown
in the following table, in Unicode Character Database (UCD) format. in the following table, in Unicode Character Database (UCD) format.
The columns of the table are as follows: The columns of the table are as follows:
1. The code point or codepoint range. 1. The code point or codepoint range.
2. The assignment for the code point or range, where the value is 2. The assignment for the code point or range, where the value is
one of PVALID, DISALLOWED, UNASSIGNED, CONTEXTO, CONTEXTJ, or one of PVALID, DISALLOWED, UNASSIGNED, CONTEXTO, CONTEXTJ, or
FREE_PVAL (which includes ID_DIS). FREE_PVAL (where the latter includes ID_DIS).
3. The name or names for the code point or range. 3. The name or names for the code point or range.
This table is non-normative, is included only for illustrative This table is non-normative, is included only for illustrative
purposes, and applies only to Unicode 6.2, not to past or future purposes, and applies only to Unicode 6.3, not to past or future
versions of Unicode. Please note that the strings displayed in the versions of Unicode. Please note that the strings displayed in the
third column are not necessarily the formal name of the code point third column are not necessarily the formal name of the code point
(as defined in [UNICODE]) because the fixed width of the RFC format (as defined in [UNICODE]) because the fixed width of the RFC format
necessitated truncation of many names. necessitated truncation of many names.
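The three-column layout described above can be read mechanically. The following Python sketch is a minimal, non-normative illustration; the names Row, parse_row, and lookup are hypothetical and the regular expression is an assumption about the layout rather than a definition of it. Because the table itself is non-normative, real implementations are expected to derive property values from the rules in Section 7 and Section 8 rather than by parsing this appendix.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    first: int   # first code point of the range
    last: int    # last code point (equal to first for a single code point)
    value: str   # derived property value
    name: str    # possibly truncated name or names


# "XXXX[..YYYY] ; VALUE # NAME" with flexible whitespace.
LINE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)\s*#\s*(.*)$"
)

VALUES = {"PVALID", "DISALLOWED", "UNASSIGNED",
          "CONTEXTO", "CONTEXTJ", "FREE_PVAL"}


def parse_row(line: str) -> Row:
    """Parse one table line into a Row, raising ValueError if malformed."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    first_hex, last_hex, value, name = m.groups()
    if value not in VALUES:
        raise ValueError(f"unknown property value: {value!r}")
    first = int(first_hex, 16)
    last = int(last_hex, 16) if last_hex else first
    return Row(first, last, value, name.strip())


def lookup(rows, cp: int) -> Optional[str]:
    """Return the property value for code point cp, or None if not covered."""
    for row in rows:
        if row.first <= cp <= row.last:
            return row.value
    return None


rows = [parse_row(line) for line in [
    "0000..001F  ; DISALLOWED  # <control>",
    "0020        ; FREE_PVAL   # SPACE",
    "0021..007E  ; PVALID      # EXCLAM MARK..TILDE",
    "00B7        ; CONTEXTO    # MIDDLE DOT",
]]

print(lookup(rows, ord("a")))
```

A linear scan is used for brevity; a sorted list with binary search would be a natural choice for the full table.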
0000..001F ; DISALLOWED # <control> 0000..001F ; DISALLOWED # <control>
0020 ; FREE_PVAL # SPACE 0020 ; FREE_PVAL # SPACE
0021..007E ; PVALID # EXCLAM MARK .. TILDE 0021..007E ; PVALID # EXCLAM MARK..TILDE
007F..009F ; DISALLOWED # <control> 007F..009F ; DISALLOWED # <control>
00A0..00AC ; FREE_PVAL # NO-BREAK SPACE .. NOT SIGN 00A0..00AC ; FREE_PVAL # NO-BREAK SPACE..NOT SIGN
00AD ; DISALLOWED # SOFT HYPH 00AD ; DISALLOWED # SOFT HYPH
00AE..00B6 ; FREE_PVAL # REGISTERED SIGN .. PILCROW SIGN 00AE..00B6 ; FREE_PVAL # REGISTERED SIGN..PILCROW SIGN
00B7 ; CONTEXTO # MIDDLE DOT 00B7 ; CONTEXTO # MIDDLE DOT
00B8..00BF ; FREE_PVAL # CEDILLA..INV QUEST IND 00B8..00BF ; FREE_PVAL # CEDILLA..INV QUEST IND
00C0..00D6 ; PVALID # LAT CAP LET A W GRAV..LAT CAP O 00C0..00D6 ; PVALID # LAT CAP LET A W GRAV..LAT CAP O
00D7 ; FREE_PVAL # MULTIPLICATION SIGN 00D7 ; FREE_PVAL # MULTIPLICATION SIGN
00D8..00F6 ; PVALID # LAT CAP LET O W STROKE..LAT SM 00D8..00F6 ; PVALID # LAT CAP LET O W STROKE..LAT SM
00F7 ; FREE_PVAL # DIVISION SIGN 00F7 ; FREE_PVAL # DIVISION SIGN
00F8..0131 ; PVALID # LAT SM LET O W STROKE..LAT SM LET 00F8..0131 ; PVALID # LAT SM LET O W STROKE..LAT SM LET
0132..0133 ; FREE_PVAL # LAT CAP LIG IJ..LAT SM LIG IJ 0132..0133 ; FREE_PVAL # LAT CAP LIG IJ..LAT SM LIG IJ
0134..013E ; PVALID # LAT CAP LET J W CIRCUM..LAT SM LET 0134..013E ; PVALID # LAT CAP LET J W CIRCUM..LAT SM LET
013F..0140 ; FREE_PVAL # LAT CAP LET L W MID DOT..LAT SM LET 013F..0140 ; FREE_PVAL # LAT CAP LET L W MID DOT..LAT SM LET
skipping to change at page 34, line 41 skipping to change at page 35, line 19
05D0..05EA ; PVALID # HEBR LET ALEF..HEBR LET TAV 05D0..05EA ; PVALID # HEBR LET ALEF..HEBR LET TAV
05EB..05EF ; UNASSIGNED # <reserved>..<reserved> 05EB..05EF ; UNASSIGNED # <reserved>..<reserved>
05F0..05F2 ; PVALID # HEBR LIG YIDDISH DOUBLE VAV..HEBR L 05F0..05F2 ; PVALID # HEBR LIG YIDDISH DOUBLE VAV..HEBR L
05F3..05F4 ; CONTEXTO # HEBR PUNCT GERESH..HEBR PUNCTUATIO 05F3..05F4 ; CONTEXTO # HEBR PUNCT GERESH..HEBR PUNCTUATIO
05F5..05FF ; UNASSIGNED # <reserved>..<reserved> 05F5..05FF ; UNASSIGNED # <reserved>..<reserved>
0600..0604 ; DISALLOWED # ARAB NUM SIGN..ARAB SIGN SAM 0600..0604 ; DISALLOWED # ARAB NUM SIGN..ARAB SIGN SAM
0605 ; UNASSIGNED # <reserved> 0605 ; UNASSIGNED # <reserved>
0606..060F ; FREE_PVAL # AR-IND CUBE ROOT..ARAB SIGN MISRA 0606..060F ; FREE_PVAL # AR-IND CUBE ROOT..ARAB SIGN MISRA
0610..061A ; PVALID # ARAB SIGN SALLALLAHOU ALAYHE ..AR 0610..061A ; PVALID # ARAB SIGN SALLALLAHOU ALAYHE ..AR
061B ; FREE_PVAL # ARAB SEMICOLON 061B ; FREE_PVAL # ARAB SEMICOLON
061C..061D ; UNASSIGNED # <reserved>..<reserved> 061C ; DISALLOWED # ARAB LET MARK
061D..061D ; UNASSIGNED # <reserved>..<reserved>
061E..061F ; FREE_PVAL # ARAB TRIPLE DOT PUNCT MARK..ARAB Q 061E..061F ; FREE_PVAL # ARAB TRIPLE DOT PUNCT MARK..ARAB Q
0620..063F ; PVALID # ARAB LET KASH..ARAB LET FARSI YEH 0620..063F ; PVALID # ARAB LET KASH..ARAB LET FARSI YEH
0640 ; DISALLOWED # ARAB TATWEEL 0640 ; DISALLOWED # ARAB TATWEEL
0641..065F ; PVALID # ARAB LET FEH..ARAB WAVY HAMZA BEL 0641..065F ; PVALID # ARAB LET FEH..ARAB WAVY HAMZA BEL
0660..0669 ; CONTEXTO # AR-IND DIG ZERO..AR-IND DIG 0660..0669 ; CONTEXTO # AR-IND DIG ZERO..AR-IND DIG
066A..066D ; FREE_PVAL # ARAB PCT SIGN..ARAB FIVE PNTED STA 066A..066D ; FREE_PVAL # ARAB PCT SIGN..ARAB FIVE PNTED STA
066E..0674 ; PVALID # ARAB LET DOTLESS BEH..ARAB LET HIG 066E..0674 ; PVALID # ARAB LET DOTLESS BEH..ARAB LET HIG
0675..0678 ; FREE_PVAL # ARAB LET HIGH HAMZA ALEF..ARAB LET 0675..0678 ; FREE_PVAL # ARAB LET HIGH HAMZA ALEF..ARAB LET
0679..06D3 ; PVALID # ARAB LET TTEH..ARAB LET YEH BARREE 0679..06D3 ; PVALID # ARAB LET TTEH..ARAB LET YEH BARREE
06D4 ; FREE_PVAL # ARAB FULL STOP 06D4 ; FREE_PVAL # ARAB FULL STOP
skipping to change at page 40, line 11 skipping to change at page 40, line 38
0CCA..0CCD ; PVALID # KANNADA VOW SIGN O..KANNADA SIGN VI 0CCA..0CCD ; PVALID # KANNADA VOW SIGN O..KANNADA SIGN VI
0CCE..0CD4 ; UNASSIGNED # <reserved>..<reserved> 0CCE..0CD4 ; UNASSIGNED # <reserved>..<reserved>
0CD5..0CD6 ; PVALID # KANNADA LEN MARK..KANNADA AI LEN MA 0CD5..0CD6 ; PVALID # KANNADA LEN MARK..KANNADA AI LEN MA
0CD7..0CDD ; UNASSIGNED # <reserved>..<reserved> 0CD7..0CDD ; UNASSIGNED # <reserved>..<reserved>
0CDE ; PVALID # KANNADA LET FA 0CDE ; PVALID # KANNADA LET FA
0CDF ; UNASSIGNED # <reserved> 0CDF ; UNASSIGNED # <reserved>
0CE0..0CE3 ; PVALID # KANNADA LET VOC RR..KANNADA VOW SIG 0CE0..0CE3 ; PVALID # KANNADA LET VOC RR..KANNADA VOW SIG
0CE4..0CE5 ; UNASSIGNED # <reserved>..<reserved> 0CE4..0CE5 ; UNASSIGNED # <reserved>..<reserved>
0CE6..0CEF ; PVALID # KANNADA DIG ZERO..KANNADA DIG NINE 0CE6..0CEF ; PVALID # KANNADA DIG ZERO..KANNADA DIG NINE
0CF0 ; UNASSIGNED # <reserved> 0CF0 ; UNASSIGNED # <reserved>
0CF1..0CF2 ; DISALLOWED # KANNADA SIGN JIHVAMULIYA..KANNADA S 0CF1..0CF2 ; PVALID # KANNADA SIGN JIHVAMULIYA..KANNADA S
0CF3..0D01 ; UNASSIGNED # <reserved>..<reserved> 0CF3..0D01 ; UNASSIGNED # <reserved>..<reserved>
0D02..0D03 ; PVALID # MALAY SIGN ANUSVARA..MALAY SIGN VIS 0D02..0D03 ; PVALID # MALAY SIGN ANUSVARA..MALAY SIGN VIS
0D04 ; UNASSIGNED # <reserved> 0D04 ; UNASSIGNED # <reserved>
0D05..0D0C ; PVALID # MALAY LET A..MALAY LET VOC 0D05..0D0C ; PVALID # MALAY LET A..MALAY LET VOC
0D0D ; UNASSIGNED # <reserved> 0D0D ; UNASSIGNED # <reserved>
0D0E..0D10 ; PVALID # MALAY LET E..MALAY LET AI 0D0E..0D10 ; PVALID # MALAY LET E..MALAY LET AI
0D11 ; UNASSIGNED # <reserved> 0D11 ; UNASSIGNED # <reserved>
0D12..0D3A ; PVALID # MALAY LET O..MALAY LET TTTA 0D12..0D3A ; PVALID # MALAY LET O..MALAY LET TTTA
0D3B..0D3C ; UNASSIGNED # <reserved>..<reserved> 0D3B..0D3C ; UNASSIGNED # <reserved>..<reserved>
0D3D..0D44 ; PVALID # MALAY SIGN AVAGRAHA..MALAY VOW SIG 0D3D..0D44 ; PVALID # MALAY SIGN AVAGRAHA..MALAY VOW SIG
skipping to change at page 42, line 32 skipping to change at page 43, line 11
0F36 ; FREE_PVAL # TIB MARK CARET DZUD RTAGS BZHI MIG C 0F36 ; FREE_PVAL # TIB MARK CARET DZUD RTAGS BZHI MIG C
0F37 ; PVALID # TIB MARK NGAS BZUNG SGOR RTAGS 0F37 ; PVALID # TIB MARK NGAS BZUNG SGOR RTAGS
0F38 ; FREE_PVAL # TIB MARK CHE MGO 0F38 ; FREE_PVAL # TIB MARK CHE MGO
0F39 ; PVALID # TIB MARK TSA PHRU 0F39 ; PVALID # TIB MARK TSA PHRU
0F3A..0F3D ; FREE_PVAL # TIB MARK GUG RTAGS GYON..TIB MARK AN 0F3A..0F3D ; FREE_PVAL # TIB MARK GUG RTAGS GYON..TIB MARK AN
0F3E..0F47 ; PVALID # TIB SIGN YAR TSHES..TIB LET JA 0F3E..0F47 ; PVALID # TIB SIGN YAR TSHES..TIB LET JA
0F48 ; UNASSIGNED # <reserved> 0F48 ; UNASSIGNED # <reserved>
0F49..0F6C ; PVALID # TIB LET NYA..TIB LET RRA 0F49..0F6C ; PVALID # TIB LET NYA..TIB LET RRA
0F6D..0F70 ; UNASSIGNED # <reserved>..<reserved> 0F6D..0F70 ; UNASSIGNED # <reserved>..<reserved>
0F71..0F76 ; PVALID # TIB VOW SIGN AA..TIB VOW SIGN VO 0F71..0F76 ; PVALID # TIB VOW SIGN AA..TIB VOW SIGN VO
0F77..0F79 ; FREE_PVAL # TIB VOW SIGN UU..TIB VOW SIGN VO 0F77 ; FREE_PVAL # TIB VOW SIGN VO RR
0F7A..0F80 ; PVALID # TIB VOW SIGN E..TIB VOW SIGN REV 0F78 ; PVALID # TIB VOW SIGN VO L
0F81 ; FREE_PVAL # TIB VOW SIGN REV II 0F79 ; FREE_PVAL # TIB VOW SIGN VO LL
0F82..0F84 ; PVALID # TIB SIGN NYI ZLA NAA DA..TIB MARK H 0F7A..0F84 ; PVALID # TIB VOW SIGN E..TIB MARK H
0F85 ; FREE_PVAL # TIB MARK PALUTA 0F85 ; FREE_PVAL # TIB MARK PALUTA
0F86..0F8F ; PVALID # TIB SIGN LCI RTAGS..TIB SUBJOIN S 0F86..0F8F ; PVALID # TIB SIGN LCI RTAGS..TIB SUBJOIN S
0F90..0F92 ; PVALID # TIB SUBJOIN LET KA..TIB SUBJOIN 0F90..0F97 ; PVALID # TIB SUBJOIN LET KA..TIB SUBJOIN
0F93 ; FREE_PVAL # TIB SUBJOIN LET GHA
0F94..0F97 ; PVALID # TIB SUBJOIN LET NGA..TIB SUBJOI
0F98 ; UNASSIGNED # <reserved> 0F98 ; UNASSIGNED # <reserved>
0F99..0FBC ; PVALID # TIB SUBJOIN LET NYA..TIB SUBJOI 0F99..0FBC ; PVALID # TIB SUBJOIN LET NYA..TIB SUBJOI
0FBD ; UNASSIGNED # <reserved> 0FBD ; UNASSIGNED # <reserved>
0FBE..0FC5 ; FREE_PVAL # TIB KU RU KHA..TIB SYM RDO RJE 0FBE..0FC5 ; FREE_PVAL # TIB KU RU KHA..TIB SYM RDO RJE
0FC6 ; PVALID # TIB SYM PADMA GDAN 0FC6 ; PVALID # TIB SYM PADMA GDAN
0FC7..0FCC ; FREE_PVAL # TIB SYM RDO RJE RGYA GRAM..TIB SY 0FC7..0FCC ; FREE_PVAL # TIB SYM RDO RJE RGYA GRAM..TIB SY
0FCD ; UNASSIGNED # <reserved> 0FCD ; UNASSIGNED # <reserved>
0FCE..0FDA ; FREE_PVAL # TIB SIGN RDEL NAG RDEL DKAR..TIB MA 0FCE..0FDA ; FREE_PVAL # TIB SIGN RDEL NAG RDEL DKAR..TIB MA
0FDB..0FFF ; UNASSIGNED # <reserved>..<reserved> 0FDB..0FFF ; UNASSIGNED # <reserved>..<reserved>
1000..1049 ; PVALID # MYAN LET KA..MYAN DIG NINE 1000..1049 ; PVALID # MYAN LET KA..MYAN DIG NINE
skipping to change at page 44, line 34 skipping to change at page 45, line 11
1735..1736 ; FREE_PVAL # PHILIP SINGLE PUNCT..PHILIP DOUBLE 1735..1736 ; FREE_PVAL # PHILIP SINGLE PUNCT..PHILIP DOUBLE
1737..173F ; UNASSIGNED # <reserved>..<reserved> 1737..173F ; UNASSIGNED # <reserved>..<reserved>
1740..1753 ; PVALID # BUHID LET A..BUHID VOW SIGN U 1740..1753 ; PVALID # BUHID LET A..BUHID VOW SIGN U
1754..175F ; UNASSIGNED # <reserved>..<reserved> 1754..175F ; UNASSIGNED # <reserved>..<reserved>
1760..176C ; PVALID # TAGBANWA LET A..TAGBANWA LET YA 1760..176C ; PVALID # TAGBANWA LET A..TAGBANWA LET YA
176D ; UNASSIGNED # <reserved> 176D ; UNASSIGNED # <reserved>
176E..1770 ; PVALID # TAGBANWA LET LA..TAGBANWA LET SA 176E..1770 ; PVALID # TAGBANWA LET LA..TAGBANWA LET SA
1771 ; UNASSIGNED # <reserved> 1771 ; UNASSIGNED # <reserved>
1772..1773 ; PVALID # TAGBANWA VOW SIGN I..TAGBANWA VOW S 1772..1773 ; PVALID # TAGBANWA VOW SIGN I..TAGBANWA VOW S
1774..177F ; UNASSIGNED # <reserved>..<reserved> 1774..177F ; UNASSIGNED # <reserved>..<reserved>
1780..17D3 ; PVALID # KHMER LET KA..KHMER SIGN BATHAMASAT 1780..17B3 ; PVALID # KHMER LET KA..KHMER IND VOW QAU
17B4..17B5 ; DISALLOWED # KHMER VOW INH AQ..KHMER VOW INH AA
17B6..17D3 ; PVALID # KHMER VOW SIGN AA..KHMER SIGN BATHA
17D4..17D6 ; FREE_PVAL # KHMER SIGN KHAN..KHMER SIGN CAMNUC 17D4..17D6 ; FREE_PVAL # KHMER SIGN KHAN..KHMER SIGN CAMNUC
17D7 ; PVALID # KHMER SIGN LEK TOO 17D7 ; PVALID # KHMER SIGN LEK TOO
17D8..17DB ; FREE_PVAL # KHMER SIGN BEYYAL..KHMER CURR SYM R 17D8..17DB ; FREE_PVAL # KHMER SIGN BEYYAL..KHMER CURR SYM R
17DC..17DD ; PVALID # KHMER SIGN AVAKRAHASANYA..KHMER SIG 17DC..17DD ; PVALID # KHMER SIGN AVAKRAHASANYA..KHMER SIG
17DE..17DF ; UNASSIGNED # <reserved>..<reserved> 17DE..17DF ; UNASSIGNED # <reserved>..<reserved>
17E0..17E9 ; PVALID # KHMER DIG ZERO..KHMER DIG NINE 17E0..17E9 ; PVALID # KHMER DIG ZERO..KHMER DIG NINE
17EA..17EF ; UNASSIGNED # <reserved>..<reserved> 17EA..17EF ; UNASSIGNED # <reserved>..<reserved>
17F0..17F9 ; FREE_PVAL # KHMER SYM LEK ATTAK SON..KHMER SYM 17F0..17F9 ; FREE_PVAL # KHMER SYM LEK ATTAK SON..KHMER SYM
17FA..17FF ; UNASSIGNED # <reserved>..<reserved> 17FA..17FF ; UNASSIGNED # <reserved>..<reserved>
1800..180A ; FREE_PVAL # MONG BIRGA..MONG NIRUGU 1800..180A ; FREE_PVAL # MONG BIRGA..MONG NIRUGU
180B..180D ; PVALID # MONG FREE VAR SEL ONE..MONG FREE VA 180B..180E ; DISALLOWED # MONG FREE VAR SEL ONE..MONG VOW SEP
180E ; FREE_PVAL # MONG VOW SEP
180F ; UNASSIGNED # <reserved> 180F ; UNASSIGNED # <reserved>
1810..1819 ; PVALID # MONG DIG ZERO..MONG DIG NINE 1810..1819 ; PVALID # MONG DIG ZERO..MONG DIG NINE
181A..181F ; UNASSIGNED # <reserved>..<reserved> 181A..181F ; UNASSIGNED # <reserved>..<reserved>
1820..1877 ; PVALID # MONG LET A..MONG LET MANCHU 1820..1877 ; PVALID # MONG LET A..MONG LET MANCHU
1878..187F ; UNASSIGNED # <reserved>..<reserved> 1878..187F ; UNASSIGNED # <reserved>..<reserved>
1880..18AA ; PVALID # MONG LET ALI GALI ANUSVARA ONE..MON 1880..18AA ; PVALID # MONG LET ALI GALI ANUSVARA ONE..MON
18AB..18AF ; UNASSIGNED # <reserved>..<reserved> 18AB..18AF ; UNASSIGNED # <reserved>..<reserved>
18B0..18F5 ; PVALID # CAN SYL OY..CAN SYL CA 18B0..18F5 ; PVALID # CAN SYL OY..CAN SYL CA
18F6..18FF ; UNASSIGNED # <reserved>..<reserved> 18F6..18FF ; UNASSIGNED # <reserved>..<reserved>
1900..191C ; PVALID # LIMBU VOW-CARRIER LET..LIMBU LET HA 1900..191C ; PVALID # LIMBU VOW-CARRIER LET..LIMBU LET HA
skipping to change at page 46, line 13 skipping to change at page 46, line 39
1B80..1BF3 ; PVALID # SUND SIGN PANYECEK..BATAK PANONGONAN 1B80..1BF3 ; PVALID # SUND SIGN PANYECEK..BATAK PANONGONAN
1BF4..1BFB ; UNASSIGNED # <reserved>..<reserved> 1BF4..1BFB ; UNASSIGNED # <reserved>..<reserved>
1BFC..1BFF ; FREE_PVAL # BATAK SYM BINDU NA METEK..BATAK SYM 1BFC..1BFF ; FREE_PVAL # BATAK SYM BINDU NA METEK..BATAK SYM
1C00..1C37 ; PVALID # LEPCHA LET KA..LEPCHA SIGN NUKTA 1C00..1C37 ; PVALID # LEPCHA LET KA..LEPCHA SIGN NUKTA
1C38..1C3A ; UNASSIGNED # <reserved>..<reserved> 1C38..1C3A ; UNASSIGNED # <reserved>..<reserved>
1C3B..1C3F ; FREE_PVAL # LEPCHA PUNCT TA-ROL..LEPCHA PUNCT T 1C3B..1C3F ; FREE_PVAL # LEPCHA PUNCT TA-ROL..LEPCHA PUNCT T
1C40..1C49 ; PVALID # LEPCHA DIG ZERO..LEPCHA DIG NINE 1C40..1C49 ; PVALID # LEPCHA DIG ZERO..LEPCHA DIG NINE
1C4A..1C4C ; UNASSIGNED # <reserved>..<reserved> 1C4A..1C4C ; UNASSIGNED # <reserved>..<reserved>
1C4D..1C7D ; PVALID # LEPCHA LET TTA..OL CHIKI AHAD 1C4D..1C7D ; PVALID # LEPCHA LET TTA..OL CHIKI AHAD
1C7E..1C7F ; FREE_PVAL # OL CHIKI PUNCT MUCAAD..OL CHIKI PUN 1C7E..1C7F ; FREE_PVAL # OL CHIKI PUNCT MUCAAD..OL CHIKI PUN
1C80..1C9F ; UNASSIGNED # <reserved>..<reserved> 1C80..1CBF ; UNASSIGNED # <reserved>..<reserved>
1CC0..1CC7 ; FREE_PVAL # SUNDA PUNCT BINDU SURYA..SUNDA PUNC 1CC0..1CC7 ; FREE_PVAL # SUNDA PUNCT BINDU SURYA..SUNDA PUNC
1CC8..1CCF ; UNASSIGNED # <reserved>..<reserved> 1CC8..1CCF ; UNASSIGNED # <reserved>..<reserved>
1CD0..1CD2 ; PVALID # VED TONE KARSHANA..VED TONE PRENKHA 1CD0..1CD2 ; PVALID # VED TONE KARSHANA..VED TONE PRENKHA
1CD3 ; FREE_PVAL # VED SIGN NIHSHVASA 1CD3 ; FREE_PVAL # VED SIGN NIHSHVASA
1CD4..1CF6 ; PVALID # VED SIGN YAJURVEDIC MID SVARITA..VE 1CD4..1CF6 ; PVALID # VED SIGN YAJURVEDIC MID SVARITA..VE
1CF7..1CFF ; UNASSIGNED # <reserved>..<reserved> 1CF7..1CFF ; UNASSIGNED # <reserved>..<reserved>
1D00..1D2B ; PVALID # LAT LET SM CAP A..CYR LET SM 1D00..1D2B ; PVALID # LAT LET SM CAP A..CYR LET SM
1D2C..1D2E ; FREE_PVAL # MOD LET CAP A..MOD LET C 1D2C..1D2E ; FREE_PVAL # MOD LET CAP A..MOD LET C
1D2F ; PVALID # MOD LET CAP BARRED B 1D2F ; PVALID # MOD LET CAP BARRED B
1D30..1D3A ; FREE_PVAL # MOD LET CAP D..MOD LET C 1D30..1D3A ; FREE_PVAL # MOD LET CAP D..MOD LET C
1D3B ; PVALID # MOD LET CAP REV N 1D3B ; PVALID # MOD LET CAP REV N
1D3C..1D4D ; FREE_PVAL # MOD LET CAP O..MOD LET S 1D3C..1D4D ; FREE_PVAL # MOD LET CAP O..MOD LET S
1D4E ; PVALID # MOD LET SM TURNED I 1D4E ; PVALID # MOD LET SM TURNED I
1D4F..1D6A ; FREE_PVAL # MOD LET SM K..GREEK SUB SMA 1D4F..1D6A ; FREE_PVAL # MOD LET SM K..GREEK SUB SMA
1D6B..1D77 ; PVALID # LAT SM LET UE..LAT SM LET TU 1D6B..1D77 ; PVALID # LAT SM LET UE..LAT SM LET TU
1D78 ; FREE_PVAL # MOD LET CYR EN 1D78 ; FREE_PVAL # MOD LET CYR EN
1D79..1D9A ; PVALID # LAT SM LET INSULAR G..LAT SM LE 1D79..1D9A ; PVALID # LAT SM LET INSULAR G..LAT SM LE
1D9B..1DBF ; FREE_PVAL # MOD LET SM TURNED ALPHA..MOD 1D9B..1DBF ; FREE_PVAL # MOD LET SM TURNED ALPHA..MOD
1DC0..1DE6 ; PVALID # COMB DOTTED GRAVE ACCENT..COMB LAT 1DC0..1DE6 ; PVALID # COMB DOTTED GRAVE ACCENT..COMB LAT
1DE7..1DFB ; UNASSIGNED # <reserved>..<reserved> 1DE7..1DFB ; UNASSIGNED # <reserved>..<reserved>
1DFC..1DFF ; PVALID # COMB DOUBLE INV BREVE BEL..COMB R 1DFC..1E99 ; PVALID # COMB DOUBLE INV BREVE BEL..LAT SM L
1E9A..1E9B ; FREE_PVAL # LAT SM LET A WITH R HALF RING..LAT 1E9A ; FREE_PVAL # LAT SM LET A W R HALF RING
1E9C..1F15 ; PVALID # LAT SM LET LONG S W DIAG STR..GR 1E9B..1F15 ; PVALID # LAT SM LET LONG S W BOT ABOVE..GR
1F16..1F17 ; UNASSIGNED # <reserved>..<reserved> 1F16..1F17 ; UNASSIGNED # <reserved>..<reserved>
1F18..1F1D ; FREE_PVAL # GREEK CAP LET EPSILON W PSILI..GRE 1F18..1F1D ; FREE_PVAL # GREEK CAP LET EPSILON W PSILI..GRE
1F1E..1F1F ; UNASSIGNED # <reserved>..<reserved> 1F1E..1F1F ; UNASSIGNED # <reserved>..<reserved>
1F20..1F45 ; PVALID # GREEK SM LET ETA W PSILI..GREEK SMA 1F20..1F45 ; PVALID # GREEK SM LET ETA W PSILI..GREEK SMA
1F46..1F47 ; UNASSIGNED # <reserved>..<reserved> 1F46..1F47 ; UNASSIGNED # <reserved>..<reserved>
1F48..1F4D ; FREE_PVAL # GREEK CAP LET OMICRON W PSILI..GRE 1F48..1F4D ; FREE_PVAL # GREEK CAP LET OMICRON W PSILI..GRE
1F4E..1F4F ; UNASSIGNED # <reserved>..<reserved> 1F4E..1F4F ; UNASSIGNED # <reserved>..<reserved>
1F50..1F57 ; PVALID # GREEK SM LET UPSILON W PSILI..GREEK 1F50..1F57 ; PVALID # GREEK SM LET UPSILON W PSILI..GREEK
1F58 ; UNASSIGNED # <reserved> 1F58 ; UNASSIGNED # <reserved>
1F59 ; PVALID # GREEK CAP LET UPSILON W DASIA 1F59 ; PVALID # GREEK CAP LET UPSILON W DASIA
skipping to change at page 47, line 43 skipping to change at page 48, line 21
1FFC..1FFE ; FREE_PVAL # GREEK CAP LET OMEGA W PROSGEGRA..GR 1FFC..1FFE ; FREE_PVAL # GREEK CAP LET OMEGA W PROSGEGRA..GR
1FFF ; UNASSIGNED # <reserved> 1FFF ; UNASSIGNED # <reserved>
2000..200A ; FREE_PVAL # EN QUAD..HAIR SPACE 2000..200A ; FREE_PVAL # EN QUAD..HAIR SPACE
200B ; DISALLOWED # ZERO WIDTH SPACE 200B ; DISALLOWED # ZERO WIDTH SPACE
200C..200D ; CONTEXTJ # ZERO WIDTH NON-JOINER..ZERO WIDTH J 200C..200D ; CONTEXTJ # ZERO WIDTH NON-JOINER..ZERO WIDTH J
200E..200F ; DISALLOWED # LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT M 200E..200F ; DISALLOWED # LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT M
2010..2027 ; FREE_PVAL # HYPHEN..HYPHENATION POINT 2010..2027 ; FREE_PVAL # HYPHEN..HYPHENATION POINT
2028..202E ; DISALLOWED # LINE SEP..RIGHT-TO-LEFT OVERRIDE 2028..202E ; DISALLOWED # LINE SEP..RIGHT-TO-LEFT OVERRIDE
202F..205F ; FREE_PVAL # NARROW NO-BREAK SPACE..MED MATH SP 202F..205F ; FREE_PVAL # NARROW NO-BREAK SPACE..MED MATH SP
2060..2064 ; DISALLOWED # WORD JOINER..INVISIBLE PLUS 2060..2064 ; DISALLOWED # WORD JOINER..INVISIBLE PLUS
2065..2069 ; UNASSIGNED # <reserved>..<reserved> 2065 ; UNASSIGNED # <reserved>
206A..206F ; DISALLOWED # INHIBIT SYMM SWAP..NOM DIGIT SHAPES 2066..206F ; DISALLOWED # LEFT-TO-RIGHT IS..NOM DIGIT SHAPES
2070..2071 ; FREE_PVAL # SUPER ZERO..SUPER LAT SM LET I 2070..2071 ; FREE_PVAL # SUPER ZERO..SUPER LAT SM LET I
2072..2073 ; UNASSIGNED # <reserved>..<reserved> 2072..2073 ; UNASSIGNED # <reserved>..<reserved>
2074..208E ; FREE_PVAL # SUPER FOUR..SUB RIGHT PARENTHESIS 2074..208E ; FREE_PVAL # SUPER FOUR..SUB RIGHT PARENTHESIS
208F ; UNASSIGNED # <reserved> 208F ; UNASSIGNED # <reserved>
2090..209C ; FREE_PVAL # LAT SUB SM LET A..LAT SUB SM LET T 2090..209C ; FREE_PVAL # LAT SUB SM LET A..LAT SUB SM LET T
209D..209F ; UNASSIGNED # <reserved>..<reserved> 209D..209F ; UNASSIGNED # <reserved>..<reserved>
20A0..20B9 ; FREE_PVAL # EURO-CURRENCY SIGN..INDIAN RUPEE SI 20A0..20BA ; FREE_PVAL # EURO-CURRENCY SIGN..TURKISH LIRA SI
20BA..20CF ; UNASSIGNED # <reserved>..<reserved> 20BB..20CF ; UNASSIGNED # <reserved>..<reserved>
20D0..20DC ; PVALID # COMB LEFT HARPOON ABOVE..COMB FOUR 20D0..20DC ; PVALID # COMB LEFT HARPOON ABOVE..COMB FOUR
20DD..20E0 ; FREE_PVAL # COMB ENC CIRC..COMB ENC CIRC BACKS 20DD..20E0 ; FREE_PVAL # COMB ENC CIRC..COMB ENC CIRC BACKS
20E1 ; PVALID # COMB L R ARROW ABOVE 20E1 ; PVALID # COMB L R ARROW ABOVE
20E2..20E4 ; FREE_PVAL # COMB ENC SCREEN..COMB ENC UPWARD PO 20E2..20E4 ; FREE_PVAL # COMB ENC SCREEN..COMB ENC UPWARD PO
20E5..20F0 ; PVALID # COMB REV SOLIDUS OVERLAY..COMB ASTE 20E5..20F0 ; PVALID # COMB REV SOLIDUS OVERLAY..COMB ASTE
20F1..20FF ; UNASSIGNED # <reserved>..<reserved> 20F1..20FF ; UNASSIGNED # <reserved>..<reserved>
2100..2129 ; FREE_PVAL # ACCOUNT OF..TURNED GREEK SM LET IOT 2100..2129 ; FREE_PVAL # ACCOUNT OF..TURNED GREEK SM LET IOT
212A..212B ; PVALID # KELVIN SIGN..ANGSTROM SIGN 212A..212B ; PVALID # KELVIN SIGN..ANGSTROM SIGN
212C..2131 ; FREE_PVAL # SCRIPT CAP C..SCRIPT CAP F 212C..2131 ; FREE_PVAL # SCRIPT CAP C..SCRIPT CAP F
2132 ; PVALID # TURNED CAP F 2132 ; PVALID # TURNED CAP F
2133..214D ; FREE_PVAL # SCRIPT CAP M..AKTIESELSKAB 2133..214D ; FREE_PVAL # SCRIPT CAP M..AKTIESELSKAB
214E ; PVALID # TURNED SM F 214E ; PVALID # TURNED SM F
214F..2182 ; DISALLOWED # SYM FOR SAMAR SOURCE..ROM NUM TEN T 214F..2182 ; FREE_PVAL # SYM FOR SAMAR SOURCE..ROM NUM TEN T
2183..2184 ; PVALID # ROM NUM REV ONE HUNDRED..LAT SM LET 2183..2184 ; PVALID # ROM NUM REV ONE HUNDRED..LAT SM LET
2185..2189 ; FREE_PVAL # ROM NUM SIX LATE FORM..VULGAR FRACT 2185..2189 ; FREE_PVAL # ROM NUM SIX LATE FORM..VULGAR FRACT
218A..218F ; UNASSIGNED # <reserved>..<reserved> 218A..218F ; UNASSIGNED # <reserved>..<reserved>
2190..23F3 ; FREE_PVAL # LEFTWARDS ARROW..HOURGLASS WITH FLO 2190..23F3 ; FREE_PVAL # LEFTWARDS ARROW..HOURGLASS W FLO
23F4..23FF ; UNASSIGNED # <reserved>..<reserved> 23F4..23FF ; UNASSIGNED # <reserved>..<reserved>
2400..2426 ; FREE_PVAL # SYM FOR NULL..SYM FOR SUB FORM 2400..2426 ; FREE_PVAL # SYM FOR NULL..SYM FOR SUB FORM
2427..243F ; UNASSIGNED # <reserved>..<reserved> 2427..243F ; UNASSIGNED # <reserved>..<reserved>
2440..244A ; FREE_PVAL # OCR HOOK..OCR DOUBLE BACKSLASH 2440..244A ; FREE_PVAL # OCR HOOK..OCR DOUBLE BACKSLASH
244B..245F ; UNASSIGNED # <reserved>..<reserved> 244B..245F ; UNASSIGNED # <reserved>..<reserved>
2460..26FF ; FREE_PVAL # CIRCLED DIG ONE..WHITE FLAG W HORIZ 2460..26FF ; FREE_PVAL # CIRCLED DIG ONE..WHITE FLAG W HORIZ
2700 ; UNASSIGNED # <reserved> 2700 ; UNASSIGNED # <reserved>
2701..2B4C ; FREE_PVAL # UP BLADE SCISSORS..RIGHTWARDS ARROW 2701..2B4C ; FREE_PVAL # UP BLADE SCISSORS..RIGHTWARDS ARROW
2B4D..2B4F ; UNASSIGNED # <reserved>..<reserved> 2B4D..2B4F ; UNASSIGNED # <reserved>..<reserved>
2B50..2B59 ; FREE_PVAL # WHITE MEDIUM STAR..HEAVY CIRCLED SA 2B50..2B59 ; FREE_PVAL # WHITE MEDIUM STAR..HEAVY CIRCLED SA
skipping to change at page 49, line 4 skipping to change at page 49, line 30
2CF4..2CF8 ; UNASSIGNED # <reserved>..<reserved> 2CF4..2CF8 ; UNASSIGNED # <reserved>..<reserved>
2CF9..2CFF ; FREE_PVAL # COPT OLD NUB FULL STOP..COPT MORPHO 2CF9..2CFF ; FREE_PVAL # COPT OLD NUB FULL STOP..COPT MORPHO
2D00..2D25 ; PVALID # GEORG SM LET AN..GEORG SM LET 2D00..2D25 ; PVALID # GEORG SM LET AN..GEORG SM LET
2D26 ; UNASSIGNED # <reserved> 2D26 ; UNASSIGNED # <reserved>
2D27 ; PVALID # GEORG SM LET YN 2D27 ; PVALID # GEORG SM LET YN
2D28..2D2C ; UNASSIGNED # <reserved>..<reserved> 2D28..2D2C ; UNASSIGNED # <reserved>..<reserved>
2D2D ; PVALID # GEORG SM LET AEN 2D2D ; PVALID # GEORG SM LET AEN
2D2E..2D2F ; UNASSIGNED # <reserved>..<reserved> 2D2E..2D2F ; UNASSIGNED # <reserved>..<reserved>
2D30..2D67 ; PVALID # TIFINAGH LET YA..TIFINAGH LETTER YO 2D30..2D67 ; PVALID # TIFINAGH LET YA..TIFINAGH LETTER YO
2D68..2D6E ; UNASSIGNED # <reserved>..<reserved> 2D68..2D6E ; UNASSIGNED # <reserved>..<reserved>
2D6F..2D70 ; PVALID # TIFINAGH MOD LET LABIALIZATION MARK 2D6F..2D70 ; FREE_PVAL # TIFINAGH MOD LET LABIALIZATION MARK
2D71..2D7E ; UNASSIGNED # <reserved>..<reserved> 2D71..2D7E ; UNASSIGNED # <reserved>..<reserved>
2D7F..2D96 ; PVALID # TIFINAGH CONS JOINER..ETHI SYL GGW 2D7F..2D96 ; PVALID # TIFINAGH CONS JOINER..ETHI SYL GGW
2D97..2D9F ; UNASSIGNED # <reserved>..<reserved> 2D97..2D9F ; UNASSIGNED # <reserved>..<reserved>
2DA0..2DA6 ; PVALID # ETHI SYL SSA..ETHI SYL SSO 2DA0..2DA6 ; PVALID # ETHI SYL SSA..ETHI SYL SSO
2DA7 ; UNASSIGNED # <reserved> 2DA7 ; UNASSIGNED # <reserved>
2DA8..2DAE ; PVALID # ETHI SYL CCA..ETHI SYL CCO 2DA8..2DAE ; PVALID # ETHI SYL CCA..ETHI SYL CCO
2DAF ; UNASSIGNED # <reserved> 2DAF ; UNASSIGNED # <reserved>
2DB0..2DB6 ; PVALID # ETHI SYL ZZA..ETHI SYL ZZO 2DB0..2DB6 ; PVALID # ETHI SYL ZZA..ETHI SYL ZZO
2DB7 ; UNASSIGNED # <reserved> 2DB7 ; UNASSIGNED # <reserved>
2DB8..2DBE ; PVALID # ETHI SYL CCHA..ETHI SYL CC 2DB8..2DBE ; PVALID # ETHI SYL CCHA..ETHI SYL CC
skipping to change at page 49, line 41 skipping to change at page 50, line 19
2E9B..2EF3 ; FREE_PVAL # CJK RAD CHOKE..CJK RAD C-SIMPLIFIED 2E9B..2EF3 ; FREE_PVAL # CJK RAD CHOKE..CJK RAD C-SIMPLIFIED
2EF4..2EFF ; UNASSIGNED # <reserved>..<reserved> 2EF4..2EFF ; UNASSIGNED # <reserved>..<reserved>
2F00..2FD5 ; FREE_PVAL # KANGXI RAD ONE..KANGXI RAD FLUTE 2F00..2FD5 ; FREE_PVAL # KANGXI RAD ONE..KANGXI RAD FLUTE
2FD6..2FEF ; UNASSIGNED # <reserved>..<reserved> 2FD6..2FEF ; UNASSIGNED # <reserved>..<reserved>
2FF0..2FFB ; FREE_PVAL # IDEO DESC CHAR LEFT TO RIGHT..IDEO 2FF0..2FFB ; FREE_PVAL # IDEO DESC CHAR LEFT TO RIGHT..IDEO
2FFC..2FFF ; UNASSIGNED # <reserved>..<reserved> 2FFC..2FFF ; UNASSIGNED # <reserved>..<reserved>
3000..3004 ; FREE_PVAL # IDEO SPACE..JAPAN INDUST STAND 3000..3004 ; FREE_PVAL # IDEO SPACE..JAPAN INDUST STAND
3005..3007 ; PVALID # IDEO ITER MARK..IDEO NUMB ZERO 3005..3007 ; PVALID # IDEO ITER MARK..IDEO NUMB ZERO
3008..3029 ; FREE_PVAL # LEFT ANGLE BRACKET..HANGZH NUM NINE 3008..3029 ; FREE_PVAL # LEFT ANGLE BRACKET..HANGZH NUM NINE
302A..302D ; PVALID # IDEO LEVEL TONE MARK..IDEO ENT 302A..302D ; PVALID # IDEO LEVEL TONE MARK..IDEO ENT
302E..302F ; FREE_PVAL # HANGUL SING DOT TONE MARK..WAVY DAS 302E..302F ; DISALLOWED # HANGUL SING DOT TONE MARK..WAVY DAS
3030 ; FREE_PVAL # WAVY DASH
3031..3035 ; DISALLOWED # VERT KANA REP MARK..VERT KANA REP M 3031..3035 ; DISALLOWED # VERT KANA REP MARK..VERT KANA REP M
3036..303A ; FREE_PVAL # CIRCLED POSTAL MARK..HANGZH NUM THI 3036..303A ; FREE_PVAL # CIRCLED POSTAL MARK..HANGZH NUM THI
303B ; DISALLOWED # VERT IDEO ITER MARK 303B ; DISALLOWED # VERT IDEO ITER MARK
303C ; PVALID # MASU MARK 303C ; PVALID # MASU MARK
303D..303F ; DISALLOWED # PART ALTER MARK..IDEO HALF FILL 303D..303F ; FREE_PVAL # PART ALTER MARK..IDEO HALF FILL
3040 ; UNASSIGNED # <reserved> 3040 ; UNASSIGNED # <reserved>
3041..3096 ; PVALID # HIRAGANA LET SM A..HIRAGANA LET SMA 3041..3096 ; PVALID # HIRAGANA LET SM A..HIRAGANA LET SMA
3097..3098 ; UNASSIGNED # <reserved>..<reserved> 3097..3098 ; UNASSIGNED # <reserved>..<reserved>
3099..309A ; PVALID # COMB KAT-HIR VOICED SOUND 3099..309A ; PVALID # COMB KAT-HIR VOICED SOUND
309B..309C ; FREE_PVAL # KAT-HIR VOICED SOUND MARK..KAT-HIR 309B..309C ; FREE_PVAL # KAT-HIR VOICED SOUND MARK..KAT-HIR
309D..309E ; PVALID # HIRAGANA ITER MARK..HIRAGANA VOICED 309D..309E ; PVALID # HIRAGANA ITER MARK..HIRAGANA VOICED
309F..30A0 ; FREE_PVAL # HIRAGANA DIGRAPH YORI..KAT-HIR DOU 309F..30A0 ; FREE_PVAL # HIRAGANA DIGRAPH YORI..KAT-HIR DOU
30A1..30FA ; PVALID # KATAKANA LET SM A..KATAKANA LET VO 30A1..30FA ; PVALID # KATAKANA LET SM A..KATAKANA LET VO
30FB ; CONTEXTO # KATAKANA MIDDLE DOT 30FB ; CONTEXTO # KATAKANA MIDDLE DOT
30FC..30FE ; PVALID # KAT-HIR PROLONGED SOUND MARK..KATA 30FC..30FE ; PVALID # KAT-HIR PROLONGED SOUND MARK..KATA
skipping to change at page 50, line 32 skipping to change at page 51, line 11
31F0..31FF ; PVALID # KATAKANA LET SM KU..KATAKANA LET SM 31F0..31FF ; PVALID # KATAKANA LET SM KU..KATAKANA LET SM
3200..321E ; FREE_PVAL # PAREN HANGUL KIYEOK..PAREN KOREAN C 3200..321E ; FREE_PVAL # PAREN HANGUL KIYEOK..PAREN KOREAN C
321F ; UNASSIGNED # <reserved> 321F ; UNASSIGNED # <reserved>
3220..32FE ; FREE_PVAL # PAREN IDEO ONE..CIRCLED KATAKANA WO 3220..32FE ; FREE_PVAL # PAREN IDEO ONE..CIRCLED KATAKANA WO
32FF ; UNASSIGNED # <reserved> 32FF ; UNASSIGNED # <reserved>
3300..33FF ; FREE_PVAL # SQUARE APAATO..SQUARE GAL 3300..33FF ; FREE_PVAL # SQUARE APAATO..SQUARE GAL
3400..4DB5 ; PVALID # <CJK Ideograph Extension A> 3400..4DB5 ; PVALID # <CJK Ideograph Extension A>
4DB6..4DBF ; UNASSIGNED # <reserved>..<reserved> 4DB6..4DBF ; UNASSIGNED # <reserved>..<reserved>
4DC0..4DFF ; FREE_PVAL # HEX FOR THE CREATIVE HEAVEN..HEX FO 4DC0..4DFF ; FREE_PVAL # HEX FOR THE CREATIVE HEAVEN..HEX FO
4E00..9FCC ; PVALID # <CJK Ideograph> 4E00..9FCC ; PVALID # <CJK Ideograph>
9FCE..9FFF ; UNASSIGNED # <reserved>..<reserved> 9FCD..9FFF ; UNASSIGNED # <reserved>..<reserved>
A000..A48C ; PVALID # YI SYL IT..YI SYL YYR A000..A48C ; PVALID # YI SYL IT..YI SYL YYR
A48D..A48F ; UNASSIGNED # <reserved>..<reserved> A48D..A48F ; UNASSIGNED # <reserved>..<reserved>
A490..A4C6 ; FREE_PVAL # YI RAD QOT..YI RAD KE A490..A4C6 ; FREE_PVAL # YI RAD QOT..YI RAD KE
A4C7..A4CF ; UNASSIGNED # <reserved>..<reserved> A4C7..A4CF ; UNASSIGNED # <reserved>..<reserved>
A4D0..A4FD ; PVALID # LISU LET BA..LISU LET TONE MYA JEU A4D0..A4FD ; PVALID # LISU LET BA..LISU LET TONE MYA JEU
A4FE..A4FF ; FREE_PVAL # LISU PUNCT COMMA..LISU PUNCT FUL A4FE..A4FF ; FREE_PVAL # LISU PUNCT COMMA..LISU PUNCT FUL
A500..A60C ; PVALID # VAI SYL EE..VAI SYL LENENER A500..A60C ; PVALID # VAI SYL EE..VAI SYL LENENER
A60D..A60F ; FREE_PVAL # VAI COMMA..VAI QUEST MARK A60D..A60F ; FREE_PVAL # VAI COMMA..VAI QUEST MARK
A610..A62B ; PVALID # VAI SYL NDOLE FA..VAI SYL NDOLE DO A610..A62B ; PVALID # VAI SYL NDOLE FA..VAI SYL NDOLE DO
A62C..A63F ; UNASSIGNED # <reserved>..<reserved> A62C..A63F ; UNASSIGNED # <reserved>..<reserved>
skipping to change at page 52, line 47 skipping to change at page 53, line 26
ABFA..ABFF ; UNASSIGNED # <reserved>..<reserved> ABFA..ABFF ; UNASSIGNED # <reserved>..<reserved>
AC00..D7A3 ; PVALID # <Hangul Syllable> AC00..D7A3 ; PVALID # <Hangul Syllable>
D7A4..D7AF ; UNASSIGNED # <reserved>..<reserved> D7A4..D7AF ; UNASSIGNED # <reserved>..<reserved>
D7B0..D7C6 ; DISALLOWED # HANGUL JUNG O-YEO..HANGUL JUNG ARAE D7B0..D7C6 ; DISALLOWED # HANGUL JUNG O-YEO..HANGUL JUNG ARAE
D7C7..D7CA ; UNASSIGNED # <reserved>..<reserved> D7C7..D7CA ; UNASSIGNED # <reserved>..<reserved>
D7CB..D7FB ; DISALLOWED # HANGUL JONG NIEUN-RIEUL..HANGUL JON D7CB..D7FB ; DISALLOWED # HANGUL JONG NIEUN-RIEUL..HANGUL JON
D7FC..D7FF ; UNASSIGNED # <reserved>..<reserved> D7FC..D7FF ; UNASSIGNED # <reserved>..<reserved>
D800..F8FF ; DISALLOWED # <Non Private Use High Surrogate> D800..F8FF ; DISALLOWED # <Non Private Use High Surrogate>
F900..FA6D ; PVALID # CJK COMP IDEO-F900..CJK COMP IDEO F900..FA6D ; PVALID # CJK COMP IDEO-F900..CJK COMP IDEO
FA6E..FA6F ; UNASSIGNED # <reserved>..<reserved> FA6E..FA6F ; UNASSIGNED # <reserved>..<reserved>
FA70..FAD9 ; FREE_PVAL # CJK COMP IDEO-FA70..CJK COMP IDEO FA70..FAD9 ; PVALID # CJK COMP IDEO-FA70..CJK COMP IDEO
FADA..FAFF ; UNASSIGNED # <reserved>..<reserved> FADA..FAFF ; UNASSIGNED # <reserved>..<reserved>
FB00..FB06 ; FREE_PVAL # LAT SM LIG FF..LAT SM LIG ST FB00..FB06 ; FREE_PVAL # LAT SM LIG FF..LAT SM LIG ST
FB07..FB12 ; UNASSIGNED # <reserved>..<reserved> FB07..FB12 ; UNASSIGNED # <reserved>..<reserved>
FB13..FB17 ; FREE_PVAL # ARMENIAN SM LIG MEN NOW..ARMENIAN SM FB13..FB17 ; FREE_PVAL # ARMENIAN SM LIG MEN NOW..ARMENIAN SM
FB18..FB1C ; UNASSIGNED # <reserved>..<reserved> FB18..FB1C ; UNASSIGNED # <reserved>..<reserved>
FB1D..FB1F ; PVALID # HEBR LET YOD W HIRIQ..HEBR LIG YID Y FB1D..FB1F ; PVALID # HEBR LET YOD W HIRIQ..HEBR LIG YID Y
FB20..FB29 ; FREE_PVAL # HEBR LET ALT AYIN..HEB LET ALT PLUS FB20..FB29 ; FREE_PVAL # HEBR LET ALT AYIN..HEB LET ALT PLUS
FB2A..FB36 ; PVALID # HEBR LET SHIN W SHIN DOT..HEBR LET Z FB2A..FB36 ; PVALID # HEBR LET SHIN W SHIN DOT..HEBR LET Z
FB37 ; UNASSIGNED # <reserved> FB37 ; UNASSIGNED # <reserved>
FB38..FB3C ; FREE_PVAL # HEBR LET TET W DAGESH..HEBR LET FB38..FB3C ; PVALID # HEBR LET TET W DAGESH..HEBR LET
FB3D ; UNASSIGNED # <reserved> FB3D ; UNASSIGNED # <reserved>
FB3E ; FREE_PVAL # HEBR LET MEM W DAGESH FB3E ; PVALID # HEBR LET MEM W DAGESH
FB3F ; UNASSIGNED # <reserved> FB3F ; UNASSIGNED # <reserved>
FB40..FB41 ; FREE_PVAL # HEBR LET NUN W DAGESH..HEBR LET FB40..FB41 ; PVALID # HEBR LET NUN W DAGESH..HEBR LET
FB42 ; UNASSIGNED # <reserved> FB42 ; UNASSIGNED # <reserved>
FB43..FB44 ; FREE_PVAL # HEBR LET FIN PE W DAGESH..HEBR L FB43..FB44 ; PVALID # HEBR LET FIN PE W DAGESH..HEBR L
FB45 ; UNASSIGNED # <reserved> FB45 ; UNASSIGNED # <reserved>
FB46..FB4E ; PVALID # HEBR LET TSADI W DAGESH..HEBR LET P FB46..FB4E ; PVALID # HEBR LET TSADI W DAGESH..HEBR LET P
FB4F..FBC1 ; FREE_PVAL # HEBR LIG ALEF LAMED..ARAB SYM S FB4F..FBC1 ; FREE_PVAL # HEBR LIG ALEF LAMED..ARAB SYM S
FBC2..FBD2 ; UNASSIGNED # <reserved>..<reserved> FBC2..FBD2 ; UNASSIGNED # <reserved>..<reserved>
FBD3..FD3F ; FREE_PVAL # ARAB LET NG ISO FORM..ORNATE RIGHT FBD3..FD3F ; FREE_PVAL # ARAB LET NG ISO FORM..ORNATE RIGHT
FD40..FD4F ; UNASSIGNED # <reserved>..<reserved> FD40..FD4F ; UNASSIGNED # <reserved>..<reserved>
FD50..FD8F ; FREE_PVAL # ARAB LIG TEH W JEEM W MEEM INIT FD50..FD8F ; FREE_PVAL # ARAB LIG TEH W JEEM W MEEM INIT
FD90..FD91 ; UNASSIGNED # <reserved>..<reserved> FD90..FD91 ; UNASSIGNED # <reserved>..<reserved>
FD92..FDC7 ; FREE_PVAL # ARAB LIG MEEM W JEEM W KHAH INI FD92..FDC7 ; FREE_PVAL # ARAB LIG MEEM W JEEM W KHAH INI
FDC8..FDCF ; UNASSIGNED # <reserved>..<reserved> FDC8..FDCF ; UNASSIGNED # <reserved>..<reserved>
FDD0..FDEF ; DISALLOWED # <noncharacter>..<noncharacter> FDD0..FDEF ; DISALLOWED # <noncharacter>..<noncharacter>
FDF0..FDFD ; FREE_PVAL # ARAB LIG SALLA USED..ARAB LIG BISMI FDF0..FDFD ; FREE_PVAL # ARAB LIG SALLA USED..ARAB LIG BISMI
FDFE..FDFF ; UNASSIGNED # <reserved>..<reserved> FDFE..FDFF ; UNASSIGNED # <reserved>..<reserved>
FE00..FE0F ; PVALID # VAR SEL-1..VAR SEL-16 FE00..FE0F ; DISALLOWED # VAR SEL-1..VAR SEL-16
FE10..FE19 ; FREE_PVAL # PRES FORM FOR VERT COMMA..PRES FORM FE10..FE19 ; FREE_PVAL # PRES FORM FOR VERT COMMA..PRES FORM
FE1A..FE1F ; UNASSIGNED # <reserved>..<reserved>
FE20..FE26 ; PVALID # COMB LIG LEFT HALF..COMB CONJ MACRO FE20..FE26 ; PVALID # COMB LIG LEFT HALF..COMB CONJ MACRO
FE27..FE2F ; UNASSIGNED # <reserved>..<reserved> FE27..FE2F ; UNASSIGNED # <reserved>..<reserved>
FE30..FE52 ; FREE_PVAL # PRES FORM FOR VERT TWO DOT LEAD..SM FE30..FE52 ; FREE_PVAL # PRES FORM FOR VERT TWO DOT LEAD..SM
FE53 ; UNASSIGNED # <reserved> FE53 ; UNASSIGNED # <reserved>
FE54..FE66 ; FREE_PVAL # SM SEMICOLON..SM EQUALS SIGN FE54..FE66 ; FREE_PVAL # SM SEMICOLON..SM EQUALS SIGN
FE67 ; UNASSIGNED # <reserved> FE67 ; UNASSIGNED # <reserved>
FE68..FE6B ; FREE_PVAL # SM REV SOLIDUS..SM COMM AT FE68..FE6B ; FREE_PVAL # SM REV SOLIDUS..SM COMM AT
FE6C..FE6F ; UNASSIGNED # <reserved>..<reserved> FE6C..FE6F ; UNASSIGNED # <reserved>..<reserved>
FE70..FE72 ; FREE_PVAL # ARAB FATHATAN ISO FORM..ARAB DAMMAT FE70..FE72 ; FREE_PVAL # ARAB FATHATAN ISO FORM..ARAB DAMMAT
FE73 ; PVALID # ARAB TAIL FRAGMENT FE73 ; PVALID # ARAB TAIL FRAGMENT
skipping to change at page 61, line 21 skipping to change at page 61, line 49
1F3C6..1F3CA; FREE_PVAL # TROPHY..SWIMMER 1F3C6..1F3CA; FREE_PVAL # TROPHY..SWIMMER
1F3CB..1F3DF; UNASSIGNED # <reserved>..<reserved> 1F3CB..1F3DF; UNASSIGNED # <reserved>..<reserved>
1F3E0..1F3F0; FREE_PVAL # HOUSE BUILDING..EUROPEAN CASTLE 1F3E0..1F3F0; FREE_PVAL # HOUSE BUILDING..EUROPEAN CASTLE
1F3F1..1F3FF; UNASSIGNED # <reserved>..<reserved> 1F3F1..1F3FF; UNASSIGNED # <reserved>..<reserved>
1F400..1F43E; FREE_PVAL # RAT..PAW PRINTS 1F400..1F43E; FREE_PVAL # RAT..PAW PRINTS
1F43F ; UNASSIGNED # <reserved> 1F43F ; UNASSIGNED # <reserved>
1F440 ; FREE_PVAL # EYES 1F440 ; FREE_PVAL # EYES
1F441 ; UNASSIGNED # <reserved> 1F441 ; UNASSIGNED # <reserved>
1F442..1F4F7; FREE_PVAL # EAR..CAMERA 1F442..1F4F7; FREE_PVAL # EAR..CAMERA
1F4F8 ; UNASSIGNED # <reserved> 1F4F8 ; UNASSIGNED # <reserved>
1F4F9..1F4FC; FREE_PVAL # VIDEOCASSETTE 1F4F9..1F4FC; FREE_PVAL # VIDEO CAMERA..VIDEOCASSETTE
1F4FD..1F4FF; UNASSIGNED # <reserved>..<reserved> 1F4FD..1F4FF; UNASSIGNED # <reserved>..<reserved>
1F500..1F53D; FREE_PVAL # TWISTED RIGHTWARDS ARROWS..DOWN-POINTI 1F500..1F53D; FREE_PVAL # TWISTED RIGHTWARDS ARROWS..DOWN-POINTI
1F53E..1F53F; UNASSIGNED # <reserved>..<reserved> 1F53E..1F53F; UNASSIGNED # <reserved>..<reserved>
1F540..1F543; FREE_PVAL # CIRCLED CROSS POMMEE..NOTCHED LEFT SEM 1F540..1F543; FREE_PVAL # CIRCLED CROSS POMMEE..NOTCHED LEFT SEM
1F544..1F54F; UNASSIGNED # <reserved>..<reserved> 1F544..1F54F; UNASSIGNED # <reserved>..<reserved>
1F550..1F567; FREE_PVAL # CLOCK FACE ONE OCLOCK..CLOCK FACE TWEL 1F550..1F567; FREE_PVAL # CLOCK FACE ONE OCLOCK..CLOCK FACE TWEL
1F568..1F5FA; UNASSIGNED # <reserved>..<reserved> 1F568..1F5FA; UNASSIGNED # <reserved>..<reserved>
1F5FB..1F640; FREE_PVAL # MOUNT FUJI..WEARY CAT FACE 1F5FB..1F640; FREE_PVAL # MOUNT FUJI..WEARY CAT FACE
1F641..1F644; UNASSIGNED # <reserved>..<reserved> 1F641..1F644; UNASSIGNED # <reserved>..<reserved>
1F645..1F650; FREE_PVAL # FACE WITH NO GOOD GESTURE..PERSON W FO 1F645..1F650; FREE_PVAL # FACE W NO GOOD GESTURE..PERSON W FO
1F650..1F67F; UNASSIGNED # <reserved>..<reserved> 1F650..1F67F; UNASSIGNED # <reserved>..<reserved>
1F680..1F6C5; FREE_PVAL # ROCKET..LEFT LUGGAGE 1F680..1F6C5; FREE_PVAL # ROCKET..LEFT LUGGAGE
1F6C6..1F6FF; UNASSIGNED # <reserved>..<reserved> 1F6C6..1F6FF; UNASSIGNED # <reserved>..<reserved>
1F700..1F773; FREE_PVAL # ALCHEMICAL SYMBOL FOR QUINTESSENCE..AL 1F700..1F773; FREE_PVAL # ALCHEMICAL SYMBOL FOR QUINTESSENCE..AL
1F774..1FFFF; UNASSIGNED # <reserved>..<reserved> 1F774..1FFFF; UNASSIGNED # <reserved>..<reserved>
20000..2A6D6; PVALID # <CJK Ideograph Extension B> 20000..2A6D6; PVALID # <CJK Ideograph Extension B>
2A6D7..2A6FF; UNASSIGNED # <reserved>..<reserved> 2A6D7..2A6FF; UNASSIGNED # <reserved>..<reserved>
2A700..2B734; PVALID # <CJK Ideograph Extension C> 2A700..2B734; PVALID # <CJK Ideograph Extension C>
2A735..2A739; UNASSIGNED # <reserved>..<reserved> 2A735..2A739; UNASSIGNED # <reserved>..<reserved>
2A740..2B81D; PVALID # <CJK Ideograph Extension D> 2A740..2B81D; PVALID # <CJK Ideograph Extension D>
2F800..2FA1D; FREE_PVAL # CJK COMP IDEO-2F800..CJK COMPA 2B81E..2F7FF; UNASSIGNED # <reserved>..<reserved>
2F800..2FA1D; PVALID # CJK COMP IDEO-2F800..CJK COMPA
2FA1E..2FFFD; UNASSIGNED # <reserved>..<reserved> 2FA1E..2FFFD; UNASSIGNED # <reserved>..<reserved>
2FFFE..2FFFF; DISALLOWED # <noncharacter>..<noncharacter> 2FFFE..2FFFF; DISALLOWED # <noncharacter>..<noncharacter>
30000..3FFFD; UNASSIGNED # <reserved>..<reserved> 30000..3FFFD; UNASSIGNED # <reserved>..<reserved>
3FFFE..3FFFF; DISALLOWED # <noncharacter>..<noncharacter> 3FFFE..3FFFF; DISALLOWED # <noncharacter>..<noncharacter>
40000..4FFFD; UNASSIGNED # <reserved>..<reserved> 40000..4FFFD; UNASSIGNED # <reserved>..<reserved>
4FFFE..4FFFF; DISALLOWED # <noncharacter>..<noncharacter> 4FFFE..4FFFF; DISALLOWED # <noncharacter>..<noncharacter>
50000..5FFFD; UNASSIGNED # <reserved>..<reserved> 50000..5FFFD; UNASSIGNED # <reserved>..<reserved>
5FFFE..5FFFF; DISALLOWED # <noncharacter>..<noncharacter> 5FFFE..5FFFF; DISALLOWED # <noncharacter>..<noncharacter>
60000..6FFFD; UNASSIGNED # <reserved>..<reserved> 60000..6FFFD; UNASSIGNED # <reserved>..<reserved>
6FFFE..6FFFF; DISALLOWED # <noncharacter>..<noncharacter> 6FFFE..6FFFF; DISALLOWED # <noncharacter>..<noncharacter>
skipping to change at page 62, line 24 skipping to change at page 63, line 5
BFFFE..BFFFF; DISALLOWED # <noncharacter>..<noncharacter> BFFFE..BFFFF; DISALLOWED # <noncharacter>..<noncharacter>
C0000..CFFFD; UNASSIGNED # <reserved>..<reserved> C0000..CFFFD; UNASSIGNED # <reserved>..<reserved>
CFFFE..CFFFF; DISALLOWED # <noncharacter>..<noncharacter> CFFFE..CFFFF; DISALLOWED # <noncharacter>..<noncharacter>
D0000..DFFFD; UNASSIGNED # <reserved>..<reserved> D0000..DFFFD; UNASSIGNED # <reserved>..<reserved>
DFFFE..DFFFF; DISALLOWED # <noncharacter>..<noncharacter> DFFFE..DFFFF; DISALLOWED # <noncharacter>..<noncharacter>
E0000 ; UNASSIGNED # <reserved> E0000 ; UNASSIGNED # <reserved>
E0001 ; DISALLOWED # LANGUAGE TAG E0001 ; DISALLOWED # LANGUAGE TAG
E0002..E001F; UNASSIGNED # <reserved>..<reserved> E0002..E001F; UNASSIGNED # <reserved>..<reserved>
E0020..E007F; DISALLOWED # TAG SPACE..CANCEL TAG E0020..E007F; DISALLOWED # TAG SPACE..CANCEL TAG
E0080..E00FF; UNASSIGNED # <reserved>..<reserved> E0080..E00FF; UNASSIGNED # <reserved>..<reserved>
E0100..E01EF; PVALID # VAR SEL-17..VAR SEL-256 E0100..E01EF; DISALLOWED # VAR SEL-17..VAR SEL-256
E01F0..EFFFD; UNASSIGNED # <reserved>..<reserved> E01F0..EFFFD; UNASSIGNED # <reserved>..<reserved>
EFFFE..10FFFF; DISALLOWED # <noncharacter>..<noncharacter> EFFFE..10FFFF; DISALLOWED # <noncharacter>..<noncharacter>
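The table rows above share one layout: a hexadecimal code point or a "start..end" range, a semicolon, a category name, and a "#" comment. The following is a minimal sketch, not part of either draft, of how such rows could be read and queried. The names parse_line, parse_table, lookup, and SAMPLE are hypothetical, and the sample rows are copied from the right-hand (-12) column above. Lines that do not match the layout, such as "skipping to change" markers, are ignored.

```python
import re
from typing import List, Optional, Tuple

# One table row: HEX[..HEX] ; CATEGORY  (the trailing "# comment" is ignored).
LINE_RE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*([A-Z_]+)"
)

Entry = Tuple[int, int, str]  # (first code point, last code point, category)


def parse_line(line: str) -> Optional[Entry]:
    """Parse a single row; return None for lines that are not table rows."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    start = int(m.group(1), 16)
    # A row with no "..end" part describes a single code point.
    end = int(m.group(2), 16) if m.group(2) else start
    return (start, end, m.group(3))


def parse_table(text: str) -> List[Entry]:
    """Parse every recognizable row in text, sorted by starting code point."""
    entries = [e for e in map(parse_line, text.splitlines()) if e is not None]
    return sorted(entries)


def lookup(entries: List[Entry], cp: int) -> Optional[str]:
    """Return the category of code point cp, or None if no row covers it."""
    for start, end, category in entries:
        if start <= cp <= end:
            return category
    return None


# Rows copied from the -12 column of the table above.
SAMPLE = """\
180B..180E ; DISALLOWED # MONG FREE VAR SEL ONE..MONG VOW SEP
180F ; UNASSIGNED # <reserved>
1810..1819 ; PVALID # MONG DIG ZERO..MONG DIG NINE
skipping to change at page 61, line 49
1F3C6..1F3CA; FREE_PVAL # TROPHY..SWIMMER
"""

if __name__ == "__main__":
    table = parse_table(SAMPLE)
    for cp in (0x180E, 0x180F, 0x1815, 0x1F3C8):
        print(f"U+{cp:04X} -> {lookup(table, cp)}")
```

A linear scan keeps the sketch short. For a full table, a binary search over the sorted start values would be the more usual choice.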
Appendix B. Acknowledgements Appendix B. Acknowledgements
The authors would like to acknowledge the comments and contributions The authors would like to acknowledge the comments and contributions
of the following individuals: David Black, Mark Davis, Alan DeKok, of the following individuals: David Black, Mark Davis, Alan DeKok,
Martin Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Paul Martin Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Paul
Hoffman, Jeffrey Hutzelman, Simon Josefsson, John Klensin, Alexey Hoffman, Jeffrey Hutzelman, Simon Josefsson, John Klensin, Alexey
Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker, Pete Resnick, Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker, Pete Resnick,
End of changes. 119 change blocks.
242 lines changed or deleted; 263 lines changed or added.
