draft-ietf-precis-framework-20.txt   draft-ietf-precis-framework-21.txt 
PRECIS P. Saint-Andre PRECIS P. Saint-Andre
Internet-Draft &yet Internet-Draft &yet
Obsoletes: 3454 (if approved) M. Blanchet Obsoletes: 3454 (if approved) M. Blanchet
Intended status: Standards Track Viagenie Intended status: Standards Track Viagenie
Expires: May 25, 2015 November 21, 2014 Expires: June 13, 2015 December 10, 2014
PRECIS Framework: Preparation, Enforcement, and Comparison of PRECIS Framework: Preparation, Enforcement, and Comparison of
Internationalized Strings in Application Protocols Internationalized Strings in Application Protocols
draft-ietf-precis-framework-20 draft-ietf-precis-framework-21
Abstract Abstract
Application protocols using Unicode characters in protocol strings Application protocols using Unicode characters in protocol strings
need to properly handle such strings in order to enforce need to properly handle such strings in order to enforce
internationalization rules for strings placed in various protocol internationalization rules for strings placed in various protocol
slots (such as addresses and identifiers) and to perform valid slots (such as addresses and identifiers) and to perform valid
comparison operations (e.g., for purposes of authentication or comparison operations (e.g., for purposes of authentication or
authorization). This document defines a framework enabling authorization). This document defines a framework enabling
application protocols to perform the preparation, enforcement, and application protocols to perform the preparation, enforcement, and
skipping to change at page 1, line 44 skipping to change at page 1, line 44
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 25, 2015. This Internet-Draft will expire on June 13, 2015.
Copyright Notice Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 27 skipping to change at page 2, line 27
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. Preparation, Enforcement, and Comparison . . . . . . . . . . 6 3. Preparation, Enforcement, and Comparison . . . . . . . . . . 6
4. String Classes . . . . . . . . . . . . . . . . . . . . . . . 7 4. String Classes . . . . . . . . . . . . . . . . . . . . . . . 7
4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 7 4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 7
4.2. IdentifierClass . . . . . . . . . . . . . . . . . . . . . 8 4.2. IdentifierClass . . . . . . . . . . . . . . . . . . . . . 9
4.2.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2.2. Contextual Rule Required . . . . . . . . . . . . . . 9 4.2.2. Contextual Rule Required . . . . . . . . . . . . . . 9
4.2.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 9 4.2.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 9
4.2.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 10 4.2.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 10
4.2.5. Examples . . . . . . . . . . . . . . . . . . . . . . 10 4.2.5. Examples . . . . . . . . . . . . . . . . . . . . . . 10
4.3. FreeformClass . . . . . . . . . . . . . . . . . . . . . . 10 4.3. FreeformClass . . . . . . . . . . . . . . . . . . . . . . 10
4.3.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 11 4.3.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3.2. Contextual Rule Required . . . . . . . . . . . . . . 11 4.3.2. Contextual Rule Required . . . . . . . . . . . . . . 11
4.3.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 11 4.3.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 11
4.3.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 12 4.3.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 12
4.3.5. Examples . . . . . . . . . . . . . . . . . . . . . . 12 4.3.5. Examples . . . . . . . . . . . . . . . . . . . . . . 12
5. Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5. Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.1. Profiles Must Not Be Multiplied Beyond Necessity . . . . 12 5.1. Profiles Must Not Be Multiplied Beyond Necessity . . . . 12
5.2. Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.2. Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2.1. Width Mapping Rule . . . . . . . . . . . . . . . . . 13 5.2.1. Width Mapping Rule . . . . . . . . . . . . . . . . . 13
5.2.2. Additional Mapping Rule . . . . . . . . . . . . . . . 13 5.2.2. Additional Mapping Rule . . . . . . . . . . . . . . . 13
5.2.3. Case Mapping Rule . . . . . . . . . . . . . . . . . . 14 5.2.3. Case Mapping Rule . . . . . . . . . . . . . . . . . . 14
5.2.4. Normalization Rule . . . . . . . . . . . . . . . . . 14 5.2.4. Normalization Rule . . . . . . . . . . . . . . . . . 14
5.2.5. Directionality Rule . . . . . . . . . . . . . . . . . 14 5.2.5. Directionality Rule . . . . . . . . . . . . . . . . . 15
5.3. Further Excluded Characters . . . . . . . . . . . . . . . 15 5.3. A Note about Spaces . . . . . . . . . . . . . . . . . . . 15
5.4. Building Application-Layer Constructs . . . . . . . . . . 15 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . 16
5.5. A Note about Spaces . . . . . . . . . . . . . . . . . . . 16 6.1. How to Use PRECIS in Applications . . . . . . . . . . . . 16
6. Order of Operations . . . . . . . . . . . . . . . . . . . . . 17 6.2. Further Excluded Characters . . . . . . . . . . . . . . . 16
7. Code Point Properties . . . . . . . . . . . . . . . . . . . . 17 6.3. Building Application-Layer Constructs . . . . . . . . . . 17
8. Category Definitions Used to Calculate Derived Property . . . 20 7. Order of Operations . . . . . . . . . . . . . . . . . . . . . 18
8.1. LetterDigits (A) . . . . . . . . . . . . . . . . . . . . 20 8. Code Point Properties . . . . . . . . . . . . . . . . . . . . 18
8.2. Unstable (B) . . . . . . . . . . . . . . . . . . . . . . 20 9. Category Definitions Used to Calculate Derived Property . . . 21
8.3. IgnorableProperties (C) . . . . . . . . . . . . . . . . . 21 9.1. LetterDigits (A) . . . . . . . . . . . . . . . . . . . . 21
8.4. IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . . 21 9.2. Unstable (B) . . . . . . . . . . . . . . . . . . . . . . 21
8.5. LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . . 21 9.3. IgnorableProperties (C) . . . . . . . . . . . . . . . . . 22
8.6. Exceptions (F) . . . . . . . . . . . . . . . . . . . . . 21 9.4. IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . . 22
8.7. BackwardCompatible (G) . . . . . . . . . . . . . . . . . 21 9.5. LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . . 22
8.8. JoinControl (H) . . . . . . . . . . . . . . . . . . . . . 21 9.6. Exceptions (F) . . . . . . . . . . . . . . . . . . . . . 22
8.9. OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . . 22 9.7. BackwardCompatible (G) . . . . . . . . . . . . . . . . . 22
8.10. Unassigned (J) . . . . . . . . . . . . . . . . . . . . . 22 9.8. JoinControl (H) . . . . . . . . . . . . . . . . . . . . . 22
8.11. ASCII7 (K) . . . . . . . . . . . . . . . . . . . . . . . 22 9.9. OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . . 22
8.12. Controls (L) . . . . . . . . . . . . . . . . . . . . . . 22 9.10. Unassigned (J) . . . . . . . . . . . . . . . . . . . . . 23
8.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . . 22 9.11. ASCII7 (K) . . . . . . . . . . . . . . . . . . . . . . . 23
8.14. Spaces (N) . . . . . . . . . . . . . . . . . . . . . . . 23 9.12. Controls (L) . . . . . . . . . . . . . . . . . . . . . . 23
8.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . . 23 9.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . . 23
8.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . . 23 9.14. Spaces (N) . . . . . . . . . . . . . . . . . . . . . . . 23
8.17. HasCompat (Q) . . . . . . . . . . . . . . . . . . . . . . 23 9.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . . 23
8.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . . 23 9.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . . 24
9. Guidelines for Designated Experts . . . . . . . . . . . . . . 23 9.17. HasCompat (Q) . . . . . . . . . . . . . . . . . . . . . . 24
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 9.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . . 24
10.1. PRECIS Derived Property Value Registry . . . . . . . . . 24 10. Guidelines for Designated Experts . . . . . . . . . . . . . . 24
10.2. PRECIS Base Classes Registry . . . . . . . . . . . . . . 24 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25
10.3. PRECIS Profiles Registry . . . . . . . . . . . . . . . . 25 11.1. PRECIS Derived Property Value Registry . . . . . . . . . 25
11. Security Considerations . . . . . . . . . . . . . . . . . . . 27 11.2. PRECIS Base Classes Registry . . . . . . . . . . . . . . 25
11.1. General Issues . . . . . . . . . . . . . . . . . . . . . 27 11.3. PRECIS Profiles Registry . . . . . . . . . . . . . . . . 26
11.2. Use of the IdentifierClass . . . . . . . . . . . . . . . 27 12. Security Considerations . . . . . . . . . . . . . . . . . . . 28
11.3. Use of the FreeformClass . . . . . . . . . . . . . . . . 28 12.1. General Issues . . . . . . . . . . . . . . . . . . . . . 28
11.4. Local Character Set Issues . . . . . . . . . . . . . . . 28 12.2. Use of the IdentifierClass . . . . . . . . . . . . . . . 29
11.5. Visually Similar Characters . . . . . . . . . . . . . . 28 12.3. Use of the FreeformClass . . . . . . . . . . . . . . . . 29
11.6. Security of Passwords . . . . . . . . . . . . . . . . . 30 12.4. Local Character Set Issues . . . . . . . . . . . . . . . 29
12. Interoperability Considerations . . . . . . . . . . . . . . . 31 12.5. Visually Similar Characters . . . . . . . . . . . . . . 29
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 32 12.6. Security of Passwords . . . . . . . . . . . . . . . . . 31
13.1. Normative References . . . . . . . . . . . . . . . . . . 32 13. Interoperability Considerations . . . . . . . . . . . . . . . 32
13.2. Informative References . . . . . . . . . . . . . . . . . 32 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 33
13.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 35 14.1. Normative References . . . . . . . . . . . . . . . . . . 33
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 35 14.2. Informative References . . . . . . . . . . . . . . . . . 33
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35 14.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 36
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 37
1. Introduction 1. Introduction
Application protocols using Unicode characters [Unicode7.0] in Application protocols using Unicode characters [Unicode7.0] in
protocol strings need to properly handle such strings in order to protocol strings need to properly handle such strings in order to
enforce internationalization rules for strings placed in various enforce internationalization rules for strings placed in various
protocol slots (such as addresses and identifiers) and to perform protocol slots (such as addresses and identifiers) and to perform
valid comparison operations (e.g., for purposes of authentication or valid comparison operations (e.g., for purposes of authentication or
authorization). This document defines a framework enabling authorization). This document defines a framework enabling
application protocols to perform the preparation, enforcement, and application protocols to perform the preparation, enforcement, and
skipping to change at page 5, line 25 skipping to change at page 5, line 27
o End users will be able to acquire more accurate expectations about o End users will be able to acquire more accurate expectations about
the characters that are acceptable in various contexts. Given the characters that are acceptable in various contexts. Given
this more uniform set of string classes, it is also expected that this more uniform set of string classes, it is also expected that
copy/paste operations between software implementing different copy/paste operations between software implementing different
application protocols will be more predictable and coherent. application protocols will be more predictable and coherent.
Whereas the string classes define the "baseline" code points for a Whereas the string classes define the "baseline" code points for a
range of applications, profiling enables application protocols to range of applications, profiling enables application protocols to
apply the string classes in ways that are appropriate for common apply the string classes in ways that are appropriate for common
constructs such as usernames and passwords constructs such as usernames [I-D.ietf-precis-saslprepbis], opaque
[I-D.ietf-precis-saslprepbis], nicknames [I-D.ietf-precis-nickname], strings such as passwords [I-D.ietf-precis-saslprepbis], and
the localparts of account names [I-D.ietf-xmpp-6122bis], and free- nicknames [I-D.ietf-precis-nickname]. Profiles are responsible for
form strings [I-D.ietf-xmpp-6122bis]. Profiles are responsible for
defining the handling of right-to-left characters as well as various defining the handling of right-to-left characters as well as various
mapping operations of the kind also discussed for IDNs in [RFC5895], mapping operations of the kind also discussed for IDNs in [RFC5895],
such as case preservation or lowercasing, Unicode normalization, such as case preservation or lowercasing, Unicode normalization,
mapping of certain characters to other characters or to nothing, and mapping of certain characters to other characters or to nothing, and
mapping of full-width and half-width characters. mapping of full-width and half-width characters.
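As an illustration of one such mapping, the following Python sketch maps full-width and half-width characters to their decomposition mappings, i.e., the mappings tagged <wide> or <narrow> in the Unicode Character Database. It assumes a profile that wants width mapping and nothing else. The function name width_map is hypothetical and is not part of any PRECIS specification.

```python
import unicodedata


def width_map(s: str) -> str:
    """Map full-width and half-width characters to their decompositions.

    Only decompositions tagged <wide> or <narrow> in the Unicode
    Character Database are applied; all other characters pass through
    unchanged (unlike a blanket NFKC, which would also fold ligatures,
    superscripts, and so on).
    """
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide> ", "<narrow> ")):
            # Skip the tag, then convert each hex code point.
            out.extend(chr(int(h, 16)) for h in decomp.split()[1:])
        else:
            out.append(ch)
    return "".join(out)


print(width_map("\uff33\uff54\uff30\uff45\uff54\uff45\uff52"))  # StPeter
```

Note that the ligature U+FB01 carries a <compat> tag rather than <wide> or <narrow>, so this sketch leaves it untouched.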
When an application applies a profile of a PRECIS string class, it When an application applies a profile of a PRECIS string class, it
can achieve the following objectives: transforms an input string (which might or might not be conforming)
into an output string that definitively conforms to the profile. In
particular, this document focuses on the resulting ability to achieve
the following objectives:
a. Determine if a given string conforms to the profile, thus a. Enforcing all the rules of a profile for a single output
enabling enforcement of the rules (e.g., to determine if a string string (e.g., to determine if a string can be included in a protocol
is allowed for use in the relevant protocol slot specified by an slot, communicated to another entity within a protocol, stored in
application protocol). a retrieval system, etc.).
b. Determine if any two given strings are equivalent, thus enabling b. Comparing two output strings to determine if they are equivalent,
comparison (e.g., to make an access decision for purposes of typically through octet-for-octet matching to test for "bit-
authentication or authorization as further described in string identity" (e.g., to make an access decision for purposes
of authentication or authorization as further described in
[RFC6943]). [RFC6943]).
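A minimal Python sketch of the two objectives follows. It assumes a hypothetical profile whose only rules are lowercasing and Unicode Normalization Form C (NFC); the names enforce and compare are illustrative and are not defined by this framework. Comparison is performed octet-for-octet on the UTF-8 encodings of the two output strings, i.e., as a test for bit-string identity.

```python
import unicodedata


def enforce(s: str) -> str:
    """Hypothetical profile: lowercase, then NFC; reject empty output.

    A real profile would also apply the behavioral rules of its string
    class and any width, additional, and directionality rules it defines.
    """
    out = unicodedata.normalize("NFC", s.lower())
    if not out:
        raise ValueError("empty output string")
    return out


def compare(a: str, b: str) -> bool:
    """Enforce both inputs, then test for bit-string identity."""
    return enforce(a).encode("utf-8") == enforce(b).encode("utf-8")


# U+00E9 (precomposed) and "E" + U+0301 (combining acute) compare equal.
print(compare("Caf\u00e9", "CAFE\u0301"))  # True
print(compare("stpeter", "st.peter"))      # False
```

Comparing only output strings, never raw input, is what lets two differently encoded inputs match.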
The opportunity to define profiles naturally introduces the The opportunity to define profiles naturally introduces the
possibility of a proliferation of profiles, thus potentially possibility of a proliferation of profiles, thus potentially
mitigating the benefits of common code and violating user mitigating the benefits of common code and violating user
expectations. See Section 5 for a discussion of this important expectations. See Section 5 for a discussion of this important
topic. topic.
In addition, it is extremely important for protocol designers and
application developers to understand that the transformation of an
input string to an output string is rarely reversible. As one
relatively simple example, case mapping would transform an input
string of "StPeter" to "stpeter", and information about the
capitalization of the first and third characters would be lost.
Similar considerations apply to other forms of mapping and
normalization.
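The loss of information can be seen directly: under a case mapping rule (modeled here, as an assumption, by Python's default lowercasing), several distinct inputs collapse to a single output, so no inverse function can recover the original input.

```python
def case_map(s: str) -> str:
    # Assumed case mapping rule: Unicode default lowercasing.
    return s.lower()


inputs = ["StPeter", "stpeter", "STPETER", "stPeter"]
outputs = {case_map(s) for s in inputs}

# Four distinct inputs, one output: the mapping is not injective.
print(len(set(inputs)), len(outputs))  # 4 1
print(outputs)                          # {'stpeter'}
```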
Although this framework is similar to IDNA2008 and includes by Although this framework is similar to IDNA2008 and includes by
reference some of the character categories defined in [RFC5892], it reference some of the character categories defined in [RFC5892], it
defines additional character categories to meet the needs of common defines additional character categories to meet the needs of common
application protocols. application protocols other than DNS.
The character categories and calculation rules defined under The character categories and calculation rules defined under
Section 7 and Section 8 are normative and apply to all Unicode code Section 8 and Section 9 are normative and apply to all Unicode code
points. The code point table that results from applying the points. The code point table that results from applying the
character categories and calculation rules to the latest version of character categories and calculation rules to the latest version of
Unicode can be found in an IANA registry. Unicode can be found in an IANA registry.
2. Terminology 2. Terminology
Many important terms used in this document are defined in [RFC5890], Many important terms used in this document are defined in [RFC5890],
[RFC6365], [RFC6885], and [Unicode7.0]. The terms "left-to-right" [RFC6365], [RFC6885], and [Unicode7.0]. The terms "left-to-right"
(LTR) and "right-to-left" (RTL) are defined in Unicode Standard Annex (LTR) and "right-to-left" (RTL) are defined in Unicode Standard Annex
#9 [UAX9]. #9 [UAX9].
skipping to change at page 8, line 8 skipping to change at page 8, line 18
FreeformClass: a sequence of letters, numbers, symbols, spaces, and FreeformClass: a sequence of letters, numbers, symbols, spaces, and
other characters that is used for free-form strings, including other characters that is used for free-form strings, including
passwords as well as display elements such as human-friendly passwords as well as display elements such as human-friendly
nicknames for devices or for participants in a chatroom; the nicknames for devices or for participants in a chatroom; the
intent is that this class will allow nearly any Unicode character, intent is that this class will allow nearly any Unicode character,
with the result that expressiveness has been prioritized over with the result that expressiveness has been prioritized over
safety for this class. Note well that protocol designers, safety for this class. Note well that protocol designers,
application developers, service providers, and end users might not application developers, service providers, and end users might not
understand or be able to enter all of the characters that can be understand or be able to enter all of the characters that can be
included in the FreeformClass - see Section 11.3 for details. included in the FreeformClass - see Section 12.3 for details.
Future specifications might define additional PRECIS string classes, Future specifications might define additional PRECIS string classes,
such as a class that falls somewhere between the IdentifierClass and such as a class that falls somewhere between the IdentifierClass and
the FreeformClass. At this time, it is not clear how useful such a the FreeformClass. At this time, it is not clear how useful such a
class would be. In any case, because application developers are able class would be. In any case, because application developers are able
to define profiles of PRECIS string classes, a protocol needing a to define profiles of PRECIS string classes, a protocol needing a
construct between the IdentifierClass and the FreeformClass could construct between the IdentifierClass and the FreeformClass could
define a restricted profile of the FreeformClass if needed. define a restricted profile of the FreeformClass if needed.
The following subsections discuss the IdentifierClass and The following subsections discuss the IdentifierClass and
FreeformClass in more detail, with reference to the dimensions FreeformClass in more detail, with reference to the dimensions
described in Section 3 of [RFC6885]. Each string class is defined by described in Section 3 of [RFC6885]. Each string class is defined by
the following behavioral rules: the following behavioral rules:
Valid: Defines which code points and character categories are Valid: Defines which code points are treated as valid for the
treated as valid input to the string. string.
Contextual Rule Required: Defines which code points and character Contextual Rule Required: Defines which code points are treated as
categories are treated as allowed only if the requirements of a allowed only if the requirements of a contextual rule are met
contextual rule are met (i.e., either CONTEXTJ or CONTEXTO). (i.e., either CONTEXTJ or CONTEXTO).
Disallowed: Defines which code points and character categories need Disallowed: Defines which code points need to be excluded from the
to be excluded from the string. string.
Unassigned: Defines application behavior in the presence of code Unassigned: Defines application behavior in the presence of code
points that are unknown (i.e., not yet designated) for the version points that are unknown (i.e., not yet designated) for the version
of Unicode used by the application. of Unicode used by the application.
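The following Python sketch shows how the four behavioral rules might be applied to a string, given some per-code-point table of derived property values. The table (toy_lookup) and the contextual-rule checker passed in are hypothetical stand-ins; a real implementation would use the registered derived property values and real CONTEXTJ/CONTEXTO rules such as those defined for IDNA2008 [RFC5892]. Unassigned code points are rejected, matching the IdentifierClass treatment in Section 4.2.4.

```python
from enum import Enum
from typing import Callable, Dict


class Prop(Enum):
    PVALID = "PVALID"
    CONTEXTJ = "CONTEXTJ"
    CONTEXTO = "CONTEXTO"
    DISALLOWED = "DISALLOWED"
    UNASSIGNED = "UNASSIGNED"


def conforms(s: str,
             lookup: Callable[[int], Prop],
             context_ok: Callable[[str, int], bool]) -> bool:
    """Return True if every code point in s is allowed.

    lookup maps a code point to its derived property value; context_ok
    evaluates the contextual rule for the code point at index i of s.
    """
    for i, ch in enumerate(s):
        prop = lookup(ord(ch))
        if prop is Prop.PVALID:
            continue
        if prop in (Prop.CONTEXTJ, Prop.CONTEXTO) and context_ok(s, i):
            continue
        # DISALLOWED, UNASSIGNED, or a contextual rule that failed.
        return False
    return True


# Hypothetical toy table; anything not listed is treated as UNASSIGNED.
TOY_TABLE: Dict[int, Prop] = {
    ord("a"): Prop.PVALID,
    ord("b"): Prop.PVALID,
    ord(" "): Prop.DISALLOWED,
    0x200D: Prop.CONTEXTJ,  # ZERO WIDTH JOINER
}


def toy_lookup(cp: int) -> Prop:
    return TOY_TABLE.get(cp, Prop.UNASSIGNED)


def no_context(s: str, i: int) -> bool:
    return False  # placeholder: every contextual rule fails


print(conforms("ab", toy_lookup, no_context))        # True
print(conforms("a b", toy_lookup, no_context))       # False
print(conforms("a\u200db", toy_lookup, no_context))  # False
print(conforms("abc", toy_lookup, no_context))       # False (c unassigned)
```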
This document defines the valid, contextual rule required, This document defines the valid, contextual rule required,
disallowed, and unassigned rules for the IdentifierClass and disallowed, and unassigned rules for the IdentifierClass and
FreeformClass. As described under Section 5, profiles of these FreeformClass. As described under Section 5, profiles of these
string classes are responsible for defining the width mapping, string classes are responsible for defining the width mapping,
additional mappings, case mapping, normalization, and directionality additional mappings, case mapping, normalization, and directionality
skipping to change at page 9, line 9 skipping to change at page 9, line 19
Most application technologies need strings that can be used to refer Most application technologies need strings that can be used to refer
to, include, or communicate protocol strings like usernames, file to, include, or communicate protocol strings like usernames, file
names, data feed identifiers, and chatroom names. We group such names, data feed identifiers, and chatroom names. We group such
strings into a class called "IdentifierClass" having the following strings into a class called "IdentifierClass" having the following
features. features.
4.2.1. Valid 4.2.1. Valid
o Code points traditionally used as letters and numbers in writing o Code points traditionally used as letters and numbers in writing
systems, i.e., the LetterDigits ("A") category first defined in systems, i.e., the LetterDigits ("A") category first defined in
[RFC5892] and listed here under Section 8.1. [RFC5892] and listed here under Section 9.1.
o Code points in the range U+0021 through U+007E, i.e., the o Code points in the range U+0021 through U+007E, i.e., the
(printable) ASCII7 ("K") rule defined under Section 8.11. These (printable) ASCII7 ("K") rule defined under Section 9.11. These
code points are "grandfathered" into PRECIS and thus are valid code points are "grandfathered" into PRECIS and thus are valid
even if they would otherwise be disallowed according to the even if they would otherwise be disallowed according to the
property-based rules specified in the next section. property-based rules specified in the next section.
Note: Although the PRECIS IdentifierClass re-uses the LetterDigits Note: Although the PRECIS IdentifierClass re-uses the LetterDigits
category from IDNA2008, the range of characters allowed in the category from IDNA2008, the range of characters allowed in the
IdentifierClass is wider than the range of characters allowed in IdentifierClass is wider than the range of characters allowed in
IDNA2008. The main reason is that IDNA2008 applies the Unstable IDNA2008. The main reason is that IDNA2008 applies the Unstable
category before the LetterDigits category, thus disallowing category before the LetterDigits category, thus disallowing
uppercase characters, whereas the IdentifierClass does not apply uppercase characters, whereas the IdentifierClass does not apply
the Unstable category. the Unstable category.
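The effect of skipping the Unstable category can be sketched in Python. The two predicates below follow the definitions of LetterDigits (A) and Unstable (B) in [RFC5892]; the comparison deliberately ignores every other rule (exceptions, compatibility equivalents, and so on), so it illustrates only the point made in the note above and is not a full derivation for either IDNA2008 or PRECIS.

```python
import unicodedata

LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def is_letter_digit(ch: str) -> bool:
    # LetterDigits (A): General_Category in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}.
    return unicodedata.category(ch) in LETTER_DIGITS


def is_unstable(ch: str) -> bool:
    # Unstable (B): toNFKC(toCaseFold(toNFKC(cp))) != cp.
    return nfkc(nfkc(ch).casefold()) != ch


for ch in ["a", "A", "\u00c9"]:
    # Simplified: IDNA2008 tests Unstable before LetterDigits ...
    idna2008 = is_letter_digit(ch) and not is_unstable(ch)
    # ... whereas the IdentifierClass does not apply Unstable at all.
    precis_identifier = is_letter_digit(ch)
    print(f"U+{ord(ch):04X} IDNA2008={idna2008} IdentifierClass={precis_identifier}")
```

For "A" (U+0041) and U+00C9 the IDNA2008 test fails because case folding changes the character, while the IdentifierClass test succeeds.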
4.2.2. Contextual Rule Required 4.2.2. Contextual Rule Required
o A number of characters from the Exceptions ("F") category defined o A number of characters from the Exceptions ("F") category defined
under Section 8.6 (see Section 8.6 for a full list). under Section 9.6 (see Section 9.6 for a full list).
o Joining characters, i.e., the JoinControl ("H") category defined o Joining characters, i.e., the JoinControl ("H") category defined
under Section 8.8. under Section 9.8.
4.2.3. Disallowed 4.2.3. Disallowed
o Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category o Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category
defined under Section 8.9. defined under Section 9.9.
o Control characters, i.e., the Controls ("L") category defined o Control characters, i.e., the Controls ("L") category defined
under Section 8.12. under Section 9.12.
o Ignorable characters, i.e., the PrecisIgnorableProperties ("M") o Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
category defined under Section 8.13. category defined under Section 9.13.
o Space characters, i.e., the Spaces ("N") category defined under o Space characters, i.e., the Spaces ("N") category defined under
Section 8.14. Section 9.14.
o Symbol characters, i.e., the Symbols ("O") category defined under o Symbol characters, i.e., the Symbols ("O") category defined under
Section 8.15. Section 9.15.
o Punctuation characters, i.e., the Punctuation ("P") category o Punctuation characters, i.e., the Punctuation ("P") category
defined under Section 8.16. defined under Section 9.16.
o Any character that has a compatibility equivalent, i.e., the o Any character that has a compatibility equivalent, i.e., the
HasCompat ("Q") category defined under Section 8.17. These code HasCompat ("Q") category defined under Section 9.17. These code
points are disallowed even if they would otherwise be valid points are disallowed even if they would otherwise be valid
according to the property-based rules specified in the previous according to the property-based rules specified in the previous
section. section.
o Letters and digits other than the "traditional" letters and digits o Letters and digits other than the "traditional" letters and digits
allowed in IDNs, i.e., the OtherLetterDigits ("R") category allowed in IDNs, i.e., the OtherLetterDigits ("R") category
defined under Section 8.18. defined under Section 9.18.
4.2.4. Unassigned 4.2.4. Unassigned
Any code points that are not yet designated in the Unicode character Any code points that are not yet designated in the Unicode character
set are considered Unassigned for purposes of the IdentifierClass, set are considered Unassigned for purposes of the IdentifierClass,
and such code points are to be treated as Disallowed. See and such code points are to be treated as Disallowed. See
Section 8.10. Section 9.10.
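Pulling the IdentifierClass rules together, the following Python sketch approximates the derived property of a single code point using Python's unicodedata module. It is a simplification under stated assumptions: the Exceptions (F), BackwardCompatible (G), and OldHangulJamo (I) categories and the Default_Ignorable_Code_Point part of PrecisIgnorableProperties (M) are not evaluated, because unicodedata does not expose them directly, and no contextual rule is run. The ordering of the tests is an assumption modeled on the calculation rules of this framework (Section 8), and results depend on the Unicode version bundled with the Python interpreter, which need not be Unicode 7.0.

```python
import unicodedata

LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}  # A


def is_noncharacter(cp: int) -> bool:
    # Noncharacter_Code_Point: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def identifier_property(ch: str) -> str:
    """Approximate IdentifierClass derived property of one code point."""
    cp = ord(ch)
    cat = unicodedata.category(ch)
    if cat == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"      # J; treated as DISALLOWED (Section 4.2.4)
    if 0x21 <= cp <= 0x7E:
        return "PVALID"          # K (ASCII7), grandfathered
    if cp in (0x200C, 0x200D):
        return "CONTEXTJ"        # H (JoinControl)
    if is_noncharacter(cp) or cat == "Cc":
        return "DISALLOWED"      # M (noncharacters only), L (Controls)
    if unicodedata.normalize("NFKC", ch) != ch:
        return "DISALLOWED"      # Q (HasCompat)
    if cat in LETTER_DIGITS:
        return "PVALID"          # A (LetterDigits)
    return "DISALLOWED"          # R, N, O, P, and anything else


def identifier_conforms(s: str) -> bool:
    # Conservative: CONTEXTJ code points are rejected because no
    # contextual rule is evaluated in this sketch.
    return bool(s) and all(identifier_property(c) == "PVALID" for c in s)
```

In this sketch "!" is PVALID because of the ASCII7 rule, whereas U+20AC EURO SIGN and U+FF21 FULLWIDTH LATIN CAPITAL LETTER A are DISALLOWED (as a symbol and as a compatibility character, respectively).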
4.2.5. Examples 4.2.5. Examples
As described in the Introduction to this document, the string classes As described in the Introduction to this document, the string classes
do not handle all issues related to string preparation and comparison do not handle all issues related to string preparation and comparison
(such as case mapping); instead, such issues are handled at the level (such as case mapping); instead, such issues are handled at the level
of profiles. Examples for two profiles of the IdentifierClass can be of profiles. Examples for two profiles of the IdentifierClass can be
found in [I-D.ietf-precis-saslprepbis] (the UsernameIdentifierClass found in [I-D.ietf-precis-saslprepbis] (the UsernameIdentifierClass
profile) and in [I-D.ietf-xmpp-6122bis] (the LocalpartIdentifierClass profile) and in [I-D.ietf-xmpp-6122bis] (the LocalpartIdentifierClass
profile). profile).
4.3. FreeformClass 4.3. FreeformClass
Some application technologies need strings that can be used in a Some application technologies need strings that can be used in a
free-form way, e.g., as a password in an authentication exchange (see free-form way, e.g., as a password in an authentication exchange (see
[I-D.ietf-precis-saslprepbis]) or a nickname in a chatroom (see [I-D.ietf-precis-saslprepbis]) or a nickname in a chatroom (see
[I-D.ietf-precis-nickname]). We group such things into a class [I-D.ietf-precis-nickname]). We group such things into a class
called "FreeformClass" having the following features. called "FreeformClass" having the following features.
Security Warning: As mentioned, the FreeformClass prioritizes Security Warning: As mentioned, the FreeformClass prioritizes
expressiveness over safety; Section 11.3 describes some of the expressiveness over safety; Section 12.3 describes some of the
security hazards involved with using or profiling the security hazards involved with using or profiling the
FreeformClass. FreeformClass.
Security Warning: Consult Section 11.6 for relevant security Security Warning: Consult Section 12.6 for relevant security
considerations when strings conforming to the FreeformClass, or a considerations when strings conforming to the FreeformClass, or a
profile thereof, are used as passwords. profile thereof, are used as passwords.
4.3.1. Valid 4.3.1. Valid
o Traditional letters and numbers, i.e., the LetterDigits ("A") o Traditional letters and numbers, i.e., the LetterDigits ("A")
category first defined in [RFC5892] and listed here under category first defined in [RFC5892] and listed here under
Section 8.1. Section 9.1.
o Letters and digits other than the "traditional" letters and digits o Letters and digits other than the "traditional" letters and digits
allowed in IDNs, i.e., the OtherLetterDigits ("R") category allowed in IDNs, i.e., the OtherLetterDigits ("R") category
defined under Section 8.18. defined under Section 9.18.
o Code points in the range U+0021 through U+007E, i.e., the o Code points in the range U+0021 through U+007E, i.e., the
(printable) ASCII7 ("K") rule defined under Section 8.11. (printable) ASCII7 ("K") rule defined under Section 9.11.
o Any character that has a compatibility equivalent, i.e., the o Any character that has a compatibility equivalent, i.e., the
HasCompat ("Q") category defined under Section 8.17. HasCompat ("Q") category defined under Section 9.17.
o Space characters, i.e., the Spaces ("N") category defined under o Space characters, i.e., the Spaces ("N") category defined under
Section 8.14. Section 9.14.
o Symbol characters, i.e., the Symbols ("O") category defined under o Symbol characters, i.e., the Symbols ("O") category defined under
Section 8.15. Section 9.15.
o Punctuation characters, i.e., the Punctuation ("P") category o Punctuation characters, i.e., the Punctuation ("P") category
defined under Section 8.16. defined under Section 9.16.
4.3.2. Contextual Rule Required 4.3.2. Contextual Rule Required
o A number of characters from the Exceptions ("F") category defined o A number of characters from the Exceptions ("F") category defined
under Section 8.6 (see Section 8.6 for a full list). under Section 9.6 (see Section 9.6 for a full list).
o Joining characters, i.e., the JoinControl ("H") category defined o Joining characters, i.e., the JoinControl ("H") category defined
under Section 8.8. under Section 9.8.
4.3.3. Disallowed 4.3.3. Disallowed
o Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category o Old Hangul Jamo characters, i.e., the OldHangulJamo ("I") category
defined under Section 8.9. defined under Section 9.9.
o Control characters, i.e., the Controls ("L") category defined o Control characters, i.e., the Controls ("L") category defined
under Section 8.12. under Section 9.12.
o Ignorable characters, i.e., the PrecisIgnorableProperties ("M") o Ignorable characters, i.e., the PrecisIgnorableProperties ("M")
category defined under Section 8.13. category defined under Section 9.13.
4.3.4. Unassigned 4.3.4. Unassigned
Any code points that are not yet designated in the Unicode character Any code points that are not yet designated in the Unicode character
set are considered Unassigned for purposes of the FreeformClass, and set are considered Unassigned for purposes of the FreeformClass, and
such code points are to be treated as Disallowed. such code points are to be treated as Disallowed.
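A minimal Python sketch of this rule follows. It assumes the Unassigned ("J") definition from [RFC5892] (General_Category Cn, excluding noncharacters). It also assumes that Python's bundled unicodedata tables stand in for the Unicode version in use. The function names are hypothetical.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    # The 66 noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_unassigned(ch: str) -> bool:
    # Unassigned ("J"), assumed per RFC 5892: gc=Cn with noncharacters excluded.
    return unicodedata.category(ch) == "Cn" and not is_noncharacter(ord(ch))

def contains_unassigned(s: str) -> bool:
    # The FreeformClass treats Unassigned code points as Disallowed,
    # so a single hit is enough to reject the whole string.
    return any(is_unassigned(ch) for ch in s)

print(unicodedata.unidata_version)  # Unicode version of the tables actually consulted
```

Because the result depends on the Unicode tables available at runtime, an implementation would typically record which Unicode version it used.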
4.3.5. Examples 4.3.5. Examples
As described in the Introduction to this document, the string classes As described in the Introduction to this document, the string classes
skipping to change at page 12, line 30 skipping to change at page 12, line 36
5. Profiles 5. Profiles
This framework document defines the valid, contextual-rule-required, This framework document defines the valid, contextual-rule-required,
disallowed, and unassigned rules for the IdentifierClass and the disallowed, and unassigned rules for the IdentifierClass and the
FreeformClass. A profile of a PRECIS string class MUST define the FreeformClass. A profile of a PRECIS string class MUST define the
width mapping, additional mappings (if any), case mapping, width mapping, additional mappings (if any), case mapping,
normalization, and directionality rules. A profile MAY also restrict normalization, and directionality rules. A profile MAY also restrict
the allowable characters above and beyond the definition of the the allowable characters above and beyond the definition of the
relevant PRECIS string class (but MUST NOT add as valid any code relevant PRECIS string class (but MUST NOT add as valid any code
points or character categories that are disallowed by the relevant points that are disallowed by the relevant PRECIS string class).
PRECIS string class). These matters are discussed in the following These matters are discussed in the following subsections.
subsections.
Profiles of the PRECIS string classes are registered with the IANA as Profiles of the PRECIS string classes are registered with the IANA as
described under Section 10.3. Profile names use the following described under Section 11.3. Profile names use the following
convention: they are of the form "ProfilenameBaseClass", where the convention: they are of the form "Profilename of BaseClass", where
"Profilename" string is a differentiator and "BaseClass" is the name the "Profilename" string is a differentiator and "BaseClass" is the
of the PRECIS string class being profiled; for example, the profile name of the PRECIS string class being profiled; for example, the
of the IdentifierClass used for localparts of Jabber Identifiers profile of the FreeformClass used for opaque strings such as passwords is
(JIDs) in the Extensible Messaging and Presence Protocol (XMPP) is the "OpaqueString" profile [I-D.ietf-precis-saslprepbis].
named "LocalpartIdentifierClass" [I-D.ietf-xmpp-6122bis].
5.1. Profiles Must Not Be Multiplied Beyond Necessity 5.1. Profiles Must Not Be Multiplied Beyond Necessity
The risk of profile proliferation is significant because having too The risk of profile proliferation is significant because having too
many profiles will result in different behavior across various many profiles will result in different behavior across various
applications, thus violating what is known in user interface design applications, thus violating what is known in user interface design
as the Principle of Least Astonishment. as the Principle of Least Astonishment.
Indeed, we already have too many profiles. Ideally we would have at Indeed, we already have too many profiles. Ideally we would have at
most two or three profiles. Unfortunately, numerous application most two or three profiles. Unfortunately, numerous application
skipping to change at page 13, line 16 skipping to change at page 13, line 19
nicknames, filenames, authentication identifiers, passwords, and nicknames, filenames, authentication identifiers, passwords, and
other strings are already out there in the wild and need to be other strings are already out there in the wild and need to be
supported in existing application protocols such as DNS, SMTP, XMPP, supported in existing application protocols such as DNS, SMTP, XMPP,
IRC, NFS, iSCSI, EAP, and SASL among others. IRC, NFS, iSCSI, EAP, and SASL among others.
Nevertheless, profiles must not be multiplied beyond necessity. Nevertheless, profiles must not be multiplied beyond necessity.
To help prevent profile proliferation, this document recommends To help prevent profile proliferation, this document recommends
sensible defaults for the various options offered to profile creators sensible defaults for the various options offered to profile creators
(such as width mapping and Unicode normalization). In addition, the (such as width mapping and Unicode normalization). In addition, the
guidelines for designated experts provided under Section 9 are meant guidelines for designated experts provided under Section 10 are meant
to encourage a high level of due diligence regarding new profiles. to encourage a high level of due diligence regarding new profiles.
5.2. Rules 5.2. Rules
5.2.1. Width Mapping Rule 5.2.1. Width Mapping Rule
The width mapping rule of a profile specifies whether width mapping The width mapping rule of a profile specifies whether width mapping
is performed on fullwidth and halfwidth characters, and how the is performed on the characters of a string, and how the mapping is
mapping is done. Typically such mapping consists of mapping done. Typically such mapping consists of mapping fullwidth and
fullwidth and halfwidth characters, i.e., code points with a halfwidth characters, i.e., code points with a Decomposition Type of
Decomposition Type of Wide or Narrow, to their decomposition Wide or Narrow, to their decomposition mappings; as an example,
mappings; as an example, FULLWIDTH DIGIT ZERO (U+FF10) would be FULLWIDTH DIGIT ZERO (U+FF10) would be mapped to DIGIT ZERO (U+0030).
mapped to DIGIT ZERO (U+0030).
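As an illustration, the following minimal Python sketch performs this kind of width mapping. It reads each code point's decomposition from Python's unicodedata tables. When a code point carries the <wide> or <narrow> tag, it is replaced by the code point(s) that follow the tag. The function name is hypothetical.

```python
import unicodedata

def width_map(s: str) -> str:
    """Map fullwidth and halfwidth code points to their decomposition mappings."""
    out = []
    for ch in s:
        # e.g. unicodedata.decomposition("\uFF10") == "<wide> 0030"
        tag, _, rest = unicodedata.decomposition(ch).partition(" ")
        if tag in ("<wide>", "<narrow>"):
            out.extend(chr(int(cp, 16)) for cp in rest.split())
        else:
            out.append(ch)  # no Wide/Narrow decomposition: keep as-is
    return "".join(out)

print(width_map("\uFF10"))  # FULLWIDTH DIGIT ZERO becomes "0"
```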
The normalization form specified by a profile (see below) has an The normalization form specified by a profile (see below) has an
impact on the need for width mapping. Because width mapping is impact on the need for width mapping. Because width mapping is
performed as a part of compatibility decomposition, a profile performed as a part of compatibility decomposition, a profile
employing either normalization form KD (NFKD) or normalization form employing either normalization form KD (NFKD) or normalization form
KC (NFKC) does not need to specify width mapping. However, if KC (NFKC) does not need to specify width mapping. However, if
Unicode normalization form C (NFC) is used (as is recommended) then Unicode normalization form C (NFC) is used (as is recommended) then
the profile needs to specify whether to apply width mapping; in this the profile needs to specify whether to apply width mapping; in this
case, width mapping is in general RECOMMENDED because allowing case, width mapping is in general RECOMMENDED because allowing
fullwidth and halfwidth characters to remain unmapped to their fullwidth and halfwidth characters to remain unmapped to their
compatibility variants would violate the Principle of Least compatibility variants would violate the Principle of Least
Astonishment. For more information about the concept of width in Astonishment. For more information about the concept of width in
East Asian scripts within Unicode, see Unicode Standard Annex #11 East Asian scripts within Unicode, see Unicode Standard Annex #11
[UAX11]. [UAX11].
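The interaction between normalization and width mapping can be checked with Python's unicodedata module. The following sketch shows that NFKC folds FULLWIDTH DIGIT ZERO on its own, whereas NFC leaves it untouched.

```python
import unicodedata

fullwidth_zero = "\uFF10"  # FULLWIDTH DIGIT ZERO

# NFKC (like NFKD) applies compatibility decomposition, so width is folded automatically.
nfkc = unicodedata.normalize("NFKC", fullwidth_zero)

# NFC applies only canonical decomposition and composition, so the fullwidth form
# survives; a profile using NFC therefore has to specify width mapping itself.
nfc = unicodedata.normalize("NFC", fullwidth_zero)

print(nfkc == "0", nfc == fullwidth_zero)  # True True
```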
5.2.2. Additional Mapping Rule 5.2.2. Additional Mapping Rule
The additional mapping rule of a profile specifies whether additional The additional mapping rule of a profile specifies whether additional
mappings are to be applied, such as: mappings are performed on the characters of a string, such as:
Mapping of delimiter characters (such as '@', ':', '/', '+', and Mapping of delimiter characters (such as '@', ':', '/', '+', and
'-') '-')
Mapping of special characters (e.g., non-ASCII space characters to Mapping of special characters (e.g., non-ASCII space characters to
ASCII space or control characters to nothing). ASCII space or control characters to nothing).
The PRECIS mappings document [I-D.ietf-precis-mappings] describes The PRECIS mappings document [I-D.ietf-precis-mappings] describes
such mappings in more detail. such mappings in more detail.
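For illustration only, the following Python sketch implements the two example mappings listed above. It makes two choices that a real profile would have to specify itself:

o  every space separator (General_Category Zs) maps to ASCII space (U+0020);

o  every control character (General_Category Cc) maps to nothing.

The function name is hypothetical.

```python
import unicodedata

def additional_map(s: str) -> str:
    """Hypothetical additional mapping: spaces to U+0020, controls removed."""
    out = []
    for ch in s:
        gc = unicodedata.category(ch)
        if gc == "Zs":
            out.append(" ")      # e.g. NO-BREAK SPACE, IDEOGRAPHIC SPACE
        elif gc == "Cc":
            continue             # control characters map to nothing
        else:
            out.append(ch)
    return "".join(out)

print(repr(additional_map("a\u00A0b\u3000c\x07")))  # 'a b c'
```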
5.2.3. Case Mapping Rule 5.2.3. Case Mapping Rule
The case mapping rule of a profile specifies whether case mapping is The case mapping rule of a profile specifies whether case mapping
performed (instead of case preservation) on uppercase and titlecase (instead of case preservation) is performed on the characters of a
characters, and how the mapping is done (e.g., mapping uppercase and string, and how the mapping is applied (e.g., mapping uppercase and
titlecase characters to their lowercase equivalents). titlecase characters to their lowercase equivalents).
If case mapping is desired (instead of case preservation), it is If case mapping is desired (instead of case preservation), it is
RECOMMENDED to use Unicode Default Case Folding as defined in Chapter RECOMMENDED to use Unicode Default Case Folding as defined in Chapter
3 of the Unicode Standard [Unicode7.0]. 3 of the Unicode Standard [Unicode7.0].
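In Python, str.casefold() is documented as implementing the Unicode Standard's Default Case Folding, so it serves as a quick way to see how case folding differs from simple lowercasing. It also shows that the folding is locale-independent.

```python
# Default Case Folding removes case distinctions more aggressively than lower().
folded = "Stra\u00DFe".casefold()   # "Straße"
lowered = "Stra\u00DFe".lower()
print(folded, lowered)  # strasse straße

# The folding is locale-independent: ASCII "I" always folds to "i",
# never to the Turkic dotless i (U+0131).
print("I".casefold() == "i")  # True
```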
Note: Unicode Default Case Folding is not designed to handle Note: Unicode Default Case Folding is not designed to handle
various localization issues (such as so-called "dotless i" in various localization issues (such as so-called "dotless i" in
several Turkic languages). The PRECIS mappings document several Turkic languages). The PRECIS mappings document
[I-D.ietf-precis-mappings] describes these issues in greater [I-D.ietf-precis-mappings] describes these issues in greater
skipping to change at page 14, line 49 skipping to change at page 15, line 8
The normalization rule of a profile specifies which Unicode The normalization rule of a profile specifies which Unicode
normalization form (D, KD, C, or KC) is to be applied (see Unicode normalization form (D, KD, C, or KC) is to be applied (see Unicode
Standard Annex #15 [UAX15] for background information). Standard Annex #15 [UAX15] for background information).
In accordance with [RFC5198], normalization form C (NFC) is In accordance with [RFC5198], normalization form C (NFC) is
RECOMMENDED. RECOMMENDED.
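The following Python sketch shows what NFC buys for comparison. It also shows what NFC deliberately leaves alone.

```python
import unicodedata

decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Canonically equivalent strings differ code point by code point...
print(decomposed == precomposed)  # False
# ...but compare equal once both are in NFC.
print(unicodedata.normalize("NFC", decomposed)
      == unicodedata.normalize("NFC", precomposed))  # True

# NFC keeps compatibility characters such as U+FB01 LATIN SMALL LIGATURE FI,
# whereas NFKC would replace it with "fi".
ligature_nfc = unicodedata.normalize("NFC", "\ufb01")
ligature_nfkc = unicodedata.normalize("NFKC", "\ufb01")
```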
5.2.5. Directionality Rule 5.2.5. Directionality Rule
The directionality rule of a profile specifies how to treat strings The directionality rule of a profile specifies how to treat strings
containing left-to-right (LTR) and right-to-left (RTL) characters that contain right-to-left (RTL) characters (see Unicode Standard
(see Unicode Standard Annex #9 [UAX9]). A profile usually specifies Annex #9 [UAX9]). In general this document recommends applying the
a directionality rule that restricts strings to be entirely LTR "Bidi Rule" from [RFC5893] to strings that contain RTL characters.
strings or entirely RTL strings and defines the allowable sequences
of characters in LTR and RTL strings. Possible rules include, but
are not limited to, (a) considering any string that contains a right-
to-left code point to be a right-to-left string, or (b) applying the
"Bidi Rule" from [RFC5893].
Mixed-direction strings are not directly supported by the PRECIS Mixed-direction strings (that is, strings containing some portions
framework itself, since there is currently no widely accepted and that are left-to-right and other portions that are right-to-left) are
implemented solution for the safe display of mixed-direction strings. not directly supported by the PRECIS framework itself, since there is
An application protocol that uses the PRECIS framework (or an currently no widely accepted and implemented solution for the safe
extension to the framework) could define better ways to present display of mixed-direction strings. An application protocol that
mixed-direction strings; however, that work is outside the scope of uses the PRECIS framework (or an extension to the framework) could
this framework and would likely require a great deal of careful define better ways to present mixed-direction strings; however, that
research into the problems of displaying bidirectional text. work is outside the scope of this framework and would likely require
a great deal of careful research into the problems of displaying
bidirectional text.
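The following Python sketch shows how a profile might apply the Bidi Rule only to strings that contain RTL characters. It transcribes the six conditions of the Bidi Rule from [RFC5893] and uses Python's unicodedata Bidi_Class values. It assumes those values match the Unicode version in use; the function names are hypothetical. Note that a string such as "abc" followed by a Hebrew letter is rejected, which is consistent with mixed-direction strings not being directly supported.

```python
import unicodedata

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}

def contains_rtl(s: str) -> bool:
    # As in RFC 5893, any character with Bidi_Class R, AL, or AN makes a string RTL.
    return any(unicodedata.bidirectional(ch) in {"R", "AL", "AN"} for ch in s)

def passes_bidi_rule(s: str) -> bool:
    """Check the six Bidi Rule conditions of RFC 5893 against one string."""
    classes = [unicodedata.bidirectional(ch) for ch in s]
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False                                          # condition 1
    end = len(classes)
    while classes[end - 1] == "NSM":                          # skip trailing NSM
        end -= 1                                              # (first is never NSM)
    last = classes[end - 1]
    if classes[0] in {"R", "AL"}:                             # RTL string
        return (all(c in RTL_ALLOWED for c in classes)        # condition 2
                and last in {"R", "AL", "EN", "AN"}           # condition 3
                and not ("EN" in classes and "AN" in classes))  # condition 4
    return (all(c in LTR_ALLOWED for c in classes)            # condition 5
            and last in {"L", "EN"})                          # condition 6

def directionality_ok(s: str) -> bool:
    # Apply the Bidi Rule only to strings that contain RTL characters.
    return passes_bidi_rule(s) if contains_rtl(s) else True
```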
5.3. Further Excluded Characters 5.3. A Note about Spaces
With regard to the IdentifierClass, the consensus of the PRECIS
Working Group was that spaces are problematic for many reasons,
including:
o Many Unicode characters are confusable with ASCII space.
o Even if non-ASCII space characters are mapped to ASCII space
(U+0020), space characters are often not rendered in user
interfaces, leading to the possibility that a human user might
consider a string containing spaces to be equivalent to the same
string without spaces.
o In some locales, some devices are known to generate a character
other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
user performs an action like hitting the space bar on a keyboard.
One consequence of disallowing space characters in the
IdentifierClass might be to effectively discourage their use within
identifiers created in newer application protocols; given the
challenges involved with properly handling space characters
(especially non-ASCII space characters) in identifiers and other
protocol strings, the PRECIS Working Group considered this to be a
feature, not a bug.
However, the FreeformClass does allow spaces, which enables
application protocols to define profiles of the FreeformClass that
are more flexible than any profiles of the IdentifierClass. In
addition, as explained under Section 6.3, application protocols
can also define application-layer constructs containing spaces.
6. Applications
6.1. How to Use PRECIS in Applications
Although PRECIS has been designed with applications in mind,
internationalization is not suddenly made easy through the use of
PRECIS. Application developers still need to give some thought to
how they will use the PRECIS string classes, or profiles thereof, in
their applications. This section provides some guidelines to
application developers (and to expert reviewers of application
protocol specifications).
o Don't define your own profile unless absolutely necessary (see
Section 5.1). Existing profiles have been designed for wide re-use.
It is highly likely that an existing profile will meet your needs,
especially given the ability to specify further excluded
characters (Section 6.2) and to build application-layer constructs
(see Section 6.3).
o Do specify:
* Exactly which entities are responsible for preparation,
enforcement, and comparison of internationalized strings (e.g.,
servers or clients).
* Exactly when those entities need to complete their tasks (e.g.,
a server might need to enforce the rules of a profile before
allowing a client to gain network access).
* Exactly which protocol slots need to be checked against which
profiles (e.g., checking the address of a message's intended
recipient against the UsernameCaseMapped profile
[I-D.ietf-precis-saslprepbis] of the IdentifierClass, or
checking the password of a user against the OpaqueString
profile [I-D.ietf-precis-saslprepbis] of the FreeformClass).
See [I-D.ietf-precis-saslprepbis] and [I-D.ietf-xmpp-6122bis] for
definitions of these matters for several applications.
6.2. Further Excluded Characters
An application protocol that uses a profile MAY specify particular An application protocol that uses a profile MAY specify particular
characters or character categories that are not allowed in relevant code points that are not allowed in relevant slots within that
slots within that application protocol, above and beyond those application protocol, above and beyond those excluded by the string
excluded by the string class or profile. class or profile.
That is, an application protocol MAY do either of the following: That is, an application protocol MAY do either of the following:
1. Exclude specific code points that are allowed by the relevant 1. Exclude specific code points that are allowed by the relevant
string class. string class.
2. Exclude characters matching certain Unicode properties (e.g., 2. Exclude characters matching certain Unicode properties (e.g.,
math symbols) that are included in the relevant PRECIS string math symbols) that are included in the relevant PRECIS string
class. class.
As a result of such exclusions, code points that are defined as valid As a result of such exclusions, code points that are defined as valid
for the PRECIS string class or profile will be defined as disallowed for the PRECIS string class or profile will be defined as disallowed
for the relevant protocol slot. for the relevant protocol slot.
Typically, such exclusions are defined for the purpose of backward- Typically, such exclusions are defined for the purpose of backward-
compatibility with legacy formats within an application protocol. compatibility with legacy formats within an application protocol.
These are defined for application protocols, not profiles, in order These are defined for application protocols, not profiles, in order
to prevent multiplication of profiles beyond necessity (see to prevent multiplication of profiles beyond necessity (see
Section 5.1). Section 5.1).
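The two kinds of exclusion listed above can be layered on top of profile enforcement. The following Python sketch illustrates this. The excluded code points and General_Category values are hypothetical examples chosen by an imaginary application protocol. The check only ever removes code points from the valid set and never adds any.

```python
import unicodedata

# Hypothetical exclusions for one protocol slot:
EXCLUDED_CODE_POINTS = {"@", ":"}   # (1) specific code points, e.g. delimiters
EXCLUDED_CATEGORIES = {"Sm"}        # (2) a Unicode property, e.g. math symbols

def allowed_in_slot(s: str) -> bool:
    """Return True if s contains none of the application-excluded characters.

    Assumes s has already passed enforcement for the relevant profile.
    """
    return not any(
        ch in EXCLUDED_CODE_POINTS or unicodedata.category(ch) in EXCLUDED_CATEGORIES
        for ch in s
    )

print(allowed_in_slot("juliet"), allowed_in_slot("1+1"))  # True False
```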
5.4. Building Application-Layer Constructs 6.3. Building Application-Layer Constructs
Sometimes, an application-layer construct does not map in a Sometimes, an application-layer construct does not map in a
straightforward manner to one of the base string classes or a profile straightforward manner to one of the base string classes or a profile
thereof. Consider, for example, the "simple user name" construct in thereof. Consider, for example, the "simple user name" construct in
the Simple Authentication and Security Layer (SASL) [RFC4422]. the Simple Authentication and Security Layer (SASL) [RFC4422].
Depending on the deployment, a simple user name might take the form Depending on the deployment, a simple user name might take the form
of a user's full name (e.g., the user's personal name followed by a of a user's full name (e.g., the user's personal name followed by a
space and then the user's family name). Such a simple user name space and then the user's family name). Such a simple user name
cannot be defined as an instance of the IdentifierClass or a profile cannot be defined as an instance of the IdentifierClass or a profile
thereof, since space characters are not allowed in the thereof, since space characters are not allowed in the
skipping to change at page 16, line 24 skipping to change at page 18, line 5
; ;
; an "idbyte" is a byte used to represent a ; an "idbyte" is a byte used to represent a
; UTF-8 encoded Unicode code point that can be ; UTF-8 encoded Unicode code point that can be
; contained in a string that conforms to the ; contained in a string that conforms to the
; PRECIS "IdentifierClass" ; PRECIS "IdentifierClass"
; ;
Similar techniques could be used to define many application-layer Similar techniques could be used to define many application-layer
constructs, say of the form "user@domain" or "/path/to/file". constructs, say of the form "user@domain" or "/path/to/file".
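The general technique of splitting a construct into parts and enforcing each part against a string class or profile is shown in the following Python sketch. It assumes that parts of a simple user name are separated by exactly one ASCII space (U+0020), which is a simplification for illustration. The enforce_part callback and toy_enforce_part are hypothetical stand-ins for real IdentifierClass (or profile) enforcement.

```python
from typing import Callable

def enforce_simple_user_name(name: str, enforce_part: Callable[[str], str]) -> str:
    """Enforce a space-separated name by enforcing each part separately.

    enforce_part must return the enforced part or raise ValueError.
    """
    parts = name.split(" ")
    if any(p == "" for p in parts):
        raise ValueError("empty part (leading, trailing, or repeated space)")
    return " ".join(enforce_part(p) for p in parts)

def toy_enforce_part(part: str) -> str:
    # Hypothetical stand-in: ASCII letters and digits only, then lowercased.
    if not (part.isascii() and part.isalnum()):
        raise ValueError(f"part not allowed: {part!r}")
    return part.lower()

print(enforce_simple_user_name("Juliet Capulet", toy_enforce_part))  # juliet capulet
```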
5.5. A Note about Spaces 7. Order of Operations
With regard to the IdentiferClass, the consensus of the PRECIS
Working Group was that spaces are problematic for many reasons,
including:
o Many Unicode characters are confusable with ASCII space.
o Even if non-ASCII space characters are mapped to ASCII space
(U+0020), space characters are often not rendered in user
interfaces, leading to the possibility that a human user might
consider a string containing spaces to be equivalent to the same
string without spaces.
o In some locales, some devices are known to generate a character
other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
user performs an action like hitting the space bar on a keyboard.
One consequence of disallowing space characters in the
IdentifierClass might be to effectively discourage their use within
identifiers created in newer application protocols; given the
challenges involved with properly handling space characters
(especially non-ASCII space characters) in identifiers and other
protocol strings, the PRECIS Working Group considered this to be a
feature, not a bug.
However, the FreeformClass does allow spaces, which enables
application protocols to define profiles of the FreeformClass that
are more flexible than any profiles of the IdentifierClass. In
addition, as explained in the previous section, application protocols
can also define application-layer constructs containing spaces.
6. Order of Operations
To ensure proper comparison, the rules specified for a particular To ensure proper comparison, the rules specified for a particular
string class or profile MUST be applied in the following order: string class or profile MUST be applied in the following order:
1. Width Mapping Rule 1. Width Mapping Rule
2. Additional Mapping Rule 2. Additional Mapping Rule
3. Case Mapping Rule 3. Case Mapping Rule
skipping to change at page 17, line 33 skipping to change at page 18, line 30
6. Behavioral rules for determining whether a code point is valid, 6. Behavioral rules for determining whether a code point is valid,
allowed under a contextual rule, disallowed, or unassigned allowed under a contextual rule, disallowed, or unassigned
As already described, the width mapping, additional mapping, case As already described, the width mapping, additional mapping, case
mapping, normalization, and directionality rules are specified for mapping, normalization, and directionality rules are specified for
each profile, whereas the behavioral rules are specified for each each profile, whereas the behavioral rules are specified for each
string class. Some of the logic behind this order is provided under string class. Some of the logic behind this order is provided under
Section 5.2.1 (see also the PRECIS mappings document Section 5.2.1 (see also the PRECIS mappings document
[I-D.ietf-precis-mappings]). [I-D.ietf-precis-mappings]).
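The following Python sketch shows the required order as a single enforcement pipeline. The individual rule choices are illustrative: NFC and Default Case Folding follow this document's recommendations, while the space mapping is a hypothetical additional mapping. The directionality_ok and behavioral_ok hooks are hypothetical placeholders for a profile's directionality rule and a string class's behavioral rules.

```python
import unicodedata
from typing import Callable

def width_map(s: str) -> str:
    # Replace <wide>/<narrow> compatibility characters with their decomposition mappings.
    out = []
    for ch in s:
        tag, _, rest = unicodedata.decomposition(ch).partition(" ")
        if tag in ("<wide>", "<narrow>"):
            out.extend(chr(int(cp, 16)) for cp in rest.split())
        else:
            out.append(ch)
    return "".join(out)

def map_spaces(s: str) -> str:
    # Illustrative additional mapping: any space separator (gc=Zs) becomes U+0020.
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)

def enforce(s: str,
            directionality_ok: Callable[[str], bool] = lambda s: True,
            behavioral_ok: Callable[[str], bool] = lambda s: True) -> str:
    s = width_map(s)                       # 1. width mapping rule
    s = map_spaces(s)                      # 2. additional mapping rule
    s = s.casefold()                       # 3. case mapping rule
    s = unicodedata.normalize("NFC", s)    # 4. normalization rule
    if not directionality_ok(s):           # 5. directionality rule
        raise ValueError("directionality rule not satisfied")
    if not behavioral_ok(s):               # 6. behavioral rules of the string class
        raise ValueError("string contains a disallowed code point")
    return s

print(enforce("\uFF21\uFF22\uFF23\u00A0Stra\u00DFe"))  # abc strasse
```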
7. Code Point Properties 8. Code Point Properties
In order to implement the string classes described above, this In order to implement the string classes described above, this
document does the following: document does the following:
1. Reviews and classifies the collections of code points in the 1. Reviews and classifies the collections of code points in the
Unicode character set by examining various code point properties. Unicode character set by examining various code point properties.
2. Defines an algorithm for determining a derived property value, 2. Defines an algorithm for determining a derived property value,
which can vary depending on the string class being used by the which can vary depending on the string class being used by the
relevant application protocol. relevant application protocol.
skipping to change at page 19, line 48 skipping to change at page 20, line 48
The mechanisms described here allow determination of the value of the The mechanisms described here allow determination of the value of the
property for future versions of Unicode (including characters added property for future versions of Unicode (including characters added
after Unicode 5.2 or 7.0 depending on the category, since some after Unicode 5.2 or 7.0 depending on the category, since some
categories mentioned in this document are simply pointers to IDNA2008 categories mentioned in this document are simply pointers to IDNA2008
and therefore were defined at the time of Unicode 5.2). Changes in and therefore were defined at the time of Unicode 5.2). Changes in
Unicode properties that do not affect the outcome of this process Unicode properties that do not affect the outcome of this process
therefore do not affect this framework. For example, a character can therefore do not affect this framework. For example, a character can
have its Unicode General_Category value (see Chapter 4 of the Unicode have its Unicode General_Category value (see Chapter 4 of the Unicode
Standard [Unicode7.0]) change from So to Sm, or from Lo to Ll, Standard [Unicode7.0]) change from So to Sm, or from Lo to Ll,
without affecting the algorithm results. Moreover, even if such without affecting the algorithm results. Moreover, even if such
changes were to result, the BackwardCompatible list (Section 8.7) can changes were to result, the BackwardCompatible list (Section 9.7) can
be adjusted to ensure the stability of the results. be adjusted to ensure the stability of the results.
8. Category Definitions Used to Calculate Derived Property 9. Category Definitions Used to Calculate Derived Property
The derived property obtains its value based on a two-step procedure: The derived property obtains its value based on a two-step procedure:
1. Characters are placed in one or more character categories either 1. Characters are placed in one or more character categories either
(1) based on core properties defined by the Unicode Standard or (1) based on core properties defined by the Unicode Standard or
(2) by treating the code point as an exception and addressing the (2) by treating the code point as an exception and addressing the
code point based on its code point value. These categories are code point based on its code point value. These categories are
not mutually exclusive. not mutually exclusive.
2. Set operations are used with these categories to determine the 2. Set operations are used with these categories to determine the
values for a property specific to a given string class. These values for a property specific to a given string class. These
operations are specified under Section 7. operations are specified under Section 8.
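The following Python sketch illustrates the two-step procedure for a handful of categories only. In step 1, it places a code point into possibly several categories; in step 2, it applies a simple set operation. The General_Category-based tests used for Controls, Spaces, Symbols, and Punctuation are assumptions approximating the category definitions given later in this document. The set operation is illustrative and is not the full derivation algorithm.

```python
import unicodedata

SYMBOL_GC = {"Sm", "Sc", "Sk", "So"}
PUNCT_GC = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}

def categories_of(ch: str) -> set:
    """Step 1 (partial): the category letters a code point is placed in."""
    cp, gc = ord(ch), unicodedata.category(ch)
    cats = set()
    if 0x21 <= cp <= 0x7E:
        cats.add("K")   # ASCII7: printable, non-space 7-bit ASCII
    if gc == "Cc":
        cats.add("L")   # Controls (approximation)
    if gc == "Zs":
        cats.add("N")   # Spaces (approximation)
    if gc in SYMBOL_GC:
        cats.add("O")   # Symbols (approximation)
    if gc in PUNCT_GC:
        cats.add("P")   # Punctuation (approximation)
    return cats

def in_freeform_subset(ch: str) -> bool:
    # Step 2 (illustrative set operation): in K, N, O, or P, and not in L.
    cats = categories_of(ch)
    return "L" not in cats and bool(cats & {"K", "N", "O", "P"})

print(sorted(categories_of("!")))  # ['K', 'P'], i.e. the categories overlap
```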
Note: Unicode property names and property value names might have Note: Unicode property names and property value names might have
short abbreviations, such as "gc" for the General_Category short abbreviations, such as "gc" for the General_Category
property and "Ll" for the Lowercase_Letter property value of the property and "Ll" for the Lowercase_Letter property value of the
gc property. gc property.
In the following specification of character categories, the operation In the following specification of character categories, the operation
that returns the value of a particular Unicode character property for that returns the value of a particular Unicode character property for
a code point is designated by using the formal name of that property a code point is designated by using the formal name of that property
(from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for (from the Unicode PropertyAliases.txt [1]) followed by '(cp)' for
skipping to change at page 20, line 40 skipping to change at page 21, line 40
The first ten categories (A-J) shown below were previously defined The first ten categories (A-J) shown below were previously defined
for IDNA2008 and are referenced from [RFC5892] to ease the for IDNA2008 and are referenced from [RFC5892] to ease the
understanding of how PRECIS handles various characters. Some of understanding of how PRECIS handles various characters. Some of
these categories are reused in PRECIS and some of them are not; these categories are reused in PRECIS and some of them are not;
however, the lettering of categories is retained to prevent overlap however, the lettering of categories is retained to prevent overlap
and to ease implementation of both IDNA2008 and PRECIS in a single and to ease implementation of both IDNA2008 and PRECIS in a single
software application. The next eight categories (K-R) are specific software application. The next eight categories (K-R) are specific
to PRECIS. to PRECIS.
8.1. LetterDigits (A) 9.1. LetterDigits (A)
This category is defined in Section 2.1 of [RFC5892] and is included This category is defined in Section 2.1 of [RFC5892] and is included
by reference for use in PRECIS. by reference for use in PRECIS.
8.2. Unstable (B) 9.2. Unstable (B)
This category is defined in Section 2.2 of [RFC5892] but is not used This category is defined in Section 2.2 of [RFC5892] but is not used
in PRECIS. in PRECIS.
8.3. IgnorableProperties (C) 9.3. IgnorableProperties (C)
This category is defined in Section 2.3 of [RFC5892] but is not used This category is defined in Section 2.3 of [RFC5892] but is not used
in PRECIS. in PRECIS.
Note: See the "PrecisIgnorableProperties (M)" category below for a Note: See the "PrecisIgnorableProperties (M)" category below for a
more inclusive category used in PRECIS identifiers. more inclusive category used in PRECIS identifiers.
8.4. IgnorableBlocks (D) 9.4. IgnorableBlocks (D)
This category is defined in Section 2.4 of [RFC5892] but is not used This category is defined in Section 2.4 of [RFC5892] but is not used
in PRECIS. in PRECIS.
8.5. LDH (E) 9.5. LDH (E)
This category is defined in Section 2.5 of [RFC5892] but is not used This category is defined in Section 2.5 of [RFC5892] but is not used
in PRECIS. in PRECIS.
Note: See the "ASCII7 (K)" category below for a more inclusive Note: See the "ASCII7 (K)" category below for a more inclusive
category used in PRECIS identifiers. category used in PRECIS identifiers.
8.6. Exceptions (F) 9.6. Exceptions (F)
This category is defined in Section 2.6 of [RFC5892] and is included This category is defined in Section 2.6 of [RFC5892] and is included
by reference for use in PRECIS. by reference for use in PRECIS.
8.7. BackwardCompatible (G) 9.7. BackwardCompatible (G)
This category is defined in Section 2.7 of [RFC5892] and is included This category is defined in Section 2.7 of [RFC5892] and is included
by reference for use in PRECIS. by reference for use in PRECIS.
Note: Because of how the PRECIS string classes are defined, only Note: Management of this category is handled via the processes
changes that would result in code points being added to or removed specified in [RFC5892]. At the time of this writing (and also at the
from the LetterDigits ("A") category would result in backward- time that RFC 5892 was published), this category consisted of the
incompatible modifications to code point assignments. Therefore, empty set; however, that is subject to change as described in RFC
management of this category is handled via the processes specified in 5892.
[RFC5892]. At the time of this writing (and also at the time that
RFC 5892 was published), this category consisted of the empty set;
however, that is subject to change as described in RFC 5892.
8.8. JoinControl (H) 9.8. JoinControl (H)
This category is defined in Section 2.8 of [RFC5892] and is included This category is defined in Section 2.8 of [RFC5892] and is included
by reference for use in PRECIS. by reference for use in PRECIS.
8.9. OldHangulJamo (I) 9.9. OldHangulJamo (I)
This category is defined in Section 2.9 of [RFC5892] and is included This category is defined in Section 2.9 of [RFC5892] and is included
by reference for use in PRECIS. by reference for use in PRECIS.
8.10. Unassigned (J) 9.10. Unassigned (J)
This category is defined in Section 2.10 of [RFC5892] and is included This category is defined in Section 2.10 of [RFC5892] and is included
by reference for use in PRECIS. by reference for use in PRECIS.
8.11. ASCII7 (K) 9.11. ASCII7 (K)
This PRECIS-specific category consists of all printable, non-space This PRECIS-specific category consists of all printable, non-space
characters from the 7-bit ASCII range. By applying this category, characters from the 7-bit ASCII range. By applying this category,
the algorithm specified under Section 7 exempts these characters from the algorithm specified under Section 8 exempts these characters from
other rules that might be applied during PRECIS processing, on the other rules that might be applied during PRECIS processing, on the
assumption that these code points are in such wide use that assumption that these code points are in such wide use that
disallowing them would be counter-productive. disallowing them would be counter-productive.
K: cp is in {0021..007E} K: cp is in {0021..007E}
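As a minimal sketch (Python is used here only for illustration, and the function name `in_ascii7` is hypothetical), the ASCII7 (K) rule is a plain range check on the code point value:

```python
def in_ascii7(cp: int) -> bool:
    """PRECIS ASCII7 (K): printable, non-space 7-bit ASCII, U+0021..U+007E."""
    return 0x0021 <= cp <= 0x007E


# SPACE (U+0020) and DELETE (U+007F) sit just outside the range.
print([in_ascii7(ord(c)) for c in " !A~\x7f"])  # [False, True, True, True, False]
```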
8.12. Controls (L) 9.12. Controls (L)
This PRECIS-specific category consists of all control characters. This PRECIS-specific category consists of all control characters.
L: Control(cp) = True L: Control(cp) = True
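A minimal sketch of the Controls (L) test follows. It assumes that Control(cp) means Unicode General_Category Cc; that reading is an interpretation for illustration, and the function name is hypothetical.

```python
import unicodedata


def in_controls(cp: int) -> bool:
    """PRECIS Controls (L), reading Control(cp) as General_Category Cc (assumption)."""
    return unicodedata.category(chr(cp)) == "Cc"
```

Under this reading, C0 controls, DELETE, and C1 controls such as U+0085 are in the category, while format characters such as U+200B ZERO WIDTH SPACE (General_Category Cf) are not; those are covered by PrecisIgnorableProperties (M) instead.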
8.13. PrecisIgnorableProperties (M) 9.13. PrecisIgnorableProperties (M)
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
discouraged from use in PRECIS string classes. discouraged from use in PRECIS string classes.
M: Default_Ignorable_Code_Point(cp) = True or M: Default_Ignorable_Code_Point(cp) = True or
Noncharacter_Code_Point(cp) = True Noncharacter_Code_Point(cp) = True
The definition for Default_Ignorable_Code_Point can be found in the The definition for Default_Ignorable_Code_Point can be found in the
DerivedCoreProperties.txt [2] file, and at the time of Unicode 7.0 is DerivedCoreProperties.txt [2] file.
as follows:
Other_Default_Ignorable_Code_Point
+ Cf (Format characters)
+ Variation_Selector
- White_Space
- FFF9..FFFB (Annotation Characters)
- 0600..0604, 06DD, 070F, 110BD (exceptional Cf characters
that should be visible)
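Python's standard `unicodedata` module does not expose Default_Ignorable_Code_Point, so the sketch below (all names hypothetical) reads the property from text in the DerivedCoreProperties.txt format, which the caller is assumed to have obtained separately; it does not download anything. Noncharacter_Code_Point is computed directly: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
from __future__ import annotations


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def parse_property_ranges(text: str, prop: str) -> list[tuple[int, int]]:
    """Collect (first, last) code point ranges for `prop` from UCD-style text."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        first, _, last = fields[0].partition("..")  # "00AD" or "115F..1160"
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


def in_precis_ignorable(cp: int, default_ignorable: list[tuple[int, int]]) -> bool:
    """PRECIS PrecisIgnorableProperties (M)."""
    in_di = any(first <= cp <= last for first, last in default_ignorable)
    return in_di or is_noncharacter(cp)


# A three-line excerpt in the DerivedCoreProperties.txt format.
SAMPLE = """\
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
115F..1160    ; Default_Ignorable_Code_Point # Lo   [2] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER
"""
di = parse_property_ranges(SAMPLE, "Default_Ignorable_Code_Point")
print(di)  # [(173, 173), (4447, 4448)]
```

In practice the full file would be passed in place of `SAMPLE`, so that the result tracks whichever Unicode version the implementation targets.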
8.14. Spaces (N) 9.14. Spaces (N)
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
space characters. space characters.
N: General_Category(cp) is in {Zs} N: General_Category(cp) is in {Zs}
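As a sketch (hypothetical function name), the Spaces (N) test reduces to a single General_Category lookup:

```python
import unicodedata


def in_spaces(cp: int) -> bool:
    """PRECIS Spaces (N): General_Category Zs."""
    return unicodedata.category(chr(cp)) == "Zs"
```

Note that only Zs qualifies: U+0020, U+00A0 NO-BREAK SPACE, and U+3000 IDEOGRAPHIC SPACE are in the category, whereas TAB (Cc) and U+2028 LINE SEPARATOR (Zl) are not.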
8.15. Symbols (O) 9.15. Symbols (O)
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
symbols. symbols.
O: General_Category(cp) is in {Sm, Sc, Sk, So} O: General_Category(cp) is in {Sm, Sc, Sk, So}
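A minimal sketch of the Symbols (O) test, with a hypothetical function name, checks membership in the four symbol subcategories:

```python
import unicodedata

SYMBOL_CATEGORIES = frozenset({"Sm", "Sc", "Sk", "So"})


def in_symbols(cp: int) -> bool:
    """PRECIS Symbols (O): General_Category in {Sm, Sc, Sk, So}."""
    return unicodedata.category(chr(cp)) in SYMBOL_CATEGORIES
```

For example, "+" (Sm), "$" (Sc), "^" (Sk), and U+00A9 COPYRIGHT SIGN (So) all match. ASCII symbols such as these also fall into ASCII7 (K), which the derivation algorithm applies first.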
8.16. Punctuation (P) 9.16. Punctuation (P)
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
punctuation characters. punctuation characters.
P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po} P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}
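In the same hedged spirit (hypothetical names), the Punctuation (P) test is a set lookup. The seven values listed are exactly the General_Category values whose names begin with "P", so a prefix check on the category name would be equivalent.

```python
import unicodedata

PUNCTUATION_CATEGORIES = frozenset({"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"})


def in_punctuation(cp: int) -> bool:
    """PRECIS Punctuation (P): any of the seven punctuation subcategories."""
    return unicodedata.category(chr(cp)) in PUNCTUATION_CATEGORIES
```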
8.17. HasCompat (Q) 9.17. HasCompat (Q)
This PRECIS-specific category is used to group code points that have This PRECIS-specific category is used to group code points that have
compatibility equivalents as explained in Chapter 2 and Chapter 3 of compatibility equivalents as explained in Chapter 2 and Chapter 3 of
the Unicode Standard [Unicode7.0]. the Unicode Standard [Unicode7.0].
Q: toNFKC(cp) != cp Q: toNFKC(cp) != cp
The toNFKC() operation returns the code point in normalization form The toNFKC() operation returns the code point in normalization form
KC. For more information, see Section 5 of Unicode Standard Annex KC. For more information, see Section 5 of Unicode Standard Annex
#15 [UAX15]. #15 [UAX15].
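A direct transcription of the HasCompat (Q) rule, as a sketch with a hypothetical function name, applies NFKC to the single code point and compares:

```python
import unicodedata


def has_compat(cp: int) -> bool:
    """PRECIS HasCompat (Q): toNFKC(cp) != cp."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) != ch
```

U+FF21 FULLWIDTH LATIN CAPITAL LETTER A (maps to "A") and U+FB01 LATIN SMALL LIGATURE FI (maps to "fi") match, while "A" and U+00C5 do not. Taken literally, the test also matches code points with singleton canonical decompositions, such as U+212B ANGSTROM SIGN, which normalizes to U+00C5.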
8.18. OtherLetterDigits (R) 9.18. OtherLetterDigits (R)
This PRECIS-specific category is used to group code points that are This PRECIS-specific category is used to group code points that are
letters and digits other than the "traditional" letters and digits letters and digits other than the "traditional" letters and digits
grouped under the LetterDigits (A) category (see Section 8.1). grouped under the LetterDigits (A) category (see Section 9.1).
R: General_Category(cp) is in {Lt, Nl, No, Me} R: General_Category(cp) is in {Lt, Nl, No, Me}
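As a sketch (hypothetical names), the OtherLetterDigits (R) test is another General_Category set lookup:

```python
import unicodedata

OTHER_LETTER_DIGIT_CATEGORIES = frozenset({"Lt", "Nl", "No", "Me"})


def in_other_letter_digits(cp: int) -> bool:
    """PRECIS OtherLetterDigits (R): General_Category in {Lt, Nl, No, Me}."""
    return unicodedata.category(chr(cp)) in OTHER_LETTER_DIGIT_CATEGORIES
```

Examples include U+01C5 (Lt), U+2160 ROMAN NUMERAL ONE (Nl), U+00B2 SUPERSCRIPT TWO (No), and U+20DD COMBINING ENCLOSING CIRCLE (Me). Ordinary letters such as "a" (Ll) and digits such as "5" (Nd) are not included.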
9. Guidelines for Designated Experts 10. Guidelines for Designated Experts
Experience with internationalization in application protocols has Experience with internationalization in application protocols has
shown that protocol designers usually do not understand the shown that protocol designers and application developers usually do
subtleties and tradeoffs involved with internationalization and that not understand the subtleties and tradeoffs involved with
they need considerable guidance in making reasonable decisions with internationalization and that they need considerable guidance in
regard to the options before them. Therefore, although the making reasonable decisions with regard to the options before them.
registration policy for PRECIS profiles is Expert Review and a stable
specification is not strictly required, the designated experts for
profile registration requests ought to encourage applicants to
provide a stable specification documenting the profile.
Internationalization can be difficult and contentious; the designated Therefore:
experts and applicants are strongly encouraged to work together in a
spirit of good faith and mutual understanding to achieve rough o Protocol designers are strongly encouraged to question the
consensus on progressing registrations through the process. They are assumption that they need to define new profiles, since existing
profiles are designed for wide re-use (see Section 5 for further
discussion).
o Those who persist in defining new profiles are strongly encouraged
to clearly explain a strong justification for doing so, and to
publish a stable specification that provides all of the
information described under Section 11.3.
o The designated experts for profile registration requests ought to
seek answers to all of the questions provided under Section 11.3
and to encourage applicants to provide a stable specification
documenting the profile (even though the registration policy for
PRECIS profiles is Expert Review and a stable specification is not
strictly required).
o Developers of applications that use PRECIS are strongly encouraged
to apply the guidelines provided under Section 6 and to seek out
the advice of the designated experts or other knowledgeable
individuals in doing so.
o All parties are strongly encouraged to help prevent the
multiplication of profiles beyond necessity, as described under
Section 5.1, and to use PRECIS in ways that will minimize user
confusion and insecure application behavior.
Internationalization can be difficult and contentious; designated
experts, profile registrants, and application developers are strongly
encouraged to work together in a spirit of good faith and mutual
understanding to achieve rough consensus on profile registration
requests and the use of PRECIS in particular applications. They are
also encouraged to bring additional expertise into the discussion if also encouraged to bring additional expertise into the discussion if
that would be helpful in adding perspective or otherwise resolving that would be helpful in adding perspective or otherwise resolving
issues. issues.
Registrants and designated experts alike are strongly encouraged to 11. IANA Considerations
help prevent the multiplication of profiles beyond necessity, as
described under Section 5.1.
Further considerations for profile registrants and designated experts
can be found under Section 10.3.
10. IANA Considerations
10.1. PRECIS Derived Property Value Registry 11.1. PRECIS Derived Property Value Registry
IANA is requested to create a PRECIS-specific registry with the IANA is requested to create a PRECIS-specific registry with the
Derived Properties for the versions of Unicode that are released Derived Properties for the versions of Unicode that are released
after (and including) version 7.0. The derived property value is to after (and including) version 7.0. The derived property value is to
be calculated in cooperation with a designated expert [RFC5226] be calculated in cooperation with a designated expert [RFC5226]
according to the rules specified under Section 7 and Section 8. according to the rules specified under Section 8 and Section 9.
The IESG is to be notified if backward-incompatible changes to the The IESG is to be notified if backward-incompatible changes to the
table of derived properties are discovered or if other problems arise table of derived properties are discovered or if other problems arise
during the process of creating the table of derived property values during the process of creating the table of derived property values
or during expert review. Changes to the rules defined under or during expert review. Changes to the rules defined under
Section 7 and Section 8 require IETF Review. Section 8 and Section 9 require IETF Review.
10.2. PRECIS Base Classes Registry 11.2. PRECIS Base Classes Registry
IANA is requested to create a registry of PRECIS string classes. In IANA is requested to create a registry of PRECIS string classes. In
accordance with [RFC5226], the registration policy is "RFC Required". accordance with [RFC5226], the registration policy is "RFC Required".
The registration template is as follows: The registration template is as follows:
Base Class: [the name of the PRECIS string class] Base Class: [the name of the PRECIS string class]
Description: [a brief description of the PRECIS string class and its Description: [a brief description of the PRECIS string class and its
intended use, e.g., "A sequence of letters, numbers, and symbols intended use, e.g., "A sequence of letters, numbers, and symbols
skipping to change at page 25, line 23 skipping to change at page 26, line 29
[Note to RFC Editor: please change "this document" [Note to RFC Editor: please change "this document"
to the RFC number issued for this specification.] to the RFC number issued for this specification.]
Base Class: IdentifierClass. Base Class: IdentifierClass.
Description: A sequence of letters, numbers, and symbols that is Description: A sequence of letters, numbers, and symbols that is
used to identify or address a network entity. used to identify or address a network entity.
Specification: Section 4.2 of this document. Specification: Section 4.2 of this document.
[Note to RFC Editor: please change "this document" [Note to RFC Editor: please change "this document"
to the RFC number issued for this specification.] to the RFC number issued for this specification.]
10.3. PRECIS Profiles Registry 11.3. PRECIS Profiles Registry
IANA is requested to create a registry of profiles that use the IANA is requested to create a registry of profiles that use the
PRECIS string classes. In accordance with [RFC5226], the PRECIS string classes. In accordance with [RFC5226], the
registration policy is "Expert Review". This policy was chosen in registration policy is "Expert Review". This policy was chosen in
order to ease the burden of registration while ensuring that order to ease the burden of registration while ensuring that
"customers" of PRECIS receive appropriate guidance regarding the "customers" of PRECIS receive appropriate guidance regarding the
sometimes complex and subtle internationalization issues related to sometimes complex and subtle internationalization issues related to
profiles of PRECIS string classes. profiles of PRECIS string classes.
The registration template is as follows: The registration template is as follows:
Name: [the name of the profile] Name: [the name of the profile]
Base Class: [which PRECIS string class is being profiled]
Applicability: [the specific protocol elements to which this profile Applicability: [the specific protocol elements to which this profile
applies, e.g., "Localparts in XMPP addresses."] applies, e.g., "Localparts in XMPP addresses."]
Base Class: [which PRECIS string class is being profiled]
Replaces: [the Stringprep profile that this PRECIS profile replaces, Replaces: [the Stringprep profile that this PRECIS profile replaces,
if any] if any]
Width Mapping Rule: [the behavioral rule for handling of width, Width Mapping Rule: [the behavioral rule for handling of width,
e.g., "Map fullwidth and halfwidth characters to their e.g., "Map fullwidth and halfwidth characters to their
compatibility variants."] compatibility variants."]
Additional Mapping Rule: [any additional mappings that are required or Additional Mapping Rule: [any additional mappings that are required or
recommended, e.g., "Map non-ASCII space characters to ASCII recommended, e.g., "Map non-ASCII space characters to ASCII
space."] space."]
skipping to change at page 27, line 6 skipping to change at page 28, line 13
intended use? intended use?
o Does the profile explain which entities enforce the rules, and o Does the profile explain which entities enforce the rules, and
when such enforcement occurs during protocol operations? when such enforcement occurs during protocol operations?
o Does the profile reduce the degree to which human users could be o Does the profile reduce the degree to which human users could be
surprised or confused by application behavior (the "Principle of surprised or confused by application behavior (the "Principle of
Least Astonishment")? Least Astonishment")?
o Does the profile introduce any new security concerns such as those o Does the profile introduce any new security concerns such as those
described under Section 11 of this document (e.g., false positives described under Section 12 of this document (e.g., false positives
for authentication or authorization)? for authentication or authorization)?
11. Security Considerations 12. Security Considerations
11.1. General Issues 12.1. General Issues
If input strings that appear "the same" to users are programmatically If input strings that appear "the same" to users are programmatically
considered to be distinct in different systems, or if input strings considered to be distinct in different systems, or if input strings
that appear distinct to users are programmatically considered to be that appear distinct to users are programmatically considered to be
"the same" in different systems, then users can be confused. Such "the same" in different systems, then users can be confused. Such
confusion can have security implications, such as the false positives confusion can have security implications, such as the false positives
and false negatives discussed in [RFC6943]. One starting goal of and false negatives discussed in [RFC6943]. One starting goal of
work on the PRECIS framework was to limit the number of times that work on the PRECIS framework was to limit the number of times that
users are confused (consistent with the "Principle of Least users are confused (consistent with the "Principle of Least
Astonishment"). Unfortunately, this goal has been difficult to Astonishment"). Unfortunately, this goal has been difficult to
skipping to change at page 27, line 45 skipping to change at page 29, line 5
string is connected to the wrong account or online resource based on string is connected to the wrong account or online resource based on
different interpretations of the string (again, see [RFC6943]). different interpretations of the string (again, see [RFC6943]).
Specifications of application protocols that use this framework are Specifications of application protocols that use this framework are
strongly encouraged to describe how internationalized strings are strongly encouraged to describe how internationalized strings are
used in the protocol, including the security implications of any used in the protocol, including the security implications of any
false positives and false negatives that might result from various false positives and false negatives that might result from various
enforcement and comparison operations. For some helpful guidelines, enforcement and comparison operations. For some helpful guidelines,
refer to [RFC6943], [RFC5890], [UTR36], and [UTS39]. refer to [RFC6943], [RFC5890], [UTR36], and [UTS39].
11.2. Use of the IdentifierClass 12.2. Use of the IdentifierClass
Strings that conform to the IdentifierClass and any profile thereof Strings that conform to the IdentifierClass and any profile thereof
are intended to be relatively safe for use in a broad range of are intended to be relatively safe for use in a broad range of
applications, primarily because they include only letters, digits, applications, primarily because they include only letters, digits,
and "grandfathered" non-space characters from the ASCII range; thus and "grandfathered" non-space characters from the ASCII range; thus
they exclude spaces, characters with compatibility equivalents, and they exclude spaces, characters with compatibility equivalents, and
almost all symbols and punctuation marks. However, because such almost all symbols and punctuation marks. However, because such
strings can still include so-called confusable characters (see strings can still include so-called confusable characters (see
Section 11.5), protocol designers and implementers are encouraged to Section 12.5), protocol designers and implementers are encouraged to
pay close attention to the security considerations described pay close attention to the security considerations described
elsewhere in this document. elsewhere in this document.
11.3. Use of the FreeformClass 12.3. Use of the FreeformClass
Strings that conform to the FreeformClass and many profiles thereof Strings that conform to the FreeformClass and many profiles thereof
can include virtually any Unicode character. This makes the can include virtually any Unicode character. This makes the
FreeformClass quite expressive, but also problematic from the FreeformClass quite expressive, but also problematic from the
perspective of possible user confusion. Protocol designers are perspective of possible user confusion. Protocol designers are
hereby warned that the FreeformClass contains code points they might hereby warned that the FreeformClass contains code points they might
not understand, and are encouraged to profile the IdentifierClass not understand, and are encouraged to profile the IdentifierClass
wherever feasible; however, if an application protocol requires more wherever feasible; however, if an application protocol requires more
code points than are allowed by the IdentifierClass, protocol code points than are allowed by the IdentifierClass, protocol
designers are encouraged to define a profile of the FreeformClass designers are encouraged to define a profile of the FreeformClass
that restricts the allowable code points as tightly as possible. that restricts the allowable code points as tightly as possible.
(The PRECIS Working Group considered the option of allowing (The PRECIS Working Group considered the option of allowing
"superclasses" as well as profiles of PRECIS string classes, but "superclasses" as well as profiles of PRECIS string classes, but
decided against allowing superclasses to reduce the likelihood of decided against allowing superclasses to reduce the likelihood of
security and interoperability problems.) security and interoperability problems.)
11.4. Local Character Set Issues 12.4. Local Character Set Issues
When systems use local character sets other than ASCII and Unicode, When systems use local character sets other than ASCII and Unicode,
this specification leaves the problem of converting between the local this specification leaves the problem of converting between the local
character set and Unicode up to the application or local system. If character set and Unicode up to the application or local system. If
different applications (or different versions of one application) different applications (or different versions of one application)
implement different rules for conversions among coded character sets, implement different rules for conversions among coded character sets,
they could interpret the same name differently and contact different they could interpret the same name differently and contact different
application servers or other network entities. This problem is not application servers or other network entities. This problem is not
solved by security protocols, such as Transport Layer Security (TLS) solved by security protocols, such as Transport Layer Security (TLS)
[RFC5246] and the Simple Authentication and Security Layer (SASL) [RFC5246] and the Simple Authentication and Security Layer (SASL)
[RFC4422], that do not take local character sets into account. [RFC4422], that do not take local character sets into account.
11.5. Visually Similar Characters 12.5. Visually Similar Characters
Some characters are visually similar and thus can cause confusion Some characters are visually similar and thus can cause confusion
among humans. Such characters are often called "confusable among humans. Such characters are often called "confusable
characters" or "confusables". characters" or "confusables".
The problem of confusable characters is not necessarily caused by the The problem of confusable characters is not necessarily caused by the
use of Unicode code points outside the ASCII range. For example, in use of Unicode code points outside the ASCII range. For example, in
some presentations and to some individuals the string "ju1iet" some presentations and to some individuals the string "ju1iet"
(spelled with DIGIT ONE, U+0031, as the third character) might appear (spelled with DIGIT ONE, U+0031, as the third character) might appear
to be the same as "juliet" (spelled with LATIN SMALL LETTER L, to be the same as "juliet" (spelled with LATIN SMALL LETTER L,
skipping to change at page 30, line 35 skipping to change at page 31, line 44
existence of such communities and encourages due caution when existence of such communities and encourages due caution when
presenting unfamiliar scripts or characters to human users.) presenting unfamiliar scripts or characters to human users.)
The challenges inherent in supporting the full range of Unicode code The challenges inherent in supporting the full range of Unicode code
points have in the past led some to hope for a way to points have in the past led some to hope for a way to
programmatically negotiate more restrictive ranges based on locale, programmatically negotiate more restrictive ranges based on locale,
script, or other relevant factors, to tag the locale associated with script, or other relevant factors, to tag the locale associated with
a particular string, etc. As a general-purpose internationalization a particular string, etc. As a general-purpose internationalization
technology, the PRECIS framework does not include such mechanisms. technology, the PRECIS framework does not include such mechanisms.
11.6. Security of Passwords 12.6. Security of Passwords
Two goals of passwords are to maximize the amount of entropy and to Two goals of passwords are to maximize the amount of entropy and to
minimize the potential for false positives. These goals can be minimize the potential for false positives. These goals can be
achieved in part by allowing a wide range of code points and by achieved in part by allowing a wide range of code points and by
ensuring that passwords are handled in such a way that code points ensuring that passwords are handled in such a way that code points
are not compared aggressively. Therefore, it is NOT RECOMMENDED for are not compared aggressively. Therefore, it is NOT RECOMMENDED for
application protocols to profile the FreeformClass for use in application protocols to profile the FreeformClass for use in
passwords in a way that removes entire categories (e.g., by passwords in a way that removes entire categories (e.g., by
disallowing symbols or punctuation). Furthermore, it is NOT disallowing symbols or punctuation). Furthermore, it is NOT
RECOMMENDED for application protocols to map uppercase and titlecase RECOMMENDED for application protocols to map uppercase and titlecase
skipping to change at page 31, line 27 skipping to change at page 32, line 36
is used. is used.
In protocols that provide passwords as input to a cryptographic In protocols that provide passwords as input to a cryptographic
algorithm such as a hash function, the client will need to perform algorithm such as a hash function, the client will need to perform
proper preparation of the password before applying the algorithm, proper preparation of the password before applying the algorithm,
since the password is not available to the server in plaintext form. since the password is not available to the server in plaintext form.
Further discussion of password handling can be found in Further discussion of password handling can be found in
[I-D.ietf-precis-saslprepbis]. [I-D.ietf-precis-saslprepbis].
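The following sketch shows why the client prepares a password before hashing it. The preparation rules used here (map non-ASCII space characters to U+0020, then apply NFC, with no case mapping) are an assumption chosen for illustration; the actual rules are those of whichever profile the application protocol uses (see [I-D.ietf-precis-saslprepbis]). PBKDF2 merely stands in for whatever cryptographic algorithm the protocol specifies. Both function names are hypothetical.

```python
import hashlib
import unicodedata


def prepare_password(password: str) -> str:
    """Hypothetical preparation (assumption, not a normative profile):
    map every space character (General_Category Zs) to U+0020, then apply NFC.
    No case mapping is performed."""
    mapped = "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in password)
    return unicodedata.normalize("NFC", mapped)


def client_side_digest(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Prepare first, then run the prepared UTF-8 bytes through PBKDF2-HMAC-SHA256."""
    prepared = prepare_password(password)
    return hashlib.pbkdf2_hmac("sha256", prepared.encode("utf-8"), salt, iterations)
```

Because preparation happens before the hash, a precomposed input ("\u00C5ngstr\u00F6m") and a decomposed one ("A\u030Angstro\u0308m") produce the same digest. "Pass" and "pass" still produce different digests, because case is preserved.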
12. Interoperability Considerations 13. Interoperability Considerations
Although strings that are consumed in PRECIS-based application Although strings that are consumed in PRECIS-based application
protocols are often encoded using UTF-8 [RFC3629], the exact encoding protocols are often encoded using UTF-8 [RFC3629], the exact encoding
is a matter for the application protocol that uses PRECIS, not for is a matter for the application protocol that uses PRECIS, not for
the PRECIS framework. the PRECIS framework.
It is known that some existing systems are unable to support the full It is known that some existing systems are unable to support the full
Unicode character set, or even any characters outside the ASCII Unicode character set, or even any characters outside the ASCII
range. If two (or more) applications need to interoperate when range. If two (or more) applications need to interoperate when
exchanging data (e.g., for the purpose of authenticating a username exchanging data (e.g., for the purpose of authenticating a username
skipping to change at page 32, line 7 skipping to change at page 33, line 16
Unicode Standard is modified from time to time. For example, three Unicode Standard is modified from time to time. For example, three
code points underwent changes in their GeneralCategory between code points underwent changes in their GeneralCategory between
Unicode 5.2 (current at the time IDNA2008 was originally published) Unicode 5.2 (current at the time IDNA2008 was originally published)
and Unicode 6.0, as described in [RFC6452]. Implementers might need and Unicode 6.0, as described in [RFC6452]. Implementers might need
to be aware that the treatment of these characters differs depending to be aware that the treatment of these characters differs depending
on which version of Unicode is available on the system that is using on which version of Unicode is available on the system that is using
IDNA2008 or PRECIS. Other such differences might arise between the IDNA2008 or PRECIS. Other such differences might arise between the
version of Unicode current at the time of this writing (7.0) and version of Unicode current at the time of this writing (7.0) and
future versions. future versions.
13. References 14. References
13.1. Normative References 14.1. Normative References
[RFC20] Cerf, V., "ASCII format for network interchange", RFC 20, [RFC20] Cerf, V., "ASCII format for network interchange", RFC 20,
October 1969. October 1969.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network
Interchange", RFC 5198, March 2008. Interchange", RFC 5198, March 2008.
[Unicode7.0] [Unicode7.0]
The Unicode Consortium, "The Unicode Standard, Version The Unicode Consortium, "The Unicode Standard, Version
7.0.0", 2014, 7.0.0", 2014,
<http://www.unicode.org/versions/Unicode7.0.0/>. <http://www.unicode.org/versions/Unicode7.0.0/>.
13.2. Informative References 14.2. Informative References
[I-D.ietf-precis-mappings] [I-D.ietf-precis-mappings]
Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS
classes", draft-ietf-precis-mappings-08 (work in classes", draft-ietf-precis-mappings-08 (work in
progress), June 2014. progress), June 2014.
[I-D.ietf-precis-nickname] [I-D.ietf-precis-nickname]
Saint-Andre, P., "Preparation and Comparison of Saint-Andre, P., "Preparation and Comparison of
Nicknames", draft-ietf-precis-nickname-12 (work in Nicknames", draft-ietf-precis-nickname-13 (work in
progress), November 2014. progress), November 2014.
[I-D.ietf-precis-saslprepbis] [I-D.ietf-precis-saslprepbis]
Saint-Andre, P. and A. Melnikov, "Username and Password Saint-Andre, P. and A. Melnikov, "Username and Password
Preparation Algorithms", draft-ietf-precis-saslprepbis-10 Preparation Algorithms", draft-ietf-precis-saslprepbis-12
(work in progress), November 2014. (work in progress), December 2014.
[I-D.ietf-xmpp-6122bis] [I-D.ietf-xmpp-6122bis]
Saint-Andre, P., "Extensible Messaging and Presence Saint-Andre, P., "Extensible Messaging and Presence
Protocol (XMPP): Address Format", draft-ietf-xmpp- Protocol (XMPP): Address Format", draft-ietf-xmpp-
6122bis-16 (work in progress), November 2014. 6122bis-18 (work in progress), December 2014.
[RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson, [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson,
"Remote Authentication Dial In User Service (RADIUS)", RFC "Remote Authentication Dial In User Service (RADIUS)", RFC
2865, June 2000. 2865, June 2000.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454, Internationalized Strings ("stringprep")", RFC 3454,
December 2002. December 2002.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
skipping to change at page 35, line 9 skipping to change at page 36, line 17
2014-present, <http://www.unicode.org/versions/latest/>. 2014-present, <http://www.unicode.org/versions/latest/>.
[UTR36] The Unicode Consortium, "Unicode Technical Report #36: [UTR36] The Unicode Consortium, "Unicode Technical Report #36:
Unicode Security Considerations", July 2012, Unicode Security Considerations", July 2012,
<http://unicode.org/reports/tr36/>. <http://unicode.org/reports/tr36/>.
[UTS39] The Unicode Consortium, "Unicode Technical Standard #39: [UTS39] The Unicode Consortium, "Unicode Technical Standard #39:
Unicode Security Mechanisms", July 2012, Unicode Security Mechanisms", July 2012,
<http://unicode.org/reports/tr39/>. <http://unicode.org/reports/tr39/>.
13.3. URIs 14.3. URIs
[1] http://unicode.org/Public/UNIDATA/PropertyAliases.txt [1] http://unicode.org/Public/UNIDATA/PropertyAliases.txt
[2] http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt [2] http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
Appendix A. Acknowledgements Appendix A. Acknowledgements
The authors would like to acknowledge the comments and contributions The authors would like to acknowledge the comments and contributions
of the following individuals during working group discussion: David of the following individuals during working group discussion: David
Black, Edward Burns, Dan Chiba, Mark Davis, Alan DeKok, Martin Black, Edward Burns, Dan Chiba, Mark Davis, Alan DeKok, Martin
Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Bjoern Duerst, Patrik Faltstrom, Ted Hardie, Joe Hildebrand, Bjoern
Hoehrmann, Paul Hoffman, Jeffrey Hutzelman, Simon Josefsson, John Hoehrmann, Paul Hoffman, Jeffrey Hutzelman, Simon Josefsson, John
Klensin, Alexey Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker, Klensin, Alexey Melnikov, Takahiro Nemoto, Yoav Nir, Mike Parker,
Pete Resnick, Andrew Sullivan, Dave Thaler, Yoshiro Yoneya, and Pete Resnick, Andrew Sullivan, Dave Thaler, Yoshiro Yoneya, and
Florian Zeitz. Florian Zeitz.
Special thanks are due to John Klensin and Patrik Faltstrom for their
challenging feedback and detailed reviews.
Charlie Kaufman, Tom Taylor, and Tim Wicinski reviewed the document Charlie Kaufman, Tom Taylor, and Tim Wicinski reviewed the document
on behalf of the Security Directorate, the General Area Review Team, on behalf of the Security Directorate, the General Area Review Team,
and the Operations and Management Directorate, respectively. and the Operations and Management Directorate, respectively.
During IESG review, Alissa Cooper, Stephen Farrell, and Barry Leiba During IESG review, Alissa Cooper, Stephen Farrell, and Barry Leiba
provided comments that led to further improvements. provided comments that led to further improvements.
Some algorithms and textual descriptions have been borrowed from Some algorithms and textual descriptions have been borrowed from
[RFC5892]. Some text regarding security has been borrowed from [RFC5892]. Some text regarding security has been borrowed from
[RFC5890], [I-D.ietf-precis-saslprepbis], and [RFC5890], [I-D.ietf-precis-saslprepbis], and
End of changes. 111 change blocks.
260 lines changed or deleted 318 lines changed or added