draft-ietf-precis-problem-statement-01.txt   draft-ietf-precis-problem-statement-02.txt 
Network Working Group M. Blanchet Network Working Group M. Blanchet
Internet-Draft Viagenie Internet-Draft Viagenie
Intended status: Informational A. Sullivan Intended status: Informational A. Sullivan
Expires: June 12, 2011 December 9, 2010 Expires: October 2, 2011 March 31, 2011
Stringprep Revision Problem Statement Stringprep Revision Problem Statement
draft-ietf-precis-problem-statement-01.txt draft-ietf-precis-problem-statement-02.txt
Abstract Abstract
Using Unicode codepoints in protocol strings that expect comparison Using Unicode codepoints in protocol strings that expect comparison
with other strings requires preparation of the string that contains with other strings requires preparation of the string that contains
the Unicode codepoints. Internationalizing Domain Names in the Unicode codepoints. Internationalizing Domain Names in
Applications (IDNA2003) defined and used Stringprep and Nameprep. Applications (IDNA2003) defined and used Stringprep and Nameprep.
Other protocols subsequently defined Stringprep profiles. A new Other protocols subsequently defined Stringprep profiles. A new
approach different from Stringprep and Nameprep is used for a approach different from Stringprep and Nameprep is used for a
revision of IDNA2003 (called IDNA2008). Other Stringprep profiles revision of IDNA2003 (called IDNA2008). Other Stringprep profiles
skipping to change at page 1, line 39 skipping to change at page 1, line 39
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on June 12, 2011. This Internet-Draft will expire on October 2, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 3, line 20 skipping to change at page 3, line 20
3.1. Comparison . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Comparison . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.1. Comparison methods . . . . . . . . . . . . . . . . . . 6 3.1.1. Comparison methods . . . . . . . . . . . . . . . . . . 6
3.1.2. Effect of comparison . . . . . . . . . . . . . . . . . 7 3.1.2. Effect of comparison . . . . . . . . . . . . . . . . . 7
3.2. Dealing with characters . . . . . . . . . . . . . . . . . 7 3.2. Dealing with characters . . . . . . . . . . . . . . . . . 7
3.2.1. Case folding, case sensitivity, and case 3.2.1. Case folding, case sensitivity, and case
preservation . . . . . . . . . . . . . . . . . . . . . 7 preservation . . . . . . . . . . . . . . . . . . . . . 7
3.2.2. Stringprep and NFKC . . . . . . . . . . . . . . . . . 7 3.2.2. Stringprep and NFKC . . . . . . . . . . . . . . . . . 7
3.2.3. Character mapping . . . . . . . . . . . . . . . . . . 8 3.2.3. Character mapping . . . . . . . . . . . . . . . . . . 8
3.2.4. Prohibited characters . . . . . . . . . . . . . . . . 8 3.2.4. Prohibited characters . . . . . . . . . . . . . . . . 8
3.2.5. Internal structure, delimiters, and special 3.2.5. Internal structure, delimiters, and special
characters . . . . . . . . . . . . . . . . . . . . . . 8 characters . . . . . . . . . . . . . . . . . . . . . . 9
3.3. Where the data comes from and where it goes . . . . . . . 9 3.3. Where the data comes from and where it goes . . . . . . . 9
3.3.1. User input and the source of protocol elements . . . . 9 3.3.1. User input and the source of protocol elements . . . . 9
3.3.2. User output . . . . . . . . . . . . . . . . . . . . . 9 3.3.2. User output . . . . . . . . . . . . . . . . . . . . . 10
3.3.3. Operations . . . . . . . . . . . . . . . . . . . . . . 10 3.3.3. Operations . . . . . . . . . . . . . . . . . . . . . . 10
4. Considerations for Stringprep replacement . . . . . . . . . . 10 4. Considerations for Stringprep replacement . . . . . . . . . . 11
5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 5. Security Considerations . . . . . . . . . . . . . . . . . . . 12
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
7. Discussion home for this draft . . . . . . . . . . . . . . . . 11 7. Discussion home for this draft . . . . . . . . . . . . . . . . 12
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 12
9. Informative References . . . . . . . . . . . . . . . . . . . . 11 9. Informative References . . . . . . . . . . . . . . . . . . . . 12
Appendix A. Protocols known to be using Stringprep . . . . . . . 15 Appendix A. Protocols known to be using Stringprep . . . . . . . 15
Appendix B. Changes between versions . . . . . . . . . . . . . . 15 Appendix B. Changes between versions . . . . . . . . . . . . . . 16
B.1. 00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 B.1. 00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
B.2. 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 B.2. 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 16
1. Introduction 1. Introduction
Internationalizing Domain Names in Applications (IDNA2003) [RFC3490], Internationalizing Domain Names in Applications (IDNA2003) [RFC3490],
[RFC3491], [RFC3492], [RFC3454] described a mechanism for encoding [RFC3491], [RFC3492], [RFC3454] described a mechanism for encoding
Unicode labels making up Internationalized Domain Names (IDNs) as Unicode labels making up Internationalized Domain Names (IDNs) as
standard DNS labels. The labels were processed using a method called standard DNS labels. The labels were processed using a method called
Nameprep [RFC3491] and Punycode [RFC3492]. That method was specific Nameprep [RFC3491] and Punycode [RFC3492]. That method was specific
to IDNA2003, but is generalized as Stringprep [RFC3454]. The general to IDNA2003, but is generalized as Stringprep [RFC3454]. The general
mechanism can be used to help other protocols with similar needs, but mechanism can be used to help other protocols with similar needs, but
skipping to change at page 5, line 26 skipping to change at page 5, line 26
This document lists the shortcomings and issues found by protocols This document lists the shortcomings and issues found by protocols
listed above that defined Stringprep profiles. It also lists some listed above that defined Stringprep profiles. It also lists some
early conclusions and requirements for a potential replacement of early conclusions and requirements for a potential replacement of
Stringprep. Stringprep.
2. Issues raised during newprep BOF 2. Issues raised during newprep BOF
During IETF 77, a BOF discussed the current state of the protocols During IETF 77, a BOF discussed the current state of the protocols
that have defined Stringprep profiles [NEWPREP]. The main that have defined Stringprep profiles [NEWPREP]. The main
conclusions are: conclusions from that discussion were as follows:
o Stringprep is bound to a specific version of Unicode: 3.2. o Stringprep is bound to a specific version of Unicode: 3.2.
Stringprep has not been updated to new versions of Unicode. Stringprep has not been updated to new versions of Unicode.
Therefore, the protocols using Stringprep are stuck to Unicode Therefore, the protocols using Stringprep are stuck to Unicode
3.2. 3.2.
o The protocols need to be updated to support new versions of o The protocols need to be updated to support new versions of
Unicode. The protocols would like to not be bound to a specific Unicode. The protocols would like to not be bound to a specific
version of Unicode, but rather have better Unicode agility in the version of Unicode, but rather have better Unicode agility in the
way of IDNA2008. This is important partly because it is usually way of IDNA2008. This is important partly because it is usually
impossible for an application to require Unicode 3.2; the impossible for an application to require Unicode 3.2; the
application gets whatever version of Unicode is available on the application gets whatever version of Unicode is available on the
skipping to change at page 6, line 36 skipping to change at page 6, line 36
categories under which known Stringprep-using protocol RFCs have been categories under which known Stringprep-using protocol RFCs have been
evaluated. For the details of those evaluations, see Appendix A. evaluated. For the details of those evaluations, see Appendix A.
3.1. Comparison 3.1. Comparison
3.1.1. Comparison methods 3.1.1. Comparison methods
Identifiers can be conveniently organized into three classes or Identifiers can be conveniently organized into three classes or
"buckets": "buckets":
1. Identifiers that must compare equally byte for byte. [[anchor4: 1. Identifiers that must compare equally byte for byte.
In the Jabber discussion from Beijing, Ted Hardie asked that (1)
actually be "codepoint for codepoint". Is that what's intended?
It seems to me that that is already a case of (2), because
there's an algorithm to compare (say) my UTF-16 and your UCS-4.
Discussion would be helpful, please. --ajs@crankycanuck.ca]]
2. Identifiers that do not compare equally byte for byte, but that 2. Identifiers that do not compare equally byte for byte, but that
can always be compared for equality based on an algorithm can always be compared for equality based on an algorithm
everyone can agree on. everyone can agree on. (This includes cases like comparison of
Unicode codepoints that are in different encodings: two different
encodings do not match byte for byte, but can all be recoded to a
single encoding which then does match byte for byte.)
3. Identifiers for which there is no single comparison algorithm on 3. Identifiers for which there is no single comparison algorithm on
which everyone can agree. (For instance, there may be locale- which everyone can agree. (For instance, there may be locale-
sensitive comparison rules for identifiers.) sensitive comparison rules for identifiers.)
A subclass of case (3) is one in which, within some constrained A subclass of case (3) is one in which, within some constrained
population, the comparison rules are clear even though such rules are population, the comparison rules are clear even though such rules are
not universally applicable. So, for instance, users of US-ASCII may not universally applicable. So, for instance, users of US-ASCII may
all agree on a comparison function, but the set of US-ASCII users and all agree on a comparison function, but the set of US-ASCII users and
Turkish users may not all agree about the same comparison function. Turkish users may not all agree about the same comparison function.
For the purposes of the precis work, it is not plain whether this For the purposes of the present work, it is not plain whether this
subclass case is relevant, so categorization will include it. subclass case is relevant, so categorization will include it.
In the section treating the existing known cases, Appendix A, these In the section treating the existing known cases, Appendix A, these
"buckets" will be called Type 1, Type 2, Type 3, and Type 3a. "buckets" will be called Type 1, Type 2, Type 3, and Type 3a.
3.1.2. Effect of comparison 3.1.2. Effect of comparison
The comparisons outlined in Section 3.1.1 may have different effects The comparisons outlined in Section 3.1.1 may have different effects
when applied. It is necessary to evaluate the effects if a when applied. It is necessary to evaluate the effects if a
comparison results in a false positive, and what the effects are if a comparison results in a false positive, and what the effects are if a
comparison results in a false negative. It is particularly important comparison results in a false negative, especially in terms of the
to evaluate the effects on security of these answers. consequences to security and usability.
3.2. Dealing with characters 3.2. Dealing with characters
This section outlines a range of issues having to do with characters
in the target protocols, and spends some effort to outline the ways
in which IDNA2008 might be a good analogy to other protocols, and
ways in which it might be a poor one.
3.2.1. Case folding, case sensitivity, and case preservation 3.2.1. Case folding, case sensitivity, and case preservation
In IDNA2003, labels are always mapped to lower case before the In IDNA2003, labels are always mapped to lower case before the
Punycode transformation. In IDNA2003, there is no mapping at all: Punycode transformation. In IDNA2008, there is no mapping at all:
input is either a valid U-label or it is not. At the same time, input is either a valid U-label or it is not. At the same time,
upper-case characters are by definition not valid U-labels, because upper-case characters are by definition not valid U-labels, because
they fall into the Unstable category (category B) of [RFC5892]. they fall into the Unstable category (category B) of [RFC5892].
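The IDNA2003-style mapping can be sketched with the tables from RFC 3454 that ship in Python's standard `stringprep` module. This is an assumption-laden fragment, not the full Nameprep profile: it applies only the B.1 ("map to nothing") and B.2 (case folding for use with NFKC) tables, and the function name `fold_label` is hypothetical.

```python
import stringprep


def fold_label(label: str) -> str:
    """Apply the RFC 3454 B.1 and B.2 mapping tables to one label."""
    out = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            continue  # B.1: characters that are mapped to nothing
        out.append(stringprep.map_table_b2(ch))  # B.2: case folding
    return "".join(out)


folded = fold_label("ExAmPlE")
```

Under IDNA2008 there is no equivalent mapping step inside the protocol: a label containing upper-case characters is simply not a valid U-label.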
If there are protocols that require upper and lower cases be If there are protocols that require upper and lower cases be
preserved, then the analogy with IDNA2008 will break down. preserved, then the analogy with IDNA2008 will break down.
Accordingly, existing protocols are to be evaluated according to the Accordingly, existing protocols are to be evaluated according to the
following criteria: following criteria:
1. Does the protocol use case folding? For all blocks of code 1. Does the protocol use case folding? For all blocks of code
skipping to change at page 8, line 44 skipping to change at page 8, line 47
point actually is allowed in the protocol. point actually is allowed in the protocol.
Moreover, there is more than one class of "allowed in the protocol". Moreover, there is more than one class of "allowed in the protocol".
While some code points are disallowed outright, some are allowed only While some code points are disallowed outright, some are allowed only
in certain contexts. The reasons for the context-dependent rules in certain contexts. The reasons for the context-dependent rules
have to do with the way some characters are used. For instance, the have to do with the way some characters are used. For instance, the
ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER (ZWJ, U+200D and ZWNJ, ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER (ZWJ, U+200D and ZWNJ,
U+200C) are allowed with contextual rules because they are required U+200C) are allowed with contextual rules because they are required
in some circumstances, yet are considered punctuation by Unicode and in some circumstances, yet are considered punctuation by Unicode and
would therefore be DISALLOWED under the usual IDNA2008 derivation would therefore be DISALLOWED under the usual IDNA2008 derivation
rules. rules. The goal is to provide the widest repertoire of code
points possible and consistent with the traditional DNS, trusting to
the operators of individual zones to make sensible (and usually more
restrictive) policies for their zones.
IDNA2008 may be a poor model for what other protocols ought to do in
this case, because it is designed to support an old protocol that is
designed to operate on the scale of the entire Internet. Moreover,
IDNA2008 is intended to be deployed without any change to the base
DNS protocol. Other protocols may aim at deployment in more local
environments, or may have protocol version negotiation built in.
3.2.5. Internal structure, delimiters, and special characters 3.2.5. Internal structure, delimiters, and special characters
IDNA2008 has a special problem with delimiters, because the delimiter IDNA2008 has a special problem with delimiters, because the delimiter
"character" in the DNS wire format is not really part of the data. "character" in the DNS wire format is not really part of the data.
In DNS, the wire format indicates label length. When the label is In DNS, labels are not separated exactly; instead, a label carries
presented in presentation format as part of a fully qualified domain with it an indicator that says how long the label is. When the label
name, the label separator FULL STOP, U+002E (.) is used. But because is presented in presentation format as part of a fully qualified
that label separator does not travel with the wire format of the domain name, the label separator FULL STOP, U+002E (.) is used to
domain name, there is no way to encode a different, break up the labels. But because that label separator does not
"internationalized" separator in IDNA2008. travel with the wire format of the domain name, there is no way to
encode a different, "internationalized" separator in IDNA2008.
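The point about the separator can be made concrete with a small sketch that turns a presentation-format name into length-prefixed wire-format labels. It assumes an ASCII (already A-label) name and ignores name compression; the function name `to_wire` is hypothetical.

```python
def to_wire(name: str) -> bytes:
    """Encode an ASCII domain name as length-prefixed DNS labels."""
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("label length must be 1..63 octets")
        wire.append(len(raw))  # the length octet replaces the separator
        wire += raw
    wire.append(0)  # zero-length root label terminates the name
    return bytes(wire)


wire = to_wire("www.example.com.")
```

No FULL STOP survives in the result, so there is nothing in the wire format that a different separator character could be mapped to.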
Other protocols may include characters with similar special meaning Other protocols may include characters with similar special meaning
within the protocol. Common characters for these purposes include within the protocol. Common characters for these purposes include
FULL STOP, U+002E (.); COMMERCIAL AT, U+0040 (@); HYPHEN-MINUS, FULL STOP, U+002E (.); COMMERCIAL AT, U+0040 (@); HYPHEN-MINUS,
U+002D (-); SOLIDUS, U+002F (/); and LOW LINE, U+005F (_). The mere U+002D (-); SOLIDUS, U+002F (/); and LOW LINE, U+005F (_). The mere
inclusion of such a character in the protocol is not enough for it to inclusion of such a character in the protocol is not enough for it to
be considered similar to another protocol using the same character; be considered similar to another protocol using the same character;
instead, handling of the character must be taken into consideration instead, handling of the character must be taken into consideration
as well. as well.
An important issue to tackle here is whether it is valuable to map to An important issue to tackle here is whether it is valuable to map to
or from these special characters as part of the Stringprep or from these special characters as part of the Stringprep
replacement. In some locales, the analogue to FULL STOP, U+002E is replacement. In some locales, the analogue to FULL STOP, U+002E is
some other character, and users may expect to be able to substitute some other character, and users may expect to be able to substitute
their normal stop for FULL STOP. their normal stop for FULL STOP, U+002E. At the same time, there are
predictability arguments in favour of treating names with FULL STOP,
U+002E in them just the way they are treated under IDNA2008.
3.3. Where the data comes from and where it goes 3.3. Where the data comes from and where it goes
3.3.1. User input and the source of protocol elements 3.3.1. User input and the source of protocol elements
Some protocol elements are provided by users, and others are not. Some protocol elements are provided by users, and others are not.
Those that are not may presumably be subject to greater restrictions, Those that are not may presumably be subject to greater restrictions,
whereas those that users provide likely need to permit the broadest whereas those that users provide likely need to permit the broadest
range of code points. The following questions are helpful: range of code points. The following questions are helpful:
skipping to change at page 10, line 33 skipping to change at page 11, line 5
3.3.3.2. Community considerations 3.3.3.2. Community considerations
A Stringprep replacement that does anything more than just update A Stringprep replacement that does anything more than just update
Stringprep to the latest version of Unicode will probably entail some Stringprep to the latest version of Unicode will probably entail some
changes. It is important to identify the willingness of the changes. It is important to identify the willingness of the
protocol-using community to accept backwards-incompatible changes. protocol-using community to accept backwards-incompatible changes.
By the same token, it is important to evaluate the desire of the By the same token, it is important to evaluate the desire of the
community for features not available under Stringprep. community for features not available under Stringprep.
3.3.3.3. What to do about Unicode changes
IDNA2008 uses an algorithm to derive the validity of a Unicode code
point for use under IDNA2008. It does this by using the properties
of each code point to test its validity.
This approach depends crucially on the idea that code points, once
valid for a protocol profile, will not later be made invalid. That
is not a guarantee currently provided by Unicode. Properties of code
points may change between versions of Unicode. Rarely, such a change
could cause a given code point to become invalid under a protocol
profile, even though the code point would be valid with an earlier
version of Unicode. This is not merely a theoretical possibility,
because it has occurred ([I-D.faltstrom-5892bis]).
Accordingly, a Stringprep replacement that intends to be Unicode
version agnostic will need to work out a mechanism to address cases
where incompatible changes occur because of new Unicode versions.
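The version dependence of property-based derivation can be observed with Python's `unicodedata` module, which exposes both its current Unicode database and a frozen Unicode 3.2 snapshot (`ucd_3_2_0`). The sketch below shows only the benign case, a code point that was unassigned in 3.2 and assigned later; the incompatible changes the text is concerned with involve properties of already-assigned code points. The helper name `category_by_version` is hypothetical.

```python
import unicodedata

old = unicodedata.ucd_3_2_0  # Unicode 3.2 snapshot, as used by Stringprep


def category_by_version(ch: str) -> dict:
    """Return the General_Category of ch under each available version."""
    return {
        old.unidata_version: old.category(ch),
        unicodedata.unidata_version: unicodedata.category(ch),
    }


sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S, added in Unicode 5.1
cats = category_by_version(sharp_s)
```

Any validity rule computed from such properties gives an answer that is relative to the Unicode version consulted, which is why a version-agnostic replacement needs an explicit policy for changes.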
4. Considerations for Stringprep replacement 4. Considerations for Stringprep replacement
The above suggests the following direction for the working group: The above suggests the following direction for the working group:
o A stringprep replacement should be defined. o A stringprep replacement should be defined.
o The replacement should take an approach similar to IDNA2008, in o The replacement should take an approach similar to IDNA2008, in
that it enables Unicode agility. that it enables Unicode agility.
o Protocols share similar characteristics of strings. Therefore, o Protocols share similar characteristics of strings. Therefore,
defining i18n preparation algorithms for a (small) set of string defining i18n preparation algorithms for a (small) set of string
classes may be sufficient for most cases and provides the classes may be sufficient for most cases and provides the
coherence among a set of protocol friends. coherence among a set of protocol friends.
o The sets of string classes need to be evaluated according to the o The sets of string classes need to be evaluated according to the
considerations that make up the headings in Section 3 considerations that make up the headings in Section 3
o It is reasonable to limit scope to Unicode code points, and rule o It is reasonable to limit scope to Unicode code points, and rule
the mapping of data from other character encodings outside the the mapping of data from other character encodings outside the
scope of this effort. scope of this effort.
o Recommendations for handling protocol incompatibilities resulting
from changes to Unicode are required.
Existing deployments already depend on Stringprep profiles. Existing deployments already depend on Stringprep profiles.
Therefore, the working group will need to consider the effects of any Therefore, the working group will need to consider the effects of any
new strategy on existing deployments. By way of comparison, it is new strategy on existing deployments. By way of comparison, it is
worth noting that some characters were acceptable in IDNA labels worth noting that some characters were acceptable in IDNA labels
under IDNA2003, but are not protocol-valid under IDNA2008 (and under IDNA2003, but are not protocol-valid under IDNA2008 (and
conversely). Different implementers may make different decisions conversely). Different implementers may make different decisions
about what to do in such cases; this could have interoperability about what to do in such cases; this could have interoperability
effects. The working group will need to trade better support for effects. The working group will need to trade better support for
different linguistic environments against the potential side effects different linguistic environments against the potential side effects
skipping to change at page 11, line 42 skipping to change at page 12, line 35
with the text. with the text.
Specific contributions came from Alan DeKok, Alexey Melnikov, Peter Specific contributions came from Alan DeKok, Alexey Melnikov, Peter
Saint-Andre, Dave Thaler, and Yoshiro Yoneya. Saint-Andre, Dave Thaler, and Yoshiro Yoneya.
Dave Thaler provided the "buckets" insight in Section 3.1.1, central Dave Thaler provided the "buckets" insight in Section 3.1.1, central
to the organization of the problem. to the organization of the problem.
9. Informative References 9. Informative References
[I-D.faltstrom-5892bis]
Faltstrom, P. and P. Hoffman, "The Unicode code points and
IDNA - Unicode 6.0", draft-faltstrom-5892bis-04 (work in
progress), March 2011.
[NEWPREP] "Newprep BoF Meeting Minutes", March 2010. [NEWPREP] "Newprep BoF Meeting Minutes", March 2010.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454, Internationalized Strings ("stringprep")", RFC 3454,
December 2002. December 2002.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003. RFC 3490, March 2003.
 End of changes. 21 change blocks. 
36 lines changed or deleted 78 lines changed or added
