Internet Draft                                          December 16, 1992
Expires 5/31/93

Transition of Internet Mail from Just-Send-8 to 8bit-SMTP/MIME

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. (The file 1id-abstracts.txt on nic.ddn.mil describes the current status of each Internet Draft.) It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "work in progress".

Abstract

Protocols for extending SMTP to pass 8bit characters have been defined [3] [4]. The messages transported by the extended SMTP are required to be encoded in MIME [1] [2]. Before these standards existed, several SMTP implementations adopted an ad-hoc mechanism for sending 8bit data, and the extended SMTP mail system must interoperate with them. This document outlines the problems in this environment and an approach to minimizing the cost of transition.

1. Terminology

RFC 821 defines a 7bit transport. An implementation which does not clear the high-order bit upon receipt of an octet with that bit set, and which passes the message unaltered to the user, is called 8bit transparent in this document. An implementation of the general SMTP Extensions document and the 8bit extensions protocols [3], [4] which passes MIME messages using all 8 bits of an octet is called 8bit ESMTP. An implementation of extended SMTP which does not accept 8bit characters is called 7bit ESMTP.

2. The Problem

SMTP as defined in RFC 821 limits the sending of Internet Mail to US-ASCII [5] characters.
As the Internet has grown to include non-English correspondents, the need to communicate with character sets other than US-ASCII has prompted many vendors and users to extend SMTP or RFC 822 to use non-ASCII character sets. The two common approaches are to send a 7bit character set over current RFC822/SMTP or to extend SMTP and RFC822 to use 8bit ISO 8859 character sets. So long as these implementations can directly communicate and have a private agreement on the use of a specific character set, without benefit of tagging, basic mail service can be provided.

In the transition to the negotiated 8bit system with MIME messages, it is important that mail sent by a currently non-conforming user can still be read by another such user. This functionality is reduced when 8bit text is converted to unreadable base-64 or to "garbled" quoted-printable text.

There are several interesting non-interoperable cases that currently exist in non US-ASCII mail and several new ones likely to emerge in a transition to 8bit/MIME.

                  \      Receiver
                   \   7bit        8bit        MIME/
        Sender      \| only  | transparent |  ESMTP
   ---------------------------------------------------
   7bit only         |  (1)  |     (1)     |   (1)
   ---------------------------------------------------
   8bit transparent  |  (2)  |     (3)     |   (4)
   ---------------------------------------------------
   MIME/ESMTP        |  (5)  |     (5)     |   (6)

(1) Will work acceptably well with ISO 646 national variant ASCII or ISO 2022 character set shifting if an external "out of band" agreement to use a particular character set without tagging exists between the sender and the receiver.
(2) The receiver will receive bit-stripped mail, which results in misinterpretation of the data and the wrong characters being displayed or printed. Mail sent using languages where most characters are in the US-ASCII subset of ISO 8859 may be somewhat readable.
(3) Will work if an external "out of band" agreement to use a particular character set without tagging exists between the sender and the receiver.
(4) Will work if a reasonable upgrade path is provided via gateways, the character set tag inserted by the gateway is correct, and the receiver supports the character set chosen by the sender.
(5) Because the ESMTP/MIME sender cannot know that the receiver will understand 8bit characters, the sender will encode the text into base-64 or quoted-printable, which may be considered "garbled" by the receiver.
(6) Interoperability will be attained provided the receiver supports the character set chosen by the sender.

3. Upgrade Path

A gateway which has been upgraded to support Extended SMTP may upgrade an 8bit message it receives to MIME. This is consistent with the requirement that all 8bit mail sent by ESMTP be encoded in MIME. The upgrade should be done using the best available information.

A site may "upgrade" to MIME en masse by implementing MIME conversion for all messages leaving the site. The conversion can be done by adding a MIME-Version header and a Content-Type header of type text that names the character set in use at the site. An appropriate Content-Transfer-Encoding header line must be added to indicate any encoding that may be necessary.

Example:

   MIME-Version: 1.0
   Content-Type: Text/Plain; Charset="ISO-8859-1"
   Content-Transfer-Encoding: 8bit
   Content-Description: Untagged text converted to MIME.

If no information is available, the gateway should upgrade the content by using the character set "unknown-8bit". Unknown-8bit states that the character set is only understandable with external information. MIME defines a message with no character set specified to be US-ASCII.
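The header insertion described in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive gateway implementation: the function name upgrade_to_mime and the site_charset parameter are hypothetical, messages are assumed to use CRLF line endings, only the body is inspected for 8bit octets, and messages that contain no 8bit octets or that already carry a MIME-Version header are passed through unchanged (a simplification, since this section only discusses upgrading 8bit messages).

```python
from typing import Optional


def upgrade_to_mime(raw: bytes, site_charset: Optional[str] = None) -> bytes:
    """Tag an untagged 8bit RFC 822 message with MIME headers.

    Hypothetical helper: `site_charset` is the character set the site
    is known to use, or None when no information is available.
    """
    # Split headers from body at the first blank line (CRLF assumed).
    head, _, body = raw.partition(b"\r\n\r\n")
    head = head.rstrip(b"\r\n")

    # Assumption: a message that is already MIME is left alone.
    header_lines = head.split(b"\r\n")
    if any(line.lower().startswith(b"mime-version:") for line in header_lines):
        return raw

    # Assumption: only bodies with a high-order bit set are upgraded.
    if not any(octet > 0x7F for octet in body):
        return raw

    # With no information available, fall back to "unknown-8bit".
    charset = site_charset or "unknown-8bit"
    added = "\r\n".join([
        "MIME-Version: 1.0",
        f'Content-Type: Text/Plain; Charset="{charset}"',
        "Content-Transfer-Encoding: 8bit",
        "Content-Description: Untagged text converted to MIME.",
    ]).encode("ascii")

    # Append the new header lines after the existing ones.
    return head + b"\r\n" + added + b"\r\n\r\n" + body
```

Calling upgrade_to_mime(message, "ISO-8859-1") tags the message with the site's character set, while calling it with no second argument produces the "unknown-8bit" tag. The body octets are never altered in this sketch; a gateway that must deliver to a 7bit receiver would additionally need to apply base-64 or quoted-printable and set Content-Transfer-Encoding accordingly.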
Trace information should be added to the message as indicated in [2], with a convert clause of "rfc822-to-8bit", "rfc822-to-base-64" or "rfc822-to-quoted-printable", e.g.,

   Received: from dbc.mtview.ca.us by dbc.mtview.ca.us
             convert rfc822-to-8bit; Tue, 01 Sep 1992 01:18:00 -0700

Appendix - The "unknown-8bit" Character Set

This section defines a value of the "charset" parameter for use in a MIME Content-Type field.

A special purpose character set called "unknown-8bit" is defined to be an unknown 8bit character set, encoded into a sequence of octets. It can represent any character set from any language, using any encoding. It may not be further defined. The use of this token in a "charset=" field of a message indicates that nothing is known about the character set used. This marker is intended for use by non-MIME to MIME gateways, specifically those which translate from SMTP to 8bit ESMTP/MIME.

This character set is not intended to be used by mail composers. It is assumed that the mail composer knows the character set in use and will mark it with a character set value as specified in [1], as amended by current Assigned Numbers documents [6]. The "unknown-8bit" value is intended for use only by gateway agents which cannot determine the intended character set via out-of-band information.

The interpretation of "unknown-8bit" is up to the mail reader. It is assumed that the human user will be able to interpret the information and choose an appropriate character set or pre-processor.

Acknowledgements

This document originated as a hallway conversation between Ned Freed, Neil Katin, and the author. Substantive input was received from Jonathan Laventhol, Craig Everhart and Olafur Gudmundsson. The document was refined with the input of many participants in the IETF SMTP Extensions Working Group.
Author Address

   Greg Vaudreuil
   Corporation for National Research Initiatives
   1895 Preston White Drive, Suite 100
   Reston, VA 22091 USA
   GVaudre@CNRI.Reston.VA.US

References

[1] N.S. Borenstein, N. Freed. Multipurpose Internet Mail Extensions. Request for Comments 1341, (June, 1992).

[2] K. Moore. Representation of Non-ASCII Text in Internet Message Headers. Request for Comments 1342, (June, 1992).

[3] M.T. Rose, E.A. Stefferud, D.H. Crocker. SMTP Service Extensions. Internet-Draft, (November, 1992).

[4] M.T. Rose, E.A. Stefferud, D.H. Crocker. SMTP Service Extensions for 8bit cleanliness. Internet-Draft, (November, 1992).

[5] Coded Character Set--7-Bit American Standard Code for Information Interchange, ANSI X3.4-1986.

[6] J. Reynolds, J. Postel. Assigned Numbers. Request for Comments 1340, (July, 1992).