Unicode normalizations and Byte Order Mark
Table of Contents
- Unicode normalizations and Byte Order Mark
- Dynamic font mapping preferences
- Validator preferences
- Dreamweaver now allows keyboard shortcuts for code snippets
- Quick Tag Editor code hints are controlled by the Code Hints preferences dialog box
- Changing default file extensions from .htm to .html
- Importing Microsoft Word and Excel documents on the Mac
- Documentation for the New CSS Style dialog uses incorrect selector names
- Documentation for pasting formatted text uses incorrect menu option
- Documentation for the Sort Table command uses incorrect option labels
- Documentation for the ASP.NET Hyperlink Column dialog box is incorrectly titled.
- Documentation for animating layers with timelines
- Documentation errors in Animating Layers with Timelines
This feature description was omitted from the documentation.
Dreamweaver lets you specify the document encoding type appropriate to the language used to author your web pages, as well as the Unicode Normalization Form to use with that encoding type. There are four Unicode Normalization Forms: C (NFC), D (NFD), KC (NFKC), and KD (NFKD). The most important is Normalization Form C (NFC) because it is the form most commonly used in the Character Model for the World Wide Web. Macromedia provides the other three Unicode Normalization Forms for completeness.
The Byte Order Mark (BOM, or Unicode Signature) is a sequence of 2 to 4 bytes at the beginning of a text file that identifies the file as Unicode and indicates the byte order of the bytes that follow. Because UTF-8 has no byte order, adding a UTF-8 BOM is optional; for UTF-16 and UTF-32, it is required.
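As an illustration of how a BOM identifies a file, the following minimal Python sketch checks the first bytes of some data against the BOM constants in the standard `codecs` module. The `detect_bom` helper and the `BOMS` table are hypothetical names introduced for this example; they are not part of Dreamweaver.

```python
import codecs

# BOM byte sequences from the standard library. The 4-byte UTF-32 marks are
# checked first because the UTF-32 LE BOM starts with the same two bytes as
# the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

The UTF-8 mark is 3 bytes long, the UTF-16 marks are 2 bytes, and the UTF-32 marks are 4 bytes, which matches the 2 to 4 byte range described above.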
About Unicode normalization
In Unicode, some characters look the same but can be stored in the document in different ways. For example, "ë" (e-umlaut) can be represented as a single character, "e-umlaut," or as two characters, "regular Latin e" + "combining umlaut." A Unicode combining character is one that is rendered together with the preceding character, so the umlaut appears above the "Latin e." Both forms produce the same result on screen, but what is saved in the file is different for each form.
Normalization is the process of making sure that all characters that can be saved in different forms are saved using the same form. That is, all "ë" characters in a document are saved either as a single "e-umlaut" or as "e" + "combining umlaut," and not as both forms in one document.
For more information on Unicode Normalization, and the specific forms that can be used, see: www.unicode.org/reports/tr15.
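The two stored forms of "ë" can be reproduced outside Dreamweaver. The sketch below uses Python's standard `unicodedata` module; the variable names and the `same_after_normalization` helper are illustrative assumptions, not part of any Dreamweaver API.

```python
import unicodedata

precomposed = "\u00eb"   # one code point: LATIN SMALL LETTER E WITH DIAERESIS
decomposed = "e\u0308"   # two code points: "e" followed by COMBINING DIAERESIS


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after converting both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# NFC composes the pair into the single character; NFD splits it apart.
composed_result = unicodedata.normalize("NFC", decomposed)
decomposed_result = unicodedata.normalize("NFD", precomposed)
```

Without normalization the two strings compare as unequal even though they display identically, which is why a document should use one form throughout.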
Specifying Unicode forms
To specify Unicode document encoding for all new documents:
- Select Edit > Preferences.
The Preferences dialog box appears.
- Select the New Document category.
- Select a document encoding type from the Document encoding pop-up menu.
If the document encoding type you select can have a Unicode form associated with it, the Unicode Normalization Form pop-up menu lets you select a Unicode form to apply to all new documents you create. If no Unicode form applies to the selected document encoding type, the Unicode Normalization Form pop-up menu is dimmed.
- If applicable, select a Unicode form from the Unicode Normalization Form pop-up menu.
- To include the Byte Order Mark in the document, select the Include Unicode Signature (BOM) checkbox.
To modify the Unicode document encoding for a specific page:
- Open the page whose Unicode encoding you want to modify.
- Select Modify > Page Properties.
The Page Properties dialog box appears.
- Select the Title/Encoding category.
- From the Encoding pop-up menu, select the document encoding type you want to use.
- If applicable, select a Unicode form from the Unicode Normalization Form pop-up menu.
- To include the Byte Order Mark in the document, select the Include Unicode Signature (BOM) checkbox.
- Click the Reload button to update the page with the specified document encoding.
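For readers who want to see what these settings amount to at the byte level, here is a rough Python sketch that normalizes text, encodes it as UTF-8, and optionally prepends the Unicode Signature. It only approximates the effect of the options described above under the assumption of a UTF-8 document; `encode_document` and its parameters are hypothetical and do not describe how Dreamweaver is implemented.

```python
import codecs
import unicodedata


def encode_document(text: str, form: str = "NFC", include_bom: bool = False) -> bytes:
    """Normalize text, encode it as UTF-8, and optionally prepend the UTF-8 BOM."""
    normalized = unicodedata.normalize(form, text)
    data = normalized.encode("utf-8")
    # The UTF-8 BOM is optional, so it is added only on request.
    return codecs.BOM_UTF8 + data if include_bom else data
```

For example, `encode_document("e\u0308", "NFC", include_bom=True)` yields the 3-byte signature followed by the 2-byte UTF-8 encoding of the single "ë" character.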