Unicode

Many files posted at sacred texts since the spring of 2002 have embedded Unicode. Unicode is a multi-byte character set which can represent all major world scripts, and many obscure ones as well. This solves a major problem for creators of etexts, as it is now possible to fully transcribe texts in multiple languages without requiring ASCII transliterations, special fonts or special browsing software. Unicode enabling also takes care of right-to-left scripts more-or-less automatically.

The major version 4 and up browsers support Unicode if you have a decent Unicode font installed, provided you designate that font as your default font. That said, this is definitely still on the cutting edge, and you may need to tweak your browser settings to get the full character set. Some features are buggy in particular browsers, although support seems to be getting better in newer versions; having an up-to-date version of your operating system also helps.

For instance, Netscape appears to have a few problems displaying some subscript and superscript characters such as Hebrew vowel points (they get displayed to the left of where they should be, with a space above them); this does not occur in Internet Explorer. Ironically, some versions of IE5 do not display medial and final forms when displaying Arabic (which makes it unusable for this purpose), while Netscape handles this issue correctly. For this reason, we have also posted a version of the Quran which uses gif images to display Arabic. This is an exception, however, and the problem may have been fixed in more recent versions of the browser.

It appears that Firefox does not render Devanagari 'i' correctly: it places it after the associated consonant, not before. IE and Safari do not display the correct presentation forms for Unicode Cyrillic italics: Safari does not even allow Cyrillic to be italicized, whereas IE shows italicized forms of the base graphemes, which is incorrect.
Opera and Firefox display these presentation forms correctly. Strangely enough, the italic Cyrillic presentation forms are displayed correctly in MS Word 2003.

Some problems viewing some polytonic Greek files on the 5.0 CD-ROM under Mac OS X have been reported. These have been fixed on the website and the 6.0 DVD-ROM, but not on the 5.0 CD-ROM.

We welcome any comments or questions about the visibility of Unicode on this site in various browsers, and we will add advisories on this page. Extensive Unicode resources can be found at unicode.org.

Recommended Unicode Fonts

If you need a Unicode font, we recommend the Code2000 shareware font. This is a very extensive Windows font, and the one we use to test the site. We also recommend the site http://www.alanwood.net/unicode/fonts.html, which lists dozens of Unicode fonts for a variety of platforms.

A Unicode font, Arial Unicode MS, comes with Windows XP. It has some good points: it seems to have better coverage of some of the more obscure Arabic characters than Code2000. That said, Arial Unicode MS is not pretty, and if reading everything in a sans serif font isn't your cup of tea, you may want to look elsewhere. Note that this font may not be installed on your XP system by default. If you have XP and don't see Arial Unicode MS as one of your available fonts, you may need to dig out your Windows disk. You can also buy it from Microsoft, but they charge an exorbitant $99 for it. With so many free and inexpensive Unicode fonts, there is no reason to pay that much!

There is also a page about font issues regarding the Unicode Hebrew Bible at sacred-texts which includes a specialized redistributable font.

Enabling Unicode in Your Browser

The most common complaint is 'I downloaded and installed Code2000 but I still see little boxes in your files'. This is because you also have to tell your browser that you want to view Unicode content using that font.
First of all, we recommend that if you have an older browser, you should obtain the most recent version. If you are using AOL or another ISP which has a bundled browser, you may wish to get the most recent version of Internet Explorer or Netscape and use it for browsing Unicode content; the bundled browsers are notoriously buggy, particularly when it comes to cutting-edge features such as Unicode.

Here's how to get Unicode working in Internet Explorer using Code2000. The procedure is very similar for other browsers.

1. Download and Install the Unicode Font

First of all you need to download the font and install it. For instance, if you are using Windows XP, you start the Control Panel 'Fonts' program, and then select 'Install New Font' from the 'File' menu.

2. Make the Unicode Font Your Default Web Page Font

Let's assume you have downloaded and installed the 'Code2000' font. Start Internet Explorer and go into 'Tools | Internet Options' and select the 'Fonts' dialog. Under 'Web Page Font', Code2000 should show up in the scrolling listbox, if you downloaded it and installed it correctly. Select it. Unless you do this, some Unicode characters (such as the accented Greek characters and some Hebrew characters) may not show up.

I'm still seeing little boxes! What to do?

The most common problem is skipping step two in the previous section. If you don't designate a full Unicode font as your default 'Web Page Font', you will still only have whatever minimal Unicode support is built into your operating system. Typically this will include some of the simplest extended Latin accented characters, as well as basic Greek and Hebrew characters. However, you won't be able to view specialized accented Latin characters, polytonic Greek, or pointed Hebrew. You won't be able to see any Arabic or Devanagari characters, astrological symbols, and so on. These will show up as the dreaded 'boxes' (or question marks in some browsers).
The web pages with heavy Unicode dependencies at this site don't have embedded font information because that would greatly inflate their size; and in the case of sections such as the Hebrew Bible and Sanskrit/Transliterated Rig Veda, that adds up to some serious extra baggage. Therefore I leave it up to you to tell your browser which font to use. You can always switch it back easily if you aren't reading specialized Unicode content.

Manually Selecting Unicode Encoding

You may also need to manually select 'Unicode (UTF-8)' in certain browsers. For instance, under Internet Explorer, you can select 'View | Encoding', and 'Unicode (UTF-8)'. Under Netscape, this is 'View | Character Coding'. Technically, some of these pages don't use the UTF-8 encoding scheme. However, for some browsers this seems to be the only way to specify that you are viewing Unicode content. I've started to add UTF-8 META tags to all files which have any amount of Unicode. This seems to have helped.

Unicode Implementation

Technically speaking, the Unicode characters are embedded in 8 bit HTML using 'character entities', for instance: &#2384; = ॐ, &#1488; = א, &#937; = Ω. If your browser is Unicode-enabled, you should see the Sanskrit letter for 'Aum' (see this image), the Hebrew letter Aleph, and a Greek capital Omega above.

For disk space and bandwidth reasons, I've also started to use the UTF-8 encoding scheme in the files which are predominantly Unicode, such as the Greek and Hebrew portions of the Bible and the Rig Veda. This is a variable-length binary encoding scheme which represents Unicode efficiently. Instead of the six or seven bytes per character that a numeric HTML entity such as those above requires, UTF-8 requires one to three bytes for each character in the 16 bit range of Unicode. Most modern browsers handle UTF-8 automatically, assuming you have installed a complete Unicode font.

In some cases Unicode has been used to transcribe Latin characters with accents outside the ISO-8859-1 HTML character set.
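As a rough illustration of the size difference between character entities and UTF-8 described above, here is a minimal Python sketch. It is not the tool used to prepare the files at this site; the function name as_entity and the SAMPLES table are made up for this example. It relies only on Python's built-in 'xmlcharrefreplace' error handler, which writes any non-ASCII character as a decimal numeric character entity.

```python
def as_entity(text: str) -> str:
    """Return text with every non-ASCII character written as a decimal
    numeric character entity (hypothetical helper for this example)."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Sample characters, written as escapes so the script itself stays pure ASCII.
SAMPLES = {
    "Devanagari Om": "\u0950",
    "Hebrew Aleph": "\u05d0",
    "Greek capital Omega": "\u03a9",
}

for name, char in SAMPLES.items():
    entity = as_entity(char)
    entity_size = len(entity)              # entities are ASCII: one byte per character
    utf8_size = len(char.encode("utf-8"))  # bytes needed when stored as UTF-8
    print(f"{name}: {entity} takes {entity_size} bytes, UTF-8 takes {utf8_size} bytes")
```

Plain ASCII text passes through as_entity unchanged, so the entity form only costs extra space where non-ASCII characters occur. That is why the saving from UTF-8 matters most in files that are predominantly Unicode.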
In other cases complete texts or extensive portions of the text are in Unicode. Among the Unicode scripts currently in use are Arabic, Chinese, Extended Latin, Greek, Hebrew, Tibetan, Runic and Sanskrit (Devanagari). Some of the Unicode-enabled files at sacred-texts include: