Unicode

sacred-texts.com collage of texts, (c) 1999, J.B. Hare

Many files posted at sacred-texts since the spring of 2002 have embedded Unicode. Unicode is a character encoding standard which can represent all major world scripts, and many obscure ones as well. This solves a major problem for creators of etexts, as it is now possible to fully transcribe texts in multiple languages without requiring ASCII transliterations, special fonts or browsing software. Unicode enabling also takes care of right-to-left scripts more-or-less automatically.

The major browsers, version 4 and up, support Unicode if you have a decent Unicode font installed, provided you designate that font as your default font.

That said, this is definitely still on the cutting edge, and you may need to tweak your browser settings to get the full character set. And there are some features which are buggy in particular browsers, although support seems to be getting better in newer versions; having an up-to-date version of your operating system also helps.

For instance, Netscape appears to have a few problems displaying some combining characters such as Hebrew vowel points (they get displayed to the left of where they should be, with a space above them); this does not occur in Internet Explorer. Ironically, some versions of IE5 do not display medial and final forms when displaying Arabic (which makes it unusable for this purpose), while Netscape handles this issue correctly. For this reason, we have also posted a version of the Quran which uses GIF images to display Arabic. This is an exception, however, and the IE5 problem may have been fixed in more recent versions of the browser.

We welcome any comments or questions about the visibility of Unicode on this site in various browsers, and we will add advisories on this page. Extensive Unicode resources can be found at unicode.org.

Recommended Unicode Fonts

If you need a Unicode font, we recommend the Code2000 shareware font. This is a very extensive Windows font. If you download it, we encourage making a shareware donation to the creator of the font to support his worthy effort; instructions are in the zip file.

There is also a Unicode font included with the Microsoft Office 2000 product line, Lucida Sans Unicode. This used to be downloadable from the Microsoft website, but is no longer. However, you can install it as an option if you have Office 2000.

There is also a page about font issues regarding the Unicode Hebrew Bible at sacred-texts which includes a specialized redistributable font.

Enabling Unicode in Your Browser

The most common complaint is 'I downloaded and installed Code2000 but I still see little boxes in your files'. This is because you also have to tell your browser that you want to view Unicode content using that font.

First of all, we recommend that if you have an older browser, you should obtain the most recent version. If you are using AOL or another ISP which has a bundled browser, you may wish to get the most recent version of Internet Explorer or Netscape and use it for browsing Unicode content; the bundled browsers are notoriously buggy, particularly when it comes to cutting-edge features such as Unicode.

Here's how to get Unicode working in Internet Explorer using Code2000. The procedure is very similar for other browsers.

1. Download and Install the Unicode Font

First of all you need to download the font and install it. For instance, if you are using Windows XP, you start the Control Panel 'Fonts' program, and then select 'Install New Font' from the 'File' menu.

2. Make the Unicode Font Your Default Web Page Font

Let's assume you have downloaded and installed the 'Code2000' font. Start Internet Explorer and go into 'Tools | Internet Options' and select the 'Fonts' dialog.

Under 'Web Page Font', Code2000 should show up in the scrolling listbox if you downloaded and installed it correctly. Select it.

Unless you do this, some Unicode characters (such as the accented Greek characters and some Hebrew characters) may not show up.

I'm still seeing little boxes! What to do?

The most common problem is skipping step two in the previous section. If you don't designate a full Unicode font as your default 'Web Page Font', you will still only have whatever minimal Unicode support is built into your operating system.

Typically this will include some of the simplest extended Latin accented characters, as well as basic Greek and Hebrew characters. However, you won't be able to view specialized accented Latin characters, polytonic Greek, or pointed Hebrew. You won't be able to see any Arabic or Devanagari characters, astrological symbols, and so on. These will show up as the dreaded 'boxes' (or question marks in some browsers).

The web pages with heavy Unicode dependencies at this site don't have embedded font information because that would greatly inflate their size; and in the case of sections such as the Hebrew Bible and Sanskrit/Transliterated Rig Veda, that adds up to some serious extra baggage. Therefore I leave it up to you to tell your browser which font to use. You can always switch it back easily if you aren't reading specialized Unicode content.

Manually Selecting Unicode Encoding

You may need to also manually select 'Unicode (UTF-8)' in certain browsers. For instance, under Internet Explorer, you can select 'View | Encoding', and 'Unicode (UTF-8)'. Under Netscape, this is 'View | Character Coding'.

Technically, some of these pages don't use the UTF-8 encoding scheme. However this seems to be the only way to specify that you are viewing Unicode content for some browsers. I've started to add UTF-8 META tags to all files which have any amount of Unicode. This seems to have helped.
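
As a rough illustration of what such a META tag looks like, here is a minimal Python sketch that builds a page with a UTF-8 declaration in its head and writes the result out as UTF-8 bytes. The exact tag used on this site is not shown above, so the tag text below is an assumption (it is the common HTML 4-era form), and `build_page` is a hypothetical helper name.

```python
import html

# Assumption: the common HTML 4-era charset declaration. The page above does
# not show the exact tag used on the site.
META_UTF8 = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def build_page(title, body_html):
    """Return a small HTML page, encoded as UTF-8, that declares its encoding.

    `title` is plain text and gets escaped; `body_html` is assumed to be
    markup already and is inserted unchanged.
    """
    page = (
        "<html>\n<head>\n"
        + META_UTF8 + "\n"
        + "<title>" + html.escape(title) + "</title>\n"
        + "</head>\n<body>\n"
        + body_html + "\n"
        + "</body>\n</html>\n"
    )
    # The declared charset and the actual byte encoding must agree.
    return page.encode("utf-8")


page_bytes = build_page("Unicode test", "<p>\u0950 \u05d0 \u03a9</p>")
```

The important point is that the declaration and the bytes agree: a page which announces UTF-8 but is saved in some other encoding will still display incorrectly.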

Unicode Implementation

Technically speaking, the Unicode characters are embedded in 8 bit HTML using 'character entities', for instance:

&#2384; = ॐ
&#1488; = א
&#937; = Ω

If your browser is Unicode-enabled, you should see the Sanskrit letter for 'Aum' (see this image), the Hebrew letter Aleph, and a Greek capital Omega above.
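
The entity form can be produced mechanically. The following is a minimal Python sketch, not the tool used to prepare this site: it relies on Python's standard `xmlcharrefreplace` error handler, which writes each non-ASCII character as a decimal reference such as `&#2384;`. The function names `to_entities` and `from_entities` are hypothetical.

```python
import html


def to_entities(text):
    """Replace every non-ASCII character with a decimal numeric reference."""
    # 'xmlcharrefreplace' is a standard codec error handler: characters that
    # cannot be encoded as ASCII are written as &#NNNN; instead.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(markup):
    """Turn numeric (and named) references back into real characters."""
    return html.unescape(markup)


aum, aleph, omega = "\u0950", "\u05d0", "\u03a9"
print(to_entities(aum))    # &#2384;
print(to_entities(aleph))  # &#1488;
print(to_entities(omega))  # &#937;
```

Plain ASCII text passes through unchanged, so the result can be stored in an ordinary 8 bit HTML file.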

For disk space and bandwidth reasons, I've also started to use the UTF-8 encoding scheme in the files which are predominantly Unicode, such as the Greek and Hebrew portions of the Bible and the Rig Veda. This is a variable-length encoding scheme which stores Unicode compactly. Instead of the six or seven bytes per character that an HTML entity of this kind requires, UTF-8 requires one to three bytes for any character in the 16 bit Unicode character set. Most modern browsers handle UTF-8 automatically, assuming you have installed a complete Unicode font.
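
The size difference is easy to check. This short Python sketch compares the two representations for the three sample characters; `byte_costs` is a hypothetical helper, and the entity form is assumed to be the decimal `&#NNNN;` style.

```python
def byte_costs(ch):
    """Return (entity_bytes, utf8_bytes) for a single character."""
    entity = "&#%d;" % ord(ch)          # decimal numeric reference
    return len(entity.encode("ascii")), len(ch.encode("utf-8"))


for ch in ("\u0950", "\u05d0", "\u03a9"):   # Aum, Aleph, Omega
    entity_bytes, utf8_bytes = byte_costs(ch)
    print("U+%04X: entity %d bytes, UTF-8 %d bytes" % (ord(ch), entity_bytes, utf8_bytes))
# U+0950: entity 7 bytes, UTF-8 3 bytes
# U+05D0: entity 7 bytes, UTF-8 2 bytes
# U+03A9: entity 6 bytes, UTF-8 2 bytes
```

For a file that is mostly Greek or Hebrew, this works out to roughly a third of the size of the entity version.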

In some cases Unicode has been used to transcribe Latin characters with accents outside the ISO-8859-1 HTML character set. In other cases complete texts or extensive portions of the text are in Unicode. Among the Unicode character sets in use currently are Arabic, Chinese, Extended Latin, Greek, Hebrew, Tibetan, Runic and Sanskrit.
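
To find out which characters in a transcription fall outside ISO-8859-1 and therefore need Unicode, a check along the following lines can be used. This is a sketch under the assumption that ISO-8859-1 is treated as the code points U+0000 to U+00FF; `outside_latin1` and `describe` are hypothetical names.

```python
import unicodedata


def outside_latin1(text):
    """List, in order of first appearance, the characters above U+00FF."""
    seen = []
    for ch in text:
        if ord(ch) > 0xFF and ch not in seen:
            seen.append(ch)
    return seen


def describe(text):
    """Pair each such character's code point with its Unicode name."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in outside_latin1(text)
    ]


# 'e' with acute accent (U+00E9) is inside ISO-8859-1, so nothing is reported.
print(outside_latin1("caf\u00e9"))   # []
# 'a' with macron (U+0101) is outside it.
print(describe("\u0101tman"))        # [('U+0101', 'LATIN SMALL LETTER A WITH MACRON')]
```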

Some of the Unicode-enabled files at sacred-texts include:

SACRED TEXTS NEEDS YOUR SUPPORT

It costs thousands of dollars a year to pay for this site's bandwidth and maintenance. Without your continued support, sacred-texts would go offline or have to be scaled back. Your support is crucial; this site does not receive grants or institutional support.

The best way to support the site is to purchase the CD-ROM. The Sacred-texts CD-ROM has hundreds of books on it that are extremely hard to locate, including all of the major world scriptures. If you buy a copy, you can feel good knowing that you are helping keep this site online.

--J.B. Hare