How to Volunteer

sacred-texts.com collage of texts, (c) 1999, J.B. Hare


General Comments

I am always looking for new and unique material to add to the Internet Sacred Text Archive (ISTA). The ISTA is a repository for public domain and redistributable etexts on the subjects of:

World Religion, including scriptures and historical material.

Traditional beliefs of Africa, Native America, Australia, and the Pacific.

Folklore and Fairytales

Mythology and Legends

Esoteric and Occult

No compensation is offered for volunteers, just bragging rights and worldwide exposure. The material submitted is posted at the site for permanent free download by anyone with a web connection. I reserve the right to reject any submission or to remove the submission at any time without notice.

Important: if you want to work on a project for the archive, please email me about your plans before starting. This is so that we can avoid duplicate projects and ensure that you understand the archive's procedures for adding new content to the site. Otherwise you may expend a lot of labor on a file that we can't post, or have to go back and do additional work on it.

To volunteer to create etexts, you will need to have access to or obtain 1) a flat-bed scanner 2) OCR software (e.g. OmniPage) and 3) word processing software which can edit HTML files and perform spell checking (e.g. Microsoft Word). You will also be responsible for obtaining the book which you will be scanning, either by checking it out from a library or by purchasing a copy. I will assist in clearing the copyright, but you need to have the time, skills and resources to finish the project on your own, without a lot of handholding.

Note that donated etexts may be cross-posted (i.e., 'stolen') by other sites without notice to sacred-texts or the originator of the etext, or converted for submission to Project Gutenberg. I make no attempt to prevent these uses of material at the site and actually encourage it. While I give attribution to volunteers in the etext and in the on-site bibliography, I can't guarantee that attribution will be provided by other sites.

The file, if based on public domain material, will be included on the next version of the ISTA CD-ROM. This CD-ROM is sold to defray costs of the site, currently in the middle four figures per year.

Texts must be in HTML or 7 or 8 bit ASCII. All graphics submitted must be in JPG format. All files submitted to the ISTA must either be verifiably in the public domain or bear a clear release from the author for unlimited reproduction.

I do not publish PDF files or other binary formats, and any HTML must be non-CSS based and without any JavaScript, or other 'moving parts'. I don't generally post material which occurs elsewhere on the Internet at stable, long term sites. I welcome contributions of compatible material from existing sites as a mirror site by special arrangement.

When submitting material for the ISTA, please compress it in the ZIP format and attach the ZIP file to an email sent to this address.

Formatting

There are three kinds of texts which are submitted to the ISTA.

I. Copyrighted Redistributable Material

ISTA has several specialized archives on the following subjects for which it is accepting submissions of articles. This includes:

Internet Book of Shadows
Zen Buddhism
Tibetan Buddhism
the UFO files

These are repositories of copyrighted but redistributable articles, for which ISTA is soliciting new and compatible articles.

Material submitted can be short or long articles on these topics.

Please include an explicit release in your submission, such as:

Copyright (c), 2003, Your Name Here. This article may be reproduced for non-commercial purposes, providing that this original copyright notice stays in place at all times.

I will add some navigational links at the top of your file and possibly do some minor formatting on it, but otherwise leave it intact.

Material in these archives is not included on the ISTA CD-ROM and does not become Project Gutenberg candidate material.

II. Public Domain

I welcome submission of public domain texts which are preformatted for sacred-texts. These must be in HTML or 7 or 8 bit ASCII Text (no PDFs or other formats). This will typically be a redacted etext based on a public domain book. It will save a great deal of time if you use the following conventions for marking up submitted files. I have a program which allows me to instantly convert files marked up using these conventions for posting at the website.


Scan this Book [PDF format, 96 Kb]

This is a work-in-progress book which I'm writing. This contains many tricks of the trade for scanning and proofing etexts. I'm posting this in PDF format because it is a copyrighted document which I plan to publish eventually. I'm posting a draft version here to help out other people interested in creating etexts. You can download and read this file but it is not for redistribution or republication without my permission.


 

III. Sacred Texts Markup Language (STML)

Download the STML source for Bushido, the Soul of Japan. [85Kb zip format]. This sample file shows what STML markup looks like. This is the actual proof file which I use to generate the multi-file version at the site.

ISTA has developed a set of coding standards which make it possible to create standardized etexts with anchored footnotes, linkable page numbers, and embedded Unicode. This coding standard is called 'STML' for Sacred Text Markup Language (although it should be of general use).

I've been using STML internally since the middle of 2002, and have found it incredibly useful for creating HTML content from physical books.

Several texts have now been submitted in this format. Even if you just insert file breaks and page numbers, and mark up the footnotes using this method, it will make it a lot easier to post the material at sacred-texts.

The advantage of formatting your submission in STML is that you don't have to create footnote links, break the file down into separate sections with navigation links manually, and you can insert Greek and Hebrew text or arbitrary Unicode into the body of the file without having to use specialized editors.

STML is a markup language which is piggybacked on HTML. This means that if your editor (for instance, MS Word) can edit HTML directly, you can insert the STML tags as normal text without having to understand HTML. So if you don't know HTML, no problem.

STML tags are enclosed in wavy brackets {}. I chose this because wavy brackets occur very rarely in 19th century and early 20th century books, as opposed to square or angle brackets. If an actual wavy bracket occurs in the text, write a backslash preceding it so that it will appear at the website (\{ or \}).
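The escaping rule can be illustrated with a short Python sketch. This is not the archive's parser (which is a C program); the function names escape_braces and unescape_braces are hypothetical helpers introduced only for illustration.

```python
import re

def escape_braces(text: str) -> str:
    """Put a backslash before every wavy bracket that is not already escaped."""
    return re.sub(r"(?<!\\)([{}])", r"\\\1", text)

def unescape_braces(text: str) -> str:
    """Turn escaped wavy brackets back into plain ones, as they should appear on the site."""
    return re.sub(r"\\([{}])", r"\1", text)

print(escape_braces("the set {a, b}"))
```

Running escape_braces over a passage of literal book text before adding any STML tags keeps the original brackets from being mistaken for markup.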

Each STML tag starts with an abbreviation, such as 'p' for page. This may be followed by one or more arguments; for instance {file "Title of this file"}. STML tags can sometimes occur inside other STML tags. For instance, you can add STML markup inside the body of a footnote.

The STML tags are meant to be read by a computer program called a 'parser'. Computer programs aren't very forgiving about ambiguity or quirks. Therefore, all STML tag abbreviations must be lower case. The parser is also picky about where you put periods; there must be a period after the 'p' in the page tags ({p. NNN}); likewise one in the footref ({fr. NNN}) and two in the footnote ({fn. NNN. text...}). Don't put in extraneous spaces, for instance, before the abbreviations, or multiple spaces after the abbreviation. If you are using curved quotes in your word processor, be sure to use normal straight double quotes in the STML tags.
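Before sending a file in, it may help to check these rules mechanically. The following is a minimal Python sketch of such a check, assuming only the rules stated above; lint_tags and its messages are hypothetical, and the real parser may well check more (or differently).

```python
import re

# Tags that the text above says must carry a period after the abbreviation.
PERIOD_TAGS = {"p", "fr", "fn"}
# An unescaped opening bracket, stray whitespace, the abbreviation, optional period, spaces.
TAG_START = re.compile(r"(?<!\\)\{(\s*)([A-Za-z]+)(\.?)( *)")
# A curved double quote anywhere inside an innermost tag.
CURLY_QUOTES = re.compile(r"(?<!\\)\{[^{}]*[\u201c\u201d]")

def lint_tags(text: str) -> list[str]:
    """Return a list of human-readable problems found in the STML tags of text."""
    problems = []
    for m in TAG_START.finditer(text):
        lead, abbr, dot, gap = m.groups()
        if lead:
            problems.append(f"space before abbreviation '{abbr}'")
        if abbr != abbr.lower():
            problems.append(f"abbreviation '{abbr}' is not lower case")
        if abbr.lower() in PERIOD_TAGS and not dot:
            problems.append(f"missing period after '{abbr}'")
        if len(gap) > 1:
            problems.append(f"multiple spaces after '{abbr}'")
        # The footnote tag needs a second period, after the footnote number.
        if abbr.lower() == "fn" and dot and not re.match(r"\S+\. ", text[m.end():]):
            problems.append("missing period after footnote number")
    if CURLY_QUOTES.search(text):
        problems.append("curved quotes inside a tag")
    return problems

print(lint_tags("{P. 12} Some text{fr 1}"))
```

An empty list means none of these particular mistakes were found; it does not guarantee that the parser will accept the file.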

To prepare an STML based document, an HTML document is marked up with STML tags, and then an STML parser reads the document and emits a finished document with markup based on the STML tags. The parser performs some error checking on the tags so that the resulting etext is well-formed, such as syntax checking on the tags, checking to see whether all footnotes are properly referenced, all footnote references have a corresponding footnote, all page references refer to valid pages, and so on.
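The cross-checks described above can be sketched in a few lines of Python. This is a simplified illustration, not the parser's actual logic: it checks the whole document at once and ignores file, page and volume scoping, and check_references is a hypothetical name.

```python
import re

def check_references(text: str) -> list[str]:
    """Report page references and footnote references that have no target, and orphan footnotes."""
    pages = set(re.findall(r"\{p\. (\S+?)\}", text))
    page_refs = set(re.findall(r"\{prr?\. (\S+?)\}", text))
    foot_refs = set(re.findall(r"\{fr\. (\S+?)\}", text))
    footnotes = set(re.findall(r"\{fn\. (\S+?)\. ", text))

    errors = []
    for n in sorted(page_refs - pages):
        errors.append(f"page reference to unknown page {n}")
    for n in sorted(foot_refs - footnotes):
        errors.append(f"footnote reference {n} has no footnote")
    for n in sorted(footnotes - foot_refs):
        errors.append(f"footnote {n} is never referenced")
    return errors

sample = "{p. 1}\nText{fr. 1} see {pr. 2} and {pr. 9}.\n{fn. 1. Note.}\n{p. 2}\nMore."
print(check_references(sample))
```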

I have developed a Windows-based STML parser which has undergone several revisions on a set of increasingly complicated projects. At some point I will release an open source version of this program along with a binary. However, at this point in time, the only way to use this markup is to prepare a file and send it to sacred-texts. The parser is not available as a stand-alone program; rather, there is a main routine, which I customize and compile for each project and link against the STML implementation library. So the only way to build an STML parser currently is if you have a C compiler and know how to edit a C program. Don't worry; I am more than willing to do this for you if you add the STML tags.

The following is a list of STML tags in common use. This is not complete, nor is it guaranteed to remain the same; but this is pretty much the core, stable part of the STML syntax and semantics. In this part of the document arguments in italics can take on values as specified in the description for each tag.

In this section I use the term 'logical' to refer to the structure of the output, formatted version of the etext, and 'physical' to refer to the original printed book. For example, a logical volume is a volume of the etext (a span of the etext where the page numbering gets reset), and a physical volume is a separate book.

FILE TAG

TAG: {file "title"}

DESCRIPTION: This indicates where a new subfile will start. This tag should be written on a line by itself.

Don't forget to follow this with the first page number in the file (this is easy to forget since the page numbers are often left out on the first page of a chapter in books). If a logical file starts within a physical page (i.e., a chapter starting halfway down the page in the original book), you may wish to add a second page tag after the file tag, using the page number with a letter appended (e.g. 12b), because a page reference would otherwise land on the previous page.

OUTPUT: "title" is the title of the file which will be used in the output index.htm file and in the HTML TITLE tag in the output subfile. There is currently no way to specify the filename of the new file: files are simply numbered sequentially.
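To make the file-break behaviour concrete, here is a minimal Python sketch that splits a proof file at each file tag and numbers the pieces in order. The naming pattern (a prefix plus a two-digit counter and .htm) is an assumption made for illustration; the text above only says that files are numbered sequentially. split_files is a hypothetical name.

```python
import re

# A file tag on a line by itself, as the description above requires.
FILE_TAG = re.compile(r'^\{file "([^"]*)"\}[ \t]*$', re.MULTILINE)

def split_files(text: str, prefix: str = "file") -> list[tuple[str, str, str]]:
    """Return (filename, title, body) for each {file "title"} section, numbered in order."""
    parts = FILE_TAG.split(text)  # [before first tag, title1, body1, title2, body2, ...]
    files = []
    for i in range(1, len(parts), 2):
        name = f"{prefix}{len(files):02d}.htm"  # assumed naming scheme
        files.append((name, parts[i], parts[i + 1].strip("\n")))
    return files

proof = '{file "Preface"}\n{p. v}\nText A\n{file "Chapter I"}\n{p. 1}\nText B\n'
for name, title, body in split_files(proof):
    print(name, title)
```

The titles collected here are the same strings that would go into the index.htm listing and each subfile's TITLE tag.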

SECTION TAG

TAG: {section "title"}

DESCRIPTION: This indicates the start of a section. Typically this is used when a new section of the book starts. Note that this does not create a file break; if you want a file break at this point you must follow this tag up immediately with a file tag. This tag should be written on a line by itself.

OUTPUT: Nothing special at this point; currently the "title" is output into the index.htm file in a header style, centered.

VOLUME TAG

TAG: {volume "title" "prefix"}

DESCRIPTION: This indicates the start of a new volume, effectively a place in the logical book where the page numbering restarts. If the input file or files contain the text of a set of physical books, a volume tag should be added at the start of each separate book. Note that this does not create a file break; if you want a file break at this point you must follow this tag up immediately with a file tag. This tag should be written on a line by itself.

The prefix is used to name successive subfiles in this volume. It is used until the prefix is reset at the next volume tag.

OUTPUT: The "title" is output into the index.htm file in a header style, centered. The page numbering gets reset. Note that, currently, page references can only refer to pages within the current volume.

PAGE TAG

TAG: {p. NNN}

DESCRIPTION: This defines a page number NNN. This is normally written on a line by itself. The page number doesn't have to be numeric, it can be any string without spaces, such as {p. xiv}. This is normally the same as the page number printed in the book.

If a word is hyphenated across a page boundary, the word is joined in the etext on the previous page. This is to facilitate text searches.
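The joining rule can be sketched in Python as follows. This is only an illustration under the assumption that the page tag sits on its own line between the two halves of the word; join_hyphenated is a hypothetical helper. It cannot tell a line-break hyphen from a genuine compound such as 'well-known', so its output still needs a human check.

```python
import re

# word-half, hyphen, newline, page tag on its own line, newline, rest of the word
PAGE_BREAK_HYPHEN = re.compile(r"(\w+)-\n(\{p\. \S+?\})\n(\w+[.,;:!?]*)[ \t]*")

def join_hyphenated(text: str) -> str:
    """Move the second half of a word split across a page break back onto the previous page."""
    return PAGE_BREAK_HYPHEN.sub(r"\1\3\n\2\n", text)

print(join_hyphenated("a sacred manu-\n{p. 13}\nscript, long lost"))
```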

If it is desirable to not create a break in the text then the page tag can be embedded in the text. This is useful for short poetry which spans page boundaries. Longer poems should have pages indicated on a separate line.

OUTPUT: The page number is output in small green text in the same location in the output file with an HTML anchor next to it so the page can be linked to.

PAGE REFERENCE TAG

TAG: {pr. NNN}

DESCRIPTION: This creates a link to the given page number. The page number must be within the same logical volume as the reference. The page number can be in a different logical file, however.

OUTPUT: This outputs the text p. NNN in a smaller font, linked to the particular page number.

SILENT PAGE REFERENCE TAG

TAG: {prr. NNN}

DESCRIPTION: This is effectively the same as a pr. tag, except that the output does not include 'p.'

OUTPUT: This outputs the text NNN in a smaller font, linked to the particular page number.

NOTE: This is used in the case that the page reference is 'See page NNN'. This would be coded 'See page {prr. NNN}'.
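The three page-related tags can be illustrated together. The Python sketch below is a guess at the general shape of the output, not the parser's real HTML: the anchor name format (page_NNN) and the exact FONT and SMALL markup are assumptions, and links are only resolved within a single file. render_pages is a hypothetical name.

```python
import html
import re

def render_pages(text: str) -> str:
    """Replace {p. N}, {pr. N} and {prr. N} with illustrative HTML anchors and links."""
    def anchor(m: re.Match) -> str:
        n = html.escape(m.group(1), quote=True)
        # Small green page number with an anchor so that the page can be linked to.
        return f'<A NAME="page_{n}"><FONT COLOR="GREEN" SIZE="-1">p. {n}</FONT></A>'

    def link(m: re.Match) -> str:
        n = html.escape(m.group(2), quote=True)
        label = "p. " if m.group(1) == "pr" else ""  # prr. omits the 'p.'
        return f'<A HREF="#page_{n}"><SMALL>{label}{n}</SMALL></A>'

    text = re.sub(r"\{p\. (\S+?)\}", anchor, text)
    return re.sub(r"\{(prr?)\. (\S+?)\}", link, text)

print(render_pages("See page {prr. 12}"))
```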

CONTINUATION TAG

TAG: {cont}

DESCRIPTION: This indicates a paragraph continuation. This is traditionally ignored in etext formatting, but I feel that it is important for preserving the paragraph structure of a document.

If a paragraph goes over a page boundary, and the first character on the second page is anything other than a lower case alphabetic character (this includes any kind of punctuation), then the tag {cont} (continuation) is inserted at the beginning of the second page.

Specifically, this must be inserted if there is a paragraph which is not indented and there is anything other than a lower case alphabetic letter at the start of the first line at the top of the page. This includes a capital letter, a number, a quotation mark or any other punctuation character.

A human being can distinguish most of the cases where a sentence continues onto the next page, but a computer program can't easily. This is why this tag must be inserted in every case where there is a non-indented non-lowercase letter at the start of a page.

The {cont} is usually placed either on the second page or (in some cases) on the page leading into the continuation.

The {cont} tag is also used to indicate that a paragraph continues within a page, for instance if there is some indented material (such as a block quote or a poem), followed by a non-indented paragraph, not beginning with a lower case letter:

There once was a Lady from Bright, Who travelled much faster than light.
{cont}This is the beginning of a limerick.

A {cont} is not required if there is an actual new paragraph at the start of the second page.

This can also occur if a hyphenated word is joined and a sentence in the same paragraph starts on the second page as a result.

OUTPUT: Currently I'm outputting paragraph continues at this point, but it will be used in the future to prevent indentation from occurring in this location (i.e., so that it doesn't appear to be a spurious paragraph).
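The rule for when a {cont} is needed at the top of a page can be written as a small predicate. This Python sketch assumes the proofer already knows whether the first line on the page is indented in the printed book; needs_cont is a hypothetical helper, not part of the parser.

```python
def needs_cont(first_line: str, indented: bool) -> bool:
    """Return True if a page whose text starts with first_line should begin with {cont}."""
    if indented:
        return False  # an indented first line is a genuine new paragraph
    text = first_line.strip()
    if not text:
        return False
    # Anything but a lower case letter (capital, digit, quote, punctuation) needs the tag.
    return not text[0].islower()

print(needs_cont('"Thus have I heard," he said.', indented=False))
```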

FOOTNOTE REFERENCE TAG

TAG: {fr. NNN}

DESCRIPTION: This is a footnote reference. Typically this refers to a footnote on the same page as the reference.

The NNN does not need to be a number (it could be a letter or an asterisk, for instance). However, where the original book uses asterisk, dagger, virgule, etc. for footnotes, it is good practice just to convert them to numbers for readability.

The footnote numbers do not have to start at '1'.

The STML parser checks to see whether the footnote reference refers to an existing footnote. It is okay if there is more than one footnote reference to the same footnote number.

OUTPUT: This is converted to a link with the text NNN in a small font, linked to the corresponding footnote.

FOOTNOTE TAG

TAG: {fn. NNN. text}

DESCRIPTION: This is a footnote. The footnote tag conceptually has two parts, separated by the period after the footnote number. NNN is the number. The comments about the number format under fr. above apply.

Text can be any arbitrary text, and can contain other tags, including a fr. tag if required! (Occasionally footnotes have footnotes).

The corresponding footnote must appear on the same logical page as any footnote references to it.

If the footnotes in the original file are all at the end of the chapter, then a flag can be set in the STML parser to take care of this. Just code the footnote references and footnotes in the appropriate places.

If the footnotes appear at the end of the book, use the xr./xn. tags instead.

The footnote tag may be inserted anywhere, even if it isn't where the footnote appeared in the original text. This is necessary if there is a file break in the page so that the corresponding footnote will be in the same logical file as the footnote reference, and in certain situations where there is poetry that spans a page boundary.

The STML parser checks to see whether there is at least one footnote reference matching the footnote. It will also complain if there is no period following the number.
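Because a footnote can contain other tags, a simple pattern match is not enough to find where it ends; the brackets have to be counted. The Python sketch below shows one way to do that and to split the number from the text. It is an illustration only (the real parser is written in C), it returns only top-level footnotes, and find_footnotes is a hypothetical name.

```python
import re

def find_footnotes(text: str) -> list[tuple[str, str]]:
    """Return (number, body) for each top-level {fn. N. body} tag, allowing nested tags."""
    notes = []
    pos = 0
    while True:
        start = text.find("{fn. ", pos)
        if start == -1:
            return notes
        depth, i = 0, start
        while i < len(text):
            ch = text[i]
            if ch == "\\":      # skip an escaped character such as \{ or \}
                i += 2
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        else:
            raise ValueError("unterminated footnote tag")
        inner = text[start + len("{fn. "):i]
        m = re.match(r"(\S+)\. (.*)", inner, re.DOTALL)
        if m is None:
            raise ValueError(f"no period after footnote number in: {inner!r}")
        notes.append((m.group(1), m.group(2)))
        pos = i + 1

sample = "Body{fr. 1} text.\n{fn. 1. See also {pr. 12} and note{fr. 2}.}\n{fn. 2. Short.}"
print(find_footnotes(sample))
```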

INTRA-FOOTNOTE PAGE NUMBER TAG

TAG: {footnote p. NNN} or {fp. NNN}

DESCRIPTION: This is a specialized version of the page tag which is used only in footnotes; this outputs a visible page reference, but the page cannot be referenced currently. This is used when a footnote is broken across one or more pages. The footnote text of the entire footnote should be concatenated into one fn. tag and this tag used to indicate page breaks. The normal rules for concatenating words that cross page boundaries inside the footnote apply.

OUTPUT: Outputs the page number in small green type as p. NNN. As opposed to the regular page tag, does not currently output an HTML anchor.

IMAGE TAG

TAG: {img filename}
{img filename@symname}
{img filename "Caption"}

DESCRIPTION: This inserts an image file or thumbnail at the given location. Images are stored in a subdirectory named 'img'. All images are in the jpg format; the filename argument is the name of the file exclusive of the .jpg extension, for instance {img front} refers to 'front.jpg'. This tag can be inserted into the middle of a paragraph to insert a small image into the body of the text, or set aside to insert a larger image between paragraphs.

You can add a caption to the image by inserting it as a double quoted string after the filename.

The second form is used if the image is to be referenced using the ir. tag. In this case, symname is the symbolic name used to reference the image. This must be an alphanumeric name with no spaces in it, such as 'fig17'. This can be (but doesn't have to be) different than the filename.

To left align the image, use 'limg'. To right align the image use 'rimg'. To include a thumbnail, place a smaller image with the same name in a subdirectory named 'tn', and use 'thumb' instead of 'img'. You can also specify 'lthumb' and 'rthumb' for left and right aligned thumbnails. These variations all have the same usage as 'img'.

OUTPUT: Outputs the following HTML into the output file: <IMG SRC="img/filename.jpg">. If a symbolic name is supplied, creates an anchor above the image as well. The caption, if any, is output below the file in a small font. If a 'thumb' is used, it outputs the image tag referencing the thumbnail, linked to the file in the 'img' subdirectory. The 'rimg/rthumb' and 'limg/lthumb' add the ALIGN="RIGHT" or ALIGN="LEFT" attributes. These are useful for adding 'initials', and placing other images aligned as they are in the original book, as the text wraps around the image on the specified side of the page.

NOTE: This is useful since it is difficult to edit a long file with numerous embedded images.

IMPORTANT: There is no period after the abbreviation in this tag.
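The image tag variations can be summarized in a short Python sketch that follows the OUTPUT description above. Details the text does not specify are assumptions: the form of the anchor for a symbolic name, the BR before the caption, and whether a symbolic name and a caption may be combined in one tag. render_images and the file name plate3 are hypothetical.

```python
import html
import re

IMG_TAG = re.compile(
    r'\{([lr]?)(img|thumb) ([^\s@"}]+)(?:@([A-Za-z0-9]+))?(?: "([^"]*)")?\}'
)

def _render(m: re.Match) -> str:
    side, kind, name, sym, caption = m.groups()
    align = {"l": ' ALIGN="LEFT"', "r": ' ALIGN="RIGHT"', "": ""}[side]
    out = f'<A NAME="{sym}"></A>' if sym else ""  # assumed anchor form
    if kind == "thumb":
        # Thumbnail from 'tn', linked to the full-size file in 'img'.
        out += f'<A HREF="img/{name}.jpg"><IMG SRC="tn/{name}.jpg"{align}></A>'
    else:
        out += f'<IMG SRC="img/{name}.jpg"{align}>'
    if caption:
        out += f"<BR><SMALL>{html.escape(caption)}</SMALL>"
    return out

def render_images(text: str) -> str:
    """Replace img, limg, rimg, thumb, lthumb and rthumb tags with illustrative HTML."""
    return IMG_TAG.sub(_render, text)

print(render_images("{img front}"))
print(render_images('{rthumb map "The map"}'))
print(render_images("{img plate3@fig17}"))
```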

IMAGE REFERENCE TAG

TAG: {ir. symname}
{ir. symname "Caption"}

DESCRIPTION: Creates a reference to an img tag with a symbolic name. If there is no symbolic name for the image, the filename is used instead.

OUTPUT: Outputs the symbolic name linked to the image; if "Caption" is present, outputs the caption as the title of the link.

NOTE: The symbolic name must match the link text.

EXAMPLE:

{img 015@15}

...

(See Figure {ir. 15}).

The image file img/015.jpg gets inserted into the file. At some point before or after this point, the text (See Figure 15) gets inserted. The 15 in this text is linked to the image.

UNICODE OUTPUT TAGS

TAG: {greek text}
{hbw text}
{cyr text}
{sym text}
{u hex-number}

DESCRIPTION: These tags insert Unicode strings encoded as HTML numeric character entities into the output file. The system of transcription is available on request.

The 'greek' tag inserts Greek characters, 'hbw' inserts Hebrew characters; 'cyr' inserts Cyrillic characters. The 'sym' tag outputs a single named Unicode character into the output file; I have so far defined a few ad-hoc characters such as {sym mercury}, which inserts the astrological sign for the planet Mercury. The u tag simply outputs a single arbitrary Unicode character into the output file, where hex-number is its numeric value. This is useful in a limited way to add markup for characters not yet supported.
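The u and sym tags are simple enough to sketch in Python. The transcription systems for the greek, hbw and cyr tags are not given here, so they are left out. The symbol table below contains only the one name mentioned above, mapped to U+263F (the Unicode MERCURY sign); render_unicode is a hypothetical name, and emitting decimal rather than hexadecimal entities is an assumption.

```python
import re

# Named symbols; only 'mercury' is mentioned in the text above.
SYMBOLS = {"mercury": 0x263F}

def render_unicode(text: str) -> str:
    """Replace {u hex-number} and {sym name} with HTML numeric character entities."""
    def from_hex(m: re.Match) -> str:
        return f"&#{int(m.group(1), 16)};"

    def from_name(m: re.Match) -> str:
        return f"&#{SYMBOLS[m.group(1)]};"  # raises KeyError for an undefined name

    text = re.sub(r"\{u ([0-9A-Fa-f]+)\}", from_hex, text)
    return re.sub(r"\{sym (\w+)\}", from_name, text)

print(render_unicode("Buddh{u 101} and {sym mercury}"))
```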

MACRO-PREPROCESSOR

I have developed a macro-preprocessor as well. This is a program which reads the file prior to all other parsing and performs arbitrary substitutions on the input. I can create specialized ways to mark up particular characters which get transformed into Unicode. This is most useful for transcribing extended Latin characters. Conventionally I use a_ for 'long a', a^ for 'short a', and so on. After the macro-preprocessor is run on the file, these appear as ā and ă respectively.

This is most useful if you have numerous instances of extended Latin characters in a book. If there are only a few places where they occur, it is better to use the {u} tag instead (see above). The values of Unicode characters can be found in the charts at unicode.org. Let me know if you need any assistance with this.

If you need to use a set of conventions like this, let me know so I can program the parser accordingly. When designing a macro set, it is important not to use characters which have special meaning to HTML, particularly the equals sign or double quotes. The macro sequence must be a sequence of letters which can't appear normally in the text of the particular book which you are transcribing. For instance, using the characters 'ee' to indicate 'long e' would change every instance of 'ee' in the file to 'long e' (ē), which is probably not desirable.
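A macro set of this kind amounts to a table of substitutions applied before any other parsing. The Python sketch below handles the vowel-plus-underscore (long) and vowel-plus-caret (short) convention. Extending it from 'a' to all five vowels is an assumption, as is the name expand_macros; a real macro set would be agreed on per book.

```python
import re
import unicodedata

# Macro suffix -> combining mark: '_' is a macron (long), '^' is a breve (short).
MARKS = {"_": "\u0304", "^": "\u0306"}
MACRO = re.compile(r"([AEIOUaeiou])([_^])")

def expand_macros(text: str) -> str:
    """Replace vowel+'_' and vowel+'^' with the precomposed long and short vowels."""
    def compose(m: re.Match) -> str:
        return unicodedata.normalize("NFC", m.group(1) + MARKS[m.group(2)])
    return MACRO.sub(compose, text)

print(expand_macros("Buddha_ and a^"))
```

Note that this would also rewrite any 'a_' that happens to occur inside an HTML attribute or file name, which is the same kind of collision the 'ee' example above warns about.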

Additionally, I use $ and /$ to enclose text which should be in a small font, and | and /| to enclose text which is indented. This is because I've moved to editing using Word 200x. This uses embedded styles and Internet Explorer-specific tags which greatly inflate the size of files. It turns any file which it touches into a non-standard, extremely opaque dialect of HTML. Therefore, I have written a routine which strips out the extraneous HTML markup from these files. Unfortunately, it also strips out code for font size and indentation; therefore I use the dollar sign and pipe conventions to manually mark up these sections. These then get converted into <SMALL> and <DIR> tags.
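The dollar-sign and pipe conventions can be sketched as two substitutions. This Python illustration assumes the markers are always properly paired and never nested; expand_style_marks is a hypothetical name.

```python
import re

def expand_style_marks(text: str) -> str:
    """Convert $.../$ to <SMALL>...</SMALL> and |.../| to <DIR>...</DIR>."""
    text = re.sub(r"(?<!/)\$(.*?)/\$", r"<SMALL>\1</SMALL>", text, flags=re.DOTALL)
    return re.sub(r"(?<!/)\|(.*?)/\|", r"<DIR>\1</DIR>", text, flags=re.DOTALL)

print(expand_style_marks("$Note/$ then |quoted verse/|"))
```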

INDEX FILE GENERATION

The text at the start of the index.htm file for the project is generated from a file named 'index.h'. This file is separate from the proof file. It is formatted as a text file with HTML markup. The contents of 'index.h' are inserted verbatim into the index.htm file after the navigation section and before the list of files. Normally, I insert the name of the book in HTML header format, along with a splash image, and then include any commentary that I want, followed by a horizontal rule. I can compose the index.h file if you don't feel confident coding HTML; you can supply one if you want, though.
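The assembly order described above (navigation section, then the contents of index.h verbatim, then the list of files) can be sketched in Python. The shape of the navigation block and of each file link is an assumption made for illustration, and build_index is a hypothetical name.

```python
import html

def build_index(navigation: str, index_h: str, files: list[tuple[str, str]]) -> str:
    """Assemble index.htm: navigation, then index.h verbatim, then one link per subfile."""
    links = "\n".join(
        f'<A HREF="{name}">{html.escape(title)}</A><BR>' for name, title in files
    )
    return "\n".join([navigation, index_h, links]) + "\n"

page = build_index(
    "<P>Sacred Texts | Index</P>",                 # assumed navigation markup
    "<H1>Bushido, the Soul of Japan</H1>\n<HR>",   # contents of index.h
    [("file00.htm", "Preface"), ("file01.htm", "Chapter I")],
)
print(page)
```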

SACRED TEXTS NEEDS YOUR SUPPORT

It costs thousands of dollars a year to pay for this site's bandwidth and maintenance. Without your continued support, sacred-texts would go offline or have to be scaled back. Your support is crucial; this site does not receive grants or institutional support.

The best way to support the site is to purchase the CD-ROM. The Sacred-texts CD-ROM has hundreds of books on it that are extremely hard to locate, including all of the major world scriptures. If you buy a copy, you can feel good knowing that you are helping keep this site online.

--J.B. Hare
