Transliteration Principles for Sarvamūla (Classical & Vedic Sanskrit)


The Kyoto-Harvard system used in the "Cologne Digital Sanskrit Lexicon" is modified for the Sarvamula Project. The ASCII transliteration is updated to be compatible with Unicode 6.0.0 (13.Dec.2010).

Koeln KH

a  A i I  u  U  R  RR  lR  lRR  e  E  ai  o  O  au  M  H
k kh g gh G
c ch j jh J
T Th D Dh N
t th d dh n
p ph b bh m
y r  l L  v z S s h
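
The letter table above can be read as a token list. The following is a minimal sketch, assuming greedy longest-match tokenization; the name tokenize and the two list names are illustrative and not part of the scheme. Greedy matching cannot tell "kh" from "k" followed by "h", so this is only a starting point.

```python
# Letters of the modified scheme, copied from the table above.
VOWELS_AND_SIGNS = ["a", "A", "i", "I", "u", "U", "R", "RR", "lR", "lRR",
                    "e", "E", "ai", "o", "O", "au", "M", "H"]
CONSONANTS = ["k", "kh", "g", "gh", "G", "c", "ch", "j", "jh", "J",
              "T", "Th", "D", "Dh", "N", "t", "th", "d", "dh", "n",
              "p", "ph", "b", "bh", "m", "y", "r", "l", "L", "v",
              "z", "S", "s", "h"]

# Try longer tokens first so that "lRR" wins over "l" followed by "RR".
TOKENS = sorted(VOWELS_AND_SIGNS + CONSONANTS, key=len, reverse=True)


def tokenize(word):
    """Split a word into letters by greedy longest match (an assumption)."""
    out = []
    i = 0
    while i < len(word):
        for tok in TOKENS:
            if word.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            # Not a letter of the scheme: pass the character through.
            out.append(word[i])
            i += 1
    return out
```

For instance, tokenize("klRpta") yields k, lR, p, t, a.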

avagraha            "
candrabindu         ~       Not needed in Sanskrit. Vedas use a NON-COMBINING ~

jihvAmUlIya         x
upadhmAnIya         q
udAtta              `       Maps to "DEVANAGARI STRESS SIGN UDATTA" \u0951
svarita (indep.)    '       (acute accent)
anudAtta            _       Maps to "DEVANAGARI STRESS SIGN ANUDATTA" \u0952
compound words      -
daNDa               |
double daNDa        !
zero width joiner   Z

Escape Sequences

The punctuation marks that the scheme above uses as signs are obtained through the following escape sequences:

apostrophe      #
double quote    @
reverse quote   <
exclamation     &
hyphen          :
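
The two tables (signs and escapes) can be kept side by side to show why the escapes are needed. This is a small sketch under the assumption that characters are classified one at a time; the function name classify and the role labels are hypothetical.

```python
# Characters that the scheme itself uses as signs (from the table further up).
SCHEME_SIGNS = {
    '"': "avagraha",
    "`": "udAtta",
    "'": "svarita",
    "!": "double daNDa",
    "-": "compound separator",
}

# Escape character -> the punctuation mark it stands for (table above).
ESCAPES = {
    "#": "'",   # apostrophe
    "@": '"',   # double quote
    "<": "`",   # reverse quote
    "&": "!",   # exclamation
    ":": "-",   # hyphen
}


def classify(ch):
    """Return (role, value) for a single character of transliterated text."""
    if ch in ESCAPES:
        return ("punctuation", ESCAPES[ch])
    if ch in SCHEME_SIGNS:
        return ("sign", SCHEME_SIGNS[ch])
    return ("other", ch)
```

With this split, classify("!") reports the double daNDa, while classify("&") reports a literal exclamation mark.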

Other punctuation marks that can be freely used in Classical Skt are: ( ) { } [ ] > , . ? / \ * =

These substitutes are chosen because these marks don't occur in any Sanskrit work; they are a part of Latin punctuation, but are nowadays seen in school textbooks in India.

The substitutes are based on the comment by Acharya IITM

""" While it is true that classical texts seldom used punctuation, it is seen that modern texts do require at least the following: comma, fullstop, hyphen, question mark, exclamation mark, plus, minus, forward slash and the quotation mark. The following are often desirable: left and right parentheses, multiplication sign, semicolon """

The substitutes fully satisfy the above requirement.

Comma , and fullstop . are used in Vedic Skt. Their escape sequences in Vedic Skt (ONLY) are respectively > and *

The % + $ are reserved for markup.
Comments begin with %
Line continuation character is + (always occurs in column 0 of ANY line)
Numbering scheme uses $
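
A rough sketch of how the markup characters might be preprocessed. Two assumptions are made here that the text above does not spell out: a comment runs from % to the end of the line, and a line with + in column 0 is appended to the previous line. The $ numbering is left untouched because its syntax is not given. The name logical_lines is illustrative.

```python
def logical_lines(raw_lines):
    """Strip % comments and merge '+' continuation lines (assumed semantics)."""
    out = []
    for line in raw_lines:
        # Assumption: a comment runs from % to the end of the line.
        line = line.split("%", 1)[0].rstrip()
        if line.startswith("+"):
            # Assumption: '+' in column 0 continues the previous line.
            rest = line[1:].strip()
            if out:
                out[-1] = (out[-1] + " " + rest).strip()
            else:
                out.append(rest)
        elif line:
            out.append(line)
    return out
```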

Annotations, remarks, etc. by the editor of the transliteration are enclosed in square brackets [ ]

Annotations by the editor(s) of the edition which served as the source of the transliteration (e.g. conjectures, markers for lacunae, etc.) and which are part of the printed edition are enclosed in ordinary round parentheses ( )

The whole scheme is designed so that white space and blanks can be used as freely as possible, and so that single letters are used wherever possible.

SANDHI

Sandhi is never marked, because it is up to the commentator to decide where to split a word. Sandhi is not explicitly shown or marked in Devanagari at all, so it is not necessary in transliteration either.

What can be done in the future? Take sandhi into consideration and split the words. Insert markers (plus sign ??) where needed to show that a split has occurred. Then apply the rules of Sanskrit grammar to reconstitute the original word, thus removing the markers.

Caution : When the compound separator (hyphen) is used, the sandhi marker is irrelevant !

COMPOUNDS

Nominal compounds (excluding upapAdas) are separated by a hyphen. By simply dropping the hyphen, you can reconstitute the original word.
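
As a small illustration, dropping the separator is a one-line operation. The function name join_compounds and the sample word are made up for this sketch. A literal hyphen is written with its escape ( : ), so it is not affected.

```python
def join_compounds(text):
    """Remove the compound separator to reconstitute the original words."""
    return text.replace("-", "")


# Hypothetical sample word, used only for illustration.
print(join_compounds("rAja-puruSaH"))
```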

VARIANT READINGS

    How to include variants of a line or passage ?
    For example:
    Version 1 (base text):          mama nAma satISaH asti
    Version 2 (alternate text 1):   mama nAma sadISaH asti
    Version 3 (alternate text 2):   mama nAma sazESaH asti

    will be combined as follows:
    mama nAma =satISaH= {sadISaH ; sazESaH} asti

    Similarly, a word or passage that is omitted in base text but
    included in alternate text will have an empty equals = =
    Version 1 (base text):          mama nAma asti
    Version 2 (alternate text 1):   mama nAma satISaH asti
    Version 3 (alternate text 2):   mama nAma sadISaH asti

    will be combined as below:
    mama nAma == {satISaH ; sadISaH} asti

    Can you guess what the following imply ?
    mama nAma =satISaH= { ; } asti
    mama nAma =satISaH= { ; sazESaH} asti
    mama nAma == {satISaH ; } asti

In general, the words enclosed in equals signs, =word=, are replaced in each alternate text by the corresponding words enclosed in flower brackets {alt_word}. Multiple alternatives are separated by a semicolon.

As always, white space is insignificant in such a scheme !
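
The combined notation can be expanded back into the separate versions mechanically. The following is a sketch only: the regular expression, the name expand_variants, and the assumption that every variant spot in a line lists the same number of alternates are all mine, not part of the scheme.

```python
import re

# =base= followed by {alt1 ; alt2 ; ...}; the base part may be empty.
VARIANT = re.compile(r"=([^=]*)=\s*\{([^}]*)\}")


def expand_variants(line):
    """Return [base text, alternate text 1, alternate text 2, ...]."""
    spots = VARIANT.findall(line)
    # Number of alternate texts, taken from the first variant spot (if any).
    n_alt = len(spots[0][1].split(";")) if spots else 0

    def version(k):
        # k == 0 selects the base reading; k >= 1 selects alternate k.
        def pick(match):
            if k == 0:
                return match.group(1).strip()
            alts = [a.strip() for a in match.group(2).split(";")]
            return alts[k - 1] if k - 1 < len(alts) else ""
        # White space is insignificant, so collapse it after substitution.
        return " ".join(VARIANT.sub(pick, line).split())

    return [version(k) for k in range(n_alt + 1)]
```

Applied to the first combined line above, this returns the three versions it was built from, base text first.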

VEDIC ACCENTS
 
 Any type of udAtta (in Unicode name) always begins with   `
 Any type of svarita (in Unicode name) always begins with  '
 Any type of anudAtta (in Unicode name) always begins with _

 Something (other than udAtta, anudAtta and svarita) combining
 ABOVE begins with a caret ( ^ )

 Something (other than udAtta, anudAtta and svarita) combining
 BELOW has no sign at all

 Any letter or digit following these marks is chosen because the
 Vedic accent resembles that English character

        = \u1CD0         VEDIC TONE KARSHANA
        = \u1CD1         VEDIC TONE SHARA
        = \u1CD2         VEDIC TONE PRENKHA
        = \u1CD3         VEDIC SIGN NIHSHVASA
        = \u1CD4         VEDIC SIGN YAJURVEDIC MIDLINE SVARITA
'3      = \u1CD5         YAJURVEDIC AGGRAVATED INDEPENDENT SVARITA
'L      = \u1CD6         YAJURVEDIC INDEPENDENT SVARITA
'7      = \u1CD7         YAJURVEDIC KATHAKA INDEPENDENT SVARITA
C       = \u1CD8         CANDRA BELOW
'^      = \u1CD9         YAJURVEDIC KATHAKA INDEPENDENT SVARITA SCHROEDER
''      = \u1CDA         DOUBLE SVARITA
'''     = \u1CDB         TRIPLE SVARITA
_|      = \u1CDC         KATHAKA ANUDATTA
.       = \u1CDD         DOT BELOW
..      = \u1CDE         TWO DOTS BELOW
...     = \u1CDF         THREE DOTS BELOW
'J      = \u1CE0         RIGVEDIC KASHMIRI INDEPENDENT SVARITA
'Z      = \u1CE1         ATHARVAVEDIC INDEPENDENT SVARITA
'H      = \u1CE2         VISARGA SVARITA
`H      = \u1CE3         VISARGA UDATTA
``H     = \u1CE4         REVERSED VISARGA UDATTA        (reverse of above)
_H      = \u1CE5         VISARGA ANUDATTA
__H     = \u1CE6         REVERSED VISARGA ANUDATTA      (reverse of above)
`V      = \u1CE7         VISARGA UDATTA WITH TAIL
_V      = \u1CE8         VISARGA ANUDATTA WITH TAIL
,       = \u1CED         TIRYAK
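
The table above can be turned into a lookup with longest-match-first, since several sequences share a prefix ( ' vs '' vs ''' ). This is a sketch only; the names VEDIC_MARKS and match_accent are illustrative. Rows without an assigned sequence are left out, plain svarita ( ' ) is omitted because no code point is given for it, and udAtta and anudAtta are taken from the first table. Because C, the dot, and the comma are also ordinary characters, the lookup should only be applied where an accent is expected.

```python
# Transliteration sequence -> combining mark, copied from the tables above.
VEDIC_MARKS = {
    "`": "\u0951", "_": "\u0952",
    "'3": "\u1CD5", "'L": "\u1CD6", "'7": "\u1CD7", "C": "\u1CD8",
    "'^": "\u1CD9", "''": "\u1CDA", "'''": "\u1CDB", "_|": "\u1CDC",
    ".": "\u1CDD", "..": "\u1CDE", "...": "\u1CDF",
    "'J": "\u1CE0", "'Z": "\u1CE1", "'H": "\u1CE2", "`H": "\u1CE3",
    "``H": "\u1CE4", "_H": "\u1CE5", "__H": "\u1CE6",
    "`V": "\u1CE7", "_V": "\u1CE8", ",": "\u1CED",
}

# Longer sequences first, so that ''' is not read as '' followed by '.
ACCENT_KEYS = sorted(VEDIC_MARKS, key=len, reverse=True)


def match_accent(text, i=0):
    """Return (sequence, mark) for the longest accent at position i, or None."""
    for key in ACCENT_KEYS:
        if text.startswith(key, i):
            return key, VEDIC_MARKS[key]
    return None
```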

The following marks are NOT combining diacritics !

        = \u1CE9        ANUSVARA ANTARGOMUKHA
        = \u1CEA        ANUSVARA BAHIRGOMUKHA
        = \u1CEB        ANUSVARA VAMAGOMUKHA
        = \u1CEC        ANUSVARA VAMAGOMUKHA WITH TAIL
M6      = \u1CEE        HEXIFORM LONG ANUSVARA
MM      = \u1CEF        LONG ANUSVARA
MR      = \u1CF0        RTHANG LONG ANUSVARA
        = \u1CF1        ANUSVARA UBHAYATO MUKHA
        = \u1CF2        ARDHAVISARGA

Devanagari Extended (Samavedic Cantillations)

^0      = \uA8E0         COMBINING DEVANAGARI DIGIT ZERO
^1      = \uA8E1         COMBINING DEVANAGARI DIGIT ONE
^2      = \uA8E2         COMBINING DEVANAGARI DIGIT TWO
^3      = \uA8E3         COMBINING DEVANAGARI DIGIT THREE
^4      = \uA8E4         COMBINING DEVANAGARI DIGIT FOUR
^5      = \uA8E5         COMBINING DEVANAGARI DIGIT FIVE
^6      = \uA8E6         COMBINING DEVANAGARI DIGIT SIX
^7      = \uA8E7         COMBINING DEVANAGARI DIGIT SEVEN
^8      = \uA8E8         COMBINING DEVANAGARI DIGIT EIGHT
^9      = \uA8E9         COMBINING DEVANAGARI DIGIT NINE
^a      = \uA8EA         COMBINING DEVANAGARI LETTER A
^u      = \uA8EB         COMBINING DEVANAGARI LETTER U
^k      = \uA8EC         COMBINING DEVANAGARI LETTER KA
^n      = \uA8ED         COMBINING DEVANAGARI LETTER NA
^p      = \uA8EE         COMBINING DEVANAGARI LETTER PA
^r      = \uA8EF         COMBINING DEVANAGARI LETTER RA
^v      = \uA8F0         COMBINING DEVANAGARI LETTER VI
^"      = \uA8F1         COMBINING DEVANAGARI SIGN AVAGRAHA
~*      = \uA8F2         DEVANAGARI SIGN SPACING CANDRABINDU
~,      = \uA8F3         DEVANAGARI SIGN CANDRABINDU VIRAMA
~~,     = \uA8F4         DEVANAGARI SIGN DOUBLE CANDRABINDU VIRAMA
~2      = \uA8F5         DEVANAGARI SIGN CANDRABINDU TWO
~3      = \uA8F6         DEVANAGARI SIGN CANDRABINDU THREE
~"      = \uA8F7         DEVANAGARI SIGN CANDRABINDU AVAGRAHA
        = \uA8F8         DEVANAGARI SIGN PUSHPIKA
        = \uA8F9         DEVANAGARI GAP FILLER
        = \uA8FA         DEVANAGARI CARET
        = \uA8FB         DEVANAGARI HEADSTROKE
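
Since the combining digits sit on consecutive code points, that part of the table can be generated rather than typed. A brief sketch follows; the names SAMAVEDIC and samavedic_codepoint are made up for illustration, and rows without an assigned sequence are omitted.

```python
# Combining digits ^0 .. ^9 occupy consecutive code points from U+A8E0.
SAMAVEDIC = {"^%d" % d: chr(0xA8E0 + d) for d in range(10)}

# Combining letters and candrabindu signs, copied from the table above.
SAMAVEDIC.update({
    "^a": "\uA8EA", "^u": "\uA8EB", "^k": "\uA8EC", "^n": "\uA8ED",
    "^p": "\uA8EE", "^r": "\uA8EF", "^v": "\uA8F0", '^"': "\uA8F1",
    "~*": "\uA8F2", "~,": "\uA8F3", "~~,": "\uA8F4",
    "~2": "\uA8F5", "~3": "\uA8F6", '~"': "\uA8F7",
})


def samavedic_codepoint(seq):
    """Return 'U+XXXX' for a sequence from the table, or None if unknown."""
    ch = SAMAVEDIC.get(seq)
    return "U+%04X" % ord(ch) if ch else None
```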

MISSING LETTERS IN OTHER SCRIPTS

The following lists, for each script, the characters that are part of my Harvard-Kyoto character set (excluding Vedic accents) but are not supported by that script.

Devanagari -- has all the letters. None missing.
Bengali -- e o v L
Gurmukhi -- R RR lR lRR e o S
Gujarati -- e o
Oriya -- e o
Tamil -- ~ R RR lR lRR kh g gh ch jh Th D Dh th d dh ph b bh
Telugu -- has all the letters. None missing.
Kannada -- ~
Malayalam -- ~

Hence, for lossless transmission of Sanskrit, you CANNOT use Bengali, Gurmukhi and Tamil.

Only Devanagari and Telugu have all the characters in my Harvard-Kyoto.
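
The list above can be held as data and used to check a text before choosing a target script. This is a sketch; the names MISSING and unsupported are illustrative, and the input is assumed to be already split into letters of the scheme.

```python
# Script -> letters of the scheme that the script lacks (from the list above).
MISSING = {
    "Devanagari": set(),
    "Bengali": {"e", "o", "v", "L"},
    "Gurmukhi": {"R", "RR", "lR", "lRR", "e", "o", "S"},
    "Gujarati": {"e", "o"},
    "Oriya": {"e", "o"},
    "Tamil": {"~", "R", "RR", "lR", "lRR", "kh", "g", "gh", "ch", "jh",
              "Th", "D", "Dh", "th", "d", "dh", "ph", "b", "bh"},
    "Telugu": set(),
    "Kannada": {"~"},
    "Malayalam": {"~"},
}


def unsupported(script, letters):
    """Return the letters of the text that the given script cannot write."""
    return sorted(set(letters) & MISSING[script])


# Scripts with nothing missing.
COMPLETE = [name for name, gaps in MISSING.items() if not gaps]
```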

It is interesting to note that Sinhala and Myanmar too have all the characters (including vowels), but their order in the Unicode charts does not correspond to that of other Indic scripts ! Almost all of Sanskrit's consonants appear in the same order even in Thai and Tibetan.

Rig-Veda Example

 a_gnimI`LE purOhi`taM ya_jJasya` dE_vam-Rtvija`m | hOtA`raM ratna_dhAtama`m !
 a_gni pUrvE`bhi_rZRSi`bhi_rIDyO_ nUta`nairu_ta | sa dE_vA~ Eha va`kSati !

To get the unaccented version simply remove ALL accent marks - as simple as that !

 agnimILE purOhitaM yajJasya dEvam-Rtvijam | hOtAraM ratnadhAtamam !
 agni pUrvEbhirZRSibhirIDyO nUtanairuta | sa dEvA~ Eha vakSati !
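
The removal step can be sketched as follows. This version handles only the three basic marks ( ` ' _ ), which is all the example above needs; the longer Vedic sequences would need the full table. The name strip_basic_accents is illustrative.

```python
# The three basic accent marks: udAtta, svarita, anudAtta.
BASIC_ACCENTS = "`'_"


def strip_basic_accents(text):
    """Delete every udAtta, svarita and anudAtta mark from the text."""
    return text.translate({ord(mark): None for mark in BASIC_ACCENTS})


accented = "a_gnimI`LE purOhi`taM ya_jJasya` dE_vam-Rtvija`m | hOtA`raM ratna_dhAtama`m !"
print(strip_basic_accents(accented))
```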
