The Kyoto-Harvard system used in the "Cologne Digital Sanskrit Lexicon" has been modified for the Sarvamula Project. The ASCII transliteration has been updated to be compatible with Unicode 6.0.0 (13.Dec.2010).
a A i I u U R RR lR lRR e E ai o O au
M H
k kh g gh G
c ch j jh J
T Th D Dh N
t th d dh n
p ph b bh m
y r l L v
z S s h

avagraha            "
candrabindu         ~    Not needed in Sanskrit. Vedas use a NON-COMBINING ~
jihvAmUlIya         x
upadhmAnIya         q
udAtta              `    Maps to "DEVANAGARI STRESS SIGN UDATTA" \u0951
svarita (indep.)    '    (acute accent)
anudAtta            _    Maps to "DEVANAGARI STRESS SIGN ANUDATTA" \u0952
compound words      -
daNDa               |
double daNDa        !
zero width joiner   Z
The punctuation marks taken over by the scheme are re-obtained via escape sequences:

apostrophe       #
double quote     @
reverse quote    <
exclamation      &
hyphen           :
Other punctuation marks that can be freely used in Classical Skt are: ( ) { } [ ] > , . ? / \ * =
These substitutes are chosen because these marks don't occur in any Sanskrit work; they are a part of Latin punctuation, but are nowadays seen in school textbooks in India.
The substitutes are based on the comment by Acharya IITM
""" While it is true that classical texts seldom used punctuation, it is seen that modern texts do require at least the following: comma, fullstop, hyphen, question mark, exclamation mark, plus, minus, forward slash and the quotation mark. The following are often desirable: left and right parentheses, multiplication sign, semicolon """
The substitutes fully satisfy the above requirement.
Comma , and fullstop . are used in Vedic Skt. Their escape sequences in Vedic Skt (ONLY) are respectively > and *
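The escape sequences described above can be sketched in code as follows. This is only an illustration: the names ESCAPES, VEDIC_ESCAPES and restore_punctuation are hypothetical, and the function handles nothing but the escapes themselves. A full converter would first have to deal with the scheme's own uses of ' " ` ! and - , which is not attempted here.

```python
# Escape characters and the punctuation marks they stand for,
# copied from the list of escape sequences in this document.
ESCAPES = {"#": "'", "@": '"', "<": "`", "&": "!", ":": "-"}

# Extra escapes that apply to Vedic text ONLY (comma and fullstop).
VEDIC_ESCAPES = {">": ",", "*": "."}


def restore_punctuation(text, vedic=False):
    """Replace each escape character by the punctuation mark it stands for.

    Hypothetical helper: it does not touch any other part of the
    transliteration.
    """
    table = dict(ESCAPES)
    if vedic:
        table.update(VEDIC_ESCAPES)
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    print(restore_punctuation("kim & #a@ :"))
    print(restore_punctuation("a > b *", vedic=True))
```

Outside Vedic text the characters > and * are left alone, since there they are ordinary punctuation.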
The % + $ are reserved for markup.
Comments begin with %
Line continuation character is + (always occurs in column 0 of ANY line)
Numbering scheme uses $
Annotations, remarks, etc. by the editor of the transliteration are enclosed in square brackets [ ]
Annotations by the editor(s) of the edition which served as source of the transliteration (e.g. conjectures, markers for lacunae, etc.) which are part of the printed edition are enclosed in ordinary round parentheses ( )
The whole scheme is designed so that white space and blanks can be used as freely as possible, and we restrict ourselves to single letters as much as possible.
Sandhi is never marked, because it is up to the commentator to decide where to split a word ! Sandhi is not explicitly shown/marked in Devanagari at all, so it is not necessary in transliteration either.
What can be done in the future? Take sandhi into consideration and split the words. Insert markers (plus sign ??) where needed to show that a split has occurred. Then apply the rules of Sanskrit grammar to reconstitute the original word, thus removing the markers.
Caution : When the compound separator (hyphen) is used, the sandhi marker is irrelevant !
Nominal compounds (excluding upapAdas) are separated by a hyphen. By simply dropping the hyphen, you can reconstitute the original word.
How to include variants of a line or passage ?
For example:
Version 1 (base text): mama nAma satISaH asti
Version 2 (alternate text 1): mama nAma sadISaH asti
Version 3 (alternate text 2): mama nAma sazESaH asti
will be combined as follows:
mama nAma =satISaH= {sadISaH ; sazESaH} asti
Similarly, a word or passage that is omitted in base text but
included in alternate text will have an empty equals = =
Version 1 (base text): mama nAma asti
Version 2 (alternate text 1): mama nAma satISaH asti
Version 3 (alternate text 2): mama nAma sadISaH asti
will be combined as below:
mama nAma == {satISaH ; sadISaH} asti
Can you guess what the following imply ?
mama nAma =satISaH= { ; } asti
mama nAma =satISaH= { ; sazESaH} asti
mama nAma == {satISaH ; } asti
In general, it means that the words enclosed in equals signs, =word=, will be replaced by the corresponding words enclosed in flower brackets {alt_word} in the alternate text. Multiple alternatives are separated by a semicolon.
As always, white space is insignificant in such a scheme !
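The variant notation above can be unpacked mechanically. The following is a minimal sketch, not part of the scheme itself: the names VARIANT and expand_variants are hypothetical, and it assumes that the equals sign and the flower brackets occur in a line only as variant markup. Where a variant site lists fewer alternatives than another site in the same line, the sketch falls back to the base reading, which is also an assumption.

```python
import re

# One variant site: =base reading= {alternative 1 ; alternative 2 ; ...}
# The base reading may be empty (written == or = =).
VARIANT = re.compile(r"=([^=]*)=\s*\{([^}]*)\}")


def _reading(match, k):
    """Return the base reading (k == 0) or the k-th alternative."""
    if k == 0:
        return match.group(1)
    alternatives = [a.strip() for a in match.group(2).split(";")]
    # Assumption: fall back to the base reading if this site has
    # fewer alternatives than requested.
    return alternatives[k - 1] if k - 1 < len(alternatives) else match.group(1)


def expand_variants(line):
    """Return [base text, alternate text 1, alternate text 2, ...]."""
    sites = list(VARIANT.finditer(line))
    if not sites:
        return [" ".join(line.split())]
    count = max(len(m.group(2).split(";")) for m in sites)
    versions = []
    for k in range(count + 1):
        text = VARIANT.sub(lambda m, k=k: _reading(m, k), line)
        # White space is insignificant, so normalise it.
        versions.append(" ".join(text.split()))
    return versions


if __name__ == "__main__":
    for version in expand_variants("mama nAma =satISaH= {sadISaH ; sazESaH} asti"):
        print(version)
```

Applied to the first combined line above, the sketch gives back the three versions it was built from; an empty alternative such as { ; } simply yields a version in which the word is omitted.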
Any type of udAtta (in Unicode name) always begins with `
Any type of svarita (in Unicode name) always begins with '
Any type of anudAtta (in Unicode name) always begins with _
Something (other than udAtta, anudAtta and svarita) combining ABOVE begins with a caret ( ^ )
Something (other than udAtta, anudAtta and svarita) combining BELOW has no sign at all
Any letter following these marks hints at the English letter that the Vedic accent mark resembles
= \u1CD0 VEDIC TONE KARSHANA
= \u1CD1 VEDIC TONE SHARA
= \u1CD2 VEDIC TONE PRENKHA
= \u1CD3 VEDIC SIGN NIHSHVASA
= \u1CD4 VEDIC SIGN YAJURVEDIC MIDLINE SVARITA
'3 = \u1CD5 YAJURVEDIC AGGRAVATED INDEPENDENT SVARITA
'L = \u1CD6 YAJURVEDIC INDEPENDENT SVARITA
'7 = \u1CD7 YAJURVEDIC KATHAKA INDEPENDENT SVARITA
C = \u1CD8 CANDRA BELOW
'^ = \u1CD9 YAJURVEDIC KATHAKA INDEPENDENT SVARITA SCHROEDER
'' = \u1CDA DOUBLE SVARITA
''' = \u1CDB TRIPLE SVARITA
_| = \u1CDC KATHAKA ANUDATTA
. = \u1CDD DOT BELOW
.. = \u1CDE TWO DOTS BELOW
... = \u1CDF THREE DOTS BELOW
'J = \u1CE0 RIGVEDIC KASHMIRI INDEPENDENT SVARITA
'Z = \u1CE1 ATHARVAVEDIC INDEPENDENT SVARITA
'H = \u1CE2 VISARGA SVARITA
`H = \u1CE3 VISARGA UDATTA
``H = \u1CE4 REVERSED VISARGA UDATTA (reverse of above)
_H = \u1CE5 VISARGA ANUDATTA
__H = \u1CE6 REVERSED VISARGA ANUDATTA (reverse of above)
`V = \u1CE7 VISARGA UDATTA WITH TAIL
_V = \u1CE8 VISARGA ANUDATTA WITH TAIL
, = \u1CED TIRYAK
The following marks are NOT combining diacritics !
= \u1CE9 ANUSVARA ANTARGOMUKHA
= \u1CEA ANUSVARA BAHIRGOMUKHA
= \u1CEB ANUSVARA VAMAGOMUKHA
= \u1CEC ANUSVARA VAMAGOMUKHA WITH TAIL
M6 = \u1CEE HEXIFORM LONG ANUSVARA
MM = \u1CEF LONG ANUSVARA
MR = \u1CF0 RTHANG LONG ANUSVARA
= \u1CF1 ANUSVARA UBHAYATO MUKHA
= \u1CF2 ARDHAVISARGA
^0 = \uA8E0 COMBINING DEVANAGARI DIGIT ZERO
^1 = \uA8E1 COMBINING DEVANAGARI DIGIT ONE
^2 = \uA8E2 COMBINING DEVANAGARI DIGIT TWO
^3 = \uA8E3 COMBINING DEVANAGARI DIGIT THREE
^4 = \uA8E4 COMBINING DEVANAGARI DIGIT FOUR
^5 = \uA8E5 COMBINING DEVANAGARI DIGIT FIVE
^6 = \uA8E6 COMBINING DEVANAGARI DIGIT SIX
^7 = \uA8E7 COMBINING DEVANAGARI DIGIT SEVEN
^8 = \uA8E8 COMBINING DEVANAGARI DIGIT EIGHT
^9 = \uA8E9 COMBINING DEVANAGARI DIGIT NINE
^a = \uA8EA COMBINING DEVANAGARI LETTER A
^u = \uA8EB COMBINING DEVANAGARI LETTER U
^k = \uA8EC COMBINING DEVANAGARI LETTER KA
^n = \uA8ED COMBINING DEVANAGARI LETTER NA
^p = \uA8EE COMBINING DEVANAGARI LETTER PA
^r = \uA8EF COMBINING DEVANAGARI LETTER RA
^v = \uA8F0 COMBINING DEVANAGARI LETTER VI
^" = \uA8F1 COMBINING DEVANAGARI SIGN AVAGRAHA
~* = \uA8F2 DEVANAGARI SIGN SPACING CANDRABINDU
~, = \uA8F3 DEVANAGARI SIGN CANDRABINDU VIRAMA
~~, = \uA8F4 DEVANAGARI SIGN DOUBLE CANDRABINDU VIRAMA
~2 = \uA8F5 DEVANAGARI SIGN CANDRABINDU TWO
~3 = \uA8F6 DEVANAGARI SIGN CANDRABINDU THREE
~" = \uA8F7 DEVANAGARI SIGN CANDRABINDU AVAGRAHA
= \uA8F8 DEVANAGARI SIGN PUSHPIKA
= \uA8F9 DEVANAGARI GAP FILLER
= \uA8FA DEVANAGARI CARET
= \uA8FB DEVANAGARI HEADSTROKE
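Several of the codes listed above share a prefix (for example ' , '' and '''), so a converter has to decide how much to read at each position. One possible approach is to try the longest code first. The sketch below illustrates this on a small subset of the tables; the longest-match rule and the names ACCENT_CODES and match_accent are assumptions made for illustration, not something the scheme prescribes.

```python
# A small subset of the code tables in this document.
# Codes and code points are copied from those tables.
ACCENT_CODES = {
    "`": "\u0951",    # DEVANAGARI STRESS SIGN UDATTA
    "_": "\u0952",    # DEVANAGARI STRESS SIGN ANUDATTA
    "'3": "\u1CD5",   # YAJURVEDIC AGGRAVATED INDEPENDENT SVARITA
    "'L": "\u1CD6",   # YAJURVEDIC INDEPENDENT SVARITA
    "''": "\u1CDA",   # DOUBLE SVARITA
    "'''": "\u1CDB",  # TRIPLE SVARITA
    "_|": "\u1CDC",   # KATHAKA ANUDATTA
    "`H": "\u1CE3",   # VISARGA UDATTA
    "``H": "\u1CE4",  # REVERSED VISARGA UDATTA
    "^1": "\uA8E1",   # COMBINING DEVANAGARI DIGIT ONE
}

# Length of the longest code in the subset.
MAX_LEN = max(len(code) for code in ACCENT_CODES)


def match_accent(text, pos):
    """Return (code, character) for the longest code starting at pos.

    Returns None if no code from ACCENT_CODES starts at that position.
    Longest-match-first is an assumption of this sketch.
    """
    for length in range(MAX_LEN, 0, -1):
        code = text[pos:pos + length]
        if code in ACCENT_CODES:
            return code, ACCENT_CODES[code]
    return None


if __name__ == "__main__":
    print(match_accent("'''", 0))
    print(match_accent("`a", 0))
```

Note that a rule of this kind cannot tell a code such as `H apart from an udAtta followed by a visarga H; resolving that would need information the sketch does not have.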
The following is a list of missing characters which are part of my Harvard-Kyoto character set (excluding Vedic accents) but not supported by the script itself.
Devanagari -- has all the letters. None missing.
Bengali    -- e o v L
Gurmukhi   -- R RR lR lRR e o S
Gujarati   -- e o
Oriya      -- e o
Tamil      -- ~ R RR lR lRR kh g gh ch jh Th D Dh th d dh ph b bh
Telugu     -- has all the letters. None missing.
Kannada    -- ~
Malayalam  -- ~
Hence, for lossless transmission of Sanskrit, you CANNOT use Bengali, Gurmukhi and Tamil.
Only Devanagari and Telugu completely have all the characters in my Harvard-Kyoto.
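The two conclusions above can be checked against the list of missing characters. The sketch below does so under one loudly stated assumption: that e, o and ~ are the letters not needed for Sanskrit. This document says so only for ~; treating e and o the same way is a guess, based on the fact that the sample verse writes E and O throughout. The names MISSING, NOT_NEEDED, lossy_scripts and complete_scripts are hypothetical.

```python
# Missing characters per script, copied from the list in this document.
MISSING = {
    "Devanagari": [],
    "Bengali": ["e", "o", "v", "L"],
    "Gurmukhi": ["R", "RR", "lR", "lRR", "e", "o", "S"],
    "Gujarati": ["e", "o"],
    "Oriya": ["e", "o"],
    "Tamil": ["~", "R", "RR", "lR", "lRR", "kh", "g", "gh", "ch", "jh",
              "Th", "D", "Dh", "th", "d", "dh", "ph", "b", "bh"],
    "Telugu": [],
    "Kannada": ["~"],
    "Malayalam": ["~"],
}

# ASSUMPTION: letters treated as not needed for Sanskrit.
NOT_NEEDED = {"e", "o", "~"}


def lossy_scripts():
    """Scripts that lack at least one letter Sanskrit is assumed to need."""
    return [s for s, chars in MISSING.items() if set(chars) - NOT_NEEDED]


def complete_scripts():
    """Scripts that lack no letter of the character set at all."""
    return [s for s, chars in MISSING.items() if not chars]


if __name__ == "__main__":
    print(lossy_scripts())
    print(complete_scripts())
```

Under that assumption the first function yields Bengali, Gurmukhi and Tamil, and the second yields Devanagari and Telugu, in line with the two statements above.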
It is interesting to note that Sinhala and Myanmar too have all the characters (including vowels), but the order in the Unicode charts does not correspond to that of the other Indic scripts ! Almost all of Sanskrit's consonants appear in the same order even in Thai and Tibetan.
a_gnimI`LE purOhi`taM ya_jJasya` dE_vam-Rtvija`m | hOtA`raM ratna_dhAtama`m !
a_gni pUrvE`bhi_rZRSi`bhi_rIDyO_ nUta`nairu_ta | sa dE_vA~ Eha va`kSati !

To get the unaccented version simply remove ALL accent marks - as simple as that !

agnimILE purOhitaM yajJasya dEvam-Rtvijam | hOtAraM ratnadhAtamam !
agni pUrvEbhirZRSibhirIDyO nUtanairuta | sa dEvA~ Eha vakSati !
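The accent-stripping step described above can be sketched in a few lines. This is a minimal illustration that removes only the three basic marks ( ` ' _ ); the multi-character Vedic codes would leave their trailing letters or digits behind and are not handled. The names BASIC_ACCENTS and strip_accents are hypothetical.

```python
# The three basic accent marks of the scheme: udAtta, svarita, anudAtta.
BASIC_ACCENTS = "`'_"


def strip_accents(text):
    """Remove the basic accent marks from a transliterated line.

    Sketch only: multi-character accent codes are not handled.
    """
    return text.translate(str.maketrans("", "", BASIC_ACCENTS))


if __name__ == "__main__":
    accented = ("a_gnimI`LE purOhi`taM ya_jJasya` dE_vam-Rtvija`m | "
                "hOtA`raM ratna_dhAtama`m !")
    print(strip_accents(accented))
```

The hyphen, the daNDa marks and the candrabindu ~ are left untouched, so compound boundaries and punctuation survive the conversion.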