Devanagari Transliteration in LaTeX
Write in Devanagari to render as IAST, Harvard-Kyoto, Velthuis, SLP1, WX etc.
Devanagari is the fourth most widely adopted writing system in the world, primarily used in the Indian subcontinent. The script is used for more than 120 languages, notable among them Sanskrit, Hindi, Marathi, Pali and Nepali, along with several variations of these languages.
Devanagari text can be transliterated using various standard schemes, and several input systems based on these schemes exist to help users type the text easily. More often than not, a user has a preferred scheme for typing the input. At times, however, the text needs to be rendered in a different scheme in the PDF document.
In my case, I prefer using ibus-m17n to type text in Devanagari. While writing articles that contain Devanagari text, I also needed to render the text as IAST in the final PDF.
One could always learn to type in another input scheme, but that may get tedious. Similarly, transliterating each word using online systems such as Aksharamukha can also be tedious. So, I was looking for a way to type in Devanagari and have it rendered in IAST after PDF compilation. As a solution, I came up with a system consisting of a small set of LaTeX commands that add custom syntax to LaTeX, and a Python transliteration script (based on the indic-transliteration package) that serves as a middle layer and processes the LaTeX file to create a new LaTeX file with proper transliteration.
LaTeX Compilation System with Transliteration Support
There are two primary components to the system,
- LaTeX Syntax
- Transliteration Script
LaTeX Syntax
XeTeX (xelatex) and LuaTeX (lualatex) have good Unicode support and can be used to write Devanagari text. In the current example, I describe the setup with XeTeX.
We first add the required packages in the preamble of the LaTeX (.tex) file.
% This assumes your files are encoded as UTF8
\usepackage[utf8]{inputenc}
% Devanagari Related Packages
\usepackage{fontspec, xunicode, xltxtra}
Using fontspec, we can define environments for font families, to write text in specific scripts. To write Devanagari text, one needs to have a Devanagari font available. (It is assumed here that one may need to write both in Devanagari and in other transliteration schemes.)
For more on Devanagari fonts, you may check the fonts section of this document. In this section, it is assumed that the Sanskrit 2003 font is installed on the system.
To define the environments as mentioned earlier, we add the following lines in the preamble.
% Define Fonts
\newfontfamily\textskt[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\textiast{Noto Serif}
% Commands for Devanagari Transliterations
\newcommand{\skt}[1]{{\textskt{#1}}}
\newcommand{\iast}[1]{{\textiast{#1}}}
\newcommand{\Iast}[1]{{\textiast{#1}}}
\newcommand{\IAST}[1]{{\textiast{#1}}}
This provides us with four commands. \skt{} can be used to render Devanagari text. \iast{}, \Iast{} and \IAST{} can be used to render Devanagari text in IAST format in lower case, title case and upper case respectively. Note that from the perspective of the LaTeX engine, the commands \iast{}, \Iast{} and \IAST{} are identical. They differ only in name, so that the Python script can tell them apart, perform the transliteration and apply the appropriate case modification.
Note further that we can define new font families and new commands for any other valid scheme as required, which gives us additional commands such as \velthuis{}, \hk{} and so on.
Minimal Example
Equipped with these commands and some Devanagari text, we have the following minimal example, stored in the file minimal.tex,
\documentclass[10pt]{article}
% This assumes your files are encoded as UTF8
\usepackage[utf8]{inputenc}
% Devanagari Related Packages
\usepackage{fontspec, xunicode, xltxtra}
% Define Fonts
\newfontfamily\textskt[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\textiast{Noto Serif}
% Commands for Devanagari Transliterations
\newcommand{\skt}[1]{{\textskt{#1}}}
\newcommand{\iast}[1]{{\textiast{#1}}}
\newcommand{\Iast}[1]{{\textiast{#1}}}
\newcommand{\IAST}[1]{{\textiast{#1}}}
\title{Transliteration of Devanagari Text}
\author{Hrishikesh Terdalkar}
\begin{document}
\maketitle
\skt{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}
\iast{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}
\Iast{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}
\IAST{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}
\end{document}
Transliteration Script
The Python script performs the transliteration and some clean-up of the LaTeX source.
python3 finalize.py minimal.tex final.tex
This results in the content being transformed in the following way,
% ...
\skt{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}
\iast{ko nvasmin sāmprataṃ loke guṇavān kaśca vīryavān|}
\Iast{Ko Nvasmin Sāmprataṃ Loke Guṇavān Kaśca Vīryavān|}
\IAST{KO NVASMIN SĀMPRATAṂ LOKE GUṆAVĀN KAŚCA VĪRYAVĀN|}
% ...
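The internals of the command-line handling in finalize.py are not shown here. A minimal sketch of how such a wrapper could be structured is given below. The names `process` and `main` are hypothetical, and `process` is an identity placeholder standing in for the transliteration and clean-up steps, so this is an illustration under those assumptions rather than the actual script.

```python
import argparse
from pathlib import Path


def process(text: str) -> str:
    """Placeholder for the transliteration and clean-up steps (hypothetical)."""
    return text


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(
        description="Prepare a LaTeX file for compilation"
    )
    parser.add_argument("input_file", help="LaTeX file containing scheme tags")
    parser.add_argument("output_file", help="LaTeX file to be written")
    args = parser.parse_args(argv)

    # Read and write explicitly as UTF-8, since the source contains Devanagari
    source = Path(args.input_file).read_text(encoding="utf-8")
    Path(args.output_file).write_text(process(source), encoding="utf-8")


# Example call, equivalent to the command line shown above:
# main(["minimal.tex", "final.tex"])
```

Passing the encoding explicitly avoids depending on the platform default, which matters when the file mixes Devanagari and Latin text.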
We can now proceed to compile the final.tex file.
xelatex final
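This results in a PDF in which the first line is rendered in Devanagari and the following three lines in lower-case, title-case and upper-case IAST.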
Anatomy of the Transliteration Script
At the core of the transliteration script is the function transliterate_between.
import re
from typing import Callable

from indic_transliteration.sanscript import transliterate


def transliterate_between(
    text: str,
    from_scheme: str,
    to_scheme: str,
    start_pattern: str,
    end_pattern: str,
    post_hook: Callable[[str], str] = lambda x: x,
) -> str:
    """Transliterate the text appearing between two patterns

    Only the text appearing between patterns `start_pattern` and `end_pattern`
    is transliterated.
    `start_pattern` and `end_pattern` can appear multiple times in the full
    text, and for every occurrence, the text between them is transliterated.
    `from_scheme` and `to_scheme` should be compatible with scheme names from
    `indic-transliteration`

    Parameters
    ----------
    text : str
        Full text
    from_scheme : str
        Input transliteration scheme
    to_scheme : str
        Output transliteration scheme
    start_pattern : str
        Literal string marking the start tag (escaped before matching)
    end_pattern : str
        Literal string marking the end tag (escaped before matching)
    post_hook : Callable[[str], str], optional
        Function to be applied on the text within tags after transliteration
        The default is `lambda x: x`.

    Returns
    -------
    str
        Text after replacements
    """
    if from_scheme == to_scheme:
        return text

    def transliterate_match(matchobj):
        # group(1) is the text enclosed between the start and end tags
        target = matchobj.group(1)
        replacement = transliterate(target, from_scheme, to_scheme)
        replacement = post_hook(replacement)
        return f"{start_pattern}{replacement}{end_pattern}"

    # Non-greedy match, so that each tag is handled separately
    pattern = "%s(.*?)%s" % (re.escape(start_pattern), re.escape(end_pattern))
    return re.sub(pattern, transliterate_match, text, flags=re.DOTALL)
We can provide the start and end patterns as \iast{ and } respectively, to transliterate the text enclosed in these tags.
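The matching mechanism can be tried out without the transliteration library. The sketch below uses a hypothetical helper `replace_between` and substitutes `str.upper` for the actual transliteration, purely to show which spans the escaped, non-greedy pattern picks up.

```python
import re


def replace_between(text, start, end, transform):
    """Apply `transform` to every span enclosed by the literal `start` and `end`."""
    # re.escape makes "\", "{" and "}" match literally instead of as regex syntax
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    return re.sub(
        pattern,
        lambda match: start + transform(match.group(1)) + end,
        text,
        flags=re.DOTALL,  # let "." also match newlines inside a tag
    )


sample = r"\iast{abc} and \skt{def} and \iast{ghi}"
converted = replace_between(sample, "\\iast{", "}", str.upper)
print(converted)  # \iast{ABC} and \skt{def} and \iast{GHI}

# Caveat: the non-greedy match stops at the first "}", so a nested command
# inside the tag cuts the span short
nested = replace_between(r"\iast{a \textbf{b} c}", "\\iast{", "}", str.upper)
print(nested)  # \iast{A \TEXTBF{B} c}
```

The second call shows a limitation of this approach: text inside a scheme tag should not contain other braced LaTeX commands, since the span ends at the first closing brace.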
Using this function, we can write a generic function to work with any transliteration scheme.
def latex_transliteration(
    input_text: str,
    from_scheme: str,
    to_scheme: str
) -> str:
    """Transliterate parts of the LaTeX input enclosed in scheme tags

    A scheme tag is of the form `\\to_scheme_lowercase{}` and is used
    when the desired output is in `to_scheme`.
    i.e.,
    - Text for the IAST scheme is enclosed in \\iast{} tags
    - Text for the Velthuis scheme is enclosed in \\velthuis{} tags
    - ...

    Parameters
    ----------
    input_text : str
        Input text
    from_scheme : str
        Transliteration scheme of the text written within the input tags
    to_scheme : str
        Transliteration scheme to which the text within tags should be
        transliterated

    Returns
    -------
    str
        Text after replacement of text within the scheme tags
    """
    # "{{" yields a literal opening brace, so the tag is e.g. "\iast{"
    start_tag_pattern = f"\\{to_scheme.lower()}{{"
    end_tag_pattern = "}"
    return transliterate_between(
        input_text,
        from_scheme=from_scheme,
        to_scheme=to_scheme,
        start_pattern=start_tag_pattern,
        end_pattern=end_tag_pattern
    )
Note: The names of the schemes (and therefore the corresponding LaTeX commands) have to conform to the scheme names used by the indic-transliteration package.
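One practical consequence is that a scheme name must also be a legal LaTeX command name, and under the default category codes a control word may contain only letters. The sketch below illustrates this with two hypothetical helpers, `scheme_tag` and `is_valid_command_name`. It assumes the scheme identifiers are lowercase strings such as "iast", "hk", "velthuis" and "slp1".

```python
def scheme_tag(scheme: str) -> str:
    """Build the opening tag for a scheme name (hypothetical helper)."""
    # "{{" is an escaped literal brace inside an f-string
    return f"\\{scheme.lower()}{{"


def is_valid_command_name(scheme: str) -> bool:
    """Check whether the name can be used directly with \\newcommand."""
    # LaTeX control words consist of letters only, so digits are not allowed
    return scheme.isascii() and scheme.isalpha()


for name in ["iast", "hk", "velthuis", "slp1"]:
    print(name, scheme_tag(name), is_valid_command_name(name))
```

A name containing a digit, such as "slp1", fails the check, so a command for such a scheme would need a letters-only alias together with a matching start pattern in the script.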
IAST is a case-insensitive transliteration scheme, so we may want specific capitalization for certain words (e.g. proper nouns). The post_hook argument lets us supply a function for this purpose. Using it, we can create a function that handles the three variants of IAST mentioned previously, namely \iast{} (lower), \Iast{} (title) and \IAST{} (upper).
from indic_transliteration import sanscript


def devanagari_to_iast(input_text: str) -> str:
    """Transliterate parts of the input enclosed in
    \\iast{}, \\Iast{} or \\IAST{} tags from Devanagari to IAST

    Text in \\Iast{} tags also undergoes a `.title()` post-hook.
    Text in \\IAST{} tags also undergoes a `.upper()` post-hook.

    Parameters
    ----------
    input_text : str
        Input text

    Returns
    -------
    str
        Text after replacement of text within the IAST tags
    """
    # Lower case: plain transliteration
    intermediate_text = transliterate_between(
        input_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\iast{",
        end_pattern="}"
    )
    # Title case
    intermediate_text = transliterate_between(
        intermediate_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\Iast{",
        end_pattern="}",
        post_hook=lambda x: x.title()
    )
    # Upper case
    final_text = transliterate_between(
        intermediate_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\IAST{",
        end_pattern="}",
        post_hook=lambda x: x.upper()
    )
    return final_text
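The effect of the two case hooks can be checked on an IAST string directly. The sketch below uses hypothetical hooks `title_hook` and `upper_hook` that additionally normalize the text to NFC first. That normalization is an added precaution and not part of the script above: `str.title()` starts a new capital after any character that is not a cased letter, which includes combining diacritics.

```python
import unicodedata

line = "ko nvasmin sāmprataṃ loke guṇavān kaśca vīryavān|"


def title_hook(text: str) -> str:
    # NFC makes each letter with its diacritic a single code point
    return unicodedata.normalize("NFC", text).title()


def upper_hook(text: str) -> str:
    return unicodedata.normalize("NFC", text).upper()


print(title_hook(line))  # Ko Nvasmin Sāmprataṃ Loke Guṇavān Kaśca Vīryavān|
print(upper_hook(line))  # KO NVASMIN SĀMPRATAṂ LOKE GUṆAVĀN KAŚCA VĪRYAVĀN|

# With decomposed input (letter + combining mark), plain .title() capitalizes
# the letter that follows the combining mark
decomposed = unicodedata.normalize("NFD", "s\u0101mprata\u1e43")
print(decomposed.title() == title_hook(decomposed))  # False
```

If the transliteration output is already in composed form, the normalization is a no-op and the hooks behave exactly like `.title()` and `.upper()`.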
Finally, there are other utility functions to remove comments and clean up excess whitespace.
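These utilities are not listed here. As an illustration only, clean-up of this kind could look like the following sketch, where `remove_comments` and `clean_whitespace` are hypothetical names and the regular expressions are one simple choice among several.

```python
import re


def remove_comments(text: str) -> str:
    """Drop LaTeX comments, keeping escaped percent signs such as 50\\%."""
    # Match from a "%" that is not preceded by a backslash up to the end of line.
    # Simplification: a comment directly after "\\" (a line break) is not detected.
    return re.sub(r"(?<!\\)%.*", "", text)


def clean_whitespace(text: str) -> str:
    """Strip trailing spaces and collapse runs of blank lines."""
    text = re.sub(r"[ \t]+\n", "\n", text)  # trailing spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text


sample = "A 50\\% rise % note\n\n\n\nB\n"
cleaned = clean_whitespace(remove_comments(sample))
print(repr(cleaned))  # 'A 50\\% rise\n\nB\n'
```

Removing comments before transliteration also prevents scheme tags inside commented-out lines from being processed.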
Extras
Additionally, we may want some more structure to our setup, such as,
- Separation of content into multiple files
\input{sections/section_devanagari.tex}
\input{sections/section_iast_lower.tex}
\input{sections/section_iast_title.tex}
\input{sections/section_iast_upper.tex}
- Bibliography
\bibliographystyle{acm}
\bibliography{papers}
Final LaTeX Preparation
We may have used the scheme tags across multiple sections. One option is to apply the transliteration script on every section file, to create a new set of section files and use those to compile the final LaTeX file.
A simpler solution is available in the form of latexpand, which resolves the \input{} commands by including the referenced content, producing a single consolidated LaTeX file.
latexpand main.tex > single.tex
Now, we can run the Python script on this consolidated file to resolve the transliteration tags.
python3 finalize.py single.tex final.tex
Compilation
When working with BibTeX, we often need to compile multiple times to get the references rendered correctly in the PDF. Usually, this requires the following sequence,
xelatex final
bibtex final
xelatex final
xelatex final
Alternatively, we can use latexmk, which takes care of the tedious compilation routine and reduces our job to a single command,
latexmk -pdflatex='xelatex %O %S' -pdf -ps- -dvi- final.tex
Another benefit of using latexmk is that we can also clean up the numerous files generated by the LaTeX engine with a one-liner,
latexmk -c
Makefile
Finally, we can place all of the console commands together in a Makefile.
all: .all

.all: main.tex sections/*.tex papers.bib
	latexpand main.tex > single.tex
	python3 finalize.py single.tex final.tex
	latexmk -pdflatex='xelatex %O %S' -pdf -ps- -dvi- final.tex

clear:
	latexmk -C
	rm single.tex
	rm final.tex

clean:
	latexmk -c
Thus, we can now focus on writing content in the .tex files and, once we are done, simply run the command,
make
Requirements
We have made use of a number of external tools, which need to be set up before using the described solution.
Minimal Requirements
The minimal example mentioned earlier requires only three things,
- XeLaTeX (Unicode support) (included in TeX Live)
- Python3
- indic-transliteration
Extra Requirements
The extras have some more dependencies.
- BibTeX (optional) (bibliography support)
- latexpand (optional) (resolve \input{})
- latexmk (optional) (simpler TeX compilation)
Devanagari Fonts
Nowadays, there are several good Devanagari fonts available. Google Fonts also provides a wide variety of Devanagari fonts.
Two of my personal favourites are,
Code
The source code for the entire setup is available at hrishikeshrt/devanagari-transliteration-latex.