Framework for Question-Answering in Sanskrit through Automated Construction of Knowledge Graphs

Hrishikesh Terdalkar, Arnab Bhattacharya

October 2019

Abstract

Sanskrit (Saṃskṛta) has one of the largest and most varied bodies of literature in the world. Extracting knowledge from it, however, is challenging for several reasons, including the complexity of the language and the paucity of standard natural language processing tools. In this paper, we target the problem of building knowledge graphs for particular types of relationships from Saṃskṛta texts. We build a natural language question-answering system in Saṃskṛta that uses the knowledge graph to answer factoid questions. We design a framework for the overall system and implement two separate instances of the system on human relationships from Mahābhārata and Rāmāyaṇa, and one instance on synonymous relationships from Bhāvaprakāśa Nighaṇṭu, a technical text from Āyurveda. We show that about 50% of the factoid questions can be answered correctly by the system. More importantly, we analyse the shortcomings of the system in detail for each step, and discuss possible ways forward.
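To illustrate the general idea of answering factoid questions from a knowledge graph, here is a minimal sketch in Python. It is not the authors' implementation: the `TripleStore` class, the `son_of` relation label, and the hand-entered triples are all assumptions made for illustration, and the steps of extracting triples from Saṃskṛta text and parsing a natural language question are omitted entirely.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory knowledge graph of (subject, predicate, object) triples."""

    def __init__(self):
        # Two indexes so that lookups work in both directions.
        self._by_subject_predicate = defaultdict(set)
        self._by_predicate_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Store one relationship triple."""
        self._by_subject_predicate[(subject, predicate)].add(obj)
        self._by_predicate_object[(predicate, obj)].add(subject)

    def objects(self, subject, predicate):
        """Answer 'subject --predicate--> ?' as a sorted list."""
        return sorted(self._by_subject_predicate.get((subject, predicate), set()))

    def subjects(self, predicate, obj):
        """Answer '? --predicate--> obj' as a sorted list."""
        return sorted(self._by_predicate_object.get((predicate, obj), set()))


# Hand-entered example triples (assumed relation label "son_of").
kg = TripleStore()
kg.add("Arjuna", "son_of", "Kuntī")
kg.add("Bhīma", "son_of", "Kuntī")
kg.add("Rāma", "son_of", "Daśaratha")

# "Who are the sons of Kuntī?" reduces to a lookup by predicate and object.
sons_of_kunti = kg.subjects("son_of", "Kuntī")

# "Whose son is Rāma?" reduces to a lookup by subject and predicate.
parents_of_rama = kg.objects("Rāma", "son_of")
```

In this sketch a factoid question becomes a triple pattern with one unknown slot, and the answer is whatever fills that slot in the graph. A question whose pattern matches no stored triple simply returns an empty list.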

Type

Conference paper

Publication

Proceedings of the 6th International Sanskrit Computational Linguistics Symposium

Hrishikesh Terdalkar

Assistant Professor

My research lies at the intersection of Computational Linguistics, Natural Language Processing, and Knowledge Graphs with a particular emphasis on low-resource languages such as Sanskrit and other Indian languages. My recent work has focused on building datasets, models, benchmarks, and evaluation frameworks grounded in linguistic structure. My interests also include Artificial Intelligence, Information Retrieval, Human-Computer Interaction, and Data Mining.