Aganittyam: Learning Tamil Grammar through Knowledge Graph based Templatized Question Answering

Abstract

In this work, we introduce Aganittyam, a novel grammar question-answering system, and its associated corpus for the Dravidian language Tamil. Tamil is one of the oldest surviving languages, with a documented history spanning over 2,000 years. It is a classical language with official status in three countries, including India, and it is spoken by diasporic communities around the world. Learning Tamil grammar remains challenging because of its agglutinative nature and complex morphology. We created a Tamil grammar corpus aimed at learners of all kinds. We annotated it manually, since automated tools are not yet effective enough for this task. This led us to design an ontology of entity types and relationship types for the annotation. We identified entities and relationships and stored the resulting (subject, predicate, object) triplets as a Knowledge Graph (KG) consisting of 63,587 entities. Alongside the KG, we developed a framework for templatized question answering. We performed a two-fold evaluation (query metrics and human-centric evaluation) with thorough experimentation. The results show that our QA system is robust, reliable and fun to use for answering a variety of objective questions.
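The abstract does not detail the QA framework. The following is a minimal Python sketch of how templatized questions can be answered from (subject, predicate, object) triplets. It uses a small in-memory triple store as a stand-in for a real graph database. The relation names (`has_count`, `is_a`), the question templates, and the helpers `TripleStore` and `answer` are hypothetical illustrations, not the paper's actual ontology or API.

```python
from collections import defaultdict

# Hypothetical triplets (subject, predicate, object). The relation names are
# illustrative only; they are not taken from the paper's ontology.
TRIPLES = [
    ("உயிர் எழுத்து", "has_count", "12"),   # vowels
    ("மெய் எழுத்து", "has_count", "18"),    # consonants
    ("அ", "is_a", "உயிர் எழுத்து"),
    ("ஆ", "is_a", "உயிர் எழுத்து"),
    ("க்", "is_a", "மெய் எழுத்து"),
]


class TripleStore:
    """Tiny in-memory stand-in for a KG, indexed two ways for lookups."""

    def __init__(self, triples):
        self.by_sp = defaultdict(list)  # (subject, predicate) -> objects
        self.by_po = defaultdict(list)  # (predicate, object) -> subjects
        for s, p, o in triples:
            self.by_sp[(s, p)].append(o)
            self.by_po[(p, o)].append(s)

    def objects(self, subject, predicate):
        # .get avoids inserting empty keys into the defaultdict
        return self.by_sp.get((subject, predicate), [])

    def subjects(self, predicate, obj):
        return self.by_po.get((predicate, obj), [])


# Each hypothetical template pairs a question pattern with the graph lookup
# that answers it; the slot {x} is filled with an entity name.
TEMPLATES = {
    "count": ("How many letters are of type {x}?",
              lambda kg, x: kg.objects(x, "has_count")),
    "members": ("Which letters are of type {x}?",
                lambda kg, x: kg.subjects("is_a", x)),
}


def answer(kg, template_id, slot):
    """Instantiate a template with a slot value and run its KG lookup."""
    pattern, lookup = TEMPLATES[template_id]
    return pattern.format(x=slot), lookup(kg, slot)


if __name__ == "__main__":
    kg = TripleStore(TRIPLES)
    print(answer(kg, "count", "உயிர் எழுத்து"))
    print(answer(kg, "members", "உயிர் எழுத்து"))
```

In this sketch, each template fixes the shape of the graph query in advance. Only the slot value changes between questions, which keeps the generated answers predictable and easy to check.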

Publication
The 38th Pacific Asia Conference on Language, Information and Computation, December 2024
Hrishikesh Terdalkar
Postdoctoral Researcher

My research lies at the intersection of Computational Linguistics, Natural Language Processing, and Graph Databases, with a particular emphasis on low-resource languages such as Sanskrit and other Indian languages. I am committed to pioneering NLP innovations that have a real-world impact. I enjoy building user-friendly GUIs and CLIs for various applications. My interests also include Artificial Intelligence, Databases, Human-Computer Interaction, Information Retrieval, and Data Mining.