Campus Talk: How software designed by IIT-B is playing a key role in translating academic books into Indian languages


With the National Education Policy (NEP) placing heavy emphasis on higher education in Indian languages, Project Udaan, an Artificial Intelligence (AI)-based translation software developed at the Indian Institute of Technology (IIT) Bombay, is playing a pivotal role in making resources available to students in various Indian languages.

Ganesh Ramakrishnan, Institute Chair Professor, Department of Computer Science and Engineering, IIT Bombay, who is leading Project Udaan, said, “It is a complete ecosystem for machine translation aided by human effort. It involves development and adoption of domain-specific vocabulary of more than 5 million words across 11 languages. This reduces the number of edits by the translator.”

Armed with dictionaries of multiple Indian languages such as Hindi, Marathi, Bengali, Gujarati and Kannada, among others, this end-to-end machine translation and post-editing ecosystem has become a major translation facilitator for curriculum textbooks across higher education courses.

Explaining how the system works, Ramakrishnan said, “It begins with digitisation of the input source material, perhaps a textbook which may be available in any format currently. The digitisation internally invokes OCR (Optical Character Recognition) if the input is not machine-readable (such as scanned pages). Then, the translation engine works guided by technical domain-specific dictionaries which can be dynamically inserted. Our output from the translation engine in conjunction with our post-editing tool helps the publishing house bring the final output in less than one-sixth the time it would take otherwise.” These features earned Udaan the best demo paper award at CODS-COMAD 2023, a premier ACM international conference focusing on scientific work in databases, data sciences and their applications, according to Ramakrishnan.
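The flow Ramakrishnan describes (digitise, fall back to OCR for scanned pages, then translate under the guidance of a swappable domain dictionary) can be pictured with a minimal Python sketch. It is an illustration only, not Udaan's code: the names `Page`, `digitise`, `translate_with_glossary`, `fake_ocr` and `fake_engine` are hypothetical, the OCR and translation engine are toy stand-ins, and the placeholder trick used to protect glossary terms is one simple technique assumed here for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Page:
    content: str            # extracted text, or a stand-in for scanned image data
    machine_readable: bool  # False for scanned pages


def digitise(page: Page, ocr: Callable[[str], str]) -> str:
    """Return page text, invoking OCR only when the page is not machine-readable."""
    return page.content if page.machine_readable else ocr(page.content)


def translate_with_glossary(
    text: str,
    engine: Callable[[str], str],
    glossary: Dict[str, str],
) -> str:
    """Shield glossary terms from the engine, then insert the approved target terms."""
    # Longest terms first so a multi-word term wins over a shorter term inside it.
    # Naive substring matching: a real system would respect word boundaries.
    terms = sorted(glossary, key=len, reverse=True)
    for i, term in enumerate(terms):
        text = text.replace(term, f"__TERM{i}__")
    translated = engine(text)
    for i, term in enumerate(terms):
        translated = translated.replace(f"__TERM{i}__", glossary[term])
    return translated


# Toy stand-ins so the sketch runs; neither is a real OCR or translation system.
def fake_ocr(image_data: str) -> str:
    return image_data.replace("scan:", "", 1)


def fake_engine(text: str) -> str:
    word_map = {"the": "T1", "overflows": "T2"}  # made-up target tokens
    return " ".join(word_map.get(word, word) for word in text.split())


glossary = {"stack": "G-STACK"}  # made-up domain term; can be swapped per subject
page = Page("scan:the stack overflows", machine_readable=False)
draft = translate_with_glossary(digitise(page, fake_ocr), fake_engine, glossary)
print(draft)  # draft handed to a human post-editor for review
```

Because the glossary is passed in as an argument, a different subject's dictionary can be supplied without touching the engine, which is the sense in which such dictionaries can be "dynamically inserted". The printed draft would still go to a human translator for post-editing.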

“We have also built a large open-source platform called https://decile.org/ which is playing a critical role in human-in-the-loop learning in the post-editing framework. Through a team of translators who work in close coordination, we have evolved the tool to be as publisher-friendly as possible through features such as preservation of alignment between the original source and translation, tools for online vs offline editing, among others,” he said.

Project Udaan is already working with the All India Council for Technical Education (AICTE) and is in the process of translating engineering and technology textbooks. It is being used to translate second- and third-year engineering textbooks into 12 Indian languages, as well as all 20 first-year textbooks into Malayalam.

Other professional as well as traditional courses in streams such as commerce, science, pharmacy and management will also be included in the project soon.

Last week, the Maharashtra government signed a Memorandum of Understanding (MoU) with a team from IIT Bombay to translate textbooks from English to Marathi.

Details about Project Udaan can be found at https://udaanproject.org/




