Document Type: Research Paper
Author
Jimma University, Ethiopia
Abstract
Communication on digital platforms is often hampered by misspellings and slow typing. A next-word prediction system that suggests probable words can make sentence construction considerably easier, especially for Afaan Oromo, a Cushitic language spoken by over 41.7 million people in Ethiopia. Despite its status as the official language of Oromia and its complex linguistic features, Afaan Oromo lacks advanced digital tools. This study evaluates several deep learning models, namely Long Short-Term Memory (LSTM), Attention-based LSTM, Bidirectional LSTM (Bi-LSTM), Attention-based Bi-LSTM, and Recurrent Neural Network (RNN), to determine the most accurate model for Afaan Oromo next-word generation. Our methodology involves developing and benchmarking these models on a dataset of 201,538 words sourced from various media, academic literature, and religious texts. The Attention-based Bi-LSTM model was the most effective, achieving an accuracy of 95.0% and a loss of 0.27. These findings highlight the potential of the Attention-based Bi-LSTM model to improve next-word generation for Afaan Oromo texts, addressing language-specific challenges and improving the digital interaction experience for Afaan Oromo speakers.
Keywords