Papers by Leander Chris Dsouza

Centuries of Portuguese colonialism in Goa influenced Konkani, which borrowed Portuguese vocabulary as a result. This paper proposes a character-level machine learning approach to identifying Portuguese loanwords in Konkani. A dataset of Portuguese loanwords and native Konkani words was compiled, and character n-grams (2-grams to 4-grams) were extracted as bag-of-characters features, capturing orthographic and morphological patterns. Logistic Regression and Support Vector Machine models were trained and evaluated. Character-level features differentiate Portuguese loanwords from native Konkani words, even with a small dataset. Feature interpretability analysis showed that specific character sequences correlate with Portuguese lexical patterns. Error analysis highlights the challenges posed by phonological adaptation and shared orthographic structures. The approach offers a method for detecting loanwords, with applications in language documentation, historical linguistics, and cultural preservation.
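The feature extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the paper's actual pipeline: the function name `char_ngrams`, the lowercasing step, and the sample string are assumptions made for the example.

```python
from collections import Counter


def char_ngrams(word, n_min=2, n_max=4):
    """Return a bag (multiset) of character n-grams for n in [n_min, n_max].

    Hypothetical helper: the abstract only states that 2-grams to 4-grams
    were extracted as bag-of-characters features.
    """
    word = word.lower()  # assumption: case is not treated as informative
    bag = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            bag[word[i:i + n]] += 1
    return bag


# Arbitrary sample string, used only to show the shape of the output.
features = char_ngrams("janela")
print(sorted(features))
```

Each word becomes a sparse count vector over n-grams, which is the kind of input a linear classifier such as Logistic Regression or a Support Vector Machine can consume; the learned weights per n-gram are also what makes the feature interpretability analysis possible.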