Improving POS Tagging for Singlish via Data Weighting
DOI: 10.23977/langl.2026.090104
Author(s)
Chaojie Lin 1, Xiaoxi Luo 2
Affiliation(s)
1 Trinity College School, Cambridge, Ontario, Canada
2 University of Waterloo, Waterloo, Ontario, Canada
Corresponding Author
Xiaoxi Luo
ABSTRACT
Singlish, or Colloquial Singapore English, is an English-based contact language influenced by multiple substrate languages, including Malay, Tamil, and Southern Chinese varieties. Its mixed grammatical patterns and vocabulary pose significant challenges for standard NLP tools, particularly part-of-speech (POS) tagging. In this study, we investigate whether a simple data-centric strategy—up-weighting Singlish training data while including Standard English UD examples—can improve POS tagging performance without complex architectures. Using an averaged perceptron tagger, we show that the weighted training setup achieves higher accuracy than the Singlish-only baseline, reduces error variance, and successfully captures Chinese-derived grammatical structures. Error analysis indicates that most tagging errors arise from POS polysemy rather than code-mixing, highlighting the effectiveness of data weighting in low-resource settings. Our results suggest that careful data design alone can yield meaningful improvements for processing creole and contact languages.
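The weighting strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that up-weighting is done by repeating each Singlish sentence a fixed number of times before mixing in Standard English sentences, and it uses a small hand-written averaged perceptron. The feature set, the `singlish_weight` value, the helper names, and the toy sentences and tags are all hypothetical; the page does not state the actual weighting factor, features, or corpus sizes.

```python
import random
from collections import defaultdict

# Toy (word, tag) sentences invented for illustration; not from the paper's data.
SINGLISH = [[("He", "PRON"), ("go", "VERB"), ("already", "ADV")],
            [("Can", "AUX"), ("lah", "PART")]]
ENGLISH = [[("He", "PRON"), ("has", "AUX"), ("gone", "VERB")],
           [("She", "PRON"), ("can", "AUX"), ("go", "VERB")]]

def build_weighted_corpus(singlish, english, singlish_weight):
    """Up-weight Singlish by repeating each sentence (assumed mechanism)."""
    return singlish * singlish_weight + english

def features(words, i, prev_tag):
    """Small hypothetical feature set for the token at position i."""
    w = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    return ["bias", "word=" + w, "suffix=" + w[-2:],
            "prev_tag=" + prev_tag, "prev_word=" + prev_word]

class AveragedPerceptron:
    def __init__(self, tags):
        self.tags = sorted(tags)
        self.w = defaultdict(float)       # (feature, tag) -> current weight
        self.totals = defaultdict(float)  # accumulated weight over all steps
        self.stamps = defaultdict(int)    # step at which each weight last changed
        self.step = 0

    def predict(self, feats):
        scores = {t: sum(self.w.get((f, t), 0.0) for f in feats) for t in self.tags}
        return max(self.tags, key=lambda t: scores[t])

    def _bump(self, key, delta):
        # Lazily credit the old weight for the steps it survived, then change it.
        self.totals[key] += (self.step - self.stamps[key]) * self.w[key]
        self.stamps[key] = self.step
        self.w[key] += delta

    def update(self, truth, guess, feats):
        self.step += 1
        if truth != guess:
            for f in feats:
                self._bump((f, truth), 1.0)
                self._bump((f, guess), -1.0)

    def average(self):
        # Replace each weight by its mean value over all training steps.
        for key in list(self.w):
            self.totals[key] += (self.step - self.stamps[key]) * self.w[key]
            self.w[key] = self.totals[key] / max(self.step, 1)

def train(corpus, epochs=5, seed=0):
    model = AveragedPerceptron({t for sent in corpus for _, t in sent})
    rng = random.Random(seed)
    data = list(corpus)
    for _ in range(epochs):
        rng.shuffle(data)
        for sent in data:
            words = [w for w, _ in sent]
            prev = "<s>"
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i, prev)
                model.update(gold, model.predict(feats), feats)
                prev = gold  # gold previous tag during training (a simplification)
    model.average()
    return model

def tag(model, words):
    prev, out = "<s>", []
    for i in range(len(words)):
        prev = model.predict(features(words, i, prev))
        out.append(prev)
    return out
```

A Singlish-only baseline in this sketch would be `train(SINGLISH)`, while the weighted setup would be `train(build_weighted_corpus(SINGLISH, ENGLISH, singlish_weight=3))`. Repetition is only one way to realize up-weighting; scaling the perceptron update for Singlish examples would be an alternative.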
KEYWORDS
Singlish, Part-of-Speech Tagging, Low-Resource Languages, Perceptron Tagger, Data Weighting
CITE THIS PAPER
Chaojie Lin, Xiaoxi Luo. Improving POS Tagging for Singlish via Data Weighting. Lecture Notes on Language and Literature (2026). Vol. 9, No. 1, 27-32. DOI: http://dx.doi.org/10.23977/langl.2026.090104.