Slang Spelling Detection in Indonesian Religious Content Using Sequence-to-Sequence Algorithm
DOI:
https://doi.org/10.15575/kjrt.v2i2.1110
Keywords:
Religious Content, Sequence-to-Sequence, Slang, Spelling Detection
Abstract
Spelling errors are a common problem in text processing, including in Indonesian. The growing use of non-standard language, especially in digital text communication, motivates this research. In sensitive religious content, spelling errors can even cause misunderstandings. This article develops a spelling correction model using a deep learning approach: a Sequence-to-Sequence model with a GRU-based encoder-decoder architecture and an attention mechanism. A dataset of standard and non-standard text pairs is used to test the model. The experimental results show that the proposed model achieves 76.82% accuracy and is able to recognize and correct spelling errors. This research is expected to contribute to future improvements in Indonesian spelling correction systems.
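To make the attention mechanism named in the abstract more concrete, the following is a minimal sketch of a single attention step in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which attention variant was used, so dot-product scoring is assumed here, and the names `softmax`, `attend`, `encoder_states`, and `decoder_state` are hypothetical. In a full model the encoder states would come from the GRU encoder reading the non-standard text, and the context vector would feed the GRU decoder that emits the standard spelling.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: dot-product scoring. The abstract does not specify
    the scoring function, so this choice is illustrative only.
    """
    # Score each encoder state against the current decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    # Normalise the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy values standing in for GRU hidden states (hypothetical data).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy case the second and third encoder states align best with the decoder state, so they receive equal and larger weights than the first. The decoder would then condition its next output token on `context`.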
License
Copyright (c) 2025 Nanda Tiara Sabina Hidayatulloh, Nur Halizah, Pitriani
This work is licensed under a Creative Commons Attribution 4.0 International License.