Slang Spelling Detection in Indonesian Religious Content Using Sequence-to-Sequence Algorithm

Nanda Tiara Sabina Hidayatulloh; Nur Halizah; Pitriani

doi:10.15575/kjrt.v2i2.1110

Authors

Nanda Tiara Sabina Hidayatulloh, Reaksi Community, Bandung
Nur Halizah, Community of Arunika Mengabdi, Bandung
Pitriani, Community of Arunika Mengabdi, Bandung

Keywords:

Religious Content, Sequence-to-Sequence, Slang, Spelling Detection

Abstract

Spelling errors are a common problem in text processing, including for Indonesian. This research is motivated by the increasing use of non-standard language, especially in digital text communication. In sensitive religious content, spelling errors can even cause misunderstandings. This article examines the development of a spelling correction model based on a deep learning approach: a Sequence-to-Sequence model with a GRU-based encoder-decoder architecture and an attention mechanism. A dataset of paired standard and non-standard text is used to test the model. The experimental results show that the proposed model achieves 76.82% accuracy and is able to recognize and correct spelling errors. This research is expected to contribute to future improvements in Indonesian spelling correction systems.
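
To make the attention mechanism mentioned above concrete, the following is a minimal sketch of a single attention step in plain Python. It assumes dot-product scoring, which the abstract does not specify, and the function names (`softmax`, `attention_context`) and the toy vectors are hypothetical illustrations, not the authors' implementation. At each decoding step, the decoder state is compared against every encoder state, the scores are normalized into weights, and the weighted sum of encoder states forms the context vector.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """One attention step (assumed dot-product scoring).

    decoder_state: list of floats, the current decoder hidden state.
    encoder_states: list of lists, one hidden state per input position.
    Returns the attention weights and the resulting context vector.
    """
    # Score each encoder state by its dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted sum of the encoder states.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three input positions, hidden size 2 (illustrative values only).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
print(weights)
print(context)
```

In a full encoder-decoder model, the hidden states would come from GRU layers and the context vector would be combined with the decoder state to predict the next output token; those parts are omitted here.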
