Islamic Chatbot Based on Reinforcement Learning Using Q-Learning Algorithm
DOI:
https://doi.org/10.15575/kjrt.v3i2.1731
Keywords:
Chatbot, Q-Learning, Reinforcement Learning, sentence-transformers
Abstract
This study develops a retrieval-based chatbot that combines Reinforcement Learning (RL), using the Q-Learning algorithm, with a Sentence-Transformer (SBERT) model that captures the semantic context of user questions. The system maps each question to a vector representation, computes its semantic similarity to the training data, and selects an answer based on learned Q-values. The dataset consists of question-answer pairs in JSON format. Training takes place in a simulated question-and-answer environment in which the RL agent is rewarded according to the suitability of the answer it selects. Test results show that the chatbot provides relevant, contextual responses even when the sentence structure of a question differs from the training data. To improve accessibility, the system is exposed as a REST API using Flask and integrated into a Flutter-based mobile application that serves as the user interface. The approach proved efficient and computationally lightweight, and it offers a promising alternative for developing responsive retrieval-based educational chatbots.
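The pipeline described in the abstract (embed the question, find the most similar training question, pick an answer by Q-value) can be illustrated with a minimal sketch. It is not the authors' implementation. The question-answer pairs are hypothetical placeholders for the JSON dataset, the `embed` function is a toy bag-of-words stand-in for an SBERT embedding so that the example runs with the standard library only, and the reward of +1 or -1, the hyperparameters, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical question-answer pairs standing in for the JSON dataset.
QA_PAIRS = [
    {"question": "how many times do muslims pray each day",
     "answer": "Five daily prayers."},
    {"question": "what is the fasting month called",
     "answer": "Ramadan."},
    {"question": "what is the pilgrimage to mecca called",
     "answer": "Hajj."},
]


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for an SBERT embedding."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


QUESTION_VECS = [embed(p["question"]) for p in QA_PAIRS]


def nearest_state(user_text):
    """State = index of the training question most similar to the input."""
    vec = embed(user_text)
    sims = [cosine(vec, q) for q in QUESTION_VECS]
    return max(range(len(sims)), key=sims.__getitem__)


def train(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning in a simulated question-and-answer environment."""
    rng = random.Random(seed)
    n = len(QA_PAIRS)
    q = [[0.0] * n for _ in range(n)]  # q[state][action]
    for _ in range(episodes):
        state = rng.randrange(n)  # simulated user asks question `state`
        if rng.random() < epsilon:  # explore
            action = rng.randrange(n)
        else:  # exploit the current estimate
            action = max(range(n), key=q[state].__getitem__)
        # Assumed reward: +1 for the matching answer, -1 otherwise.
        reward = 1.0 if action == state else -1.0
        # Each episode is a single turn, so the next state is terminal and
        # the bootstrap term gamma * max_a' Q(s', a') is zero.
        target = reward
        q[state][action] += alpha * (target - q[state][action])
    return q


def answer(user_text, q):
    """Pick the answer with the highest learned Q-value for the nearest state."""
    state = nearest_state(user_text)
    action = max(range(len(QA_PAIRS)), key=q[state].__getitem__)
    return QA_PAIRS[action]["answer"]


Q_TABLE = train()
```

Calling `answer("how often do muslims pray in a day", Q_TABLE)` exercises the retrieval path with a question phrased differently from the training data. In a real system the `embed` function would presumably be replaced by a sentence-transformers model, which matches on meaning rather than on shared words.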
License
Copyright (c) 2026 Aria Octavian Hamza, Denis Firmansyah, Abidzar Giffari, Aisyah Muthmainnah, Adil Zukhruf Firdaus

This work is licensed under a Creative Commons Attribution 4.0 International License.