Cyberbullying Detection in Indonesian TikTok Comments Using IndoBERT with Fairness Evaluation
DOI: https://doi.org/10.63158/journalisi.v8i1.1448
Keywords: Content Moderation, Cyberbullying Detection, Fairness Evaluation, IndoBERT, TikTok Comments
Abstract
This study investigates automated cyberbullying detection on TikTok in the Indonesian digital context, where heavy social media use among children and adolescents demands scalable and consistent content moderation. We propose an IndoBERT-based framework that detects and classifies cyberbullying in Indonesian-language TikTok comments and evaluates the resulting classifier for algorithmic fairness. A dataset of 2,122 TikTok comments was obtained from a publicly available Kaggle repository and divided into training, validation, and test sets with a stratified 70:15:15 split. The IndoBERT-base-p1 model was fine-tuned with PyTorch and HuggingFace, using the AdamW optimizer and learning rate scheduling. On the test set, the model achieved an accuracy of 70.66% and a ROC-AUC of 0.7969, indicating that it separates the two classes well above chance. A macro F1-score of 0.7066 and a cyberbullying recall of 0.7170 show balanced performance in identifying harmful content. A key contribution of this study is a fairness evaluation framework, which reveals an accuracy gap of 2.08% and an equal opportunity gap of 0.0208, indicating overall fairness on these two criteria. Demographic parity, however, remains a concern. The system is intended for content triage combined with human review: it filters out non-cyberbullying cases and flags potentially harmful content for human oversight, which supports existing moderation workflows.
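The three fairness criteria named in the abstract can be illustrated with a small sketch. The abstract does not state how the groups were defined or exactly how each gap was computed, so the code below assumes the common definitions: each gap is the difference between the largest and smallest per-group value of accuracy (accuracy gap), true-positive rate (equal opportunity gap), and positive-prediction rate (demographic parity gap). The function names, the group labels "A" and "B", and the toy labels are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def group_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, true-positive rate, and positive-prediction rate for one group."""
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    positives = sum(1 for t in y_true if t == 1)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return {
        "accuracy": correct / n,
        # Guard against a group that contains no positive examples.
        "tpr": true_pos / positives if positives else 0.0,
        "positive_rate": sum(1 for p in y_pred if p == 1) / n,
    }


def fairness_gaps(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Largest minus smallest per-group value for each of the three metrics."""
    by_group: Dict[str, Tuple[List[int], List[int]]] = {}
    for t, p, g in zip(y_true, y_pred, groups):
        trues, preds = by_group.setdefault(g, ([], []))
        trues.append(t)
        preds.append(p)
    rates = [group_rates(ts, ps) for ts, ps in by_group.values()]

    def spread(key: str) -> float:
        values = [r[key] for r in rates]
        return max(values) - min(values)

    return {
        "accuracy_gap": spread("accuracy"),
        "equal_opportunity_gap": spread("tpr"),
        "demographic_parity_gap": spread("positive_rate"),
    }


# Toy data (hypothetical): 1 = cyberbullying, 0 = non-cyberbullying.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gaps = fairness_gaps(y_true, y_pred, groups)
for name, value in gaps.items():
    print(f"{name}: {value:.4f}")
```

In this toy case both groups have the same accuracy, yet comments from group B are flagged far more often than those from group A. This is the kind of situation in which accuracy looks even across groups while demographic parity is still violated.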
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.