Pre-Trained Large Language Model Pipeline for Anomaly Detection Based on the MITRE ATT&CK Framework 


Vol. 50,  No. 10, pp. 1631-1645, Oct.  2025
10.7840/kics.2025.50.10.1631


  Abstract

In this paper, we propose a Large Language Model (LLM) pipeline that uses the UWF-ZeekData22 dataset, labeled according to the MITRE ATT&CK matrix, to address growing cyber threats in modern society. We first performed exploratory data analysis (EDA) to derive key feature groups that reflect the spatio-temporal characteristics and connectivity of network traffic logs. These feature groups are used to generate input sequences for pre-training a BERT model. In the pre-training phase, we applied a masked language model (MLM) objective to learn network traffic patterns, achieving a mask-prediction accuracy above 0.9. In the fine-tuning and inference phase, we optimized the models for anomaly detection by adopting a weighted sampling technique to handle the class imbalance across tactics in the dataset. In the performance evaluation, all models achieved an accuracy above 0.94 and an AUC-ROC close to 1.0. We also analyzed how the padding method interacts with model size and found that static padding performed better for large models, while dynamic padding performed better for small models. These results demonstrate that LLM-based pre-training can learn the complex patterns of network traffic logs and reliably detect various tactics. We therefore expect this work to serve as a practical case study for modernizing network security systems and developing real-time security monitoring solutions.
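The three mechanisms the abstract names (MLM masking for pre-training, inverse-frequency weighted sampling against tactic imbalance, and static vs. dynamic padding) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names, the 15% default mask rate, and the `-100` ignore-label convention are assumptions borrowed from common BERT training setups.

```python
import random
from collections import Counter

def mlm_mask(ids, mask_id, p=0.15, rng=None):
    """Mask a fraction p of tokens for masked-language-model pre-training.
    Returns (masked input, labels); -100 marks positions excluded from the loss."""
    rng = rng or random.Random(0)
    out, labels = list(ids), [-100] * len(ids)
    for i, tok in enumerate(ids):
        if rng.random() < p:
            out[i], labels[i] = mask_id, tok  # hide token, remember the target
    return out, labels

def tactic_sample_weights(labels):
    """Inverse-frequency weight per sample, so rare ATT&CK tactics are
    drawn roughly as often as common ones during fine-tuning."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def pad_batch(batch, pad_id=0, max_len=None):
    """Dynamic padding pads each batch to its own longest sequence;
    static padding passes a fixed max_len instead."""
    target = max_len if max_len is not None else max(len(s) for s in batch)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]
```

In a PyTorch setting, the per-sample weights would typically feed `torch.utils.data.WeightedRandomSampler`, and the choice between calling `pad_batch` with or without `max_len` corresponds to the dynamic/static padding trade-off the paper evaluates.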


  Cite this article

[KICS Style]

Kyuchang Kang, Yu-Jin So, Jong-Geun Park, "Pre-Trained Large Language Model Pipeline for Anomaly Detection Based on the MITRE ATT&CK Framework," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 10, pp. 1631-1645, 10. 2025. (https://doi.org/10.7840/kics.2025.50.10.1631)