Pre-Trained Large Language Model Pipeline for Anomaly Detection Based on the MITRE ATT&CK Framework 


Vol. 50,  No. 10, pp. 1631-1645, Oct.  2025
10.7840/kics.2025.50.10.1631


  Abstract

In this paper, we propose a Large Language Model (LLM) pipeline that uses the UWF-ZeekData22 dataset, labeled according to the MITRE ATT&CK matrix, to address growing cyber threats in modern society. We first performed exploratory data analysis (EDA) to derive key feature groups that reflect the spatio-temporal characteristics and connectivity of network traffic logs. These feature groups are used to generate input sequences for pre-training a BERT model. In the pre-training phase, we applied a masked language model (MLM) objective to learn network traffic patterns, achieving a mask-prediction accuracy above 0.9. In the fine-tuning and inference phase, we optimized the models for anomaly detection by adopting a weighted sampling technique to handle the class imbalance across tactics in the dataset. In the performance evaluation, all models achieved an accuracy above 0.94 and an AUC-ROC close to 1.0. We also analyzed how the padding method interacts with model size and found that static padding performed better for large models, while dynamic padding performed better for small models. These results demonstrate that LLM-based pre-training can learn the complex patterns of network traffic logs and reliably detect various tactics. We therefore expect this work to serve as a practical case study for modernizing network security systems and developing real-time security monitoring solutions.
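The three mechanisms the abstract names (MLM masking for pre-training, inverse-frequency weighted sampling against tactic imbalance, and static vs. dynamic padding) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names, the 15% default mask rate, and the `-100` ignore-label convention are assumptions borrowed from common BERT training setups.

```python
import random
from collections import Counter

def mlm_mask(ids, mask_id, p=0.15, rng=None):
    """Mask a fraction p of tokens for masked-language-model pre-training.
    Returns (masked input, labels); -100 marks positions excluded from the loss."""
    rng = rng or random.Random(0)
    out, labels = list(ids), [-100] * len(ids)
    for i, tok in enumerate(ids):
        if rng.random() < p:
            out[i], labels[i] = mask_id, tok  # hide token, remember the target
    return out, labels

def tactic_sample_weights(labels):
    """Inverse-frequency weight per sample, so rare ATT&CK tactics are
    drawn roughly as often as common ones during fine-tuning."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def pad_batch(batch, pad_id=0, max_len=None):
    """Dynamic padding pads each batch to its own longest sequence;
    static padding passes a fixed max_len instead."""
    target = max_len if max_len is not None else max(len(s) for s in batch)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]
```

In a PyTorch setting, the per-sample weights would typically feed `torch.utils.data.WeightedRandomSampler`, and the choice between calling `pad_batch` with or without `max_len` corresponds to the dynamic/static padding trade-off the paper evaluates.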


  Cite this article

[KICS Style]

Kyuchang Kang, Yu-Jin So, Jong-Geun Park, "Pre-Trained Large Language Model Pipeline for Anomaly Detection Based on the MITRE ATT&CK Framework," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 10, pp. 1631-1645, 10. 2025. (https://doi.org/10.7840/kics.2025.50.10.1631)