A Hybrid Extreme Gradient Boosting and Long Short-Term Memory Algorithm for Cyber Threats Detection
Abstract
Traditional intrusion detection technologies struggle with vast amounts of data, scale poorly, and have low detection rates, so they cannot keep up with evolving and increasingly sophisticated cyber threats. There is therefore an urgent need to detect and stop cyber threats early. Deep learning has greatly improved intrusion detection because of its ability to self-learn and extract highly accurate features. In this paper, a Hybrid XG Boosted and Long Short-Term Memory algorithm (HXGBLSTM) is proposed. A comparative analysis is conducted between the computational performance of six established evolutionary computation algorithms and the recently developed bio-inspired metaheuristic Zebra Optimisation Algorithm. The established algorithms include the Particle Swarm Optimisation Algorithm; the bio-inspired Bat, Firefly, and Monarch Butterfly Optimisation Algorithms; and the Genetic Algorithm as an evolutionary algorithm. These metaheuristic methods are used for feature selection (FS) to mitigate the curse of dimensionality, and their results are compared with those of the wrapper-based XGBoost feature selection algorithm. The proposed algorithm is evaluated on the CSE-CIC-IDS2018 dataset, which contains the latest network attacks. XGBoost outperformed the other FS algorithms and was therefore used as the feature selection algorithm. The effectiveness of the proposed HXGBLSTM is evaluated for both binary and multi-class classification. For cyber threat detection, the proposed HXGBLSTM outperforms seven innovative deep learning algorithms in binary classification and four of them in multi-class classification. Recall, F1-score, and precision were also used as evaluation criteria for the comparison.
In extensive and detailed experiments conducted on a real dataset, the best accuracy for binary classification was 99.8%, with an F1-score of 99.83%, precision of 99.85%, and recall of 99.82%. For multi-class classification, the best accuracy, F1-score, precision, and recall were all approximately 100%, which gives the proposed algorithm an advantage over the compared ones.
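As a quick consistency check on the reported binary-classification metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only and is not taken from the paper; the function name `f1_from_precision_recall` is a hypothetical helper, and the two input values are simply the precision and recall quoted above, expressed as fractions.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (fractions in [0, 1])."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Binary-classification figures quoted in the abstract, as fractions.
reported_precision = 0.9985
reported_recall = 0.9982

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"Recomputed F1-score: {f1:.2%}")
```

The recomputed value is about 99.83%, which agrees with the F1-score reported for binary classification.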
Copyright (c) 2023 MENDEL
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
MENDEL open access articles are normally published under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0) license: https://creativecommons.org/licenses/by-nc-sa/4.0/. Under the CC BY-NC-SA 4.0 license, third-party reuse is permitted only for non-commercial purposes. Articles posted under the CC BY-NC-SA 4.0 license allow users to share, copy, and redistribute the material in any medium or format, and to adapt, remix, transform, and build upon the material for non-commercial purposes. Reuse under the CC BY-NC-SA 4.0 license requires appropriate attribution to the source of the material, a link to the license, and an indication of any changes made to the original material.