A Novel Data Engineering Process Which Integrates Alert Information, Security Logs, And SOC Analysts



We build a user-centric ML system for the cyber security operations center (SOC) in an enterprise environment. We examine the common data sources in a SOC, its workflow, and how to leverage and process these data sets to build an effective ML system. The work is aimed at two groups of readers. The first group is data scientists and ML researchers who lack cyber security domain knowledge but want to build ML systems for security operations centers. The second group is cyber security practitioners who have deep knowledge and expertise in cyber security but lack ML knowledge and wish to build such a system themselves. Throughout the work, we use the system we built in the Symantec SOC production setting as an example to demonstrate the full set of steps: data collection, label creation, feature engineering, ML algorithm selection, model performance evaluation, and risk score generation.
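The abstract names the stages of the pipeline: data collection, label creation, feature engineering, algorithm selection, model performance evaluation, and risk score generation. The Python sketch below strings those stages together on toy data so the hand-offs between them are concrete. It is not the authors' Symantec system. The record fields (`host`, `verdict`, `source`, `blocked`), the three log sources, the per-host count features, the choice of logistic regression trained by gradient descent, and the 0 to 100 risk scale are all illustrative assumptions.

```python
import math
from collections import defaultdict

# Hypothetical log sources; a real SOC would draw on IDS/IPS, DNS, proxy, DLP, DHCP, etc.
SOURCES = ("ids", "dns", "proxy")

# Toy inputs (hypothetical): analyst verdicts on alerts, and raw log events per host.
ALERTS = [
    {"host": "h1", "verdict": "malicious"},
    {"host": "h2", "verdict": "benign"},
    {"host": "h3", "verdict": "malicious"},
    {"host": "h4", "verdict": "benign"},
]
LOGS = (
    [{"host": "h1", "source": "ids", "blocked": True}] * 6
    + [{"host": "h1", "source": "dns", "blocked": False}] * 2
    + [{"host": "h2", "source": "dns", "blocked": False}] * 5
    + [{"host": "h2", "source": "proxy", "blocked": False}] * 4
    + [{"host": "h3", "source": "ids", "blocked": True}] * 4
    + [{"host": "h3", "source": "proxy", "blocked": True}] * 2
    + [{"host": "h4", "source": "proxy", "blocked": False}] * 7
)


def make_labels(alerts):
    """Label creation: an analyst verdict of 'malicious' maps to 1, anything else to 0."""
    return {a["host"]: 1 if a["verdict"] == "malicious" else 0 for a in alerts}


def make_features(logs, hosts):
    """Feature engineering: log-scaled event counts per source plus the blocked fraction."""
    counts = defaultdict(lambda: defaultdict(int))
    for event in logs:
        counts[event["host"]][event["source"]] += 1
        counts[event["host"]]["blocked"] += int(event["blocked"])
    features = {}
    for host in hosts:
        c = counts[host]
        total = sum(c[s] for s in SOURCES)
        row = [math.log1p(c[s]) for s in SOURCES]
        row.append(c["blocked"] / total if total else 0.0)
        features[host] = row
    return features


def predict_proba(w, b, x):
    """Logistic model: P(malicious | x) = sigmoid(w . x + b)."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Model fitting: logistic regression by batch gradient descent on the mean log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_score(w, b, x):
    """Risk score: predicted probability of 'malicious' scaled to 0-100."""
    return round(100 * predict_proba(w, b, x))


def precision_recall(y_true, y_pred):
    """Model evaluation: precision and recall for the malicious class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp / (tp + fp) if tp + fp else 0.0), (tp / (tp + fn) if tp + fn else 0.0)


def run_pipeline(alerts, logs, threshold=50):
    """End to end: labels -> features -> model -> risk scores -> evaluation."""
    labels = make_labels(alerts)
    hosts = sorted(labels)
    features = make_features(logs, hosts)
    X, y = [features[h] for h in hosts], [labels[h] for h in hosts]
    w, b = train_logreg(X, y)
    scores = {h: risk_score(w, b, features[h]) for h in hosts}
    # Scored on the training hosts only to show the mechanics; use held-out data in practice.
    preds = [1 if scores[h] >= threshold else 0 for h in hosts]
    return scores, precision_recall(y, preds)


if __name__ == "__main__":
    print(run_pipeline(ALERTS, LOGS))
```

Calling `run_pipeline(ALERTS, LOGS)` returns a risk score per host together with the precision and recall of the malicious class at a score threshold of 50. Because the evaluation reuses the training hosts, those numbers only demonstrate the mechanics; a realistic setup would score held-out or later-arriving alerts. Plain logistic regression is used here only because it fits in a few lines of standard-library Python and outputs a probability that maps directly onto a risk score.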


Security Operation Center (SOC); DNS (Domain Name System); IDS/IPS (Intrusion Detection/Prevention System); Machine Learning (ML); DLP (Data Loss Protection); DHCP (Dynamic Host Configuration Protocol); Data Mining (DM); Deep Neural Network (DNN);




Copyright © 2012 - 2021, All rights reserved.| ijitr.com

International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.