PoS - Proceedings of Science
Volume 327 - International Symposium on Grids and Clouds 2018 in conjunction with Frontiers in Computational Drug Discovery (ISGC 2018 & FCDD) - Networking, Security, Infrastructure & Operations
Building a large scale Intrusion Detection System using Big Data technologies
P. Panero*, L. Vâlsan, V. Brillault and I.C. Schuszter
Published on: December 12, 2018
Abstract
Computer security threats have always been a major concern and continue to increase in frequency and complexity. The nature and techniques of the attacks evolve rapidly over time, making their detection more difficult. Therefore, the means and tools used to deal with them need to evolve at the same pace, if not faster.
This paper presents the implementation of an Intrusion Detection System (IDS) used at CERN, operating at both the Network (NIDS) and Host (HIDS) levels. The system currently processes approximately 1 TB of data per day in real time, with the final goal of coping with at least 5 TB per day. To accomplish this, an infrastructure was first developed to collect data from sources such as system logs, web server logs and NIDS logs, making use of technologies such as Apache Flume and Apache Kafka. Once collected, the data must be processed in search of malicious activity: it is consumed by Apache Spark jobs, which compare it in real time with known signatures of malicious activity. These signatures are known as Indicators of Compromise (IoC). They are published by many security experts and centralized in a local Malware Information Sharing Platform (MISP) instance.
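As an illustration only, the detection step described above can be sketched in plain Python. This is not the Spark job used at CERN: the indicator set, the field names (`src_ip`, `query`) and the helper functions `match_iocs` and `scan` are hypothetical, and a simple exact-match lookup stands in for whatever comparison the real jobs perform.

```python
import json

# Hypothetical indicator set. In the system described, indicators are
# centralized in a MISP instance; here they are hard-coded for illustration.
IOCS = {
    "ip": {"203.0.113.7"},
    "domain": {"malicious.example"},
}


def match_iocs(record, iocs):
    """Return (field, indicator_type, value) for every string field of the
    record whose value exactly matches a known indicator."""
    hits = []
    for field, value in record.items():
        if not isinstance(value, str):
            continue
        for kind, values in iocs.items():
            if value in values:
                hits.append((field, kind, value))
    return hits


def scan(lines, iocs):
    """Parse JSON log lines one by one and yield (record, hits) for each
    record that matches at least one indicator."""
    for line in lines:
        record = json.loads(line)
        hits = match_iocs(record, iocs)
        if hits:
            yield record, hits


# Usage: two made-up log lines, of which only the second matches.
log_lines = [
    '{"src_ip": "198.51.100.1", "query": "example.org"}',
    '{"src_ip": "203.0.113.7", "query": "malicious.example"}',
]
alerts = list(scan(log_lines, IOCS))
```

Keeping indicators in sets grouped by type makes each lookup a constant-time membership test, which is one plausible way to keep per-record cost low at high data volumes.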
Nonetheless, detecting an intrusion is not enough. There is a need to understand what happened and why. To provide context for a detected intrusion, the data is also enriched in real time as it passes through the pipeline; for example, DNS resolution and IP geolocation are applied to it. The result is a system generic enough to process any kind of data in JSON format: it enriches the data to provide additional context about what is happening and then looks for indicators of compromise to detect possible intrusions, making use of the latest technologies in the Big Data ecosystem.
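The enrichment step can be sketched in the same spirit. The example below is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`src_ip`, `hostname`, `geo`), the function `enrich` and the lookup tables are all hypothetical, and dictionary lookups stand in for real DNS resolution and IP geolocation services so that the sketch runs without network access.

```python
import json

# Stand-in lookup tables (assumptions): a real pipeline would query a DNS
# resolver and a geolocation database instead.
FAKE_DNS = {"203.0.113.7": "host.example"}
FAKE_GEO = {"203.0.113.7": {"country": "ZZ"}}


def enrich(record, resolve_dns, geolocate, ip_field="src_ip"):
    """Return a copy of the record with hostname and geolocation added
    for the IP address found in ip_field, if that field is present."""
    enriched = dict(record)
    ip = record.get(ip_field)
    if ip is not None:
        enriched["hostname"] = resolve_dns(ip)  # None if unknown
        enriched["geo"] = geolocate(ip)         # None if unknown
    return enriched


def enrich_line(line):
    """Parse one JSON log line and enrich it with the stand-in lookups."""
    return enrich(json.loads(line), FAKE_DNS.get, FAKE_GEO.get)


# Usage: one record with an IP field, one without.
with_ip = enrich_line('{"src_ip": "203.0.113.7"}')
without_ip = enrich_line('{"message": "service started"}')
```

Passing the resolver and geolocation functions as parameters keeps the enrichment logic independent of any particular data source, which matches the goal of handling any kind of JSON data.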
DOI: https://doi.org/10.22323/1.327.0014

Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.