DANAK: Finding the odd!

Cynthia Wagner, Jérôme François, Radu State, Thomas Engel
University of Luxembourg, SnT - Interdisciplinary Centre for Security, Reliability and Trust
Campus Kirchberg, L-1359 Luxembourg
Email: {cynthia.wagner, jerome.francois, radu.state, thomas.engel}@uni.lu

Abstract—This paper introduces DANAK, a framework for detecting anomalies in Netflow records by means of spatio-temporal aggregation. Spatially aggregated Netflow records are fed into a new kernel function that analyzes them with respect to context and quantitative evolution. To improve the analysis of time series with sparse or missing data, phase space embedding (PSE) is applied.

I. INTRODUCTION

Netflow records are one kind of information available at network borders. Almost all routers can export them today, but storing and analyzing such large quantities instantly is a problem. This raises the question of whether it is really necessary to evaluate all records, or whether abstractions of the records are sufficient to provide the same outcomes. Another issue is that changes in network load or other disturbances affect network monitoring, so that large data sets include periods where data is sparse or even absent. This influences the evaluation of traffic time series.

This paper describes a monitoring framework, DANAK (Detecting Anomalies in Netflow records by spatial Aggregation and Kernel methods), that aims to detect anomalies in Netflow record time series. Because a full analysis of individual Netflow records is not recommended, a spatio-temporal aggregation technique is developed. A kernel function, a tool from machine learning, is designed that captures topological and traffic changes in the aggregated data and thereby deduces whether the network traffic contains anomalies. Additionally, DANAK presents a method for embedding time series into an n-dimensional space to reconstruct missing dimensions, such as sparse data periods.

II. AGGREGATION AND THE KERNEL FUNCTION

Spatio-temporal aggregation of IP data was first presented in [2]. An advantage of aggregation is that overviews can be presented per subnet instead of per single IP flow. DANAK implements a similar method that outputs traffic profiles in tree-like form by spatially assembling subnet, host and quantitative traffic information extracted from Netflow records.

Fig. 1. Output tree T

A kernel function is used to evaluate the generated traffic profiles. It can be described as a similarity function for complex input data, where the distances can be derived directly without exhaustive calculations. A kernel function $K$ is defined as a similarity mapping $K : X \times X \to [0, \infty[$, where $X$ is an input space and $K(x, y) = \sum_i \phi_i(x)\,\phi_i(y) = \phi(x) \cdot \phi(y)$ is a similarity score, with $\phi_i(x)$ being a feature function over a sample $x$.

Here, a new kernel function is introduced that calculates the similarity between input profile trees $T_n$ and $T_m$ for source and destination IP address profiles. The first kernel function metric uses IP address information, such that IP subnets can be described as $IP = (prefix_i, prefixlength_i)$, with $prefix_i$ as the IP network part and $prefixlength_i$ the host identifier size (see Fig. 1). The second metric handles traffic volume information for a node in percent of bytes, $vol_i$. A profile for source or destination is a set of nodes $T^{src,dst} = \{n_1, ..., n_j\}$, where a node $n_x$ is defined as $n_x = (prefix_x, prefixlength_x, vol_x)$.

For two traffic profiles $T_n = \langle T_n^{src}, T_n^{dst} \rangle$ and $T_m = \langle T_m^{src}, T_m^{dst} \rangle$, the kernel function can be defined by

$$K_{src,dst}(T_n, T_m) = \frac{1}{2} \sum_{i \in T_n^{src,dst},\; j \in T_m^{src,dst}} s_{src,dst}(i, j) \times v_{src,dst}(i, j).$$

The similarity function $s_{src,dst}(i, j)$ compares traffic profiles and calculates their similarity. It can be defined as

$$
s_{src,dst}(i, j) =
\begin{cases}
\dfrac{2^{prefixlength_j}}{2^{prefixlength_i}} & \text{if } prefix_i \text{ is a prefix of } prefix_j \\[2ex]
\dfrac{2^{prefixlength_i}}{2^{prefixlength_j}} & \text{if } prefix_j \text{ is a prefix of } prefix_i \\[2ex]
0 & \text{otherwise}
\end{cases}
\qquad (1)
$$
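As an illustration, the kernel for one profile side (source or destination) could be sketched in Python as follows. This is a minimal sketch and not the DANAK implementation: the `Node` class, the function names, the default `sigma` and the `scale` parameter (standing in for the leading factor of the kernel definition) are assumptions introduced here. It also assumes IPv4 and reads $prefixlength$ as the host identifier size, that is, 32 minus the CIDR network prefix length. The volume term is the Gaussian matching function $v_{src,dst}(i, j)$ given in the text that follows.

```python
import ipaddress
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical profile node n_x = (prefix_x, prefixlength_x, vol_x)."""
    network: ipaddress.IPv4Network  # carries prefix and CIDR length
    vol: float                      # traffic volume in percent of bytes

    @property
    def host_bits(self) -> int:
        # Assumption: prefixlength is the host identifier size,
        # i.e. 32 minus the CIDR network prefix length for IPv4.
        return self.network.max_prefixlen - self.network.prefixlen


def prefix_similarity(i: Node, j: Node) -> float:
    """Prefix similarity s(i, j) following Eq. (1)."""
    if i.network.supernet_of(j.network):  # prefix_i is a prefix of prefix_j
        return 2.0 ** j.host_bits / 2.0 ** i.host_bits
    if j.network.supernet_of(i.network):  # prefix_j is a prefix of prefix_i
        return 2.0 ** i.host_bits / 2.0 ** j.host_bits
    return 0.0                            # unrelated subnets


def volume_match(i: Node, j: Node, sigma: float) -> float:
    """Gaussian volume matching v(i, j) = exp(-|vol_i - vol_j|^2 / sigma^2)."""
    return math.exp(-abs(i.vol - j.vol) ** 2 / sigma ** 2)


def profile_kernel(tn, tm, sigma: float = 10.0, scale: float = 0.5) -> float:
    """Sum s(i, j) * v(i, j) over all node pairs of two profiles.

    sigma and scale are illustrative defaults, not values from the paper.
    """
    return scale * sum(
        prefix_similarity(i, j) * volume_match(i, j, sigma)
        for i in tn
        for j in tm
    )


# Toy nodes for illustration only (not measured data).
a = Node(ipaddress.ip_network("10.0.0.0/8"), 60.0)
b = Node(ipaddress.ip_network("10.1.0.0/16"), 60.0)
c = Node(ipaddress.ip_network("192.168.0.0/24"), 40.0)
```

With these toy nodes, `prefix_similarity(a, b)` is 2^16 / 2^24 because 10.0.0.0/8 contains 10.1.0.0/16, while `prefix_similarity(a, c)` is 0 since neither subnet contains the other. Calling `profile_kernel` once on the source trees and once on the destination trees would give the two per-side scores.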

The matching function part performs the calculations for the volume and is defined as $v_{src,dst}(i, j) = \exp\left(-\dfrac{|vol_i - vol_j|^2}{\sigma^2}\right)$.

Fig. 2 shows a time series with n traffic profiles with a violent DDoS UDP flood. By evaluating this time series with the kernel function, the attack can be detected. In Fig. 3, a stealthy DDoS flood attack is evaluated with the kernel function. It can be seen that the kernel function is not sufficient to identify the anomaly, with the result that the graphical representation is unable to reveal it to a human expert. This
