Editor: Marcus A. Maloof
Title: Machine Learning and Data Mining for Computer Security: Methods and Applications
Publisher: Springer-Verlag
ISBN/ISSN: 9781846282539
Edition: 1
Price: CHF 147.80
Category: Computer science, IT book
Language: English
Technical data
Pages: 210
Copy protection: DRM
Devices: PC/MAC/eReader/Tablet
Formats: PDF

'Machine Learning and Data Mining for Computer Security' provides an overview of the current state of research in machine learning and data mining as it applies to problems in computer security. This book has a strong focus on information processing and combines and extends results from computer security.

The first part of the book surveys the data sources, the learning and mining methods, evaluation methodologies, and past work relevant for computer security. The second part of the book consists of articles written by the top researchers working in this area. These articles deal with host-based intrusion detection through the analysis of audit trails, command sequences, and system calls, as well as network intrusion detection through the analysis of TCP packets and the detection of malicious executables.

This book fills the great need for a book that collects and frames work on developing and applying methods from machine learning and data mining to problems in computer security.


2 An Introduction to Information Assurance (p. 7)

Clay Shields

2.1 Introduction

The intuitive function of computer security is to limit access to a computer system. With a perfect security system, information would never be compromised because unauthorized users would never gain access to the system. Unfortunately, it seems beyond our current abilities to build a system that is both perfectly secure and useful.

Instead, the security of information is often compromised through technical flaws and through user actions. The realization that we cannot build a perfect system is important, because it shows that we need more than just protection mechanisms. We should expect the system to fail, and be prepared for failures.

As described in Sect. 2.2, system designers use mechanisms that not only protect against policy violations, but also detect violations when they occur and respond to them. This response often includes analyzing why the protection mechanisms failed and improving them to prevent future failures.

It is also important to realize that security systems do not exist just to limit access to a system. The true goal of implementing security is to protect the information on the system, which can be far more valuable than the system itself or access to its computing resources.

Because systems involve human users, protecting information requires more than just technical measures. It also requires that the users be aware of and follow security policies that support protection of information as needed.

This chapter provides a wider view of information security, with the goal of giving machine learning researchers and practitioners an overview of the area and suggesting new areas that might benefit from machine learning approaches. This wider view of security is called information assurance.

It includes the technical aspects of protecting information, as well as defining policies thoroughly and correctly and ensuring proper behavior of human users and operators. I will first describe the security process.

I will then explain the standard model of information assurance and its components, and, finally, will describe common attackers and the threats they pose. I will conclude with some examples of problems that fall outside the usual technical considerations of computer security but may be amenable to solution by machine learning methods.

2.2 The Security Process

Human beings are inherently fallible. Because we will make mistakes, our security process must reflect that fact and attempt to account for it. This recognition leads to the cycle of security shown in Fig. 2.1. The cycle is familiar and intuitive and is common in everyday life; it is illustrated here with a running example of securing an automobile.

2.2.1 Protection

Protection mechanisms are used to enforce a particular policy. The goal is to prevent things that are undesirable from occurring. A familiar example is securing an automobile and its contents. A car comes with locks to prevent anyone without a key from gaining access to it, or from starting it without the key. These locks constitute the car’s protection mechanisms.

2.2.2 Detection

Since we anticipate that our protection mechanisms will be imperfect, we add detection mechanisms to determine when they fail.

List of Contributors
1 Introduction
Part I Survey Contributions
2 An Introduction to Information Assurance
2.1 Introduction
2.2 The Security Process
2.2.1 Protection
2.2.2 Detection
2.2.3 Response
2.3 Information Assurance
2.3.1 Security Properties
2.3.2 Information Location
2.3.3 System Processes
2.4 Attackers and the Threats Posed
2.4.1 Worker with a Backhoe
2.4.2 Ignorant Users
2.4.3 Criminals
2.4.4 Script Kiddies
2.4.5 Automated Agents
2.4.6 Professional System Crackers
2.4.7 Insiders
2.5 Opportunities for Machine Learning Approaches
2.6 Conclusion
3 Some Basic Concepts of Machine Learning and Data Mining
3.1 Introduction
3.2 From Data to Examples
3.3 Representations, Models, and Algorithms
3.3.1 Instance-Based Learning
3.3.2 Naive Bayes
3.3.3 Kernel Density Estimation
3.3.4 Learning Coefficients of a Linear Function
3.3.5 Learning Decision Rules
3.3.6 Learning Decision Trees
3.3.7 Mining Association Rules
3.4 Evaluating Models
3.4.1 Problems with Simple Performance Measures
3.4.2 ROC Analysis
3.4.3 Principled Evaluations and Their Importance
3.5 Ensemble Methods and Sequence Learning
3.5.1 Ensemble Methods
3.5.2 Sequence Learning
3.6 Implementations and Data Sets
3.7 Further Reading
3.8 Concluding Remarks
Part II Research Contributions
4 Learning to Detect Malicious Executables
4.1 Introduction
4.2 Related Work
4.3 Data Collection
4.4 Classification Methodology
4.4.1 Instance-Based Learner
4.4.2 The TFIDF Classifier
4.4.3 Naive Bayes
4.4.4 Support Vector Machines
4.4.5 Decision Trees
4.4.6 Boosted Classifiers
4.5 Experimental Design
4.6 Experimental Results
4.6.1 Pilot Studies
4.6.2 Experiment with a Small Collection
4.6.3 Experiment with a Larger Collection
4.7 Discussion
4.8 Concluding Remarks
5 Data Mining Applied to Intrusion Detection: MITRE Experiences
5.1 Introduction
5.1.1 Related Work
5.1.2 MITRE Intrusion Detection
5.2 Initial Feature Selection, Aggregation, Classification, and Ranking
5.2.1 Feature Selection and Aggregation
5.2.2 HOMER
5.2.3 BART Algorithm and Implementation
5.2.4 Other Anomaly Detection Efforts
5.3 Classifier to Reduce False Alarms
5.3.1 Incremental Classifier Algorithm
5.3.2 Classifier Experiments
5.4 Clustering to Detect Anomalies
5.4.1 Clustering with a Reference Model on KDD Cup Data
5.4.2 Clustering without a Reference Model on MITRE Data
5.5 Conclusion
6 Intrusion Detection Alarm Clustering
6.1 Introduction
6.2 Root Causes and Root Cause Analysis
6.3 The CLARAty Alarm Clustering Method
6.3.1 Motivation
6.3.2 The CLARAty Algorithm
6.3.3 CLARAty Use Case
6.4 Cluster Validation
6.4.1 The Validation Dilemma
6.4.2 Cluster Validation in Brief
6.4.3 Validation of Alarm Clusters
6.5 Cluster Tendency
6.5.1 Test of Cluster Tendency
6.5.2 Experimental Setup and Results
6.5.3 Derivation of Probabilities
6.6 Conclusion
7 Behavioral Features for Network Anomaly Detection
7.1 Introduction
7.2 Inter-Flow versus I