
Troubleshooting for Network Operators
Description
This book presents a novel, efficient multi-modular troubleshooting architecture that overcomes limitations related to encrypted traffic and high time complexity. The architecture contains five main modules: data collection, anomaly detection, temporary remediation, root cause analysis and definitive remediation. Data collection has two submodules: parameter measurement and traffic classification. The architecture is implemented and validated in a software-defined networking (SDN) environment.

Persons
Van Van Tong is a lecturer at the School of Information and Communication Technology at Hanoi University of Science and Technology, Vietnam. His research interests include blockchain, cyber security, SDN and network troubleshooting.
Sami Souihi, HDR, is an Associate Professor in Computer Science in the N&T Department of Paris-Est Créteil University (UPEC), France, and is part of the LiSSi-TincNET research team. His research focuses on adaptive mechanisms in large-scale dynamic systems, among others.
Hai-Anh Tran is a lecturer-researcher and Vice-Dean in the Faculty of Computer Engineering, SoICT at HUST, Vietnam. His research interests include computer networks, distributed systems, network security, QoS, QoE and IoT, ranging from the theory of design to implementation.
Abdelhamid Mellouk is a full-time Professor, the Director of the IT4H High School Engineering Department, UPEC, and Head of the TincNET research team in France. He is also the founder of Network Control Research and Curricula activities at UPEC, the current Co-President of the French Deep Tech Data Science and Artificial Intelligence Systematic Hub, a member of the High Scientific Research and Technology National Council, President of its policies and programs commission, and IEEE ComSoc CSR TC Award Chair.
Content
Introduction
Chapter 1 State of the Art on Network Troubleshooting
1.1 Network troubleshooting
1.1.1 State of the art
1.1.2 Traditional troubleshooting architecture
1.2 Background on encryption protocols
1.2.1 QUIC
1.2.2 Other protocols
1.3 Drawbacks of troubleshooting with encrypted traffic
1.3.1 Network performance monitoring
1.3.2 Intrusion detection system
1.4 Conclusion
Chapter 2 Novel Global Troubleshooting Framework for Encrypted Traffic
2.1 Novel network troubleshooting architecture for encrypted traffic
2.2 Proof of concept of novel troubleshooting architecture in SDN
2.3 Data collection
2.3.1 Data classification
2.3.2 Monitoring tools
2.3.3 Parameter measurement
2.4 Troubleshooting dataset
2.4.1 Datasets for root cause analysis
2.4.2 Dataset for traffic classification
2.5 Conclusion
Chapter 3 Traffic Classification: Novel QUIC Traffic Classifier Based on Convolutional Neural Network
3.1 Introduction
3.2 Background
3.2.1 Convolutional network
3.2.2 Characteristics of QUIC-based applications
3.3 Traffic classification approaches
3.3.1 Port-based approaches
3.3.2 Payload-based approaches
3.3.3 Statistic-based approaches
3.3.4 DL-based approaches
3.4 Novel traffic classification method for QUIC traffic
3.4.1 Traffic collection
3.4.2 Flow-based features
3.4.3 Preprocessing
3.4.4 Novel traffic classification method
3.5 Experimental results
3.5.1 Dataset specification
3.5.2 Performance metrics
3.5.3 Performance analysis
3.6 Conclusion
Chapter 4 Anomaly Detection
4.1 Introduction
4.2 Anomaly detection approaches
4.2.1 Knowledge-based mechanisms
4.2.2 Rule inductions
4.2.3 Information theory
4.2.4 ML-based mechanisms
4.3 Anomaly detection approach using machine learning
4.3.1 ML-based anomaly detection method
4.3.2 Data collection and processing
4.4 Experimental results
4.4.1 Experimental setup
4.4.2 Performance analysis
4.5 Conclusion
Chapter 5 Temporary Remediation: SDN-based Application-aware Segment Routing for Large-scale Networks
5.1 Introduction
5.2 Application-aware routing mechanisms
5.2.1 Application-aware routing
5.2.2 Application-aware MPLS
5.2.3 Application-aware SR
5.3 Adaptive segment routing mechanism for encrypted traffic
5.3.1 Overview of the SDN-based adaptive segment routing framework
5.3.2 Network monitoring
5.3.3 Anomaly detection
5.3.4 Application-aware remediation
5.4 Experimental results
5.4.1 Experiment setup
5.4.2 Benchmark
5.4.3 Performance analysis
5.5 Conclusion
Chapter 6 Root Cause Analysis and Definitive Remediation
6.1 Root cause analysis: machine learning based root cause analysis for SDN network
6.1.1 Introduction
6.1.2 Root cause analysis mechanisms
6.1.3 ML-based RCA mechanism
6.1.4 Experimental results
6.1.5 Conclusion
6.2 Definitive remediation: adaptive QUIC BBR algorithm using reinforcement learning for dynamic networks
6.2.1 Introduction
6.2.2 Congestion control mechanisms
6.2.3 Adaptive BBR algorithm
6.2.4 Experimental results
6.2.5 Conclusion
Conclusions and Prospects
References
Index
1
State of the Art on Network Troubleshooting
"A protocol approach to troubleshooting"
Ed Wilson
Chapter 1 presents the state of the art on network troubleshooting and a traditional troubleshooting architecture for non-encrypted traffic. We then discuss its limitations when traffic is encrypted.
1.1. Network troubleshooting
In the early 19th century, technicians were dispatched to find and repair problems in telegraph and phone line infrastructure. Historically, a troubleshooter was a skilled worker who found and solved technical problems. Nowadays, troubleshooting is a form of problem-solving that aims to repair failed processes in a machine or a system. According to Morris and Rouse (1985) and Jonassen and Hung (2006), there are several conceptions of the troubleshooting process. The basic concept of troubleshooting is finding the faulty components in a device in order to repair or replace them (Perez 1991). Schaafstal et al. (2000) divided the troubleshooting process into four subtasks: formulating the problem description, cause generation, testing and evaluation. Similarly, Johnson et al. (1993) consider troubleshooting an iterative process with four subprocesses: problem space construction, problem space reduction, fault diagnosis and solution verification.
Network troubleshooting is an iterative process with three subtasks: identifying, diagnosing and solving problems in the network. In the past, network operators (NOs) relied on manual troubleshooting tools such as ping and traceroute. ping is a network administration utility that checks reachability between a source and a destination and measures the round-trip time of packets. traceroute is a network diagnostic utility that displays possible routes between a source and a destination and measures the transit delay of packets. These tools are used to diagnose complex problems such as loops caused by undefined interaction between spanning tree protocols (Heller et al. 2013). However, these approaches are not effective when there is a huge number of network devices. In addition, 24.6% of administrators reported that diagnosing and solving an anomaly takes more than 1 h on average (Zeng et al. 2012a). An automated troubleshooting process that detects an anomaly, locates its causes and solves it is therefore needed. Consequently, network troubleshooting has received attention from the research community (Fonseca and Mota 2017; Yu et al. 2018; Cherrared et al. 2019). In the following section, we present the state of the art of network troubleshooting.
1.1.1. State of the art
According to the related work on network troubleshooting (Yu et al. 2018; Fonseca and Mota 2017; Van et al. 2018), problems can be classified into several categories according to the location where they occur or the factors that cause them. Yu et al. (2018) and Fonseca and Mota (2017) categorize problems by layer: application, control and infrastructure. Similarly, problems can be attributed to application service providers (ASP) or Internet service providers (ISP) (Van et al. 2018). Problems can also be divided into those caused by administrators (e.g. router misconfiguration, server misconfiguration, etc.) and those that are not (e.g. link failure, switch failure, buffer overload, etc.). Based on a survey of NOs (Zeng et al. 2012b), the following sections present several problems that are not caused by administrators.
1.1.1.1. Rule failure
Bu et al. (2016) categorized rule failures in the network into missing faults and priority faults. A missing fault occurs when a rule is not executed as expected, whereas a priority fault occurs when overlapping rules violate their priority order.
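To make the two fault types concrete, the following is a minimal sketch of a toy flow table, not the model used by Bu et al. The `Rule` fields, the prefix-string matching, the "drop" table-miss default and the `inverted_lookup` faulty switch are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int    # higher value should win when rules overlap
    dst_prefix: str  # toy match field: destination address prefix (assumed)
    action: str


def matches(rule, dst):
    return dst.startswith(rule.dst_prefix)


def correct_lookup(table, dst):
    """Intended behaviour: apply the highest-priority matching rule."""
    hits = [r for r in table if matches(r, dst)]
    # "drop" on a table miss is an assumption of this sketch.
    return max(hits, key=lambda r: r.priority).action if hits else "drop"


def inverted_lookup(table, dst):
    """Hypothetical faulty switch that applies the lowest-priority match."""
    hits = [r for r in table if matches(r, dst)]
    return min(hits, key=lambda r: r.priority).action if hits else "drop"


intended = [
    Rule("specific", 200, "10.0.1.", "forward:port2"),
    Rule("general", 100, "10.0.", "forward:port1"),
]
probe_dst = "10.0.1.7"
expected = correct_lookup(intended, probe_dst)

# Missing fault: the "specific" rule never became active in the switch.
installed = [r for r in intended if r.name != "specific"]
missing_observed = correct_lookup(installed, probe_dst)

# Priority fault: both rules are installed, but the priority order is violated.
priority_observed = inverted_lookup(intended, probe_dst)

print(expected, missing_observed, priority_observed)
```

In this toy case a single probe sees the same wrong action for both faults, so detecting the deviation and telling the two fault types apart are separate steps.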
Several studies concentrate on the missing fault, including ATPG (Zeng et al. 2012a) and Monocle (Peresíni et al. 2015). These approaches verify the rules by generating probe packets that exercise every rule. ATPG uses header space analysis (Kazemian et al. 2012) to check the reachability between all test hosts. The reachability result is then passed to a probe packet generator, which computes a minimal set of probe packets with a greedy algorithm (Slavík 1997). Next, these probe packets are sent into the network to check the correctness of the rules. If an error is detected, a fault localization algorithm narrows down the candidates to identify the root cause. However, ATPG generates probe packets for all rules, which is inefficient when only a few rules have been updated. Monocle was proposed to overcome this drawback. It verifies only recently installed rules and reports misbehaviors. Monocle formulates the knowledge from the flow tables in the switches as constraints and applies a SAT solver (Biere 2008) to generate a set of probe packets.
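The greedy step mentioned above for ATPG can be illustrated with the textbook greedy set-cover heuristic. This is a sketch under assumptions, not ATPG's actual generator: the function name `greedy_probe_selection` and the `coverage` mapping (candidate probe to the set of rules it exercises) are hypothetical.

```python
def greedy_probe_selection(coverage):
    """Pick probes until every rule is exercised at least once.

    coverage: dict mapping a candidate probe id to the set of rule ids
    that the probe exercises (hypothetical input format).
    Returns the chosen probe ids in selection order.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Take the probe that exercises the most still-uncovered rules.
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


coverage = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r2", "r4"},
    "p3": {"r3", "r4"},
    "p4": {"r4", "r5"},
}
print(greedy_probe_selection(coverage))  # ['p1', 'p4']
```

The greedy choice does not guarantee the smallest possible set, but it is a common approximation for set cover.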
Probing is an intrusive method that generates significant overhead and increases link utilization in the network. Consequently, it is necessary to minimize the number of probe packets. This is a minimum set cover problem, which is NP-Complete (Zeng et al. 2012a). Bu et al. (2016) therefore proposed RuleScope, a framework for detecting rule failures in the network. RuleScope divides flow tables into solvable subsets of rules to reduce the probe scale. It then creates a directed acyclic graph for each subset and generates a set of probe packets for each subset. As a result, probe packet generation is faster because each rule subset is small.
Although RuleScope minimizes the number of probe packets, it suffers from a drawback related to the separation of the flow tables, which can lead to undetected priority faults in the switches: splitting the flow tables into small subsets can result in overlooking two overlapping rules that fall into two different subsets. Zhao et al. (2018a) proposed SERVE, a rule verification approach that identifies rule failures in the switches automatically. Firstly, SERVE extracts all rules for each device and builds a multi-rooted tree that captures the connections between rules. Next, SERVE analyzes the multi-rooted tree to generate the minimum number of probe packets. Since the minimum set cover problem is NP-Complete, SERVE applies the depth-first search (DFS) algorithm to generate the probe packets. Zhao et al. (2018b) extended the study of Zhao et al. (2018a) to present a complete framework. After generating the probe packets, SERVE injects them into the network using an out-band channel. SERVE also computes the desired network behavior using the multi-rooted trees. By comparing the feedback received over the out-band channel for every rule with the desired network behavior, SERVE can detect faulty rules and send notifications to administrators. SERVE's performance is evaluated against benchmarks in terms of processing time, number of probe packets and overhead. SERVE decreases the number of probe packets by up to 75% in comparison with Monocle. Its processing time is three times lower than that of ATPG. As for overhead, in-band bandwidth is unaffected because the probe packets are injected through the out-band channel, and the out-band bandwidth used is far below the link capacity.
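The final comparison step, desired behavior versus probe feedback, can be sketched as follows. This is an illustration only and does not reproduce SERVE's data structures: the function name `find_faulty_rules`, the dictionary format and the rule and switch identifiers are hypothetical.

```python
def find_faulty_rules(expected, feedback):
    """Return the rules whose probe feedback differs from the desired behaviour.

    expected: rule id -> desired outcome of that rule's probe
    feedback: rule id -> outcome actually reported back; a missing entry
              means no feedback arrived, which this sketch treats as a fault.
    """
    return sorted(
        rule for rule, want in expected.items() if feedback.get(rule) != want
    )


# Hypothetical example: each value is the next hop a probe should reach.
expected = {"r1": "s2", "r2": "s3", "r3": "s1"}
feedback = {"r1": "s2", "r2": "s4"}  # r2 went elsewhere, r3 never reported
faulty = find_faulty_rules(expected, feedback)
print(faulty)  # ['r2', 'r3']
```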
1.1.1.2. Link failure
Link failure refers to unreachability between two switches. It can lead to high packet loss and performance degradation in the network. Link failure can be detected with probe packets in active monitoring approaches. ping is a simple troubleshooting tool that sends probe packets to check the reachability between two end-points. If the probe packets are lost, there is a faulty link between these end-points. Similarly, Cascone et al. (2017) proposed a fast failure detection mechanism that detects link failure based on the exchange of bidirectional "heartbeat" packets. When the packet rate drops below a threshold, a node sends heartbeat packets to its neighbors. If there is no response from a neighbor after a given time, the link is considered to have failed. However, this mechanism requires a strict consumption related to the backup solutions that cannot be utilized to guarantee the short failover delays (1 ms).
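The heartbeat logic described above can be sketched as a small state machine driven by explicit timestamps. This is a simplified illustration, not the mechanism of Cascone et al.: the class `LinkMonitor`, its method names and the threshold and timeout values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkMonitor:
    rate_threshold: float  # packets/s below which probing starts (assumed)
    reply_timeout: float   # seconds to wait for a heartbeat reply (assumed)
    heartbeat_sent_at: Optional[float] = None
    link_up: bool = True

    def on_rate_sample(self, now, packets_per_second):
        """Regular traffic proves liveness; probe only when the link is quiet."""
        if packets_per_second >= self.rate_threshold:
            self.heartbeat_sent_at = None
            self.link_up = True
        elif self.heartbeat_sent_at is None:
            # A real node would transmit a heartbeat request here.
            self.heartbeat_sent_at = now

    def on_heartbeat_reply(self, now):
        """The neighbor answered, so the link is alive."""
        self.heartbeat_sent_at = None
        self.link_up = True

    def tick(self, now):
        """Declare the link down if a heartbeat has gone unanswered too long."""
        pending = self.heartbeat_sent_at
        if pending is not None and now - pending > self.reply_timeout:
            self.link_up = False
        return self.link_up


monitor = LinkMonitor(rate_threshold=100.0, reply_timeout=0.001)
monitor.on_rate_sample(0.000, 500.0)   # busy link, no heartbeat needed
monitor.on_rate_sample(0.010, 20.0)    # quiet link, heartbeat sent
monitor.on_heartbeat_reply(0.0105)     # reply arrives in time
monitor.on_rate_sample(0.020, 0.0)     # quiet again, heartbeat sent
print(monitor.tick(0.0205))            # still within the timeout
print(monitor.tick(0.022))             # timeout exceeded, link declared down
```

Passing `now` explicitly keeps the sketch deterministic and testable; a real implementation would be driven by packet counters and timers in the switch.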
Moreover, this problem can be detected by using the Link Layer Discovery Protocol (LLDP) in software-defined networking (SDN) (Khan et al. 2016; Tarnaras et al. 2015). According to the topology discovery protocol, SDN controller can detect link failure and remove it from network topology. Firstly, an OpenFlow (OF) switch connects to the controller so that the controller knows its active ports. Next, the controller generates a Packet-out message to each active port in the switch to discover the topology. The LLDP between switch s1 and s2 is depicted in Figure 1.1. Firstly, the controller encapsulates an LLDP packet in a Packet-out message and sends it to the switch s1. When switch s1 receives the Packet-out message, it will forward the LLDP packet to switch s2. After receiving the LLDP packet, switch s2 encapsulates this packet in a Packet-in message and sends it back to the controller. The controller receives this message and creates a link from switch s1 to s2. The same process is performed to identify the link for an opposite direction. When link...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.