Troubleshooting for Network Operators

Name: Troubleshooting for Network Operators | The Road to a New Paradigm with Encrypted Traffic
Brand: Wiley
Price: 130.99 EUR
Availability: OnlineOnly

The Road to a New Paradigm with Encrypted Traffic

Van Van Tong, Sami Souihi, Hai-Anh Tran, Abdelhamid Mellouk (Authors)

Wiley (Publisher)

1st Edition

Published on 12 September 2023

192 pages

E-Book

ePUB with Adobe-DRM

978-1-394-23665-7 (ISBN)

€130.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Preface

Introduction

Chapter 1 State of the Art on Network Troubleshooting

1.1 Network troubleshooting

1.1.1 State of the art

1.1.2 Traditional troubleshooting architecture

1.2 Background on encryption protocols

1.2.1 QUIC

1.2.2 Other protocols

1.3 Drawbacks of troubleshooting with encrypted traffic

1.3.1 Network performance monitoring

1.3.2 Intrusion detection system

1.4 Conclusion

Chapter 2 Novel Global Troubleshooting Framework for Encrypted Traffic

2.1 Novel network troubleshooting architecture for encrypted traffic

2.2 Proof of concept of novel troubleshooting architecture in SDN

2.3 Data collection

2.3.1 Data classification

2.3.2 Monitoring tools

2.3.3 Parameter measurement

2.4 Troubleshooting dataset

2.4.1 Datasets for root cause analysis

2.4.2 Dataset for traffic classification

2.5 Conclusion

Chapter 3 Traffic Classification: Novel QUIC Traffic Classifier Based on Convolutional Neural Network

3.1 Introduction

3.2 Background

3.2.1 Convolutional network

3.2.2 Characteristics of QUIC-based applications

3.3 Traffic classification approaches

3.3.1 Port-based approaches

3.3.2 Payload-based approaches

3.3.3 Statistic-based approaches

3.3.4 DL-based approaches

3.4 Novel traffic classification method for QUIC traffic

3.4.1 Traffic collection

3.4.2 Flow-based features

3.4.3 Preprocessing

3.4.4 Novel traffic classification method

3.5 Experimental results

3.5.1 Dataset specification

3.5.2 Performance metrics

3.5.3 Performance analysis

3.6 Conclusion

Chapter 4 Anomaly Detection

4.1 Introduction

4.2 Anomaly detection approaches

4.2.1 Knowledge-based mechanisms

4.2.2 Rule inductions

4.2.3 Information theory

4.2.4 ML-based mechanisms

4.3 Anomaly detection approach using machine learning

4.3.1 ML-based anomaly detection method

4.3.2 Data collection and processing

4.4 Experimental results

4.4.1 Experimental setup

4.4.2 Performance analysis

4.5 Conclusion

Chapter 5 Temporary Remediation: SDN-based Application-aware Segment Routing for Large-scale Networks

5.1 Introduction

5.2 Application-aware routing mechanisms

5.2.1 Application-aware routing

5.2.2 Application-aware MPLS

5.2.3 Application-aware SR

5.3 Adaptive segment routing mechanism for encrypted traffic

5.3.1 Overview of the SDN-based adaptive segment routing framework

5.3.2 Network monitoring

5.3.3 Anomaly detection

5.3.4 Application-aware remediation

5.4 Experimental results

5.4.1 Experiment setup

5.4.2 Benchmark

5.4.3 Performance analysis

5.5 Conclusion

Chapter 6 Root Cause Analysis and Definitive Remediation

6.1 Root cause analysis: machine learning based root cause analysis for SDN network

6.1.1 Introduction

6.1.2 Root cause analysis mechanisms

6.1.3 ML-based RCA mechanism

6.1.4 Experimental results

6.1.5 Conclusion

6.2 Definitive remediation: adaptive QUIC BBR algorithm using reinforcement learning for dynamic networks

6.2.1 Introduction

6.2.2 Congestion control mechanisms

6.2.3 Adaptive BBR algorithm

6.2.4 Experimental results

6.2.5 Conclusion

Conclusions and Prospects

References

Index

1
State of the Art on Network Troubleshooting

"A protocol approach to troubleshooting"

Ed Wilson

Chapter 1 presents the state of the art on network troubleshooting and a traditional troubleshooting architecture for non-encrypted traffic. We then discuss its limitations when traffic is encrypted.

1.1. Network troubleshooting

In the 19th century, technicians were dispatched to find and repair problems in telegraph and telephone line infrastructure. Historically, a troubleshooter was a skilled worker who found and solved technical problems. Nowadays, troubleshooting is a form of problem-solving that aims to repair failed processes in a machine or a system. Several conceptions of the troubleshooting process exist (Morris and Rouse 1985; Jonassen and Hung 2006). The basic concept is to find the faulty component in a device in order to repair or replace it (Perez 1991). Schaafstal et al. (2000) structured the troubleshooting process into four subtasks: formulating a problem description, generating causes, testing and evaluating. Similarly, Johnson et al. (1993) consider troubleshooting an iterative process with four subprocesses: problem space construction, problem space reduction, fault diagnosis and solution verification.

Network troubleshooting is an iterative process with three subtasks: identifying, diagnosing and solving problems in the network. In the past, network operators (NOs) relied on manual troubleshooting tools such as ping and traceroute. ping is a network administration utility that checks reachability between a source and a destination and measures the round-trip time of packets. traceroute is a network diagnostic utility that displays possible routes between a source and a destination and measures the transit delay of packets. These tools are used to diagnose complex problems such as loops caused by undefined interactions between spanning tree protocols (Heller et al. 2013). However, these manual approaches do not scale to a large number of network devices. Moreover, 24.6% of administrators reported that diagnosing and solving an anomaly takes more than 1 h on average (Zeng et al. 2012a). An automated troubleshooting process that detects an anomaly, locates its causes and solves it is therefore needed, and network troubleshooting has attracted attention from the research community (Fonseca and Mota 2017; Yu et al. 2018; Cherrared et al. 2019). In the following section, we present the state of the art of network troubleshooting.
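As a rough illustration of what a reachability tool reports, the sketch below measures a connect time and summarizes a series of probes into a loss rate and round-trip statistics. It is not how ping itself works: ping uses ICMP echo requests, whereas this sketch assumes a TCP connection attempt as a stand-in, and the names `tcp_probe` and `summarize` are hypothetical.

```python
import socket
import time
from statistics import mean
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if unreachable.

    Assumption: a TCP handshake stands in for an ICMP echo request.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and unroutable destinations.
        return None


def summarize(samples: list[Optional[float]]) -> dict:
    """Summarize probe results: packets sent/received, loss rate, RTT min/avg/max."""
    rtts = [s for s in samples if s is not None]
    loss = 1.0 - len(rtts) / len(samples) if samples else 0.0
    stats = {"sent": len(samples), "received": len(rtts), "loss": loss}
    if rtts:
        stats.update(min_ms=min(rtts), avg_ms=mean(rtts), max_ms=max(rtts))
    return stats
```

A caller would collect several `tcp_probe` results for one destination and pass the list to `summarize`. Repeating this by hand for every device pair is exactly the effort that does not scale.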

1.1.1. State of the art

According to the related work on network troubleshooting (Yu et al. 2018; Fonseca and Mota 2017; Van et al. 2018), problems can be classified into several categories according to where they occur or what causes them. Yu et al. (2018) and Fonseca and Mota (2017) categorize problems by layer: application, control and infrastructure. Similarly, problems can be attributed to application service providers (ASP) or Internet service providers (ISP) (Van et al. 2018). Problems can also be divided into those caused by administrators (e.g. router misconfiguration, server misconfiguration, etc.) and those not caused by administrators (e.g. link failure, switch failure, buffer overload, etc.). Based on a survey of NOs (Zeng et al. 2012b), the following sections present several problems that are not caused by administrators.

1.1.1.1. Rule failure

Bu et al. (2016) categorized rule failures in the network into missing faults and priority faults. A missing fault occurs when a rule is not executed as expected, whereas a priority fault occurs when overlapping rules violate their priority order.

Several research studies concentrate on the missing fault, including ATPG (Zeng et al. 2012a) and Monocle (Peresíni et al. 2015). These approaches verify the rules by generating probe packets that exercise every rule. ATPG uses header space analysis (Kazemian et al. 2012) to check the reachability between all test hosts. The reachability result is then passed to a probe packet generator, which computes a minimal set of probe packets with a greedy algorithm (Slavík 1997). Next, these probe packets are sent into the network to check the correctness of the rules. If an error is detected, a fault localization algorithm narrows down the root cause. However, ATPG has a drawback: it generates probe packets for all rules, which is inefficient when only a few rules have been updated. Monocle was proposed to overcome this drawback. It verifies only recently installed rules and reports misbehaviors. Monocle formulates the knowledge from the flow tables in the switches as constraints and applies a SAT solver (Biere 2008) to generate a set of probe packets.
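The idea of generating a probe for one specific rule can be sketched as follows. This is a simplified reading, not Monocle's implementation: it assumes a tiny header of a few bits, ternary match patterns and an exhaustive search in place of a SAT solver. The `Rule` type and the helper names are hypothetical. A usable probe must be handled by the target rule (so no higher-priority rule shadows it) and must produce a different result if the target rule is missing.

```python
from itertools import product
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    priority: int  # higher value wins
    match: str     # ternary pattern over header bits: '0', '1' or '*'
    action: str    # e.g. an output port, or "drop"


def matches(rule: Rule, header: str) -> bool:
    """True if every pattern position is a wildcard or equals the header bit."""
    return all(m in ("*", h) for m, h in zip(rule.match, header))


def lookup(table: list[Rule], header: str) -> Optional[Rule]:
    """Return the highest-priority rule matching the header, if any."""
    hits = [r for r in table if matches(r, header)]
    return max(hits, key=lambda r: r.priority) if hits else None


def find_probe(table: list[Rule], target: Rule) -> Optional[str]:
    """Find a header that hits `target` and behaves differently without it."""
    without = [r for r in table if r != target]
    for bits in product("01", repeat=len(target.match)):
        header = "".join(bits)
        if lookup(table, header) != target:
            continue  # not matched, or shadowed by a higher-priority rule
        fallback = lookup(without, header)
        if fallback is None or fallback.action != target.action:
            return header  # the outcome reveals whether `target` is installed
    return None
```

The exhaustive loop is only viable for toy header widths, which is why a constraint solver is the natural tool at realistic scale. A `None` result means the rule cannot be tested with a single probe, for example because it is fully shadowed.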

Probing is an intrusive method that generates significant overhead and increases link utilization in the network. Consequently, it is necessary to minimize the number of probe packets. This is a minimum set cover problem, which is NP-complete (Zeng et al. 2012a). To address this, Bu et al. (2016) proposed RuleScope, a framework for detecting rule failures in the network. RuleScope divides the flow tables into solvable subsets of rules to reduce the probe scale. It then creates a directed acyclic graph for each subset and generates a set of probe packets for each subset. As a result, probe packets are generated more quickly because each rule subset is small.
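The greedy heuristic for minimum set cover mentioned above can be sketched in a few lines. The mapping from candidate probes to the rules they exercise is assumed to be given, and the probe and rule names are purely illustrative. The result is an approximation, not necessarily the true minimum.

```python
def greedy_probe_selection(candidates: dict[str, set[str]]) -> list[str]:
    """Pick probes greedily until every coverable rule is exercised.

    `candidates` maps a probe identifier to the set of rules it exercises.
    """
    uncovered = set().union(*candidates.values())
    chosen = []
    while uncovered:
        # Take the probe that exercises the most still-uncovered rules.
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


probes = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r3", "r4"},
    "p3": {"r4", "r5"},
    "p4": {"r1", "r5"},
}
print(greedy_probe_selection(probes))  # ['p1', 'p3']
```

Here two probes suffice for five rules. Ties are broken by dictionary order, so a different insertion order can yield a different but equally sized selection.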

Although RuleScope minimizes the number of probe packets, it suffers from a drawback related to the way it separates the flow tables: splitting them into small subsets can overlook two overlapping rules that end up in different subsets, which leaves priority faults in the switches undetected. Zhao et al. (2018a) proposed SERVE, a rule verification approach that identifies rule failures in the switches automatically. First, SERVE extracts all rules for each device and builds a multi-rooted tree that captures the connections between rules. Next, SERVE analyzes the multi-rooted tree to generate the minimum number of probe packets. Because the minimum set cover problem is NP-complete, SERVE applies the depth-first search (DFS) algorithm to generate the probe packets. Zhao et al. (2018b) extended the study of Zhao et al. (2018a) into a complete framework. After generating the probe packets, SERVE injects them into the network through an out-of-band channel. SERVE also computes the desired network behavior from the multi-rooted trees. By comparing the feedback received over the out-of-band channel for every rule with the desired network behavior, SERVE detects faulty rules and notifies administrators. SERVE's performance is evaluated against benchmarks in terms of processing time, number of probe packets and overhead. SERVE decreases the number of probe packets by up to 75% in comparison with Monocle. Its processing time is three times lower than that of ATPG. As for the overhead, in-band bandwidth is unaffected because the probe packets are injected through the out-of-band channel, and the out-of-band bandwidth consumed is far below link capacity.
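The two steps described above, a depth-first traversal of a multi-rooted tree and a comparison of observed against desired behavior, can be sketched as follows. The tree layout is an assumption made for illustration: each node is a rule, each edge links a rule to a rule that handles the packet next, and one probe is assumed to exercise a whole root-to-leaf chain. The source does not detail SERVE's actual construction, so this should not be read as its algorithm.

```python
def probe_paths(forest: dict[str, list[str]], roots: list[str]) -> list[list[str]]:
    """Enumerate root-to-leaf rule chains with an iterative depth-first search."""
    paths = []
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        children = forest.get(path[-1], [])
        if not children:
            paths.append(path)  # one probe is assumed to cover this whole chain
        for child in reversed(children):
            stack.append(path + [child])
    return paths


def faulty_rules(expected: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Report rules whose observed behavior differs from the desired one."""
    return sorted(r for r, action in expected.items() if observed.get(r) != action)


forest = {"a": ["b", "c"], "b": ["d"], "x": ["y"]}
print(probe_paths(forest, ["a", "x"]))
# [['a', 'b', 'd'], ['a', 'c'], ['x', 'y']]
```

A rule with no feedback at all is reported as faulty as well, since a missing observation never equals the desired action.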

1.1.1.2. Link failure

Link failure refers to unreachability between two switches. It can lead to high packet loss and performance degradation in the network. In active monitoring approaches, link failure is detected using probe packets. ping is a simple troubleshooting tool that sends probe packets to check the reachability between two end-points. If probe packets are lost, there is a faulty link between these end-points. Similarly, Cascone et al. (2017) proposed a fast failure detection mechanism that detects link failure based on the exchange of bidirectional "heartbeat" packets. When the packet rate drops below a threshold, a node sends heartbeat packets to its neighbors. If its neighbors do not respond within a given time, a link failure is declared. However, this mechanism requires a strict consumption related to the backup solutions that cannot be utilized to guarantee the short failover delays (1 ms).
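The heartbeat logic described above can be sketched as a small state machine per link. This is an illustrative model under stated assumptions, not the mechanism of Cascone et al. (2017): the class name, the parameter names and the threshold values in the usage example are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    rate_threshold: float            # packets/s below which a heartbeat is sent
    reply_timeout: float             # seconds to wait for a heartbeat reply
    sent_at: Optional[float] = None  # time of the outstanding heartbeat, if any
    link_up: bool = True

    def on_tick(self, now: float, packet_rate: float) -> bool:
        """Return True if a heartbeat request should be sent at time `now`."""
        if self.sent_at is not None and now - self.sent_at > self.reply_timeout:
            self.link_up = False  # no reply within the timeout: declare failure
        if self.link_up and self.sent_at is None and packet_rate < self.rate_threshold:
            self.sent_at = now    # traffic is too sparse to prove liveness
            return True
        return False

    def on_reply(self) -> None:
        """A heartbeat reply from the neighbor proves the link is alive."""
        self.sent_at = None
        self.link_up = True
```

For example, with `HeartbeatMonitor(rate_threshold=100.0, reply_timeout=0.005)`, a tick with a packet rate of 20 triggers a heartbeat, and the link is marked down if `on_reply` has not been called when a later tick arrives more than 5 ms afterwards.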

Moreover, link failure can be detected by using the Link Layer Discovery Protocol (LLDP) in software-defined networking (SDN) (Khan et al. 2016; Tarnaras et al. 2015). Using this topology discovery protocol, the SDN controller can detect a link failure and remove the failed link from the network topology. First, an OpenFlow (OF) switch connects to the controller so that the controller knows its active ports. Next, the controller generates a Packet-out message for each active port of the switch to discover the topology. The LLDP exchange between switches s1 and s2 is depicted in Figure 1.1. The controller encapsulates an LLDP packet in a Packet-out message and sends it to switch s1. When switch s1 receives the Packet-out message, it forwards the LLDP packet to switch s2. After receiving the LLDP packet, switch s2 encapsulates it in a Packet-in message and sends it back to the controller. The controller receives this message and creates a link from switch s1 to s2. The same process is performed to identify the link in the opposite direction. When link...
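The discovery round described in this paragraph can be simulated with a short sketch. No real OpenFlow library is used: the `wiring` dictionary is an assumed stand-in for the physical cabling, and `Port` and `discover_links` are hypothetical names. Comparing the links found in two successive rounds is one simple way a controller could notice that a link has disappeared.

```python
from typing import NamedTuple


class Port(NamedTuple):
    switch: str
    port: int


def discover_links(active_ports: list[Port], wiring: dict[Port, Port]) -> set[tuple[Port, Port]]:
    """Simulate one LLDP discovery round run by the controller.

    `wiring` maps an egress port to the ingress port at the far end.
    A missing entry models a failed link.
    """
    links = set()
    for src in active_ports:
        # Packet-out: the controller asks src.switch to emit an LLDP packet on src.port.
        dst = wiring.get(src)
        if dst is None:
            continue  # the packet is lost, so no Packet-in comes back
        # Packet-in: the neighbor returns the packet, revealing where it arrived.
        links.add((src, dst))
    return links


s1, s2 = Port("s1", 1), Port("s2", 1)
wiring = {s1: s2, s2: s1}
before = discover_links([s1, s2], wiring)  # both directions are discovered
del wiring[s1]                             # the s1 -> s2 direction fails
after = discover_links([s1, s2], wiring)
print(before - after)                      # links to remove from the topology
```

Each direction is discovered separately, which matches the statement that the same process is repeated for the opposite direction.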

Schweitzer Fachinformationen