Applied Modeling Techniques and Data Analysis 1

Name: Applied Modeling Techniques and Data Analysis 1 | Computational Data Analysis Methods and Tools
Brand: Wiley
Price: 139.99 EUR
Availability: OnlineOnly

Computational Data Analysis Methods and Tools

Yiannis Dimotikalis Alex Karagrigoriou Christina Parpoula Christos H. Skiadas(Herausgeber*in)

Wiley (Verlag)

1. Auflage

Erschienen am 30. März 2021

304 Seiten

E-Book

ePUB mit Adobe-DRM

Systemvoraussetzungen

978-1-119-82156-4 (ISBN)

139,99 €inkl. 7% MwSt.

Systemvoraussetzungen

für ePUB mit Adobe-DRM

E-Book Einzellizenz

Als Download verfügbar

Beschreibung

Weitere Details

Weitere Ausgaben

Personen

Inhalt

Preface xi
Yannis DIMOTIKALIS, Alex KARAGRIGORIOU, Christina PARPOULA and Christos H. SKIADAS

Part 1. Computational Data Analysis 1

Chapter 1. A Variant of Updating PageRank in Evolving Tree Graphs 3
Benard ABOLA, Pitos Seleka BIGANDA, Christopher ENGSTRÖM, John Magero MANGO, Godwin KAKUBA and Sergei SILVESTROV

1.1. Introduction 3

1.2. Notations and definitions 5

1.3. Updating the transition matrix 5

1.4. Updating the PageRank of a tree graph 10

1.4.1. Updating the PageRank of tree graph when a batch of edges changes 12

1.4.2. An example of updating the PageRank of a tree 15

1.5. Maintaining the levels of vertices in a changing tree graph 17

1.6. Conclusion 21

1.7. Acknowledgments 21

1.8. References 21

Chapter 2. Nonlinearly Perturbed Markov Chains and Information Networks 23
Benard ABOLA, Pitos Seleka BIGANDA, Sergei SILVESTROV, Dmitrii SILVESTROV, Christopher ENGSTRÖM, John Magero MANGO and Godwin KAKUBA

2.1. Introduction 23

2.2. Stationary distributions for Markov chains with damping component 26

2.2.1. Stationary distributions for Markov chains with damping component 26

2.2.2. The stationary distribution of the Markov chain X0,n 28

2.3. A perturbation analysis for stationary distributions of Markov chains with damping component 29

2.3.1. Continuity property for stationary probabilities 29

2.3.2. Rate of convergence for stationary distributions 29

2.3.3. Asymptotic expansions for stationary distributions 30

2.3.4. Results of numerical experiments 32

2.4. Coupling and ergodic theorems for perturbed Markov chains with damping component 39

2.4.1. Coupling for regularly perturbed Markov chains with damping component 39

2.4.2. Coupling for singularly perturbed Markov chains with damping component 41

2.4.3. Ergodic theorems for perturbed Markov chains with damping component in the triangular array mode 42

2.4.4. Numerical examples 43

2.5. Acknowledgments 51

2.6. References 51

Chapter 3. PageRank and Perturbed Markov Chains 57
Pitos Seleka BIGANDA, Benard ABOLA, Christopher ENGSTRÖM, Sergei SILVESTROV, Godwin KAKUBA and John Magero MANGO

3.1. Introduction 57

3.2. PageRank of the first-order perturbed Markov chain 59

3.3. PageRank of the second-order perturbed Markov chain 60

3.4. Rates of convergence of Page Ranks of first- and second-order perturbed Markovchains 70

3.5. Conclusion 72

3.6. Acknowledgments 72

3.7. References 72

Chapter 4. Doubly Robust Data-driven Distributionally Robust Optimization 75
Jose BLANCHET, Yang KANG, Fan ZHANG, Fei HE and Zhangyi HU

4.1. Introduction 75

4.2. DD-DRO, optimal transport and supervised machine learning 79

4.2.1. Optimal transport distances and discrepancies 80

4.3. Data-driven selection of optimal transport cost function 81

4.3.1. Data-driven cost functions via metric learning procedures 81

4.4. Robust optimization for metric learning 83

4.4.1. Robust optimization for relative metric learning 83

4.4.2. Robust optimization for absolute metric learning 86

4.5. Numerical experiments 88

4.6. Discussion and conclusion 89

4.7. References 89

Chapter 5. A Comparison of Graph Centrality Measures Based on Lazy Random Walks 91
Collins ANGUZU, Christopher ENGSTRÖM and Sergei SILVESTROV

5.1. Introduction 91

5.1.1. Notations and abbreviations 93

5.1.2. Linear systems and the Neumann series 94

5.2. Review on some centrality measures 95

5.2.1. Degree centrality 95

5.2.2. Katz status and ß-centralities 95

5.2.3. Eigenvector and cumulative nomination centralities 96

5.2.4. Alpha centrality 97

5.2.5. PageRank centrality 98

5.2.6. Summary of the centrality measures as steady state, shifted and power series 99

5.3. Generalizations of centrality measures 99

5.3.1. Priors to centrality measures 99

5.3.2. Lazy variants of centrality measures 100

5.3.3. Lazy a-centrality 100

5.3.4. Lazy Katz centrality 102

5.3.5. Lazy cumulative nomination centrality 103

5.4. Experimental results 104

5.5. Discussion 106

5.6. Conclusion 109

5.7. Acknowledgments 109

5.8. References 110

Chapter 6. Error Detection in Sequential Laser Sensor Input 113
Gwenael GATTO and Olympia HADJILIADIS

6.1. Introduction 113

6.2. Data description 114

6.3. Algorithms 116

6.3.1. Algorithm for consecutive changes in mean 118

6.3.2. Algorithm for burst detection 120

6.4. Results 125

6.5. Acknowledgments 127

6.6. References 127

Chapter 7. Diagnostics and Visualization of Point Process Models for Event Times on a Social Network 129
Jing WU, Anna L. SMITH and Tian ZHENG

7.1. Introduction 129

7.2. Background 131

7.2.1. Univariate point processes 131

7.2.2. Network point processes 132

7.3. Model checking for time heterogeneity 134

7.3.1. Time rescaling theorem 134

7.3.2. Residual process 136

7.4. Model checking for network heterogeneity and structure 138

7.4.1. Kolmogorov-Smirnov test 138

7.4.2. Structure score based on the Pearson residual matrix 141

7.5. Summary 143

7.6. Acknowledgments 144

7.7. References 144

Part 2. Data Analysis Methods and Tools 147

Chapter 8. Exploring the Distribution of Conditional Quantile Estimates: An Application to Specific Costs of Pig Production in the European Union 149
Dominique DESBOIS

8.1. Introduction 150

8.2. Conceptual framework and methodological aspects 150

8.2.1. The empirical model for estimating the specific production costs 151

8.2.2. The procedures for estimating and testing conditional quantiles 152

8.2.3. Symbolic PCA of the specific cost distributions 154

8.2.4. Symbolic clustering analysis of the specific cost distributions 162

8.3. Results 165

8.3.1. The SO-PCA of specific cost estimates 167

8.3.2. The divisive hierarchy of specific cost estimates 170

8.4. Conclusion 171

8.5. References 172

Chapter 9. Maximization Problem Subject to Constraint of Availability in Semi-Markov Model of Operation 175
Franciszek GRABSKI

9.1. Introduction 175

9.2. Semi-Markov decision process 176

9.3. Semi-Markov decision model of operation 177

9.3.1. Description and assumptions 177

9.3.2. Model construction 177

9.4. Optimization problem 178

9.4.1. Linear programming method 179

9.5. Numerical example 182

9.6. Conclusion 184

9.7. References 185

Chapter 10. The Impact of Multicollinearity on Big Data Multivariate Analysis Modeling 187
Kimon NTOTSIS and Alex KARAGRIGORIOU

10.1. Introduction 187

10.2. Multicollinearity 188

10.3. Dimension reduction techniques 191

10.3.1. Beale et al 192

10.3.2. Principal component analysis 192

10.4. Application 194

10.4.1. The modeling of PPE 194

10.4.2. Concluding remarks 200

10.5. Acknowledgments 200

10.6. References 200

Chapter 11. Weak Signals in High-Dimensional Poisson Regression Models 203
Orawan REANGSEPHET, Supranee LISAWADI and Syed Ejaz AHMED

11.1. Introduction 203

11.2. Statistical background 204

11.3. Methodologies 205

11.3.1. Predictor screening methods 205

11.3.2. Post-screening parameter estimation methods 206

11.4. Numerical studies 208

11.4.1. Simulation settings and performance criteria 208

11.4.2. Results 209

11.5. Conclusion 217

11.6. Acknowledgments 218

11.7. References 218

Chapter 12. Groundwater Level Forecasting for Water Resource Management 221
Andrea ZIRULIA, Alessio BARBAGLI and Enrico GUASTALDI

12.1. Introduction 221

12.2. Materials and methods 222

12.2.1. Study area 222

12.2.2. Forecast method 222

12.3. Results 224

12.4. Conclusion 230

12.5. References 230

Chapter 13. Phase I Non-parametric Control Charts for Individual Observations: A Selective Review and Some Results 233
Christina PARPOULA

13.1. Introduction 234

13.1.1. Background 234

13.1.2. Univariate non-parametric process monitoring 235

13.2. Problem formulation 237

13.3. A comparative study 239

13.3.1. The existing methodologies 239

13.3.2. Simulation settings 240

13.3.3. Simulation-study results 242

13.4. Concluding remarks 247

13.5. References 247

Chapter 14. On Divergence and Dissimilarity Measures for Multiple Time Series 249
Konstantinos MAKRIS, Alex KARAGRIGORIOU and Ilia VONTA

14.1. Introduction 249

14.2. Classical measures 250

14.3. Divergence measures 252

14.4. Dissimilarity measures for ordered data 254

14.4.1. Standard dissimilarity measures 254

14.4.2. Advanced dissimilarity measures 256

14.5. Conclusion 259

14.6. References 259

List of Authors 261

Index 265

1
A Variant of Updating PageRank in Evolving Tree Graphs

A PageRank update refers to the process of computing new PageRank values after a change(s) (addition or removal of links/vertices) has occurred in real-life networks. The purpose of updating is to avoid re-calculating the values from scratch. To efficiently carry out the update, we consider PageRank to be the expected number of visits to a target vertex if multiple random walks are performed, starting at each vertex once and weighing each of these walks by a weight value. Hence, it might be looked at as updating a non-normalized PageRank. We focus on networks of tree graphs and propose an approach to sequentially update a scaled adjacency matrix after every change, as well as the levels of the vertices. In this way, we can update the PageRank of affected vertices by their corresponding levels.

1.1. Introduction

Most real-world networks are continuously changing, and this phenomenon has come with challenges to the known data mining algorithms that assume the static form of datasets (Bahmani et al. 2012). Besides, it is a primary aim of network analysts to keep track of critical nodes.

Since Brin and Page (1998) pioneered PageRank centrality more than two decades ago, the centrality measure has found its applications in many disciplines (Gleich 2015). Recently, the study of PageRank of evolving graphs has drawn considerable attention and several computational models have been proposed. For instance, Langville and Meyer (2006) proposed an algorithm to update PageRank using an aggregation/disaggregation concept. In Bahmani et al. (2012), an algorithm that estimates a PageRank vector by crawling a small portion of the graph has been proposed, while Ohsaka et al. (2015) proposed an updating method for personalized PageRanks. In addition, the idea of partitioning information networks into connected acyclic and strongly connected components and then applying iterative methods of solving linear systems - while keeping an eye on the component(s) that has evolved - is proposed in Engström (2016) and Engström and Silvestrov (2016). In fact, all of these authors treat PageRank computation using matrix-vector multiplication.

There is evidence that tree graphs have applications in science and engineering. For instance, in Han et al. (2016), a directed acyclic graph (DAG) has been used in optimization problems, while Shandilya et al. (2014) applies the graph model in security systems.

Furthermore, tree graphs have been used to investigate ecological systems and functional similarity in biological networks (Ulanowicz 2004; Rocchi et al. 2017). For the case of an ecosystem, analysis of extinction risk or identifying important species in such a network is of significance (Allesina and Bodini 2004; Allesina and Pascual 2009). The notion of modeling an ecosystem as a tree graph is the result of mutual dependence between interacting species, for instance, dominator tree and food web structures (Ulanowicz 2004). In fact, dependence interaction can lead to interesting problems of network evolution. For example, if an organism Y requires nutrients from an organism X, in short, X Y, then lack of nutrients from X may lead to the extinction of Y unless it switches to another form of resource by addition of a new link. This is one of the primary roles of ecosystem-based management, as mentioned in Rocchi et al. (2017).

According to Wang et al. (2010), it has been found that a gene's functions are closely associated with different diseases, and such a dependence relationship requires a tree graph model. In view of Wang et al. (2010) and Han et al. (2016), a graph model improves computation cost and conceptualizes large networks.

The existing methods are virtually rooted in the iterative techniques which might be demanding when a localized change only occurs in a large network. Furthermore, the methods pay little attention to specific kinds of networks such as tree and strongly connected graphs. Computing PageRank of such components can lead to a great reduction in time complexity. For example, vertices of a strongly connected graph can be reordered by a feedback vertex set. In light of all of the above, we focus on the algorithm that updates PageRank when vertices are reordered by their levels, specifically for evolving directed tree graphs.

The remaining sections of this chapter are organized as follows. In section 1.2, we propose a technique for updating transition matrices when an edge is added or removed. Section 1.3 presents a single vertex update of PageRank when an edge is inserted or removed. Furthermore, we demonstrate that refinement iterative formation of linear systems fits in a single vertex update. Section 1.4 is devoted to maintaining the levels of vertices after some changes, and the conclusion is given in section 1.5.

1.2. Notations and definitions

For convenience, we define some notations that will be used throughout this chapter.

- Let us denote a directed graph by G = (V, E), where V and E are sets of vertices and edges, respectively. The number of vertices in graph G will be denoted by |V | or n. Furthermore, G ? ?G will denote a new graph after some changes of the edges or vertices.
- A ? Rn×n represents a matrix derived from G, and Ai means the ith row of A.
- denotes a column vector, and represents a row vector. We denote as a vector with 1 in the ith entry and 0 elsewhere.
- is the personalization vector of the vertices, which can be thought of as preference assigned to each of the vertices, and c is the damping parameter, usually taken to be 0.85.

DEFINITION 1.1.- A directed tree graph G is a digraph without cycles.

DEFINITION 1.2.- The PageRank of a vertex vi, denoted by pi, is defined as

[1.1]

where deg(u) is the degree of u and No is the out-neighbor of u. If vi is a root, then pi = (1 - c)?i.

1.3. Updating the transition matrix

In this section, we present how to update the transition matrix when an edge(s) is added or removed. Two cases are considered:

(1) a source vertex vi has at most one outgoing edge and at least one edge is added;

(2) a source vertex vi has at least one outgoing edge and at least one edge is removed.

Each case will be handled separately.

Case 1: Addition of an edge to a source vertex with and without outgoing edges.

LEMMA 1.1.- Let A(1) and A(2) be transition matrices of a tree graph G and G??G, respectively. Let deg(vi) be the out-degree of vertex vi. Suppose an edge vi vj is added, then a new matrix A(2) can be updated as

(i) A(2) = A(1) + , if deg(vi) = 0;
(ii) A(2) = , if deg(vi) = 1.

We remark that a matrix A in (ii) is a sub-matrix of unaffected vertices in the graph G.

PROOF.- (i) Let be a column vector with 1 at the ith entry and 0 elsewhere. If there was no edge between vertex vi and vj, then the only edge is the new one, vi vj. To write the new transition matrix A(2), we apply the well-known shearing matrix properties: If , for i, j = 1, 2, . . . , n, then

where A(1) = . Upon adding vi vj the entry , hence A(2) = A(1) + .

(ii) Let us define an operation on identity matrix I, at an entry (i, i) as in Lancaster and Tismenetsky (1985), then

Then, the result of multiplying the ith row of A(1) by a non-zero scalar µ yields

[1.2]

Using Lemma 1.1 (i) and substituting , we write

where A = .

Adding an edge vi vk, for k ? j to the source vertex vi, the weight of the outgoing edges changes to . Let , and by applying relation [1.2], we get

[1.3]

rearranging [1.3] completes the proof.

Let V0 be a set of vertices without outgoing edges and suppose vh ? V0. Then, a generalization of Lemma 1.1(i) when multiple edges, say k edges, are added takes the form

[1.4]

Without loss of generality, we define the transition matrix A(2) when the source vertex vi has the degree deg(vi) and k edges are added to distinct target vertices as

[1.5]

where is a row vector with 1 in all the k target vertices and 0 elsewhere.

Case 2: Removal of edge(s) from a source vertex with at least one outgoing edge.

LEMMA 1.2.- Let A(1) and A(2) be transition matrices of tree graphs G and G??G, respectively. Suppose an edge vi vj is removed, then...

Systemvoraussetzungen

Als PDF speichern Als Link merken

Applied Modeling Techniques and Data Analysis 1

Beschreibung

Weitere Details

Weitere Ausgaben

Personen

Inhalt

1 A Variant of Updating PageRank in Evolving Tree Graphs

1.1. Introduction

1.2. Notations and definitions

1.3. Updating the transition matrix

Systemvoraussetzungen

1
A Variant of Updating PageRank in Evolving Tree Graphs