Schweitzer Fachinformationen
Wenn es um professionelles Wissen geht, ist Schweitzer Fachinformationen wegweisend. Kunden aus Recht und Beratung sowie Unternehmen, öffentliche Verwaltungen und Bibliotheken erhalten komplette Lösungen zum Beschaffen, Verwalten und Nutzen von digitalen und gedruckten Medien.
Alex Karagrigoriou is Professor of Probability and Statistics, Deputy Director of Graduate Studies in Statistics and Actuarial-Financial Mathematics, and Director of the Laboratory of Statistics and Data Analysis within the Department of Statistics and Actuarial-Financial Mathematics at the University of the Aegean, Greece.
Christina Parpoula is Assistant Professor of Applied Statistics and Research Methodology within the Department of Psychology at the Panteion University of Social and Political Sciences, Greece.
Christos H. Skiadas is Former Vice-Rector at the Technical University of Crete, Greece, and founder of its Data Analysis and Forecasting Laboratory. He continues his research in ManLab, within the faculty?s Department of Production Engineering and Management.
Preface xiYannis DIMOTIKALIS, Alex KARAGRIGORIOU, Christina PARPOULA and Christos H. SKIADAS
Part 1. Computational Data Analysis 1
Chapter 1. A Variant of Updating PageRank in Evolving Tree Graphs 3Benard ABOLA, Pitos Seleka BIGANDA, Christopher ENGSTRÖM, John Magero MANGO, Godwin KAKUBA and Sergei SILVESTROV
1.1. Introduction 3
1.2. Notations and definitions 5
1.3. Updating the transition matrix 5
1.4. Updating the PageRank of a tree graph 10
1.4.1. Updating the PageRank of tree graph when a batch of edges changes 12
1.4.2. An example of updating the PageRank of a tree 15
1.5. Maintaining the levels of vertices in a changing tree graph 17
1.6. Conclusion 21
1.7. Acknowledgments 21
1.8. References 21
Chapter 2. Nonlinearly Perturbed Markov Chains and Information Networks 23Benard ABOLA, Pitos Seleka BIGANDA, Sergei SILVESTROV, Dmitrii SILVESTROV, Christopher ENGSTRÖM, John Magero MANGO and Godwin KAKUBA
2.1. Introduction 23
2.2. Stationary distributions for Markov chains with damping component 26
2.2.1. Stationary distributions for Markov chains with damping component 26
2.2.2. The stationary distribution of the Markov chain X0,n 28
2.3. A perturbation analysis for stationary distributions of Markov chains with damping component 29
2.3.1. Continuity property for stationary probabilities 29
2.3.2. Rate of convergence for stationary distributions 29
2.3.3. Asymptotic expansions for stationary distributions 30
2.3.4. Results of numerical experiments 32
2.4. Coupling and ergodic theorems for perturbed Markov chains with damping component 39
2.4.1. Coupling for regularly perturbed Markov chains with damping component 39
2.4.2. Coupling for singularly perturbed Markov chains with damping component 41
2.4.3. Ergodic theorems for perturbed Markov chains with damping component in the triangular array mode 42
2.4.4. Numerical examples 43
2.5. Acknowledgments 51
2.6. References 51
Chapter 3. PageRank and Perturbed Markov Chains 57Pitos Seleka BIGANDA, Benard ABOLA, Christopher ENGSTRÖM, Sergei SILVESTROV, Godwin KAKUBA and John Magero MANGO
3.1. Introduction 57
3.2. PageRank of the first-order perturbed Markov chain 59
3.3. PageRank of the second-order perturbed Markov chain 60
3.4. Rates of convergence of Page Ranks of first- and second-order perturbed Markovchains 70
3.5. Conclusion 72
3.6. Acknowledgments 72
3.7. References 72
Chapter 4. Doubly Robust Data-driven Distributionally Robust Optimization 75Jose BLANCHET, Yang KANG, Fan ZHANG, Fei HE and Zhangyi HU
4.1. Introduction 75
4.2. DD-DRO, optimal transport and supervised machine learning 79
4.2.1. Optimal transport distances and discrepancies 80
4.3. Data-driven selection of optimal transport cost function 81
4.3.1. Data-driven cost functions via metric learning procedures 81
4.4. Robust optimization for metric learning 83
4.4.1. Robust optimization for relative metric learning 83
4.4.2. Robust optimization for absolute metric learning 86
4.5. Numerical experiments 88
4.6. Discussion and conclusion 89
4.7. References 89
Chapter 5. A Comparison of Graph Centrality Measures Based on Lazy Random Walks 91Collins ANGUZU, Christopher ENGSTRÖM and Sergei SILVESTROV
5.1. Introduction 91
5.1.1. Notations and abbreviations 93
5.1.2. Linear systems and the Neumann series 94
5.2. Review on some centrality measures 95
5.2.1. Degree centrality 95
5.2.2. Katz status and ß-centralities 95
5.2.3. Eigenvector and cumulative nomination centralities 96
5.2.4. Alpha centrality 97
5.2.5. PageRank centrality 98
5.2.6. Summary of the centrality measures as steady state, shifted and power series 99
5.3. Generalizations of centrality measures 99
5.3.1. Priors to centrality measures 99
5.3.2. Lazy variants of centrality measures 100
5.3.3. Lazy a-centrality 100
5.3.4. Lazy Katz centrality 102
5.3.5. Lazy cumulative nomination centrality 103
5.4. Experimental results 104
5.5. Discussion 106
5.6. Conclusion 109
5.7. Acknowledgments 109
5.8. References 110
Chapter 6. Error Detection in Sequential Laser Sensor Input 113Gwenael GATTO and Olympia HADJILIADIS
6.1. Introduction 113
6.2. Data description 114
6.3. Algorithms 116
6.3.1. Algorithm for consecutive changes in mean 118
6.3.2. Algorithm for burst detection 120
6.4. Results 125
6.5. Acknowledgments 127
6.6. References 127
Chapter 7. Diagnostics and Visualization of Point Process Models for Event Times on a Social Network 129Jing WU, Anna L. SMITH and Tian ZHENG
7.1. Introduction 129
7.2. Background 131
7.2.1. Univariate point processes 131
7.2.2. Network point processes 132
7.3. Model checking for time heterogeneity 134
7.3.1. Time rescaling theorem 134
7.3.2. Residual process 136
7.4. Model checking for network heterogeneity and structure 138
7.4.1. Kolmogorov-Smirnov test 138
7.4.2. Structure score based on the Pearson residual matrix 141
7.5. Summary 143
7.6. Acknowledgments 144
7.7. References 144
Part 2. Data Analysis Methods and Tools 147
Chapter 8. Exploring the Distribution of Conditional Quantile Estimates: An Application to Specific Costs of Pig Production in the European Union 149Dominique DESBOIS
8.1. Introduction 150
8.2. Conceptual framework and methodological aspects 150
8.2.1. The empirical model for estimating the specific production costs 151
8.2.2. The procedures for estimating and testing conditional quantiles 152
8.2.3. Symbolic PCA of the specific cost distributions 154
8.2.4. Symbolic clustering analysis of the specific cost distributions 162
8.3. Results 165
8.3.1. The SO-PCA of specific cost estimates 167
8.3.2. The divisive hierarchy of specific cost estimates 170
8.4. Conclusion 171
8.5. References 172
Chapter 9. Maximization Problem Subject to Constraint of Availability in Semi-Markov Model of Operation 175Franciszek GRABSKI
9.1. Introduction 175
9.2. Semi-Markov decision process 176
9.3. Semi-Markov decision model of operation 177
9.3.1. Description and assumptions 177
9.3.2. Model construction 177
9.4. Optimization problem 178
9.4.1. Linear programming method 179
9.5. Numerical example 182
9.6. Conclusion 184
9.7. References 185
Chapter 10. The Impact of Multicollinearity on Big Data Multivariate Analysis Modeling 187Kimon NTOTSIS and Alex KARAGRIGORIOU
10.1. Introduction 187
10.2. Multicollinearity 188
10.3. Dimension reduction techniques 191
10.3.1. Beale et al 192
10.3.2. Principal component analysis 192
10.4. Application 194
10.4.1. The modeling of PPE 194
10.4.2. Concluding remarks 200
10.5. Acknowledgments 200
10.6. References 200
Chapter 11. Weak Signals in High-Dimensional Poisson Regression Models 203Orawan REANGSEPHET, Supranee LISAWADI and Syed Ejaz AHMED
11.1. Introduction 203
11.2. Statistical background 204
11.3. Methodologies 205
11.3.1. Predictor screening methods 205
11.3.2. Post-screening parameter estimation methods 206
11.4. Numerical studies 208
11.4.1. Simulation settings and performance criteria 208
11.4.2. Results 209
11.5. Conclusion 217
11.6. Acknowledgments 218
11.7. References 218
Chapter 12. Groundwater Level Forecasting for Water Resource Management 221Andrea ZIRULIA, Alessio BARBAGLI and Enrico GUASTALDI
12.1. Introduction 221
12.2. Materials and methods 222
12.2.1. Study area 222
12.2.2. Forecast method 222
12.3. Results 224
12.4. Conclusion 230
12.5. References 230
Chapter 13. Phase I Non-parametric Control Charts for Individual Observations: A Selective Review and Some Results 233Christina PARPOULA
13.1. Introduction 234
13.1.1. Background 234
13.1.2. Univariate non-parametric process monitoring 235
13.2. Problem formulation 237
13.3. A comparative study 239
13.3.1. The existing methodologies 239
13.3.2. Simulation settings 240
13.3.3. Simulation-study results 242
13.4. Concluding remarks 247
13.5. References 247
Chapter 14. On Divergence and Dissimilarity Measures for Multiple Time Series 249Konstantinos MAKRIS, Alex KARAGRIGORIOU and Ilia VONTA
14.1. Introduction 249
14.2. Classical measures 250
14.3. Divergence measures 252
14.4. Dissimilarity measures for ordered data 254
14.4.1. Standard dissimilarity measures 254
14.4.2. Advanced dissimilarity measures 256
14.5. Conclusion 259
14.6. References 259
List of Authors 261
Index 265
A PageRank update refers to the process of computing new PageRank values after a change(s) (addition or removal of links/vertices) has occurred in real-life networks. The purpose of updating is to avoid re-calculating the values from scratch. To efficiently carry out the update, we consider PageRank to be the expected number of visits to a target vertex if multiple random walks are performed, starting at each vertex once and weighing each of these walks by a weight value. Hence, it might be looked at as updating a non-normalized PageRank. We focus on networks of tree graphs and propose an approach to sequentially update a scaled adjacency matrix after every change, as well as the levels of the vertices. In this way, we can update the PageRank of affected vertices by their corresponding levels.
Most real-world networks are continuously changing, and this phenomenon has come with challenges to the known data mining algorithms that assume the static form of datasets (Bahmani et al. 2012). Besides, it is a primary aim of network analysts to keep track of critical nodes.
Since Brin and Page (1998) pioneered PageRank centrality more than two decades ago, the centrality measure has found its applications in many disciplines (Gleich 2015). Recently, the study of PageRank of evolving graphs has drawn considerable attention and several computational models have been proposed. For instance, Langville and Meyer (2006) proposed an algorithm to update PageRank using an aggregation/disaggregation concept. In Bahmani et al. (2012), an algorithm that estimates a PageRank vector by crawling a small portion of the graph has been proposed, while Ohsaka et al. (2015) proposed an updating method for personalized PageRanks. In addition, the idea of partitioning information networks into connected acyclic and strongly connected components and then applying iterative methods of solving linear systems - while keeping an eye on the component(s) that has evolved - is proposed in Engström (2016) and Engström and Silvestrov (2016). In fact, all of these authors treat PageRank computation using matrix-vector multiplication.
There is evidence that tree graphs have applications in science and engineering. For instance, in Han et al. (2016), a directed acyclic graph (DAG) has been used in optimization problems, while Shandilya et al. (2014) applies the graph model in security systems.
Furthermore, tree graphs have been used to investigate ecological systems and functional similarity in biological networks (Ulanowicz 2004; Rocchi et al. 2017). For the case of an ecosystem, analysis of extinction risk or identifying important species in such a network is of significance (Allesina and Bodini 2004; Allesina and Pascual 2009). The notion of modeling an ecosystem as a tree graph is the result of mutual dependence between interacting species, for instance, dominator tree and food web structures (Ulanowicz 2004). In fact, dependence interaction can lead to interesting problems of network evolution. For example, if an organism Y requires nutrients from an organism X, in short, X Y, then lack of nutrients from X may lead to the extinction of Y unless it switches to another form of resource by addition of a new link. This is one of the primary roles of ecosystem-based management, as mentioned in Rocchi et al. (2017).
According to Wang et al. (2010), it has been found that a gene's functions are closely associated with different diseases, and such a dependence relationship requires a tree graph model. In view of Wang et al. (2010) and Han et al. (2016), a graph model improves computation cost and conceptualizes large networks.
The existing methods are virtually rooted in the iterative techniques which might be demanding when a localized change only occurs in a large network. Furthermore, the methods pay little attention to specific kinds of networks such as tree and strongly connected graphs. Computing PageRank of such components can lead to a great reduction in time complexity. For example, vertices of a strongly connected graph can be reordered by a feedback vertex set. In light of all of the above, we focus on the algorithm that updates PageRank when vertices are reordered by their levels, specifically for evolving directed tree graphs.
The remaining sections of this chapter are organized as follows. In section 1.2, we propose a technique for updating transition matrices when an edge is added or removed. Section 1.3 presents a single vertex update of PageRank when an edge is inserted or removed. Furthermore, we demonstrate that refinement iterative formation of linear systems fits in a single vertex update. Section 1.4 is devoted to maintaining the levels of vertices after some changes, and the conclusion is given in section 1.5.
For convenience, we define some notations that will be used throughout this chapter.
DEFINITION 1.1.- A directed tree graph G is a digraph without cycles.
DEFINITION 1.2.- The PageRank of a vertex vi, denoted by pi, is defined as
where deg(u) is the degree of u and No is the out-neighbor of u. If vi is a root, then pi = (1 - c)?i.
In this section, we present how to update the transition matrix when an edge(s) is added or removed. Two cases are considered:
(1) a source vertex vi has at most one outgoing edge and at least one edge is added;
(2) a source vertex vi has at least one outgoing edge and at least one edge is removed.
Each case will be handled separately.
Case 1: Addition of an edge to a source vertex with and without outgoing edges.
LEMMA 1.1.- Let A(1) and A(2) be transition matrices of a tree graph G and G??G, respectively. Let deg(vi) be the out-degree of vertex vi. Suppose an edge vi vj is added, then a new matrix A(2) can be updated as
We remark that a matrix A in (ii) is a sub-matrix of unaffected vertices in the graph G.
PROOF.- (i) Let be a column vector with 1 at the ith entry and 0 elsewhere. If there was no edge between vertex vi and vj, then the only edge is the new one, vi vj. To write the new transition matrix A(2), we apply the well-known shearing matrix properties: If , for i, j = 1, 2, . . . , n, then
where A(1) = . Upon adding vi vj the entry , hence A(2) = A(1) + .
(ii) Let us define an operation on identity matrix I, at an entry (i, i) as in Lancaster and Tismenetsky (1985), then
Then, the result of multiplying the ith row of A(1) by a non-zero scalar µ yields
Using Lemma 1.1 (i) and substituting , we write
where A = .
Adding an edge vi vk, for k ? j to the source vertex vi, the weight of the outgoing edges changes to . Let , and by applying relation [1.2], we get
rearranging [1.3] completes the proof.
?
Let V0 be a set of vertices without outgoing edges and suppose vh ? V0. Then, a generalization of Lemma 1.1(i) when multiple edges, say k edges, are added takes the form
Without loss of generality, we define the transition matrix A(2) when the source vertex vi has the degree deg(vi) and k edges are added to distinct target vertices as
where is a row vector with 1 in all the k target vertices and 0 elsewhere.
Case 2: Removal of edge(s) from a source vertex with at least one outgoing edge.
LEMMA 1.2.- Let A(1) and A(2) be transition matrices of tree graphs G and G??G, respectively. Suppose an edge vi vj is removed, then...
Dateiformat: ePUBKopierschutz: Adobe-DRM (Digital Rights Management)
Systemvoraussetzungen:
Das Dateiformat ePUB ist sehr gut für Romane und Sachbücher geeignet – also für „fließenden” Text ohne komplexes Layout. Bei E-Readern oder Smartphones passt sich der Zeilen- und Seitenumbruch automatisch den kleinen Displays an. Mit Adobe-DRM wird hier ein „harter” Kopierschutz verwendet. Wenn die notwendigen Voraussetzungen nicht vorliegen, können Sie das E-Book leider nicht öffnen. Daher müssen Sie bereits vor dem Download Ihre Lese-Hardware vorbereiten.Bitte beachten Sie: Wir empfehlen Ihnen unbedingt nach Installation der Lese-Software diese mit Ihrer persönlichen Adobe-ID zu autorisieren!
Weitere Informationen finden Sie in unserer E-Book Hilfe.