Advances in Network Clustering and Blockmodeling

Name: Advances in Network Clustering and Blockmodeling
Brand: Wiley
Price: 80.99 EUR
Availability: OnlineOnly

Patrick Doreian Vladimir Batagelj Anuska Ferligoj(Editor)

Wiley (Publisher)

1st Edition

Published on 16. December 2019

432 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-119-22467-9 (ISBN)

€80.99incl. 7% vat

System requirements

for ePUB with Adobe-DRM

E-Book Single Licence

Available for download

Description

Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.

Alles über E-Books, Kopierschutz & Dateiformate finden Sie in unserem Info- & Hilfebereich.

Provides an overview of the developments and advances in the field of network clustering and blockmodeling over the last 10 years

This book offers an integrated treatment of network clustering and blockmodeling, covering all of the newest approaches and methods that have been developed over the last decade. Presented in a comprehensive manner, it offers the foundations for understanding network structures and processes, and features a wide variety of new techniques addressing issues that occur during the partitioning of networks across multiple disciplines such as community detection, blockmodeling of valued networks, role assignment, and stochastic blockmodeling.

Written by a team of international experts in the field, Advances in Network Clustering and Blockmodeling offers a plethora of diverse perspectives covering topics such as: bibliometric analyses of the network clustering literature; clustering approaches to networks; label propagation for clustering; and treating missing network data before partitioning. It also examines the partitioning of signed networks, multimode networks, and linked networks. A chapter on structured networks and coarsegrained descriptions is presented, along with another on scientific coauthorship networks. The book finishes with a section covering conclusions and directions for future work. In addition, the editors provide numerous tables, figures, case studies, examples, datasets, and more.

* Offers a clear and insightful look at the state of the art in network clustering and blockmodeling

* Provides an excellent mix of mathematical rigor and practical application in a comprehensive manner

* Presents a suite of new methods, procedures, algorithms for partitioning networks, as well as new techniques for visualizing matrix arrays

* Features numerous examples throughout, enabling readers to gain a better understanding of research methods and to conduct their own research effectively

* Written by leading contributors in the field of spatial networks analysis

Advances in Network Clustering and Blockmodeling is an ideal book for graduate and undergraduate students taking courses on network analysis or working with networks using real data. It will also benefit researchers and practitioners interested in network analysis.

More details

Other editions

Persons

Content

List of Contributors xv

1 Introduction 1
Patrick Doreian, Vladimir Batagelj, and AnuSka Ferligoj

1.1 On the Chapters 1

1.2 Looking Forward 9

2 Bibliometric Analyses of the Network Clustering Literature 11
Vladimir Batagelj, AnuSka Ferligoj, and Patrick Doreian

2.1 Introduction 11

2.2 Data Collection and Cleaning 12

2.2.1 Most Cited/Citing Works 15

2.2.2 The Boundary Problem for Citation Networks 17

2.3 Analyses of the Citation Networks 19

2.3.1 Components 20

2.3.2 The CPM Path of the Main Citation Network 20

2.3.3 Key-Route Paths 20

2.3.4 Positioning Sets of Selected Works in a Citation Network 30

2.4 Link Islands in the Clustering Network Literature 35

2.4.1 Island 10: Community Detection and Blockmodeling 35

2.4.2 Island 7: Engineering Geology 36

2.4.3 Island 9: Geophysics 38

2.4.4 Island 2: Electromagnetic Fields and their Impact on Humans 38

2.4.5 Limitations and Extensions 40

2.5 Authors 41

2.5.1 Productivity Inside Research Groups 42

2.5.2 Collaboration 43

2.5.3 Citations Among Authors Contributing to the Network Partitioning Literature 45

2.5.4 Citations Among Journals 47

2.5.5 Bibliographic Coupling 50

2.5.6 Linking Through a Jaccard Network 58

2.6 Summary and Future Work 62

Acknowledgements 63

References 63

3 Clustering Approaches to Networks 65
Vladimir Batagelj

3.1 Introduction 65

3.2 Clustering 66

3.2.1 The Clustering Problem 66

3.2.2 Criterion Functions 67

3.2.3 Cluster-Error Function/Examples 72

3.2.4 The Complexity of the Clustering Problem 75

3.3 Approaches to Clustering 76

3.3.1 Local Optimization 76

3.3.2 Dynamic Programming 79

3.3.3 Hierarchical Methods 79

3.3.4 Adding Hierarchical Methods 83

3.3.5 The Leaders Method 84

3.4 Clustering Graphs and Networks 87

3.5 Clustering in Graphs and Networks 89

3.5.1 An Indirect Approach 89

3.5.2 A Direct Approach: Blockmodeling 90

3.5.3 Graph Theoretic Approaches 90

3.6 Agglomerative Method for Relational Constraints 90

3.6.1 Software Support 95

3.7 Some Examples 95

3.7.1 The US Geographical Data, 2016 95

3.7.2 Citations Among Authors from the Network Clustering Literature 98

3.8 Conclusion 102

Acknowledgements 102

References 102

4 Different Approaches to Community Detection 105
Martin Rosvall, Jean-Charles Delvenne, Michael T. Schaub, and Renaud Lambiotte

4.1 Introduction 105

4.2 Minimizing Constraint Violations: the Cut-based Perspective 107

4.3 Maximizing Internal Density: the Clustering Perspective 108

4.4 Identifying Structural Equivalence: the Stochastic Block Model Perspective 110

4.5 Identifying Coarse-grained Descriptions: the Dynamical Perspective 111

4.6 Discussion 114

4.7 Conclusions 116

Acknowledgements 116

References 116

5 Label Propagation for Clustering 121
Lovro subelj

5.1 Label Propagation Method 121

5.1.1 Resolution of Label Ties 123

5.1.2 Order of Label Propagation 123

5.1.3 Label Equilibrium Criterium 124

5.1.4 Algorithm and Complexity 125

5.2 Label Propagation as Optimization 127

5.3 Advances of Label Propagation 128

5.3.1 Label Propagation Under Constraints 129

5.3.2 Label Propagation with Preferences 130

5.3.3 Method Stability and Complexity 133

5.4 Extensions to Other Networks 137

5.5 Alternative Types of Network Structures 139

5.5.1 Overlapping Groups of Nodes 139

5.5.2 Hierarchy of Groups of Nodes 140

5.5.3 Structural Equivalence Groups 142

5.6 Applications of Label Propagation 146

5.7 Summary and Outlook 146

References 147

6 Blockmodeling of Valued Networks 151
Carl Nordlund and AleS ?iberna

6.1 Introduction 151

6.2 Valued Data Types 153

6.3 Transformations 154

6.3.1 Scaling Transformations 155

6.3.2 Dichotomization 157

6.3.3 Normalization Procedures 157

6.3.4 Iterative Row-column Normalization 158

6.3.5 Transaction-flow and Deviational Transformations 159

6.4 Indirect Clustering Approaches 160

6.4.1 Structural Equivalence: Indirect Metrics 160

6.4.2 The CONCOR Algorithm 161

6.4.3 Deviational Structural Equivalence: Indirect Approach 162

6.4.4 Regular Equivalence: The REGE Algorithms 162

6.4.5 Indirect Approaches: Finding Clusters, Interpreting Blocks 163

6.5 Direct Approaches 164

6.5.1 Generalized Blockmodeling 164

6.5.2 Generalized Blockmodeling of Valued Networks 165

6.5.3 Deviational Generalized Blockmodeling 166

6.6 On the Selection of Suitable Approaches 167

6.7 Examples 168

6.7.1 EIES Friendship Data at Time 2 168

6.7.2 Commodity Trade Within EU/EFTA 2010 173

6.8 Conclusion 183

Acknowledgements 185

References 185

7 Treating Missing Network Data Before Partitioning 189
Anja ?nidarSic, Patrick Doreian, and AnuSka Ferligoj

7.1 Introduction 189

7.2 Types of Missing Network Data 190

7.2.1 Measurement Errors in Recorded (Or Reported) Ties 190

7.2.2 Item Non-Response 192

7.2.3 Actor Non-Response 192

7.3 Treatments of Missing Data (Due to Actor Non-Response) 193

7.3.1 Reconstruction 194

7.3.2 Imputations of the Mean Values of Incoming Ties 196

7.3.3 Imputations of the Modal Values of Incoming Ties 196

7.3.4 Reconstruction and Imputations Based on Modal Values of Incoming Ties 197

7.3.5 Imputations of the Total Mean 197

7.3.6 Imputations of Median of the Three Nearest Neighbors based on Incoming Ties 197

7.3.7 Null Tie Imputations 198

7.3.8 Blockmodel Results for the Whole and Treated Networks 198

7.4 A Study Design Examining the Impact of Non-Response Treatments on Clustering Results 200

7.4.1 Some Features of Indirect and Direct Blockmodeling 200

7.4.2 Design of the Simulation Study 201

7.4.3 The Real Networks Used in the Simulation Studies 201

7.5 Results 202

7.5.1 Indirect Blockmodeling of Real Valued Networks 202

7.5.2 Indirect Blockmodeling on Real Binary Networks 210

7.5.3 Direct Blockmodeling of Binary Real Networks 216

7.6 Conclusions 222

Acknowledgements 223

References 223

8 Partitioning Signed Networks 225
Vincent Traag, Patrick Doreian, and Andrej Mrvar

8.1 Notation 225

8.2 Structural Balance Theory 226

8.2.1 Weak Structural Balance 230

8.3 Partitioning 232

8.3.1 Strong Structural Balance 233

8.3.2 Weak Structural Balance 237

8.3.3 Blockmodeling 238

8.3.4 Community Detection 239

8.4 Empirical Analysis 242

8.5 Summary and Future Work 247

References 248

9 Partitioning Multimode Networks 251
Martin G Everett and Stephen P Borgatti

9.1 Introduction 251

9.2 Two-Mode Partitioning 252

9.3 Community Detection 253

9.4 Dual Projection 254

9.5 Signed Two-Mode Networks 257

9.6 Spectral Methods 258

9.7 Clustering 261

9.8 More Complex Data 262

9.9 Conclusion 263

References 263

10 Blockmodeling Linked Networks 267
AleS ?iberna

10.1 Introduction 267

10.2 Blockmodeling Linked Networks 268

10.2.1 Separate Analysis 269

10.2.2 A True Linked Blockmodeling Approach 269

10.2.3 Weighting of Different Parts of a Linked Network 270

10.3 Examples 270

10.3.1 Co-authorship Network at Two Time-points 270

10.3.2 A Multilevel Network of Participants at a Trade Fair for TV Programs 277

10.4 Conclusion 284

Acknowledgements 285

References 285

11 Bayesian Stochastic Blockmodeling 289
Tiago P. Peixoto

11.1 Introduction 289

11.2 Structure Versus Randomness in Networks 290

11.3 The Stochastic Blockmodel 292

11.4 Bayesian Inference: The Posterior Probability of Partitions 294

11.5 Microcanonical Models and the Minimum Description Length Principle 298

11.6 The "Resolution Limit" Underfitting Problem and the Nested SBM 300

11.7 Model Variations 305

11.7.1 Model Selection 306

11.7.2 Degree Correction 306

11.7.3 Group Overlaps 310

11.7.4 Further Model Extensions 313

11.8 Efficient Inference Using Markov Chain Monte Carlo 314

11.9 To Sample or To Optimize? 317

11.10 Generalization and Prediction 321

11.11 Fundamental Limits of Inference: The Detectability-Indetectability Phase Transition 323

11.12 Conclusion 327

References 328

12 Structured Networks and Coarse-Grained Descriptions: A Dynamical Perspective 333
Michael T. Schaub, Jean-Charles Delvenne, Renaud Lambiotte, and Mauricio Barahona

12.1 Introduction 333

12.2 Part I: Dynamics on and of Networks 337

12.2.1 General Setup 337

12.2.2 Consensus Dynamics 338

12.2.3 Diffusion Processes and Random Walks 340

12.3 Part II: The Influence of Graph Structure on Network Dynamics 342

12.3.1 Time Scale Separation in Partitioned Networks 342

12.3.2 Strictly Invariant Subspaces of the Network Dynamics and External Equitable Partitions 345

12.3.3 Structural Balance: Consensus on Signed Networks and Polarized Opinion Dynamics 348

12.4 Part III: Using Dynamical Processes to Reveal Network Structure 351

12.4.1 A Generic Algorithmic Framework for Dynamics-Based Network Partitioning and Coarse Graining 352

12.4.2 Extending the Framework by using other Measures 354

12.5 Discussion 357

Acknowledgements 358

References 358

13 Scientific Co-Authorship Networks 363
Marjan Cugmas, AnuSka Ferligoj, and Luka Kronegger

13.1 Introduction 363

13.2 Methods 364

13.2.1 Blockmodeling 365

13.2.2 Measuring the Obtained Blockmodels' Stability 365

13.3 The Data 369

13.4 The Structure of Obtained Blockmodels 370

13.5 Stability of the Obtained Blockmodel Structures 378

13.5.1 Clustering of Scientific Disciplines According to Different Operationalizations of Core Stability 378

13.5.2 Explaining the Stability of Cores 382

13.6 Conclusions 384

Acknowledgements 386

References 386

14 Conclusions and Directions for Future Work 389
Patrick Doreian, AnuSka Ferligoj, and Vladimir Batagelj

14.1 Issues Raised within Chapters 390

14.2 Linking Ideas Found in Different Chapters 395

14.3 A Brief Summary and Conclusion 397

References 397

Topic Index 399

Person Index 407

1
Introduction

Patrick Doreian3,4, Vladimir Batagelj1,2,5 and Anuska Ferligoj3,5

1IMFM Ljubljana

2IAM, University of Primorska, Koper

3FDV, University of Ljubljana

4University of Pittsburgh

5NRU HSE Moscow

This book focuses on network clustering regardless of the disciplines within which a network was established. In the initial conception for the book, our attention was driven primarily by concerns regarding blockmodeling and community detection as they applied to social networks. But as we looked further into this general topic to invite potential contributors, we realized that the domain was much broader. The wide variety of approaches contained in this volume exemplifies this diversity. For us, as we assembled this volume, this was an exciting learning experience, one we hope will be experienced by readers of this book.

There is no single best approach to network clustering. Put differently, there is no cookie-cutter approach fitting all such data sets. Yes, there are adherents of one (their) approach who think this is the case. As shown in the chapters that follow, none of the authors of the contributed chapters share this very narrow view. This is a wide-open realm with multiple exciting approaches. We reflect further on this in the concluding chapter. Here, we describe briefly the contents of the following chapters merely as an introduction to them. In our view, each chapter merits close attention.

1.1 On the Chapters

As the book is concerned with network clustering, Chapter 2 offers an extended examination of the network clustering literature. Identifying the citation network for this literature turned out to be a complex task. Methods for doing this are described in detail. The initial search used the Web of Science (WoS) to identify documents using search terms. There were multiple searches, with the first being for 2015 and the final network was obtained for February 2017. This literature expanded dramatically and in a complex fashion.

While a citation network is composed of links between works treated as nodes, there is more to consider when other types of units are included. These include authors, journals, and keywords. As part of a more general strategy, one starting from the citation network but using additional information, multiple two-mode networks were constructed. This included an authorship network with works authors, a journal network featuring works journals, and a keyword network with works keywords. They help give more insight into the one-mode citation network having only scientific productions.

Chapter 2 lists the most cited works, the most citing works, the most used keywords, and a discussion of the "boundary problem" as it relates to citation networks. One message from this chapter is that the way the boundary of a citation network is established affects greatly the data analytic results that follow. This implies using great care in constructing citation networks - not all citation networks will suffice for meaningful analyses.

Chapter 2 presents a wide range of analyses of the citation network for the network clustering literature. Components of the network were established. Both critical project main paths and key-route paths were identified. In the analysis of this network, a clear transition between the blockmodeling and the community detection literatures was revealed. Another technique used in Chapter 2 is the identification of link islands which have higher levels of internal cohesion as a way identifying some important subnetworks of this literature. The largest of these link islands featured works from the blockmodeling and community detection literatures. Since the transition from the former to the latter, it seems the two have developed independently. This seems problematic given our belief in the utility of useful ideas flowing between fields and sub-fields.

The other islands discussed in Chapter 2 come from the fields of engineering geology, geophysics, as well as electromagnetic fields and their impacts on humans. One of the searches used in the searches of the WoS database included the terms "block model" and "block". The latter crops up in other literatures, a surprise for us. In considering these other link islands, another surprise awaited the authors of this chapter. We are used to debates in the social network literature regarding the difference between static and temporal approaches to studying networks. This divide is present also in the natural sciences.

Chapter 2 also contains an examination of authors and measures of productivity within research groups, collaboration, and an examination of citations among authors contributing to the network clustering literature. Again, the stark division between the community detection and blockmodeling literatures was clear. Examined also are citations between journals publishing articles in the broad area of the network clustering literature. The methodological details of doing this, as outlined in this chapter, merit further attention.

Bibliographic coupling, which occurs when two works both cite a third work in their bibliographies, was also examined. This included a sustained assessment as to how this coupling is measured. These tools were applied to the network clustering literature, especially for the largest identified island. This included an examination of the most frequently used keywords in the social networks literature and the physicist-driven approach to studying social networks. While some keywords were the same, there were considerable differences, again illustrating the different concerns in these two literatures.

Chapter 3 provides an overview of "classical" clustering including both the clustering of networks and clustering in networks. The clustering problem is considered as a discrete optimization problem which turns to be, in most cases, NP hard. Therefore, local optimization or greedy methods are usually used for solving it. These methods can be adapted also for clustering in networks (or clustering with relational constraints). The hierarchical agglomerative clustering method can be extended for efficiently clustering large sparse networks. This is illustrated with an analysis of normalized author citation networks from the network clustering literature from Chapter 2.

Chapter 4 describes different approaches to community detection. The authors ask very useful questions which led them to provide helpful guidelines for researchers contemplating network clustering within a community detection framework. It starts with a bold claim that there is no precise definition of a community. We concur. Their focused review of community detection methods makes it clear that researchers need to have a clear idea as to why any method is selected prior to its use. The authors point to the problem that the same term can have different meanings in different subfields, reflecting what was found in Chapter 2. They argue "community detection should not be viewed as a well-defined problem but rather an umbrella term with many facets." This delightful image is equally applicable to blockmodeling within social network analysis!

Four different approaches to community detection are outlined. The first uses the cut-based perspective. The second is a clustering approach maximizing the internal density of clusters and the third is the stochastic equivalence perspective. Finally, there is a dynamical perspective focusing on the impact of communities and dynamic processes to establish a dynamically relevant coarse-grained partitions of network structure. Four sections follow which provide precise descriptions of the fundamental properties of these approaches and the results stemming from adopting them.

Their discussion makes it clear that there is no single "best" community detection algorithm and that there can be multiple equally valid partitions of a network depending on which of the four considered approaches is used. Again, this sentence holds fully when "blockmodeling" is used instead of "community detection". Of course, this applies to all the network clustering approaches presented in this volume.

Chapter 5 provides an extensive discussion of label propagation as a heuristic method initially proposed for community detection. There is natural segue between Chapters 4 and 5. Label propagation is a partially supervised machine learning algorithm assigning labels to previously unlabeled data points. At the start of the algorithm, subsets of nodes have labels, which amounts to a clustering of them. These labels are propagated to the unlabeled points throughout the course of the algorithm. Nodes carry a label denoting the community to which they belong. Membership in a community changes based on the labels that the neighboring nodes possess as the labels diffuse through the network.

The author is clear that while it is not the most accurate or the most robust clustering method, a label propagation algorithm is simple to implement and is exceptionally fast. Networks with hundreds of millions of nodes can be analyzed readily. The early work on this approach is described with the basic ideas presented for simple undirected networks. However, the author points out that it can be used for many more types of networks, including those with multiple edges between nodes, two-mode networks, and signed networks. It can be used also to identify and delineate...

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

Advances in Network Clustering and Blockmodeling

Description

More details

Other editions

Additional editions

Persons

Content

1
Introduction

1.1 On the Chapters

System requirements