
Advances in Network Clustering and Blockmodeling
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
This book offers an integrated treatment of network clustering and blockmodeling, covering all of the newest approaches and methods that have been developed over the last decade. Presented in a comprehensive manner, it offers the foundations for understanding network structures and processes, and features a wide variety of new techniques addressing issues that occur during the partitioning of networks across multiple disciplines such as community detection, blockmodeling of valued networks, role assignment, and stochastic blockmodeling.
Written by a team of international experts in the field, Advances in Network Clustering and Blockmodeling offers a plethora of diverse perspectives covering topics such as: bibliometric analyses of the network clustering literature; clustering approaches to networks; label propagation for clustering; and treating missing network data before partitioning. It also examines the partitioning of signed networks, multimode networks, and linked networks. A chapter on structured networks and coarsegrained descriptions is presented, along with another on scientific coauthorship networks. The book finishes with a section covering conclusions and directions for future work. In addition, the editors provide numerous tables, figures, case studies, examples, datasets, and more.
* Offers a clear and insightful look at the state of the art in network clustering and blockmodeling
* Provides an excellent mix of mathematical rigor and practical application in a comprehensive manner
* Presents a suite of new methods, procedures, algorithms for partitioning networks, as well as new techniques for visualizing matrix arrays
* Features numerous examples throughout, enabling readers to gain a better understanding of research methods and to conduct their own research effectively
* Written by leading contributors in the field of spatial networks analysis
Advances in Network Clustering and Blockmodeling is an ideal book for graduate and undergraduate students taking courses on network analysis or working with networks using real data. It will also benefit researchers and practitioners interested in network analysis.
More details
Other editions
Additional editions


Persons
Patrick Doreian, MA, is Professor Emeritus of Sociology and Statistics at the University of Pittsburgh and has a research position at the Faculty of Social Sciences at the University of Ljubljana. He has published over 150 articles in academic journals as well as nine books and numerous book chapters. His co-authored book Generalized Blockmodeling written with Vladimir Batagelj and AnuSka Ferligoj received the Harrison White Outstanding Book Award in 2007. He is an honorary Senator of the University of Ljubljana, Slovenia.
Vladimir Batagelj, PhD, is Professor Emeritus of Discrete and Computational Mathematics from the University of Ljubljana, Slovenia. He is Senior Researcher at the Department of Theoretical Computer Science of IMFM, Ljubljana, the Institute Andrej MaruSic at University of Primorska, Koper, and NRU HSE International Laboratory for Applied Network Research, Moscow. He is a co-author of program Pajek for large network analysis and visualization. He is an elected member of the International Statistical Institute. With Patrick Doreian, AnuSka Ferligoj and NataSa Kej??ar he co-authored the book Understanding Large Temporal Networks and Spatial Networks, Wiley, 2014.
AnuSka Ferligoj, PhD, is Professor of Statistics at the Faculty of Social Sciences at the University of Ljubljana and academic supervisor at the NRU HSE International Laboratory for Applied Network Research, Moscow. She is a member of the European Academy of Sociology. In 2010 she received the Doctor et Professor Honoris Causa at the Eötvös Loránd University, Budapest, Hungary.
Content
List of Contributors xv
1 Introduction 1
Patrick Doreian, Vladimir Batagelj, and AnuSka Ferligoj
1.1 On the Chapters 1
1.2 Looking Forward 9
2 Bibliometric Analyses of the Network Clustering Literature 11
Vladimir Batagelj, AnuSka Ferligoj, and Patrick Doreian
2.1 Introduction 11
2.2 Data Collection and Cleaning 12
2.2.1 Most Cited/Citing Works 15
2.2.2 The Boundary Problem for Citation Networks 17
2.3 Analyses of the Citation Networks 19
2.3.1 Components 20
2.3.2 The CPM Path of the Main Citation Network 20
2.3.3 Key-Route Paths 20
2.3.4 Positioning Sets of Selected Works in a Citation Network 30
2.4 Link Islands in the Clustering Network Literature 35
2.4.1 Island 10: Community Detection and Blockmodeling 35
2.4.2 Island 7: Engineering Geology 36
2.4.3 Island 9: Geophysics 38
2.4.4 Island 2: Electromagnetic Fields and their Impact on Humans 38
2.4.5 Limitations and Extensions 40
2.5 Authors 41
2.5.1 Productivity Inside Research Groups 42
2.5.2 Collaboration 43
2.5.3 Citations Among Authors Contributing to the Network Partitioning Literature 45
2.5.4 Citations Among Journals 47
2.5.5 Bibliographic Coupling 50
2.5.6 Linking Through a Jaccard Network 58
2.6 Summary and Future Work 62
Acknowledgements 63
References 63
3 Clustering Approaches to Networks 65
Vladimir Batagelj
3.1 Introduction 65
3.2 Clustering 66
3.2.1 The Clustering Problem 66
3.2.2 Criterion Functions 67
3.2.3 Cluster-Error Function/Examples 72
3.2.4 The Complexity of the Clustering Problem 75
3.3 Approaches to Clustering 76
3.3.1 Local Optimization 76
3.3.2 Dynamic Programming 79
3.3.3 Hierarchical Methods 79
3.3.4 Adding Hierarchical Methods 83
3.3.5 The Leaders Method 84
3.4 Clustering Graphs and Networks 87
3.5 Clustering in Graphs and Networks 89
3.5.1 An Indirect Approach 89
3.5.2 A Direct Approach: Blockmodeling 90
3.5.3 Graph Theoretic Approaches 90
3.6 Agglomerative Method for Relational Constraints 90
3.6.1 Software Support 95
3.7 Some Examples 95
3.7.1 The US Geographical Data, 2016 95
3.7.2 Citations Among Authors from the Network Clustering Literature 98
3.8 Conclusion 102
Acknowledgements 102
References 102
4 Different Approaches to Community Detection 105
Martin Rosvall, Jean-Charles Delvenne, Michael T. Schaub, and Renaud Lambiotte
4.1 Introduction 105
4.2 Minimizing Constraint Violations: the Cut-based Perspective 107
4.3 Maximizing Internal Density: the Clustering Perspective 108
4.4 Identifying Structural Equivalence: the Stochastic Block Model Perspective 110
4.5 Identifying Coarse-grained Descriptions: the Dynamical Perspective 111
4.6 Discussion 114
4.7 Conclusions 116
Acknowledgements 116
References 116
5 Label Propagation for Clustering 121
Lovro subelj
5.1 Label Propagation Method 121
5.1.1 Resolution of Label Ties 123
5.1.2 Order of Label Propagation 123
5.1.3 Label Equilibrium Criterium 124
5.1.4 Algorithm and Complexity 125
5.2 Label Propagation as Optimization 127
5.3 Advances of Label Propagation 128
5.3.1 Label Propagation Under Constraints 129
5.3.2 Label Propagation with Preferences 130
5.3.3 Method Stability and Complexity 133
5.4 Extensions to Other Networks 137
5.5 Alternative Types of Network Structures 139
5.5.1 Overlapping Groups of Nodes 139
5.5.2 Hierarchy of Groups of Nodes 140
5.5.3 Structural Equivalence Groups 142
5.6 Applications of Label Propagation 146
5.7 Summary and Outlook 146
References 147
6 Blockmodeling of Valued Networks 151
Carl Nordlund and AleS ?iberna
6.1 Introduction 151
6.2 Valued Data Types 153
6.3 Transformations 154
6.3.1 Scaling Transformations 155
6.3.2 Dichotomization 157
6.3.3 Normalization Procedures 157
6.3.4 Iterative Row-column Normalization 158
6.3.5 Transaction-flow and Deviational Transformations 159
6.4 Indirect Clustering Approaches 160
6.4.1 Structural Equivalence: Indirect Metrics 160
6.4.2 The CONCOR Algorithm 161
6.4.3 Deviational Structural Equivalence: Indirect Approach 162
6.4.4 Regular Equivalence: The REGE Algorithms 162
6.4.5 Indirect Approaches: Finding Clusters, Interpreting Blocks 163
6.5 Direct Approaches 164
6.5.1 Generalized Blockmodeling 164
6.5.2 Generalized Blockmodeling of Valued Networks 165
6.5.3 Deviational Generalized Blockmodeling 166
6.6 On the Selection of Suitable Approaches 167
6.7 Examples 168
6.7.1 EIES Friendship Data at Time 2 168
6.7.2 Commodity Trade Within EU/EFTA 2010 173
6.8 Conclusion 183
Acknowledgements 185
References 185
7 Treating Missing Network Data Before Partitioning 189
Anja ?nidarSic, Patrick Doreian, and AnuSka Ferligoj
7.1 Introduction 189
7.2 Types of Missing Network Data 190
7.2.1 Measurement Errors in Recorded (Or Reported) Ties 190
7.2.2 Item Non-Response 192
7.2.3 Actor Non-Response 192
7.3 Treatments of Missing Data (Due to Actor Non-Response) 193
7.3.1 Reconstruction 194
7.3.2 Imputations of the Mean Values of Incoming Ties 196
7.3.3 Imputations of the Modal Values of Incoming Ties 196
7.3.4 Reconstruction and Imputations Based on Modal Values of Incoming Ties 197
7.3.5 Imputations of the Total Mean 197
7.3.6 Imputations of Median of the Three Nearest Neighbors based on Incoming Ties 197
7.3.7 Null Tie Imputations 198
7.3.8 Blockmodel Results for the Whole and Treated Networks 198
7.4 A Study Design Examining the Impact of Non-Response Treatments on Clustering Results 200
7.4.1 Some Features of Indirect and Direct Blockmodeling 200
7.4.2 Design of the Simulation Study 201
7.4.3 The Real Networks Used in the Simulation Studies 201
7.5 Results 202
7.5.1 Indirect Blockmodeling of Real Valued Networks 202
7.5.2 Indirect Blockmodeling on Real Binary Networks 210
7.5.3 Direct Blockmodeling of Binary Real Networks 216
7.6 Conclusions 222
Acknowledgements 223
References 223
8 Partitioning Signed Networks 225
Vincent Traag, Patrick Doreian, and Andrej Mrvar
8.1 Notation 225
8.2 Structural Balance Theory 226
8.2.1 Weak Structural Balance 230
8.3 Partitioning 232
8.3.1 Strong Structural Balance 233
8.3.2 Weak Structural Balance 237
8.3.3 Blockmodeling 238
8.3.4 Community Detection 239
8.4 Empirical Analysis 242
8.5 Summary and Future Work 247
References 248
9 Partitioning Multimode Networks 251
Martin G Everett and Stephen P Borgatti
9.1 Introduction 251
9.2 Two-Mode Partitioning 252
9.3 Community Detection 253
9.4 Dual Projection 254
9.5 Signed Two-Mode Networks 257
9.6 Spectral Methods 258
9.7 Clustering 261
9.8 More Complex Data 262
9.9 Conclusion 263
References 263
10 Blockmodeling Linked Networks 267
AleS ?iberna
10.1 Introduction 267
10.2 Blockmodeling Linked Networks 268
10.2.1 Separate Analysis 269
10.2.2 A True Linked Blockmodeling Approach 269
10.2.3 Weighting of Different Parts of a Linked Network 270
10.3 Examples 270
10.3.1 Co-authorship Network at Two Time-points 270
10.3.2 A Multilevel Network of Participants at a Trade Fair for TV Programs 277
10.4 Conclusion 284
Acknowledgements 285
References 285
11 Bayesian Stochastic Blockmodeling 289
Tiago P. Peixoto
11.1 Introduction 289
11.2 Structure Versus Randomness in Networks 290
11.3 The Stochastic Blockmodel 292
11.4 Bayesian Inference: The Posterior Probability of Partitions 294
11.5 Microcanonical Models and the Minimum Description Length Principle 298
11.6 The "Resolution Limit" Underfitting Problem and the Nested SBM 300
11.7 Model Variations 305
11.7.1 Model Selection 306
11.7.2 Degree Correction 306
11.7.3 Group Overlaps 310
11.7.4 Further Model Extensions 313
11.8 Efficient Inference Using Markov Chain Monte Carlo 314
11.9 To Sample or To Optimize? 317
11.10 Generalization and Prediction 321
11.11 Fundamental Limits of Inference: The Detectability-Indetectability Phase Transition 323
11.12 Conclusion 327
References 328
12 Structured Networks and Coarse-Grained Descriptions: A Dynamical Perspective 333
Michael T. Schaub, Jean-Charles Delvenne, Renaud Lambiotte, and Mauricio Barahona
12.1 Introduction 333
12.2 Part I: Dynamics on and of Networks 337
12.2.1 General Setup 337
12.2.2 Consensus Dynamics 338
12.2.3 Diffusion Processes and Random Walks 340
12.3 Part II: The Influence of Graph Structure on Network Dynamics 342
12.3.1 Time Scale Separation in Partitioned Networks 342
12.3.2 Strictly Invariant Subspaces of the Network Dynamics and External Equitable Partitions 345
12.3.3 Structural Balance: Consensus on Signed Networks and Polarized Opinion Dynamics 348
12.4 Part III: Using Dynamical Processes to Reveal Network Structure 351
12.4.1 A Generic Algorithmic Framework for Dynamics-Based Network Partitioning and Coarse Graining 352
12.4.2 Extending the Framework by using other Measures 354
12.5 Discussion 357
Acknowledgements 358
References 358
13 Scientific Co-Authorship Networks 363
Marjan Cugmas, AnuSka Ferligoj, and Luka Kronegger
13.1 Introduction 363
13.2 Methods 364
13.2.1 Blockmodeling 365
13.2.2 Measuring the Obtained Blockmodels' Stability 365
13.3 The Data 369
13.4 The Structure of Obtained Blockmodels 370
13.5 Stability of the Obtained Blockmodel Structures 378
13.5.1 Clustering of Scientific Disciplines According to Different Operationalizations of Core Stability 378
13.5.2 Explaining the Stability of Cores 382
13.6 Conclusions 384
Acknowledgements 386
References 386
14 Conclusions and Directions for Future Work 389
Patrick Doreian, AnuSka Ferligoj, and Vladimir Batagelj
14.1 Issues Raised within Chapters 390
14.2 Linking Ideas Found in Different Chapters 395
14.3 A Brief Summary and Conclusion 397
References 397
Topic Index 399
Person Index 407
1
Introduction
Patrick Doreian3,4, Vladimir Batagelj1,2,5 and Anuska Ferligoj3,5
1IMFM Ljubljana
2IAM, University of Primorska, Koper
3FDV, University of Ljubljana
4University of Pittsburgh
5NRU HSE Moscow
This book focuses on network clustering regardless of the disciplines within which a network was established. In the initial conception for the book, our attention was driven primarily by concerns regarding blockmodeling and community detection as they applied to social networks. But as we looked further into this general topic to invite potential contributors, we realized that the domain was much broader. The wide variety of approaches contained in this volume exemplifies this diversity. For us, as we assembled this volume, this was an exciting learning experience, one we hope will be experienced by readers of this book.
There is no single best approach to network clustering. Put differently, there is no cookie-cutter approach fitting all such data sets. Yes, there are adherents of one (their) approach who think this is the case. As shown in the chapters that follow, none of the authors of the contributed chapters share this very narrow view. This is a wide-open realm with multiple exciting approaches. We reflect further on this in the concluding chapter. Here, we describe briefly the contents of the following chapters merely as an introduction to them. In our view, each chapter merits close attention.
1.1 On the Chapters
As the book is concerned with network clustering, Chapter 2 offers an extended examination of the network clustering literature. Identifying the citation network for this literature turned out to be a complex task. Methods for doing this are described in detail. The initial search used the Web of Science (WoS) to identify documents using search terms. There were multiple searches, with the first being for 2015 and the final network was obtained for February 2017. This literature expanded dramatically and in a complex fashion.
While a citation network is composed of links between works treated as nodes, there is more to consider when other types of units are included. These include authors, journals, and keywords. As part of a more general strategy, one starting from the citation network but using additional information, multiple two-mode networks were constructed. This included an authorship network with works authors, a journal network featuring works journals, and a keyword network with works keywords. They help give more insight into the one-mode citation network having only scientific productions.
Chapter 2 lists the most cited works, the most citing works, the most used keywords, and a discussion of the "boundary problem" as it relates to citation networks. One message from this chapter is that the way the boundary of a citation network is established affects greatly the data analytic results that follow. This implies using great care in constructing citation networks - not all citation networks will suffice for meaningful analyses.
Chapter 2 presents a wide range of analyses of the citation network for the network clustering literature. Components of the network were established. Both critical project main paths and key-route paths were identified. In the analysis of this network, a clear transition between the blockmodeling and the community detection literatures was revealed. Another technique used in Chapter 2 is the identification of link islands which have higher levels of internal cohesion as a way identifying some important subnetworks of this literature. The largest of these link islands featured works from the blockmodeling and community detection literatures. Since the transition from the former to the latter, it seems the two have developed independently. This seems problematic given our belief in the utility of useful ideas flowing between fields and sub-fields.
The other islands discussed in Chapter 2 come from the fields of engineering geology, geophysics, as well as electromagnetic fields and their impacts on humans. One of the searches used in the searches of the WoS database included the terms "block model" and "block". The latter crops up in other literatures, a surprise for us. In considering these other link islands, another surprise awaited the authors of this chapter. We are used to debates in the social network literature regarding the difference between static and temporal approaches to studying networks. This divide is present also in the natural sciences.
Chapter 2 also contains an examination of authors and measures of productivity within research groups, collaboration, and an examination of citations among authors contributing to the network clustering literature. Again, the stark division between the community detection and blockmodeling literatures was clear. Examined also are citations between journals publishing articles in the broad area of the network clustering literature. The methodological details of doing this, as outlined in this chapter, merit further attention.
Bibliographic coupling, which occurs when two works both cite a third work in their bibliographies, was also examined. This included a sustained assessment as to how this coupling is measured. These tools were applied to the network clustering literature, especially for the largest identified island. This included an examination of the most frequently used keywords in the social networks literature and the physicist-driven approach to studying social networks. While some keywords were the same, there were considerable differences, again illustrating the different concerns in these two literatures.
Chapter 3 provides an overview of "classical" clustering including both the clustering of networks and clustering in networks. The clustering problem is considered as a discrete optimization problem which turns to be, in most cases, NP hard. Therefore, local optimization or greedy methods are usually used for solving it. These methods can be adapted also for clustering in networks (or clustering with relational constraints). The hierarchical agglomerative clustering method can be extended for efficiently clustering large sparse networks. This is illustrated with an analysis of normalized author citation networks from the network clustering literature from Chapter 2.
Chapter 4 describes different approaches to community detection. The authors ask very useful questions which led them to provide helpful guidelines for researchers contemplating network clustering within a community detection framework. It starts with a bold claim that there is no precise definition of a community. We concur. Their focused review of community detection methods makes it clear that researchers need to have a clear idea as to why any method is selected prior to its use. The authors point to the problem that the same term can have different meanings in different subfields, reflecting what was found in Chapter 2. They argue "community detection should not be viewed as a well-defined problem but rather an umbrella term with many facets." This delightful image is equally applicable to blockmodeling within social network analysis!
Four different approaches to community detection are outlined. The first uses the cut-based perspective. The second is a clustering approach maximizing the internal density of clusters and the third is the stochastic equivalence perspective. Finally, there is a dynamical perspective focusing on the impact of communities and dynamic processes to establish a dynamically relevant coarse-grained partitions of network structure. Four sections follow which provide precise descriptions of the fundamental properties of these approaches and the results stemming from adopting them.
Their discussion makes it clear that there is no single "best" community detection algorithm and that there can be multiple equally valid partitions of a network depending on which of the four considered approaches is used. Again, this sentence holds fully when "blockmodeling" is used instead of "community detection". Of course, this applies to all the network clustering approaches presented in this volume.
Chapter 5 provides an extensive discussion of label propagation as a heuristic method initially proposed for community detection. There is natural segue between Chapters 4 and 5. Label propagation is a partially supervised machine learning algorithm assigning labels to previously unlabeled data points. At the start of the algorithm, subsets of nodes have labels, which amounts to a clustering of them. These labels are propagated to the unlabeled points throughout the course of the algorithm. Nodes carry a label denoting the community to which they belong. Membership in a community changes based on the labels that the neighboring nodes possess as the labels diffuse through the network.
The author is clear that while it is not the most accurate or the most robust clustering method, a label propagation algorithm is simple to implement and is exceptionally fast. Networks with hundreds of millions of nodes can be analyzed readily. The early work on this approach is described with the basic ideas presented for simple undirected networks. However, the author points out that it can be used for many more types of networks, including those with multiple edges between nodes, two-mode networks, and signed networks. It can be used also to identify and delineate...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.