Preface
Acknowledgments
Some Notation
1 Introduction
1.1 Big Data: Basic Concepts
1.2 Data Mining with Big Data
1.3 A Mathematical Introduction to Big Data
1.4 A Mathematical Theory of Big Data
1.5 Smart Grid
1.6 Big Data and Smart Grid
1.7 Reading Guide
Bibliographical Remarks
Part I Fundamentals of Big Data
2 The Mathematical Foundations of Big Data Systems
2.1 Big Data Analytics
2.2 Big Data: Sense, Collect, Store, and Analyze
2.3 Intelligent Algorithms
2.4 Signal Processing for Smart Grid
2.5 Monitoring and Optimization for Power Grids
2.6 Distributed Sensing and Measurement for Power Grids
2.7 Real-time Analysis of Streaming Data
2.8 Salient Features of Big Data
2.9 Big Data for Quantum Systems
2.10 Big Data for Financial Systems
2.11 Big Data for Atmospheric Systems
2.12 Big Data for Sensing Networks
2.13 Big Data for Wireless Networks
2.14 Big Data for Transportation
Bibliographical Remarks
3 Large Random Matrices: An Introduction
3.1 Modeling of Large Dimensional Data as Random Matrices
3.2 A Brief of Random Matrix Theory
3.3 Change Point of Views: From Vectors to Measures
3.4 The Stieltjes Transform of Measures
3.5 A Fundamental Result: The Marchenko-Pastur Equation
3.6 Linear Eigenvalue Statistics and Limit Laws
3.7 Central Limit Theorem for Linear Eigenvalue Statistics
3.8 Central Limit Theorem for Random Matrix S⁻¹T
3.9 Independence for Random Matrices
3.10 Matrix-Valued Gaussian Distribution
3.11 Matrix-Valued Wishart Distribution
3.12 Moment Method
3.13 Stieltjes Transform Method
3.14 Concentration of the Spectral Measure for Large Random Matrices
3.15 Future Directions
Bibliographical Remarks
4 Linear Spectral Statistics of the Sample Covariance Matrix
4.1 Linear Spectral Statistics
4.2 Generalized Marchenko-Pastur Distributions
4.3 Estimation of Spectral Density Functions
4.4 Limiting Spectral Distribution of Time Series
Bibliographical Remarks
5 Large Hermitian Random Matrices and Free Random Variables
5.1 Large Economic/Financial Systems
5.2 Matrix-Valued Probability
5.3 Wishart-Levy Free Stable Random Matrices
5.4 Basic Concepts for Free Random Variables
5.5 The Analytical Spectrum of the Wishart-Levy Random Matrix
5.6 Basic Properties of the Stieltjes Transform
5.7 Basic Theorems for the Stieltjes Transform
5.8 Free Probability for Hermitian Random Matrices
5.9 Random Vandermonde Matrix
5.10 Non-Asymptotic Analysis of State Estimation
Bibliographical Remarks
6 Large Non-Hermitian Random Matrices and Quaternionic Free Probability Theory
6.1 Quaternionic Free Probability Theory
6.2 R-diagonal Matrices
6.3 The Sum of Non-Hermitian Random Matrices
6.4 The Product of Non-Hermitian Random Matrices
6.5 Singular Value Equivalent Models
6.6 The Power of the Non-Hermitian Random Matrix
6.7 Power Series of Large Non-Hermitian Random Matrices
6.8 Products of Random Ginibre Matrices
6.9 Products of Rectangular Gaussian Random Matrices
6.10 Product of Complex Wishart Matrices
6.11 Spectral Relations between Products and Powers
6.12 Products of Finite-Size I.I.D. Gaussian Random Matrices
6.13 Lyapunov Exponents for Products of Complex Gaussian Random Matrices
6.14 Euclidean Random Matrices
6.15 Random Matrices with Independent Entries and the Circular Law
6.16 The Circular Law and Outliers
6.17 Random SVD, Single Ring Law, and Outliers
6.18 The Elliptic Law and Outliers
Bibliographical Remarks
7 The Mathematical Foundations of Data Collection
7.1 Architectures and Applications for Big Data
7.2 Covariance Matrix Estimation
7.3 Spectral Estimators for Large Random Matrices
7.4 Asymptotic Framework for Matrix Reconstruction
7.5 Optimum Shrinkage
7.6 A Shrinkage Approach to Large-Scale Covariance Matrix Estimation
7.7 Eigenvectors of Large Sample Covariance Matrix Ensembles
7.8 A General Class of Random Matrices
Bibliographical Remarks
8 Matrix Hypothesis Testing using Large Random Matrices
8.1 Motivating Examples
8.2 Hypothesis Test of Two Alternative Random Matrices
8.3 Eigenvalue Bounds for Expectation and Variance
8.4 Concentration of Empirical Distribution Functions
8.5 Random Quadratic Forms
8.6 Log-Determinant of Random Matrices
8.7 General MANOVA Matrices
8.8 Finite Rank Perturbations of Large Random Matrices
8.9 Hypothesis Tests for High-Dimensional Datasets
8.9.1 Motivation for Likelihood Ratio Test (LRT) and Covariance Matrix Tests
8.10 Roy's Largest Root Test
8.11 Optimal Tests of Hypotheses for Large Random Matrices
8.12 Matrix Elliptically Contoured Distributions
8.13 Hypothesis Testing for Matrix Elliptically Contoured Distributions
Bibliographical Remarks
Part II Smart Grid
9 Applications and Requirements of Smart Grid
9.1 History
9.2 Concepts and Vision
9.3 Today's Electric Grid
9.4 Future Smart Electrical Energy System
10 Technical Challenges for Smart Grid
Bibliographical Remarks
11 Big Data for Smart Grid
11.1 Power in Numbers: Big Data and Grid Infrastructure
11.2 Energy's Internet: The Convergence of Big Data and the Cloud
11.3 Edge Analytics: Consumers, Electric Vehicles, and Distributed Generation
11.4 Crosscutting Themes: Big Data
11.5 Cloud Computing for Smart Grid
11.6 Data Storage, Data Access and Data Analysis
11.7 The State-of-the-Art Processing Techniques of Big Data
11.8 Big Data Meets the Smart Electrical Grid
11.9 4Vs of Big Data: Volume, Variety, Value and Velocity
11.10 Cloud Computing for Big Data
11.11 Big Data for Smart Grid
11.12 Information Platforms for Smart Grid
Bibliographical Remarks
12 Grid Monitoring and State Estimation
12.1 Phase Measurement Unit
12.2 Optimal PMU Placement
12.3 State Estimation
12.4 Basics of State Estimation
12.5 Evolution of State Estimation
12.6 Static State Estimation
12.7 Forecasting-Aided State Estimation
12.8 Phasor Measurement Units
12.9 Distributed System State Estimation
12.10 Event-Triggered Approaches to State Estimation
12.11 Bad Data Detection
12.12 Improved Bad Data Detection
12.13 Cyber-Attacks
12.14 Line Outage Detection
Bibliographical Remarks
13 False Data Injection Attacks against State Estimation
13.1 State Estimation
13.2 False Data Injection Attacks
13.3 MMSE State Estimation and Generalized Likelihood Ratio Test
13.4 Sparse Recovery from Nonlinear Measurements
13.5 Real-Time Intrusion Detection
Bibliographical Remarks
14 Demand Response
14.1 Why Engage Demand?
14.2 Optimal Real-time Pricing Algorithms
14.3 Transportation Electrification and Vehicle-to-Grid Applications
14.4 Grid Storage
Bibliographical Remarks
Part III Communications and Sensing
15 Big Data for Communications
15.1 5G and Big Data
15.2 5G Wireless Communication Networks
15.3 Massive Multiple Input, Multiple Output
15.4 Free Probability for the Capacity of the Massive MIMO Channel
15.5 Spectral Sensing for Cognitive Radio
Bibliographical Remarks
16 Big Data for Sensing
16.1 Distributed Detection and Estimation
16.2 Euclidean Random Matrix
16.3 Decentralized Computing
Appendix A: Some Basic Results on Free Probability
Appendix B: Matrix-Valued Random Variables
References
Index
Data is "unreasonably effective" [2]. Nobel laureate Eugene Wigner referred to the unreasonable effectiveness of mathematics in the natural sciences [3]. What is big data? According to [4], its sizes are in the order of terabytes or petabytes; it is often online, and it is not available from a central source. It is diverse, may be loosely structured with a large percentage of data missing.It is heterogeneous.
The promise of data-driven decision-making is now broadly recognized [5-16]. There is no clear consensus about what big data is. In fact, there have been many controversial statements about big data, such as "Size is the only thing that matters."
Big data is a big deal [17]. The Big Data Research and Development Initiative has been launched by the US Federal government. "By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning" [17]. Universities are beginning to create new courses to prepare the next generation of "data scientists."
The age of big data has already arrived, with global data volumes doubling every two years. The utility industry is not the only one facing this issue (Wal-Mart handles a million customer transactions a day), but utilities have been slower to respond to the data deluge. Scaling algorithms up to massive datasets is a major challenge.
According to [18]:
A key tenet of big data is that the world and the data that describe it are constantly changing and organizations that can recognize the changes and react quickly and intelligently will have the upper hand. As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms, and user interfaces they employ will need to facilitate interactions with the people who work with the tools.
Data is a strategic resource, alongside natural resources and human resources. Data is king! "Big data" refers to a technology phenomenon that has arisen since the late 1980s [19]. As computers have improved, their growing storage and processing capacities have provided new and powerful ways to gain insight into the world by sifting through the enormous quantities of data now available. But this insight, discoverable in previously unseen patterns and trends within these phenomenally large data sets, can be hard to detect without new analytic tools that can comb through the information and highlight points of interest.
Sources such as online or mobile financial transactions, social media traffic, and GPS coordinates now generate over 2.5 quintillion bytes of so-called "big data" every day. The growth of mobile data traffic from subscribers in emerging markets exceeded 100% annually through 2015. This opens new possibilities for international development (see Figure 1.1).
Figure 1.1 Big data, big impact: new possibilities for international development.
Source: Reproduced from [6] with permission from the World Economic Forum.
At the societal level, big data provides a powerful microscope when combined with social mining, the ability to discover knowledge from these data. Scientific research is being revolutionized by this combination, and policy making is next in line, because big data and social mining provide novel means for measuring and monitoring wellbeing in our society more realistically than GDP alone: more precisely, continuously, and everywhere [20].
Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data [16].
Chris Anderson argued that the data deluge makes the scientific method obsolete [21]: with petabytes of data, correlation is enough, there is no need to build models, and correlation replaces causality. It remains to be seen whether the growth of data will lead to a fundamental change in scientific methods.
In the computing industry, the focus is now on how to process big data [22].
A fundamental question is: what is the unifying theory for big data? This book adopts the viewpoint that big data is a new science that combines data science and information science. At present, specialists in different fields deal with big data on their own, while information experts play a secondary role as assistants; in other words, most scientific problems are in the hands of specialists, and only a few problems, common to all fields, are refined by computing experts. As more and more problems are opened up, unifying challenges common to all fields will arise. Big data from the Internet may receive more attention first, but big data from physical systems will become more and more important.
Big data will form a unique discipline that requires expertise in mathematics, statistics, and computing algorithms.
Following the excellent review in [22], we highlight some challenges for big data:
The Defense Advanced Research Projects Agency's (DARPA's) XDATA program seeks to develop computational techniques and software tools for analyzing large volumes of data, both semistructured (e.g., tabular, relational, categorical, metadata) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include (i) developing scalable algorithms for processing imperfect data in distributed data stores, and (ii) creating effective human-computer interaction tools to facilitate rapidly customizable visual reasoning for diverse missions.
Data continues to be generated and digitally archived at increasing rates, resulting in vast databases available for search and analysis. Access to these databases has generated new insights through data-driven methods in the commercial, scientific, and computing sectors [23]. The defense sector, meanwhile, is "swimming in sensors and drowning in data." Big data arises from the Internet and from the monitoring of industrial equipment; sensor networks and the Internet of Things (IoT) are two further drivers.
There is a trend, especially in some defense applications, toward data that can be seen only once, for milliseconds, or that can be stored only for a short time before being deleted. This trend is accelerated by the proliferation of various digital devices and the Internet. It is important to develop fast, scalable, and efficient methods for processing and visualizing such data.
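As a concrete illustration of what "seen only once" implies for algorithm design, the sketch below, which is our own illustrative assumption rather than part of the original text, computes a running mean and variance in a single pass in the style of Welford's algorithm: each sample is folded into the statistics and can then be discarded immediately.

```python
# Minimal one-pass (streaming) statistics sketch; illustrative only.

class RunningStats:
    """Running mean and variance updated from a stream, without storing the samples."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one sample into the statistics (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Unbiased sample variance of everything seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for sample in (0.98, 1.01, 1.02, 0.99, 1.00):  # hypothetical streaming sensor values
        stats.update(sample)  # the sample can be deleted right after this call
    print(stats.n, stats.mean, stats.variance)
```

Because the update touches each sample exactly once and keeps only three numbers of state, the same pattern scales to data that is deleted milliseconds after it is produced.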
The XDATA program's technology development is approached through four technical areas (TAs).
It is useful to consider distributed computing through architectures such as MapReduce and its open-source implementation, Hadoop. Data collected by the Department of Defense (DoD) are particularly difficult to deal with: they include missing data, missing connections between data, incomplete data, corrupted data, data of variable size and type, and so forth [23]. We need to develop analytical principles and implementations that scale to the data volume and to distributed computer architectures. The challenge for Technical Area 1 is how to enable the systematic use of big data across a range of topic areas.
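To make the MapReduce pattern mentioned above concrete, the following is a minimal single-process sketch in Python. It only illustrates the map-shuffle-reduce flow; it is not Hadoop code, and the smart-meter readings, feeder identifiers, and function names are hypothetical.

```python
from collections import defaultdict

# Single-process illustration of the MapReduce pattern (not a Hadoop API).
# Input records are hypothetical smart-meter readings: (feeder_id, kwh).

def map_phase(records):
    """Map: emit one (key, value) pair per record, here (feeder_id, kwh)."""
    for feeder_id, kwh in records:
        yield feeder_id, kwh

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values, here into a per-feeder energy total."""
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    readings = [("feeder_A", 1.25), ("feeder_B", 0.5),
                ("feeder_A", 2.5), ("feeder_B", 1.75)]
    totals = reduce_phase(shuffle(map_phase(readings)))
    print(totals)  # {'feeder_A': 3.75, 'feeder_B': 2.25}
```

In an actual Hadoop or similar deployment, the map and reduce steps run in parallel on many machines over a distributed file system, and the shuffle is handled by the framework rather than in memory; the programmer supplies only the map and reduce functions.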