Preface
Acknowledgments
Some Notation
1 Introduction
1.1 Big Data: Basic Concepts
1.2 Data Mining with Big Data
1.3 A Mathematical Introduction to Big Data
1.4 A Mathematical Theory of Big Data
1.5 Smart Grid
1.6 Big Data and Smart Grid
1.7 Reading Guide
Bibliographical Remarks
Part I Fundamentals of Big Data
2 The Mathematical Foundations of Big Data Systems
2.1 Big Data Analytics
2.2 Big Data: Sense, Collect, Store, and Analyze
2.3 Intelligent Algorithms
2.4 Signal Processing for Smart Grid
2.5 Monitoring and Optimization for Power Grids
2.6 Distributed Sensing and Measurement for Power Grids
2.7 Real-time Analysis of Streaming Data
2.8 Salient Features of Big Data
2.9 Big Data for Quantum Systems
2.10 Big Data for Financial Systems
2.11 Big Data for Atmospheric Systems
2.12 Big Data for Sensing Networks
2.13 Big Data for Wireless Networks
2.14 Big Data for Transportation
Bibliographical Remarks
3 Large Random Matrices: An Introduction
3.1 Modeling of Large Dimensional Data as Random Matrices
3.2 A Brief of Random Matrix Theory
3.3 Change Point of Views: From Vectors to Measures
3.4 The Stieltjes Transform of Measures
3.5 A Fundamental Result: The Marchenko-Pastur Equation
3.6 Linear Eigenvalue Statistics and Limit Laws
3.7 Central Limit Theorem for Linear Eigenvalue Statistics
3.8 Central Limit Theorem for Random Matrix S⁻¹T
3.9 Independence for Random Matrices
3.10 Matrix-Valued Gaussian Distribution
3.11 Matrix-Valued Wishart Distribution
3.12 Moment Method
3.13 Stieltjes Transform Method
3.14 Concentration of the Spectral Measure for Large Random Matrices
3.15 Future Directions
Bibliographical Remarks
4 Linear Spectral Statistics of the Sample Covariance Matrix
4.1 Linear Spectral Statistics
4.2 Generalized Marchenko-Pastur Distributions
4.3 Estimation of Spectral Density Functions
4.4 Limiting Spectral Distribution of Time Series
Bibliographical Remarks
5 Large Hermitian Random Matrices and Free Random Variables
5.1 Large Economic/Financial Systems
5.2 Matrix-Valued Probability
5.3 Wishart-Levy Free Stable Random Matrices
5.4 Basic Concepts for Free Random Variables
5.5 The Analytical Spectrum of the Wishart-Levy Random Matrix
5.6 Basic Properties of the Stieltjes Transform
5.7 Basic Theorems for the Stieltjes Transform
5.8 Free Probability for Hermitian Random Matrices
5.9 Random Vandermonde Matrix
5.10 Non-Asymptotic Analysis of State Estimation
Bibliographical Remarks
6 Large Non-Hermitian Random Matrices and Quaternionic Free Probability Theory
6.1 Quaternionic Free Probability Theory
6.2 R-diagonal Matrices
6.3 The Sum of Non-Hermitian Random Matrices
6.4 The Product of Non-Hermitian Random Matrices
6.5 Singular Value Equivalent Models
6.6 The Power of the Non-Hermitian Random Matrix
6.7 Power Series of Large Non-Hermitian Random Matrices
6.8 Products of Random Ginibre Matrices
6.9 Products of Rectangular Gaussian Random Matrices
6.10 Product of Complex Wishart Matrices
6.11 Spectral Relations between Products and Powers
6.12 Products of Finite-Size I.I.D. Gaussian Random Matrices
6.13 Lyapunov Exponents for Products of Complex Gaussian Random Matrices
6.14 Euclidean Random Matrices
6.15 Random Matrices with Independent Entries and the Circular Law
6.16 The Circular Law and Outliers
6.17 Random SVD, Single Ring Law, and Outliers
6.18 The Elliptic Law and Outliers
Bibliographical Remarks
7 The Mathematical Foundations of Data Collection
7.1 Architectures and Applications for Big Data
7.2 Covariance Matrix Estimation
7.3 Spectral Estimators for Large Random Matrices
7.4 Asymptotic Framework for Matrix Reconstruction
7.5 Optimum Shrinkage
7.6 A Shrinkage Approach to Large-Scale Covariance Matrix Estimation
7.7 Eigenvectors of Large Sample Covariance Matrix Ensembles
7.8 A General Class of Random Matrices
Bibliographical Remarks
8 Matrix Hypothesis Testing using Large Random Matrices
8.1 Motivating Examples
8.2 Hypothesis Test of Two Alternative Random Matrices
8.3 Eigenvalue Bounds for Expectation and Variance
8.4 Concentration of Empirical Distribution Functions
8.5 Random Quadratic Forms
8.6 Log-Determinant of Random Matrices
8.7 General MANOVA Matrices
8.8 Finite Rank Perturbations of Large Random Matrices
8.9 Hypothesis Tests for High-Dimensional Datasets
8.9.1 Motivation for Likelihood Ratio Test (LRT) and Covariance Matrix Tests
8.10 Roy's Largest Root Test
8.11 Optimal Tests of Hypotheses for Large Random Matrices
8.12 Matrix Elliptically Contoured Distributions
8.13 Hypothesis Testing for Matrix Elliptically Contoured Distributions
Bibliographical Remarks
Part II Smart Grid
9 Applications and Requirements of Smart Grid
9.1 History
9.2 Concepts and Vision
9.3 Today's Electric Grid
9.4 Future Smart Electrical Energy System
10 Technical Challenges for Smart Grid
Bibliographical Remarks
11 Big Data for Smart Grid
11.1 Power in Numbers: Big Data and Grid Infrastructure
11.2 Energy's Internet: The Convergence of Big Data and the Cloud
11.3 Edge Analytics: Consumers, Electric Vehicles, and Distributed Generation
11.4 Crosscutting Themes: Big Data
11.5 Cloud Computing for Smart Grid
11.6 Data Storage, Data Access and Data Analysis
11.7 The State-of-the-Art Processing Techniques of Big Data
11.8 Big Data Meets the Smart Electrical Grid
11.9 4Vs of Big Data: Volume, Variety, Value and Velocity
11.10 Cloud Computing for Big Data
11.11 Big Data for Smart Grid
11.12 Information Platforms for Smart Grid
Bibliographical Remarks
12 Grid Monitoring and State Estimation
12.1 Phase Measurement Unit
12.2 Optimal PMU Placement
12.3 State Estimation
12.4 Basics of State Estimation
12.5 Evolution of State Estimation
12.6 Static State Estimation
12.7 Forecasting-Aided State Estimation
12.8 Phasor Measurement Units
12.9 Distributed System State Estimation
12.10 Event-Triggered Approaches to State Estimation
12.11 Bad Data Detection
12.12 Improved Bad Data Detection
12.13 Cyber-Attacks
12.14 Line Outage Detection
Bibliographical Remarks
13 False Data Injection Attacks against State Estimation
13.1 State Estimation
13.2 False Data Injection Attacks
13.3 MMSE State Estimation and Generalized Likelihood Ratio Test
13.4 Sparse Recovery from Nonlinear Measurements
13.5 Real-Time Intrusion Detection
Bibliographical Remarks
14 Demand Response
14.1 Why Engage Demand?
14.2 Optimal Real-time Pricing Algorithms
14.3 Transportation Electrification and Vehicle-to-Grid Applications
14.4 Grid Storage
Bibliographical Remarks
Part III Communications and Sensing
15 Big Data for Communications
15.1 5G and Big Data
15.2 5G Wireless Communication Networks
15.3 Massive Multiple Input, Multiple Output
15.4 Free Probability for the Capacity of the Massive MIMO Channel
15.5 Spectral Sensing for Cognitive Radio
Bibliographical Remarks
16 Big Data for Sensing
16.1 Distributed Detection and Estimation
16.2 Euclidean Random Matrix
16.3 Decentralized Computing
Appendix A: Some Basic Results on Free Probability
Appendix B: Matrix-Valued Random Variables
References
Index
Data is "unreasonably effective" [2]. Nobel laureate Eugene Wigner referred to the unreasonable effectiveness of mathematics in the natural sciences [3]. What is big data? According to [4], its sizes are in the order of terabytes or petabytes; it is often online, and it is not available from a central source. It is diverse, may be loosely structured with a large percentage of data missing.It is heterogeneous.
The promise of data-driven decision-making is now broadly recognized [5-16]. There is no clear consensus about what big data is. In fact, there have been many controversial statements about big data, such as "Size is the only thing that matters."
Big data is a big deal [17]. The Big Data Research and Development Initiative has been launched by the US Federal government. "By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning" [17]. Universities are beginning to create new courses to prepare the next generation of "data scientists."
The age of big data has already arrived, with global data volumes doubling every two years. The utility industry is not the only one facing this issue (Wal-Mart handles a million customer transactions a day), but utilities have been slower to respond to the data deluge. Scaling algorithms up to massive datasets is a major challenge.
According to [18]:
A key tenet of big data is that the world and the data that describe it are constantly changing and organizations that can recognize the changes and react quickly and intelligently will have the upper hand. As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms, and user interfaces they employ will need to facilitate interactions with the people who work with the tools.
Data is a strategic resource, alongside natural resources and human resources. Data is king! "Big data" refers to a technology phenomenon that has arisen since the late 1980s [19]. As computers have improved, their growing storage and processing capacities have provided new and powerful ways to gain insight into the world by sifting through the enormous quantities of data now available. But this insight, discoverable in previously unseen patterns and trends within these phenomenally large data sets, can be hard to detect without new analytic tools that can comb through the information and highlight points of interest.
Sources such as online or mobile financial transactions, social media traffic, and GPS coordinates now generate over 2.5 quintillion bytes of so-called "big data" every day. The growth of mobile data traffic from subscribers in emerging markets exceeded 100% annually through 2015. This opens new possibilities for international development (see Figure 1.1).
Figure 1.1 Big data, big impact: new possibilities for international development.
Source: Reproduced from [6] with permission from the World Economic Forum.
At the societal level, big data provides a powerful microscope when combined with social mining, the ability to discover knowledge from these data. Scientific research is being revolutionized by this combination, and policy making is next in line, because big data and social mining provide novel means for measuring and monitoring wellbeing in our society more realistically than GDP alone: more precisely, continuously, and everywhere [20].
Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data [16].
Chris Anderson argued that the data deluge makes the scientific method obsolete [21]: with petabytes of data, correlation is enough, there is no need to build models, and correlation replaces causality. It remains to be seen whether the growth of data will lead to a fundamental change in scientific methods.
In the computing industry, the focus is now on how to process big data [22].
A fundamental question is: what is the unifying theory for big data? This book adopts the viewpoint that big data is a new science that combines data science and information science. At present, specialists in different fields deal with big data on their own, while information experts play a secondary role as assistants; in other words, most scientific problems are in the hands of specialists, and only a few problems, common to all fields, are refined by computing experts. As more and more problems are opened up, unifying challenges common to all fields will arise. Big data from the Internet may receive more attention first, but big data from physical systems will become more and more important.
Big data will form a unique discipline that requires expertise in mathematics, statistics, and computing algorithms.
Following the excellent review in [22], we highlight some challenges for big data:
The Defense Advanced Research Projects Agency's (DARPA's) XDATA program seeks to develop computational techniques and software tools for analyzing large volumes of data, both semistructured (e.g., tabular, relational, categorical, metadata) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include (i) developing scalable algorithms for processing imperfect data in distributed data stores, and (ii) creating effective human-computer interaction tools to facilitate rapidly customizable visual reasoning for diverse missions.
Data continues to be generated and digitally archived at increasing rates, resulting in vast databases available for search and analysis. Access to these databases has generated new insights through data-driven methods in the commercial, scientific, and computing sectors [23]. The defense sector, meanwhile, is "swimming in sensors and drowning in data." Big data arises from the Internet and from the monitoring of industrial equipment; sensor networks and the Internet of Things (IoT) are two further drivers.
There is a trend, especially in some defense applications, toward data that can be seen only once, for milliseconds, or that can be stored only for a short time before being deleted. This trend is accelerated by the proliferation of various digital devices and the Internet. It is important to develop fast, scalable, and efficient methods for processing and visualizing such data.
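As a concrete illustration of what "seen only once" implies for algorithm design, the sketch below, which is our own illustrative assumption rather than part of the original text, computes a running mean and variance in a single pass in the style of Welford's algorithm: each sample is folded into the statistics and can then be discarded immediately.

```python
# Minimal one-pass (streaming) statistics sketch; illustrative only.

class RunningStats:
    """Running mean and variance updated from a stream, without storing the samples."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one sample into the statistics (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Unbiased sample variance of everything seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for sample in (0.98, 1.01, 1.02, 0.99, 1.00):  # hypothetical streaming sensor values
        stats.update(sample)  # the sample can be deleted right after this call
    print(stats.n, stats.mean, stats.variance)
```

Because the update touches each sample exactly once and keeps only three numbers of state, the same pattern scales to data that is deleted milliseconds after it is produced.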
The XDATA program's technology development is approached through four technical areas (TAs).
It is useful to consider distributed computing through architectures such as MapReduce and its open-source implementation, Hadoop. Data collected by the Department of Defense (DoD) are particularly difficult to deal with: they include missing data, missing connections between data, incomplete data, corrupted data, data of variable size and type, and so forth [23]. We need to develop analytical principles and implementations that scale to the data volume and to distributed computer architectures. The challenge for Technical Area 1 is how to enable the systematic use of big data across a range of topic areas.
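To make the MapReduce pattern mentioned above concrete, the following is a minimal single-process sketch in Python. It only illustrates the map-shuffle-reduce flow; it is not Hadoop code, and the smart-meter readings, feeder identifiers, and function names are hypothetical.

```python
from collections import defaultdict

# Single-process illustration of the MapReduce pattern (not a Hadoop API).
# Input records are hypothetical smart-meter readings: (feeder_id, kwh).

def map_phase(records):
    """Map: emit one (key, value) pair per record, here (feeder_id, kwh)."""
    for feeder_id, kwh in records:
        yield feeder_id, kwh

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values, here into a per-feeder energy total."""
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    readings = [("feeder_A", 1.25), ("feeder_B", 0.5),
                ("feeder_A", 2.5), ("feeder_B", 1.75)]
    totals = reduce_phase(shuffle(map_phase(readings)))
    print(totals)  # {'feeder_A': 3.75, 'feeder_B': 2.25}
```

In an actual Hadoop or similar deployment, the map and reduce steps run in parallel on many machines over a distributed file system, and the shuffle is handled by the framework rather than in memory; the programmer supplies only the map and reduce functions.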