Acknowledgments, xix
1 Introduction, 1
1.1 Systems and their characteristics, 1
1.1.1 Classes of systems, 1
1.1.2 System states, 1
1.1.3 Change of state, 2
1.1.4 Thermodynamic entropy, 3
1.1.5 Evolutive connotation of entropy, 5
1.1.6 Statistical mechanical entropy, 5
1.2 Informational entropies, 7
1.2.1 Types of entropies, 8
1.2.2 Shannon entropy, 9
1.2.3 Information gain function, 12
1.2.4 Boltzmann, Gibbs and Shannon entropies, 14
1.2.5 Negentropy, 15
1.2.6 Exponential entropy, 16
1.2.7 Tsallis entropy, 18
1.2.8 Rényi entropy, 19
1.3 Entropy, information, and uncertainty, 21
1.3.1 Information, 22
1.3.2 Uncertainty and surprise, 24
1.4 Types of uncertainty, 25
1.5 Entropy and related concepts, 27
1.5.1 Information content of data, 27
1.5.2 Criteria for model selection, 28
1.5.3 Hypothesis testing, 29
1.5.4 Risk assessment, 29
Questions, 29
References, 31
Additional References, 32
2 Entropy Theory, 33
2.1 Formulation of entropy, 33
2.2 Shannon entropy, 39
2.3 Connotations of information and entropy, 42
2.3.1 Amount of information, 42
2.3.2 Measure of information, 43
2.3.3 Source of information, 43
2.3.4 Removal of uncertainty, 44
2.3.5 Equivocation, 45
2.3.6 Average amount of information, 45
2.3.7 Measurement system, 46
2.3.8 Information and organization, 46
2.4 Discrete entropy: univariate case and marginal entropy, 46
2.5 Discrete entropy: bivariate case, 52
2.5.1 Joint entropy, 53
2.5.2 Conditional entropy, 53
2.5.3 Transinformation, 57
2.6 Dimensionless entropies, 79
2.7 Bayes theorem, 80
2.8 Informational correlation coefficient, 88
2.9 Coefficient of nontransferred information, 90
2.10 Discrete entropy: multidimensional case, 92
2.11 Continuous entropy, 93
2.11.1 Univariate case, 94
2.11.2 Differential entropy of continuous variables, 97
2.11.3 Variable transformation and entropy, 99
2.11.4 Bivariate case, 100
2.11.5 Multivariate case, 105
2.12 Stochastic processes and entropy, 105
2.13 Effect of proportional class interval, 107
2.14 Effect of the form of probability distribution, 110
2.15 Data with zero values, 111
2.16 Effect of measurement units, 113
2.17 Effect of averaging data, 115
2.18 Effect of measurement error, 116
2.19 Entropy in frequency domain, 118
2.20 Principle of maximum entropy, 118
2.21 Concentration theorem, 119
2.22 Principle of minimum cross entropy, 122
2.23 Relation between entropy and error probability, 123
2.24 Various interpretations of entropy, 125
2.24.1 Measure of randomness or disorder, 125
2.24.2 Measure of unbiasedness or objectivity, 125
2.24.3 Measure of equality, 125
2.24.4 Measure of diversity, 126
2.24.5 Measure of lack of concentration, 126
2.24.6 Measure of flexibility, 126
2.24.7 Measure of complexity, 126
2.24.8 Measure of departure from uniform distribution, 127
2.24.9 Measure of interdependence, 127
2.24.10 Measure of dependence, 128
2.24.11 Measure of interactivity, 128
2.24.12 Measure of similarity, 129
2.24.13 Measure of redundancy, 129
2.24.14 Measure of organization, 130
2.25 Relation between entropy and variance, 133
2.26 Entropy power, 135
2.27 Relative frequency, 135
2.28 Application of entropy theory, 136
Questions, 136
References, 137
Additional Reading, 139
3 Principle of Maximum Entropy, 142
3.1 Formulation, 142
3.2 POME formalism for discrete variables, 145
3.3 POME formalism for continuous variables, 152
3.3.1 Entropy maximization using the method of Lagrange multipliers, 152
3.3.2 Direct method for entropy maximization, 157
3.4 POME formalism for two variables, 158
3.5 Effect of constraints on entropy, 165
3.6 Invariance of total entropy, 167
Questions, 168
References, 170
Additional Reading, 170
4 Derivation of POME-Based Distributions, 172
4.1 Discrete variable and discrete distributions, 172
4.1.1 Constraint E[x] and the Maxwell-Boltzmann distribution, 172
4.1.2 Two constraints and Bose-Einstein distribution, 174
4.1.3 Two constraints and Fermi-Dirac distribution, 177
4.1.4 Intermediate statistics distribution, 178
4.1.5 Constraint: E[N]: Bernoulli distribution for a single trial, 179
4.1.6 Binomial distribution for repeated trials, 180
4.1.7 Geometric distribution: repeated trials, 181
4.1.8 Negative binomial distribution: repeated trials, 183
4.1.9 Constraint: E[N] = n: Poisson distribution, 183
4.2 Continuous variable and continuous distributions, 185
4.2.1 Finite interval [a, b], no constraint, and rectangular distribution, 185
4.2.2 Finite interval [a, b], one constraint and truncated exponential distribution, 186
4.2.3 Finite interval [0, 1], two constraints E[ln x] and E[ln(1 - x)] and beta distribution of first kind, 188
4.2.4 Semi-infinite interval (0, ∞), one constraint E[x] and exponential distribution, 191
4.2.5 Semi-infinite interval, two constraints E[x] and E[ln x] and gamma distribution, 192
4.2.6 Semi-infinite interval, two constraints E[ln x] and E[ln(1 + x)] and beta distribution of second kind, 194
4.2.7 Infinite interval, two constraints E[x] and E[x²] and normal distribution, 195
4.2.8 Semi-infinite interval, log-transformation Y = ln X, two constraints E[y] and E[y²] and log-normal distribution, 197
4.2.9 Infinite and semi-infinite intervals: constraints and distributions, 199
Questions, 203
References, 208
Additional Reading, 208
5 Multivariate Probability Distributions, 213
5.1 Multivariate normal distributions, 213
5.1.1 One time lag serial dependence, 213
5.1.2 Two-lag serial dependence, 221
5.1.3 Multi-lag serial dependence, 229
5.1.4 No serial dependence: bivariate case, 234
5.1.5 Cross-correlation and serial dependence: bivariate case, 238
5.1.6 Multivariate case: no serial dependence, 244
5.1.7 Multi-lag serial dependence, 245
5.2 Multivariate exponential distributions, 245
5.2.1 Bivariate exponential distribution, 245
5.2.2 Trivariate exponential distribution, 254
5.2.3 Extension to Weibull distribution, 257
5.3 Multivariate distributions using the entropy-copula method, 258
5.3.1 Families of copula, 259
5.3.2 Application, 260
5.4 Copula entropy, 265
Questions, 266
References, 267
Additional Reading, 268
6 Principle of Minimum Cross-Entropy, 270
6.1 Concept and formulation of POMCE, 270
6.2 Properties of POMCE, 271
6.3 POMCE formalism for discrete variables, 275
6.4 POMCE formulation for continuous variables, 279
6.5 Relation to POME, 280
6.6 Relation to mutual information, 281
6.7 Relation to variational distance, 281
6.8 Lin's directed divergence measure, 282
6.9 Upper bounds for cross-entropy, 286
Questions, 287
References, 288
Additional Reading, 289
7 Derivation of POMCE-Based Distributions, 290
7.1 Discrete variable and mean E[x] as a constraint, 290
7.1.1 Uniform prior distribution, 291
7.1.2 Arithmetic prior distribution, 293
7.1.3 Geometric prior distribution, 294
7.1.4 Binomial prior distribution, 295
7.1.5 General prior distribution, 297
7.2 Discrete variable taking on an infinite set of values, 298
7.2.1 Improper prior probability distribution, 298
7.2.2 A priori Poisson probability distribution, 301
7.2.3 A priori negative binomial distribution, 304
7.3 Continuous variable: general formulation, 305
7.3.1 Uniform prior and mean constraint, 307
7.3.2 Exponential prior and mean and mean log constraints, 308
Questions, 308
References, 309
8 Parameter Estimation, 310
8.1 Ordinary entropy-based parameter estimation method, 310
8.1.1 Specification of constraints, 311
8.1.2 Derivation of entropy-based distribution, 311
8.1.3 Construction of zeroth Lagrange multiplier, 311
8.1.4 Determination of Lagrange multipliers, 312
8.1.5 Determination of distribution parameters, 313
8.2 Parameter-space expansion method, 325
8.3 Contrast with method of maximum likelihood estimation (MLE), 329
8.4 Parameter estimation by numerical methods, 331
Questions, 332
References, 333
Additional Reading, 334
9 Spatial Entropy, 335
9.1 Organization of spatial data, 336
9.1.1 Distribution, density, and aggregation, 337
9.2 Spatial entropy statistics, 339
9.2.1 Redundancy, 343
9.2.2 Information gain, 345
9.2.3 Disutility entropy, 352
9.3 One-dimensional aggregation, 353
9.4 Another approach to spatial representation, 360
9.5 Two-dimensional aggregation, 363
9.5.1 Probability density function and its resolution, 372
9.5.2 Relation between spatial entropy and spatial disutility, 375
9.6 Entropy maximization for modeling spatial phenomena, 376
9.7 Cluster analysis by entropy maximization, 380
9.8 Spatial visualization and mapping, 384
9.9 Scale and entropy, 386
9.10 Spatial probability distributions, 388
9.11 Scaling: rank size rule and Zipf's law, 391
9.11.1 Exponential law, 391
9.11.2 Log-normal law, 391
9.11.3 Power law, 392
9.11.4 Law of proportionate effect, 392
Questions, 393
References, 394
Further Reading, 395
10 Inverse Spatial Entropy, 398
10.1 Definition, 398
10.2 Principle of entropy decomposition, 402
10.3 Measures of information gain, 405
10.3.1 Bivariate measures, 405
10.3.2 Map representation, 410
10.3.3 Construction of spatial measures, 412
10.4 Aggregation properties, 417
10.5 Spatial interpretations, 420
10.6 Hierarchical decomposition, 426
10.7 Comparative measures of spatial decomposition, 428
Questions, 433
References, 435
11 Entropy Spectral Analyses, 436
11.1 Characteristics of time series, 436
11.1.1 Mean, 437
11.1.2 Variance, 438
11.1.3 Covariance, 440
11.1.4 Correlation, 441
11.1.5 Stationarity, 443
11.2 Spectral analysis, 446
11.2.1 Fourier representation, 448
11.2.2 Fourier transform, 453
11.2.3 Periodogram, 454
11.2.4 Power, 457
11.2.5 Power spectrum, 461
11.3 Spectral analysis using maximum entropy, 464
11.3.1 Burg method, 465
11.3.2 Kapur-Kesavan method, 473
11.3.3 Maximization of entropy, 473
11.3.4 Determination of Lagrange multipliers λk, 476
11.3.5 Spectral density, 479
11.3.6 Extrapolation of autocovariance functions, 482
11.3.7 Entropy of power spectrum, 482
11.4 Spectral estimation using configurational entropy, 483
11.5 Spectral estimation by mutual information principle, 486
References, 490
Additional Reading, 490
12 Minimum Cross Entropy Spectral Analysis, 492
12.1 Cross-entropy, 492
12.2 Minimum cross-entropy spectral analysis (MCESA), 493
12.2.1 Power spectrum probability density function, 493
12.2.2 Minimum cross-entropy-based probability density functions given total expected spectral powers at each frequency, 498
12.2.3 Spectral probability density functions for white noise, 501
12.3 Minimum cross-entropy power spectrum given auto-correlation, 503
12.3.1 No prior power spectrum estimate is given, 504
12.3.2 A prior power spectrum estimate is given, 505
12.3.3 Given spectral powers: Tk = Gj, Gj = Pk, 506
12.4 Cross-entropy between input and output of linear filter, 509
12.4.1 Given input signal PDF, 509
12.4.2 Given prior power spectrum, 510
12.5 Comparison, 512
12.6 Towards efficient algorithms, 514
12.7 General method for minimum cross-entropy spectral estimation, 515
References, 515
Additional References, 516
13 Evaluation and Design of Sampling and Measurement Networks, 517
13.1 Design considerations, 517
13.2 Information-related approaches, 518
13.2.1 Information variance, 518
13.2.2 Transfer function variance, 520
13.2.3 Correlation, 521
13.3 Entropy measures, 521
13.3.1 Marginal entropy, joint entropy, conditional entropy and transinformation, 521
13.3.2 Informational correlation coefficient, 523
13.3.3 Isoinformation, 524
13.3.4 Information transfer function, 524
13.3.5 Information distance, 525
13.3.6 Information area, 525
13.3.7 Application to rainfall networks, 525
13.4 Directional information transfer index, 530
13.4.1 Kernel estimation, 531
13.4.2 Application to groundwater quality networks, 533
13.5 Total correlation, 537
13.6 Maximum information minimum redundancy (MIMR), 539
13.6.1 Optimization, 541
13.6.2 Selection procedure, 542
Questions, 553
References, 554
Additional Reading, 556
14 Selection of Variables and Models, 559
14.1 Methods for selection, 559
14.2 Kullback-Leibler (KL) distance, 560
14.3 Variable selection, 560
14.4 Transitivity, 561
14.5 Logit model, 561
14.6 Risk and vulnerability assessment, 574
14.6.1 Hazard assessment, 576
14.6.2 Vulnerability assessment, 577
14.6.3 Risk assessment and ranking, 578
Questions, 578
References, 579
Additional Reading, 580
15 Neural Networks, 581
15.1 Single neuron, 581
15.2 Neural network training, 585
15.3 Principle of maximum information preservation, 588
15.4 A single neuron corrupted by processing noise, 589
15.5 A single neuron corrupted by additive input noise, 592
15.6 Redundancy and diversity, 596
15.7 Decision trees and entropy nets, 598
Questions, 602
References, 603
16 System Complexity, 605
16.1 Ferdinand's measure of complexity, 605
16.1.1 Specification of constraints, 606
16.1.2 Maximization of entropy, 606
16.1.3 Determination of Lagrange multipliers, 606
16.1.4 Partition function, 607
16.1.5 Analysis of complexity, 610
16.1.6 Maximum entropy, 614
16.1.7 Complexity as a function of N, 616
16.2 Kapur's complexity analysis, 618
16.3 Cornacchio's generalized complexity measures, 620
16.3.1 Special case: R = 1, 624
16.3.2 Analysis of complexity: non-unique K-transition points and conditional complexity, 624
16.4 Kapur's simplification, 627
16.5 Kapur's measure, 627
16.6 Hypothesis testing, 628
16.7 Other complexity measures, 628
Questions, 631
References, 631
Additional References, 632
Author Index, 633
Subject Index, 639
Since Shannon's development of informational entropy in 1948, the literature on entropy has grown by leaps and bounds, and it is almost impossible to provide a comprehensive treatment of all facets of entropy under one cover. Thermodynamics, statistical mechanics, and informational statistics lay the foundation for what we now know as entropy theory. Soofi (1994) perhaps best summed up the main pillars in the evolution of entropy for quantifying information. Using a pyramid, he summarized information-theoretic statistics as shown in Figure 2.1, wherein the informational entropy developed by Shannon (1948) occupies the vertex. The base of the pyramid represents three distinct extensions, each a variant of quantifying information: discrimination information (Kullback, 1959), mutual information (Lindley, 1956, 1961), and the principle of maximum entropy (POME) or information (Jaynes, 1957, 1968, 1982). The lateral faces of the pyramid are three planes: 1) the SKJ (Shannon-Kullback-Jaynes) minimum discrimination information plane, 2) the SLK (Shannon-Lindley-Kullback) mutual information plane, and 3) the SLJ (Shannon-Lindley-Jaynes) Bayesian information theory plane. Most information-based contributions can be located on one of the faces or in the interior of the pyramid. The discussion of what we call entropy theory in this chapter touches on aspects of all three faces, though not exhaustively.
Figure 2.1 Pyramid showing information-theoretic statistics.
Entropy theory may be regarded as comprising four parts: 1) Shannon entropy, 2) the principle of maximum entropy, 3) the principle of minimum cross entropy, and 4) the concentration theorem. The first three are the main parts and are the most frequently used. One can also employ the Tsallis entropy or another type of entropy in place of the Shannon entropy for some problems. Before discussing all four parts, it will be instructive to expand on the formulation of entropy presented in Chapter 1.
In order to explain entropy, consider a random variable X which can take on N equally likely values. For example, if a six-faced die is thrown, any face bearing the number 1, 2, 3, 4, 5, or 6 has an equal chance of appearing on a throw. It is now assumed that a certain value of X (the face of the die bearing that number, i.e., the outcome of the throw) is known to only one person. Another person would like to determine the outcome (face) of the throw by asking the person who knows the answer questions that can be answered only with yes or no. Thus, the number of alternatives for a face to turn up in this case is six, that is, N = 6. It can be shown that the minimum number of questions to be asked in order to ascertain the true outcome is:
\( I = \log_2 N = -\log_2 (1/N) \)   (2.1)
where I represents the amount of information required to determine the particular value of X, 1/N defines the probability of finding the unknown value of X by asking a single question when all outcomes are equally likely, and the logarithm is taken to base 2. If nothing else is known about the variable, then it must be assumed that all values are equally likely, in accordance with the principle of insufficient reason.
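As a quick numerical check of equation (2.1), the short sketch below (Python is used here purely for illustration) evaluates I = log₂ N for the values of N appearing in this section; the ceiling of I is the number of yes/no questions one actually has to ask.

```python
import math

# Equation (2.1): I = log2(N) bits when all N outcomes are equally likely.
for n in (2, 4, 6, 7, 100):
    i_bits = math.log2(n)
    # A question cannot be asked fractionally, so round I up to whole questions.
    print(f"N = {n:3d}: I = {i_bits:.3f} bits, at most {math.ceil(i_bits)} yes/no questions")
```

For the die (N = 6) this gives 2.585 bits, i.e., three questions, as worked out in the example below.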
In general,
\( I = -\log_2 p_i = \log_2 (1/p_i) \)   (2.2)
where p_i is the probability of outcome i, i = 1, 2, …, N. Here I can be viewed as the minimum amount of information required to positively ascertain the outcome of X upon a throw. Stated another way, it is the amount of information gained after observing the event X = x, which occurs with probability 1/N in the equally likely case. In other words, I is a measure of information and is a function of N. The base of the logarithm is 2 because the questions being posed (i.e., questions admitting only yes or no answers) are binary. The point to keep in mind when asking questions is to gain information, not assent or dissent; hence in many cases a yes is as good an answer as a no. This information measure, equation (2.2), satisfies the following properties:
If a six-faced die is thrown, any face bearing the number 1, 2, 3, 4, 5, or 6 has an equal chance of appearing. The outcome of the first throw is the number 5, which is known only to a person A. How many questions does one need to ask this person, or how much information is required, to positively ascertain the outcome of this throw?
Solution:
In this case, N = 6. Therefore, I = −log₂(1/6) = log₂ 6 = 2.585 bits. This gives the minimum amount of information needed or, equivalently, the number of questions to be asked in binary (yes or no) form. The number of questions that need to be asked is a measure of uncertainty. The questioning can go like this: Is the outcome between 1 and 3? If the answer is no, then it must be between 4 and 6. The second question can then be: Is it between 4 and 5? If the answer is yes, the next question is: Is it 4? If the answer is no, then the outcome has to be 5. In this manner entropy provides an efficient way of obtaining the answer. In vigilance, investigative, or police work, entropy can suggest an effective way of structuring interrogation.
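The question sequence in the solution above is simply a binary search over the faces {1, …, 6}. A minimal sketch of that halving strategy (the function name is chosen here only for illustration) counts the questions asked:

```python
def guess_by_halving(outcome: int, low: int, high: int) -> int:
    """Locate `outcome` in [low, high] by asking 'is it <= mid?' yes/no questions.
    Returns the number of questions asked."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if outcome <= mid:      # answer "yes": keep the lower half
            high = mid
        else:                   # answer "no": keep the upper half
            low = mid + 1
    return questions

print(guess_by_halving(5, 1, 6))  # 3 questions, the ceiling of log2(6) = 2.585 bits
```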
Another example of interest is a lottery. Suppose that N tickets are sold; the winning ticket must be one of them, that is, the number of chances is N. Let N be 100, each ticket bearing a number between 1 and 100. One person, called Jack, knows which is the winning ticket. Another person, called Mike, would like to identify the winning ticket by asking Jack a series of questions whose answers will be in the form of yes or no. Find the winning ticket.
The number of binary questions needed to determine the winning ticket is given by equation (2.1). For N = 100, I = log₂ 100 = 6.64, so it will take about 6.64 questions, in practice at most 7, to find the winning ticket. To illustrate this point, the questioning might go as shown in Table 2.1, which shows how much information is gained simply by asking binary questions. The best way of questioning and finding an answer is to divide the set of remaining possibilities in half with each question. This is similar to the bisection method used in numerical analysis to determine the root of a function.
Table 2.1 Questioning for finding the winning lottery ticket.
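The kind of question-and-answer sequence Table 2.1 tabulates can be reproduced with the same halving strategy. In the sketch below the winning ticket number (73) is hypothetical, chosen only to produce a concrete trace:

```python
def question_trace(winning: int, n_tickets: int = 100) -> None:
    """Print the halving questions and their yes/no answers for the lottery example."""
    low, high, count = 1, n_tickets, 0
    while low < high:
        mid = (low + high) // 2
        count += 1
        answer = winning <= mid
        print(f"Q{count}: Is the winning ticket between {low} and {mid}? "
              f"{'Yes' if answer else 'No'}")
        if answer:
            high = mid
        else:
            low = mid + 1
    print(f"Winning ticket: {low}, found with {count} questions (log2(100) = 6.64 bits)")

question_trace(winning=73)  # any ticket is found in at most 7 questions
```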
Consider another case where two coins are tossed and a person knows the outcome. There are four alternatives in which head or tail can appear on the first and second coins, respectively: head and tail, head and head, tail and tail, and tail and head; one can simply write the number of alternatives as N = 2² = 4. The number of questions to be asked in order to ascertain the outcome is again given by equation (2.1): I = log₂ 4 = log₂ 2² = 2.
Consider that the probability of rain on any day of a given week is the same. In that week it rained on a certain day, and one person knows which day it rained. Another person would like to learn that day by asking questions of the person who knows the answer, with the answers given in binary form, that is, yes or no. What is the minimum number of questions that must be asked to determine the day it rained?
In this case, N = 7. Therefore, the minimum number of questions to be asked is I = −log₂(1/7) = log₂ 7 = 2.807, that is, at most three binary questions.
In the above discussion the base of the logarithm is 2 because of the binary nature of the answers, and the questioning is done such that the number of alternatives is halved each time a question is asked. If the possible responses to a question are three, rather than two, then with n questions one can cover 3ⁿ possibilities. For example, an answer to a question about the weather may be hot, cold, or pleasant. Similarly, for a crop, farmers may respond bumper, medium, or poor. In such cases, the number of questions needed to cover N cases is given as
\( I = \log_3 N \)   (2.3) …
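Equation (2.3) generalizes in the obvious way: with b possible answers per question, n questions distinguish bⁿ cases, so roughly log_b N questions are needed. The sketch below (illustrative only, not taken from the text) compares binary and ternary questioning for the N values used in the examples above:

```python
import math

def questions_needed(n_cases: int, answers_per_question: int) -> float:
    """Number of b-ary questions needed to single out one of n_cases outcomes: log_b(N)."""
    return math.log(n_cases) / math.log(answers_per_question)

for n in (6, 7, 100):
    binary = questions_needed(n, 2)
    ternary = questions_needed(n, 3)
    print(f"N = {n:3d}: {binary:.2f} binary questions (ask {math.ceil(binary)}), "
          f"{ternary:.2f} ternary questions (ask {math.ceil(ternary)})")
```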