
Mathematics for Digital Science 3
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
Over the past century, advancements in computer science have consistently resulted from extensive mathematical work. Even today, innovations in the digital domain continue to be grounded in a strong mathematical foundation. To succeed in this profession, both today's students and tomorrow's computer engineers need a solid mathematical background.
The goal of this book series is to offer a solid foundation of the knowledge essential to working in the digital sector. Across three volumes, it explores fundamental principles, digital information, data analysis, and optimization. Whether the reader is pursuing initial training or looking to deepen their expertise, the Mathematics for Digital Science series revisits familiar concepts, helping them refresh and expand their knowledge while also introducing equally essential, newer topics.
More details
Other editions
Additional editions

Persons
Gérard-Michel Cochard is Professor Emeritus at Université de Picardie Jules Verne, France, where he has held various senior positions. He has also served at the French Ministry of Education and the CNAM (Conservatoire National des Arts et Métiers). His research is conducted at the Eco-PRocédés, Optimisation et Aide à la Décision (EPROAD) laboratory, France.
Mhand Hifi is Professor of Computer Science at Université de Picardie Jules Verne, France, where he heads the EPROAD UR 4669 laboratory and manages the ROD team. As an expert in operations research and NP-hard problem-solving, he actively contributes to numerous international conferences and journals in the field.
Content
Preface ix
Chapter 1. Linear Modeling for Two-Dimensional Data 1
1.1. Basic statistics 1
1.2. Linear adjustment 3
1.3. Linear correlation 7
Chapter 2. Multidimensional Data Analysis 13
2.1. Multidimensional tables 14
2.2. Analysis of a point cloud 24
2.2.1. Spatial analysis of individuals 24
2.2.2. Analysis in variable space 30
2.2.3. Link between the two spaces 32
2.3. Principal component factor analysis 35
2.3.1. Principles 35
2.3.2. Principal component factor analysis in practice 41
2.4. Appendix: useful software 48
2.5. Tanagra software 48
2.6. R software 56
Chapter 3. Introduction to Automatic Classification 65
3.1. Similarity and distance 66
3.1.1. Similarity, dissimilarity 66
3.1.2. Distance 67
3.2. Basics of information theory 69
3.3. Classification methods 78
3.4. Classification by partitioning 80
3.4.1. Partitioning principle 80
3.4.2. Inertia 81
3.4.3. Mobile center algorithm 82
3.5. Hierarchical classification 89
3.5.1. Principle 89
3.5.2. Aggregation strategies 90
3.5.3. Ward's aggregation strategy 98
3.6. Appendices 101
3.6.1. Huygens-König theorem 101
3.6.2. Classification software 103
Chapter 4. Linear Programming 109
4.1. An introductory example 110
4.2. General formulation 113
4.3. Geometry of a linear program 116
4.4. Simplex algorithm 118
4.5. Two-phase method. 129
4.6. Duality 135
4.7. Degeneracy 139
4.8. Introduction to integer linear programming 140
4.8.1. Gomory cuts 140
4.8.2. Branch and Bound algorithm 151
Chapter 5. Elements of Graph Theory 159
5.1. Definition and representations of a graph 160
5.1.1. Graphical representation 160
5.1.2. Tables associated with a graph 161
5.2. Main concepts and technology 162
5.2.1. Elements of a graph 163
5.2.2. Analysis of graph structure 168
5.3. Transitive closure of a graph's vertices 172
5.4. Decomposition of a directed graph into strongly connected components 173
5.5. Trees 175
5.5.1. Introduction 175
5.5.2. Trees in a graph 175
5.5.3. Spanning tree in a graph 177
5.6. Finding a fundamental system of independent cycles of a connected graph 179
5.7. Extremum spanning tree 180
5.7.1. General information 180
5.7.2. Extremum spanning tree search algorithms 180
Chapter 6. Path Optimization 187
6.1. Extremal-length paths 188
6.1.1. General information 188
6.1.2. Solving algorithms 189
6.1.3. Searching for e -extremal paths 204
6.2. Hamiltonian path search 205
6.2.1. Decomposition into strongly connected components 205
6.2.2. Branch and Bound algorithm 207
Chapter 7. Transportation Problems 219
7.1. Maximum flow 220
7.1.1. General 220
7.1.2. Ford-Fulkerson theorem 223
7.1.3. Ford-Fulkerson algorithm 224
7.1.4. Practical solution method 230
7.2. Least cost transportation 236
7.2.1. Issues 236
7.2.2. Northwest Corner algorithm 238
7.2.3. Least cost method 240
7.2.4. Balas-Hammer or Vogel's algorithm 243
7.2.5. Stepping stone algorithm 246
7.2.6. Inequality of supply and demand 256
7.2.7. Unconnected graph and degeneration 257
7.2.8. Potentials and dual program 261
7.3. Assignment problems 263
7.3.1. Issues 263
7.3.2. Hungarian algorithm 269
Chapter 8. Scheduling Problems 279
8.1. Planning a project 280
8.1.1. Introduction 280
8.1.2. Earliest and latest date method 283
8.1.3. Gantt chart 289
8.2. Flow-shop problem 291
8.2.1. Johnson's algorithm for two machines 293
8.2.2. Case of three machines 297
8.3. The job-shop problem 306
References 311
List of Authors 313
Index 315
1
Linear Modeling for Two-Dimensional Data
CONCEPTS COVERED IN THIS CHAPTER. -
This brief chapter serves as a reminder of the concepts presented in detail in Volume 1. It primarily provides an overview of basic statistical analysis tools, particularly linear regression and correlation for two-dimensional data.
References: [SAP 11].
1.1. Basic statistics
Consider a population of n elements. Each element i is characterized by the value of a variable x = xi. The n values xi constitute a one-dimensional statistical series, whose characteristics are:
- The average is defined by:
In this definition, it is assumed that all elements have the same statistical weight If the weights are not equal, the following expression is used:
where pi represents the statistical weight of individual i.
- Variance v(x) is defined as the average of the squares of the deviations from the average:
Huygens' theorem provides another method for calculating variance:
This relationship is often summarized as "the average of squares minus the square of the mean".
- Standard deviation s(x) is defined as the square root of the variance:
EXAMPLE 1.1.-
Consider the statistical series shown in Figure 1.1, which represents the number of rainy days over 10 consecutive years at a given location.
Figure 1.1. Statistical series
The average can be easily calculated by assigning equal statistical weight to each measurement. The average of the squares the variance v(x) = 1284 and the standard deviation s(x) = 35,83 are also determined.
Figure 1.2 shows the graphical representation of the statistical series in the form of a histogram. This histogram illustrates the distribution of data regarding the number of rainy days over the 10 years.
The average is a measure of the position of the statistical series along the number of days axis, while the standard deviation serves as a dispersion parameter, providing an indicator of the spread of the statistical series.
Figure 1.2. Graphical representation of the statistical series
1.2. Linear adjustment
Now, consider a two-dimensional statistical series, where each element is characterized by the values of two variables, x and y. For each variable, various statistical measures can be calculated, such as the average, variance and standard deviation.
To graphically represent this two-dimensional series, a two-dimensional Cartesian coordinate system is used. The x-axis represents the variable x. and the y-axis represents the variable y. Each element i of the series is represented as a point (xi,yi) in this coordinate system, where the coordinate xi corresponds to the value of the variable x, and the coordinate yi corresponds to the value of the variable y. Figure 1.3 shows examples of graphical representations of two two-dimensional series.
Figure 1.3. Example of two two-dimensional series
When observing a two-dimensional series and detecting a certain structure in the set of representative points, we may be inclined to model this structure using a curve. This involves finding a mathematical function that best describes the relationships between the variables x and y. In the examples shown in Figure 1.3, a straight line can be proposed for modeling the first example, and a parabola for the second example, as shown in Figure 1.4. These models are adjustments that simplify the representation of trends or relationships observed in the data.
Figure 1.4. Examples of adjustments
The linear adjustment is the simplest of all analytical adjustments. It involves obtaining the equation of the straight line that "best fit" the set of representative points of the series.
A classic method for obtaining the equation of the line in linear adjustment is the least squares method. This method involves minimizing the sum of the squares of the deviations between the observed values and the values predicted by the line. For the variables x and y, the respective means, denoted by and are calculated assuming equal statistical weight for each value of i:
Next, the deviations from these averages for each point in the series are calculated (convenient to work with "centered" coordinates):
It is easy to verify that:
The squares of these deviations are obtained by squaring these values:
The least squares method involves finding the coefficients a and b of the equation of the line y = ax + b. Alternatively, using the centered coordinates, the equation becomes Y´ = AX + B, where for each of the representative points, The relationship between (A, B) and (a,b) is:
The goal is to optimize the sum of squared deviations to a minimum. Mathematically, this involves minimizing the following objective function:
In other words, the aim is to minimize the following quantity:
The minimum of M corresponds to the cancellation of the first derivatives with respect to A and B, the only unknowns in M. Taking the partial derivatives:
which leads to:
These conditions lead to the following equations:
EXAMPLE 1.2.-
Let us consider the statistical series showing the number of rainy days (x) and umbrella sales in local currency (y) (see Figure 1.5).
Figure 1.5. Statistical series (x, y)
Figure 1.6. Detailed adjustment calculations
Figure 1.7. Adjustment line.
Figure 1.6 summarizes the calculations required to determine the best-fit adjustment line, with the values of a = 1311.53 and b = 8831.78. Figure 1.7 displays the best-fit adjustment line.
1.3. Linear correlation
Figure 1.8. Different correlation situations
In the case of adjustment, the goal is to express y as a function of x. This choice is arbitrary, as x could be expressed as a function of y. In this case, two adjustment lines would be obtained, both intersecting at the point
By treating the variables x and y symmetrically, the concept of correlation between these variables can be introduced. Correlation measures the relationship between two variables and quantifies the possible influence of one on the other. Figure 1.8 presents various examples of scatter plots to illustrate different correlation situations.
In particular, in the case of linear correlation, it is interesting to note that when the two best-fit adjustment lines, y = f(x) and x = f´(y), coincide, this indicates maximum linear correlation between the variables x and y.
EXAMPLE 1.3.-
For the series in Example 1.2, the following two best-fit adjustment lines are obtained:
Figure 1.9 shows that the two straight lines are very close to each other, indicating a strong correlation between the variables.
Figure 1.9. Adjustment lines.
The two best-fit adjustment lines have direction coefficients a and a´. If the lines coincide, then equivalently, a × a´ = 1. Now,
The maximum correlation corresponds to the following equality (known as the Cauchy-Schwarz equality):
The analytical definition of the linear correlation is:
which is simply
EXAMPLE 1.3 (CONTINUED).-
Let us return to Example 1.3. The equations of the adjustment lines are:
The linear correlation coefficient is close to 1, i.e. r = 0.98 ~ 1. This indicates an almost maximal linear correlation between the variables x and y. In this case, a strong relationship exists between x and y.
The linear correlation coefficient r is often written in another form, using the standard deviations s(x) and s(y):
Furthermore, the covariance cov(x, y) is defined by:
It follows that:
In the case of linear fitting, the expression for M is:
The minimum is found by replacing A and B with the values obtained:
By definition, M and therefore Mmin are positive or zero quantities. This leads to the Cauchy-Schwarz inequality:
Figure 1.10. Variations in the linear correlation coefficient
This inequality implies that the linear correlation coefficient lies in the range -1 = r = 1. This means that the linear correlation coefficient can take values between -1 and 1, inclusive. Figure 1.10 shows such a correlation scale, where different ranges of r values are...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.