Over the past century, advancements in computer science have consistently resulted from extensive mathematical work. Even today, innovations in the digital domain continue to be grounded in a strong mathematical foundation. To succeed in this profession, both today's students and tomorrow's computer engineers need a solid mathematical background.
The goal of this book series is to offer a solid foundation of the knowledge essential to working in the digital sector. Across three volumes, it explores fundamental principles, digital information, data analysis, and optimization. Whether the reader is pursuing initial training or looking to deepen their expertise, the Mathematics for Digital Science series revisits familiar concepts, helping them refresh and expand their knowledge while also introducing equally essential, newer topics.
Gérard-Michel Cochard is Professor Emeritus at Université de Picardie Jules Verne, France, where he has held various senior positions. He has also served at the French Ministry of Education and the CNAM (Conservatoire National des Arts et Métiers). His research is conducted at the Eco-PRocédés, Optimisation et Aide à la Décision (EPROAD) laboratory, France.
Mhand Hifi is Professor of Computer Science at Université de Picardie Jules Verne, France, where he heads the EPROAD UR 4669 laboratory and manages the ROD team. As an expert in operations research and NP-hard problem-solving, he actively contributes to numerous international conferences and journals in the field.
This brief chapter serves as a reminder of the concepts presented in detail in Volume 1. It primarily provides an overview of basic statistical analysis tools, particularly linear regression and correlation for two-dimensional data.
References: [SAP 11].
Consider a population of n elements. Each element i is characterized by the value of a variable $x = x_i$. The n values $x_i$ constitute a one-dimensional statistical series, whose characteristics are the mean, the variance and the standard deviation:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad v(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s(x) = \sqrt{v(x)}$$
In this definition, it is assumed that all elements have the same statistical weight. If the weights are not equal, the following expression is used for the mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} p_i x_i}{\sum_{i=1}^{n} p_i}$$

where $p_i$ represents the statistical weight of individual i.
Huygens' theorem provides another method for calculating the variance:

$$v(x) = \overline{x^2} - \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$
This relationship is often summarized as "the average of squares minus the square of the mean".
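As a quick numerical check of this identity, the following Python sketch computes the variance both from its definition and from the "average of squares minus square of the mean" shortcut; the data set is purely illustrative and is not taken from the book.

```python
# Illustrative check of Huygens' theorem: both computations give the same variance.
values = [120, 95, 143, 160, 88, 132, 101, 155, 99, 127]  # hypothetical data

n = len(values)
mean = sum(values) / n

# Definition: average of the squared deviations from the mean.
variance_def = sum((x - mean) ** 2 for x in values) / n

# Huygens' theorem: average of the squares minus the square of the mean.
variance_huygens = sum(x ** 2 for x in values) / n - mean ** 2

print(variance_def, variance_huygens)  # the two values coincide (up to rounding)
```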
Consider the statistical series shown in Figure 1.1, which represents the number of rainy days over 10 consecutive years at a given location.
Figure 1.1. Statistical series
The average is easily calculated by assigning equal statistical weight to each measurement; from the average of the squares, the variance v(x) = 1284 and the standard deviation s(x) = 35.83 are then obtained.
Figure 1.2 shows the graphical representation of the statistical series in the form of a histogram. This histogram illustrates the distribution of data regarding the number of rainy days over the 10 years.
The average is a measure of the position of the statistical series along the number of days axis, while the standard deviation serves as a dispersion parameter, providing an indicator of the spread of the statistical series.
Figure 1.2. Graphical representation of the statistical series
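To illustrate these two parameters, the short sketch below computes the position and dispersion measures and prints a rudimentary text histogram; the rainy-day counts are hypothetical, since the values of Figure 1.1 are not reproduced here.

```python
from statistics import mean, pstdev

# Hypothetical numbers of rainy days over 10 consecutive years
# (the actual values of Figure 1.1 are not reproduced here).
rainy_days = [118, 95, 160, 142, 101, 180, 97, 133, 150, 124]

print("average:", mean(rainy_days))               # position along the number-of-days axis
print("standard deviation:", pstdev(rainy_days))  # spread around that position

# Rudimentary text histogram: one bar per year, one '#' per 10 rainy days.
for year, days in enumerate(rainy_days, start=1):
    print(f"year {year:2d} | {'#' * (days // 10)} ({days})")
```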
Now, consider a two-dimensional statistical series, where each element is characterized by the values of two variables, x and y. For each variable, various statistical measures can be calculated, such as the average, variance and standard deviation.
To graphically represent this two-dimensional series, a two-dimensional Cartesian coordinate system is used. The x-axis represents the variable x, and the y-axis represents the variable y. Each element i of the series is represented as a point $(x_i, y_i)$ in this coordinate system, where the coordinate $x_i$ corresponds to the value of the variable x, and the coordinate $y_i$ corresponds to the value of the variable y. Figure 1.3 shows examples of graphical representations of two two-dimensional series.
Figure 1.3. Example of two two-dimensional series
When observing a two-dimensional series and detecting a certain structure in the set of representative points, we may be inclined to model this structure using a curve. This involves finding a mathematical function that best describes the relationships between the variables x and y. In the examples shown in Figure 1.3, a straight line can be proposed for modeling the first example, and a parabola for the second example, as shown in Figure 1.4. These models are adjustments that simplify the representation of trends or relationships observed in the data.
Figure 1.4. Examples of adjustments
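To make the idea of an adjustment concrete, the sketch below fits both a straight line and a parabola to a small two-dimensional series with numpy.polyfit; the data points are purely illustrative and are not those of Figure 1.3.

```python
import numpy as np

# Illustrative two-dimensional series (not the data of Figure 1.3).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Degree-1 adjustment: straight line y = a*x + b.
a, b = np.polyfit(x, y, deg=1)
print(f"line: y = {a:.3f} x + {b:.3f}")

# Degree-2 adjustment: parabola y = c2*x**2 + c1*x + c0.
c2, c1, c0 = np.polyfit(x, y, deg=2)
print(f"parabola: y = {c2:.3f} x^2 + {c1:.3f} x + {c0:.3f}")
```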
The linear adjustment is the simplest of all analytical adjustments. It involves obtaining the equation of the straight line that best fits the set of representative points of the series.
A classic method for obtaining the equation of the line in linear adjustment is the least squares method. This method involves minimizing the sum of the squares of the deviations between the observed values and the values predicted by the line. For the variables x and y, the respective means, denoted by $\bar{x}$ and $\bar{y}$, are calculated assuming equal statistical weight for each value of i:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Next, the deviations from these averages are calculated for each point in the series (it is convenient to work with these "centered" coordinates):

$$X_i = x_i - \bar{x}, \qquad Y_i = y_i - \bar{y}$$
It is easy to verify that:

$$\sum_{i=1}^{n} X_i = 0, \qquad \sum_{i=1}^{n} Y_i = 0$$
The squares of these deviations are obtained by squaring these values:

$$X_i^2 = (x_i - \bar{x})^2, \qquad Y_i^2 = (y_i - \bar{y})^2$$
The least squares method involves finding the coefficients a and b of the equation of the line y = ax + b. Alternatively, using the centered coordinates, the equation becomes Y´ = AX + B, so that for each of the representative points the adjusted value is $Y'_i = AX_i + B$. The relationship between (A, B) and (a, b) is:

$$a = A, \qquad b = B + \bar{y} - A\bar{x}$$
The goal is to reduce the sum of squared deviations to a minimum. Mathematically, this involves minimizing the following objective function:

$$M = \sum_{i=1}^{n} \left(Y_i - Y'_i\right)^2$$
In other words, the aim is to minimize the following quantity:

$$M = \sum_{i=1}^{n} \left(Y_i - AX_i - B\right)^2$$
The minimum of M corresponds to the cancellation of the first derivatives with respect to A and B, the only unknowns in M. Taking the partial derivatives:

$$\frac{\partial M}{\partial A} = -2\sum_{i=1}^{n} X_i \left(Y_i - AX_i - B\right), \qquad \frac{\partial M}{\partial B} = -2\sum_{i=1}^{n} \left(Y_i - AX_i - B\right)$$
which leads to:

$$\sum_{i=1}^{n} X_i \left(Y_i - AX_i - B\right) = 0, \qquad \sum_{i=1}^{n} \left(Y_i - AX_i - B\right) = 0$$
These conditions lead to the following equations:

$$A\sum_{i=1}^{n} X_i^2 + B\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i Y_i, \qquad A\sum_{i=1}^{n} X_i + nB = \sum_{i=1}^{n} Y_i$$

Since $\sum_i X_i = \sum_i Y_i = 0$, it follows that $B = 0$ and

$$A = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$

Returning to the original coordinates, the adjustment line is $y = ax + b$ with $a = A$ and $b = \bar{y} - a\bar{x}$.
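A minimal Python sketch of this procedure is given below; it works on an illustrative data set (the series of Figure 1.5 is not reproduced here), computes A and B in centered coordinates, and then recovers a and b.

```python
# Least squares adjustment line via centered coordinates (illustrative data).
x = [60, 75, 90, 105, 120, 135]                        # hypothetical values of x
y = [85000, 106000, 128000, 146000, 167000, 186000]   # hypothetical values of y

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered coordinates X_i = x_i - x_bar, Y_i = y_i - y_bar.
X = [xi - x_bar for xi in x]
Y = [yi - y_bar for yi in y]

# A = sum(X_i * Y_i) / sum(X_i**2), B = 0, then a = A and b = y_bar - a * x_bar.
A = sum(Xi * Yi for Xi, Yi in zip(X, Y)) / sum(Xi ** 2 for Xi in X)
a = A
b = y_bar - a * x_bar
print(f"adjustment line: y = {a:.2f} x + {b:.2f}")
```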
Let us consider the statistical series showing the number of rainy days (x) and umbrella sales in local currency (y) (see Figure 1.5).
Figure 1.5. Statistical series (x, y)
Figure 1.6. Detailed adjustment calculations
Figure 1.7. Adjustment line.
Figure 1.6 summarizes the calculations required to determine the best-fit adjustment line, with the values of a = 1311.53 and b = 8831.78. Figure 1.7 displays the best-fit adjustment line.
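As a purely illustrative use of this line (the value x = 100 is not taken from the book), the adjustment equation predicts the level of umbrella sales for a year with 100 rainy days:

$$y = 1311.53 \times 100 + 8831.78 = 139{,}984.78$$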
Figure 1.8. Different correlation situations
In the case of adjustment, the goal is to express y as a function of x. This choice is arbitrary, as x could be expressed as a function of y. In that case, two adjustment lines would be obtained, both intersecting at the mean point $(\bar{x}, \bar{y})$.
By treating the variables x and y symmetrically, the concept of correlation between these variables can be introduced. Correlation measures the relationship between two variables and quantifies the possible influence of one on the other. Figure 1.8 presents various examples of scatter plots to illustrate different correlation situations.
In particular, in the case of linear correlation, it is interesting to note that when the two best-fit adjustment lines, y = f(x) and x = f´(y), coincide, this indicates maximum linear correlation between the variables x and y.
For the series in Example 1.2, the following two best-fit adjustment lines are obtained:
Figure 1.9 shows that the two straight lines are very close to each other, indicating a strong correlation between the variables.
Figure 1.9. Adjustment lines.
The two best-fit adjustment lines, y = ax + b and x = a´y + b´, have direction coefficients a and a´. If the lines coincide, then a = 1/a´ or, equivalently, a × a´ = 1. Now, the least squares method gives:

$$a = \frac{\sum_{i} X_i Y_i}{\sum_{i} X_i^2}, \qquad a' = \frac{\sum_{i} X_i Y_i}{\sum_{i} Y_i^2}, \qquad \text{so that} \quad a \times a' = \frac{\left(\sum_{i} X_i Y_i\right)^2}{\sum_{i} X_i^2 \sum_{i} Y_i^2}$$
The maximum correlation corresponds to the following equality (known as the Cauchy-Schwarz equality):

$$\left(\sum_{i} X_i Y_i\right)^2 = \sum_{i} X_i^2 \sum_{i} Y_i^2$$
The analytical definition of the linear correlation coefficient is:

$$r = \frac{\sum_{i} X_i Y_i}{\sqrt{\sum_{i} X_i^2}\,\sqrt{\sum_{i} Y_i^2}}$$
which is simply:

$$r^2 = a \times a'$$
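This relationship between the two direction coefficients and r can be checked numerically; the sketch below (on illustrative data, not the series of the example) computes a, a´ and r and verifies that r² = a × a´.

```python
# Check that r**2 equals a * a' on an illustrative data set.
x = [60, 75, 90, 105, 120, 135]
y = [85000, 109000, 125000, 149000, 164000, 188000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
X = [xi - x_bar for xi in x]
Y = [yi - y_bar for yi in y]

sxy = sum(Xi * Yi for Xi, Yi in zip(X, Y))
sxx = sum(Xi ** 2 for Xi in X)
syy = sum(Yi ** 2 for Yi in Y)

a = sxy / sxx            # direction coefficient of y = f(x)
a_prime = sxy / syy      # direction coefficient of x = f'(y)
r = sxy / (sxx ** 0.5 * syy ** 0.5)

print(a * a_prime, r ** 2)  # the two printed values coincide
```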
Let us return to Example 1.3. The equations of the adjustment lines are:
The linear correlation coefficient is close to 1, i.e. r = 0.98 ≈ 1. This indicates an almost maximal linear correlation between the variables x and y. In this case, a strong relationship exists between x and y.
The linear correlation coefficient r is often written in another form, using the standard deviations s(x) and s(y):

$$r = \frac{\frac{1}{n}\sum_{i} X_i Y_i}{s(x)\, s(y)}$$
Furthermore, the covariance cov(x, y) is defined by:

$$\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n}\sum_{i=1}^{n} X_i Y_i$$
It follows that:

$$r = \frac{\mathrm{cov}(x, y)}{s(x)\, s(y)}$$
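As a quick cross-check (again on illustrative data), the covariance form of r can be compared with numpy's built-in correlation coefficient.

```python
import numpy as np

# Illustrative data.
x = np.array([60, 75, 90, 105, 120, 135], dtype=float)
y = np.array([85000, 109000, 125000, 149000, 164000, 188000], dtype=float)

# Population covariance and standard deviations (divisor n, not n - 1).
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r = cov_xy / (x.std() * y.std())

print(r, np.corrcoef(x, y)[0, 1])  # both expressions give the same value
```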
In the case of linear fitting, the expression for M is:

$$M = \sum_{i=1}^{n} \left(Y_i - AX_i - B\right)^2$$
The minimum is found by replacing A and B with the values obtained ($A = \sum_i X_i Y_i / \sum_i X_i^2$ and $B = 0$):

$$M_{\min} = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} X_i Y_i\right)^2}{\sum_{i=1}^{n} X_i^2}$$
By definition, M and therefore Mmin are positive or zero quantities. This leads to the Cauchy-Schwarz inequality:

$$\left(\sum_{i=1}^{n} X_i Y_i\right)^2 \leq \sum_{i=1}^{n} X_i^2 \sum_{i=1}^{n} Y_i^2$$
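Dividing both sides by the right-hand side (assumed non-zero) makes the consequence for r explicit:

$$r^2 = \frac{\left(\sum_{i} X_i Y_i\right)^2}{\sum_{i} X_i^2 \sum_{i} Y_i^2} \leq 1 \quad\Longrightarrow\quad -1 \leq r \leq 1$$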
Figure 1.10. Variations in the linear correlation coefficient
This inequality implies that the linear correlation coefficient lies in the range −1 ≤ r ≤ 1; in other words, r can take any value between −1 and 1, inclusive. Figure 1.10 shows such a correlation scale, where different ranges of r values are...