Chapter 1: Dimensionality reduction
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space such that the low-dimensional representation retains meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for several reasons: raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.
Methods are commonly divided into linear and nonlinear approaches. Approaches can also be divided into feature selection and feature extraction. Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses.
Feature selection aims to find a subset of the input variables (also called features or attributes) that is relevant to the task at hand. The three strategies are: the filter strategy (e.g., selection by information gain), the wrapper strategy (e.g., a search guided by predictive accuracy), and the embedded strategy, in which features are added or removed while the model is being built, based on prediction errors.
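As a concrete illustration of the filter strategy, the following minimal sketch scores each feature by its mutual information with the labels and keeps the highest-scoring ones. It assumes scikit-learn is available and uses a synthetic dataset; the parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: 20 features, of which only 5 are informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Filter strategy: score each feature independently (here by mutual
# information with the labels) and keep the k highest-scoring features.
selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)                      # (500, 5)
print(selector.get_support(indices=True))   # indices of the retained features
```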
Data analysis such as regression or classification can then be carried out in the reduced space more accurately than in the original space.
Feature projection, also known as feature extraction, transforms the data from the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, tensor representations can be used for dimensionality reduction through multilinear subspace learning.
Principal component analysis (PCA), the main linear technique for dimensionality reduction, performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance matrix (or sometimes the correlation matrix) of the data is constructed and the eigenvectors of this matrix are computed. The eigenvectors corresponding to the largest eigenvalues (the principal components) can then be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behavior of the system, because they frequently contribute the vast majority of the system's energy, especially in low-dimensional systems. Still, this must be demonstrated on a case-by-case basis, since not all systems exhibit this behavior. The original space (with dimension equal to the number of points) has been reduced, with data loss but hopefully retaining the most important variance, to the space spanned by a few eigenvectors.
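The covariance-eigendecomposition procedure described above can be written in a few lines of NumPy. The following is a minimal sketch, assuming a data matrix whose rows are observations; the function and variable names are illustrative.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center the data so the covariance matrix describes variance about the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # Eigendecomposition; eigh is appropriate because cov is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Sort eigenvectors by decreasing eigenvalue and keep the first n_components.
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]
    # Linear mapping of the data onto the lower-dimensional space.
    return X_centered @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
print(pca(X, n_components=2).shape)  # (200, 2)
```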
Non-negative matrix factorization (NMF) decomposes a non-negative matrix into the product of two non-negative matrices, and has shown great promise in fields such as astronomy that deal only with non-negative data. NMF has gained popularity since Lee and Seung's multiplicative update rule, which has been continuously developed: the inclusion of uncertainties, the handling of missing data and parallel computation, and sequential construction, which leads to the stability and linearity of NMF, as well as other updates, including the treatment of missing data in digital image processing.
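The multiplicative update rule mentioned above can be sketched directly in NumPy. The following is a minimal illustration, assuming a non-negative data matrix V factored as V ≈ WH; it omits the refinements (uncertainties, missing data, sequential construction) discussed above.

```python
import numpy as np

def nmf(V, rank=5, n_iter=200, eps=1e-9):
    """Factor a non-negative matrix V (m x n) as V ~= W @ H with W, H >= 0."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Frobenius-norm objective;
        # eps avoids division by zero and keeps the factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(100, 40)))
W, H = nmf(V, rank=5)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```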
Because it is constructed with a stable component basis and uses a linear modeling process, sequential NMF is able to preserve the flux in direct imaging of circumstellar structures in astronomy, one of the methods for detecting exoplanets, especially for the direct imaging of circumstellar discs. Unlike PCA, NMF does not remove the mean of the matrices, which leads to physical, non-negative fluxes; NMF is therefore able to preserve more information than PCA, as demonstrated by Ren et al.
Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique, known as kernel PCA, is capable of constructing nonlinear mappings that maximize the variance in the data.
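The following minimal sketch of kernel PCA assumes scikit-learn; the dataset and the kernel parameters are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: linear PCA cannot "unfold" this structure,
# but an RBF kernel can separate the rings in the projected space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)

print(X_kpca.shape)  # (400, 2): nonlinear projection maximizing variance in feature space
```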
Other prominent nonlinear techniques include manifold learning methods such as Isomap, locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and methods based on tangent space analysis. These techniques construct a low-dimensional data representation using a cost function that preserves local properties of the data, and they can be viewed as defining a graph-based kernel for kernel PCA.
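Two of these manifold learning methods are sketched below, assuming scikit-learn; the "Swiss roll" dataset and the neighborhood size are illustrative choices.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Isomap: preserves geodesic distances measured along the manifold.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# LLE: preserves the local linear reconstruction weights of each neighborhood.
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)

print(X_iso.shape, X_lle.shape)  # (1000, 2) (1000, 2)
```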
More recently, techniques have been proposed that, instead of defining a fixed kernel, try to learn the kernel using semidefinite programming. The most prominent example of such a technique is maximum variance unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest neighbors (in the inner product space), while maximizing the distances between points that are not nearest neighbors.
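The semidefinite program underlying MVU can be written compactly with a general-purpose convex solver. The following is a rough sketch, assuming the cvxpy and scikit-learn packages and a small dataset (the SDP scales poorly with the number of points); the function name and the neighborhood size are illustrative.

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import kneighbors_graph

def mvu(X, n_neighbors=4, n_components=2):
    """Maximum variance unfolding on a small dataset via semidefinite programming."""
    n = X.shape[0]
    # Learn a Gram (kernel) matrix K rather than fixing one in advance.
    K = cp.Variable((n, n), PSD=True)
    constraints = [cp.sum(K) == 0]  # center the embedding
    graph = kneighbors_graph(X, n_neighbors, mode="distance")
    rows, cols = graph.nonzero()
    for i, j in zip(rows, cols):
        # Exactly preserve squared distances between nearest neighbors.
        constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == graph[i, j] ** 2)
    # Maximize the total variance, which pushes non-neighbors apart.
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()
    # Embedding from the top eigenvectors of the learned kernel, as in kernel PCA.
    vals, vecs = np.linalg.eigh(K.value)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

X = np.random.default_rng(0).normal(size=(30, 5))
print(mvu(X).shape)  # (30, 2)
```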
An alternative approach to neighborhood preservation is to minimize a cost function that measures differences between the distances in the input and output spaces. Important examples of such techniques include: classical multidimensional scaling, which is identical to PCA; Isomap, which uses geodesic distances in the data space; diffusion maps, which use diffusion distances in the data space; t-distributed stochastic neighbor embedding (t-SNE), which minimizes the divergence between distributions over pairs of points; and curvilinear component analysis.
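As a simple instance of this distance-matching idea, metric multidimensional scaling minimizes a stress cost comparing pairwise distances in the two spaces. The sketch below assumes scikit-learn; the data are synthetic and for illustration only.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X = np.random.default_rng(0).normal(size=(100, 8))

# MDS searches for a 2D configuration whose pairwise distances
# match those of the original 8-dimensional data as closely as possible.
embedding = MDS(n_components=2, random_state=0).fit_transform(X)

print(pairwise_distances(X).shape, pairwise_distances(embedding).shape)  # (100, 100) (100, 100)
```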
An alternative approach to nonlinear dimensionality reduction is the use of autoencoders, a special kind of feedforward neural network with a bottleneck hidden layer. Deep encoders are typically trained using a greedy layer-wise pre-training (for example, using a stack of restricted Boltzmann machines), followed by a fine-tuning stage based on backpropagation.
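A bottleneck autoencoder of the kind described above can be sketched in a few lines. The following assumes PyTorch and trains end-to-end with backpropagation rather than the layer-wise pre-training mentioned above; the layer sizes and toy data are illustrative.

```python
import torch
from torch import nn

# The encoder compresses 64-dimensional inputs to a 2-dimensional bottleneck;
# the decoder maps the code back to the original representation.
class Autoencoder(nn.Module):
    def __init__(self, n_features=64, n_bottleneck=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, n_bottleneck),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, 32), nn.ReLU(),
            nn.Linear(32, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
X = torch.randn(256, 64)  # toy data

for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)  # reconstruct the input
    loss.backward()
    optimizer.step()

with torch.no_grad():
    codes = model.encoder(X)  # low-dimensional representation
print(codes.shape)            # torch.Size([256, 2])
```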
Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
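The following minimal sketch uses LDA as a supervised dimensionality reducer, assuming scikit-learn and its bundled Iris dataset for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# With 3 classes, LDA can project onto at most 2 discriminant directions.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)  # (150, 2): directions chosen to separate the classes
```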
Generalized discriminant analysis (GDA) deals with nonlinear discriminant analysis using the kernel function operator. The underlying theory is close to that of support-vector machines (SVM), in that the GDA method provides a mapping of the input vectors into a high-dimensional feature space. Like LDA, the objective of GDA is to find a projection of the features into a lower-dimensional space by maximizing the ratio of between-class scatter to within-class scatter.
Autoencoders can be used to learn nonlinear dimension reduction functions and codings together with an inverse function from the coding to the original representation.
t-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction technique useful for the visualization of high-dimensional datasets. Because it does not always preserve densities or distances well, its use for analysis tasks such as clustering or outlier detection is not recommended.
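A minimal t-SNE sketch for visualization is shown below, assuming scikit-learn and its bundled handwritten-digits dataset; the perplexity value is an illustrative choice.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 64-dimensional images

# Reduce to 2D for plotting; use the result for visualization only,
# not for downstream clustering or outlier detection.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)  # (1797, 2)
```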
Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique. Visually it is similar to t-SNE, but it assumes that the data are uniformly distributed on a locally connected Riemannian manifold and that the Riemannian metric is locally constant or approximately locally constant.
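Usage is similar to t-SNE; the following minimal sketch assumes the umap-learn package (imported as `umap`), and the parameter values and synthetic data are illustrative.

```python
import numpy as np
import umap

X = np.random.default_rng(0).normal(size=(500, 20))

# n_neighbors controls the size of the local neighborhood used to build the
# manifold approximation; min_dist controls how tightly the embedded points pack.
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=0)
X_2d = reducer.fit_transform(X)
print(X_2d.shape)  # (500, 2)
```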
For high-dimensional datasets, dimension reduction is usually performed prior to applying a k-nearest neighbors (k-NN) algorithm in order to mitigate the effects of the curse of dimensionality.
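A minimal sketch of this pattern, assuming scikit-learn, chains PCA and a k-NN classifier in a pipeline; the dataset and the number of components are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)  # 64-dimensional inputs

# Reduce to 20 dimensions with PCA, then classify with k-NN on the reduced data.
pipeline = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(pipeline, X, y, cv=5).mean())  # cross-validated accuracy
```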