Dimensionality Reduction

Why dimensionality reduction?

Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
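Approaches can be divided into feature selection, which keeps a subset of the original variables, and feature extraction, which derives new variables from the original ones.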
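
To make the distinction concrete, here is a minimal NumPy sketch. The data, the variance-based selection rule, and the projection matrix W are all assumptions made up for illustration; they are not methods prescribed by these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 100 samples, 6 features with different spreads.
X = rng.normal(size=(100, 6)) * np.array([0.1, 2.0, 0.5, 3.0, 0.2, 1.0])

k = 2

# Feature selection: keep the k original columns with the largest variance.
# The result is a subset of the columns of X, unchanged.
selected_idx = np.sort(np.argsort(X.var(axis=0))[-k:])
X_selected = X[:, selected_idx]

# Feature extraction: build k new features as linear combinations of all columns.
# W is an arbitrary (hypothetical) projection; methods such as PCA learn it from the data.
W = rng.normal(size=(6, k))
X_extracted = X @ W

print(X_selected.shape, X_extracted.shape)
```

Both results have k columns, but only the selected features can still be read as original variables.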

Principal component analysis (PCA)
- Linear dimensionality reduction
- Extract “principal components” that are uncorrelated with each other to represent the variance in the data
- Produces a list of “principal components” ranked from the highest to the lowest fraction of variance explained
- Assumes linear structure in Euclidean space; not well suited to data that lie on a non-Euclidean (non-linear) space or that contain fine structure
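
The points above can be sketched with a singular value decomposition of the centered data. This is one common way to compute PCA, not the only one; the function name pca and the toy data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance explained by each component (ranked high to low).
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    # Coordinates of each sample along the kept components.
    scores = X_centered @ components.T
    return scores, components, explained_ratio[:n_components]

rng = np.random.default_rng(0)
# Toy data (made up): 200 points in 5 dimensions driven by 2 latent directions plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)
print(ratio)
```

The columns of scores are uncorrelated with each other, and ratio lists the explained-variance fractions in decreasing order.
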
Kernel PCA
- Applies the kernel trick to PCA so that it can capture non-linear structure
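
A minimal sketch of the idea, assuming an RBF (Gaussian) kernel: PCA is carried out on a centered kernel matrix instead of on the data itself. The kernel choice, the gamma value, the function name, and the two-circle toy data are all assumptions for illustration.

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma):
    """Embed the training points X using kernel PCA with an RBF kernel."""
    # Pairwise squared Euclidean distances (clipped to avoid tiny negatives).
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    # Kernel matrix: inner products in the implicit feature space.
    K = np.exp(-gamma * sq_dists)
    # Center the kernel matrix, which centers the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order, so reverse to get the largest first.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]
    # Projection of each training point: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

rng = np.random.default_rng(0)
# Toy data (made up): two concentric circles, which no linear projection can separate.
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
radius = np.concatenate([np.full(100, 0.3), np.full(100, 1.0)])
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

Z = rbf_kernel_pca(X, n_components=2, gamma=5.0)
print(Z.shape)
```

Only the kernel matrix is needed, never the explicit feature map; that is the kernel trick. How well the embedding separates the circles depends on gamma, which would need tuning in practice.
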
Linear Discriminant Analysis (LDA)
t-distributed Stochastic Neighbor Embedding (t-SNE)
- Non-linear dimensionality reduction
- Models each high-dimensional object as a 2- or 3-dimensional point, such that similar objects are modeled by nearby points and dissimilar objects by distant points
- Based on neighbor maps (pairwise similarities between each point and its neighbors)
- Developed from Stochastic Neighbor Embedding (SNE)
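
The sketch below covers only the similarity step of t-SNE, not the full optimization. It uses one shared Gaussian bandwidth sigma for all points, which is a simplifying assumption: the actual method tunes a bandwidth per point from a perplexity setting. Function names and toy data are made up for illustration.

```python
import numpy as np

def pairwise_sq_dists(A):
    """Squared Euclidean distance between every pair of rows of A."""
    sq = np.sum(A ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)

def high_dim_affinities(X, sigma=1.0):
    """Gaussian neighbor similarities in the original space (shared sigma: a simplification)."""
    P_cond = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P_cond, 0.0)                  # a point is not its own neighbor
    P_cond /= P_cond.sum(axis=1, keepdims=True)    # conditional p_{j|i}: each row sums to 1
    n = X.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)         # symmetric joint p_{ij}: sums to 1

def low_dim_affinities(Y):
    """Student-t (one degree of freedom) similarities in the embedding."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q, eps=1e-12):
    """Cost that t-SNE minimizes over the embedding Y: KL(P || Q)."""
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))

rng = np.random.default_rng(0)
# Toy data (made up): two clusters of 50 points in 10 dimensions.
X = np.vstack([rng.normal(size=(50, 10)), rng.normal(size=(50, 10)) + 3.0])
# A random 2-dimensional starting embedding; gradient descent on Y is not shown here.
Y = 1e-2 * rng.normal(size=(100, 2))

P = high_dim_affinities(X)
Q = low_dim_affinities(Y)
print(kl_divergence(P, Q))
```

The heavy tails of the Student-t distribution let dissimilar points sit far apart in the embedding, which is the main change from SNE, where the low-dimensional similarities are Gaussian.
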
Uniform Manifold Approximation and Projection (UMAP)
- Non-linear dimensionality reduction
- Based on a neighbor graph and methods from topological data analysis