Data Analysis Specialised Reading: Kernel Trick
Kernel Trick
- Mathematical technique that maps data from one space to another → uses a kernel function, which takes vectors in the original space as its inputs and returns the dot product of their images in the feature space
- If we have data $\textbf{x}, \textbf{z} \in X$ and a map $\phi: X → \mathbb{R}^N$ then $k(\textbf{x}, \textbf{z}) = \left<\phi(\textbf{x}), \phi(\textbf{z})\right>$ is a kernel function
- each coordinate of the transformed vector $\phi(\textbf{x})$ is just some function of the coordinates of the corresponding lower-dimensional vector $\textbf{x}$
- Work with non-linearly separable data → map data points to a higher dimensional space where they become linearly separable
- ex: mapping data from a 2-dimensional space to a 3-dimensional space so that the classes can be linearly separated by a plane (rather than a line); a worked sketch follows this list
- Generally used in SVMs → find the hyperplane that best separates the data points of different classes
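A minimal sketch of the trick in action, assuming plain NumPy (the names `phi`, `x`, `z` are illustrative): for the degree-2 polynomial kernel $k(\textbf{x}, \textbf{z}) = (\textbf{x} \cdot \textbf{z})^2$, the explicit 2D → 3D map $\phi(x_1, x_2) = (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1x_2)$ gives exactly the same value as evaluating the kernel directly in the original 2D space:

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (x1, x2).
    (x1^2, x2^2, sqrt(2)*x1*x2) satisfies <phi(x), phi(z)> = (x . z)^2."""
    x1, x2 = v
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

lhs = (x @ z) ** 2     # kernel evaluated in the original 2-D space
rhs = phi(x) @ phi(z)  # dot product in the mapped 3-D space
print(lhs, rhs)        # both print 16.0
```

The point of the trick is the left-hand side: we get the 3D dot product without ever constructing $\phi(\textbf{x})$.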
How It Works
- Data Preprocessing: Data cleaning & normalisation
- Kernel Function Selection: Selecting the best kernel function
- Kernel Matrix Computation: Compute the kernel matrix that measures the similarity between the data points in the input space (see the sketch after these steps)
- Feature Space Mapping: Map the data into higher-dimensional space
- Model Training: Train model on mapped data
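One way these five steps can look in scikit-learn, sketched with toy `make_circles` data and an RBF kernel with $\gamma = 1.0$ (both illustrative choices, not prescribed by these notes); passing `kernel='precomputed'` to `SVC` makes the kernel-matrix step explicit:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# 1. Data preprocessing: toy non-linearly separable data, then normalisation
X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2./3. Kernel selection + kernel matrix computation (RBF chosen here)
K_train = rbf_kernel(X_train, X_train, gamma=1.0)  # pairwise similarities
K_test = rbf_kernel(X_test, X_train, gamma=1.0)

# 4./5. Training on the kernel matrix; the mapping to the
# higher-dimensional feature space stays implicit
clf = SVC(kernel='precomputed').fit(K_train, y_train)
print(clf.score(K_test, y_test))
```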
Types of Kernel Methods in SVM
- Linear Kernel: used when the data is linearly separable $k(\textbf{x}, \textbf{z}) = \textbf{x} \cdot \textbf{z}$
- Polynomial Kernel: used when the data is not linearly separable $k(\textbf{x}, \textbf{z}) = (\textbf{x} \cdot \textbf{z}+1)^d$
- Radial-basis Function (RBF) Kernel: maps input data to an infinite-dimensional space $k(\textbf{x}, \textbf{z}) = e^{-\gamma\left\Vert \textbf{x}-\textbf{z} \right\Vert^2}$
- The Gaussian Kernel is a type of RBF kernel
- Sigmoid Kernel: transforms the input by applying the hyperbolic tangent to the dot product, much like a neural-network activation $k(\textbf{x}, \textbf{z}) = \tanh{(\alpha \textbf{x}^T\textbf{z} + \beta)}$
- Laplacian Kernel: similar to the RBF kernel but with a sharper peak (it is not smooth at $\textbf{x} = \textbf{z}$) and heavier tails, since the norm is not squared $k(\textbf{x}, \textbf{z}) = e^{-\gamma\left\Vert \textbf{x}-\textbf{z} \right\Vert_1}$
- ANOVA Radial Basis Kernel: multiple-input kernel function → can be used for feature selection $k(\textbf{x}, \textbf{z}) = \sum_{k=1}^{n}{\left(e^{-\sigma(x_k-z_k)^2}\right)^d}$
- Exponential Kernel: similar to the RBF kernel but uses the unsquared Euclidean norm, so it decays more slowly at large distances $k(\textbf{x}, \textbf{z}) = e^{-\gamma\left\Vert \textbf{x}-\textbf{z} \right\Vert_2}$
- Wavelet Kernel: non-stationary kernel function that can be used for time-series analysis $k(\textbf{x}, \textbf{z}) = \sum_{i,j}{\phi(i,j)\,\psi(x_i, z_j)}$
- Spectral Kernel: based on eigenvalues and eigenvectors of a similarity matrix $k(\textbf{x}, \textbf{z}) = \sum_{i}{\lambda_i\phi_i(\textbf{x})\phi_i(\textbf{z})}$
- Mahalanobis Kernel: takes into account the covariance structure of the data through the inverse covariance matrix $S^{-1}$ $k(\textbf{x}, \textbf{z}) = e^{-\frac{1}{2}(\textbf{x}-\textbf{z})^TS^{-1}(\textbf{x}-\textbf{z})}$ (a NumPy sketch of several of these kernels follows this list)
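A few of the formulas above translated directly into NumPy, as a non-authoritative sketch; the default hyperparameter values (`d=3`, `gamma=1.0`, `alpha=0.01`, `beta=0.0`) are arbitrary illustrations and would be tuned in practice:

```python
import numpy as np

# Direct NumPy translations of some of the kernel formulas above
def linear(x, z):
    return x @ z

def polynomial(x, z, d=3):
    return (x @ z + 1) ** d

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.linalg.norm(x - z) ** 2)

def laplacian(x, z, gamma=1.0):
    return np.exp(-gamma * np.linalg.norm(x - z, ord=1))  # L1 norm

def sigmoid(x, z, alpha=0.01, beta=0.0):
    return np.tanh(alpha * (x @ z) + beta)

x, z = np.array([1.0, 2.0]), np.array([2.0, 1.0])
for kern in (linear, polynomial, rbf, laplacian, sigmoid):
    print(kern.__name__, kern(x, z))
```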
Choosing the Right Kernel Function
- Understand the problem: Understand the type of data, features, and the complexity of the relationship between the features
- Choose a simple kernel function: Start with the linear kernel function as a baseline to compare with the more complex ones
- Test different functions: Play around with other kernel functions and compare their performance
- Tune the parameters: Experiment with different parameter values and choose the values that result in the best performance (a grid-search sketch follows this list)
- Use domain knowledge: Use domain knowledge about the type of data to choose an appropriate kernel
- Consider computational complexity: Take into consideration the resources that would be required for larger data sets
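One way to act on this advice with scikit-learn's `GridSearchCV`, sketched on the iris dataset with illustrative parameter grids: start from a linear baseline, add more complex kernels, and let cross-validation pick the best combination:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Linear kernel as the simple baseline, then progressively richer kernels;
# the grids below are illustrative starting points, not recommendations
param_grid = [
    {'svc__kernel': ['linear'], 'svc__C': [0.1, 1, 10]},
    {'svc__kernel': ['poly'], 'svc__degree': [2, 3], 'svc__C': [0.1, 1, 10]},
    {'svc__kernel': ['rbf'], 'svc__gamma': [0.01, 0.1, 1], 'svc__C': [0.1, 1, 10]},
]
pipe = make_pipeline(StandardScaler(), SVC())
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Note that grid search cost grows multiplicatively with each added parameter, which is where the computational-complexity advice above starts to bite on larger data sets.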
This post is licensed under CC BY 4.0 by the author.