In practice, it is quite common that PCA is used to project high-dimensional data onto a lower-dimensional space, then have the K-means clustering algorithm be applied in a PCA subspace. Below is the result of automatic segmentation of baseball hitters into groups by applying K-means clustering on the first few principal components that account for the lion’s share of the sample variance.
Thursday, May 1, 2014
K-Means Clustering in a PCA Subspace
In a previous post, we saw a list of the top 250 offensive players in MLB history constructed by Principal Components Analysis (PCA), which transforms high-dimensional data to its lower-dimensional principal components while retaining a high percentage of the sample variation. The PCA projection of high-dimensional data onto a convenient lower-dimensional subspace allows accessible data visualization and provides an opportunity for data segmentation. Segmentation can add detail and structure to a PCA biplot like the one seen below for the reduced MLB players' career batting data.
In practice, it is quite common that PCA is used to project high-dimensional data onto a lower-dimensional space, then have the K-means clustering algorithm be applied in a PCA subspace. Below is the result of automatic segmentation of baseball hitters into groups by applying K-means clustering on the first few principal components that account for the lion’s share of the sample variance.
In practice, it is quite common that PCA is used to project high-dimensional data onto a lower-dimensional space, then have the K-means clustering algorithm be applied in a PCA subspace. Below is the result of automatic segmentation of baseball hitters into groups by applying K-means clustering on the first few principal components that account for the lion’s share of the sample variance.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment