K-means clustering is a commonly used unsupervised learning method while PCA is a widely used unsupervised dimeansion reduction technique. Do you think about these two method correlate with each other? I first give you a simple intuitive: Given a data matrix X where each column is a data vector, there are two ways to reduce the rank of the data matrix. On one end, we can reduce the rank of matrix by column (reduce the number of feature) from the view of dimension reduction; On the other end, we can equivalently reduce the rank of the matrix by rows (cluster the data).
Mathematics analysis has proved that the principal components of the gram matrix of the centered data are the continuous solutions to the discrete cluster membership indicators for k-means clustering. Also by embedding the data into cluster subspace that is spanned by K cluster centroids, the cluster is becoming more compact (inter-cluster distance is remained and intra-cluster distance is reduced). Applying k-means clustering in this cluster subspace can achieve more effective result that is called PCA-guided k-means clustering.