Integration of PCA and K-Means to Cluster Soccer Players into Similar Groups

This project uses the European Soccer Database, which covers more than 25,000 matches and more than 10,000 players from European professional soccer seasons between 2008 and 2016. The exploratory data analysis covers exploring and cleaning the dataset, feature engineering guided by Pearson’s correlation coefficient with key attributes and by domain knowledge, and grouping similar players into clusters with an unsupervised machine learning algorithm, K-Means, aided by Principal Component Analysis (PCA). Learn, like, and feel free to leave a comment below.
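
To give a rough idea of how PCA and K-Means fit together, here is a minimal sketch using scikit-learn. It is not the project's actual code: the synthetic data stands in for player attributes, and the function name `cluster_players`, the choice of 2 components, and the choice of 4 clusters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_players(features, n_components=2, n_clusters=4, seed=0):
    """Standardize attributes, reduce them with PCA, then cluster with K-Means.

    `features` is an (n_players, n_attributes) numeric array. The default
    component and cluster counts are illustrative, not tuned values.
    """
    # Put every attribute on the same scale so none dominates the PCA.
    scaled = StandardScaler().fit_transform(features)

    # Project onto the leading principal components.
    pca = PCA(n_components=n_components, random_state=seed)
    reduced = pca.fit_transform(scaled)

    # Cluster the players in the reduced space.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)
    return labels, reduced, pca.explained_variance_ratio_


# Synthetic stand-in for the real data: 4 groups of 50 "players",
# each with 6 made-up attribute ratings.
rng = np.random.default_rng(0)
centers = rng.uniform(30, 90, size=(4, 6))
features = np.vstack([c + rng.normal(0, 3, size=(50, 6)) for c in centers])

labels, reduced, explained = cluster_players(features)
print(reduced.shape, np.bincount(labels), explained.round(2))
```

With real data, the number of components is usually chosen from the explained variance ratio, and the number of clusters from a diagnostic such as an elbow plot of the K-Means inertia.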
