Thursday 29 May 2014

Simple analysis of a few aspects of the Wikipedia World Cup 2014 squads data

The data and script for this post can be found on this gist. The data is taken from Wikipedia. The script analyses the data to create charts about the 2014 World Cup squads. Charts include box plots of age, the number of home-based and foreign-based players for each country, clubs with more than 4 players at the World Cup, and leagues with more than 10 players at the World Cup.
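The counting behind the home/foreign and "more than N players" charts can be sketched in a few lines. This is only an illustration, not the script from the gist: the row layout, the placeholder team and club names, and the helper names home_foreign_counts and groups_above are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical rows: (national team, club, country of the club's league).
# These are made-up placeholders, not the real squad data.
SQUADS = [
    ("Team A", "Club X", "Team A"),
    ("Team A", "Club X", "Team A"),
    ("Team A", "Club Y", "Team B"),
    ("Team B", "Club Y", "Team B"),
    ("Team B", "Club X", "Team A"),
    ("Team B", "Club X", "Team A"),
]


def home_foreign_counts(rows):
    """Return {team: (home_based, foreign_based)}."""
    counts = {}
    for team, _club, club_country in rows:
        home, foreign = counts.get(team, (0, 0))
        # A player is home-based when his club plays in his own country.
        if club_country == team:
            home += 1
        else:
            foreign += 1
        counts[team] = (home, foreign)
    return counts


def groups_above(rows, column, threshold):
    """Return {value: count} for column values seen more than `threshold` times."""
    tally = Counter(row[column] for row in rows)
    return {name: n for name, n in tally.items() if n > threshold}


print(home_foreign_counts(SQUADS))
print(groups_above(SQUADS, column=1, threshold=3))
```

With the real data, column 1 and a threshold of 4 would give the clubs chart, and column 2 with a threshold of 10 would give the leagues chart.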

The data shows that the youngest team is the Netherlands. Only Mexico, the Netherlands, Spain, England, Italy, Russia, Germany and Iran have more home-based players than foreign-based players. Most teams have fewer players based in their home countries than abroad. European clubs dominate the list of clubs with the most players at the World Cup (again, not a surprise)! The 2014 World Cup appears to be a sort of "European Cup"!
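A statement such as "youngest team" can be checked against the numbers a box plot is drawn from: the minimum, lower quartile, median, upper quartile and maximum age of each squad. Here is a rough sketch using only the Python standard library. The ages are invented placeholders and the function names are hypothetical. Note also that quartile conventions differ between tools, so the values may not match another program exactly.

```python
from statistics import quantiles

# Made-up ages for two placeholder teams; not the real squad data.
AGES = {
    "Team A": [20, 22, 24, 26, 28, 30, 32],
    "Team B": [21, 23, 25, 27, 29, 31, 35],
}


def five_number_summary(ages):
    """Return (min, lower quartile, median, upper quartile, max)."""
    # method="inclusive" treats the list as the whole population, which is
    # an assumption that suits a complete squad list; other tools may use
    # a different quartile convention.
    q1, median, q3 = quantiles(ages, n=4, method="inclusive")
    return (min(ages), q1, median, q3, max(ages))


def teams_by_median_age(ages_by_team):
    """Return team names sorted from youngest to oldest median age."""
    return sorted(
        ages_by_team,
        key=lambda team: five_number_summary(ages_by_team[team])[2],
    )


for team in teams_by_median_age(AGES):
    print(team, five_number_summary(AGES[team]))
```

Ranking by median is one reasonable reading of "youngest"; ranking by mean age or by the youngest single player could give a different order.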

Links to the scripts and input data


  1. I really enjoyed playing with this script. Nice work!

  2. These look great, but having just completed all the box plots for the teams with my Year 9 class, I found quite a few errors in your box plots. For example, the Netherlands are not the youngest team. Their lower quartile is 22, but the median is 26 and the upper quartile is 30. Also, their youngest player is 19 and their oldest is 33, so the box plot for the Netherlands is almost completely wrong.