WSH Wizards 2 Decade Dream Team

Sandeep Bansal

Published in

Analytics Vidhya

6 min read · Jun 18, 2020

Using Machine Learning to analyze my favorite basketball team.

To see the entire code, visit my GitHub: Here

As I watched The Last Dance, the documentary featuring the Chicago Bulls, I expected the show to conclude with Michael Jordan’s final days in the NBA wearing a Wizards jersey. Unfortunately, it never mentioned his NBA comeback in DC, and I was left wondering: does anyone remember Michael Jordan playing for the Washington Wizards?

Although his short-lived career in a Wizards jersey has been deemed a failure by many, his time in the Nation’s Capital remains a popular memory for the DC fan base. In addition to a deep dive into the 1998 Chicago Bulls season, the documentary highlighted many supporting players who played a role in the franchise winning its 6th championship. Players such as Toni Kukoc, Bill Cartwright, and Scott Burrell were a distant memory until the documentary featured them in the drama-filled, nationally televised series.

Inspired by the documentary, I decided to look into my own favorite team’s history and relive memories of those who played for the Washington Wizards. Using machine learning, I generated the two-decade dream team: the best players to have played for the Washington Wizards in the past 20 years. As I began exploring the dataset, two questions quickly surfaced: who would make this team, and did the great Michael Jordan do enough in his two-year stint to make it? Buckle your seatbelts, as the results may surprise you!

Step 1: Data Acquisition and Initial Observation

The first step was to obtain the dataset from Basketball Reference. In total I obtained 182 players over the course of 20 years. My first observation was more or less “Wow, I didn’t know that player played for the Wizards!” Below is a snapshot of some I had forgotten over the past two decades. Bojan Bogdanovic is a premier player today, Shaun Livingston went on to win multiple championships with the Warriors, and Rasual Butler passed away in a tragic car accident just a few years ago.

Also worth noting is the position distribution. For example, over the past twenty years the Washington Wizards have rostered 46 shooting guards and 20 centers. As a Wizards fan, this lines up all too well with the team’s performance today. That the team has rostered a mere twenty centers over the past 20 years is just one of many issues.
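A position count like this one can be produced in a line or two. The snippet below is only a sketch: the rows are made up, and the column names `Player` and `Pos` are assumptions about how the Basketball Reference export is laid out.

```python
import pandas as pd

# Made-up roster rows for illustration; the real dataset has 182 players
roster = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E", "F"],
    "Pos": ["SG", "SG", "C", "PG", "SG", "PF"],  # assumed column name
})

# Count how many players were rostered at each position
position_counts = roster["Pos"].value_counts()
print(position_counts)
```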

Step 2: Identify statistical categories for each position.

There are many statistics recorded in the NBA, and one of the challenges was determining how to evaluate each position, since each position performs better in some categories than others. For example, a point guard would not have a high value of blocks per game, so it wouldn’t make sense to use that as a statistic for the point guard position. To solve this problem, I took the average performance in the following categories: FG%, TRB/G, STL/G, BLK/G, AST/G, and PTS/G, and decided to use the following statistics for the given positions.

1. Point Guard: AST/G, PTS/G, STL/G
2. Shooting Guard: AST/G, PTS/G, STL/G
3. Small Forward: FG%, TRB/G
4. Power Forward: BLK/G, FG%, TRB/G
5. Center: FG%, TRB/G
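The list above maps naturally onto a small lookup table. Here is a minimal sketch of that idea in plain Python. The position codes, the dictionary keys, the helper `feature_rows`, and the sample players are all hypothetical and only illustrate how each position could be reduced to its own statistics.

```python
# Statistics used per position, taken from the list above
POSITION_FEATURES = {
    "PG": ["AST/G", "PTS/G", "STL/G"],
    "SG": ["AST/G", "PTS/G", "STL/G"],
    "SF": ["FG%", "TRB/G"],
    "PF": ["BLK/G", "FG%", "TRB/G"],
    "C": ["FG%", "TRB/G"],
}

def feature_rows(players, position):
    """Return (name, feature values) pairs for every player at `position`."""
    cols = POSITION_FEATURES[position]
    return [
        (p["Player"], [p[c] for c in cols])
        for p in players
        if p["Pos"] == position
    ]

# Made-up sample rows for illustration only
players = [
    {"Player": "Player A", "Pos": "PG", "FG%": 0.43, "TRB/G": 4.0,
     "STL/G": 1.7, "BLK/G": 0.5, "AST/G": 9.0, "PTS/G": 19.0},
    {"Player": "Player B", "Pos": "C", "FG%": 0.55, "TRB/G": 8.5,
     "STL/G": 0.6, "BLK/G": 1.4, "AST/G": 1.2, "PTS/G": 10.0},
]

pg_rows = feature_rows(players, "PG")
c_rows = feature_rows(players, "C")
print(pg_rows)
print(c_rows)
```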

After determining how each position would be evaluated, I can only imagine John Wall’s face if he saw assists and steals.

Did you know: Since 2014, John Wall ranks NUMBER 1 in assists per game at nearly 9.8 and stands 7th in steals, averaging 1.8 per game.

Step 3: Use Machine Learning to create clusters for each position

After determining each position’s best-performing qualities, I went ahead and created clusters within each position, using those qualities as the input variables. K-Means Clustering, an unsupervised machine learning model, groups data around central points (centroids) to create distinct clusters of similar data. The number of central points is represented by the letter k, which needs to be chosen before fitting the model. In other words, the model splits players into k clusters based on their statistical performance: the better the performing quality, the higher the cluster ranking.
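To make the clustering step concrete, here is a minimal sketch using scikit-learn's `KMeans` on made-up point guard numbers. Several things here are my assumptions rather than the original workflow: the sample data, the feature scaling with `StandardScaler`, the choice of 3 clusters for this toy set, and ranking clusters by their centroid average (K-Means itself assigns arbitrary label numbers).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up stats for six hypothetical point guards: AST/G, PTS/G, STL/G
X = np.array([
    [9.5, 19.0, 1.8],
    [8.8, 18.2, 1.6],
    [4.1, 9.0, 0.9],
    [3.8, 8.5, 1.0],
    [1.2, 3.0, 0.3],
    [1.0, 2.5, 0.4],
])

# Assumption: scale features so points per game does not dominate distances
X_scaled = StandardScaler().fit_transform(X)

# k (n_clusters) must be chosen before fitting; 3 is used only for this toy data
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X_scaled)

# Label numbers are arbitrary, so rank clusters by their centroid average:
# rank 1 is the cluster whose centroid has the highest average scaled stats
order = np.argsort(-model.cluster_centers_.mean(axis=1))
rank_of_label = {int(label): rank + 1 for rank, label in enumerate(order)}
ranks = [rank_of_label[int(label)] for label in labels]
print(ranks)
```

Ranking by centroid average is just one simple way to turn unordered clusters into tiers; any ordering rule would need to be chosen per position.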

Now I had to decide the number of clusters, labeled n_clusters. This number determines how many groups players will be placed in. Essentially, n_clusters represents the number of teams. If n_clusters = 5, my model will group players across 5 clusters (or teams) based on their performance quality. Now the question is: how do I determine this n_clusters value? The lower the inertia score, the more similar the data points within a cluster are. Inertia keeps falling as more clusters are added, though, so we want to pick the number where the curve starts to flatten, giving tightly grouped clusters and a low inertia score without more clusters than needed. Below is an example of graphs that depict the n_clusters value that works best for the data.

Inertia graphs for SF, PF, and C, which suggest an estimate of n_clusters = 8 across all positions.
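The inertia comparison described above can be sketched as a loop over candidate values of n_clusters. The data below is synthetic (three artificial groups of FG% and TRB/G values), so the flattening point it produces has nothing to do with the real Wizards data; it only shows how the scores behind such a graph could be computed with scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for one position's features (FG%, TRB/G): 3 groups of 20
centers = np.array([[0.40, 3.0], [0.48, 6.0], [0.55, 9.0]])
X = np.vstack([
    c + rng.normal(scale=[0.01, 0.4], size=(20, 2)) for c in centers
])

# Fit one model per candidate k and record its inertia
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

# Plotting k against inertia gives the "elbow" graph; here we just print it
for k, value in inertias.items():
    print(k, round(value, 2))
```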

A way to visualize the clusters to see if they are close together is depicted below. The diagram on the left visualizes the SF position. A quantitative way to check the n_clusters value is to compare the mean and median of each position’s best-performing characteristic and see whether the two measures of central tendency differ. To the right is a clear example in which they happen to be exactly the same!

After settling on n_clusters = 8 and clustering all the players based on their performance, here are the outcomes for each position. Keep in mind that I did not set a minimum number of games, in order to include rookies. Player ranking is based on Basketball Reference, which calculates how much a player contributed to the team’s overall wins. This means that if the team only won 20 games in a specific season, for example, Basketball Reference will not rank a player very high regardless of how good that individual player is, since the team did not win many games.

Below are the top two clusters of players. Look who it is: Gilbert Arenas!
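The mean versus median check can be sketched with a pandas `groupby`. The players, the `cluster` column, and the numbers below are made up for illustration; the point is only that a small gap between the two values suggests a cluster is not being skewed by outliers.

```python
import pandas as pd

# Made-up small forwards with an already assigned cluster label
sf = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E", "F"],
    "TRB/G": [6.0, 6.4, 6.8, 2.0, 2.5, 3.0],
    "cluster": [1, 1, 1, 2, 2, 2],
})

# Mean and median rebounds per game within each cluster
summary = sf.groupby("cluster")["TRB/G"].agg(["mean", "median"])

# Absolute gap between the two; values near zero mean they agree
summary["gap"] = (summary["mean"] - summary["median"]).abs()
print(summary)
```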

Step 4: Introducing WSH Wizards 2 Decade Dream Team

Since I did not set a minimum number of games played, again wanting to account for rookies, I decided to combine cluster 1 and cluster 2 into one dream team and label the top 15 players the 2 Decade Dream Team.

Don’t worry, Brad, I didn’t forget about you!
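Combining the top two clusters and keeping 15 players could look roughly like the sketch below. Everything in it is hypothetical: the player names, the `cluster` values, and the `contribution` column, which stands in for whatever ranking metric is taken from Basketball Reference.

```python
import pandas as pd

# Made-up rows; "contribution" is a hypothetical stand-in for the ranking metric
players = pd.DataFrame({
    "Player": [f"Player {i}" for i in range(1, 21)],
    "cluster": [1] * 6 + [2] * 12 + [3] * 2,
    "contribution": [float(v) for v in range(40, 0, -2)],
})

# Keep only the two best clusters, then take the 15 highest-ranked players
top_clusters = players[players["cluster"].isin([1, 2])]
dream_team = top_clusters.nlargest(15, "contribution").reset_index(drop=True)
print(dream_team)
```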