November 15, 2021
I love basketball and I love the NBA. I think what I like most about the league is the competitive meritocracy where a player from virtually anywhere can succeed as long as he can play -- it is a make or miss league. Case in point, for the first 18 years of his life, Giannis Antetokounmpo, who was born in Greece the son of Nigerian parents, grew up poor, sometimes homeless, and effectively stateless due to Greek nationality laws concerning how he came to Greece. Now at the age of 26, Giannis, who stands 6'11" with a 7 foot wing span, is a 2-time MVP, perennial All-Star and most recently he scored 50 points in the deciding game to clinch the championship for the 2021 season. Amazing - all he needed was a ball and a hoop.
For the past 20 years I have devoured the NBA and watched most Boston Celtics games over that period (20 seasons at 82 games per season is 1,640 games), and I even purchased season tickets for the 2013-2015 Celtics seasons (43 home games per season). So yes, I love basketball. Python proved extremely useful for analyzing how the league has changed since 2000, when I watched my first Celtics game on TV.
Before diving in, you should know that much of the ranking analysis relies on the Game Score metric, which provides a single measurement of a player's productivity in a single game and is standardized so that it can be compared across all games. The metric is scaled such that a Game Score of 40 would be considered an outstanding performance, whereas 10 is an average performance. **The formula for Game Score is:** 'PTS + 0.4 * FG - 0.7 * FGA - 0.4 * (FTA - FT) + 0.7 * ORB + 0.3 * DRB + STL + 0.7 * AST + 0.7 * BLK - 0.4 * PF - TOV'.
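As a minimal sketch, the formula above can be written as a plain Python function. The function name, argument names and the example stat line are made up for illustration; only the coefficients come from the formula.

```python
def game_score(pts, fg, fga, ft, fta, orb, drb, stl, ast, blk, pf, tov):
    """Return the Game Score for one player's box-score line in a single game."""
    return (
        pts
        + 0.4 * fg
        - 0.7 * fga
        - 0.4 * (fta - ft)   # missed free throws are penalized
        + 0.7 * orb
        + 0.3 * drb
        + stl
        + 0.7 * ast
        + 0.7 * blk
        - 0.4 * pf
        - tov
    )


# Hypothetical stat line, invented for illustration only
example = game_score(pts=30, fg=10, fga=20, ft=8, fta=10,
                     orb=2, drb=8, stl=1, ast=5, blk=1, pf=2, tov=3)
print(round(example, 1))  # 24.4
```

On the scale described above, this made-up line would land well above an average game (10) but short of an outstanding one (40).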
Methodology: From the original dataset (n=230,326), I used the groupby function to group by player, then averaged each statistical category to arrive at career average stats, and finally ranked by highest Game_Score average.
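The group-average-rank step might look roughly like the sketch below. The tiny DataFrame is invented, and the column names (`PLAYER`, `PTS`, `AST`, `GAME_SCORE`) are assumptions about how the game logs are laid out, not the actual schema.

```python
import pandas as pd

# Made-up game log: one row per player per game
games = pd.DataFrame({
    "PLAYER": ["A", "A", "B", "B", "C"],
    "PTS": [30, 20, 10, 14, 8],
    "AST": [5, 7, 2, 4, 1],
    "GAME_SCORE": [25.0, 15.0, 8.0, 10.0, 4.0],
})

career = (
    games.groupby("PLAYER")
    .mean(numeric_only=True)                       # career average per category
    .assign(GAMES=games.groupby("PLAYER").size())  # games played per player
    .sort_values("GAME_SCORE", ascending=False)    # best average Game Score first
)
print(career)
```

Keeping a `GAMES` column alongside the averages makes it easy to apply a minimum-games filter later.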
Observations: Even the most casual fans would not be surprised at the top 2 players, both perennial All-Stars and both headed to the Hall of Fame: LeBron James and Kevin Durant. LeBron James played 1,305 games from 2000-2021, during which he averaged 27 points per game with 7 assists, 7 rebounds, 2 steals and 1 block per game. Kevin Durant, who played 800 games, took fewer shots per game overall but shot more efficiently (Durant was in the top 20-30% for 3P, FT and FG, while James was slightly lower) while averaging 27 points, 7 rebounds and 4 assists. Next on the list was James Harden, who took more shots (he averaged 8.9 3P shots per game, as compared to 5.1 for Durant and 4.4 for James). At 29.1 points per game over 661 games, he ranked as the highest per-game scorer (games > 300) during the trailing 20 years.
Methodology: To standardize across the dataset, I also ranked each value and then grouped the ranks into quintiles (where 1 = top 20% of the dataset and 5 = bottom 20% of the dataset). For the purpose of finding the best players, I filtered games played to > 300.
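One way to sketch the rank-then-bucket step is with `rank` and `pd.qcut`. The data and column names are invented, and ranking before applying the games filter is an assumption about the order of operations.

```python
import pandas as pd

# Made-up career averages for ten hypothetical players
career = pd.DataFrame({
    "PLAYER": list("ABCDEFGHIJ"),
    "GAMES": [900, 650, 120, 400, 310, 305, 800, 50, 500, 700],
    "PTS": [27.0, 25.1, 22.0, 18.3, 15.2, 12.9, 10.4, 9.0, 7.7, 5.1],
}).set_index("PLAYER")

# Rank 1 = highest value, then cut the ranks into five equal-sized bins
ranks = career["PTS"].rank(ascending=False, method="first")
career["PTS_QUINTILE"] = pd.qcut(ranks, q=5, labels=[1, 2, 3, 4, 5]).astype(int)

# Keep only players with more than 300 games played
qualified = career[career["GAMES"] > 300]
print(qualified)
```

Using `method="first"` breaks ties deterministically, so the five bins stay the same size even when several players share a value.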
Observations:
Another way to analyze the top players list is to look at the cumulative +/- metric. The +/- metric keeps track of the net changes in the score when a given player is either on or off the court. In short, positive scores indicate the player adds value, whereas negative scores indicate value destruction. This method does result in some overlap for the top players since 2000, but there are also new entrants, such as Tim Duncan and Tony Parker of the San Antonio Spurs, Klay Thompson and Draymond Green of the Golden State Warriors, and Rasheed Wallace and Chauncey Billups of the Detroit Pistons. The reason is that the +/- metric is biased: it does not isolate an individual player's performance, because it is affected by the caliber of a player's teammates as well. It follows that an individual player on a better team will have a higher +/- metric by virtue of being on the court at the same time as better teammates.
The best teams of the past 2 decades include, in some order, the San Antonio Spurs from 2001-2016 (4 championships during the period), the Golden State Warriors from 2015-2021 (3 championships), and the Detroit Pistons of the early 2000s (1 championship). It's notable that each of these teams had a long stretch during which the core of the team remained intact. The core of the Spurs was led by Tim Duncan, widely regarded as the best power forward ever, along with Tony Parker and Manu Ginobili, and later Kawhi Leonard; Steph Curry is the head of the snake for the juggernaut Warriors and has been teammates with Draymond Green and Klay Thompson since the 2014 season; and the Detroit Pistons were unique in that the team was not led by a perennial superstar, but succeeded through a team approach with 4 very good players in Chauncey Billups, Rasheed Wallace, Ben Wallace and Tayshaun Prince.
This is all to say that the players who appeared on the best-player list under the +/- metric but not under a ranking of the aggregate GAME_SCORE metric (Parker of the Spurs; Green and Thompson of the Warriors; and Billups and Wallace of the Pistons) benefited from the quality of their teammates and the continuity of the core that formed the team.
Methodology: From the original dataset (n=230,326), I used the groupby function on Season, then averaged each statistical category to arrive at per season stats, and finally ranked by highest Game_Score average.
Observations:
My first observation is that the 3 best (offensive) individual seasons of the past 20 years occurred within the past 3 years, and in my mind there is no doubt that the scoring numbers are biased upward due to recent rule changes. The 24-second shot clock, under which a team must shoot and at least hit the rim within 24 seconds of gaining possession, has been in place since 1954. Starting in the 2018-2019 season, the NBA tweaked the rules so that when a team shoots and then secures an offensive rebound to start a subsequent offensive possession, the shot clock resets to 14 seconds rather than 24. The effect is that, with less time to shoot, there are more possessions, and as a result statistics are biased upward, for offense and also for defense (although likely to a lesser extent). The changes were made in part to speed up the game, but I'm sure they were also motivated by league revenue (reduced time to shoot -> more shots -> more chances to score -> better stats -> higher excitement and interest in the league).
James Harden of the Houston Rockets posted the two best seasons (in 2019 and 2020), but he likely benefited from being the primary focus of his team's offense. The higher offensive usage resulted in an increased volume of shots. In both seasons, he averaged >=10 3-point shots as well as >=10 free throws per game. The only other time this occurred during the past 20 years was Harden's 2018 season, which ranks as the 13th-best individual season of the past 20 years.
Nikola Jokic of the Denver Nuggets delivered the 3rd best season of the past 20 years in 2021, during which he also won the league MVP. Jokic did so while averaging 26 points, 8 rebounds, and 11 assists, an assist total that is highly unusual for a towering center. During that season, the 6'11" Serbian-born center dished out the 3rd most assists per game; the remaining 4 players in the top 5 of the assists-per-game leaderboard that season were 6-foot point guards. He is widely regarded as the best passing big man of all time.
Finally, LeBron James appears several times on the list of the best 25 seasons of the past 20 years. This should surprise no one - James is regarded as one of the top 2 players of all time (the other is the icon Michael Jordan). Although they played in different eras governed by slightly different rules, James' longevity (he is currently in his 19th season) and his all-around game will likely push him over the top as the best basketball player of all time. In fact:
Observations: To be considered the best player per season, a player would have to average a minimum of 24 points per game and pull down 6 rebounds while dishing out a minimum of 4 assists. Out of 230,326 player games played, this stat line was achieved 4,239 times, or 1.8%. I should also point out that the best player per season averaged well north of 24 points per game, and that while Kevin Garnett did average the lowest points per game of the best seasons of the past 20 years, he also averaged 14 rebounds, 5 assists and 2 blocks (in the 2004 season). Out of all games played since 2000, only 261 player games featured > 24 points, > 14 rebounds and > 5 assists (0.1% of all games played).
A 'triple-double' in basketball is a statistical occurrence in which a player records >=10 in any three of the five major statistical categories (points, rebounds, assists, steals and blocks) during a single game. As an example, the most common form of a triple-double is when an individual player tallies 10 or more points, 10 or more rebounds, and 10 or more assists in one game (the name refers to three stat categories in double digits). Some pundits regard it as a sign of basketball excellence, while others regard it as merely an outcome of stat-padding; the true meaning is likely somewhere in between. Triple-doubles historically have been rare, but have become more common with the increased popularity of the three-point shot and the increased pace of the game (higher final scores are indicative of more offensive possessions).
Since 2000, there have been a total of 1,090 triple-doubles tallied by NBA players. This includes 1,076 triple-doubles of the points-rebounds-assists variety, 13 composed of points-rebounds-blocks, and 1 consisting of rebounds, assists and steals.
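To make the definition concrete, here is a small sketch that flags triple-doubles and tallies them by variety. The category keys (`PTS`, `TRB`, `AST`, `STL`, `BLK`) and the stat lines are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

CATEGORIES = ("PTS", "TRB", "AST", "STL", "BLK")


def double_digit_categories(line):
    """Return the categories in which a single-game stat line reaches 10 or more."""
    return tuple(cat for cat in CATEGORIES if line.get(cat, 0) >= 10)


def is_triple_double(line):
    """True when at least three categories are in double digits."""
    return len(double_digit_categories(line)) >= 3


# Invented stat lines, one per player game
lines = [
    {"PTS": 25, "TRB": 11, "AST": 10, "STL": 2, "BLK": 0},
    {"PTS": 18, "TRB": 14, "AST": 3, "STL": 1, "BLK": 10},
    {"PTS": 6, "TRB": 12, "AST": 10, "STL": 10, "BLK": 1},
    {"PTS": 31, "TRB": 9, "AST": 12, "STL": 0, "BLK": 1},   # only a double-double
]

variety_counts = Counter(
    double_digit_categories(line) for line in lines if is_triple_double(line)
)
print(variety_counts)
```

Keying the counter on the tuple of qualifying categories gives the breakdown by variety directly; a game with four qualifying categories would simply show up under its own four-item key.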
Possibly the most unusual stat line in NBA history (or at least since 2000): on February 10, 2017, Draymond Green of the Golden State Warriors recorded the first triple-double in NBA history without scoring in double digits.
The 1992 Olympics held in Barcelona, Spain marked the first time that American NBA players participated in the Olympics. And it showed, as the team, featuring Michael Jordan, Magic Johnson, Larry Bird and Charles Barkley, beat opponents by an average margin of 44 points on its way to taking home the gold medal. But the world has caught up considerably since then, to the extent that the past 3 league MVP awards have gone to non-American players.
As shown below, international players represented ~23% of NBA rosters for the 2021 season (American players were 77%), reflecting a steady shift in the mix since 2000, when 9 out of 10 players were American. In the 2021 season, international players hailed from 46 different countries, which is more than double the number of countries represented in the 2000 season.
Chris Ford of the Boston Celtics made the NBA's first 3-pointer in the 1979-1980 NBA season (the rookie season for both Larry Bird and Magic Johnson), and since then the shot has become a staple of the game. Originally introduced on a one-year trial basis, the 3-point line measured 22 feet from the hoop in the corners and 23 feet, 9 inches at the top of the arc. As the game has evolved, the importance of the 3-point shot has increased.
The record for most 3-point shots made without a miss during a game is held by Ben Gordon, who made 9 3-pointers in a single game (twice) without a miss. While playing for the Chicago Bulls, Ben Gordon was 9 for 9 on 3-point shots on his way to 32 points (he connected on 11 shots out of a total of 19) on April 14, 2006. A bit under 6 years later (March 21, 2012), this time as a member of the Detroit Pistons, he connected on 9 3-pointers and made 13 out of 22 shots and 10 of 11 free throws to score a total of 45 points.
But really, Stephen ("Steph") Curry of the Golden State Warriors is the 3-point king, and his dominance at making the shot is widely credited with influencing the next step in the game's evolution. To wit, there have been only 3 seasons in NBA history in which a player has averaged over 5 threes made per game, and all 3 belong to Steph Curry. He averaged 5.3 threes per game in the 2020-21 season, 5.1 in the 2018-19 season, and 5.1 in the 2015-16 season. As of writing, through 15 games of the 2021-22 season, he has averaged 5.7 threes made per game.
An interactive version with tooltips - made using the Python package Altair - can be accessed here. Code is reproduced below.
Much of my data collection relied on the excellent Basketball Reference website, which has compiled statistics for seemingly every NBA game and every NBA player that has ever played (and even data on the defunct ABA). Using a combination of web scraping and API calls, for each of the 30 franchises in the NBA I created a Jupyter notebook that pulled in the game log (an individual player's stats during a game) of each player for each of the 82 games played per season since 2000. With 12 players per team, 82 games played per year, 30 teams and 20 seasons, this implies a total dataset of ~590,400 records (12*82*30*20), although this is only a back-of-the-envelope estimate and the actual record count differs (the dataset analyzed above has n=230,326).
With the API, I used a while loop that pulled in gamelogs for each player for every game of each season since 2000. I organized the data processing with NBA season as the iterator, and saved each gamelog to a dataframe specific for a given season and given team. I then saved the output into csv format for easier data engineering and analysis. I organized my file structure to consist of 30 folders for 30 teams, each containing 21 csv files with dataframes of each player's gamelog for every game of a certain season.
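The collection loop could be sketched along these lines. The specific API is not named here, so `fetch_gamelog` is a hypothetical callable passed in by the caller, and `fake_fetch`, the file naming and the added `TEAM`/`SEASON` columns are likewise assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def collect_team_gamelogs(team, first_season, last_season, fetch_gamelog, out_dir):
    """Save one CSV of player game logs per season for a single team."""
    team_dir = Path(out_dir) / team
    team_dir.mkdir(parents=True, exist_ok=True)
    written = []
    season = first_season
    while season <= last_season:          # season acts as the iterator
        gamelog = fetch_gamelog(team, season)            # expected: a DataFrame
        gamelog = gamelog.assign(TEAM=team, SEASON=season)
        path = team_dir / f"{team}_{season}.csv"
        gamelog.to_csv(path, index=False)
        written.append(path)
        season += 1
    return written


def fake_fetch(team, season):
    """Stand-in for the real (unspecified) API call; returns made-up rows."""
    return pd.DataFrame({"PLAYER": ["A", "B"], "PTS": [20, 11]})


with tempfile.TemporaryDirectory() as tmp:
    paths = collect_team_gamelogs("BOS", 2000, 2002, fake_fetch, tmp)
    n_files = len(paths)
    first = pd.read_csv(paths[0])
print(n_files, list(first.columns))
```

Passing the fetcher in as an argument keeps the file-handling logic testable without hitting a live website.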
Then for each team, I appended all dataframes into one master dataframe with all game logs for every player since the 2000 season. I then ranked by highest GAME_SCORE to rank the best individual games for each NBA team since 2000.
I then appended all team dataframes to compile a master dataframe containing each game log for each player on each team for each season since 2000. I also ranked this by GAME_SCORE.
Now that I have all the data for every game played by every NBA player since the 2000 season, I need a place to store the data. For this I rely upon SQL -- which I use with Python for easier data manipulation.
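A rough sketch of the stacking step, assuming a layout of one folder per team with one CSV per season inside it. The helper name `build_master` and the demo files are made up; only the `GAME_SCORE` column name comes from the text.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_master(root_dir):
    """Stack every team/season CSV under root_dir, best Game Scores first."""
    frames = [pd.read_csv(p) for p in sorted(Path(root_dir).glob("*/*.csv"))]
    master = pd.concat(frames, ignore_index=True)
    return master.sort_values("GAME_SCORE", ascending=False).reset_index(drop=True)


# Demo with two tiny, invented team files
with tempfile.TemporaryDirectory() as tmp:
    for team, scores in {"BOS": [12.5, 30.1], "LAL": [22.0]}.items():
        (Path(tmp) / team).mkdir()
        pd.DataFrame({"TEAM": team, "GAME_SCORE": scores}).to_csv(
            Path(tmp) / team / "2000.csv", index=False
        )
    master = build_master(tmp)
print(master)
```

Collecting the frames in a list and calling `pd.concat` once avoids repeatedly copying a growing DataFrame; note that `pd.concat` raises an error if no files are found.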
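As an illustration of pairing SQL with Python, the sketch below loads a made-up game log into an in-memory SQLite database and queries it back with pandas. SQLite is an assumption chosen because it ships with Python (no database engine is named here), and the table and column names are hypothetical.

```python
import sqlite3

import pandas as pd

# Invented game log standing in for the master dataframe
master = pd.DataFrame({
    "PLAYER": ["A", "A", "B", "B"],
    "GAME_SCORE": [25.0, 15.0, 8.0, 10.0],
})

conn = sqlite3.connect(":memory:")   # a file path would persist the data
master.to_sql("gamelogs", conn, index=False, if_exists="replace")

top = pd.read_sql_query(
    """
    SELECT PLAYER,
           AVG(GAME_SCORE) AS AVG_GAME_SCORE,
           COUNT(*)        AS GAMES
    FROM gamelogs
    GROUP BY PLAYER
    ORDER BY AVG_GAME_SCORE DESC
    """,
    conn,
)
conn.close()
print(top)
```

Once the data sits in a table, aggregations like career averages can be expressed either in SQL or in pandas, whichever is more convenient for the question at hand.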