Destiny 2 Survival Crucible Match Analysis

Intro

The purpose of this notebook is to analyse kills, deaths, and assists for both teams and individual guardians in relation to victory. We will analyse both team metrics and individual metrics, looking at stats for the entire match as well as per minute (full-match totals are not directly comparable across games of different lengths, so per-minute values provide some measure of comparable data).

For the actual analysis, skip down to the Graphs and then the Analysis section at the bottom.

As for the models' C values, I did extensive testing with different values of C that is not shown here.

Imports and Data setup

First, we need to set up our environment and import the data. We will be using two datasets: one with team stats (3,000 rows, 1,500 games) and one with guardian stats (10,000 rows, ~1,666 games).

In [1]:
import numpy as np
import pandas as pd
from math import sqrt
import matplotlib.pyplot as plt

%matplotlib inline
team_df = pd.read_csv("https://docs.google.com/uc?id=1_0eVAF1TRL7Qn4H8-MMCBrX9jQD5D6n1&export=download")
guardian_df = pd.read_csv("https://docs.google.com/uc?id=1BHXewVacwmtswKBmcFEXliymgitKNnp0&export=download")

And we will remove the following outliers:

  • Games shorter than 2 minutes (we can assume the match ended with a team leaving; see the sketch after the next cell)
  • Games that did not complete
  • Games where there were 0 kills or deaths
In [2]:
team_df = team_df[team_df.standing != 3]
team_df = team_df[(team_df.kills != 0) & (team_df.deaths != 0)]
team_df = team_df.reset_index(drop=True)
guardian_df = guardian_df[guardian_df.completed == 1]
guardian_df = guardian_df[guardian_df.standing != 3]
guardian_df = guardian_df[(guardian_df.kills != 0) & (guardian_df.deaths != 0)]
guardian_df = guardian_df.drop(columns=['completed'])
guardian_df = guardian_df.reset_index(drop=True)
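
The sub-2-minute filter from the list above is not applied in the cell as written. A minimal sketch of how it could be added, assuming the duration column is recorded in seconds (as the per-minute conversion later implies):

# Sketch only: drop very short games, assuming duration is in seconds.
team_df = team_df[team_df.duration >= 120].reset_index(drop=True)
guardian_df = guardian_df[guardian_df.duration >= 120].reset_index(drop=True)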

Team Stat Analysis

First, we need to calculate the per-minute values.

In [3]:
minutes_team_df = team_df.copy()
minutes_team_df['kills'] = minutes_team_df['kills']/(minutes_team_df['duration']/60)
minutes_team_df['assists'] = minutes_team_df['assists']/(minutes_team_df['duration']/60)
minutes_team_df['deaths'] = minutes_team_df['deaths']/(minutes_team_df['duration']/60)

Let's visualize the data.

First, let's look at Team Kills and Deaths per match. Wins are blue, losses are red.

In [4]:
col = np.where(minutes_team_df.standing==0,'b','r')
plt.subplot(1, 3, 1)
plt.subplots_adjust(right=3, top=1)
plt.scatter(team_df.kills, team_df.assists,  color=col, alpha=0.4)
plt.xlabel(" Team Kills per match")
plt.ylabel(" Team Assists per match")
plt.subplot(1, 3, 2)
plt.scatter(team_df.deaths, team_df.assists,  color=col, alpha=0.4)
plt.xlabel(" Team Deaths per match")
plt.ylabel(" Team Assists per match")
plt.subplot(1, 3, 3)
plt.scatter(team_df.kills, team_df.deaths,  color=col, alpha=0.4)
plt.xlabel(" Team Kills per match")
plt.ylabel(" Team Deaths per match")
plt.show()

We can see a strong relation between a higher number of kills and winning, and a smaller relation between high assists and winning.
Assists vs. deaths also plays some role.
Unsurprisingly, having a team K/D above 1 generally means you will win the match.
Now, let's examine the same data on a per-minute basis.

In [5]:
plt.subplot(1, 3, 1)
plt.subplots_adjust(right=3, top=1)
plt.scatter(minutes_team_df.kills, minutes_team_df.assists,  color=col, alpha=0.4)
plt.xlabel(" Team Kills per minute")
plt.ylabel(" Team Assists per minute")
plt.subplot(1, 3, 2)
plt.scatter(minutes_team_df.deaths, minutes_team_df.assists,  color=col, alpha=0.4)
plt.xlabel(" Team Deaths per minute")
plt.ylabel(" Team Assists per minute")
plt.subplot(1, 3, 3)
plt.scatter(minutes_team_df.kills, minutes_team_df.deaths,  color=col, alpha=0.4)
plt.xlabel(" Team Kills per minute")
plt.ylabel(" Team Deaths per minute")
plt.show()

The data shows similar findings. Team kills per minute are very important; team assists per minute are mildly important.

Create Models

First, we will split our data into train and test sets to confirm our model works on data it was not trained on. We will then train both models.

In [6]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = np.asarray(team_df[['kills','deaths','assists']])
y = np.asarray(team_df['standing'])
X_train_team, X_test_team, y_train_team, y_test_team = train_test_split( X, y, test_size=0.2, random_state=4)
X = np.asarray(minutes_team_df[['kills','deaths','assists']])
y = np.asarray(minutes_team_df['standing'])
X_train_team_minutes, X_test_team_minutes, y_train_team_minutes, y_test_team_minutes = train_test_split( X, y, test_size=0.2, random_state=4)
model_team = LogisticRegression(C=0.002895, solver='liblinear').fit(X_train_team, y_train_team)
model_team_minutes = LogisticRegression(C=1, solver='liblinear').fit(X_train_team_minutes, y_train_team_minutes)
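
As mentioned in the intro, the C values above came from testing that is not shown in this notebook. A minimal sketch of how such a sweep might be run (GridSearchCV is used here as a stand-in, not necessarily how the original testing was done):

# Sketch of a C sweep over the team training set (not the original tuning procedure).
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]}
search = GridSearchCV(LogisticRegression(solver='liblinear'), param_grid, scoring='f1', cv=5)
search.fit(X_train_team, y_train_team)
print("Best C:", search.best_params_['C'], "- best CV F1:", search.best_score_)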

Let's check the accuracy (F1 score) of both models.

In [7]:
from sklearn.metrics import f1_score
yhat_team = model_team.predict(X_test_team)
yhat_team_minutes = model_team_minutes.predict(X_test_team_minutes)
team_accuracy = f1_score(y_test_team, yhat_team)
team_minutes_accuracy = f1_score(y_test_team_minutes, yhat_team_minutes)


print (f'Team per game accuracy: {team_accuracy*100:.1f}%')
print (f'Team per minute accuracy: {team_minutes_accuracy*100:.1f}%')
Team per game accuracy: 92.1%
Team per minute accuracy: 91.8%
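
The score above is the F1 score; as a sanity check, plain accuracy could also be computed (this cross-check is an addition and was not part of the original runs):

# Optional cross-check with plain accuracy (sketch, not in the original notebook).
from sklearn.metrics import accuracy_score
print(f'Team per game plain accuracy:   {accuracy_score(y_test_team, yhat_team)*100:.1f}%')
print(f'Team per minute plain accuracy: {accuracy_score(y_test_team_minutes, yhat_team_minutes)*100:.1f}%')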

So, we now have two models: one that is 92.1% accurate and another that is 91.8% accurate at predicting the winner of a match from team stats. We will dig into this further in the Analysis section.

Player Stat Analysis

We will now do the same work on the player stat info.
First, we need to calculate the per-minute values.

In [8]:
minutes_guardian_df = guardian_df.copy()
minutes_guardian_df['kills'] = minutes_guardian_df['kills']/(minutes_guardian_df['duration']/60)
minutes_guardian_df['assists'] = minutes_guardian_df['assists']/(minutes_guardian_df['duration']/60)
minutes_guardian_df['deaths'] = minutes_guardian_df['deaths']/(minutes_guardian_df['duration']/60)

Let's visualize the data.

First, let's look at Player Kills and Deaths per match. Wins are blue, losses are red.

In [9]:
col = np.where(guardian_df.standing==0,'b','r')
plt.subplot(1, 3, 1)
plt.subplots_adjust(right=3, top=1)
plt.scatter(guardian_df.kills, guardian_df.assists,  color=col, alpha=0.1,s = 110, marker = 's')
plt.axis('equal')
plt.xlabel("Guardian Kills per game")
plt.ylabel("Guardian Assists per game")
plt.subplot(1, 3, 2)
plt.scatter(guardian_df.deaths, guardian_df.assists,  color=col, alpha=0.1,s = 155, marker = 's')
plt.axis('equal')
plt.xlabel("Guardian Deaths per game")
plt.ylabel("Guardian Assists per game")
plt.subplot(1, 3, 3)
plt.scatter(guardian_df.kills, guardian_df.deaths,  color=col, alpha=0.1,s = 90, marker = 's')
plt.axis('equal')
plt.xlabel("Guardian Kills per game")
plt.ylabel("Guardian Deaths per game")
plt.show()

We can see a strong relation between a higher number of kills and winning, and a strong relation between high assists and winning.
Also, it appears that if a guardian has fewer than 7 deaths, their team will most likely win. Unless they get over 10 kills, dying more than 7 times is likely to cause the team to lose.
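A quick way to sanity-check that 7-death threshold (this check is an addition, not part of the original notebook):

# Rough check of the 7-death observation: win rate (standing == 0) on either side of the cutoff.
low_deaths = guardian_df[guardian_df.deaths < 7]
high_deaths = guardian_df[guardian_df.deaths >= 7]
print("Win rate with fewer than 7 deaths:", (low_deaths.standing == 0).mean())
print("Win rate with 7 or more deaths:   ", (high_deaths.standing == 0).mean())
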
Now, let's examine the same data on a per-minute basis.

In [10]:
col = np.where(guardian_df.standing==0,'b','r')
plt.subplot(1, 3, 1)
plt.subplots_adjust(right=3, top=1)
plt.scatter(minutes_guardian_df.kills, minutes_guardian_df.assists,  color=col, alpha=0.1)
plt.xlabel("Guardian Kills per minute")
plt.ylabel("Guardian Assists per minute")
plt.subplot(1, 3, 2)
plt.scatter(minutes_guardian_df.deaths, minutes_guardian_df.assists,  color=col, alpha=0.1)
plt.axis('equal')
plt.xlabel("Guardian Deaths per minute")
plt.ylabel("Guardian Assists per minute")
plt.subplot(1, 3, 3)
plt.scatter(minutes_guardian_df.kills, minutes_guardian_df.deaths,  color=col, alpha=0.1)
plt.axis('equal')
plt.xlabel("Guardian Kills per minute")
plt.ylabel("Guardian Deaths per minute")
plt.show()

The data shows similar findings. Player assists and deaths are both large factors. As we can see in the last graph, K/D alone is not a good indicator: having a high K/D does not win you matches; not dying wins you matches.
One consideration here is causation. Does not dying cause you to win, or does winning cause you to not die? Both could be true.
Unlike in our team data, assists seem to be much more important when considering only a single guardian.

Create Models

First, we will split our data into train and test sets to confirm our model works on data it was not trained on. We will then train both models.

In [11]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = np.asarray(guardian_df[['kills','deaths','assists']])
y = np.asarray(guardian_df['standing'])
X_train_guardian, X_test_guardian, y_train_guardian, y_test_guardian = train_test_split( X, y, test_size=0.2, random_state=4)
X = np.asarray(minutes_guardian_df[['kills','deaths','assists']])
y = np.asarray(minutes_guardian_df['standing'])
X_train_guardian_minutes, X_test_guardian_minutes, y_train_guardian_minutes, y_test_guardian_minutes = train_test_split( X, y, test_size=0.2, random_state=4)
model_guardian = LogisticRegression(C=.07, solver='liblinear').fit(X_train_guardian, y_train_guardian)
model_guardian_minutes = LogisticRegression(C=1, solver='liblinear').fit(X_train_guardian_minutes, y_train_guardian_minutes)

Let's check the accuracy (F1 score) of both models.

In [12]:
from sklearn.metrics import f1_score
yhat_guardian = model_guardian.predict(X_test_guardian)
yhat_guardian_minutes = model_guardian_minutes.predict(X_test_guardian_minutes)
guardian_accuracy = f1_score(y_test_guardian, yhat_guardian)
guardian_minutes_accuracy = f1_score(y_test_guardian_minutes, yhat_guardian_minutes)


print (f'Guardian per game accuracy: {guardian_accuracy*100:.1f}%')
print (f'Guardian per minute accuracy: {guardian_minutes_accuracy*100:.1f}%')
Guardian per game accuracy: 80.3%
Guardian per minute accuracy: 79.4%

So, we now have two models: one that is 80.3% accurate and another that is 79.4% accurate at predicting the winner of a match based on a single player's stats. The decreased accuracy makes sense given that we are only looking at one member of a three-guardian fireteam.


Analysis

All right, here comes the fun part. Let's figure out what all this means.

Since we are not feeding our own personal data in, you might be wondering what value the model provides. The biggest thing we can learn from it is the weighting of each stat. In short, the model tells us which stats play the biggest role in a win. Do remember that these models are not 100% accurate, so the weights need to be taken with a grain of salt, especially for the guardian stats.
Let's look at the team stats first. A negative weight means the field pushes the prediction toward a win (standing 0); a positive weight pushes it toward a loss (standing 1). The predicted probability comes from passing the weighted sum of all the fields, plus the intercept, through the logistic function, rather than from any single field times its weight, as sketched below.
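
A minimal sketch of that calculation for the per-game team model (the stat line is made up for illustration):

# Sketch (not in the original): turning a team stat line into a win probability by hand.
# The model was trained with standing 0 = win and 1 = loss, so P(loss) = sigmoid(w·x + b).
example = np.array([20, 15, 10])                       # kills, deaths, assists (hypothetical)
z = model_team.intercept_[0] + np.dot(model_team.coef_[0], example)
p_loss = 1 / (1 + np.exp(-z))                          # logistic (sigmoid) function
print("P(win) by hand:      ", 1 - p_loss)
print("P(win) predict_proba:", model_team.predict_proba(example.reshape(1, -1))[0][0])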

In [13]:
cdf_team = pd.concat([pd.DataFrame(['kills','deaths','assists'], columns=['feature']),
                      pd.DataFrame(np.transpose(model_team.coef_), columns=['weight'])], axis=1)
cdf_team_minutes = pd.concat([pd.DataFrame(['kills','deaths','assists'], columns=['feature']),
                              pd.DataFrame(np.transpose(model_team_minutes.coef_), columns=['weight'])], axis=1)

print("Team weights")
print(cdf_team)
print("\nTeam/minute weights")
print(cdf_team_minutes)
Team weights
   feature    weight
0    kills -0.375103
1   deaths  0.378767
2  assists -0.030492

Team/minute weights
   feature    weight
0    kills -5.420541
1   deaths  5.510833
2  assists -0.260385

So, from this data we can conclude that kills and deaths are much more important than assists at the team level, even more so if we look at the per-minute numbers. Assists are about 1/10th to 1/20th as important as kills or deaths for a team.
In short, we did not learn too much from the team stats, except that more team assists generally go along with more team kills, which does contribute to a team winning a match.
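
That kills/assists relation can be eyeballed with a quick correlation check (an addition, not part of the original analysis):

# Quick look at how team kills, deaths, and assists move together (sketch, not in the original).
print(team_df[['kills', 'deaths', 'assists']].corr())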

Let's examine Guardian stats

In [14]:
cdf_guardian = pd.concat([pd.DataFrame(['kills','deaths','assists'], columns=['feature']),
                          pd.DataFrame(np.transpose(model_guardian.coef_), columns=['weight'])], axis=1)
cdf_guardian_minutes = pd.concat([pd.DataFrame(['kills','deaths','assists'], columns=['feature']),
                                  pd.DataFrame(np.transpose(model_guardian_minutes.coef_), columns=['weight'])], axis=1)

print("Guardian weights")
print(cdf_guardian)
print("\nGuardian/minute weights")
print(cdf_guardian_minutes)
Guardian weights
   feature    weight
0    kills -0.192050
1   deaths  0.425820
2  assists -0.394876

Guardian/minute weights
   feature    weight
0    kills -1.838531
1   deaths  5.654579
2  assists -4.052705

Now we get some really interesting data. We can see that deaths and assists are far more important than kills for an individual. Over the course of a whole game, it is more important to not die, and then to get assists, than it is to get kills. The same holds per minute, with deaths and assists receiving even more weight relative to kills. In short, if you want your team to win, don't die and play with your team.
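
To make those weights concrete, we can run two hypothetical stat lines through the per-game guardian model (the numbers are made up for illustration; this comparison is not part of the original notebook):

# Hypothetical comparison (sketch): a low-death, assist-heavy game vs. a high-kill, high-death game.
# Columns are kills, deaths, assists; P(win) is column 0 of predict_proba (standing 0 = win).
supportive = np.array([[5, 4, 12]])    # few kills, few deaths, many assists (made up)
aggressive = np.array([[14, 11, 2]])   # many kills, many deaths, few assists (made up)
print("P(win), supportive stat line:", model_guardian.predict_proba(supportive)[0][0])
print("P(win), aggressive stat line:", model_guardian.predict_proba(aggressive)[0][0])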

Conclusions

For teams: do not die, and get kills. But you probably knew that already.
For individuals: focus on not dying; according to the model, that plays the biggest role in your team's chances of winning, even more than getting kills. Prioritize survival over kills, and do your best to team-shoot. One assist can help your team more than a kill of your own.