Data analysis with IPL data - I

Photo by AaDil on Unsplash

Today we are going to perform exploratory data analysis on IPL data. Here is the link to the data (you can also check out the Cricsheet version of the data).

Here is a preview of what our data looks like:

The top five rows of our data

Our task is to predict which batsman will perform well in the upcoming match, using Dream11 points as the target column.

Feature Engineering

We start by creating some columns that help us evaluate a batsman's performance.

Total_runs

total_runs = pd.DataFrame(df.groupby(['battingteam','bowlingteam','matchid','batsmanname'])['scorevalue'].sum()).\
       rename(columns={"scorevalue": "total_runs"})

Here is a preview of the output

Tip:

As you can see, after computing the total runs the grouping columns become the index, so you cannot access them as regular columns. To get them back, call reset_index() on the result. Alternatively, finish all the operations first, save the dataframe as a CSV file, and load it again; the index levels will then come back as ordinary columns.
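
Here is a minimal sketch of the index behaviour described in the tip. It uses a tiny made-up frame with the same column names as the post (the rows are invented purely for illustration, and the names `toy`, `keys`, `grouped` and `flat` are hypothetical). It shows that the group keys sit in the index after the group-by and that `reset_index()` moves them back into regular columns.

```python
import pandas as pd

# Toy ball-by-ball data; the values are invented for illustration only.
toy = pd.DataFrame({
    "battingteam": ["A", "A", "A", "B"],
    "bowlingteam": ["B", "B", "B", "A"],
    "matchid": [1, 1, 1, 1],
    "batsmanname": ["p1", "p1", "p2", "p3"],
    "scorevalue": [4, 6, 1, 0],
})

keys = ["battingteam", "bowlingteam", "matchid", "batsmanname"]

# After the group-by, the key columns live in the index, not in .columns.
grouped = toy.groupby(keys)["scorevalue"].sum().to_frame("total_runs")
print(list(grouped.columns))

# reset_index() turns the index levels back into regular columns.
flat = grouped.reset_index()
print(list(flat.columns))
```

With the keys back as columns, filters such as `flat[flat["batsmanname"] == "p1"]` work as usual.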

Leave a comment if you find this valuable.

Let's do this for all the important columns:



#number of sixes
batsmen_scores6 = pd.DataFrame(df_2021[df_2021['scorevalue'] == 6].groupby(['battingteam','bowlingteam','matchid', 'batsmanname'])['scorevalue'].count()).\
            rename(columns={"scorevalue": "run_6"})

#number of fours
batsmen_scores4 = pd.DataFrame(df_2021[df_2021['scorevalue'] == 4].groupby(['battingteam','bowlingteam','matchid', 'batsmanname'])['scorevalue'].count()).\
            rename(columns={"scorevalue": "run_4"})

#no of balls
batsmen_ball_faced_legal = pd.DataFrame(df_2021.groupby(['battingteam','bowlingteam','matchid', 'batsmanname'])['over'].nunique()).\
            rename(columns={"over": "total_legal_balls_faced"})

#strikerate (runs scored per 100 balls faced)
#dividing two Series with different names gives an unnamed result, so the column is named explicitly
batsmen_Strikerate = ((df_2021.groupby(['battingteam','bowlingteam','matchid','batsmanname'])['scorevalue'].sum()/df_2021.groupby(['battingteam','bowlingteam','matchid', 'batsmanname'])['over'].nunique())*100).\
       to_frame("strike_rate")


#fifties
fifties = pd.DataFrame(df_2021.groupby(['battingteam','bowlingteam','matchid','batsmanname'])['scorevalue'].sum() >= 50).\
       rename(columns={"scorevalue": "50's"})

#hundreds
hundreds = pd.DataFrame(df_2021.groupby(['battingteam','bowlingteam','matchid','batsmanname'])['scorevalue'].sum() >= 100).\
       rename(columns={"scorevalue": "100's"})

#duck
duck  = pd.DataFrame(df_2021.groupby(['battingteam','bowlingteam','matchid','batsmanname'])['scorevalue'].sum() == 0).\
        rename(columns={"scorevalue": "duck"})

#batsmen_position
batsmen_position = pd.DataFrame(df_2021.groupby(['battingteam','bowlingteam','matchid', 'batsmanname'])['fallofwickets'].min())

#batting_team
# batting_team = pd.DataFrame(df_2021['battingteam'])

# batting_team
# bowling_team = pd.DataFrame(df_2021['bowlingteam'])
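
The separate group-bys above can also be sketched as a single named aggregation. This is only an illustrative alternative, not the code used in this post: the toy rows are invented, the names `toy`, `flagged` and `features` are hypothetical, and it assumes (as the snippets above do) that each distinct value in `over` identifies one ball faced.

```python
import pandas as pd

# Toy ball-by-ball data; the values are invented for illustration only.
toy = pd.DataFrame({
    "battingteam": ["A"] * 5,
    "bowlingteam": ["B"] * 5,
    "matchid": [1] * 5,
    "batsmanname": ["p1", "p1", "p1", "p2", "p2"],
    "over": [0.1, 0.2, 0.3, 0.4, 0.5],  # assumed to be one value per ball
    "scorevalue": [4, 6, 0, 1, 0],
})

keys = ["battingteam", "bowlingteam", "matchid", "batsmanname"]

# Flag sixes and fours per ball so they can be summed like any other column.
flagged = toy.assign(
    is_six=toy["scorevalue"].eq(6),
    is_four=toy["scorevalue"].eq(4),
)

# One pass over the groups: each keyword is output_name=(input_column, function).
features = flagged.groupby(keys).agg(
    total_runs=("scorevalue", "sum"),
    run_6=("is_six", "sum"),
    run_4=("is_four", "sum"),
    balls=("over", "nunique"),
)

# Derived columns computed from the aggregated values.
features["strike_rate"] = features["total_runs"] / features["balls"] * 100
features["fifty"] = features["total_runs"] >= 50
features["hundred"] = features["total_runs"] >= 100
features["duck"] = features["total_runs"] == 0

print(features)
```

One difference from the filtered group-bys: a batsman with no sixes or fours gets an explicit 0 here instead of being absent from the result.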

Now combine these single-column dataframes into one dataframe with multiple columns:

total_runs['duck'] = duck
total_runs['Sixes'] = batsmen_scores6
total_runs['Fours'] = batsmen_scores4
total_runs['balls'] = batsmen_ball_faced_legal
total_runs['Fifties'] = fifties
total_runs['hundreds'] = hundreds
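
These assignments line up rows by index, so a batsman who is missing from one of the smaller frames (for example, someone who hit no sixes) ends up with NaN in that column. The sketch below shows the same alignment with `DataFrame.join` on two tiny invented frames; the names `runs`, `sixes`, `joined` and `filled` are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("A", "B", 1, "p1"), ("A", "B", 1, "p2")],
    names=["battingteam", "bowlingteam", "matchid", "batsmanname"],
)

# Every batsman has a total, but only p1 hit a six (invented values).
runs = pd.DataFrame({"total_runs": [10, 1]}, index=idx)
sixes = pd.DataFrame({"run_6": [1]}, index=idx[:1])

# join() is a left join on the shared index; p2 has no match, so run_6 is NaN.
joined = runs.join(sixes)
print(joined)

# Missing counts mean "none", so they can be filled with 0.
filled = joined.fillna({"run_6": 0})
print(filled)
```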

Replace the Boolean values with integers to make later operations easier:

df = total_runs.copy()
#batsmen with no sixes or fours are missing from those frames, so fill the gaps with 0
df = df.fillna(0)
#convert the True/False flags to 1/0
for col in ['hundreds', 'Fifties', 'duck']:
    df[col] = df[col].astype(int)

Next we create the target column, which holds each player's Dream11 points, computed according to the Dream11 points system.

#point system of dream11
pointsconfig = {
    'total_runs': 1,
    'run_6': 2,
    'run_4': 1,
    '>=50': 8,
    '>=100': 16,
    'duck': -2,
}

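#runs + six bonus + four bonus + fifty bonus + hundred bonus - duck penalty, plus a flat 4 points per player
dreamll_score = df['total_runs'] + df['Sixes']*2 + df['Fours'] + df['Fifties']*8 + df['hundreds']*16 - df['duck']*2 + 4
df['dreamll_score'] = dreamll_score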
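
As a worked example of the formula, take a batsman who scores 57 runs with 3 sixes and 5 fours: 57 + 3*2 + 5 + 8 + 0 - 0 + 4 = 80 points. The sketch below reproduces that arithmetic from a weights dictionary keyed by the column names used above. The names `POINTS`, `FLAT_BONUS`, `batting_points` and `points` are hypothetical, and the two rows are invented for illustration.

```python
import pandas as pd

# Points per unit, mirroring the formula in the post (column names as keys).
POINTS = {
    "total_runs": 1,
    "Sixes": 2,
    "Fours": 1,
    "Fifties": 8,
    "hundreds": 16,
    "duck": -2,
}
FLAT_BONUS = 4  # the constant added at the end of the post's formula


def batting_points(row: dict) -> int:
    """Weighted sum of one batsman's stats plus the flat bonus."""
    return sum(row[col] * pts for col, pts in POINTS.items()) + FLAT_BONUS


# Invented rows: a 57-run innings and a duck.
example = {"total_runs": 57, "Sixes": 3, "Fours": 5, "Fifties": 1, "hundreds": 0, "duck": 0}
duck_row = {"total_runs": 0, "Sixes": 0, "Fours": 0, "Fifties": 0, "hundreds": 0, "duck": 1}
print(batting_points(example))

# The same calculation vectorised over a DataFrame: multiply each column
# by its weight, sum across the row, then add the flat bonus.
scores = pd.DataFrame([example, duck_row])
scores["points"] = scores[list(POINTS)].mul(pd.Series(POINTS)).sum(axis=1) + FLAT_BONUS
print(scores["points"].tolist())
```

Keeping the weights in one dictionary means a change to the points system only has to be made in one place.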

Here is the final output:

We will dig into this data and its insights in the next post.
