2

Hello I have a csv file of sports results for the english premier league that I am wanting to manipulate a single team that is located either in a home or away column. I then want to be able to create a series of columns which are sorted by that team and then return the results as per below. I have been able to do this in a loop but would love to know the pandas method. I have attempted to group by team- arsenal, but am finding it difficult to do that with two column options, where i would need to flip subsets of the opposing team.

df = pd.read_csv(
    'http://www.football-data.co.uk/mmz4281/1516/E0.csv', 
    sep=',')

result= df[(df['HomeTeam'] == "Arsenal") | (df.AwayTeam == "Arsenal")]


for index, row in result.iterrows():
    if row['HomeTeam'] == "Arsenal":
        if row['FTR'] == "H":
            print ('Win' , 'Home', row['FTHG'], '-', row['FTAG'])
        elif  row['FTR'] == "D":
            print ('Draw' , 'Home', row['FTHG'], '-', row['FTAG'])
        else:
            print ('Lose' , 'Home', row['FTHG'], '-', row['FTAG'])

# we dont need to put the conditons for else because we know if arsenal are not the home team they must be the away team,
# this is because we already set out dataframe filter above to show only games where arsenal is home or away, if we didnt
# do this we would do an elif and then do an improper result print for else
    else:
        if row['FTR'] == "H":
            print ('Win' , 'Home', row['FTHG'], '-', row['FTAG'])
        elif  row['FTR'] == "D":
            print ('Draw' , 'Home', row['FTHG'], '-', row['FTAG'])
        else:
            print ('Lose' , 'Home', row['FTHG'], '-', row['FTAG'])

1 Answers1

1

You can use double numpy.where:

df = pd.read_csv('http://www.football-data.co.uk/mmz4281/1516/E0.csv')
#print (df)

#add copy - http://stackoverflow.com/a/42439031/2901002
result= df[(df['HomeTeam'] == "Arsenal") | (df.AwayTeam == "Arsenal")].copy()

#get output to Series, necessary convert numeric to string
splitted = ' ' + result['FTHG'].astype(str) + ' - ' + result['FTAG'].astype(str)
#boolean masks
mask = (result['HomeTeam'] == "Arsenal")
mask1 = (result['FTR'] == "H")
mask2 = (result['FTR'] == "D")

#for new column as output
result['out'] = np.where(mask & mask1, 'Win home' + splitted,
                np.where(mask & mask2, 'Draw home' + splitted, 'Lose home' + splitted))
print (result[['HomeTeam', 'AwayTeam', 'out']])
           HomeTeam        AwayTeam              out
6           Arsenal        West Ham  Lose home 0 - 2
17   Crystal Palace         Arsenal  Lose home 1 - 2
29          Arsenal       Liverpool  Draw home 0 - 0
35        Newcastle         Arsenal  Lose home 0 - 1
40          Arsenal           Stoke   Win home 2 - 0
52          Chelsea         Arsenal  Lose home 2 - 0
60        Leicester         Arsenal  Lose home 2 - 5
77          Arsenal      Man United   Win home 3 - 0
86          Watford         Arsenal  Lose home 0 - 3
90          Arsenal         Everton   Win home 2 - 1
104         Swansea         Arsenal  Lose home 0 - 3
117         Arsenal       Tottenham  Draw home 1 - 1
127       West Brom         Arsenal  Lose home 2 - 1
137         Norwich         Arsenal  Lose home 1 - 1
140         Arsenal      Sunderland   Win home 3 - 1
156     Aston Villa         Arsenal  Lose home 0 - 2
169         Arsenal        Man City   Win home 2 - 1
176     Southampton         Arsenal  Lose home 4 - 0
180         Arsenal     Bournemouth   Win home 2 - 0
190         Arsenal       Newcastle   Win home 1 - 0
204       Liverpool         Arsenal  Lose home 3 - 3
218           Stoke         Arsenal  Lose home 0 - 0
228         Arsenal         Chelsea  Lose home 0 - 1
230         Arsenal     Southampton  Draw home 0 - 0
248     Bournemouth         Arsenal  Lose home 0 - 2
257         Arsenal       Leicester   Win home 2 - 1
267      Man United         Arsenal  Lose home 3 - 2
273         Arsenal         Swansea  Lose home 1 - 2
284       Tottenham         Arsenal  Lose home 2 - 2
295         Everton         Arsenal  Lose home 0 - 2
303         Arsenal         Watford   Win home 4 - 0
319        West Ham         Arsenal  Lose home 3 - 3
330         Arsenal  Crystal Palace  Draw home 1 - 1
338         Arsenal       West Brom   Win home 2 - 0
344      Sunderland         Arsenal  Lose home 0 - 0
346         Arsenal         Norwich   Win home 1 - 0
364        Man City         Arsenal  Lose home 2 - 2
370         Arsenal     Aston Villa   Win home 4 - 0

For Series as output:

out = pd.Series(np.where(mask & mask1, 'Win home' + splitted,
                np.where(mask & mask2, 'Draw home' + splitted, 'Lose home' + splitted)), 
                index=result.index)
print (out)
6      Lose home 0 - 2
17     Lose home 1 - 2
29     Draw home 0 - 0
35     Lose home 0 - 1
40      Win home 2 - 0
52     Lose home 2 - 0
60     Lose home 2 - 5
77      Win home 3 - 0
86     Lose home 0 - 3
90      Win home 2 - 1
104    Lose home 0 - 3
117    Draw home 1 - 1
127    Lose home 2 - 1
137    Lose home 1 - 1
140     Win home 3 - 1
156    Lose home 0 - 2
169     Win home 2 - 1
176    Lose home 4 - 0
180     Win home 2 - 0
190     Win home 1 - 0
204    Lose home 3 - 3
218    Lose home 0 - 0
228    Lose home 0 - 1
230    Draw home 0 - 0
248    Lose home 0 - 2
257     Win home 2 - 1
267    Lose home 3 - 2
273    Lose home 1 - 2
284    Lose home 2 - 2
295    Lose home 0 - 2
303     Win home 4 - 0
319    Lose home 3 - 3
330    Draw home 1 - 1
338     Win home 2 - 0
344    Lose home 0 - 0
346     Win home 1 - 0
364    Lose home 2 - 2
370     Win home 4 - 0
dtype: object
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thankyou. From that point could i then add all the additional fields next to home team, away team and the new 'out'? – Sam Sweeney Apr 22 '17 at 07:24
  • I tink it depends what do you need do next, but if need data in df then yes. Or can you explain more? – jezrael Apr 22 '17 at 07:26
  • Sure for example I would love to be able to have the fields of total home and away goals so I can begin to iterate over those? – Sam Sweeney Apr 23 '17 at 02:02
  • Hmmm, iterating in pandas is really slow, so the best is avoid it. Can you explain why do you need iterate it? Thanks. – jezrael Apr 23 '17 at 05:45