I am working on an open chess dataset (~15500 rows after cleaning), from which I create nodes and edges. However, the way I create the edges is slow.
A sample of my nodes tibble:
| | player |
|---|---|
| 1 | bougris |
| 2 | a-00 |
| 3 | ischia |
| ... | ... |
A sample picture of the per_game tibble:
The way I do it:
- I iterate over each node/player in the nodes tibble,
- searching the games where he played as black in per_game,
- and exchanging the values of columns white_id and black_id, while changing the winner in the winner column (using the swap() method I created).
- Then, with the calc_victories() method, I group all games for the specific player with every opponent he has faced, and calculate how many times he won or lost against each opponent (I store it in player_results). An example picture:
- Then I append player_results to a tibble with all previous players' results.
- Finally, I delete from the per_game tibble the node I just handled, from both the black and white columns.
Here is my code:
for (i in seq_len(nrow(nodes))) {
  player <- nodes[[1]][i]
  is_black <- per_game$black_id == player
  # Exchange values of white with black column, only where black_id is the specific player
  per_game[is_black, c('white_id', 'black_id')] <- per_game[is_black, c('black_id', 'white_id')]
  # Flip the winner for the exchanged rows (swap() handles one value at a time)
  per_game$winner[is_black] <- vapply(per_game$winner[is_black], swap, character(1), USE.NAMES = FALSE)
  # Calculate the victories, for each opponent of the specific player
  player_results <- calc_victories(player)
  # Append the player's matches with the rest.
  all_results <- rbind(all_results, player_results)
  # Delete all matches with the specific player, either if he/she is black or white
  per_game <- subset(per_game, white_id != player & black_id != player)
}
all_results
Here are the functions calc_victories() and swap():
library(dplyr)

# A method to group the matches for a player, and sum his victories against each opponent
calc_victories <- function(i = '-') {
  player_results <- per_game %>%
    filter(white_id == i | black_id == i) %>%  # Finds matches with the specific player
    group_by(white_id, black_id) %>%
    rename(player1 = white_id, player2 = black_id) %>%
    summarise_at(vars(total_matches), list(victories = sum)) %>%  # Summarize total matches
    arrange(desc(victories)) %>%  # Sorts descending
    ungroup()
  return(player_results)
}
# A method to change the winner, because of white-black column exchange
swap <- function(winner = 'draw') {
  if (winner == 'black') {
    gets <- 'white'
  } else if (winner == 'white') {
    gets <- 'black'
  } else {
    return(winner)
  }
  return(gets)
}
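
For reference, a small check of how swap() behaves. Because it uses a plain if, it handles one value at a time, so applying it to a whole column needs something like vapply(). The winners vector below is made up for illustration; the function body is the same as above.

```r
# Same logic as swap() above, repeated so this snippet runs on its own
swap <- function(winner = 'draw') {
  if (winner == 'black') {
    gets <- 'white'
  } else if (winner == 'white') {
    gets <- 'black'
  } else {
    return(winner)
  }
  return(gets)
}

# Made-up example values, not taken from the real data
winners <- c('white', 'black', 'draw')

# Apply swap() element by element; character(1) is the expected result type
flipped <- vapply(winners, swap, character(1), USE.NAMES = FALSE)
print(flipped)
```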
The code takes about 5 minutes to process all nodes. I think this is mainly because I iterate over every node. Maybe I should use something like map, but I am not sure. Thank you.
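
A loop-free version might look like the sketch below. It is only a sketch under assumptions: the toy nodes and per_game tibbles are made up, per_game is assumed to have the columns white_id, black_id, winner and total_matches, and flip, player1 and player2 are helper names introduced here. The idea is that the loop always credits a pair of players to whichever of the two appears first in nodes, so the same orientation can be computed for all rows at once with match().

```r
library(dplyr)

# Made-up toy data; the column names follow the question
nodes <- tibble(player = c('bougris', 'a-00', 'ischia'))
per_game <- tibble(
  white_id      = c('bougris', 'a-00', 'ischia', 'a-00'),
  black_id      = c('a-00', 'bougris', 'bougris', 'ischia'),
  winner        = c('white', 'white', 'draw', 'black'),
  total_matches = c(1L, 2L, 1L, 3L)
)

all_results <- per_game %>%
  mutate(
    # TRUE when the black player comes earlier in nodes than the white player
    flip = match(black_id, nodes$player) < match(white_id, nodes$player),
    player1 = if_else(flip, black_id, white_id),
    player2 = if_else(flip, white_id, black_id),
    # Vectorised counterpart of swap(); not used by the summary below
    winner = case_when(
      flip & winner == 'white' ~ 'black',
      flip & winner == 'black' ~ 'white',
      TRUE ~ winner
    )
  ) %>%
  group_by(player1, player2) %>%
  summarise(victories = sum(total_matches), .groups = 'drop') %>%
  arrange(desc(victories))

print(all_results)
```

This gives one row per pair of players, as the loop does, but the rows are sorted globally rather than per player, and it has not been checked against the real data.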
