Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
2020 Prediction of the Physiological Workload of Football Training Sessions STATISTICAL MODELLING AND R SHINY APPLICATION KENNETH MCMILLAN 9372431 Project for MSc Data Analytics (online) May 2020 0 Table of Contents Figures ........................................................................................................................................ ii Tables ......................................................................................................................................... ii Executive Summary ................................................................................................................... iii 1. Introduction ............................................................................................................................ 1 Project Report Overview ........................................................................................................ 2 2. Exploratory Data Analysis ...................................................................................................... 3 Data Overview ....................................................................................................................... 3 Response and Explanatory variables ..................................................................................... 3 Effect of Position .................................................................................................................... 7 Non-linearity and repeated measurements ............................................................................. 7 EDA Summary ....................................................................................................................... 9 3. Methods ................................................................................................................................10 Data Collection ......................................................................................................................10 Data Management.................................................................................................................10 Data Analysis and Statistical Modelling .................................................................................10 Model Fitting .........................................................................................................................12 Model Comparison ................................................................................................................13 Model Selection.....................................................................................................................14 Model Performance ...............................................................................................................14 4. Results ..................................................................................................................................15 Total Distance models ...........................................................................................................15 Metrics Summary for all response variables ..........................................................................22 Performance on Test Data Set ..............................................................................................25 Results Summary ..................................................................................................................27 5. Conclusions and Discussion .................................................................................................28 Model selection: Simpler LMM or GAMM? ............................................................................28 Project limitations ..................................................................................................................28 Practical applications ............................................................................................................29 Future application improvements ..........................................................................................30 Final Conclusion....................................................................................................................30 6. References ...........................................................................................................................31 i Appendix A: Partial Residual Plots (for TDmodel7) ...................................................................33 Appendix B: Model fits for all Response Variables 1 .................................................................36 Appendix C: Model fits for all Response Variables 2 (Facets) ...................................................39 Appendix D: Navigating the R Shiny Training Session Planner Application ...............................45 Appendix E: Collection of Training Session Planner Application Screenshots ...........................46 Figures Figure 1: Professional Football Player wearing a wearable GPS device .................................... 1 Figure 2: Training Drill Schematic .............................................................................................. 4 Figure 3: Pairwise Scatterplots of continuous variables .............................................................. 5 Figure 4: Distribution of the continuous variables ....................................................................... 5 Figure 5: Training drills vary by duration, the longest being Tactical drills .................................. 6 Figure 6: Total Distance observations for each training drill ....................................................... 6 Figure 7: No apparent effect of position on Total Distance covered............................................ 7 Figure 8: Scatterplot of relationship between Duration and the response variables .................... 8 Figure 9: Scatterplot of Total Distance against Duration, by Drill Type ....................................... 8 Figure 10: Dotplot of measurements per drill for a sample player ............................................... 9 Figure 11: Wearable GPS 'STATSports Viper' Device ...............................................................10 Figure 12: Residual Plots for TDmodel1 Diagnostics.................................................................15 Figure 13: Residual Plots for TDmodel2 Diagnostics.................................................................16 Figure 14: Residual Plots for TDmodel3 Diagnostics.................................................................17 Figure 15: Residual Plots for TDmodel4 Diagnostics.................................................................18 Figure 16: Residual Plots for TDmodel7 Diagnostics (best fitting model)...................................19 Figure 17: A selection of Partial Residual Plots for TDmodel7...................................................20 Figure 18: Model Fits using TDmodel4, 7 and 9 facetted by Player ...........................................21 Figure 19: Model Fits using TDmodel4, 7 and 9 facetted by Training Drill .................................21 Figure 20: Plots of Predicted versus Actuals (using Test Data Set) ...........................................25 Figure 21: Predicted versus Actuals for all response variables ..................................................26 Tables Table 1: Variables description and information ........................................................................... 4 Table 2: Total Distance Model Metrics ......................................................................................22 Table 3: High Intensity Distance Model Metrics .........................................................................22 Table 4: Acceleration Distance Model Metrics ...........................................................................23 Table 5: Deceleration Distance Model Metrics ..........................................................................23 Table 6: Sprint Distance Model Metrics .....................................................................................24 Table 7: Maximum Speed Model Metrics ..................................................................................24 Table 8: Performance Metrics using Test Data Set ...................................................................25 Table 9: Comparison of Train/Test Performance Metrics for all response variables ..................26 ii Executive Summary Introduction: Accurate prescription of individual physiological training loads in a football (soccer) training session is crucial in order to optimise training while minimising the risk of injury and increasing player availability. Global Positioning System (GPS) devices are routinely used by professional football clubs to monitor the training workloads of their players. These GPS devices collect data on key physiological workload metrics such as total distance covered, high intensity distance, and acceleration/deceleration distances. Main Aim: The main aim of this project was to build a football Training Session Planner application using appropriate predictive models trained on historical GPS training data, to ultimately enable a football coach to quickly view physiological workload predictions of the training drills that constitute the prescribed training session. Methods: Linear Mixed Models (LMMs) and Generalised Additive Mixed Models (GAMMs) were fitted to the randomly assigned Training data set (70% of data) for all 6 response variables. Nine different models were fitted for each response: the first 4 models involved fitting LMMs with random intercept (lme4 R package), with the subsequent non-linear models using GAMMs with different smoothing methods (mgcv and gamm4 R packages). A random player effect was used to account for the nested structure of the data. R2, RMSE, MAE and AIC of the models were compared in order to select an appropriate model for each response variable. Model performance was then established using the Test data set. Results: A GAMM with cubic regression splines was found to be the most appropriate model for all of the response variables, but a simpler polynomial LMM produced a similar model fit performance. A high R2 value (0.81) was found for the Total Distance response variable. However, R2 values of the model fits were low when predicting Sprint Distance and Maximum Speed (0.53 and 0.55 respectively). All models showed an excellent fit to the Test data set (r > 0.84) with the exception of the Sprint Distance and Maximum Speed response variables (r =0.64 and 0.71 respectively). RMSEs were found to be of an acceptable practical value. Conclusions: By using historical GPS workload data of football training sessions, the physiological workloads of a prescribed training session can be estimated with practically acceptable accuracy using LMMs and GAMMs. However, predictions of Sprint Distance and Maximum Speed attained should be viewed with caution. The fitted models were deployed in an interactive R Shiny Training Session Planner application designed to assist football coaches with prediction of the physiological workload of their prescribed training sessions. iii 1. Introduction One of the most popular and effective training monitoring devices in elite sports is the Global Positioning System (GPS) (1). They are widely used in professional football (soccer), rugby and other sports (2–6) to summarise a player’s physiological training workload (i.e. metrics such as total distance covered, high intensity distance, and acceleration distances) across the different training drills in a training session (see Figure 1). Accurate ‘prescription’ of individual training loads in a training session is crucial in order to optimise training adaptations while minimising the risk of injury. Football players participate in a wide range of training drills during a training session in order to induce physiological adaptations needed to succeed in competitive matchplay. A typical soccer training session consists of a combination of warm-up, technical/tactical drills, fitness related drills, and cool-down activities. The choice and duration of each training drill has to be planned in advance. The choice may Figure 1: Professional Football Player wearing a wearable GPS device depend on the proximity of the next upcoming competitive match and/or the football coach’s preference. Each prescribed training drill will induce a specific physiological effect and these effects may differ greatly between training drills. Once a training session is completed the resulting GPS workload data are typically analysed to assess each component of the training session separately by generating visualisations and tables summarising each metric at the squad and individual player level. The main aim of this project was to build a football Training Session Planner application using statistical models trained on historical GPS workload data that would allow a football coach to quickly obtain physiological workload predictions for their prescribed training session. The application would allow the coach to quickly determine whether the physiological load of the training session is appropriate or not. For example, the football coach may want to check that a training session that is planned for 24-hours before a competitive match is not physiologically too demanding, which might result in player fatigue on match-day. On the other hand, the coach 1 may want to analyse whether a prescribed training session is of the appropriate duration and intensity to elicit favourable physiological adaptations. Project Report Overview Section 2: Exploratory Data Analysis findings are reported in Section 2. Section 3: Methodologies used in the data analysis and modelling are described in Section 3. Brief overviews of Linear Mixed Models and Generalised Additive Mixed Models are presented, along with the reasons behind why these approaches were used in the statistical modelling in this project. Section 4: This section presents the main findings of the data analysis and details the modelling processes that were implemented. For brevity, modelling details for the Total Distance response variable are presented only. Modelling for the other 5 response variables followed a similar pattern, and the model performance metrics are tabulated and graphically illustrated in this section. Graphical illustrations of the model fits for all models can be found in Appendices B and C. To conclude the Results section, model performance metrics using the Test data set are presented. Section 5: The main findings of the project are discussed in Section 5. Limitations and practical applications of the project are acknowledged, and recommendations for future work are suggested. A modified html version of the project report (which includes more Exploratory Data Analysis graphs) can also viewed at: https://rpubs.com/kennymcmillan/603180 Using the most appropriate models established in the project analysis, an interactive R Shiny Training Session Planner application was created. This application was designed for ease of use by football coaches to predict the training load of their prescribed training sessions. This application allows coaches to “build” a training session, by selecting the individual training drills, and their subsequent duration and number of sets. The R Shiny application can be viewed at: https://aspireacademy-physiology.shinyapps.io/MSc_ShinyDashboard/ Instructions on navigating the Training Session Planner application can be found in Appendix D at the end of this project report. Screenshots of the application are displayed in Appendix E. 2 2. Exploratory Data Analysis This section describes findings of an Exploratory Data Analysis of the data set. The reader can also interactively explore the data set using the R Shiny application (see EDA options in sidebar). Data Overview GPS data were collected during training sessions from 26 professional football players. 4514 training drill observations were collected over a 2-year period. There are no missing data in the data set. The majority of data collected came from Technical and Tactical Drills. The smallest amount of data were collected from Warm-Up and Recovery Drills. Within each training drill classification, the number of observations of each training drill were found to vary considerably. Observations ranged from the maximum of 430 observations for the 11 v 11, fully coached drill to only 46 for Speed and Agility training. GPS physiological workload responses to 25 different training drills were analysed, with these drills belonging to 7 different classifications of training drills (see Figure 2 overleaf). Each player was also classified according to one of five possible playing positions: Attacker (A), Central Defender (CD), Central Midfielder (CM), Wide Defender (WD), and Wide Midfielder (WM). The numbers of observations per player was found to vary, in the range of 138 to 210 observations. Response and Explanatory variables Table 1 overleaf describes the response and explanatory variables in the data set. All 6 response variables investigated in this project are continuous random variables. All but one of the explanatory variables are categorical, with the training duration variable being the exception (continuous in nature). A scatterplot of all the continuous variables is presented in Figure 3. As expected, it can be seen that the distance-related metrics generally increase as the duration of the drill increases. Total Distance covered in particular has a high correlation with drill duration (r = 0.85). In contrast, Maximum Speed attained is poorly correlated with training duration (r = 0.37). This observation makes sense as a player could reach maximum speed at any time-point during a training drill. Maximum speed attained is more likely to be explained by the type of training drill being performed (e.g. we would expect higher maximum speed to be attained during speed and agility training compared to a warm-down drill). 3 Figure 2: Training Drill Schematic Table 1: Variables description and information 4 Figure 3: Pairwise Scatterplots of continuous variables The distributions of the response variables were found to be right-skewed, except Maximum Speed that appeared normally distributed (see Figure 4). Duration of the training drill was also found to be right-skewed. Drills of longer duration were mainly attributable to Tactical and Technical Drills (see Figure 5), with some observations approaching the duration of competitive match-play (90 minutes). There seems to be an interaction between drill duration and type. Figure 4: Distribution of the continuous variables 5 Figure 5: Training drills vary by duration, the longest being Tactical drills Figure 6 below illustrates all observations for each training drill for the Total Distance response variable. For some drills it is apparent that Total Distance increases steadily with increasing training duration, but for some other drills this is not as apparent (e.g. Warm-down Drill). Figure 6: Total Distance observations for each training drill 6 Effect of Position During competitive football match-play, playing in different positions may induce different physiological workloads. For example, wide-midfield players usually cover a significantly greater total distance during match-play than central-defenders. However, Figure 7 below illustrates that for the training drills, playing position seems to have no great effect on workload output, presumably due to smaller pitch sizes being used in most training drills, and the players not being bound to their tactical position on the training pitch. Figure 7: No apparent effect of position on Total Distance covered Non-linearity and repeated measurements Figure 8 overleaf clearly demonstrates that there is a non-linear relationship between all 6 response variables and training duration. Therefore, it would seem that fitting non-linear models would be more appropriate than fitting linear models. Figure 9 also show this nonlinearity between Total Distance covered and duration of the drill by Drill Type. For the Sprint Distance response variable, some strange stratified measurements are apparent, perhaps due to incorrect measurements by the GPS devices. 7 Figure 8: Scatterplot of relationship between Duration and the response variables Figure 9: Scatterplot of Total Distance against Duration, by Drill Type 8 The project data set consisted of repeated measurements for each training drill for each player. This is illustrated in Figure 10 for Player 15, who provided the most observations (210 data points). Figure 10: Dotplot of measurements per drill for a sample player EDA Summary The most important findings of the exploratory data analysis were: Distributions of the response variables are right-skewed with the exception of the Maximum Speed response variable that appears normally distributed. The explanatory variable Duration is also right-skewed. Different training drill types have different duration ranges. Different training drills evoke different physiological responses. There appears to be a non-linear relationship between duration of the training drill and the workload metrics (response variables). The playing position of the player may not have a great effect on the training workload response. The sprint distance data has some peculiarities. 9 3. Methods This section details the methodologies used in this project. Data Collection Data were collected from training sessions performed by 26 players of a British professional football team over 2 football seasons. All the players were in peak physical condition and free from injury at the time of data collection. The data came from wearable GPS devices (see Figure 11) designed specifically for use in sport (STATSports Figure 11: Wearable GPS 'STATSports Viper' Device Viper GNSS unit). Data were collected at 5 Hz, and downloaded by the software that accompanies the GPS devices. Full training session data was then carefully parsed into individual drill data segments which were then exported into a Microsoft Excel spreadsheet for further analysis. Data Management GPS observations coming from competitive match-play were removed from the data set, so that only training drill information was used for analysis. 4514 training drill observations were available for statistical analysis and modelling. 70% of the data were randomly assigned to a Training data set for model fitting, with the remaining 30% set aside as a holdout Test data set. Data Analysis and Statistical Modelling All statistical analysis was performed using R version 3.6.1 (7). Mixed-Effects Modelling Most commonly used statistical techniques assume independence of the observations or measurements. The project data set has a hierarchical structure, with repeated measurements each player had repeat observations for certain training drills. Repeat measurements in the same individual are usually more similar to each other than values from different individuals and therefore not independent (8). Mixed-effects models, such as Linear Mixed Models (LMMs) offer a flexible framework to model the sources of variation and correlation that arise from repeat observations (8). They are a statistical model that contains both fixed effects and random effects. Fixed effects can be defined as effects that we wish to estimate explicitly, while random effects are effects that introduce variability (9). 10 Random Intercept LMM One form of LMM is the Random Intercept model and this modelling approach was applied in some of the earlier models in this project. As an example, if we let 𝑦𝑖𝑗 be the Total Distance of the i-th player on the j-th occasion and let 𝑥𝑖𝑗 denote the training drill duration, then: 𝑦𝑖𝑗 = 𝛽0 + 𝑏0𝑖 + 𝛽1 𝑥𝑖𝑗 + 𝑒𝑖𝑗 , where the 𝑏0𝑖 are independent 𝑁(0, 𝜎02 ) , the 𝑒𝑖𝑗 are independent 𝑁(0, 𝜎 2 ), and 𝑏0𝑖 and 𝑒𝑖𝑗 are independent of each other. The term 𝑏0𝑖 is the random effect corresponding to the i-th player. By rewriting the model as: 𝑦𝑖𝑗 = 𝑏𝑖∗+ 𝛽1 𝑥𝑖𝑗 + 𝑒𝑖𝑗 , the terms 𝑏𝑖∗ ~ N(β0, 𝜎02 ) are random intercepts, with each intercept corresponding to a different player centred around 𝛽0 , with added variability represented by the variance parameter 𝜎 2 (9). Non-linear modelling: Generalised Additive Models and Generalised Additive Mixed Models EDA results from Section 2 of this report highlighted that the relationship between training duration and drill type is not necessarily linear over time. Mixed-effects models can be used to model both linear and non-linear relationships between response and explanatory variables. While LMMs can be used to express linear relationships between sets of variables, non-linear mixed-effects models can also model mechanistic relationships between the response and explanatory variables (8). Generalised Additive Models (GAMs) and Generalised Additive Mixed Models (GAMMs) provide a general framework for extending the standard linear model and capturing non-linearity. The linear predictor depends on unknown smooth functions of some of the explanatory variables, while maintaining additivity (10). The general form of a GAM is: 𝑔(𝐸(𝑌)) = 𝛽0 + 𝑓1 (𝑥1 ) + 𝑓2 (𝑥2 ) + ∙∙∙ + 𝑓𝑚 (𝑥𝑚 ) + e The model relates a univariate response variable, 𝑌, to some predictor variables, 𝑥𝑖 . An exponential family distribution is specified for 𝑌 (for example, normal or Poisson distributions) along with a link function 𝑔 (for example the identity or log functions) relating the expected value of 𝑌 to the predictor variables. The terms 𝑓1 (𝑥1 ), … , 𝑓𝑚 (𝑥𝑚 ) denote smooth, non-parametric functions. Smooth functions in GAMs allow the shape of the data to be followed much more closely than linear models, as they not constrained by the assumption of linearity. Therefore, 11 GAMs (and therefore GAMMs) have the ability to model non-linear relationships in a more flexible form than standard linear models, and they have the potential to make more accurate predictions. Indeed, GAM / GAMMs can serve as a useful compromise between linear and fully non-parametric models (10). When using the mgcv package in R, smoothing terms can be specified in a gam formula call by using the “s” term. Various smooth classes are available for use, and a variety of these were used in the models fitted in this project. What defines a smooth class is the basis used to represent the smooth function and quadratic penalty (or multiple penalties) used to penalise the basis coefficients in order to control the degree of smoothness. The smooths built into the mgcv package are all based one way or another on low rank versions of splines.(11) The smooth components of GAMs can be viewed as random effects for estimation purposes, meaning that random effects terms can be incorporated into GAMs. Random effects in a GAM can be represented in the same way that the smooths are — as penalised regression terms. This method can be used within the gam function call in the mgcv package by making use of the s( ... , bs = "re") term (11), and this method has been utilised in this project. As an example, when using the Player_ID variable (an explanatory variable in this project) , s(Player_ID, bs = "re") produces a random coefficient for each level of Player, with the random coefficients all modelled as i.i.d. normal (11). Logarithmic transformations Logarithmically transforming variables is a very common way to handle situations where a nonlinear relationship exists between the explanatory and response variables, or when a variance stabilising transformation is needed. Taking the log of the response and/or explanatory variables can help when residuals appear non-normal, and also help to eliminate any heteroscedasticity. Logarithmic transformations are also a convenient means of transforming a highly skewed variable into one that is more approximately normal. Since all response variables in this project were found to be right-skewed (with the exception of the Maximum Speed response variable), and a non-linear relationship between the response variables and the training drill duration was apparent, logarithmic transformations of both the response variables and the explanatory variable relating to drill duration were applied in the models fitted in this project. Model Fitting As recommended by Gelman (12), simple models were initially fitted and then modified to generate possibly better models (according to some goodness-of-fit criterion) with this process 12 repeated until no further improvements were evident. This iterative process was applied for modelling of all 6 response variables (see Table 1). In this project report, detailed analysis of the modelling steps is presented only for the ‘Total Distance’ response variable (for brevity purposes). However, all model fitting steps, results and analysis can be viewed in the submitted R code file. The following 9 models were sequentially fitted for each response variable: 1. LMM using all variables (random intercept model) 2. LMM with log transformations and dropping of insignificant predictor variables (random intercept model) 3. LMM with added interaction term (random intercept model) 4. LMM with polynomial (random intercept model) 5. GAMM using default smooth (Thin plate regression splines) (mgcv package). 6. GAMM using Factor smooths (mgcv package). 7. GAMM using Cubic Regression Splines (mgcv package). 8. GAMM using p-splines (mgcv package). 9. GAMM model (gamm4 package) (random intercept model) All models were fitted using the Training data set. On selection of the final model fit, the holdout Test data set was used to provide an unbiased evaluation of the final model (10). R packages used to fit and visualise LMM and GAMM models LMMs (random intercept model) were fitted using the lme4 package (13). The lme4 package default output for LMMs provides t-values but no p-values. The primary motivation for this omission is that in LMMs it is not at all obvious what the appropriate denominator degrees of freedom to use are, except perhaps for some simple designs and nicely balanced data (14). In this project, significance of LMMs were calculated using the lmerTest package (15). Prediction intervals from the LMMs (used in the Training Prediction Planner Application) were obtained using the ‘predictInterval’ function from the merTools package (16). R2 values for LMMs were estimated using the MuMin package (17). GAMMs were fitted using the mgcv and gamm4 packages (11). The mgcViz package (18) was used for visualisation of GAMM partial residual plots. Model Comparison The Likelihood Ratio Test (LRT) can compare two different nested models in order to determine whether one model is a better fit to the data than the other. LRTs are commonly used to decide if a particular explanatory variable should be included in a mixed-effects model. LRTs are most 13 commonly used to decide if a particular random effect (say, a random intercept) should be retained in the model by evaluating whether that effect improves the fit of the model, with all other model parameters held constant (14). When used for evaluating the significance of fixed effects, LRTs have one potential disadvantage: using LRTs to compare two models that differ in their fixed effects structure may not always be appropriate (8). When mixed-effects models are fitted using Restricted Maximum Likelihood (REML), there is a term in the REML criterion that changes when the fixed-effects structure changes, making a comparison of models differing in their fixed effects structure meaningless. Thus, if LRTs are to be used to evaluate significance, models must be fitted using Maximum Likelihood (ML), and this approach was therefore used in this project. Model Selection The Root Mean Squared Error (RMSE) and the Akaike Information Criterion (AIC) (19) were both used to decide between alternative models, by choosing the model with the smaller values. RMSE can be interpreted as the standard deviation of the unexplained variance, and has the useful property of being in the same units as the response variable. Lower values of RMSE indicate a better fit. RMSE is a good measure of how accurately the model predicts the response, and can be regarded as an important criterion for fit if the main purpose of the model is prediction (the main aim of this project). When comparing alternative models, the residuals from the fit were analysed to check for any departures from the model’s underlying assumptions. The final selected model was then deployed in the R Shiny application using REML. Model Performance Model performance was evaluated by comparing the predicted versus actual training load metrics in the holdout Test data set for each of the 6 response variables. For each of these metrics, Pearson’s correlation coefficient (predicted versus actuals), RMSE, and MAE were reported. 14 4. Results In this section of the report, the different models that were fitted to predict the Total Distance covered are described and results reported. Similar models were fitted for the other 5 response variables, and summaries of all model performance metrics are provided in Tables 2 to 7. R code detailing all modelling steps can be found in the submitted R code file. To conclude the results section, performance of the selected models on the Test Data Set are summarised and illustrated. Total Distance models TDmodel1: All variables, no transformations The first model that was fit to the Training data set to model Total Distance covered (highlighted in purple colour in the R code equation below) was a LMM with all variables, and no transformations. This LMM was a random intercept model, fitted using the lme4 package. A random player effect was used (highlighted in red): TDmodel1 = lmer(Total_Distance ~ Duration + Drill + Position + (1 | Player_ID), data = train_data, REML = F) 𝑦𝑖𝑗𝑘𝑙 = 𝛽0 + 𝜏𝑗 + 𝛾𝑘 + 𝛽1 𝑥𝑖𝑗𝑘𝑙 + 𝑏𝑖 + 𝜖𝑖𝑗𝑘𝑙 𝑏𝑖 ~ 𝑁(0, 𝜎12 ), 𝜖𝑖𝑗𝑘𝑙 ~𝑁(0, 𝜎 2 ) where 𝛽0 is the intercept, 𝜏𝑗 is the 𝑗-th effect of Drill, 𝛾𝑘 is the 𝑘-th effect of Position, 𝛽1 is the slope for Duration (𝑥𝑖𝑗𝑘𝑙 ) measured on the 𝑖-th player at occasion 𝑙 under the levels 𝑗, 𝑘, of Drill and Position respectively. The term 𝑏𝑖 is the random effect corresponding to the i-th player. This model produces a QQ plot with a “long” head and tail (see Figure 12). RMSE was found to be the highest (385) out of all the subsequent models fitted. Figure 12: Residual Plots for TDmodel1 Diagnostics 15 TDmodel2: Position variable dropped + Log transformation ANOVA results showed the Position explanatory variable to be a non-significant predictor of Total Distance covered and was subsequently dropped from future models. A log transformation was also applied to both the response variable and the explanatory variable Duration, in order to capture the correct functional form between the response and the predictors: TDmodel2 = lmer(log(Total_Distance) ~ log(Duration) + Drill + (1 | Player_ID), data = train_data, REML = F) log(𝑦𝑖𝑗𝑘 ) = 𝛽0 + 𝜏𝑗 + 𝛽1 log(𝑥𝑖𝑗𝑘 ) + 𝑏𝑖 + 𝜖𝑖𝑗𝑘 𝑏𝑖 ~ 𝑁(0, 𝜎12 ), 𝜖𝑖𝑗𝑘 ~𝑁(0, 𝜎 2 ) where 𝛽0 is the intercept, 𝜏𝑗 is the 𝑗-th effect of Drill, 𝛽1 is the slope for Duration (𝑥𝑖𝑗𝑘 ) measured on the 𝑖-th player at occasion 𝑘 under the level 𝑗 of Drill. The term 𝑏𝑖 is the random effect corresponding to the i -th player. In this instance where both the response variable and response variable are log-transformed, the interpretation of the model is that there is an expected percentage change in 𝑦 when 𝑥 increases by some percentage. Log transformation of Total Distance and Duration resulted in a more appropriate looking QQ plot (see red arrow in Figure 13 below), and residual scatter plot. Figure 13: Residual Plots for TDmodel2 Diagnostics 16 TDmodel3: Addition of an interaction term For the 3rd model, an interaction term between training duration and drill was applied in the model (highlighted in blue): TDmodel3 = lmer(log(Total_Distance) ~ log(Duration)*Drill + (1 | Player_ID), data = train_data, REML = F) log(𝑦𝑖𝑗𝑘 ) = 𝛽0 + 𝜏𝑗 + 𝜃𝑗 𝑥∗𝑖𝑗𝑘 + 𝑏𝑖 + 𝜖𝑖𝑗𝑘 , where 𝑥∗𝑖𝑗𝑘 = log(𝑥𝑖𝑗𝑘 ) , 𝑏𝑖 ~ 𝑁(0, 𝜎12 ), 𝜖𝑖𝑗𝑘 ~𝑁(0, 𝜎 2 ) The difference in this model is the addition of a separate linear regression for each level of Drill. Thus, each 𝜃𝑗 represents the slope of log(Duration) for the effect of drill 𝑗. An ANOVA showed that addition of the interaction significantly improves the model, with AIC largely decreasing from 1749 to 1482 and RMSE falling from 381 to 346. All residual model checks seemed acceptable, except the long tail on QQ plot (see Figure 14). Figure 14: Residual Plots for TDmodel3 Diagnostics TDmodel4: Addition of appropriate polynomial Prior EDA showed that the relationship between the response variables and Duration is nonlinear. Therefore, a polynomial fit was applied to the model to capture this non-linearity: 17 TDmodel4 = lmer (log (Total_Distance) ~ poly (log(Duration),3 )*Drill + (1 | Player_ID), data = train_data, REML = F) 𝑃 log(𝑦𝑖𝑗𝑘 ) = 𝛽0 + 𝜏𝑗 + ∑ 𝜃𝑗𝑞 (𝑃𝑞 (𝑥∗𝑖𝑗𝑘 )) + 𝑏𝑖 + 𝜖𝑖𝑗𝑘 , 𝑞=1 ∗ where 𝑥𝑖𝑗𝑘 = log(𝑥𝑖𝑗𝑘 ) , 𝑏𝑖 ~ 𝑁(0, 𝜎12 ), 𝜖𝑖𝑗𝑘 ~𝑁(0, 𝜎 2 ), and 𝑃𝑞 = Polynomial of degree q ANOVA results indicated that a 3rd order polynomial fit was the most appropriate. When compared to TDmodel3, AIC further decreased greatly from 1482 to 1350, whilst RMSE dropped from 346 to 327. All residual model checks remained of an acceptable nature, albeit with a long tail on the QQ plot still present (see Figure 15 overleaf). Figure 15: Residual Plots for TDmodel4 Diagnostics TDmodel5 to TDmodel8: GAMMs with differing smoothing functions TDmodel5 to TDmodel8 were GAMM models, fitted using the gam function in the mgcv package. Each GAMM model used a different smoother, denoted as 𝑓(𝑥): log(𝑦𝑖𝑗𝑘 ) = 𝛽0 + 𝑓(𝑥∗𝑖𝑗𝑘 ) + 𝑏𝑖 + 𝜖𝑖𝑗𝑘 , where 𝑥∗𝑖𝑗𝑘 = log(𝑥𝑖𝑗𝑘 ) , 𝑏𝑖 ~ 𝑁(0, 𝜎21 ), 𝜖𝑖𝑗𝑘 ~𝑁(0, 𝜎2 ) TDmodel5 used the default Thin plate regression splines as the smoothing function. An interaction was fitted using the “by” parameter in the s function of a gam function call in the mgcv package (highlighted in blue). Position was set as a random effect (highlighted in red): 18 TDmodel5 = mgcv::gam (log(Total_Distance) ~ s(log(Duration), by=Drill, id=0) + s(Player_ID, bs = "re"), data = train_data, method = "ML") In the following models, TDmodel6 through to TDmodel8 used the same interaction and random effect terms as TDmodel5, only differing in the penalised splines used (highlighted in green). TDmodel6 used the “factor smooth” option in mgcv, where a smooth was required at each level of Drill, and each smooth had the same smoothing parameter: TDmodel6 = mgcv::gam (log(Total_Distance) ~ s( log(Duration), by=Drill, bs = "fs ") + s(Player_ID, bs="re"), data = train_data, method = "ML") TDmodel7 used cubic regression splines: these have a cubic spline basis defined by a modest sized set of knots spread evenly through the covariate values. They are penalised by the conventional integrated square second derivative cubic spline penalty (11). TDmodel7 = mgcv::gam(log(Total_Distance) ~ s(log(Duration), by=Drill, bs = "cr” ) + s(Player_ID, bs="re") , data = train_data, method = "ML") For TDmodel8, P-splines combine a B-spline basis with a discrete penalty on the basis coefficients, where any combination of penalty and basis order is allowed (11). TDmodel8 = mgcv::gam(log(Total_Distance) ~ s log(Duration), by=Drill, bs = "ps") + s(Player_ID, bs="re") , data = train_data, method = "ML") Further decrements in AIC were evident between TDmodel4 and TDmodel7 (1350 to 1315), but only small fluctuations in RMSE were seen across the fitting of the various GAMM models. Figure 16: Residual Plots for TDmodel7 Diagnostics (best fitting model) 19 Figure 16 on the previous page displays the model check graphs for TDmodel7, which was found to be the model with the lowest AIC and second lowest RMSE out of all models fitted (see Table 2 on page 22). TDmodel9: GAMM using gamm4 package Finally, a GAMM was fitted using the gamm4 package, but no further improvements in AIC or RMSE were observed when compared to the previous GAMM models fitted using mgcv. TDmodel9 = gamm4 (log(Total_Distance) ~ s(log(Duration), by = Drill), random = ~ (1 | Player_ID), data = train_data, REML = F) Figures 17 displays a selection of partial residual plots for TDmodel7 (the model chosen as the most appropriate). Appendix A contains all of the partial residual plots for TDmodel7. Figures 18 and 19 display the model fits of TDmodel4, TDmodel7, and TDmodel9, facetted by Player and Training Drill respectively. It can be seen that the performance of the LMM with polynomial (TDmodel4) and the chosen GAMM model (TDmodel7) are very similar. Figure 17: A selection of Partial Residual Plots for TDmodel7 20 Figure 18: Model Fits using TDmodel4, 7 and 9 facetted by Player Figure 19: Model Fits using TDmodel4, 7 and 9 facetted by Training Drill 21 Metrics Summary for all response variables R2, AIC, RMSE and MAE are shown in Tables 2 – 7 below for all models fitted on the 6 response variables. The models that were selected for validation with the Test data set are highlighted in a red box in each table. Table 2: Total Distance Model Metrics A 3rd order polynomial was used in TDmodel4. Table 3: High Intensity Distance Model Metrics A 3rd order polynomial was used in HIDmodel4. 22 Table 4: Acceleration Distance Model Metrics A 3rd order polynomial was used in ADmodel4. Table 5: Deceleration Distance Model Metrics A 3rd order polynomial was used in DDmodel4. 23 Table 6: Sprint Distance Model Metrics A 5th order polynomial was used in SDmodel4. Table 7: Maximum Speed Model Metrics A 3rd order polynomial was used in MSmodel4. 24 Performance on Test Data Set A GAMM using cubic regression splines was found to be the most appropriate model for the prediction of all of the response variables. However, the simpler polynomial LMM models had very similar fit. All selected models showed an excellent fit to the Test data set (r > 0.84), with the exception of Sprint Distance and Maximum Speed (r = 0.64 and 0.71 respectively) (see Table 8). Figures 20 and 21 show the Predicted versus Actuals for each response variable. Table 8: Performance Metrics using Test Data Set Figure 20: Plots of Predicted versus Actuals (using Test Data Set) 25 Figure 21: Predicted versus Actuals for all response variables Table 9 displays a comparison of the Training and Test data performance metrics. The selected models all performed better on the Test data set, with the exception of Sprint Distance and Max Speed. It should also be noted that both the Sprint Distance and Maximum Speed response variables showed poor correlation of fitted versus actuals in both the Training and Test data set model fits. Table 9: Comparison of Train/Test Performance Metrics for all response variables 26 Results Summary Logarithmic transformation of the response variables and Duration was applied in order to improve model fit. Player position was established as a non-significant predictor variable. Visual inspection of residual plots from the selected models did not reveal any problematic deviations from homoscedasticity or normality, apart from the QQ plots having a “long tail” appearance. A GAMM using cubic regression splines was found to be the most appropriate model for prediction of all the response variables. The simpler LMMs with polynomial term had very similar RMSEs when compared to the more complex GAMM models. RMSE of all models are of an acceptable practical magnitude. The selected GAMM models performed well on the Test set data, with the exception of the response variables pertaining to Sprint Distance covered and Maximum Speed attained. R2 values are low for Sprint Distance and Maximum Speed (0.53 and 0.55 respectively), and estimates of these responses should be treated with caution. 27 5. Conclusions and Discussion Model selection: Simpler LMM or GAMM? The results from this project demonstrate that GAMMs and an LMM using polynomials are appropriate models for predicting the physiological workload of football training drills. In the polynomial LMM, the polynomial term increases the flexibility of the linear model in order to model a curvilinear relationship, thereby providing a simple way to model curves without having to resort to more complicated non-linear models. However, the extra flexibility of polynomials compared to smoothing splines can produce undesirable results at the boundaries, while splines can still provide a reasonable fit to the data (10). Project limitations Some limitations of this project are recognised and highlighted in the following points: a) Small number of predictor variables: There were only a small number of predictor variables available for modelling the response variables. Additional information such as date of training session, temperature and humidity, age of player, and sequence number of drill in a training session may have provided more information. For instance, workload metrics may be different in hot and hot humid conditions. Drills performed towards the end of a training session may show lower workload values due to possible fatigue than the same drill carried out in the earlier parts of a session. It may be reasonable to suggest that younger players may work harder in some drills, whilst older, more experienced players may “pace” themselves during the training sessions. b) Limited data for some training drills: There were limited data observations for some of the training drills, and this would negatively hinder appropriate model fitting. For example, the “Speed and Agility” training drill had 46 observations only. c) Random intercept-only: Random intercept-only LMMs were fitted in this project, as it was decided not to use random slope models. It may be reasonable to investigate the use of random slopes in subsequent analysis, as it may be expected that different players may differ in their responses to the different drills. d) Models used in application: Although the GAMM models using cubic regression splines showed the combination of the lowest AICs and RMSE’s, the simpler polynomial LMM models were chosen to be deployed in the R Shiny Application, because of easier coding and quicker run-times. This is not a large limitation though, as Tables 2 to 7 in the result section clearly show that the RMSE of these models is very similar to those of the 28 GAMM models (on training data). However, as previously discussed, the use of polynomial models may result in some peculiar predictions at the boundaries of the selected training duration for some of the training drills. e) Prediction intervals: The application allows the user to select whether prediction intervals should be displayed or not. Unfortunately, coding problems meant that a simpler polynomial model that was missing the interaction term was used when prediction intervals were required (see submitted R code file). Practical applications In order to put the prediction models into practical use, a predictive Training Session Planner was created using R Shiny. Some practical usages of this application are discussed in the following points: a) Optimised football training session planning: The R Shiny Training Session Planner application created for this project (see Appendices D and E) should assist a football coach with their session planning - by simply selecting training drills, number of sets of each drill, and the training drill duration, the user can instantly predict the physiological workload of the individual drills and the full football training session. The application ultimately provides the football coach with an easy-to-use tool to design and implement training sessions in an efficient and safe manner by appropriate use of historical GPS data. b) Optimise player performance and well-being: By establishing the estimated workload for each player before training commences, the football coach has an increased chance of correctly prescribing a safe and effective training session that maximises performance and reduces the chances of possible injury. Such a tool is clearly beneficial for the football player also as an individualised training session designed to make them physiologically prepared for competitive matches has the potential to improve their performance and reduce their risk of injury due to inappropriate training loads. c) Coach educational tool: The Training Session Planner can also be used as a coach education tool. From personal experience, professional coaches rarely understand or acknowledge the physiological load that their prescribed training sessions impose. d) Estimate missing data: The Training Session Planner can also be used to predict missing training data when devices fail to work correctly, or when a player has forgotten to wear the GPS device during a training session. 29 e) An alternative to purchasing GPS equipment: For professional football clubs who cannot afford to purchase GPS sports devices, the Training Session Planner application may be used to help coaches estimate the physiological workload of their football training sessions. f) Use in other team sports: The Training Session Planner is applicable to all sports where annotated GPS training session data are collected (e.g. NFL, rugby, basketball, hockey, Australian Rules football). The application can be modified to use data from all makes of GPS sports devices (e.g. GPSports, Catapult Sports). Future application improvements a) Use of GAMM models: The R Shiny Training Session Planner application should be coded in order to use the GAMM models that were found to give the lowest AIC / RMSE in this project, and prediction intervals calculated from these models. b) Add new data: An option to add new data to the training drill database and then update the prediction models would be a welcome improvement to the application. c) Recommender system: The Training Session Planner application could be improved by including a “recommender system”. For example, to recommend a sequence of training drills that lasts between 75 to 90 minutes, which manifests a total distance workload of between 6 - 7,000m, and a high intensity distance of between 500 and 750m, whilst minimising acceleration and deceleration distances. Final Conclusion In this project, statistical models using GPS data from retrospective training drills were built to predict future training workload in order to optimise the planning of football training sessions. A Training Session Planner can help coaches to effectively use historical GPS data to instantly predict all physiological workload metrics of interest before a training session. In this way, each training session can be planned effectively, and the predicted physiological workload metrics can be updated quickly if there are any last minute changes to the football training session plan. Cautious interpretation of predicted workload values is warranted for training drills with few historical observations. Sprint Distance and Maximum Speed R2 values were low, and any predictions of these metrics in the R Shiny application should also be interpreted carefully. Furthermore, cautious interpretation of training workload predictions is required when training durations are selected that lie at the boundaries of a training drill’s observational duration range. 30 6. References 1. Larsson P. Global Positioning System and Sport-Specific Testing: Sports Med. 2003;33(15):1093–101. 2. Dwyer DB, Gabbett TJ. Global positioning system data analysis: velocity ranges and a new definition of sprinting for field sport athletes. J Strength Cond Res. 2012 Mar;26(3):818–24. 3. Hausler J, Halaki M, Orr R. Application of Global Positioning System and Microsensor Technology in Competitive Rugby League Match-Play: A Systematic Review and Metaanalysis. Sports Med Auckl NZ. 2016 Apr;46(4):559–88. 4. Varley MC, Jaspers A, Helsen WF, Malone JJ. Methodological Considerations When Quantifying High-Intensity Efforts in Team Sport Using Global Positioning System Technology. Int J Sports Physiol Perform. 2017 Sep;12(8):1059–68. 5. White AD, MacFarlane NG. Analysis of international competition and training in men’s field hockey by global positioning system and inertial sensor technology. J Strength Cond Res. 2015 Jan;29(1):137–43. 6. Wisbey B, Montgomery PG, Pyne DB, Rattray B. Quantifying movement demands of AFL football using GPS tracking. J Sci Med Sport. 2010 Sep;13(5):531–6. 7. Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5(3):299–314. 8. Pinheiro JC, Bates DM. Linear mixed-effects models: basic concepts and examples. Mix-Eff Models -Plus. 2000;3–56. 9. Neocleous T. Advanced Predictive Models: Linear Mixed Models. Course notes for 2017-18 MSc Data Analytics (online). University of Glasgow. 2017. 10. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. Vol. 112. Springer; 2013. 11. Wood S, Scheipl F. gamm4: Generalized additive mixed models using mgcv and lme4. R package version 0.2-3. 2014. 12. Gelman A, Hill J. Data analysis using regression and multilevel / hierarchical models. Vol. 1. Cambridge University Press New York, NY, USA; 2012. 13. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. ArXiv Prepr ArXiv14065823. 2014; 14. Luke SG. Evaluating significance in linear mixed-effects models in R. Behav Res Methods. 2017 Aug;49(4):1494–502. 15. Kuznetsova A, Brockhoff PB, Christensen RHB. Package ‘lmertest.’ R Package Version. 2015;2(0). 16. Knowles JE, Frederick C, Knowles MJE. Package ‘merTools.’ 2016; 31 17. Barton K, Barton MK. Package ‘MuMIn.’ R Package Version. 2019;1(6). 18. Fasiolo M, Nedellec R, Goude Y, Wood SN. Scalable visualization methods for modern generalized additive models. J Comput Graph Stat. 2019;1–9. 19. Sakamoto Y, Ishiguro M, Kitagawa G. Akaike information criterion statistics. Dordr Neth Reidel. 1986;81. 32 Appendix A: Partial Residual Plots (for TDmodel7) 33 34 35 Appendix B: Model fits for all Response Variables 1 Total Distance High Intensity Distance 36 Acceleration Distance Deceleration Distance 37 Sprint Distance Maximum Speed 38 Appendix C: Model fits for all Response Variables 2 (Facets) Total Distance 39 High Intensity Distance 40 Acceleration Distance 41 Deceleration Distance 42 Sprint Distance 43 Maximum Speed 44 Appendix D: Navigating the R Shiny Training Session Planner Application 45 Appendix E: Collection of Training Session Planner Application Screenshots Screenshot of “Session Summary” tab. Squad average workload values are displayed (blue values boxes), as well as each individual player’s total session statistics. Training drills selected Squad Averages for total training session Individual player training session summary (Total +/- 90% Prediction Intervals) 46 Screenshot of the “Session Summary Visuals” tab. The user can choose the workload metrics to visualise the workload of each individual player. Choose the workload metric from the dropdown list Visualise the individual player workloads 47 Screenshot of “Drill 1 Summary” tab. The workloads of all 5 selected training drills can be viewed separately. Workloads of each player for the chosen training drill Choose what drill you want to analyse 48 By selecting the “Exploratory Data Analysis” option in the left-hand sidebar, the user has multiple options to visualise the data set used in this project. Exploratory Data Analysis option in side bar Visualise training drill:duration relationships 49 A PDF of the project report can be viewed and downloaded if required. A link to a knitted R Markdown version of the project report (a different version with more graphs, R code output, and more expansive EDA section) is provided. Project Report option in side bar Link to knitted R Markdown version of project report Downloadable PDF of project report 50