Evaluation Metrics
Monday, October 25, 2021 12:07 AM

Links:
https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/
https://towardsdatascience.com/confusion-matrix-for-your-multi-class-machine-learning-model-ff9aa3bf7826
https://www.analyticsvidhya.com/blog/2021/05/importance-of-cross-validation-are-evaluation-metrics-enough/

What is bias?
Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies; it leads to high error on both training and test data.

What is variance?
Variance is the variability of model predictions for a given data point; it tells us the spread of our predictions. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. Such models perform very well on training data but have high error rates on test data.
A highly complex model has low bias and high variance.

Simple Model > Complex Model
- Prevents overfitting: a high-dimensional dataset with too many features can lead to overfitting (the model captures both real and random effects).
- Interpretability: an over-complex model with too many features can be hard to interpret, especially when features are correlated with each other.
- Computational efficiency: a model trained on a lower-dimensional dataset requires less computation time.

Cross-Validation: estimates how well the model will perform on an unseen dataset.
Non-Exhaustive methods
1. Hold-out validation approach:
○ Pros
The held-out test set is fully independent of the training data.
Only needs to be run once, so it has a lower computational cost.
○ Cons
The performance estimate has higher variance if the dataset is small.
2. k-Fold Cross-Validation:
○ Pros
Models are not affected much by a few outliers in the dataset.
Helps us overcome the problem of variability in the estimate.
Gives a less biased estimate than other methods, since every observation appears in both the training and test sets.
The best approach when we have a limited amount of input data.
○ Cons
Imbalanced datasets will still impact the model.
Requires k times as much computation, since the training algorithm has to be rerun from scratch k times to complete the k folds.

Confusion Matrix:
A false positive is a Type I error; a false negative is a Type II error.
Accuracy = (TP+TN)/(TP+FP+FN+TN) - use when the dataset has balanced classes
Classification Error or Misclassification rate = (FP+FN)/(TP+TN+FP+FN) = 1 - Accuracy
Precision = TP/(TP+FP)
Sensitivity or Recall or TPR = TP/(TP+FN)
Specificity or TNR = TN/(TN+FP)
Minimize false positives? - Precision. Minimize false negatives? - Recall.
F1-Score = 2*(Precision*Recall)/(Precision+Recall)
- Use when there is no clear preference between precision and recall.
- Use it in combination with other evaluation metrics to get a complete picture of the results.
- When we want to give more weight to either precision or recall, we can include an adjustable parameter beta: F-beta measures the effectiveness of a model for a user who attaches β times as much importance to recall as to precision.
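As a quick check of the formulas above, here is a minimal sketch in plain Python that computes these metrics from raw confusion-matrix counts; the counts in the example call are made up for illustration.

```python
def confusion_metrics(tp, fp, fn, tn, beta=1.0):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total                  # = 1 - accuracy
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                         # sensitivity / TPR
    specificity = tn / (tn + fp)                    # TNR
    f1 = 2 * precision * recall / (precision + recall)
    # F-beta: beta > 1 weights recall more heavily, beta < 1 weights precision
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {"accuracy": accuracy, "error_rate": error_rate,
            "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "fbeta": fbeta}

# Hypothetical counts, just to exercise the formulas
print(confusion_metrics(tp=40, fp=10, fn=5, tn=45, beta=2.0))
```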
3. Stratified K-Fold Cross-Validation:
○ Pros
Can be used with hyper-parameter tuning to improve different models.
Helps us compare models.
Helps in reducing both bias and variance.

Exhaustive methods
1. Leave-one-out Cross-Validation:
○ Pros
Uses all data points, so the bias is low.
○ Cons
Higher execution time, since the cross-validation process is repeated n times (where n is the number of observations in the dataset).
Higher variation in the estimate of model effectiveness, because each test set contains only one data point, so the result is highly influenced by that point. For example, if the data point is an outlier, it can lead to higher variation.

Mean Absolute Error: the mean of the absolute differences between the predicted and actual values of the continuous target variable.
MAE = Σ | y_actual - y_predicted | / n

Mean Squared Error: useful when large errors are particularly undesirable; the average of the squared differences between the predicted and actual values of the continuous target variable.
MSE = Σ (y_actual - y_predicted)² / n

RMSE: squaring the errors changes the unit of the metric relative to the target attribute; to avoid this, we take the root of the mean squared error.
RMSE = √MSE = √( Σ (y_actual - y_predicted)² / n )
For an RMSE of 90: in house price prediction, where prices are 6-digit numbers, it is a good score (Actual: 200,000; Predicted: 200,090), but for a student's marks it is a terrible one (Actual: 100; Predicted: 10).

R²:
- Gives the proportion of variation in the target variable explained by the linear regression model.
- The greater the R² value, the better the model's performance.
- Its value never decreases, no matter how many variables we add to the regression model: even if we add redundant variables, R² either remains the same or increases.

Adjusted R²:
- The only difference from R² is that adjusted R² increases only if the added feature improves model performance, so it captures the impact of adding features more adequately.
- Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors.

Lift Chart:
• Useful for assessing performance in terms of identifying the most important cases.
• Example - marketing: identify the 20% of prospects who are most likely to respond to an offer.
◊ Without any model, we would randomly pick 20% of prospects.
◊ Lift measures the ability of the model to identify the important class, relative to its average prevalence.
• Group the data by predicted probability score and create deciles, then calculate the lift for each decile.
• The predicted probability should be directly proportional to the true responder probability.
• From a decile lift table of responders: if we target the top 10% of cases, we can expect to catch 4.11 times more responders than random targeting of 10% of cases.

ROC (Receiver Operating Characteristic) - AUC:
• The ROC curve plots TPR against FPR across classification thresholds.
• Optimal threshold = the threshold that maximizes TPR - FPR.
• Not sensitive to class distribution: different class distributions in training and test data will not change the results.
• Many other measures are sensitive to class distribution, e.g. Precision = TP/(TP+FP) and Accuracy = (TP+TN)/(P+N). (For reference: Sensitivity = Recall; Specificity = TN/(FP+TN) = 1 - FPR.)

ROC Convex Hull:
• A combination of models A and B can provide performance at any point on the convex hull.
• Choose a cutoff for A that gives TP & FP of (tA, fA), and similarly for B to get (tB, fB).
• Then choosing model A with probability p and model B with probability (1 - p) gives TP & FP of (p·tA + (1 - p)·tB, p·fA + (1 - p)·fB), which is a point on the line between (tA, fA) and (tB, fB); with different values of p we get performance at any point on that line.
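A minimal sketch of the ROC points above, assuming scikit-learn and NumPy are available: it computes the ROC curve and AUC for made-up labels and scores, then picks the threshold that maximizes TPR - FPR.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and predicted scores, purely for illustration
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.30, 0.65, 0.70, 0.50])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# Optimal threshold = the point that maximizes TPR - FPR
best = np.argmax(tpr - fpr)
print(f"AUC = {auc:.3f}")
print(f"Optimal threshold = {thresholds[best]:.2f} "
      f"(TPR = {tpr[best]:.2f}, FPR = {fpr[best]:.2f})")
```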
Cost-sensitive choice of operating point:
• Expected cost of applying a model operating at ROC point (TP, FP):
expected cost = pos·(1 - TP)·c(N, p) + neg·FP·c(Y, n)
where pos = number of positive examples, neg = number of negative examples, c(N, p) = misclassification cost of predicting a positive as negative, and c(Y, n) = misclassification cost of predicting a negative as positive.
• Two ROC points have the same performance when (TP2 - TP1)/(FP2 - FP1) = c(Y, n)·neg / (c(N, p)·pos); lines with this slope are iso-performance lines.
• For any given costs and class priors, the best model lies on the ROC convex hull.

AUC scores (rough grading):
0.90 - 1.00 = excellent (A)
0.80 - 0.90 = good (B)
0.70 - 0.80 = fair (C)
0.60 - 0.70 = poor (D)
0.50 - 0.60 = fail (F)
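To make the expected-cost formula concrete, here is a minimal sketch; the ROC operating points, class counts, and misclassification costs are made-up values chosen only to show how the cheapest operating point would be selected.

```python
import numpy as np

def expected_cost(tpr, fpr, pos, neg, cost_fn, cost_fp):
    """Expected cost at an ROC operating point (TPR, FPR).

    cost_fn = c(N, p): cost of predicting a positive as negative (a miss)
    cost_fp = c(Y, n): cost of predicting a negative as positive (a false alarm)
    """
    return pos * (1 - tpr) * cost_fn + neg * fpr * cost_fp

# Hypothetical ROC operating points and problem setup
points = [(0.60, 0.05), (0.80, 0.15), (0.95, 0.40)]   # (TPR, FPR)
pos, neg = 100, 900            # class counts
cost_fn, cost_fp = 10.0, 1.0   # missing a positive is 10x worse here

costs = [expected_cost(tpr, fpr, pos, neg, cost_fn, cost_fp) for tpr, fpr in points]
best = int(np.argmin(costs))
print(f"Cheapest operating point: TPR={points[best][0]}, FPR={points[best][1]}, "
      f"expected cost={costs[best]:.1f}")
```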