```Evaluation Metrics
Monday, October 25, 2021
12:07 AM
https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/
https://towardsdatascience.com/confusion-matrix-for-your-multi-class-machine-learning-model-ff9aa3bf7826
https://www.analyticsvidhya.com/blog/2021/05/importance-of-cross-validation-are-evaluation-metrics-enough/
Confusion Matrix:
Highly complex model: low bias, high variance
Cross-Validation: estimates how well the model will perform on an unseen dataset
Non-Exhaustive methods
1. Hold out Validation approach:
○ Pros
▪ This approach is fully independent of the data.
▪ This approach only needs to be run once, so it has a lower computational cost.
○ Cons
▪ The performance estimate has higher variance when the dataset is small.
What is bias?
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very
little attention to the training data and oversimplifies the problem. It leads to high error on both training and test data.
What is variance?
Variance is the variability of the model's prediction for a given data point, i.e. a value which tells us the spread of our predictions. A model with high variance pays a lot of
attention to the training data and does not generalize to data which it hasn’t seen before. As a result, such models perform very well on training data
but have high error rates on test data.
Simple Model > Complex Model
- Prevents Overfitting: A high-dimensional dataset having too many features can sometimes lead to overfitting (model captures both real and
random effects).
- Interpretability: An over-complex model having too many features can be hard to interpret especially when features are correlated with each
other.
- Computational Efficiency: A model trained on a lower-dimensional dataset is computationally efficient (execution of algorithm requires less
computational time).
False Positive (FP) = TYPE I ERROR
False Negative (FN) = TYPE II ERROR
Accuracy = (TP+TN)/(TP+FP+FN+TN)
- Use when dataset has balanced classes
Classification Error or Misclassification rate = (FP+FN)/(TP+TN+FP+FN) or (1-Accuracy)
Precision = TP/(TP+FP)
Sensitivity or Recall or TPR = TP/(TP+FN)
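Worked example (hypothetical counts, not from the source): TP = 40, FP = 10, FN = 20, TN = 30
Accuracy = (40+30)/(40+10+20+30) = 70/100 = 0.70
Misclassification rate = (10+20)/100 = 0.30 = 1 - 0.70
Precision = 40/(40+10) = 0.80
Recall = 40/(40+20) ≈ 0.67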
2. k-Fold Cross-Validation:
○ Pros
▪ Models may not be affected much if there are some outliers present in the dataset.
▪ It helps us to overcome the problem of variability.
▪ This method results in a less biased estimate compared to other methods, since every observation
has the chance of appearing in both train and test sets.
▪ The best approach if we have a limited amount of input data.
○ Cons
▪ Imbalanced datasets will impact our model.
▪ Requires k times as much computation, since the training algorithm has to be rerun
from scratch k times to complete the k folds.
Minimize false positives? - Use Precision
Minimize false negatives? - Use Recall
Lift Chart:
• Useful for assessing performance in terms of identifying the most important cases
Example - Marketing: target the 20% of prospects who are most likely to respond to a marketing offer
◊ Without any model, we would randomly pick 20% of prospects
◊ Lift measures the ability of the DM model to identify the important class, relative to its average prevalence
Specificity or TNR = TN/(TN+FP)
F1-Score = 2*((Precision * Recall)/(Precision + Recall))
- Use where there is no clear preference between Precision and Recall
- Use it in combination with other evaluation metrics to get a complete picture of the result.
- In situations where one would like to give more importance/weight to either precision
or recall, we can include an adjustable parameter beta
- F-beta measures the effectiveness of a model with respect to a user who attaches β times as much importance
to recall as precision.
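A common form of the formula (standard definition, not stated in these notes): F-beta = (1 + β^2) * (Precision * Recall) / (β^2 * Precision + Recall)
Worked example (hypothetical values): Precision = 0.80, Recall = 0.50
β = 1: 2 * 0.40 / 1.30 ≈ 0.615 (same as F1-Score)
β = 2: 5 * 0.40 / (4*0.80 + 0.50) = 2.00/3.70 ≈ 0.541 (recall weighted more, so the low recall pulls the score down)
β = 0.5: 1.25 * 0.40 / (0.25*0.80 + 0.50) = 0.50/0.70 ≈ 0.714 (precision weighted more)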
Mean Absolute Error:
The mean of the absolute differences between the predicted and actual values of the continuous target
variable.
MAE = Σ | y_actual – y_predicted | / n
• Group data based on predicted probability scores and create deciles
• Calculate the lift
• The decile lift table is for the number of customers who would respond
• Our predicted probability should be directly proportional to the true responders' probability.
If we target 10% of cases, we can expect to catch 4.11 times more responders than with a random targeting of 10% of cases
3. Stratified K-Fold Cross Validation:
○ Pros
▪ It can improve different models using hyper-parameter tuning.
▪ Helps us compare models.
▪ It helps in reducing both bias and variance.
Mean Squared Error:
The average of the squared differences between the predicted and actual values of the continuous target variable.
Squaring penalizes large errors more heavily, which helps in instances where large errors are undesirable.
MSE = Σ (y_actual – y_predicted)^2 / n
RMSE:
The unit of the error changes (it becomes squared) when we calculate the error using mean squared error. To get back to the unit of the target variable,
we take the square root of the mean squared error.
RMSE = √MSE = √( Σ (y_actual – y_predicted)^2 / n )
For an RMSE of 90:
In the case of house price prediction, where the target is a 6-digit number, it’s a good score (Actual: 200,000; Predicted: 200,090),
but for a student’s marks it is a terrible one (Actual: 100; Predicted: 10).
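Worked example (hypothetical values): y_actual = 3, 5, 8; y_predicted = 2, 5, 11; n = 3
Differences (y_actual – y_predicted) = 1, 0, -3
MAE = (1 + 0 + 3)/3 ≈ 1.33
MSE = (1 + 0 + 9)/3 ≈ 3.33
RMSE = √3.33 ≈ 1.83 (larger than MAE because the single large error is penalized more)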
R²:
- Gives the proportion of variation in target variable explained by the linear regression model.
- The greater the r-squared value the better our model’s performance is.
- Its value never decreases no matter the number of variables we add to our regression model. That is, even if
we are adding redundant variables to the data, the value of R-squared does not decrease. It either remains
the same or increases with the addition of new independent variables.
Exhaustive methods
1. Leave one out Cross-Validation:
○ Pros
▪ Since we make use of all data points, the bias will be lower.
○ Cons
▪ Higher execution time, since we repeat the cross-validation process n times (where n is the
number of observations in the dataset).
▪ This leads to higher variation in testing model effectiveness because we test our model against
only one data point at a time, so each result is highly influenced by that data point. For example, if the
data point is an outlier, it can lead to a higher variation.
Lift score probability:
• ROC curve plots TPR (y-axis) against FPR (x-axis)
• Optimal threshold = the threshold that maximizes (TPR - FPR)
• Not sensitive to class distribution - different class distributions in training and test data will not show different results
Many measures are sensitive to class distributions, like
Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (P + N)
Sensitivity = recall
Specificity = TN / (FP + TN) = 1 – FP rate
• Combination of models A and B can provide performance at any part of the convex hull
• Choose a cutoff for A which gives TP & FP of (tA, fA), and similarly for B to get (tB, fB)
• Then choosing models A and B with probabilities p and (1-p) gives
TP & FP of (p*tA + (1-p)*tB, p*fA + (1-p)*fB), which is a point on the line between (tA, fA) and (tB, fB). With different values of p, we get performance at any
point on the line.
• Expected cost of applying a model with TP, FP is pos*(1-TP)*c(N,p) + neg*FP*c(Y,n)
pos - # positive examples, c(N,p) - misclassification cost of predicting a positive (p) as N
neg - # negative examples, c(Y,n) - misclassification cost of predicting a negative (n) as Y
If (TP2-TP1)/(FP2-FP1) = (c(Y,n)*neg) / (c(N,p)*pos) --> both points have the same expected cost
The only difference between r-squared and adjusted r-squared is that the adjusted r-squared value increases only if
the feature added improves the model performance, thus capturing the impact of adding features more