Logistic Regression
Saed Sayad
www.ismartsoft.com

Definition
Logistic Regression is a type of regression model where the dependent variable (target) has just two values, such as:
• 0, 1
• Y, N
• F, T

Sample Dataset

Months in Business    Balance     Default
189                   $429,916    0
170                   $240,319    1
166                   $231,327    0
423                   $196,105    0
145                   $193,907    1
60                    $190,944    0
97                    $184,333    0
354                   $152,126    0
99                    $151,061    1
80                    $135,885    0
25                    $119,751    1
118                   $116,578    1
74                    $123,864    0
...                   ...         ...

Linear Regression (Continuous Dependent Variable)
[Figure: Balance ($0 to $500,000) against Months in Business (0 to 600), with fitted line Y = 47.92X + 13916]

Linear Regression (Binary Dependent Variable)
[Figure: Default (0 to 1) against Months in Business (0 to 600), with fitted line Y = -0.000X + 0.373]

Linear Regression Model – Binary Target
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
• If the actual Y is a binary variable, then the predicted Y can be less than 0 or greater than 1.
• If the actual Y is a binary variable, then the error is not normally distributed.

Linear Regression Model
[Figure: linear model, Y (0 to 1) against X]

Frequency Table

Months in Business    Count    Default Count    Default Frequency
<50                   4        0                0
50-100                12       1                0.083
100-150               4        1                0.25
150-200               4        2                0.5
200-250               4        3                0.75
250-300               1        1                1
>300                  4        4                1

Frequency Plot
[Figure: Default Probability (0 to 1) against Months in Business - Bins (1 to 7)]

Logistic Function
\[ f(z) = \frac{1}{1 + e^{-z}} \]

Logistic Regression
\[ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}} \]
• The logistic distribution constrains the estimated probabilities to lie between 0 and 1.
• Maximum Likelihood Estimation is a statistical method for estimating the coefficients of a model.
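The contrast between the linear model and the logistic model can be sketched in a few lines of Python. This is only an illustration: the coefficients `B0` and `B1` are hypothetical values chosen for the example, not estimates from the sample dataset, and the helper names are made up here.

```python
import math


def logistic(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_prediction(x: float, b0: float, b1: float) -> float:
    """Linear model: the prediction is the linear predictor itself."""
    return b0 + b1 * x


def logistic_probability(x: float, b0: float, b1: float) -> float:
    """Logistic model: p = 1 / (1 + e^(-(b0 + b1 * x)))."""
    return logistic(b0 + b1 * x)


# Hypothetical coefficients, chosen only to make the contrast visible.
B0, B1 = -3.0, 0.02

for months in (0, 100, 150, 300, 600):
    lin = linear_prediction(months, B0, B1)
    p = logistic_probability(months, B0, B1)
    print(f"months={months:3d}  linear={lin:6.2f}  logistic={p:.4f}")
```

With these assumed coefficients the linear prediction falls below 0 for small X and above 1 for large X, while the logistic probability stays between 0 and 1 for every X.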
Logistic Regression Model
[Figure: Linear Model and Logistic Model, Y (0 to 1) against X]

Maximum Likelihood Estimation (MLE)
• MLE maximizes the log likelihood (LL), which reflects how likely it is that the observed values of the dependent variable would be predicted from the independent variables.
• MLE is an iterative algorithm that starts with arbitrary initial estimates of the coefficients.
• After this initial function is estimated, the process is repeated until LL does not change significantly.

Copyright iSmartsoft Inc. 2008

Log Likelihood (LL)
• Likelihood is the probability of the observed values of the dependent variable, given the independent variables and the model coefficients.
• LL is calculated through iteration, using maximum likelihood estimation (MLE).
• Log likelihood is the basis for tests of a logistic model.

Log Likelihood Test (-2LL)
• The log likelihood test is a test of the significance of the difference between -2LL for the baseline (reduced) model and -2LL for the full model.
• This difference is called the "model chi-square".
• It is also called the Likelihood Ratio test.

Wald Test
• A Wald test is used to test the statistical significance of each coefficient (β) in the model.
• A Wald test calculates a Z statistic, which is the estimated coefficient divided by its standard error:
\[ Z = \frac{\hat{\beta}}{SE} \]
• This Z value is then squared, yielding a Wald statistic that approximately follows a chi-square distribution.

Summary
• Logistic Regression is a classification method.
• It returns the estimated probability of one of the two values of the binary dependent variable, given the independent variables.
• Maximum Likelihood Estimation is a statistical method for estimating the coefficients of the model.
• The Likelihood Ratio test is used to test the statistical significance of the difference between the full model and the simpler model.
• The Wald test is used to test the statistical significance of each coefficient in the model.

Questions?
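The estimation and testing steps described in these slides can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: it uses Newton-Raphson as one possible iterative scheme, fits only the 13 rows visible in the Sample Dataset slide (the remaining rows are elided in the source), rescales months by 100 for numerical convenience, and treats the intercept-only model as the baseline. All function names are made up for the example.

```python
import numpy as np

# The 13 rows visible in the Sample Dataset slide (the rest is elided there).
months = np.array([189, 170, 166, 423, 145, 60, 97, 354, 99, 80, 25, 118, 74],
                  dtype=float)
default = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0], dtype=float)


def log_likelihood(X, y, beta):
    """LL = sum of y*log(p) + (1 - y)*log(1 - p), computed stably."""
    z = X @ beta
    # log(p) = -log(1 + e^(-z)) and log(1 - p) = -log(1 + e^(z))
    return float(np.sum(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z)))


def information(X, beta):
    """Information matrix X' W X with W = diag(p * (1 - p))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (X * (p * (1 - p))[:, None])


def fit_logistic(X, y, tol=1e-8, max_iter=50):
    """Newton-Raphson MLE: iterate until LL stops changing significantly."""
    beta = np.zeros(X.shape[1])  # arbitrary starting values
    ll_old = log_likelihood(X, y, beta)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        beta = beta + np.linalg.solve(information(X, beta), gradient)
        ll_new = log_likelihood(X, y, beta)
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    # Standard errors from the inverse information matrix at the estimate.
    se = np.sqrt(np.diag(np.linalg.inv(information(X, beta))))
    return beta, se, ll_new


# Full model: intercept plus months (in hundreds). Baseline: intercept only.
X_full = np.column_stack([np.ones_like(months), months / 100.0])
X_null = np.ones((len(default), 1))

beta_full, se_full, ll_full = fit_logistic(X_full, default)
beta_null, se_null, ll_null = fit_logistic(X_null, default)

# Likelihood ratio test: difference in -2LL, the "model chi-square".
model_chi_square = (-2.0 * ll_null) - (-2.0 * ll_full)

# Wald test: Z = coefficient / standard error, then squared.
wald_z = beta_full / se_full
wald_chi_square = wald_z ** 2

print("coefficients:    ", beta_full)
print("standard errors: ", se_full)
print("model chi-square:", model_chi_square)
print("Wald chi-square: ", wald_chi_square)
```

Both statistics would be compared against a chi-square distribution with 1 degree of freedom here (critical value 3.84 at the 5% level). With only 13 rows the results say little about the full dataset; the sketch is meant to show the mechanics.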