Logistic Regression (LR)

Vijay Anaparthi
2 min read · Aug 21, 2020


Logistic regression is a classification model. Most people train logistic regression on their data set first and then move on to other algorithms.

In LR the key assumption is that the data is linearly separable. Mathematically, we want y_i w^T x_i > 0 for as many of the points i = 1 to n as possible. I will explain what that equation means later.

Let's take a simple example: predicting whether a review is positive or negative.

[Figure: a hyperplane separating positive points from negative points. Caption: Logistic regression by Vijay Anaparthi]

Look at the diagram above: we draw a hyperplane (pi) that separates the positive data points from the negative data points.

The equation of the hyperplane is w^T x + b = 0, which looks like the line equation y = mx + c. In fact, we call this object a line in 2D, a plane in 3D, and a hyperplane in nD. From Step 1 onward I drop b for simplicity; it can be absorbed into w by appending a constant feature 1 to every x.
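To make the y = mx + c analogy concrete, here is a minimal sketch (the numbers w1, w2, b are made up for illustration) that rearranges the 2D hyperplane equation into slope-intercept form:

```python
# In 2D the hyperplane w^T x + b = 0 is the line
#   w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1/w2)*x1 - (b/w2)
# i.e. slope m = -w1/w2 and intercept c = -b/w2.

w1, w2, b = 2.0, -1.0, 0.5   # example hyperplane parameters (illustrative)

m = -w1 / w2                 # slope of the equivalent line
c = -b / w2                  # intercept of the equivalent line
print(f"x2 = {m}*x1 + {c}")  # x2 = 2.0*x1 + 0.5
```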

Step1 :- If w^T x_i > 0 then y_i = +1, because the vector w points towards the positive data points.

If w^T x_i < 0 then y_i = -1.

Here y_i is the class label of point x_i, either +1 (positive) or -1 (negative).
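Here is a minimal sketch of this decision rule in NumPy (the weight vector and points are made up for illustration):

```python
import numpy as np

w = np.array([1.0, -2.0])      # example weight vector
x_pos = np.array([3.0, 0.5])   # a point on the positive side
x_neg = np.array([-1.0, 2.0])  # a point on the negative side

def predict(w, x):
    # Step 1: the sign of w^T x decides the class label
    return +1 if np.dot(w, x) > 0 else -1

print(predict(w, x_pos))  # +1, since w^T x = 3.0 - 1.0 = 2.0 > 0
print(predict(w, x_neg))  # -1, since w^T x = -1.0 - 4.0 = -5.0 < 0
```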

Step2 :- I said earlier that we want y_i w^T x_i > 0 for as many of the n points as possible. Now I will explain where that comes from. Here 'n' is the number of data points.

First take a positive point, i.e. y_i = +1 and w^T x_i > 0 (positive). Multiplying the two gives y_i w^T x_i > 0 (positive).

Now take a negative point, i.e. y_i = -1 and w^T x_i < 0 (negative). Multiplying the two again gives y_i w^T x_i > 0 (positive).

So for any data point, if y_i w^T x_i > 0 the point is correctly classified; otherwise it is misclassified.
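So y_i w^T x_i acts as a per-point correctness check. A small sketch with toy data (made up for illustration):

```python
import numpy as np

w = np.array([1.0, -2.0])
X = np.array([[ 3.0, 0.5],   # true label +1, correctly classified
              [-1.0, 2.0],   # true label -1, correctly classified
              [ 0.5, 1.0]])  # true label +1, misclassified
y = np.array([+1, -1, +1])

margins = y * (X @ w)   # y_i * w^T x_i for every point at once
print(margins)          # [ 2.   5.  -1.5]
print("correctly classified:", int((margins > 0).sum()), "of", len(y))  # 2 of 3
```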

Step3 :- Now let's apply an activation function, the sigmoid, to this quantity.

a) The sigmoid tames the outlier problem: it squashes any real number into the range (0, 1), so a single extreme point cannot dominate the objective.

b) An outlier is an extreme value. For example, in the list [1, 2, 5, 9, 876], the value 876 is clearly an outlier because it is far larger than the remaining numbers.
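A quick sketch of the squashing effect, using the author's outlier list as signed distances:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

distances = np.array([1.0, 2.0, 5.0, 9.0, 876.0])  # 876 is the outlier
print(sigmoid(distances))
# ~[0.731 0.881 0.993 0.9999 1.0] -> the outlier is capped near 1,
# so one extreme point can no longer dominate the objective
```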

Step4 :- The sigmoid function is σ(x) = 1 / (1 + e^(-x)).

Now substitute y_i w^T x_i for x in the sigmoid function:

w* = argmax [ ∑ 1 / (1 + e^(-y_i w^T x_i)) for i = 1 to n ]
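In code, this Step 4 objective looks like the sketch below (the data is the same illustrative toy set as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(w, X, y):
    # sum over i of sigmoid(y_i * w^T x_i); LR picks w* = argmax of this
    return sigmoid(y * (X @ w)).sum()

X = np.array([[3.0, 0.5], [-1.0, 2.0]])  # toy data, illustrative only
y = np.array([+1, -1])
print(objective(np.array([1.0, -2.0]), X, y))  # ~0.881 + 0.993 = 1.874
```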

Step5 :- We know log(x) is a monotonic function: if x increases, log(x) increases as well. In maximum-likelihood terms we are really maximizing the product of the probabilities σ(y_i w^T x_i), and because log is monotonic and turns products into sums, that is the same as maximizing the sum of the logs:

w* = argmax [ ∑ log(1 / (1 + e^(-y_i w^T x_i))) for i = 1 to n ]

Since log(1/a) = -log(a), and flipping the sign turns argmax into argmin:

w* = argmax [ -∑ log(1 + e^(-y_i w^T x_i)) for i = 1 to n ]

w* = argmin [ ∑ log(1 + e^(-y_i w^T x_i)) for i = 1 to n ]

This last line is the final equation, known as the logistic loss.
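A sketch of this final loss in NumPy; np.logaddexp(0, z) computes log(1 + e^z) in a numerically stable way:

```python
import numpy as np

def logistic_loss(w, X, y):
    # sum over i of log(1 + e^(-y_i * w^T x_i)); w* = argmin of this
    margins = y * (X @ w)
    return np.logaddexp(0.0, -margins).sum()

X = np.array([[3.0, 0.5], [-1.0, 2.0]])  # same toy data as before
y = np.array([+1, -1])
print(logistic_loss(np.array([1.0, -2.0]), X, y))  # ~0.134, small: both points correct
```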

Step6 :- To reduce overfitting we can add regularization, i.e. L1 or L2 regularization.

w* = argmin [ ∑ log(1 + e^(-y_i w^T x_i)) for i = 1 to n ] + λ||w||²

Here we added L2 regularization; λ controls how strongly large weights are penalized.
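Extending the sketch above with the L2 penalty (lam stands for λ and its value here is made up):

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    margins = y * (X @ w)
    data_loss = np.logaddexp(0.0, -margins).sum()  # logistic loss from Step 5
    penalty = lam * np.dot(w, w)                   # lambda * ||w||^2
    return data_loss + penalty
```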

Logistic regression has a notable drawback: the data has to be (at least roughly) linearly separable, otherwise we have to apply feature-engineering techniques first.

SVMs sidestep this problem by using kernels.

Still, logistic regression is the standard baseline model for classification problems: people try LR first and then move on to other algorithms.
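In practice that baseline takes only a few lines with scikit-learn (the dataset here is synthetic, standing in for real review features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic binary-classification data as a stand-in for real reviews
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# C is the inverse of the regularization strength (lambda) from Step 6
clf = LogisticRegression(C=1.0).fit(X_train, y_train)
print("baseline accuracy:", clf.score(X_test, y_test))
```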

Thanks for reading.
