Logistic Regression (LR)

Vijay Anaparthi
2 min read · Aug 21, 2020


Logistic regression is a classification model. Most people train logistic regression on their data set first and then move on to other algorithms.

In LR the key assumption is that the data is linearly separable. Mathematically, we want y_i w^T x_i > 0 for as many of the points i = 1 to n as possible. I will explain what that equation means later.

Let's take a simple example: predicting whether a review is positive or negative.

[Figure: a hyperplane separating positive points from negative points. Caption: Logistic regression by Vijay Anaparthi]

Look at the diagram above: we draw a hyperplane (pi) that separates the positive data points from the negative data points.

The equation of the hyperplane is w^T x + b = 0, which looks like the line equation y = mx + c. In fact, we call this object a line in 2D, a plane in 3D, and a hyperplane in nD. From Step 1 onward I drop b for simplicity; it can be absorbed into w by appending a constant feature 1 to every x.
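To make the y = mx + c analogy concrete, here is a minimal sketch (the numbers w1, w2, b are made up for illustration) that rearranges the 2D hyperplane equation into slope-intercept form:

```python
# In 2D the hyperplane w^T x + b = 0 is the line
#   w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1/w2)*x1 - (b/w2)
# i.e. slope m = -w1/w2 and intercept c = -b/w2.

w1, w2, b = 2.0, -1.0, 0.5   # example hyperplane parameters (illustrative)

m = -w1 / w2                 # slope of the equivalent line
c = -b / w2                  # intercept of the equivalent line
print(f"x2 = {m}*x1 + {c}")  # x2 = 2.0*x1 + 0.5
```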

Step1 :- If w^T x_i > 0 then y_i = +1, because the vector w points towards the positive data points.

If w^T x_i < 0 then y_i = -1.

Here y_i is the class label of point x_i, either +1 (positive) or -1 (negative).
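Here is a minimal sketch of this decision rule in NumPy (the weight vector and points are made up for illustration):

```python
import numpy as np

w = np.array([1.0, -2.0])      # example weight vector
x_pos = np.array([3.0, 0.5])   # a point on the positive side
x_neg = np.array([-1.0, 2.0])  # a point on the negative side

def predict(w, x):
    # Step 1: the sign of w^T x decides the class label
    return +1 if np.dot(w, x) > 0 else -1

print(predict(w, x_pos))  # +1, since w^T x = 3.0 - 1.0 = 2.0 > 0
print(predict(w, x_neg))  # -1, since w^T x = -1.0 - 4.0 = -5.0 < 0
```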

Step2 :- I said earlier that we want y_i w^T x_i > 0 for as many of the n points as possible. Now I will explain where that comes from. Here 'n' is the number of data points.

First take a positive point, i.e. y_i = +1 and w^T x_i > 0 (positive). Multiplying the two gives y_i w^T x_i > 0 (positive).

Now take a negative point, i.e. y_i = -1 and w^T x_i < 0 (negative). Multiplying the two again gives y_i w^T x_i > 0 (positive).

So for any data point, if y_i w^T x_i > 0 the point is correctly classified; otherwise it is misclassified.
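So y_i w^T x_i acts as a per-point correctness check. A small sketch with toy data (made up for illustration):

```python
import numpy as np

w = np.array([1.0, -2.0])
X = np.array([[ 3.0, 0.5],   # true label +1, correctly classified
              [-1.0, 2.0],   # true label -1, correctly classified
              [ 0.5, 1.0]])  # true label +1, misclassified
y = np.array([+1, -1, +1])

margins = y * (X @ w)   # y_i * w^T x_i for every point at once
print(margins)          # [ 2.   5.  -1.5]
print("correctly classified:", int((margins > 0).sum()), "of", len(y))  # 2 of 3
```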

Step3 :- Now let's apply an activation function, the sigmoid, to this quantity.

a) The sigmoid tames the outlier problem: it squashes any real number into the range (0, 1), so a single extreme point cannot dominate the objective.

b) An outlier is an extreme value. For example, in the list [1, 2, 5, 9, 876], the value 876 is clearly an outlier because it is far larger than the remaining numbers.
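A quick sketch of the squashing effect, using the author's outlier list as signed distances:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

distances = np.array([1.0, 2.0, 5.0, 9.0, 876.0])  # 876 is the outlier
print(sigmoid(distances))
# ~[0.731 0.881 0.993 0.9999 1.0] -> the outlier is capped near 1,
# so one extreme point can no longer dominate the objective
```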

Step4 :- The sigmoid function is σ(x) = 1 / (1 + e^(-x)).

Now substitute y_i w^T x_i for x in the sigmoid function:

w* = argmax [ ∑ 1 / (1 + e^(-y_i w^T x_i)) for i = 1 to n ]
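In code, this Step 4 objective looks like the sketch below (the data is the same illustrative toy set as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(w, X, y):
    # sum over i of sigmoid(y_i * w^T x_i); LR picks w* = argmax of this
    return sigmoid(y * (X @ w)).sum()

X = np.array([[3.0, 0.5], [-1.0, 2.0]])  # toy data, illustrative only
y = np.array([+1, -1])
print(objective(np.array([1.0, -2.0]), X, y))  # ~0.881 + 0.993 = 1.874
```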

Step5 :- We know log(x) is a monotonic function: if x increases, log(x) increases as well. In maximum-likelihood terms we are really maximizing the product of the probabilities σ(y_i w^T x_i), and because log is monotonic and turns products into sums, that is the same as maximizing the sum of the logs:

w* = argmax [ ∑ log(1 / (1 + e^(-y_i w^T x_i))) for i = 1 to n ]

Since log(1/a) = -log(a), and flipping the sign turns argmax into argmin:

w* = argmax [ -∑ log(1 + e^(-y_i w^T x_i)) for i = 1 to n ]

w* = argmin [ ∑ log(1 + e^(-y_i w^T x_i)) for i = 1 to n ]

This last line is the final equation, known as the logistic loss.
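A sketch of this final loss in NumPy; np.logaddexp(0, z) computes log(1 + e^z) in a numerically stable way:

```python
import numpy as np

def logistic_loss(w, X, y):
    # sum over i of log(1 + e^(-y_i * w^T x_i)); w* = argmin of this
    margins = y * (X @ w)
    return np.logaddexp(0.0, -margins).sum()

X = np.array([[3.0, 0.5], [-1.0, 2.0]])  # same toy data as before
y = np.array([+1, -1])
print(logistic_loss(np.array([1.0, -2.0]), X, y))  # ~0.134, small: both points correct
```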

Step6 :- To reduce overfitting we can add regularization, i.e. L1 or L2 regularization.

w* = argmin [ ∑ log(1 + e^(-y_i w^T x_i)) for i = 1 to n ] + λ||w||²

Here we added L2 regularization; λ controls how strongly large weights are penalized.
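Extending the sketch above with the L2 penalty (lam stands for λ and its value here is made up):

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    margins = y * (X @ w)
    data_loss = np.logaddexp(0.0, -margins).sum()  # logistic loss from Step 5
    penalty = lam * np.dot(w, w)                   # lambda * ||w||^2
    return data_loss + penalty
```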

Logistic regression has a notable drawback: the data has to be (at least roughly) linearly separable, otherwise we have to apply feature-engineering techniques first.

SVMs sidestep this problem by using kernels.

Still, logistic regression is the standard baseline model for classification problems: people try LR first and then move on to other algorithms.
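In practice that baseline takes only a few lines with scikit-learn (the dataset here is synthetic, standing in for real review features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic binary-classification data as a stand-in for real reviews
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# C is the inverse of the regularization strength (lambda) from Step 6
clf = LogisticRegression(C=1.0).fit(X_train, y_train)
print("baseline accuracy:", clf.score(X_test, y_test))
```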

Thanks for reading.
