## Linear regression, L2 regularization

charbsj
Posts: 7
Joined: Sun Jan 20, 2019 1:55 pm

### Linear regression, L2 regularization

I am following the class on “deep learning prerequisite: linear regression with Python” taught by lazyprogrammer. Link: https://stackskills.com/courses/105513/lectures/2503518

I am confused by the L2 regularization theory.

It says to add lambda times w^2 to the cost function. This does not make much sense to me.
1) I interpret this as penalizing the answer in a way to reduce the weight of the linear regression. Is my understanding correct?

I see how it works when the outliers have higher y values than what the “good dataset” suggests.

2) But what if the outlier points have y values lower than what the “good dataset” suggests? Penalizing high weights (or coefficients) would actually be counterproductive. Am I right?

3) Or maybe, more realistically, what if the dimensionality of the problem is large, making it impossible to ‘easily’ see what points are outliers?

4) Also, there was no discussion of how to choose lambda. Is a trial and error method used to define lambda?

I guess this whole L2 regularization theory class did not make sense to me.

Thank you

lazyprogrammer
Posts: 14
Joined: Sat Jul 28, 2018 3:46 am

### Re: Linear regression, L2 regularization

charbsj wrote:
Sun Jan 20, 2019 2:01 pm
1) I interpret this as penalizing the answer in a way to reduce the weight of the linear regression. Is my understanding correct?
Not really. You are penalizing large weights.
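
To make the penalty concrete, here is a minimal NumPy sketch of the regularized cost and its closed-form minimizer. The function names `l2_cost` and `ridge_fit` are hypothetical, and this version penalizes every weight, including any bias weight, which is an assumption rather than something stated in this thread.

```python
import numpy as np

def l2_cost(X, y, w, lam):
    """Squared error plus lam times the squared L2 norm of the weights."""
    residual = y - X @ w
    return residual @ residual + lam * (w @ w)

def ridge_fit(X, y, lam):
    """Weights minimizing l2_cost: solve (lam * I + X^T X) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)
```

The penalty term grows with the size of the weights themselves, not with the size of the predictions, so a larger `lam` pulls the whole weight vector toward zero.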

charbsj wrote:
Sun Jan 20, 2019 2:01 pm
2) But what if the outlier points have y values lower than what the “good data set” suggests. Penalizing for high weights ( or coefficients) would actually be counter productive. Am I right?

3) Or maybe, more realistically, what if the dimensionality of the problem is large, making it impossible to ‘easily’ see what points are outliers?
Your data should be normalized to begin with so you can't doctor your data to put the outliers at zero and the true data somewhere else.

charbsj wrote:
Sun Jan 20, 2019 2:01 pm
4) Also, there was no discussion of how to choose lambda. Is a trial and error method used to define lambda?
Yes, normally you would use methods such as cross-validation and grid search or random search. E.g. https://scikit-learn.org/stable/modules ... rchCV.html
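
As a rough illustration of that search, here is a NumPy-only sketch of k-fold cross-validation over a grid of candidate lambda values. The helper names (`ridge_fit`, `cv_mse`, `grid_search_lambda`), the fold count, and the candidate grid are all assumptions made for this example; scikit-learn's search utilities automate the same idea.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form L2-regularized least squares (penalizes all weights).
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

def cv_mse(X, y, lam, n_folds=5, seed=0):
    # Average validation mean squared error over n_folds random folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[val] - X[val] @ w) ** 2))
    return float(np.mean(errors))

def grid_search_lambda(X, y, lambdas, n_folds=5):
    # Score every candidate and return the one with the lowest CV error.
    scores = {lam: cv_mse(X, y, lam, n_folds) for lam in lambdas}
    best = min(scores, key=scores.get)
    return best, scores
```

A typical call would be `best, scores = grid_search_lambda(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])`, with the candidates spaced on a log scale.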

charbsj
Posts: 7
Joined: Sun Jan 20, 2019 1:55 pm

### Re: Linear regression, L2 regularization

Your data should be normalized to begin with so you can't doctor your data to put the outliers at zero and the true data somewhere else.

Do I do that by applying, for each feature/column, an element-by-element operation:
- subtract the feature mean
- divide by the feature standard deviation?

I assume the first column remains a column of ones (for the bias term).
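
The steps described above can be sketched in NumPy as follows. The function name `standardize_features` is hypothetical, and the guard for constant columns is an added assumption to avoid dividing by zero.

```python
import numpy as np

def standardize_features(X_raw):
    """Standardize each column, then prepend a column of ones for the bias."""
    mean = X_raw.mean(axis=0)
    std = X_raw.std(axis=0)
    std[std == 0] = 1.0  # assumption: leave constant columns unscaled
    Z = (X_raw - mean) / std
    X = np.column_stack([np.ones(len(Z)), Z])
    return X, mean, std
```

The returned `mean` and `std` should be reused to transform any new data, so that training and test inputs are scaled the same way.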

lazyprogrammer
Posts: 14
Joined: Sat Jul 28, 2018 3:46 am

### Re: Linear regression, L2 regularization

Right

charbsj
Posts: 7
Joined: Sun Jan 20, 2019 1:55 pm

### Re: Linear regression, L2 regularization

I still cannot achieve the same results.
I still fundamentally do not understand how this L2 technique is not just about flattening the slope.

Pragmatically, I have a few questions:
- The video on coding the L2 technique does not go through any normalization of the data. I thought this was needed.
- Do you normalize only the X of each feature, or do you also normalize the Y?
- Are there any resources on why the L2 technique does not just flatten the curve but helps ignore the outliers?

lazyprogrammer
Posts: 14
Joined: Sat Jul 28, 2018 3:46 am

### Re: Linear regression, L2 regularization

- the video on coding the L2 technique does not go through any normalization of the data. I thought this was needed
Generally you use it when needed. E.g. do it if it leads to better results.
- do you only normalize the X of each feature or do you also normalize the Y
Both ways are typical.
- any resources on why the L2 technique is not flattening the curve but help ignore the outliers?
Why do you think these are mutually exclusive?

As mentioned in the course, the regularization penalty represents a prior belief (prior as in prior distribution in Bayes rule).

A prior belief (meaning what you believe the weights to be without knowing anything about the data at all) that the weights are centered around zero seems to be logical. Surely that is better than a prior belief that the weights are centered around, say, negative 1 million.
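
One standard way to write this argument out is shown below. The symbols $\sigma^2$ (noise variance) and $\tau^2$ (prior variance) are assumptions introduced here for illustration and are not taken from the course. Gaussian noise on the targets gives the squared-error term, and a zero-centered Gaussian prior on the weights gives the L2 penalty:

$$
p(y \mid X, w) = \prod_{i=1}^{N} \mathcal{N}\left(y_i \mid w^\top x_i, \sigma^2\right), \qquad p(w) = \mathcal{N}\left(w \mid 0, \tau^2 I\right)
$$

$$
\hat{w}_{\text{MAP}} = \arg\max_w \left[ \log p(y \mid X, w) + \log p(w) \right] = \arg\min_w \left[ \sum_{i=1}^{N} \left(y_i - w^\top x_i\right)^2 + \lambda \lVert w \rVert_2^2 \right], \qquad \lambda = \frac{\sigma^2}{\tau^2}
$$

Under these assumptions, a small $\tau^2$ (a strong belief that the weights are near zero) corresponds to a large $\lambda$.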

charbsj
Posts: 7
Joined: Sun Jan 20, 2019 1:55 pm

### Re: Linear regression, L2 regularization

Where I do see the L2 regularization being useful is when the data is overfitted by a polynomial of higher order than it should be. In your example, I try to fit the data with a 4th-order polynomial. L2 regularization helps avoid overfitting.

But where I do not understand how the L2 regularization works is when the order of the polynomial is what it should be (in your example 1st order) AND when the outliers are low.

Concretely, if I use the code you show in the video, but instead of having the outliers at y+=30, I place them at y-=30, then L2 regularization does not work for me. It makes sense, because L2 regularization, as I understand it, and as I coded it, “favors” low weights (meaning flatter lines).
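
The experiment described above can be reproduced with a sketch like the following. It is not the course code: the true slope of 0.5, the sample size, the seed, the choice of two outliers at the largest x values, and the decision to leave the intercept unpenalized are all assumptions made for illustration.

```python
import numpy as np

def fit_line(outlier_shift, lam, n=50, seed=0):
    """Fit y = w0 + w1 * x with and without an L2 penalty on the slope."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, n)
    y = 0.5 * x + rng.standard_normal(n)
    y[-2:] += outlier_shift  # move the last two points up or down
    X = np.column_stack([np.ones(n), x])
    penalty = np.diag([0.0, lam])  # assumption: penalize the slope only
    w_ols = np.linalg.solve(X.T @ X, X.T @ y)
    w_ridge = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    return w_ols, w_ridge

for shift in (30.0, -30.0):
    w_ols, w_ridge = fit_line(shift, lam=1000.0)
    print(f"shift={shift:+.0f}  OLS slope={w_ols[1]:.3f}  ridge slope={w_ridge[1]:.3f}")
```

With the intercept unpenalized, the ridge slope is the unregularized slope multiplied by Sxx / (Sxx + lam), where Sxx is the sum of squared deviations of x from its mean. The slope therefore keeps its sign and moves toward zero whichever way the outliers point; whether that brings it closer to the true slope depends on the direction in which the outliers pushed the unregularized fit.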

charbsj
Posts: 7
Joined: Sun Jan 20, 2019 1:55 pm

### Re: Linear regression, L2 regularization

I have posted a Python class for linear regression with 15 different use cases on GitHub (https://github.com/charbsj/Machine_Learning).

It follows your tutorial on linear regression.
I was wondering if you had some feedback on my class.

charbsj
Posts: 7
Joined: Sun Jan 20, 2019 1:55 pm

### Re: Linear regression, L2 regularization

I feel a bit weak on the probabilistic interpretation of the squared error for linear regression.
While I could search for it on the internet, I was wondering if you had any recommendations for resources that could help me get a better grasp of this concept.
Thank you

Felipe_Brazil
Posts: 2
Joined: Fri Dec 21, 2018 6:23 pm

### Re: Linear regression, L2 regularization

Hi, charbsj

charbsj wrote:
Thu Jan 24, 2019 1:34 pm
Concretely, if I use the code you show in the video, but instead of having the outliers at y+=30, I place them at y-=30, then L2 regularization does not work for me. It makes sense, because L2 regularization, as I understand it, and as a I coded it, “favors” low weights (meaning flatter lines)
What does "doesn't work for me" mean?
Could you post your L2 regularization code here, along with the "r-squared" and the "sum of the squares of the weights" values for both situations?
I tried to find them in your GitHub repository but could not.

Maybe I can help you 
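
For reference, the two quantities asked for above could be computed with a sketch like this one. The helper names `r_squared` and `sum_squared_weights` are hypothetical, and `y_hat` is assumed to hold the model's predictions for the targets in `y`.

```python
import numpy as np

def r_squared(y, y_hat):
    """1 minus (residual sum of squares / total sum of squares)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def sum_squared_weights(w):
    """Squared L2 norm of the weight vector."""
    w = np.asarray(w, dtype=float)
    return float(w @ w)
```

Reporting both numbers for the y+=30 and y-=30 cases would show how much the penalty shrinks the weights and how much fit quality changes in each case.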