Linear regression, L2 regularization
I am following the class on “deep learning prerequisite: linear regression with Python” taught by lazyprogrammer. Link: https://stackskills.com/courses/105513/lectures/2503518
I am confused by the L2 regularization theory.
It says to add lambda times w^2 to the cost function. This does not make much sense to me.
1) I interpret this as penalizing the answer in a way that reduces the weights of the linear regression. Is my understanding correct?
I see how it works when the outliers have higher y values than what the “good dataset” suggests.
2) But what if the outlier points have y values lower than what the “good dataset” suggests? Penalizing high weights (or coefficients) would actually be counterproductive. Am I right?
3) Or maybe, more realistically, what if the dimensionality of the problem is large, making it impossible to “easily” see which points are outliers?
4) Also, there was no discussion of how to choose lambda. Is a trial-and-error method used to choose lambda?
I guess this whole L2 regularization theory class did not make sense to me.
Thank you

 Site Admin
 Posts: 8
 Joined: Sat Jul 28, 2018 3:46 am
Re: Linear regression, L2 regularization
Thanks for your questions!
Not really. You are penalizing large weights.
charbsj wrote (Sun Jan 20, 2019 2:01 pm):
2) But what if the outlier points have y values lower than what the “good data set” suggests. Penalizing for high weights (or coefficients) would actually be counter productive. Am I right?
3) Or maybe, more realistically, what if the dimensionality of the problem is large, making it impossible to ‘easily’ see what points are outliers?
Your data should be normalized to begin with so you can't doctor your data to put the outliers at zero and the true data somewhere else.
Yes, normally you would use methods such as cross-validation and grid search or random search. E.g. https://scikitlearn.org/stable/modules ... rchCV.html
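As a rough illustration of choosing lambda by cross-validation and grid search, here is a minimal NumPy-only sketch. It is not the scikit-learn tooling, which does this for you; the toy data, the helper names `ridge_fit` and `cv_mse`, and the candidate grid are all assumptions made for the example. Like the course formula, it penalizes every weight, including the bias.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized solution: w = (lam*I + X^T X)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(lam * np.eye(n_features) + X.T @ X, X.T @ y)

def cv_mse(X, y, lam, n_folds=5, seed=0):
    """Mean validation MSE over n_folds folds for a given lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))
    return float(np.mean(errors))

# Hypothetical toy data: a bias column plus 3 features, noisy linear target.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

# Grid search: score every candidate lambda, keep the one with the lowest error.
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_mse(X, y, lam) for lam in lambdas}
best_lambda = min(scores, key=scores.get)
print(scores)
print("best lambda:", best_lambda)
```

Random search works the same way, except the candidates are sampled (often on a log scale) instead of listed by hand.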
Re: Linear regression, L2 regularization
“Your data should be normalized to begin with so you can't doctor your data to put the outliers at zero and the true data somewhere else.”
Do I do that by applying, for each feature/column, an element-by-element operation:
- subtract the feature mean
- divide by the feature standard deviation?
I assume the first column remains a column of ones (for the bias term).
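For reference, the operation described above can be sketched in NumPy as follows. The data here is made up, and the variable names are only illustrative.

```python
import numpy as np

# Hypothetical raw features on very different scales (2 columns).
rng = np.random.default_rng(0)
X_raw = np.column_stack([rng.normal(50.0, 10.0, size=200),
                         rng.normal(0.0, 0.01, size=200)])

# Column-wise statistics; axis=0 gives one value per feature.
mu = X_raw.mean(axis=0)
sigma = X_raw.std(axis=0)

# Element by element: subtract the feature mean, divide by the feature std.
X_std = (X_raw - mu) / sigma

# Add the bias column AFTER standardizing: a column of ones has zero
# standard deviation, so standardizing it would divide by zero.
X = np.column_stack([np.ones(len(X_std)), X_std])
```

When predicting on new data, reuse the `mu` and `sigma` computed on the training set rather than recomputing them.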

 Site Admin
 Posts: 8
 Joined: Sat Jul 28, 2018 3:46 am
Re: Linear regression, L2 regularization
I still cannot achieve the same results.
I still fundamentally do not understand how this L2 technique is not just about flattening the slope, either.
Pragmatically, I have a few questions:
- The video on coding the L2 technique does not go through any normalization of the data. I thought this was needed.
- Do you only normalize the X of each feature, or do you also normalize the Y?
- Are there any resources on why the L2 technique is not flattening the curve but helps ignore the outliers?

 Site Admin
 Posts: 8
 Joined: Sat Jul 28, 2018 3:46 am
Re: Linear regression, L2 regularization
“the video on coding the L2 technique does not go through any normalization of the data. I thought this was needed”
Generally you use it when needed. E.g. do it if it leads to better results.
“do you only normalize the X of each feature or do you also normalize the Y”
Both ways are typical.
“any resources on why the L2 technique is not flattening the curve but help ignore the outliers?”
Why do you think these are mutually exclusive?
As mentioned in the course, the regularization penalty represents a prior belief (prior as in prior distribution in Bayes rule).
A prior belief (meaning what you believe the weights to be without knowing anything about the data at all) that the weights are centered around zero seems to be logical. Surely that is better than a prior belief that the weights are centered around, say, negative 1 million.
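To make the link between the prior and the penalty concrete, here is a sketch of the standard MAP argument. The notation is an assumption, not taken from the course: Gaussian noise with variance \sigma^2, and a zero-mean Gaussian prior on the weights with variance \tau^2.

```latex
\begin{aligned}
p(w \mid X, y) &\propto p(y \mid X, w)\, p(w), \\
p(y \mid X, w) &= \prod_{n=1}^{N} \mathcal{N}\!\left(y_n \mid w^\top x_n,\ \sigma^2\right), \qquad
p(w) = \mathcal{N}\!\left(w \mid 0,\ \tau^2 I\right), \\
-\log p(w \mid X, y) &= \frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(y_n - w^\top x_n\right)^2
  + \frac{1}{2\tau^2} \lVert w \rVert^2 + \text{const}, \\
w_{\text{MAP}} &= \arg\min_w \ \sum_{n=1}^{N} \left(y_n - w^\top x_n\right)^2 + \lambda \lVert w \rVert^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}, \\
w_{\text{MAP}} &= \left(\lambda I + X^\top X\right)^{-1} X^\top y .
\end{aligned}
```

Under these assumptions, a narrow prior (small \tau^2) means a large \lambda and a stronger pull of the weights toward zero. The pull is toward the prior mean, whatever the data look like.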
Re: Linear regression, L2 regularization
Where I do see L2 regularization being useful is when the data is overfitted by a polynomial of higher order than it should be. In your example, I try to fit the data with a 4th-order polynomial. L2 regularization helps avoid overfitting.
But where I do not understand how L2 regularization works is when the order of the polynomial is what it should be (in your example, 1st order) AND when the outliers are low.
Concretely, if I use the code you show in the video, but instead of having the outliers at y+=30, I place them at y-=30, then L2 regularization does not work for me. It makes sense, because L2 regularization, as I understand it and as I coded it, “favors” low weights (meaning flatter lines).
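The experiment described above can be reproduced with a small sketch like the one below. This is not the course code: the data (true slope 0.5, last two points shifted by ±30), the `fit` helper, and the choice lambda = 1000 are assumptions for illustration.

```python
import numpy as np

def fit(X, y, lam):
    """L2-regularized least squares; lam=0 gives the ordinary solution."""
    return np.linalg.solve(lam * np.eye(X.shape[1]) + X.T @ X, X.T @ y)

# Hypothetical data: a line with true slope 0.5 plus unit Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
X = np.column_stack([np.ones_like(x), x])   # bias column + feature
y_clean = 0.5 * x + rng.normal(size=50)

results = {}
for name, shift in [("high outliers", +30.0), ("low outliers", -30.0)]:
    y = y_clean.copy()
    y[-2:] += shift                          # move the last two points up or down
    w_plain = fit(X, y, lam=0.0)
    w_l2 = fit(X, y, lam=1000.0)
    results[name] = (w_plain, w_l2)
    print(name, "| plain slope:", w_plain[1], "| L2 slope:", w_l2[1])
```

In both cases the penalized weight vector has a smaller norm than the unpenalized one, because the penalty has no notion of which points are outliers. Whether the shrunken line ends up closer to the true slope depends on the data, the noise, and lambda.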
Re: Linear regression, L2 regularization
I have posted a Python class for linear regression with 15 different use cases on GitHub (https://github.com/charbsj/Machine_Learning).
It follows your tutorial on linear regression.
I was wondering if you had some feedback on my class.
Re: Linear regression, L2 regularization
I feel a bit weak on the probabilistic interpretation of the squared error for linear regression.
While I could search for it on the internet, I was wondering if you had any recommendations for resources that could help me get a better grasp of this concept.
Thank you
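For context, the usual argument runs as follows. This is a sketch assuming i.i.d. Gaussian noise with variance \sigma^2; the symbols are my own notation.

```latex
\begin{aligned}
y_n &= w^\top x_n + \varepsilon_n, \qquad \varepsilon_n \sim \mathcal{N}(0, \sigma^2)\ \text{i.i.d.}, \\
p(y \mid X, w) &= \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{\left(y_n - w^\top x_n\right)^2}{2\sigma^2}\right), \\
\log p(y \mid X, w) &= -\frac{N}{2}\log\!\left(2\pi\sigma^2\right)
  - \frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(y_n - w^\top x_n\right)^2, \\
w_{\text{ML}} &= \arg\max_w \ \log p(y \mid X, w)
  = \arg\min_w \ \sum_{n=1}^{N} \left(y_n - w^\top x_n\right)^2 .
\end{aligned}
```

So under the Gaussian-noise assumption, minimizing the squared error is the same as maximizing the likelihood of the data.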