A question on the global average in Matrix Factorization

leon1209
Posts: 6
Joined: Thu May 28, 2020 3:58 am

A question on the global average in Matrix Factorization

Post by leon1209 » Fri Jun 19, 2020 2:06 am

Hi LazyProgrammer,

I am learning the Matrix Factorization lecture from your NLP with Deep Learning in Python course. I have a question that I want to ask you:

Why do we add the global average bias term to r_hat(i,j) instead of subtracting it, if we want the average of r_hat to be 0?

Please allow me to give a specific example to illustrate my confusion.

For example, suppose there are 3 r_hat(i,j) values: -2, -1, 2. Their average is -1/3. If we add -1/3 to those three values, the new average will not be zero. But if we subtract -1/3 from those three values, the average will be zero.

My second point of confusion is: why do we even want the average of r_hat to be 0? Since the true rating r(i,j) in the loss J = sum over (i,j) of (r(i,j) - r_hat(i,j))^2 ranges from 1 to 5 in real life, what is the point of centering r_hat around 0?
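To make my question concrete, here is a small sketch of what I mean (a toy example of my own, not from the course code):

```python
import numpy as np

# Confusion 1: subtracting the mean centers the values; adding it does not.
r_hat = np.array([-2.0, -1.0, 2.0])
mu = r_hat.mean()                  # -1/3
print((r_hat + mu).mean())         # -2/3, not centered
print((r_hat - mu).mean())         # 0.0, centered

# Confusion 2: real ratings live on a 1-5 scale, so I would expect the
# model to predict a deviation around 0 and then add the global average back:
ratings = np.array([5.0, 3.0, 4.0, 1.0])
mu = ratings.mean()                # 3.25
deviations = ratings - mu          # these average to 0
predictions = deviations + mu      # back on the 1-5 scale
```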

Looking forward to hearing from you. Thanks!

lazyprogrammer
Site Admin
Posts: 33
Joined: Sat Jul 28, 2018 3:46 am

Re: A question on the global average in Matrix Factorization

Post by lazyprogrammer » Fri Jun 19, 2020 8:36 pm

Generally speaking, machine learning models fit more easily to functions centered around 0.

However, you should follow my rule "machine learning is experimentation, not philosophy".

Trying to use philosophy is very confusing (as you are seeing).

Instead of trying to guess how a computer program will behave, the right approach is to test both ways and compare.

There's no way you can guess the output of a computer program.
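For example, a quick experiment could look something like this (a rough sketch, not the course code; the toy data and hyperparameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 5
# Toy (user, item, rating) triples on a 1-5 scale.
data = [(rng.integers(n_users), rng.integers(n_items), float(rng.integers(1, 6)))
        for _ in range(500)]
mu = np.mean([r for _, _, r in data])  # global average rating

def train_mf(add_mu, epochs=30, lr=0.02, reg=0.1):
    init = np.random.default_rng(1)    # same initialization for both runs
    W = init.normal(scale=0.1, size=(n_users, k))
    U = init.normal(scale=0.1, size=(n_items, k))
    b, c = np.zeros(n_users), np.zeros(n_items)
    offset = mu if add_mu else 0.0
    for _ in range(epochs):
        for i, j, r in data:
            e = r - (W[i] @ U[j] + b[i] + c[j] + offset)  # residual
            W_i = W[i].copy()                             # simultaneous SGD update
            W[i] += lr * (e * U[j] - reg * W[i])
            U[j] += lr * (e * W_i - reg * U[j])
            b[i] += lr * (e - reg * b[i])
            c[j] += lr * (e - reg * c[j])
    return np.mean([(r - (W[i] @ U[j] + b[i] + c[j] + offset)) ** 2
                    for i, j, r in data])

print("with + mu:   ", train_mf(add_mu=True))
print("without + mu:", train_mf(add_mu=False))
```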

Gensen
Posts: 2
Joined: Fri Jun 19, 2020 5:15 pm

Re: A question on the global average in Matrix Factorization

Post by Gensen » Sat Jun 20, 2020 6:10 am

Thanks.

When I run MF on the analogy test (king, queen, man, woman) for NLP, W gets the right answer on some instances, while (W+U)/2 gets it right on other instances where W does not. Sometimes no word vector gets it right. Is it because I haven't found the right hyperparameters or didn't train long enough, or does this usually happen with NLP analogy tests? (I'm new to this field and just want to get a general sense.)

In general, among W, (W+U)/2, and concatenate(W, U), is there a single best choice for the word vectors?
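For reference, here is roughly how I run the analogy test (a sketch; W and U are the input and output embedding matrices from my run, and word2idx is my vocabulary map):

```python
import numpy as np

def analogy(E, word2idx, pos1, neg, pos2):
    """Return the word whose vector is closest (by cosine similarity)
    to vec(pos1) - vec(neg) + vec(pos2)."""
    V = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit-normalize rows
    target = V[word2idx[pos1]] - V[word2idx[neg]] + V[word2idx[pos2]]
    target /= np.linalg.norm(target)
    sims = V @ target
    for w in (pos1, neg, pos2):                        # exclude the query words
        sims[word2idx[w]] = -np.inf
    idx2word = {i: w for w, i in word2idx.items()}
    return idx2word[int(np.argmax(sims))]

# I try the same analogy with each candidate embedding:
# for E in (W, (W + U) / 2, np.concatenate((W, U), axis=1)):
#     print(analogy(E, word2idx, 'king', 'man', 'woman'))  # hoping for 'queen'
```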

lazyprogrammer
Site Admin
Posts: 33
Joined: Sat Jul 28, 2018 3:46 am

Re: A question on the global average in Matrix Factorization

Post by lazyprogrammer » Sat Jun 20, 2020 11:45 pm

Thanks for your inquiry.

You'd have to experiment to determine what the best configuration is.

Anecdotally, I have found that more data usually leads to better results.
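E.g., you can just score each candidate on a set of analogy questions (a sketch, assuming the analogy() helper from the post above and a list of (pos1, neg, pos2, expected) 4-tuples):

```python
import numpy as np

def accuracy(E, word2idx, quadruples):
    # quadruples: list of (pos1, neg, pos2, expected) analogy questions
    hits = sum(analogy(E, word2idx, p1, n, p2) == expected
               for p1, n, p2, expected in quadruples)
    return hits / len(quadruples)

# for name, E in [('W', W), ('(W+U)/2', (W + U) / 2),
#                 ('concat(W,U)', np.concatenate((W, U), axis=1))]:
#     print(name, accuracy(E, word2idx, quadruples))
```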
