Quote:
Originally Posted by DocOfDan
I have n products on sale and I want to predict which of these will be top seller in a given week. For each of the n products I have 4 (binary) attributes - let's call them A, B, C & D...
My initial approach
==============
For simplicity, if we neglect attributes C & D, so that we have no missing data, I would create a 'rating' for each product, which would be a linear combination of A and B, e.g. R_i = alpha_1 * A_i + alpha_2 * B_i
I would define the probability of product i being the top seller as:
p_i = Exp(R_i)/Sum_j[Exp(R_j)]
For even more simplicity, let us suppose we have only two products, Product 1 and Product 2. Suppose we know B_1, A_2, and B_2. But all we know about A_1 is that it is very likely to be 0. So P(A_1 = 0) = 1 - ε. Let
T = "Product 1 is the top seller."
With only two products, the odds of T are p_1/p_2 = exp(R_1 - R_2). For us, the odds of T are approximately
O(T) = P(T) / P(not T) ≈ exp(α_2 B_1 - α_1 A_2 - α_2 B_2)
Now suppose we receive some information J which implies we were wrong about A_1. In other words, P(A_1 = 1 | J) = 1 - ε. Then our odds of T change to
O(T | J) ≈ exp(α_1 + α_2 B_1 - α_1 A_2 - α_2 B_2)
But according to Bayes' theorem,
O(T | J) = O(T) * P(J | T) / P(J | not T)
Therefore,
P(J | T) / P(J | not T) ≈ exp(α_1)
Very loosely speaking, we can interpret this as follows. Suppose α_1 > 0. We had a product in our hands (Product 1) and we thought that A_1 was zero. The probability that we would discover we were wrong about that would be higher if the product in our hands was the top seller than if it was not. In fact, it is roughly exp(α_1) times higher.
But more importantly, the ratio of these probabilities did not depend on B_1. It is in this sense that the attributes have independent effects on the product, and this behavior is built right into the structure of the model.
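If you want to check this numerically, here is a minimal sketch in Python. It takes the limit ε -> 0, so A_1 is treated as exactly 0 before J and exactly 1 after it. The function names and the values of alpha1 and alpha2 are made up purely for illustration; they are not fitted to anything.

```python
import math
from itertools import product


def top_seller_odds(a1, b1, a2, b2, alpha1, alpha2):
    """Odds that Product 1 is the top seller in the two-product model.

    With p_i = exp(R_i) / (exp(R_1) + exp(R_2)), the odds are
    p_1 / p_2 = exp(R_1 - R_2).
    """
    r1 = alpha1 * a1 + alpha2 * b1
    r2 = alpha1 * a2 + alpha2 * b2
    return math.exp(r1 - r2)


def likelihood_ratio(b1, a2, b2, alpha1, alpha2):
    """P(J | T) / P(J | not T), taken in the limit eps -> 0.

    By Bayes' theorem this equals (odds after J) / (odds before J).
    """
    odds_before = top_seller_odds(0, b1, a2, b2, alpha1, alpha2)  # A_1 = 0
    odds_after = top_seller_odds(1, b1, a2, b2, alpha1, alpha2)   # A_1 = 1
    return odds_after / odds_before


# Hypothetical coefficients, chosen only for illustration.
alpha1, alpha2 = 0.8, 1.5

# Try every combination of the known binary attributes B_1, A_2, B_2.
ratios = [
    likelihood_ratio(b1, a2, b2, alpha1, alpha2)
    for b1, a2, b2 in product((0, 1), repeat=3)
]

for (b1, a2, b2), r in zip(product((0, 1), repeat=3), ratios):
    print(f"B_1={b1} A_2={a2} B_2={b2}  ratio={r:.6f}")
print(f"exp(alpha1) = {math.exp(alpha1):.6f}")
```

Every row should print the same ratio, equal to exp(alpha1) up to floating-point rounding, whatever B_1, A_2 and B_2 are. That is the independence property described above.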
Your option (a) below does not seem consistent with this behavior, so it does not seem like a viable choice.
Quote:
Originally Posted by DocOfDan
What do I do about (partially) missing data?
Should I
a) Define a new indicator variable E ('is new product'), set C_i and D_i to zero for all new products and redefine R_i = Sum(alpha_1 * A + alpha_2 * B + alpha_3 * C + alpha_4 * D + alpha_5 * E) etc
or
b) take some other approach?