Optimum (highest) selection strategy given N randomly distributed trial offerings

02-24-2012 , 01:20 PM
3 problems for fun;

1) I have N items whose scores are randomly distributed according to some precisely known distribution (e.g. a Gaussian, Poisson, uniform on an interval, etc., whose parameters you know). They are shown to you one by one and your objective is to pick one before all N have been shown. You can keep the item currently offered as your final choice instantly, but if you ask to see the next one the previous ones can no longer be chosen; you can, however, record all their values for any analysis you like. Describe how you would approach this so as to maximize the score of the item you select.

2) Same N items, same rules, but now you only know the type of distribution, not its parameters. Find a strategy.

3) Same N items, but now you know nothing about how they are distributed. Can this be solved even for small N (say, smaller than 10)?


Feel free to fix N at some number if you want to simulate, but eventually we will have to come up with a general strategy that may be a function of N. Notice that when N is very large we can afford to wait initially, establish the distribution through the samples we see, and then decide later (imagine N really large, say hundreds or thousands). So maybe the problem is easier to solve in the large-N limit first. But feel free to try any N you wish, and study any subset of these problems you like.
02-24-2012 , 03:32 PM
Quote:
Originally Posted by masque de Z
3 problems for fun;
First, I'd look up search theory and find those papers.

Question 1 isn't too interesting--it's just backward induction. If we get to the Nth item we have to take it. So at the (N-1)st item we compare its known value with the expected value of the Nth draw, and continue only if that expectation is larger. Then we step back to the (N-2)nd item, work out the expected result of playing optimally over the last two draws, compare again, and so on back to the first item.
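To make the backward induction concrete, here is a minimal Python sketch (my own illustration, not from the search-theory literature; the function names are made up) that estimates the continuation values V[k] = expected score of optimal play with k draws left, for any known distribution, via V[k] = E[max(X, V[k-1])]. The decision rule is then: accept the current draw whenever it beats the continuation value of the draws still to come.

Code:
import numpy as np

def continuation_values(sampler, N, n_mc=200_000, seed=0):
    """V[k] = expected score of optimal play with k i.i.d. draws remaining,
    estimated by Monte Carlo via V[k] = E[max(X, V[k-1])]."""
    rng = np.random.default_rng(seed)
    V = np.zeros(N + 1)
    V[1] = sampler(rng, n_mc).mean()            # forced to take the last draw
    for k in range(2, N + 1):
        x = sampler(rng, n_mc)
        V[k] = np.maximum(x, V[k - 1]).mean()   # keep x only if it beats continuing
    return V

# Example: 10 draws from a standard normal; any known distribution works.
gaussian = lambda rng, n: rng.normal(0.0, 1.0, n)
V = continuation_values(gaussian, N=10)
# Accept the j-th draw (1-indexed, j < N) if it exceeds V[N - j];
# the Nth draw must be taken.
print(np.round(V[1:], 3))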

Question 2 might just be a variation--there may be more I'd do, but I'd use the data observed up to the current draw to estimate the distribution's parameters and then run the same backward induction as in Question 1. I'm not sure there's really a better approach, unless you want to get into risk aversion and discount future draws because of the possibility that we happened to get a few great first picks.

Note: at our first pick (for both questions 1 and 2), we can estimate the mean but not the SD. But since our estimate of the mean is exactly our first pick, we lose nothing on average by looking at a second one; thus I'd look at enough draws (at least two for a two-parameter distribution) to be able to estimate the parameters.

For question 3, I'd guess there's some non-parametric estimator you could employ for the backward induction step.
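For instance, one plug-in version of that idea (just a sketch, no optimality claim): treat the empirical distribution of the draws seen so far as if it were the true distribution and run the same backward induction on it.

Code:
import numpy as np

def empirical_continuation(seen, k):
    """Value of k remaining draws, treating the empirical distribution of
    `seen` as the truth: V[j] = E[max(X, V[j-1])], with X resampled from `seen`."""
    seen = np.asarray(seen, dtype=float)
    v = seen.mean()                      # value of the final forced draw
    for _ in range(k - 1):
        v = np.maximum(seen, v).mean()
    return v

def nonparametric_play(draws):
    """Accept the current draw if it beats the plug-in continuation value."""
    N = len(draws)
    seen = []
    for j, x in enumerate(draws):
        seen.append(x)
        remaining = N - (j + 1)
        if remaining == 0 or x > empirical_continuation(seen, remaining):
            return x

rng = np.random.default_rng(0)
print(nonparametric_play(rng.exponential(2.0, 20)))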
02-24-2012 , 03:55 PM
Quote:
Originally Posted by masque de Z
3) Same N items, but now you know nothing about how they are distributed. Can this be solved even for small N (say, smaller than 10)?
I believe N = 3 was a Car Talk puzzler a couple months ago.
02-26-2012 , 08:34 AM
Ok, let's take a simple case, the P1 N=2 case, to try to see the solution for general N with some simple distribution.

If you have one more to see and you are looking at the first value x1, and it is higher than the average, you keep it. If x1 is smaller than the average, you go for the random next draw, which has a higher expectation, unless utility considerations force your decision (e.g. a distribution that derives a lot of its average from low-probability big-payoff results, so that the majority of the time you will get a number smaller than the one you have now, and that value matters to you even though it is not above average).


Leaving aside the utility issues now, we can claim that in general, with n left to see, we can safely drop the current one if it is below the average.

But what about when it is higher than the average? Then each of the next draws, seen individually, has an average lower than the current value, but the probability that at least one of them tops it is often significant enough to warrant taking the risk.

Take the simple uniform distribution on (0,1), say. If you have 0.65 and n=2 left to see, do you keep it?

To answer this you might be tempted to check whether the chance of ending up with something higher, if you drop it, is over 50%, as you would with 1 left to choose. That is easy to visualize as an idea, but it is not entirely the proper way to think. It worked with 1 left to see and this kind of distribution, but with, say, 2 left it is a bit more complex. The chance of selecting a higher one in the next 2 is not the naive 1 - 0.65^2 = 0.578, because you do not get to try both at once. Instead you are shown one draw, say 0.57, and then you have 1 left. When that 0.57 comes, the last one is never tried, because you cannot take the risk: its expectation is only 0.5. So you keep that 0.57, and you have forced yourself to effectively see only 1 draw, not 2!

So the objective is different now. You reject only if the expected value of the alternative strategy is higher, and this is not the same as the probability of getting a better-than-current value in the next few draws.

So what is the expectation with 2 left to see?

It is the value you get if the next draw is over 0.5 (which you then keep, by the n=1 rule), plus the value you get when you move on to the 2nd draw because the first is below 0.5, each weighted by its probability.

The expectation of this process is then:

0.5*0.75 (0.75 being the average of anything from 0.5 to 1) + 0.5*0.5 (0.5 being the average when forced to take the second)
= 0.375 + 0.25 = 0.625

So the expectation of the drop strategy with 2 left to see is 0.625, which is worse than the 0.65 you have now, so you keep it!

With n=3 left to see it gets more complicated, but now we use the threshold for n=2, i.e. 0.625.

This means that we keep the first of those 3 if its value is higher than 0.625 (since 2 are left to see after it), i.e. if it is between 0.625 and 1, whose conditional average is (0.625 + 1)/2 = 0.8125.

So the expectation of dropping is (1-0.625)*0.8125+0.625*(0.625)=0.695

In this way we can proceed backwards to any n recursively, through:

E[n] = (1 - E[n-1])*(1 + E[n-1])/2 + E[n-1]^2 = (1 + E[n-1]^2)/2

with E[1] = 0.5 for the uniform distribution on (0,1).

I think that solves the uniform distribution case for any N. If this checks out and all agree, let's go to higher and better cases... :-)
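If anyone wants to check the numbers, here is a quick Python sketch (mine, purely illustrative) that computes the thresholds from the recursion and verifies them by simulating the threshold strategy.

Code:
import random

def thresholds(N):
    """E[k] = expected score with k uniform(0,1) draws left under optimal play;
    it doubles as the acceptance threshold when k further draws remain."""
    E = [0.0, 0.5]                      # E[0] unused, E[1] = 0.5
    for k in range(2, N + 1):
        E.append((1 + E[k - 1] ** 2) / 2)
    return E

def play_once(N, E):
    """One game of N uniform draws: accept the current draw if it beats
    the value of continuing, i.e. the threshold for the draws left."""
    for k in range(N, 0, -1):           # k = draws remaining, current included
        x = random.random()
        if k == 1 or x > E[k - 1]:
            return x

N, trials = 10, 100_000
E = thresholds(N)
avg = sum(play_once(N, E) for _ in range(trials)) / trials
print("recursion:", round(E[N], 4), " simulated:", round(avg, 4))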

Last edited by masque de Z; 02-26-2012 at 08:43 AM.
02-26-2012 , 02:34 PM
Yeah, that's pretty much what I said.

Last edited by coffee_monster; 02-26-2012 at 02:36 PM. Reason: except it was a lot more complicated than what I said...
02-26-2012 , 08:28 PM
Is number 3 a standard secretary problem, or is there an aspect I've missed?

http://en.wikipedia.org/wiki/Secretary_problem

Last edited by river_tilt; 02-26-2012 at 08:39 PM.
02-26-2012 , 08:58 PM
Quote:
Originally Posted by river_tilt
Is number 3 a standard secretary problem, or is there an aspect I've missed?

http://en.wikipedia.org/wiki/Secretary_problem
Seems like that's exactly the answer to problem #3.
02-26-2012 , 09:22 PM
Quote:
Originally Posted by river_tilt
Is number 3 a standard secretary problem, or is there an aspect I've missed?

http://en.wikipedia.org/wiki/Secretary_problem
Yes, thanks for the link. I was vaguely aware of that, now that I see it again, but I felt it's a bit different.

Basically, look at this:

"7. The objective is to select the best applicant with the highest possible probability."

But I didn't ask that. I asked to maximize the expected score of the selection, not to target the maximum element with the highest probability. Is it the same? In general it shouldn't be, right?

I will go over the variations in the link, though, to see if it can be identified as a case of the solved problem. In any case, for practical purposes the result is probably close for most distributions, even if not exactly optimal. Am I wrong?

It would be interesting to show, however, whether the approach of maximizing the probability of selecting the overall maximum also yields the best average score among strategies, or for which distributions this proves true.
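A quick experiment along these lines for the case where the distribution actually is uniform(0,1) and known: compare the average score of the classical 1/e secretary rule against the expectation-maximizing threshold rule from the recursion a few posts up. Just my own sketch; for this known-distribution case the threshold rule should come out at least as high on average, since that is exactly what it optimizes.

Code:
import math, random

def thresholds(N):
    """Acceptance thresholds for uniform(0,1): E[k] = value of k draws left."""
    E = [0.0, 0.5]
    for k in range(2, N + 1):
        E.append((1 + E[k - 1] ** 2) / 2)
    return E

def threshold_pick(vals, E):
    """Maximize expected score: keep a draw that beats the value of what's left."""
    N = len(vals)
    for j, x in enumerate(vals):
        if j == N - 1 or x > E[N - 1 - j]:
            return x

def secretary_pick(vals):
    """Classical 1/e rule: skip ~n/e draws, then take the first new maximum."""
    cutoff = max(1, round(len(vals) / math.e))
    best = max(vals[:cutoff])
    for x in vals[cutoff:]:
        if x > best:
            return x
    return vals[-1]

N, trials = 10, 100_000
E = thresholds(N)
t_avg = s_avg = 0.0
for _ in range(trials):
    vals = [random.random() for _ in range(N)]
    t_avg += threshold_pick(vals, E)
    s_avg += secretary_pick(vals)
print("threshold rule:", t_avg / trials, " secretary rule:", s_avg / trials)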
02-26-2012 , 09:31 PM
Quote:
Originally Posted by masque de Z
Yes, thanks for the link. I was vaguely aware of that, now that I see it again, but I felt it's a bit different.

Basically, look at this:

"7. The objective is to select the best applicant with the highest possible probability."

But I didn't ask that. I asked to maximize the expected score of the selection, not to target the maximum element with the highest probability. Is it the same? In general it shouldn't be, right?
In general, it shouldn't be the same. But also, in general, your problem #3 is not sufficiently well-defined to allow a non-arbitrary answer.

Consider what you're asking for the N=2 case:

Suppose I give you an envelope that contains a real number. You can keep this real number, or exchange it for some other real number in a different envelope. What's your strategy to maximize your expectation?
02-26-2012 , 09:39 PM
Quote:
Originally Posted by Aaron W.
In general, it shouldn't be the same. But also, in general, your problem #3 is not sufficiently well-defined to allow a non-arbitrary answer.

Consider what you're asking for the N=2 case:

Suppose I give you an envelope that contains a real number. You can keep this real number, or exchange it for some other real number in a different envelope. What's your strategy to maximize your expectation?
Clearly, part of the problem is to investigate the N for which it makes no sense! It's a well-defined project if seen with the objective of also finding the cases that have no solution and excluding them as part of the attempt to solve it.

For P3 (arbitrary distribution), or even P2 (known in form but not parameters), clearly N=2 is not enough. It may even be true that 3 is not enough, but I haven't considered that in detail yet. It may still prove possible to obtain some basic information from seeing the first 2.

Furthermore, notice that part of the problem is the requirement that the list is presented as naturally randomly generated while obeying that unknown distribution. We may of course need to go deeper into what a distribution means here, because if we allow many parameters things can get ridiculously complicated and go nowhere even for N larger than 3.

Last edited by masque de Z; 02-26-2012 at 09:51 PM.
02-26-2012 , 10:11 PM
Quote:
Originally Posted by masque de Z
Clearly, part of the problem is to investigate the N for which it makes no sense! It's a well-defined project if seen with the objective of also finding the cases that have no solution and excluding them as part of the attempt to solve it.

For P3 (arbitrary distribution), or even P2 (known in form but not parameters), clearly N=2 is not enough. It may even be true that 3 is not enough, but I haven't considered that in detail yet. It may still prove possible to obtain some basic information from seeing the first 2.

Furthermore, notice that part of the problem is the requirement that the list is presented as naturally randomly generated while obeying that unknown distribution. We may of course need to go deeper into what a distribution means here, because if we allow many parameters things can get ridiculously complicated and go nowhere even for N larger than 3.
Let's try the following:

You have a random number generator that follows a fixed distribution that you do not know. How many values must you observe to determine the distribution "well enough" to make "valid" decisions?

I suspect that for any strategy you devise, there will exist a distribution for which your strategy is sub-optimal.
02-26-2012 , 10:22 PM
Quote:
Originally Posted by Aaron W.
Let's try the following:

You have a random number generator that follows a fixed distribution that you do not know. How many values must you observe to determine the distribution "well enough" to make "valid" decisions?

I suspect that for any strategy you devise, there will exist a distribution for which your strategy is sub-optimal.
Yes, but it's not immediately clear that this holds if we select only from the group of distributions that most physically interesting problems belong to, which is of course the real spirit of all this. Otherwise I can always take a histogram created by some ridiculous multistep process that obeys almost no commonly used law, with the precise intention of beating the selection process.

Maybe then one can try the classic secretary problem as well in parallel.

Obviously any normal distribution, say, will have a nearby one with similar parameters that fits the initial 20 or so data points quite well (not as well as the one estimated from the sample), yet happens to be the true one that created them!

These issues certainly need to be investigated, to produce conclusions about whether something can be solved or not, in a general or even a restricted sense.

The project is to learn about these issues from the problems in general, not to attack them as uninteresting or not sensible without also noticing where they are interesting or very sensible, if they have such domains. Clearly some subsets of them do.
02-26-2012 , 10:24 PM
P2 has the same induction problem as P3. Without a prior, the answer is arbitrary.
02-26-2012 , 11:21 PM
Quote:
Originally Posted by TomCowley
P2 has the same induction problem as P3. Without a prior, the answer is arbitrary.
Not really--if we know it's a gaussian, for instance, we can start to estimate the parameters once we get two draws. I'd guess the optimal strategy (that I have described in an earlier post) would differ from the 'known parameters' case and be a little more careful about suggesting that an early draw is selected. But in P2, we can start estimating the distribution pretty easily after a few draws.

In P3, we have the issue of not knowing the distribution, so there may be some non-parametric stuff we can do, but not knowing the distribution (as Aaron pointed out) can lead to some crazy stuff...
02-26-2012 , 11:24 PM
Quote:
Originally Posted by masque de Z
Yes, but it's not immediately clear that this holds if we select only from the group of distributions that most physically interesting problems belong to, which is of course the real spirit of all this.
If you want to only talk about the "physically interesting" ones, then you're talking about a very different problem. Knowing "nothing" about the distributions is very, very different from knowing that the distribution is one selected from a predefined collection of "physically interesting" ones.

Quote:
The project is to learn about these issues from the problems in general, not to attack them as uninteresting or not sensible without also noticing where they are interesting or very sensible, if they have such domains. Clearly some subsets of them do.
I think you've got it backwards. It seems to me that the project is to learn from specific cases and use that knowledge to attempt to make meaningful generalizations.

It's a problem that is interesting (not in a personal sense, but in an academic sense), but I think you need to talk about it at the level of heuristic solutions because you're not likely to find any meaningful formal ones.
02-26-2012 , 11:30 PM
Quote:
Originally Posted by coffee_monster
Not really--if we know it's a gaussian, for instance, we can start to estimate the parameters once we get two draws. I'd guess the optimal strategy (that I have described in an earlier post) would differ from the 'known parameters' case and be a little more careful about suggesting that an early draw is selected. But in P2, we can start estimating the distribution pretty easily after a few draws.

In P3, we have the issue of not knowing the distribution, so there may be some non-parametric stuff we can do, but not knowing the distribution (as Aaron pointed out) can lead to some crazy stuff...
Yeah, but the estimation is still arbitrary. You can say the data is 50x more likely to have come from N(0,1) than N(.3,2), or most likely to have come from N(x,y), but the leap to it being a provable solution to the problem is impossible without a prior. Same for P3, but not for P1 which is well enough defined.
02-26-2012 , 11:38 PM
Quote:
Originally Posted by coffee_monster
Not really--if we know it's a gaussian, for instance, we can start to estimate the parameters once we get two draws. I'd guess the optimal strategy (that I have described in an earlier post) would differ from the 'known parameters' case and be a little more careful about suggesting that an early draw is selected. But in P2, we can start estimating the distribution pretty easily after a few draws.
Being able to calculate a mean and standard deviation is fine, but you can't prove anything about the optimality of this solution without bringing in an extra assumption, namely the distribution of distributions.
02-27-2012 , 12:44 AM
Quote:
Originally Posted by TomCowley
Yeah, but the estimation is still arbitrary. You can say the data is 50x more likely to have come from N(0,1) than N(.3,2), or most likely to have come from N(x,y), but the leap to it being a provable solution to the problem is impossible without a prior. Same for P3, but not for P1 which is well enough defined.
I'm not sure what you mean by 'arbitrary'. If I've got a bunch of data that I assume comes from a normal distribution, then there are well-established estimators of the mean and SD. Those estimates will change as we get more data, and they depend on the data we already have, but it's not really arbitrary (in the sense I'm thinking of it). Maximum Likelihood Estimators, I believe they're called.

So I'm not sure what your issue is--we can estimate the SD and mean for a normal distribution using the MLE, we can see what we just picked, and we can use the estimates of the SD and mean to construct our expected return from setting aside our current pick and continuing the process (through the backward induction that has been described ITT). Yeah, the estimates will change as we make more picks, but I'd think any sort of reasonable algorithm will have to deal with that.
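Here's one possible way to wire that together in Python, under the assumption that the draws really are i.i.d. normal: refit the MLE (sample mean and n-denominator sample SD) after each draw and accept the current draw if it beats the estimated continuation value from the backward induction. Only a rough sketch; the helper names are mine.

Code:
import numpy as np

def continuation_value(mu, sigma, k, n_mc=50_000, rng=None):
    """Monte Carlo value of having k draws left from N(mu, sigma^2),
    assuming optimal threshold play from here on."""
    if rng is None:
        rng = np.random.default_rng(0)
    v = mu                                   # value of the final forced draw
    for _ in range(k - 1):
        x = rng.normal(mu, sigma, n_mc)
        v = np.maximum(x, v).mean()
    return v

def adaptive_play(draws):
    """P2 heuristic: re-estimate the parameters as data arrives, then apply
    the known-parameter stopping rule with the current estimates."""
    N = len(draws)
    seen = []
    for j, x in enumerate(draws):
        seen.append(x)
        remaining = N - (j + 1)
        if remaining == 0:
            return x                         # forced to take the last draw
        if len(seen) < 2:
            continue                         # can't estimate the SD yet
        mu_hat, sigma_hat = np.mean(seen), np.std(seen)   # normal MLEs
        if x > continuation_value(mu_hat, sigma_hat, remaining):
            return x

rng = np.random.default_rng(1)
print(adaptive_play(rng.normal(10.0, 3.0, 15)))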
02-27-2012 , 12:46 AM
Quote:
Originally Posted by Aaron W.
Being able to calculate a mean and standard deviation is fine, but you can't prove anything about the optimality of this solution without bringing in an extra assumption, namely the distribution of distributions.
Is that for P3? If so, yeah, I agree. That's part of the reason I threw something out about non-parametric estimators. Though honestly to me that's a bit more of a buzzword--I haven't used them (much).
02-27-2012 , 01:24 AM
Quote:
Originally Posted by coffee_monster
Is that for P3? If so, yeah, I agree. That's part of the reason I threw something out about non-parametric estimators. Though honestly to me that's a bit more of a buzzword--I haven't used them (much).
It's actually for P2 as well. As an example, suppose that you KNEW that the only possible mean for the distribution was mu = 10. Then it wouldn't matter that your estimated mean (based on the data) was 9.7. Or perhaps the actual mean was either 10 or 9.5. Then you would conclude that the mean was more likely to be 9.5 than 10.

All of this is to say that you need to say *SOMETHING* about the possible means in order to draw meaningful and non-arbitrary conclusions. It could be something like knowing it has a constant probability density over a certain interval, or a normal distribution with some mean for the means, or something.

You need that information if you're going to conclude anything about having an optimal process.
02-27-2012 , 01:39 AM
Yes, but what if we introduce the idea of designing an estimation process, as coffee_monster is suggesting, using a fraction of the initial data, and then develop a strategy that maximizes the result over all possible distributions one can throw at us, provided they belong to some typical familiar group (say the group of most famous ones)?

Clearly it is understood that a distribution other than the one we selected as the generating one could also have produced the initial data. But what if we demanded that the person playing the game against us produce, each time, a fresh stream of data from a random distribution (randomly chosen among many famous families, with randomly selected parameters), we tried to guess it, and we then took the average over all his efforts to "mess" with us? Since the distribution he is producing is random and out of his control, and he does not get to choose which data to show us, he cannot trick us every time; only on a few trials does he get lucky, when we guess a completely wrong distribution due to "bad testing luck". On average our strategy will do well against all his possible choices, if we award it an accumulated score only after many such efforts (many repeated N-item trials, I mean). I doubt we would then be exploited by some better strategy!
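One possible way to set up such a test in code (a toy sketch only: the families, parameter ranges, and the simple learn-then-pick strategy are all arbitrary choices of mine, and the normalization by each trial's mean is just so different families are comparable):

Code:
import numpy as np

rng = np.random.default_rng(0)

def random_distribution():
    """The 'opponent': a random famous family with random parameters."""
    family = rng.choice(["normal", "exponential", "uniform"])
    if family == "normal":
        mu, sd = rng.uniform(-5, 5), rng.uniform(0.5, 3)
        return lambda n: rng.normal(mu, sd, n)
    if family == "exponential":
        scale = rng.uniform(0.5, 5)
        return lambda n: rng.exponential(scale, n)
    lo = rng.uniform(-5, 5)
    hi = lo + rng.uniform(1, 10)
    return lambda n: rng.uniform(lo, hi, n)

def learn_then_pick(draws, learn_frac=0.3):
    """Toy strategy: observe the first ~30% of the draws, then take the first
    later draw that beats everything observed (else the final draw)."""
    n = len(draws)
    cutoff = max(1, int(learn_frac * n))
    best = draws[:cutoff].max()
    for x in draws[cutoff:-1]:
        if x > best:
            return x
    return draws[-1]

def average_score(strategy, N=50, trials=5_000):
    """Average the strategy's pick minus that trial's mean over many trials."""
    total = 0.0
    for _ in range(trials):
        draws = random_distribution()(N)
        total += strategy(draws) - draws.mean()
    return total / trials

print(average_score(learn_then_pick))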

But of course I do recognize that when I say unknown distribution I open Pandora's box, all hell can break loose, and it seriously makes no sense to try to estimate anything without any restriction. I implicitly assumed that in the problems where such a process may have value, it is because the data come from the usual distributions that people have worked with for decades, not something so complicated and contrived that it is hopeless. Yes, I realize this is not a proper mathematical formulation, but of course the objective is to make it so eventually!

So thanks all for posting; let's try to learn from this as much as we can.
02-27-2012 , 02:04 AM
Quote:
Originally Posted by Aaron W.
It's actually for P2 as well. As an example, suppose that you KNEW that the only possible mean for the distribution was mu = 10. Then it wouldn't matter that your estimated mean (based on the data) was 9.7. Or perhaps the actual mean was either 10 or 9.5. Then you would conclude that the mean was more likely to be 9.5 than 10.

All of this is to say that you need to say *SOMETHING* about the possible means in order to draw meaningful and non-arbitrary conclusions. It could be something like knowing it has a constant probability density over a certain interval, or a normal distribution with some mean for the means, or something.

You need that information if you're going to conclude anything about having an optimal process.
I think that's what the MLE does--assumes the mean is a number, and the SD is >= 0, but allows equal weight on each, and then determines what the most likely mean and SD are.

Well, I'm also assuming iid. But if you throw out the identical part of that, the question becomes impossible ("I'm going to pick a random number between 1 and 10 for the first N-1 numbers, and 100 for the Nth") so I think we have to be allowed to assume that.
02-27-2012 , 02:07 AM
Quote:
Originally Posted by masque de Z
But of course I do recognize that when I say unknown distribution I open Pandora's box, all hell can break loose, and it seriously makes no sense to try to estimate anything without any restriction. I implicitly assumed that in the problems where such a process may have value, it is because the data come from the usual distributions that people have worked with for decades, not something so complicated and contrived that it is hopeless. Yes, I realize this is not a proper mathematical formulation, but of course the objective is to make it so eventually!

So thanks all for posting; let's try to learn from this as much as we can.
I think you're looking for the word 'non-pathological'. It's how mathematicians handwave themselves into being able to act (mathematically) like physicists.
02-27-2012 , 02:27 AM
Quote:
Originally Posted by coffee_monster
I think that's what the MLE does--assumes the mean is a number, and the SD is >= 0, but allows equal weight on each, and then determines what the most likely mean and SD are.
I looked into this a little bit, and it appears that this is a pretty close explanation of what MLE does.

Given:
* A set of data
* A choice of probability distributions

MLE is a method to select the parameters of the distribution that maximize the probability that you will attain the given data from the given distribution. It appears that these calculations are very complex, and so it's not at all clear to me how fast they converge (with respect to the number of data points) and how large the errors could be.

I still don't see how you would ever prove anything about the optimality of the process without bringing in information about the distribution of distributions. But, again, this seems like a good start for building a heuristic.
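As a small concrete example (my own, numbers arbitrary): for the normal family the MLEs are actually closed-form, just the sample mean and the n-denominator sample SD, so the convergence question is really about how noisy those are for small samples; for other families the optimization can indeed get messy.

Code:
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(10.0, 3.0, 8)     # 8 draws from N(10, 3^2)

# Normal-family MLEs: sample mean and sigma-hat with the n denominator.
mu_hat, sigma_hat = data.mean(), data.std()
print(mu_hat, sigma_hat)            # quite noisy with only 8 points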

Quote:
Well, I'm also assuming iid. But if you throw out the identical part of that, the question becomes impossible ("I'm going to pick a random number between 1 and 10 for the first N-1 numbers, and 100 for the Nth") so I think we have to be allowed to assume that.
Yeah. I'm pretty sure that's necessary.
02-27-2012 , 07:50 AM
What do you mean by optimality of the process, Aaron? You can compare estimators in terms of loss functions and that's afaik how optimality is defined in estimation theory. I don't think we can do better than this.
