Quote:
Originally Posted by SenorKeeed
Real life math/physics/whatever question.
We measure the static coefficient of friction of paper as it comes off the paper machine. Let's say we cut five sets of samples across the sheet and measure the COF of each sample, getting five measures of the COF. I'm interested in quantifying the distribution that these samples are picked from. That is, let's say we have data like
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5
d1 d2 d3 d4 d5
Where a, b, c, d are paper manufactured at different times and a1, a2, ... a5 are samples that are tested yielding COF measurements. How can I estimate the standard deviation of the testing procedure?
Taking the standard deviation of the 20 individual measurements is wrong because the process itself will be varying. Can I take the standard deviation of a1, a2,...a5 and the standard deviation of b1, b2,...b5 etc and average the four standard deviations? That also seems wrong.
This is a survey sampling problem. The choice of estimator depends on the process by which you are sampling. You clearly have a two-stage sampling process: first you sample clusters (a, b, c, d), and then within each cluster you sample some portion of it. That leaves some questions about how you are sampling in each stage:
1) Is the first stage a simple random sample where every sheet of paper is equally likely to be sampled? Or is it some sort of systematic sample where you take a sheet at a fixed interval, such as every 10,000th sheet?
2) In the second stage, is the sheet of paper just being chopped up into 5 pieces so the whole thing is being used, or is it being chopped up into more pieces, of which only some are tested?
If you are taking every 10,000th sheet and then using all pieces of it, this is a systematic cluster sample (you are taking sheets systematically and then using the whole cluster). If you are taking sheets randomly, chopping each sheet into many pieces, and randomly selecting 5 of them, then it is a two-stage simple random sample. All of the potential sampling schemes seem pretty straightforward, so once you have yours fully defined, it shouldn't be too hard to find the right estimator. I don't know these off the top of my head, but I'm sure I have them somewhere in some old lecture notes or a sample survey textbook.
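In the meantime, here is a rough sketch of one common way to split within-cluster from between-cluster variation. It is a one-way random-effects ANOVA, not the survey estimator itself, and it assumes a balanced design (every cluster has the same number of samples) and that the samples within a cluster differ only by test noise. If COF varies with position across the sheet, that variation ends up in the within-cluster number too. The function name and the COF values are made up for illustration. Note that it averages the within-cluster variances rather than the standard deviations, which is what pooling amounts to when every cluster has the same size.

```python
import math
from statistics import mean, variance


def variance_components(groups):
    """Estimate within- and between-cluster standard deviations.

    Uses the one-way random-effects ANOVA estimators for a balanced
    design. `groups` is a list of equally sized lists of measurements,
    one inner list per cluster (e.g. per sheet).
    """
    n = len(groups[0])
    if len(groups) < 2 or n < 2:
        raise ValueError("need at least 2 clusters with at least 2 samples each")
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design required: equal samples per cluster")

    # Pooled within-cluster variance: with equal cluster sizes this is
    # simply the average of the per-cluster sample variances.
    ms_within = mean(variance(g) for g in groups)

    # Between-cluster mean square: n times the sample variance of the
    # cluster means.
    ms_between = n * variance([mean(g) for g in groups])

    # The expected value of ms_between is var_within + n * var_between,
    # so solve for var_between and clamp at zero if the estimate is negative.
    var_between = max(0.0, (ms_between - ms_within) / n)

    return math.sqrt(ms_within), math.sqrt(var_between)


# Made-up COF values, purely illustrative: 4 clusters of 5 samples each.
cof = [
    [0.41, 0.43, 0.40, 0.42, 0.44],
    [0.46, 0.45, 0.47, 0.44, 0.46],
    [0.39, 0.40, 0.41, 0.38, 0.40],
    [0.43, 0.44, 0.42, 0.45, 0.43],
]
sd_within, sd_between = variance_components(cof)
print(f"within-cluster SD:  {sd_within:.4f}")
print(f"between-cluster SD: {sd_between:.4f}")
```

The within-cluster SD is the candidate for "standard deviation of the testing procedure", and the between-cluster SD is the process drift between manufacturing times. Whether that interpretation holds depends on the answers to the two sampling questions above.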