Recreating Random Distributions without analysis

10-15-2020 , 09:47 AM
Say you observe certain random values, e.g. in an experiment, and want to re-create that kind of randomness in a simulation of said experiment. What you can do is first find out which distribution your observations follow (say normal, exponential, Poisson, etc.), and then compute the parameters of that distribution (like μ and σ for the normal distribution) that best fit your observations. Now you can generate random numbers for your simulation that behave very much like the real observations in the experiment.
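As a minimal sketch of this parametric approach in Python (standard library only, Python 3.8+), assuming the observations are a plain list of floats: fit a normal or an exponential distribution with the usual estimators, then draw synthetic values from the fitted distribution. The function names are illustrative, not from the thread.

```python
import random
import statistics

def resample_normal(observed, n, seed=None):
    # Estimate mu and sigma from the observations (sample mean and sample standard deviation).
    dist = statistics.NormalDist.from_samples(observed)
    # Draw n synthetic values from the fitted normal distribution.
    return dist, dist.samples(n, seed=seed)

def resample_exponential(observed, n, seed=None):
    # Maximum-likelihood estimate of the exponential rate: 1 / sample mean.
    rate = 1.0 / statistics.fmean(observed)
    rng = random.Random(seed)
    # Draw n synthetic values from the fitted exponential distribution.
    return rate, [rng.expovariate(rate) for _ in range(n)]
```

This only works as well as the chosen family fits the data, which is exactly the limitation raised in the next paragraph.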

There are tools that help you find such a distribution and its parametrization; so far so good. But what if you just can't get close enough that way? It's probably possible to use a combination of different distributions to get there, but I am wondering if there is a method which skips the analysis altogether and deals with the problem in sort of a numerical, brute-force manner.

That is, is there an algorithm which, given N observed values, can generate random values that are distributed (reasonably) similarly to the input values, whatever that distribution might be?

Last edited by Morphismus; 10-15-2020 at 10:12 AM.
10-15-2020 , 02:05 PM
Probably not the most elegant or efficient solution, but should give something close, especially if you have a fairly large experimental data set:

1. Determine the range of your experimental data.
2. Divide that range into some number of equal subintervals. For example, if the data range from 0 to 100, you might divide it into 0-5, 5-10, ..., 95-100.
3. Determine the frequency at which experimental data appear in each subinterval, e.g. 0.02 for 0-5, 0.04 for 5-10, etc.
4. Generate a random number x between 0 and 1. Use the cumulative frequencies from step 3 to determine which subinterval that number selects. For instance, 0 < x <= 0.02 would correspond to the 0-5 subinterval, 0.02 < x <= 0.06 to the 5-10 subinterval, etc.
5. Generate a random number for your simulated data set that falls in the subinterval selected in step 4. Repeat steps 4 and 5 until you have generated the desired number of random numbers.

By playing with the size of the subintervals, you should be able to approximate the experimental distribution to a reasonable level of accuracy. I have no idea how difficult this would be to implement in practice, but in theory it should work.
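The steps above can be sketched in Python roughly as follows. This is a minimal illustration assuming the data is a non-empty list of floats; `histogram_sampler` and its `n_bins` parameter are hypothetical names, and the bin lookup uses cumulative frequencies as in step 4.

```python
import bisect
import random
from itertools import accumulate

def histogram_sampler(data, n_bins=20, rng=None):
    """Return a function that draws values following the histogram of `data` (hypothetical helper)."""
    rng = rng or random.Random()
    # Step 1: range of the data.
    lo, hi = min(data), max(data)
    # Step 2: equal-width subintervals (fall back to width 1.0 if all values are identical).
    width = (hi - lo) / n_bins or 1.0
    # Step 3: count observations per subinterval; x == hi goes into the last one.
    counts = [0] * n_bins
    for x in data:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    # Cumulative frequencies, e.g. [0.02, 0.06, ...]; the last entry is exactly 1.0.
    cum = [c / len(data) for c in accumulate(counts)]

    def draw():
        # Step 4: a uniform u in [0, 1) selects the first bin whose cumulative
        # frequency exceeds u, so empty bins are never chosen.
        i = bisect.bisect_right(cum, rng.random())
        # Step 5: a uniform value inside the selected subinterval.
        left = lo + i * width
        return rng.uniform(left, left + width)

    return draw

rng = random.Random(0)
observed = [rng.expovariate(1.0) for _ in range(10_000)]
draw = histogram_sampler(observed, n_bins=50, rng=random.Random(1))
simulated = [draw() for _ in range(10_000)]
```

More bins follow the data more closely but need more observations per bin to keep the frequencies stable.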
10-15-2020 , 03:29 PM
Quote:
Originally Posted by stremba70
Probably not the most elegant or efficient solution, but should give something close, especially if you have a fairly large experimental data set:

1. Determine the range of your experimental data.
2. Divide that range into some number of equal subintervals. For example, if the data range from 0 to 100, you might divide it into 0-5, 5-10, ..., 95-100.
3. Determine the frequency at which experimental data appear in each subinterval, e.g. 0.02 for 0-5, 0.04 for 5-10, etc.
4. Generate a random number x between 0 and 1. Use the cumulative frequencies from step 3 to determine which subinterval that number selects. For instance, 0 < x <= 0.02 would correspond to the 0-5 subinterval, 0.02 < x <= 0.06 to the 5-10 subinterval, etc.
5. Generate a random number for your simulated data set that falls in the subinterval selected in step 4. Repeat steps 4 and 5 until you have generated the desired number of random numbers.

By playing with the size of the subintervals, you should be able to approximate the experimental distribution to a reasonable level of accuracy. I have no idea how difficult this would be to implement in practice, but in theory it should work.
Thanks; that's essentially based on the histogram, right? Yeah, I think that should work to a certain extent, although I'm a bit skeptical about the fixed bin widths, but I guess that could be tweaked. The thing is, I thought there was some standard statistical method for this, but I'm not able to find anything.
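One possible tweak for the fixed-bin-width concern, sketched here as an assumption rather than anything proposed in the thread: let the sorted observations themselves define the bins, so that each gap between neighboring sorted values carries equal probability, and interpolate linearly inside the chosen gap. This amounts to inverting a piecewise-linear version of the empirical CDF. `quantile_sampler` is a hypothetical name.

```python
import random

def quantile_sampler(data, rng=None):
    """Draw values by linearly interpolating between sorted observations (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    rng = rng or random.Random()

    def draw():
        # Map a uniform u in [0, 1) to a fractional position among the n sorted values.
        pos = rng.random() * (n - 1)
        i = int(pos)
        j = min(i + 1, n - 1)
        # Each gap between neighboring sorted values acts as an equal-probability "bin".
        return xs[i] + (pos - i) * (xs[j] - xs[i])

    return draw

rng = random.Random(0)
observed = [rng.expovariate(1.0) for _ in range(10_000)]
draw = quantile_sampler(observed, rng=random.Random(1))
simulated = [draw() for _ in range(10_000)]
```

Dense regions of the data automatically get narrow bins; like the histogram approach, it never produces values outside the observed range.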
10-16-2020 , 06:11 AM
How many data points are we talking about here? Dozens, hundreds, thousands, millions? Do you know the potential set of distribution functions the data could be coming from, or is it completely unknown? Do you start out knowing basically nothing? If you have many data points, can't you plot a histogram with a sufficiently small bin size and then do a numerical Fourier series fit of the resulting step function, or a polynomial fit, or a polynomial-times-exponential fit? Then use the resulting series as your probability density function (then integrate it, etc.) to properly simulate a new set of points.
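Assuming a density has already been fitted to the histogram (the fitting step itself is not shown), the "integrate etc." part can be done numerically: tabulate the density on a grid, integrate it with the trapezoid rule to get the CDF, and invert the CDF for uniform random numbers (inverse transform sampling). A minimal Python sketch; `make_inverse_cdf_sampler`, the grid size and the example density `fitted_pdf` (a polynomial times an exponential) are hypothetical placeholders, not a real fit.

```python
import bisect
import math
import random

def make_inverse_cdf_sampler(pdf, a, b, n_grid=1000, rng=None):
    """Sample from an unnormalized density `pdf` on [a, b] via a tabulated inverse CDF."""
    rng = rng or random.Random()
    xs = [a + (b - a) * k / n_grid for k in range(n_grid + 1)]
    # A polynomial or Fourier fit can dip below zero; clip those values.
    ys = [max(pdf(x), 0.0) for x in xs]
    # Trapezoid rule: cumulative integral of the density at each grid point.
    cdf = [0.0]
    for k in range(n_grid):
        cdf.append(cdf[-1] + 0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # normalize so the CDF ends at 1.0

    def draw():
        u = rng.random()
        # Find k with cdf[k] <= u < cdf[k + 1], then interpolate linearly inside that cell.
        k = bisect.bisect_right(cdf, u) - 1
        c0, c1 = cdf[k], cdf[k + 1]
        return xs[k] + (u - c0) / (c1 - c0) * (xs[k + 1] - xs[k])

    return draw

# Hypothetical fitted density: a polynomial times an exponential.
def fitted_pdf(x):
    return (1.0 + 0.5 * x) * math.exp(-x)

draw = make_inverse_cdf_sampler(fitted_pdf, 0.0, 20.0, n_grid=2000, rng=random.Random(0))
simulated = [draw() for _ in range(10_000)]
```

The upper limit `b` has to be chosen large enough that the density is negligible beyond it; the sampler never returns values outside [a, b].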

Last edited by masque de Z; 10-16-2020 at 06:19 AM.
10-16-2020 , 08:48 AM
Quote:
Originally Posted by masque de Z
How many data points are we talking about here? Dozens, hundreds, thousands, millions?
thousands to millions (network requests)

Quote:
Originally Posted by masque de Z
Do you know the potential set of distribution functions the data could be coming from, or is it completely unknown? Do you start out knowing basically nothing?
It's very close to an exponential distribution, but not quite; it tends to produce more small values.

Quote:
Originally Posted by masque de Z
If you have many data points, can't you plot a histogram with a sufficiently small bin size and then do a numerical Fourier series fit of the resulting step function, or a polynomial fit, or a polynomial-times-exponential fit? Then use the resulting series as your probability density function (then integrate it, etc.) to properly simulate a new set of points.
Isn't that essentially what stremba suggested? The thing is, I already came up with something similar to what you suggested, and it seems to work, but it feels like someone must have come up with this already, as it looks somewhat fundamental. While I have a mathematics degree, statistics & probability were never my strong suit, so I hoped the statistics-savvy 2+2ers here might recognize some standard solution to this problem. Maybe it's just an application not many people need?
10-17-2020 , 02:04 PM
What are you going to be using this random generated stuff for?
10-17-2020 , 03:12 PM
Basically to create variations of previous real situations and then test software against them. Say the software has to handle events that occur essentially randomly, and I have data on those events from previous field usage; I can then create synthetic events in a simulation to test, e.g., how the software handles n times the load with the same distribution, how it scales, etc.
10-17-2020 , 06:56 PM
Quote:
Originally Posted by Morphismus
Basically to create variations of previous real situations and then test software against them. Say the software has to handle events that occur essentially randomly, and I have data on those events from previous field usage; I can then create synthetic events in a simulation to test, e.g., how the software handles n times the load with the same distribution, how it scales, etc.
Ok, cool. My kid tries to break things at work too.
10-22-2020 , 07:05 PM
Why not just use the existing data set as your distribution, as if you were drawing a bootstrap sample?

Or do that plus add some amount of random variation. It can be normally distributed noise (pick an appropriate standard deviation), and you'll still have approximately the same overall distribution as long as the noise is small.

https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
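A minimal sketch of both variants in Python, assuming the observations are a list of floats: plain resampling with replacement via `random.choices`, and the same with added normal noise (sometimes called a smoothed bootstrap). The helper name `bootstrap_sample` and its `noise_sd` parameter are hypothetical; choosing the noise level is left to the user, as in the post.

```python
import random

def bootstrap_sample(observed, n, noise_sd=0.0, rng=None):
    """Resample `observed` with replacement, optionally jittering each value (hypothetical helper)."""
    rng = rng or random.Random()
    # Draw n values uniformly at random, with replacement, from the observed data.
    picks = rng.choices(observed, k=n)
    if noise_sd > 0:
        # Add independent normal noise so values not seen before can also occur.
        picks = [x + rng.gauss(0.0, noise_sd) for x in picks]
    return picks

rng = random.Random(0)
observed = [rng.expovariate(1.0) for _ in range(10_000)]
plain = bootstrap_sample(observed, 10_000, rng=random.Random(1))
jittered = bootstrap_sample(observed, 10_000, noise_sd=0.05, rng=random.Random(2))
```

The noise widens the distribution slightly (its variance grows by `noise_sd**2`) and can push values below zero for strictly positive data such as request times, so a `noise_sd` that is small relative to the spread of the data is the safer choice.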
10-24-2020 , 05:26 AM
Quote:
Originally Posted by Aaron W.
Why not just use the existing data set as your distribution, as if you were drawing a bootstrap sample?

Or do that plus add some amount of random variation. It can be normally distributed noise (pick an appropriate standard deviation), and you'll still have approximately the same overall distribution as long as the noise is small.

https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
Thanks, that is interesting! I wasn't even aware of resampling...