Uniform sampling
Hi,

My goal is to write a C function that performs uniform sampling. I have a big file with a lot of data, and I'd like to take just some samples from it according to a uniform distribution.

E.g. file1: a b c d e f g h i l m n o p q

One way would be to flip a coin for each letter and select every letter for which the coin shows heads. I have used the srand() function, but the result is not uniform. Can anyone suggest a way to start?

Thanks,
D

---------- Post updated at 08:00 PM ---------- Previous update was at 06:30 PM ----------

Hi, I'm trying it this way:

Code:
```c
......
#define RAND_MAX 2
srand(time(NULL));
.......
/* in the loop */
p = rand() % RAND_MAX;
```
If you have a different suggestion, let me know.

Thanks,
D.
|
|||||
|
Depending on what you need, the srand()/rand() PRNG combination works well. For a larger number of samples, though, its output can become predictable and start to cluster. More advanced generators include Blum Blum Shub and Fortuna. If you need truly random numbers, you might try a service like random.org or build your own hardware random number generator.
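
To make the coin-flip idea concrete, here is a minimal sketch rather than a definitive implementation. It assumes the data is already in memory as an array, and `keep_sample` and `sample_bernoulli` are hypothetical names that do not come from the thread. `RAND_MAX` is already defined by `<stdlib.h>`, so the sketch leaves it alone and scales `rand()` into [0, 1) instead of taking `rand() % 2`, because the low-order bits of some older `rand()` implementations are known to be weak.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 with probability p, 0 otherwise.
 * Assumes srand() was called once at program start. */
int keep_sample(double p)
{
    assert(p >= 0.0 && p <= 1.0);
    /* rand() / (RAND_MAX + 1.0) lies in [0, 1). */
    double u = rand() / (RAND_MAX + 1.0);
    return u < p;
}

/* Hypothetical helper: copies the selected elements of in[0..n-1]
 * into out and returns how many were kept. out must hold n items. */
size_t sample_bernoulli(const char *in, size_t n, double p, char *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (keep_sample(p))
            out[kept++] = in[i];
    }
    return kept;
}
```

With `p = 0.5` this is the per-letter coin flip from the question; a smaller `p` keeps a proportionally smaller share of the data. Call `srand(time(NULL))` once at startup, not inside the loop.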
|
|
||||
|
FWIW -
In terms of statistical sampling practice, if you need a mean and standard deviation, what you are doing is really overkill: it results in a sample size of about 50% of the data. You might just as well compute the mean, standard deviation, ANOVA, or whatever on the whole file. For example, a statistically significant (95% confidence) sample size for the population of the US, as used in polling, is ~1526 persons taken out of 300 million using systematic sampling methods. What you are doing is sort of systematic sampling, yes, but the intent of sampling is not to look at almost everything.
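
If the aim is a small sample of fixed size rather than roughly half the file, one common technique is reservoir sampling (Algorithm R). The sketch below is an assumption about what might fit here, not something proposed in the thread. `rand_below` and `reservoir_sample` are hypothetical names, and an in-memory array stands in for the file. It also assumes the item count does not exceed `RAND_MAX + 1`, which can be as small as 32768 on some platforms.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: uniform integer in [0, n), for 0 < n <= RAND_MAX + 1.
 * Rejects the top of rand()'s range to avoid modulo bias. */
static unsigned long rand_below(unsigned long n)
{
    unsigned long range = (unsigned long)RAND_MAX + 1UL;
    assert(n > 0 && n <= range);
    unsigned long limit = range - range % n;
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);
    return r % n;
}

/* Hypothetical helper: fills reservoir[0..k-1] with a uniform random
 * sample of k items from in[0..n-1]; returns the number stored. */
size_t reservoir_sample(const char *in, size_t n, char *reservoir, size_t k)
{
    size_t i;
    /* The first k items fill the reservoir directly. */
    for (i = 0; i < n && i < k; i++)
        reservoir[i] = in[i];
    /* Item i then replaces a random slot with probability k / (i + 1). */
    for (; i < n; i++) {
        size_t j = rand_below(i + 1);
        if (j < k)
            reservoir[j] = in[i];
    }
    return n < k ? n : k;
}
```

Because each item is looked at once and in order, the same loop could read items from a file one at a time without knowing the total count in advance.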
|
||||
|
OK,
I'll look more deeply into different solutions. Thanks, D.