How to select lines randomly without replacement in UNIX?


 
# 1  
Old 09-13-2016

Dear Folks

I have a single-column file of 15000 lines and want to randomly select 5000 of them, five different times, without replacement. I am aware that the commands 'shuf' and 'sort -R' can select lines randomly, but I am not sure how to avoid selecting a line that has already been selected. Does anyone have a suggestion?
# 2  
Old 09-13-2016
We can't read your mind, so please provide a sample of the input and the desired output...
# 3  
Old 09-13-2016
Say, I have this small input file:
Code:
1
2
3
4
5
6
7
8
9
10

My desired output is a selection of three numbers, with different numbers each time.
Code:
2
4
9

Code:
5
8
1

# 4  
Old 09-13-2016
The Linux shuf utility (part of GNU coreutils) can select random lines without repeats. Run it once and read its output three lines at a time in a loop.

Code:
shuf < inputfile | while read -r LINE1 && read -r LINE2 && read -r LINE3
do
...
done
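
As a minimal sketch of how that loop might be filled in, the following uses the 10-line sample from post #3. The file names inputfile and groups.txt and the echo in the loop body are assumptions for illustration, not part of the original suggestion.

```shell
# Recreate the 10-line sample input (file names here are illustrative).
seq 1 10 > inputfile

# Shuffle once, then read the shuffled stream three lines at a time.
# Each input line occurs exactly once in shuf's output, so no line
# can land in two different groups.
shuf < inputfile | while read -r LINE1 && read -r LINE2 && read -r LINE3
do
	echo "$LINE1 $LINE2 $LINE3"
done > groups.txt

cat groups.txt
```

With 10 input lines this yields three groups of three. The tenth line is dropped, because the loop stops as soon as one of the three reads hits end of input.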

# 5  
Old 09-13-2016
Thank you, Corona688, for your suggestion. I only presented a small example. In my case, I want to randomly choose 5000 lines out of 15000. What should I do in this situation?
# 6  
Old 09-13-2016
Quote:
Originally Posted by sajmar
... In my case, I want to randomly choose 5000 lines out of 15000. What should I do in this situation?
Try to come up with a smart hash function that'd let you choose a different set of lines in each iteration...
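
For comparison, here is a sketch that does not use a hash function at all: it simply calls shuf once per iteration with its -n option. It assumes that "five different times" means five independent draws. Since 15000 / 5000 = 3, at most three 5000-line samples can be completely disjoint, so five draws will necessarily share some lines with each other. The file names inputfile and draw.1 to draw.5 are hypothetical.

```shell
# Stand-in for the real data: 15000 distinct lines (file names are illustrative).
seq 1 15000 > inputfile

# Five independent draws of 5000 lines each.
# Within one draw no line is picked twice; different draws may overlap.
for i in 1 2 3 4 5
do
	shuf -n 5000 inputfile > "draw.$i"
done

wc -l draw.1 draw.2 draw.3 draw.4 draw.5
```

If the samples must not overlap at all, a single shuffle cut into pieces is the better fit, at the cost of being limited to three samples of this size.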
# 7  
Old 09-13-2016
Quote:
Originally Posted by sajmar
Thank you, Corona688, for your suggestion. I only presented a small example. In my case, I want to randomly choose 5000 lines out of 15000. What should I do in this situation?
Code:
shuf < inputfile | split -l 5000 - sample.

For a 15000-line input, this creates three files, sample.aa, sample.ab and sample.ac, each holding 5000 randomly chosen lines, with no input line appearing in more than one file.

I've used shuf on megabytes of data for the purpose of selecting random samples before. It operates by seeking and should be reasonably efficient.
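
A small sketch for checking the result, assuming GNU shuf and split and using seq output as a stand-in for the real data (the file name inputfile is illustrative). Note the explicit "-" operand: split treats a single operand as the input file name, so "-" is what makes it read standard input and take "sample." as the output prefix.

```shell
# Stand-in for the real data: 15000 distinct lines (file names are illustrative).
seq 1 15000 > inputfile

# Shuffle once and cut the result into 5000-line pieces.
# The "-" tells split to read standard input; "sample." is the output prefix.
shuf < inputfile | split -l 5000 - sample.

# Each piece should hold 5000 lines.
wc -l sample.aa sample.ab sample.ac

# Lines appearing more than once across all pieces; expected to print nothing
# here, because the stand-in input has no duplicate lines of its own.
sort sample.aa sample.ab sample.ac | uniq -d
```

If the real input contains lines with identical text, the uniq -d check will report them even though each input line was still used only once.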

Last edited by Corona688; 09-13-2016 at 03:36 PM..