How to select lines randomly without replacement in UNIX?


# 1  

Dear Folks

I have a single-column file of 15000 lines and want to randomly select 5000 of them, five different times, without replacement. I am aware that 'shuf' and 'sort -R' can select lines randomly, but I am not sure how to avoid selecting the same line more than once. Does anyone have a suggestion?
# 2  
We can't read your mind, so please provide a sample of the input and the desired output...
# 3  
Say, I have this small input file:
Code:
1
2
3
4
5
6
7
8
9
10

My desired output is three numbers, with a different selection each time.
Code:
2
4
9

Code:
5
8
1

# 4  
The Linux shuf utility can select random lines without repeats. Run it once and read three lines at a time from its output in a loop.

Code:
shuf < inputfile | while read LINE1 && read LINE2 && read LINE3
do
...
done
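For anyone who wants to try the loop above as-is, here is a minimal runnable sketch. The temporary directory, the file names and the printf loop body are my own placeholders (the original leaves the body open), and read -r is added only so backslashes in the data are not mangled.

```shell
# Build a 10-line input like the small example in post #3 (hypothetical path).
workdir=$(mktemp -d)
seq 1 10 > "$workdir/inputfile"

# Shuffle once, then consume the shuffled stream three lines at a time.
# Every input line appears exactly once in shuf's output, so no line can
# end up in two groups. Leftover lines that cannot fill a whole group are
# silently dropped when one of the reads hits end of input.
shuf < "$workdir/inputfile" | while read -r LINE1 && read -r LINE2 && read -r LINE3
do
    printf '%s %s %s\n' "$LINE1" "$LINE2" "$LINE3"
done > "$workdir/groups.txt"

cat "$workdir/groups.txt"
```

With 10 input lines this prints three groups of three; the tenth line is left over and discarded.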

# 5  
Thank you Corona688 for your suggestion. I only presented a small example. In my case, I want to randomly choose 5000 lines out of 15000 lines. What should I do in this situation?
# 6  
Quote:
Originally Posted by sajmar
... In my case, I want to randomly choose 5000 lines out of 15000 lines. What should I do in this situation?
Try to come up with a smart hash function that'd let you choose a different set of lines in each iteration...
# 7  
Quote:
Originally Posted by sajmar
Thank you Corona688 for your suggestion. I only presented a small example. In my case, I want to randomly choose 5000 lines out of 15000 lines. What should I do in this situation?
Code:
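shuf < inputfile | split -l 5000 - sample.

The "-" makes split read the shuffled lines from standard input; without it, split would treat "sample." as the name of the input file. For the 15000-line input this creates three files, sample.aa, sample.ab and sample.ac, each holding 5000 randomly chosen lines, and no input line lands in more than one file.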
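A quick way to convince yourself that the pieces really do not overlap is sketched below. The input is a made-up stand-in (the numbers 1 to 15000 from seq), and the temporary directory is only there to keep the sample files out of the way.

```shell
# Hypothetical stand-in for the real data: 15000 distinct lines.
workdir=$(mktemp -d)
seq 1 15000 > "$workdir/inputfile"

# Shuffle once, then cut the shuffled stream into 5000-line pieces.
# "-" means "read standard input"; the last argument is the output prefix.
shuf < "$workdir/inputfile" | split -l 5000 - "$workdir/sample."

# Each piece should hold 5000 lines...
wc -l "$workdir"/sample.a?

# ...and together they should reproduce the input exactly once each.
cat "$workdir"/sample.a? | sort -n | cmp - "$workdir/inputfile" \
    && echo "every line was selected exactly once"
```

The cmp check works here only because the stand-in input is already in numeric order; for real data you would compare against a sorted copy of the input instead.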

I've used shuf to select random samples from megabytes of data before, and it has been reasonably efficient.
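One caveat on the "five different times" in the original question: 15000 lines can only be cut into three disjoint sets of 5000. If what is wanted is five independent draws, each without repeats inside the draw but allowed to overlap with the other draws, something like the following might do. That reading of the question is an assumption on my part, and the draw.N file names are made up.

```shell
# Hypothetical stand-in for the real data: 15000 distinct lines.
workdir=$(mktemp -d)
seq 1 15000 > "$workdir/inputfile"

for i in 1 2 3 4 5
do
    # -n 5000 prints at most 5000 lines of the shuffled input, so a line
    # cannot appear twice within one draw (it may appear in other draws).
    shuf -n 5000 "$workdir/inputfile" > "$workdir/draw.$i"
done

wc -l "$workdir"/draw.?
```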

Last edited by Corona688; 09-13-2016 at 04:36 PM.