How to select lines randomly without replacement in UNIX? Post 302981572 by Don Cragun on Wednesday 14th of September 2016 11:25:12 PM
Quote:
Originally Posted by sajmar
Thanks to all the folks for their suggestions. I still have not met my requirement. As I said, I have a file with 15000 lines and I want to select 5000 lines five times. However, each of these five times, I want a different set of 5000 selected lines. In other words, I am looking for five different sets of 5000 randomly selected lines from the whole set of 15000.
Actually, your specification has never been clear. First, you wanted two 3-line output files from a 10-line input file with no duplicates in either of the output files. Then you wanted a single 5000-line file from a 15000-line file. Then you wanted three 5000-line output files from a 15000-line input file. And now you want five 5000-line output files from a 15000-line input file. How do you randomly select 25000 lines from a 15000-line file without replacement?

If you mean that you want five 5000-line files, each of which has lines drawn from the 15000-line file with no line selected more than once within any one of the five output files, why doesn't:
Code:
shuf < 15000LineFile | head -n 5000 > 5000LineFile

give you what you want (or to get 5 output files):
Code:
for i in 1 2 3 4 5
do	shuf < 15000LineFile | head -n 5000 > 5000LineFile$i
done

And, of course, Corona688's suggestion would have given you three 5000-line files with no duplicates from your 15000-line file, and a second run would give you three more 5000-line files to choose from...
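
For comparison, here is a minimal sketch of the shuffle-once-and-split idea, which yields three output files that share no lines at all. It is not necessarily what Corona688 posted. It assumes GNU shuf is available, the seq line only builds a stand-in input file, and the disjoint. prefix is a made-up name:

```shell
# Stand-in input for illustration only: 15000 distinct lines.
seq 1 15000 > 15000LineFile

# Shuffle the whole file once, then cut the shuffled stream into
# consecutive 5000-line pieces. split names them disjoint.aa,
# disjoint.ab and disjoint.ac, and no input line lands in two pieces.
shuf 15000LineFile | split -l 5000 - disjoint.

# Show the line count of each piece.
wc -l disjoint.a?
```

Running it a second time reshuffles the input and overwrites the three pieces with a new partition, so use a different prefix if you want to keep both runs.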

But, of course, all of these assume that there are no duplicated lines in 15000LineFile (or, if there are, that you don't mind a line that appears N times in your input file appearing up to N times in an output file). Is there a chance of duplicated lines in your input file? If so, do those duplicates have to be removed before creating the output files?
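
One way to answer the duplicate question yourself is sketched below, under the assumption that "duplicate" means byte-for-byte identical lines. The file names InputWithDups and SampleNoDups and the tiny five-line input are stand-ins for illustration, and shuf is again assumed to be the GNU utility:

```shell
# Stand-in input with one deliberately duplicated line ("beta").
printf '%s\n' alpha beta gamma beta delta > InputWithDups

# List every line that occurs more than once.
# uniq -d only compares adjacent lines, so the input is sorted first.
sort InputWithDups | uniq -d

# If duplicates have to go, remove them before sampling:
# sort -u keeps exactly one copy of each distinct line.
sort -u InputWithDups | shuf | head -n 3 > SampleNoDups
```

If the first pipeline prints nothing, the input has no duplicated lines and the shuf commands above can be used as they are.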

If we had a clearer specification of how lines in one output file are related to lines in the other output files, and of whether there could be duplicated lines in the input file (and, if so, how they are to be handled), all of the output files could be created by a single invocation of awk.
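
As a rough sketch of what such a single awk invocation might look like, suppose the requirement is five independent samples of 5000 lines each, with no line repeated inside any one output file but with overlap between output files allowed. That reading is an assumption, not something the specification settles. The output names sample1 through sample5, the variables files and n, and the seq stand-in input are all made up for illustration:

```shell
# Stand-in input for illustration only: 15000 distinct lines.
seq 1 15000 > 15000LineFile

awk -v files=5 -v n=5000 '
BEGIN { srand() }            # seed from the time of day
{ line[NR] = $0 }            # hold the whole input in memory
END {
    if (n > NR) n = NR       # cannot draw more lines than exist
    for (f = 1; f <= files; f++) {
        out = "sample" f
        # Partial Fisher-Yates shuffle: swap position i with a random
        # position in i..NR, then emit it. The first n positions form a
        # sample without replacement for this output file.
        for (i = 1; i <= n; i++) {
            j = i + int(rand() * (NR - i + 1))
            tmp = line[i]; line[i] = line[j]; line[j] = tmp
            print line[i] > out
        }
        close(out)
    }
}' 15000LineFile
```

Because srand() is seeded from the clock, two runs within the same second can produce identical samples. If the output files must instead be disjoint from one another, the inner loop would have to continue from where the previous file stopped rather than restart at position 1, which caps the total at 15000 lines.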

Knowing what operating system and shell you're using would also help us choose among several possible script suggestions.
 
