| UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !! |
#1
How to randomly select lines from a text file
I have a text file with 1000 lines, I want to randomly select 200 lines from it and print them as output. How do I go about doing that? Thanks!
#2
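Your time would be better spent learning awk than asking dozens of questions which could be trivially answered in it. Code:
awk 'NR==FNR { B++; next }   # first pass: count the lines
(NR != FNR) && (!Z) {        # start of second pass: choose the line numbers once
  srand();
  if(B < 200) exit(1); # Too few lines
  for(N=1; N<=200; )
  {
    V=sprintf("%d", (rand()*B)+1)+0;   # random integer in 1..B
    if(!(V in A)) { A[V]=1; N++ }      # count only line numbers not already chosen
  }
  Z=1
}
# Print the chosen lines. FNR restarts at 1 on the second pass; NR does not.
FNR in A' inputfile inputfile
Last edited by Corona688; 10-25-2012 at 11:34 AM. Reason: Fixed typos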
#3
For the fun of it, here's another way that does not use awk, although the awk version will be more efficient. This one has the overhead of creating a new pipeline on every iteration, which is best avoided. Also, I believe the ksh RANDOM built-in has a limit of 32767, which must be considered if the file is large. Code:
$ cat x
#!/bin/ksh
##
## x nbr_of_lines_wanted filename
##
iterations=$1
file="$2"
# Number of lines in the file.
lines_avail=$(wc -l < "$file")
while (( iterations > 0 )); do
  # Pick a line number in 1..lines_avail and print that line.
  # Lines are drawn independently, so the same line can appear more than once.
  head -n $(( RANDOM % lines_avail + 1 )) "$file" | tail -n 1
  (( iterations = iterations - 1 ))
done
exit 0
This is actually a good example of how a seemingly simple solution for a small file can end up burning you on performance and system limitations should you need to run it on a much larger file, or on a system that may see increased load in the future. Typically, when you see a long command line or pipeline like this being run a large number of times (especially a user-enterable number of times), it should be a red flag that there is most likely a more efficient way of structuring the program.
Last edited by gary_w; 10-23-2012 at 04:11 PM.
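The 32767 limit mentioned above can be worked around by combining two RANDOM draws into one 30-bit value. What follows is only a sketch, assuming a shell such as bash or ksh93 where RANDOM yields 0..32767; the function name `random_line` is made up for this illustration, and the small modulo bias is ignored.

```shell
# random_line FILE: print one randomly chosen line of FILE.
# Works for files with more than 32767 lines because two 15-bit RANDOM
# draws are combined into a single value in 0..1073741823.
random_line() {
    file=$1
    # Wrapping wc in $(( )) strips any padding some wc versions add.
    total=$(( $(wc -l < "$file") ))
    [ "$total" -gt 0 ] || return 1
    # Map the 30-bit value onto 1..total (slightly biased unless total
    # divides 2^30 evenly, which is usually acceptable for casual use).
    target=$(( ((RANDOM << 15) | RANDOM) % total + 1 ))
    # Print only that line, then stop reading the file.
    sed -n "${target}{p;q;}" "$file"
}
```

Called as `random_line bigfile`, it still draws with replacement when used in a loop, so it shares that property with the script above.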
#4
Hi. There are a number of commonly-available utilities to do this. Here is a demonstration of two: Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate random selection of lines with rl, shuf.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C rl shuf
pl " Lines selected from:"
cat data0
# Prepare data file from single line of words.
tr ' ' '\n' < data0 > data1
pl " Results from shuf, 1:"
shuf -n 3 data1
pl " Results from shuf, 2:"
shuf -n 3 data1
pl " Results from rl, 1:"
rl -c 3 data1
pl " Results from rl, 2:"
rl -c 3 data1
exit 0
producing: Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
rl 0.2.7
shuf (GNU coreutils) 6.10

-----
 Lines selected from:
foo bar baz qux quux corge grault garble warg fred plugh xyzzy thud

-----
 Results from shuf, 1:
quux
thud
qux

-----
 Results from shuf, 2:
thud
xyzzy
quux

-----
 Results from rl, 1:
corge
qux
grault

-----
 Results from rl, 2:
thud
corge
warg
You may need to install these from your distribution repository. See man pages for details. See also Algorithm::Numerical::Sample (search.cpan.org) if a Perl module is desirable.
Best wishes ... cheers, drl
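Applied to the original question (200 lines out of 1000), the shuf approach could look roughly like this. It is a sketch that assumes GNU coreutils `shuf` is installed; the function names `sample_lines` and `sample_lines_in_order` are placeholders, not part of any tool.

```shell
# sample_lines COUNT FILE: print COUNT lines of FILE in random order.
# shuf -n samples without replacement, so no line position is picked twice.
sample_lines() {
    shuf -n "$1" -- "$2"
}

# Same kind of selection, but printed in the order the lines appear in FILE:
# number the lines, sample, sort by line number, then strip the numbers.
sample_lines_in_order() {
    awk '{ print NR "\t" $0 }' "$2" | shuf -n "$1" | sort -n | cut -f2-
}
```

For example, `sample_lines 200 inputfile` prints 200 lines in shuffled order, while `sample_lines_in_order 200 inputfile` keeps them in file order.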
#5
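In the order of lines in the file, without all lines in memory: Code:
awk '
  NR==FNR { next }               # first pass: just read through to count lines
  FNR==1{                        # first line of the second pass
    srand()
    n=NR-1                       # number of lines in the file
    # Pick 200 distinct line numbers (this loops forever if n < 200).
    for(i=1; i<=200; i++) {
      line=0
      while(!line || line in A) line=int(rand()*n)+1
      A[line]
    }
  }
  FNR in A
' infile infile
In the order of the selection, with all lines in the file in memory: Code:
awk '
  { R[NR]=$0 }                   # keep every line, indexed by line number
  END{
    srand()
    n=NR
    # Pick 200 distinct line numbers (this loops forever if n < 200).
    for(i=1; i<=200; i++) {
      line=0
      while(!line || line in A) line=int(rand()*n)+1
      A[line]
      print R[line]
    }
  }
' infile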
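If the input arrives on a pipe and cannot be read twice, one single-pass alternative is reservoir sampling (Algorithm R). This is a different technique from the two scripts above: it keeps only k lines in memory and does not need the line count in advance. A sketch, with the function name `reservoir_sample` and the variable `k` chosen just for illustration:

```shell
# reservoir_sample K: read lines from standard input and print a uniform
# random sample of K of them (or every line, if fewer than K arrive).
reservoir_sample() {
    awk -v k="$1" '
    BEGIN { srand() }
    NR <= k { R[NR] = $0; next }      # fill the reservoir with the first k lines
    {
        j = int(rand() * NR) + 1      # uniform integer in 1..NR
        if (j <= k) R[j] = $0         # replace a slot with probability k/NR
    }
    END {
        m = (NR < k) ? NR : k         # short input: print what we have
        for (i = 1; i <= m; i++) print R[i]
    }'
}
```

Usage would be along the lines of `some_command | reservoir_sample 200`. The output is a uniform sample, but its order is neither the file order nor a full shuffle.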
#6
Code:
cat file | head -n 500 | tail -n 200
A very simple idea, but not for random lines: it always prints lines 301 through 500.
#7
Quote:
Probably no need to fix it since Scrutinizer's first example in post #5 is a correct implementation of the same approach.
Regards, Alister
---------- Post updated at 06:28 PM ---------- Previous update was at 06:26 PM ----------
With regard to all of the AWK suggestions: without knowing exactly how the script is to be used, it's possible that all of the recommendations are inadequate. Nearly every awk srand implementation's default seed is the number of seconds since the epoch, so successive or simultaneous runs within the same second could yield identical results. That may or may not be an issue; we don't have sufficient information to make that determination. Just a heads-up for the OP. If it is an issue, more information would be required to determine a robust seed expression.
Regards, Alister
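One possible way around the identical-seed problem described above is to read a few random bytes in the shell and hand them to awk as the seed. This is only a sketch, and it assumes `/dev/urandom` and `od` are available (common on Linux and BSD systems, but nothing in this thread guarantees it); the function name `seeded_sample` is hypothetical.

```shell
# seeded_sample COUNT FILE: print COUNT distinct lines of FILE in file order,
# seeding awk from /dev/urandom instead of the clock.
seeded_sample() {
    # 4 random bytes as an unsigned decimal; tr drops the padding od adds.
    seed=$(od -An -N4 -tu4 /dev/urandom | tr -dc '0-9')
    awk -v seed="$seed" -v want="$1" '
    NR == FNR { next }                 # first pass: only count lines
    FNR == 1 {
        srand(seed)
        n = NR - 1                     # number of lines in the file
        if (n < want) exit 1           # too few lines to sample from
        for (i = 1; i <= want; i++) {
            # Redraw until we hit a line number not chosen yet.
            do { line = int(rand() * n) + 1 } while (line in A)
            A[line]
        }
    }
    FNR in A
    ' "$2" "$2"
}
```

With this, two runs started in the same second (for example `seeded_sample 200 infile` launched twice from a script) would normally get different seeds, at the cost of depending on a random device.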