The UNIX and Linux Forums  

#1  05-08-2008
sashankkrk (Registered User)
How to read and write a random row from a file?

Let's say I have a file abc.txt with about 35 million rows. I would like to take a sample of 100 random rows from that file for testing and write them to another file, say test.txt.

How do I do this operation?

Thanks,
Sashank

#2  05-12-2008
Smiling Dragon (Forum Advisor)
If you need it to be truly random, you'll need a suitable random number generator.

Once you have 100 random (or predefined, if that's all you need) numbers, normalise them against the number of lines in your file: I'd suggest you select 100 numbers between 0 and 1, multiply each by the number of lines in the file, and round the result to a whole line number between 1 and the line count. That will give you a list of 100 line numbers.
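One way to produce such a list is to let awk do the random selection. This is only a sketch under a few assumptions: the function name random_line_numbers and the file name numbers.txt are made up for illustration, awk's srand() seeds from the clock (so two runs within the same second return the same numbers), and the resolution of rand() varies between awk implementations, which is worth checking against a 35 million line file.

```shell
#!/bin/sh
# random_line_numbers COUNT TOTAL   (hypothetical helper, not from the thread)
# Prints COUNT distinct random line numbers between 1 and TOTAL, in
# ascending order, one per line.
random_line_numbers() {
  awk -v count="$1" -v total="$2" 'BEGIN {
    srand()                            # seed from the time of day
    if (count > total) count = total   # cannot pick more distinct lines than exist
    while (picked < count) {
      n = int(rand() * total) + 1      # rand() is in [0,1), so n is in 1..total
      if (!(n in seen)) {              # skip numbers we already picked
        seen[n] = 1
        picked++
        print n
      }
    }
  }' | sort -n
}

# Example (file names are assumptions):
#   random_line_numbers 100 "$(wc -l < abc.txt)" > numbers.txt
```

Where GNU coreutils is installed, shuf -n 100 abc.txt > test.txt takes the whole sample in one step, so the line-number list is only needed on systems without it.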
Code:
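#!/bin/sh
# Set LINE_NUMBERS to a space-separated list of your line numbers.
# (The name LINES is avoided: some shells use it for the terminal height.)
INPUT="abc.txt"
OUTPUT="test.txt"
rm -f "$OUTPUT"
for line in $LINE_NUMBERS
do
  # Print the first $line lines, then keep only the last of them
  head -n "$line" "$INPUT" | tail -n 1 >> "$OUTPUT"
done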
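It's clumsy and won't be very quick (every iteration re-reads the file from the top, so lines further into the file take longer), but it'll do the job.

A better solution would be to seek into the file the correct distance and just dd out the required data. That requires knowing the byte offset of each row, for example because all rows have the same length.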
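A minimal sketch of the seek-and-dd idea, assuming (this is not stated in the thread) that every row is exactly the same number of bytes long, newline included. The function name fixed_record is made up for illustration. With variable-length rows you would first need an index of byte offsets, which this does not build.

```shell
#!/bin/sh
# fixed_record FILE RECLEN N   (hypothetical helper, not from the thread)
# Prints row N (counting from 1) of FILE, assuming every row occupies
# exactly RECLEN bytes including its trailing newline.
fixed_record() {
  # skip= jumps over N-1 blocks of RECLEN bytes, count=1 copies one block;
  # dd's transfer statistics on stderr are discarded.
  dd if="$1" bs="$2" skip="$(( $3 - 1 ))" count=1 2>/dev/null
}

# Example (file name and row length are assumptions):
#   fixed_record abc.txt 80 12345 >> test.txt
```

Because dd can seek on a regular file, the cost per row no longer depends on how far into the file the row sits.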

#3  05-13-2008
frozentin (Registered User)
Why not use sed in place of the head/tail?

sed -n "${line}p" filename >> "$OUTPUT"

#4  05-13-2008
Smiling Dragon (Forum Advisor)
Didn't know sed could do that... Handy! I wonder which one uses more cycles, as both would have to re-read the file for every line... I suspect yours will be quicker.

#5  05-13-2008
frozentin (Registered User)
This was something I picked up from these forums a couple of days back. Didn't know I would be passing on this "knowledge" to others so soon.

About the cycles: yes, I think so.

#6  05-13-2008
Perderabo (Forum Staff)
Much faster use of sed...
Code:
sed -n "${line}{p;q;}" filename
Upon reaching the desired line, sed will print it and then immediately exit. It will not continue uselessly reading the rest of the file.
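Put together with the earlier loop, the early-exit form might look like the sketch below. The function names print_line and extract_lines are made up for illustration. It still starts one sed per line number, so the file is opened once per sample, but each run stops at its target line instead of reading on to the end.

```shell
#!/bin/sh
# print_line FILE N   (hypothetical helper)
# Prints line N of FILE and quits as soon as that line has been printed.
print_line() {
  sed -n "${2}{p;q;}" "$1"
}

# extract_lines FILE N1 N2 ...   (hypothetical helper)
# Prints the listed lines in the order given, one sed run per number.
extract_lines() {
  file=$1
  shift
  for n in "$@"
  do
    print_line "$file" "$n"
  done
}

# Example (file names are assumptions):
#   extract_lines abc.txt 17 4021 998877 > test.txt
```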

#7  05-13-2008
era (Forum Advisor)
You can combine all of them into a single sed script, which will be a lot quicker.

Say, given a list of numbers in increasing order in file1, you can

Code:
sed -e 's/$/p/' file1 | sed  -n -f - bigfile >samples
to read the big file just once, and print the selected line numbers.

(Not all sed implementations understand "-f -", I have been stymied to learn; you need a temporary file for the generated script then, obviously.)

If you are a bit clever you can also make it quit after printing the last one, to avoid needlessly reading the big input file through to the end. Implementing that is left as an exercise for the astute reader. (-:
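One possible answer to that exercise, offered as a sketch rather than as what era had in mind: sort the numbers so the largest comes last, then turn only that last command into {p;q;}. The function name sample_sorted is made up for illustration, and the sketch assumes mktemp is available and that the numbers file holds one plain line number per line. It writes the generated script to a temporary file, so it does not rely on "-f -".

```shell
#!/bin/sh
# sample_sorted NUMBERS_FILE BIG_FILE   (hypothetical helper)
# Prints the lines of BIG_FILE whose numbers are listed in NUMBERS_FILE,
# reading BIG_FILE once and stopping right after the highest listed line.
sample_sorted() {
  script=$(mktemp) || return 1
  # sort -n -u: ascending order, duplicates dropped.
  # s/$/p/ turns every "N" into "Np"; on the last line of the list,
  # the trailing "p" becomes "{p;q;}" so sed quits after printing it.
  sort -n -u "$1" | sed -e 's/$/p/' -e '$s/p$/{p;q;}/' > "$script"
  sed -n -f "$script" "$2"
  status=$?
  rm -f "$script"
  return "$status"
}

# Example (file names are assumptions):
#   sample_sorted numbers.txt abc.txt > test.txt
```

The output comes out in file order, not in the order the numbers were drawn, which is usually fine for a test sample.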
