The UNIX and Linux Forums  


#1 | mvijayv | 08-29-2007
Edit a large file in place

Folks,
I have a file with 50 million records, each with 2 columns. I need to do the following:
1. Generate some random numbers of a fixed length.
2. Replace the second column of randomly chosen rows with those random numbers.

I tried using a bit of Perl to generate the random numbers and sed to make each replacement by hand. The problem is that every replacement writes out all 50 million records, including the one changed record. I'd rather not produce the full output for each row update; I'd like the output written once, after all the updates are done.
I was wondering whether I could edit the file in place using sed. I looked for an in-place option, but I don't have the GNU version of sed.

Any thoughts?

Thanks
V
#2 | ilan | 08-30-2007
Here is a first step toward your requirement:
>>1. Generate some random numbers of a fixed length.

i=00000000
echo $RANDOM$i | cut -c 1-8

This right-pads $RANDOM with zeros and keeps the first 8 characters, so every value is 8 digits long. Be aware that $RANDOM only ranges from 0 to 32767, so there are at most 32768 distinct values and numbers will repeat often across 50 million records.

I'm not clear about the second requirement. Please be more specific.

-ilan
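
For reference, here is a minimal sketch of another way to get fixed-length random numbers, assuming a POSIX awk is available. The function name gen_random and its arguments are illustrative and do not come from the thread. It builds each number digit by digit, so it is not limited to the 32768 values of $RANDOM. Leading zeros are possible.

```shell
# gen_random N LEN [SEED]  (hypothetical helper)
# Print N random numbers, each exactly LEN digits long.
gen_random() {
    awk -v n="$1" -v len="$2" -v seed="${3:-}" 'BEGIN {
        # Seed from SEED if given, otherwise from the time of day.
        if (seed != "") srand(seed); else srand()
        for (i = 0; i < n; i++) {
            s = ""
            # Append LEN random digits, one at a time.
            for (j = 0; j < len; j++) s = s int(rand() * 10)
            print s
        }
    }'
}

gen_random 5 8
```

Generating all the numbers in one awk call matters here: srand() without an argument seeds from the clock, so starting a new awk process for each number within the same second could repeat values.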
#3 | mvijayv | 08-30-2007
Hi ilan,
Thanks for taking this up. I have the first piece figured out: I can generate a random number using a small Perl script that I downloaded off the net. My problem is with the second part, so I'll try to describe it better.

I have 50 million records with 2 columns. Both columns are present in every record.

Step 1: Generate a random value (this is the part I figured out above).
Step 2: Locate a random record among the 50 million.
Step 3: Replace the value in the second column with the value generated in Step 1.
Step 4: Go back to Step 1: generate a new value, pick another random record, replace its second column, and so on, about a million times.

I want to do this in place because every time I replace a record using awk, it outputs all 50 million records, including that one change. I then have to redirect the output to another file, rename it to the original, and start over for the next iteration.
What I need is a way to edit the file in place, in a loop, picking random records and changing the second column a million times.


The high-level requirement is:
Given a file of 50 million records, I have to generate a file that also has 50 million records, of which 1 million differ from the original in the second column. Maybe there is an easier way to do this, but I am stumped right now.


Thanks,
V
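
One way to meet the high-level requirement without a million separate edits is to make all the changes in a single pass over the file. The sketch below is one possible approach, not something proposed in the thread. It uses selection sampling: each row is picked with probability (rows still needed) / (rows still remaining), which selects exactly K rows. The name randomize_rows, its arguments, and the "|" separator are assumptions, and a POSIX awk is assumed.

```shell
# randomize_rows FILE K LEN [SEED]  (hypothetical helper)
# Print FILE with column 2 of exactly K randomly chosen rows replaced
# by a random LEN-digit value that differs from the old one.
randomize_rows() {
    total=$(wc -l < "$1")
    awk -F'|' -v OFS='|' -v total="$total" -v k="$2" -v len="$3" -v seed="${4:-}" '
    function rnd(   s, j) {              # build a string of len random digits
        s = ""
        for (j = 0; j < len; j++) s = s int(rand() * 10)
        return s
    }
    BEGIN { if (seed != "") srand(seed); else srand() }
    {
        # Selection sampling: (total - NR + 1) rows remain including this
        # one, and (k - picked) rows are still needed.
        if (rand() * (total - NR + 1) < k - picked) {
            do { v = rnd() } while (v == $2)   # make sure the value changes
            $2 = v
            picked++
        }
        print
    }' "$1"
}

# Example (file names are placeholders):
#   randomize_rows data.txt 1000000 8 > data.new && mv data.new data.txt
```

Without GNU sed, writing to a new file and renaming it once at the end is the usual portable pattern. GNU sed's -i option does the same thing internally: it writes a temporary file and renames it over the original.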
#4 | ahmedwaseem2000 | 08-31-2007
Could you please post some sample input and output data so that the requirement is clearer?
#5 | mvijayv | 09-03-2007
12123|12345678
42142|23442253
52315|32250205
...
(around 50 million records)

Now I want to choose records at random and change the value of the second column.

For example, if the second record is chosen at random, I change its second column to a random value:

12123|12345678
42142|53988989
52315|32250205
...

The same operation is repeated 1 million times, each time on a different randomly chosen record.
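
The single replacement in this example can be sketched with awk as follows. This is only an illustration under assumptions: replace_col2 is a hypothetical helper name and a POSIX awk is assumed. Note that it still reads and writes every record, like any stream tool, which is why repeating it a million times is so expensive and why batching all changes into one pass is attractive.

```shell
# replace_col2 N VALUE  (hypothetical helper)
# Read "col1|col2" records from stdin and set column 2 of record N to VALUE.
replace_col2() {
    awk -F'|' -v OFS='|' -v n="$1" -v val="$2" '
        NR == n { $2 = val }   # change only the chosen record
        { print }              # every record is still written out
    '
}

printf '12123|12345678\n42142|23442253\n52315|32250205\n' | replace_col2 2 53988989
# prints:
# 12123|12345678
# 42142|53988989
# 52315|32250205
```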
#6 | fazliturk | 09-03-2007
I tried the following on AIX, in ksh.
The code is long, but it does not loop over the whole file.
Let's say your original file is called origfile.

Step 1.

sed 's/|/ /g' origfile > tempfile

# If you don't have sed, replace "|" with a blank some other way.
# After this, each line looks like: 12123 12345678

awk '{ print NR, $0 }' tempfile > origfile; rm tempfile

# After this, each line of origfile starts with its record number:
# 1 12123 12345678
# 2 42142 23442253

Step 2.
# Produce 1 million random numbers and save them to RandNumbersFile.

Step 3.
# Produce 1 million random record numbers and save them to RandomRecordsFile.
sort -u RandomRecordsFile > tempfile
mv tempfile RandomRecordsFile

# You may have produced 1 million numbers, but after a unique sort there can be fewer than 1 million. Every line in this file must be unique, and the commands above ensure that.

let "NeededLine=1000000-`wc -l RandomRecordsFile | awk '{print $1}'`"

# NeededLine is how many more record numbers you need after the sort.

counter=0
while [ $counter -lt $NeededLine ]
do
# Produce one random record number in RandomRecord here (add your own code).
# grep -x matches whole lines only, so 12 does not match 123.
grep -x "$RandomRecord" RandomRecordsFile > /dev/null
if [ $? -ne 0 ]
then
echo $RandomRecord >> RandomRecordsFile
let "counter=$counter+1"
fi
done
sort -u RandomRecordsFile > tempfile
paste -d' ' tempfile RandNumbersFile > RandomRecordsFile
rm tempfile

# After this, RandomRecordsFile looks like:
# 1 12345678
# 27 53988989
# The first field is the record number and the second is the random value (the new second field).
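
The loop above runs grep once per candidate number. As an alternative sketch (an assumption, not part of the original post), a single awk process can produce K distinct record numbers by remembering the ones it has already printed. The name unique_randoms is hypothetical, and the sketch assumes that the awk in use has a rand() with enough resolution to cover the whole range, which varies between implementations.

```shell
# unique_randoms K MAX [SEED]  (hypothetical helper)
# Print K distinct random integers in the range 1..MAX, one per line.
unique_randoms() {
    awk -v k="$1" -v max="$2" -v seed="${3:-}" 'BEGIN {
        if (seed != "") srand(seed); else srand()
        if (k > max) k = max               # cannot draw more than MAX distinct values
        while (n < k) {
            r = 1 + int(rand() * max)
            if (!(r in seen)) {            # skip numbers already drawn
                seen[r] = 1
                print r
                n++
            }
        }
    }'
}

unique_randoms 5 50000000
```

With 1 million draws out of 50 million, collisions are rare, so the loop finishes after only slightly more than K iterations.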

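# join expects both inputs sorted as text on the join field, so sort them that way first.
sort -k1,1 origfile > origsorted
sort -k1,1 RandomRecordsFile > randsorted
join -v1 origsorted randsorted > tempfile                # unmatched lines
join -o 1.1,1.2,2.2 origsorted randsorted >> tempfile    # matched lines
sort -n tempfile > origfile                              # back to record-number order
rm origsorted randsorted
# If you need the original format back, add these lines:
# cut -d' ' -f2,3 origfile > tempfile
# sed 's/ /|/g' tempfile > origfile
rm tempfile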

So the complete code is:
# Produce 1 million random numbers and save them to RandNumbersFile.
# Produce 1 million random record numbers and save them to RandomRecordsFile.

cp yourfile origfile
sed 's/|/ /g' origfile > tempfile
awk '{ print NR, $0 }' tempfile > origfile
sort -u RandomRecordsFile > tempfile
mv tempfile RandomRecordsFile
let "NeededLine=1000000-`wc -l RandomRecordsFile | awk '{print $1}'`"
counter=0
while [ $counter -lt $NeededLine ]
do
# Produce one random record number in RandomRecord here (add your own code).
grep -x "$RandomRecord" RandomRecordsFile > /dev/null
if [ $? -ne 0 ]
then
echo $RandomRecord >> RandomRecordsFile
let "counter=$counter+1"
fi
done
sort -u RandomRecordsFile > tempfile
paste -d' ' tempfile RandNumbersFile > RandomRecordsFile
sort -k1,1 origfile > origsorted          # join needs text-sorted input
sort -k1,1 RandomRecordsFile > randsorted
join -v1 origsorted randsorted > tempfile
join -o 1.1,1.2,2.2 origsorted randsorted >> tempfile
sort -n tempfile > origfile
rm tempfile origsorted randsorted

Last edited by fazliturk; 09-03-2007 at 09:41 AM..
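
The join steps can also be expressed as one awk pass that keeps the replacement list in memory. This is a minimal sketch under assumptions, not the poster's method: apply_updates and the file layouts are illustrative, a POSIX awk is assumed, and so is enough memory for about a million array entries.

```shell
# apply_updates UPDATES DATA  (hypothetical helper)
# UPDATES: lines of "record_number new_value", separated by a blank.
# DATA:    lines of "col1|col2".
# Prints DATA with column 2 replaced on the listed records.
apply_updates() {
    awk '
        # First file: remember the new value for each record number.
        NR == FNR { repl[$1 + 0] = $2; next }
        # Second file: replace everything after the first "|".
        # The new values are assumed to be digits only (no "&" or "\").
        (FNR in repl) { sub(/\|.*/, "|" repl[FNR]) }
        { print }
    ' "$1" "$2"
}

# Example (file names are placeholders):
#   apply_updates RandomRecordsFile yourfile > yourfile.new && mv yourfile.new yourfile
```

This reads the large file once and needs no sorting or line numbering. One caveat: the NR == FNR test assumes the update list is not empty.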