Edit a large file in place
Folks,

I have a file with 50 million records, each with 2 columns. I have to do the following:

1. Generate some random numbers of a fixed length.
2. Replace the second column of randomly chosen rows with the random numbers.

I tried using a little bit of Perl to generate the random numbers and sed to do the replacement manually. The problem is that every replacement writes out all 50 million records, including the changed one. I'd rather not generate the full output for each row update; I'd like to get the output once, after all the updates are done.

I was wondering if I could edit the file in place using sed. I did look for an in-place option, but I don't have the GNU version of sed.

Any thoughts?

Thanks,
V
Here is the initial move on your requirement:

>> 1. Generate some random numbers of a fixed length.

    i=00000000
    echo $RANDOM$i | cut -c 1-8

The above pads $RANDOM with zeros and keeps the first 8 characters, so it can produce a fixed-length number for each of the 50 million records. I'm not sure how frequently a number repeats, though: $RANDOM only ranges from 0 to 32767, so at most 32768 distinct values are possible.

I'm not clear about the second requirement. Please be specific.

-ilan
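If a wider range than $RANDOM offers is needed, one possible alternative is awk's rand(). The sketch below is only an illustration: the function name gen_random_numbers and its two parameters (count and seed) are made up for this example, and it assumes the local awk's rand() has enough resolution to cover 8 digits, which may not hold for very old awk implementations.

```shell
# gen_random_numbers COUNT SEED   (hypothetical helper, not from the thread)
# Prints COUNT zero-padded 8-digit numbers, one per line.
gen_random_numbers() {
    awk -v n="$1" -v seed="$2" 'BEGIN {
        srand(seed)                       # fixed seed gives a repeatable sequence
        for (i = 0; i < n; i++)
            printf "%08d\n", int(rand() * 100000000)   # 00000000..99999999
    }'
}

# Example: five numbers with seed 42
gen_random_numbers 5 42
```

Generating all the numbers in one awk process also avoids starting a new echo/cut pipeline for each of the million values.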
Hi ilan,

Thanks for taking this up. I have the first piece figured out: I can generate a random number using a small Perl script that I downloaded off the net. But I have a problem with the second part, so I'll try to describe it better.

I have 50 million records with 2 columns. Both columns are present in all the records.

Step 1: Generate a random value (this is the part I figured out above).
Step 2: Locate a random record among the 50 million.
Step 3: Replace the value in the second column with the value generated in Step 1.
Step 4: Go back to Step 1, generate a new value, pick another random record, replace its second column with this value, and so on, about a million times.

I want to be able to do this in place, because every time I replace a record using awk, it outputs the whole 50 million records including that change, and I have to redirect the output to another file, rename it to the original, and start over for the next iteration. What I need is a way to edit the file in place in a loop, identifying random records and changing the second column a million times.

The high-level requirement is: given a file of 50 million records, I have to generate a file that has 50 million records, of which 1 million have a second column that differs from the first file.

Maybe there is an easier way to do this, but I am stumped right now.

Thanks,
V
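One way to meet the high-level requirement without a million separate edits is to collect all the (line number, new value) pairs first and apply them in a single pass, writing to a new file and renaming it at the end. The sketch below assumes a "|"-separated data file and an updates file with one "line_number new_value" pair per line; the helper name apply_updates and the file names are invented for this example.

```shell
# apply_updates UPDATES DATA   (hypothetical helper, not from the thread)
# UPDATES: lines of "line_number new_value" (blank separated)
# DATA:    lines of "col1|col2"
apply_updates() {
    awk '
        NR == FNR { new[$1] = $2; next }   # first file: remember line_number -> new value
        (FNR in new) { $2 = new[FNR] }     # data file: replace column 2 on the chosen lines
        { print }
    ' "$1" FS='|' OFS='|' "$2"
}

# Small demo in the col1|col2 layout
dir="${TMPDIR:-/tmp}/apply_updates_demo.$$"
mkdir "$dir"
printf '%s\n' '12123|12345678' '42142|23442253' '52315|32250205' > "$dir/data"
printf '%s\n' '2 53988989' > "$dir/updates"

# Write the result once, then rename it over the original
apply_updates "$dir/updates" "$dir/data" > "$dir/data.new" && mv "$dir/data.new" "$dir/data"
cat "$dir/data"
rm -r "$dir"
```

In the demo only the second record changes, to 42142|53988989. The updates table is held in awk's memory, which should be manageable for a million entries, while the 50 million records are only streamed through. The sketch assumes the updates file is not empty.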
12123|12345678
42142|23442253
52315|32250205
....
(around 50 million records)

Now I want to choose records at random and change the value of the second column. For example, if I choose the second record at random, I will change its 2nd column to a random value:

12123|12345678
42142|53988989
52315|32250205
....

The same operation is repeated 1 million times, each time choosing a different record at random.
I tried the following code on AIX, in ksh. The code is long, but there is no while loop over the big file. Let's say your original file is origfile.

Step 1. Number the records.

    # Replace "|" with a blank (if you don't have sed, do this some other way).
    sed 's/|/ /g' origfile > tempfile
    # tempfile now looks like:  12123 12345678
    # Prefix every line with its line number.
    awk '{print NR, $0}' tempfile > origfile
    rm tempfile
    # origfile now looks like:
    #   1 12123 12345678
    #   2 42142 23442253

Step 2. Produce 1 million random numbers and save them to RandNumbersFile.

Step 3. Produce 1 million random record numbers and save them to RandRecordsFile, then make them unique.

    sort -u RandRecordsFile > tempfile
    mv tempfile RandRecordsFile
    # You produced 1 million numbers, but after a unique sort there can be
    # fewer. Every line in this file must be unique; sort -u ensures that.
    let "NeededLine=1000000-`wc -l RandRecordsFile | awk '{print $1}'`"
    # NeededLine is how many more record numbers you need after the sort.
    counter=0
    while [ $counter -lt $NeededLine ]
    do
        # Produce one random record number in RandomRecord.
        # You must add your own code here.
        grep -x "$RandomRecord" RandRecordsFile > /dev/null
        if [ $? -ne 0 ]
        then
            echo $RandomRecord >> RandRecordsFile
            let "counter=$counter+1"
        fi
    done
    sort -u RandRecordsFile > tempfile
    paste tempfile RandNumbersFile > RandRecordsFile
    rm tempfile
    # RandRecordsFile now looks like:
    #   1 12345678
    #   27 53988989
    # The first field is the record number, the second is the random value
    # that replaces the original second field.

Step 4. Merge the new values into the file.

    # join needs both files sorted on the join field in character order.
    sort -k1,1 origfile > tempfile
    mv tempfile origfile
    join -v1 origfile RandRecordsFile > tempfile              # unmatched lines
    join -o 1.1,1.2,2.2 origfile RandRecordsFile >> tempfile  # matched lines
    sort -n tempfile > origfile                               # sort on field 1
    # If you need the original layout back, add these lines:
    #   cut -d' ' -f2,3 origfile > tempfile
    #   sed 's/ /|/' tempfile > origfile
    rm tempfile

So the code is:

    # Produce 1 million random numbers and save them to RandNumbersFile.
    # Produce 1 million random record numbers and save them to RandRecordsFile.
    cp yourfile origfile
    sed 's/|/ /g' origfile > tempfile
    awk '{print NR, $0}' tempfile > origfile
    sort -u RandRecordsFile > tempfile
    mv tempfile RandRecordsFile
    let "NeededLine=1000000-`wc -l RandRecordsFile | awk '{print $1}'`"
    counter=0
    while [ $counter -lt $NeededLine ]
    do
        # Produce one random record number in RandomRecord.
        # You must add your own code here.
        grep -x "$RandomRecord" RandRecordsFile > /dev/null
        if [ $? -ne 0 ]
        then
            echo $RandomRecord >> RandRecordsFile
            let "counter=$counter+1"
        fi
    done
    sort -u RandRecordsFile > tempfile
    paste tempfile RandNumbersFile > RandRecordsFile
    sort -k1,1 origfile > tempfile
    mv tempfile origfile
    join -v1 origfile RandRecordsFile > tempfile
    join -o 1.1,1.2,2.2 origfile RandRecordsFile >> tempfile
    sort -n tempfile > origfile
    rm tempfile

Last edited by fazliturk; 09-03-2007 at 09:41 AM.
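The "add your code here" part of the loop, together with the grep check against a growing file, could also be done inside one awk process that remembers which record numbers it has already handed out. This is only a sketch of that idea, not the poster's method: the function name pick_unique_lines and its parameters (how many, total number of records, seed) are invented here, and it assumes awk's rand() has enough resolution to reach every line of the file.

```shell
# pick_unique_lines WANT TOTAL SEED   (hypothetical helper, not from the thread)
# Prints WANT distinct line numbers in the range 1..TOTAL, in random order.
pick_unique_lines() {
    awk -v want="$1" -v total="$2" -v seed="$3" 'BEGIN {
        if (want > total) want = total      # cannot pick more lines than exist
        srand(seed)
        while (count < want) {
            n = int(rand() * total) + 1     # candidate line number in 1..TOTAL
            if (!(n in seen)) {             # keep it only if it is new
                seen[n] = 1
                print n
                count++
            }
        }
    }'
}

# Example: 5 distinct line numbers out of 50, sorted for readability
pick_unique_lines 5 50 1 | sort -n
```

For the case in this thread the call would look like `pick_unique_lines 1000000 50000000 1 > RandRecordsFile`, which makes the separate top-up loop unnecessary because the output is already unique. Picking 1 million out of 50 million means collisions are rare, so the retry loop should finish quickly; it would slow down if the wanted count were close to the total.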