Edit a large file in place


 
# 1  
Old 08-29-2007

Folks,
I have a file with 50 million records in 2 columns, and I need to do the following:
1. Generate some random numbers of a fixed length.
2. Replace the second column of randomly chosen rows with the random numbers.

I tried using a little Perl to generate the random numbers and sed to do the replacement. The problem is that every substitution writes out all 50 million records, not just the changed one. I'd rather not generate the full output for each row update; I'd like to get the output once, after all the updates are done.
I was wondering if I could edit the file in place using sed. I did look for an in-place option, but I don't have the GNU version of sed.

Any thoughts?

Thanks
V
# 2  
Old 08-30-2007
Here is an initial attempt at your first requirement:
>>1. Generate some random numbers of a fixed length.

i=00000000
echo $RANDOM$i | cut -c 1-8

The above generates an 8-digit number for each of your 50 million records; I'm not sure, though, how often a number will repeat!
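One caveat with that approach: $RANDOM in ksh only ranges from 0 to 32767, so after padding and cutting, the leading digits are far from uniformly distributed. A more uniform fixed-length number can be sketched with awk's rand() (assuming a POSIX awk with srand()/rand()):

```shell
# One uniform 8-digit random number, zero-padded on the left.
# srand() seeds from the time of day, so repeated runs within the
# same second repeat the value; for bulk generation, produce all
# the numbers inside a single awk run instead.
awk 'BEGIN { srand(); printf "%08d\n", int(rand() * 100000000) }'
```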

I'm not clear about the second requirement. Please be more specific.

-ilan
# 3  
Old 08-30-2007
Hi ilan,
Thanks for taking this up. I have the first piece figured out: I can generate a random number using a small Perl script I downloaded off the net. It's the second part I have a problem with, so I'll try to describe it better.

I have 50 million records with 2 columns; both columns are present in every record.

Step 1: Generate a random value (this is the part I figured out above).
Step 2: Locate a random record among the 50 million.
Step 3: Replace the value in the second column with the value generated in Step 1.
Step 4: Go back to Step 1, generate a new value, look for another random record, replace it, and so on, about a million times.

I want to be able to do this in place, because every time I replace a record using awk, it outputs the whole 50 million records including that change; I have to redirect the output to another file, rename it to the original, and start over for the next iteration.
What I need is a way to edit the file in place in a loop, identifying random records and changing the second column a million times.
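Without GNU sed there is no true in-place edit anyway (even GNU sed's -i writes a temporary file behind the scenes). The usual pattern is to make all the changes in one pass, write to a temp file, and rename once at the end; a minimal sketch, with placeholder filenames and a hard-coded demo edit:

```shell
# Tiny demo input standing in for the 50-million-record file.
printf '12123|12345678\n42142|23442253\n' > origfile

# Do ALL replacements in a single awk pass, then swap the result in
# with mv; the && means a failed pass never clobbers the original.
awk -F'|' -v OFS='|' 'NR == 2 { $2 = "99999999" } { print }' \
    origfile > origfile.tmp &&
mv origfile.tmp origfile
```

The point is that one pass plus one rename replaces a million separate rewrite-and-rename cycles.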


The high-level requirement is:
Given a file of 50 million records, I have to generate a file with the same 50 million records, except that 1 million of them have a second column that differs from the first file. Maybe there is an easier way to do this, but I am stumped right now.


Thanks,
V
# 4  
Old 08-31-2007
Could you please post some sample input and output data so that we can be clearer about the requirement?
# 5  
Old 09-03-2007
12123|12345678
42142|23442253
52315|32250205
....
....
...
....
....
around 50 million records in all.

Now I want to choose records at random and change the value of the second column.

For example, if the second record is chosen at random, I change its 2nd column to a random value:

12123|12345678
42142|53988989
52315|32250205
....
....
...
....
....

The same operation is repeated 1 million times, each time choosing a different record at random.
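For what it's worth, the whole job can be done in a single awk pass: pick the target line numbers up front in a BEGIN block, then rewrite the file once, replacing column 2 only on the chosen lines. A sketch at demo scale (for the real file you would use total=50000000 and want=1000000; filenames are placeholders):

```shell
# Demo input: 20 records, second column deliberately non-numeric so
# a replacement is always visible.
awk 'BEGIN { for (i = 1; i <= 20; i++) printf "%05d|AAAAAAAA\n", i }' > origfile

awk -v total=20 -v want=5 '
BEGIN {
    srand()
    while (n < want) {                      # choose distinct random line numbers
        r = int(rand() * total) + 1
        if (!(r in pick)) { pick[r] = 1; n++ }
    }
    FS = OFS = "|"
}
NR in pick { $2 = sprintf("%08d", int(rand() * 100000000)) }  # new 8-digit value
{ print }
' origfile > newfile
```

Holding a million chosen line numbers in an awk array costs a few tens of megabytes at most, and the data file itself is read and written exactly once.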
# 6  
Old 09-03-2007
I tried the following code on AIX, in ksh.
The code is long, but there is only one small while loop.
Let's say your original file is origfile.

step 1.

sed 's/|/ /g' origfile > tempfile

# If you don't have sed, you must change the "|" to a blank some other way.
# After this, each record looks like: 12123 12345678

awk '{ printf "%08d %s\n", NR, $0 }' tempfile > origfile
rm tempfile

# Each record now carries a zero-padded record number; the padding keeps
# the lexical order used by sort and join identical to numeric order:
# 00000001 12123 12345678
# 00000002 42142 23442253

step 2.
# Produce 1 million random 8-digit numbers and save them to RandNumbersFile.

step 3.
# Produce 1 million random record numbers (zero-padded to 8 digits, between
# 00000001 and your record count) and save them to RandRecordsFile.
sort -u RandRecordsFile > tempfile
mv tempfile RandRecordsFile

# You can produce 1 million numbers, but after sort -u there can be fewer,
# because duplicates collapse. Every line in this file must be unique; the
# command above arranges that.

let "NeededLine=1000000-$(wc -l < RandRecordsFile)"

# This shows how many new record numbers you still need after the sort.

counter=0
while [ $counter -lt $NeededLine ]
do
    # Produce a random record number in $RandomRecord (add your code here).
    grep -x "$RandomRecord" RandRecordsFile > /dev/null
    if [ $? -ne 0 ]
    then
        echo "$RandomRecord" >> RandRecordsFile
        let "counter=counter+1"
    fi
done
sort -u RandRecordsFile > tempfile
paste -d' ' tempfile RandNumbersFile > RandomRecordsFile
rm tempfile

# After this, RandomRecordsFile looks like:
# 00000001 12345678
# 00000027 53988989
# The first field is the record number, the second the new value for the
# original second field.

join -v1 origfile RandomRecordsFile > tempfile             # unmatched lines
join -o 1.1,1.2,2.2 origfile RandomRecordsFile >> tempfile # matched lines
sort -u tempfile > origfile                                # back into record order
# If you need the | format back, add these lines:
cut -d' ' -f2,3 origfile > tempfile
sed 's/ /|/g' tempfile > origfile
rm tempfile

So the whole script is:

# Produce 1 million random 8-digit numbers and save them to RandNumbersFile.
# Produce 1 million random record numbers (zero-padded) in RandRecordsFile.

cp yourfile origfile
sed 's/|/ /g' origfile > tempfile
awk '{ printf "%08d %s\n", NR, $0 }' tempfile > origfile
sort -u RandRecordsFile > tempfile
mv tempfile RandRecordsFile
let "NeededLine=1000000-$(wc -l < RandRecordsFile)"
counter=0
while [ $counter -lt $NeededLine ]
do
    # Produce a random record number in $RandomRecord (add your code here).
    grep -x "$RandomRecord" RandRecordsFile > /dev/null
    if [ $? -ne 0 ]
    then
        echo "$RandomRecord" >> RandRecordsFile
        let "counter=counter+1"
    fi
done
sort -u RandRecordsFile > tempfile
paste -d' ' tempfile RandNumbersFile > RandomRecordsFile
join -v1 origfile RandomRecordsFile > tempfile
join -o 1.1,1.2,2.2 origfile RandomRecordsFile >> tempfile
sort -u tempfile > origfile
cut -d' ' -f2,3 origfile > tempfile
sed 's/ /|/g' tempfile > origfile
rm tempfile
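If it helps, the two "produce 1 million random numbers" placeholders can be filled with a single awk run. This sketch assumes 50,000,000 records and an awk with srand()/rand(); zero-padding both files to a fixed 8-digit width keeps sort's lexical order consistent with numeric record order, which join relies on:

```shell
awk 'BEGIN {
    srand()
    for (i = 0; i < 1000000; i++) {
        # New 8-digit values for the second column:
        printf "%08d\n", int(rand() * 100000000) > "RandNumbersFile"
        # Record numbers in 1..50000000, zero-padded to 8 digits:
        printf "%08d\n", int(rand() * 50000000) + 1 > "RandRecordsFile"
    }
}'
```

In awk, `> "file"` opens the file once and keeps appending for the rest of the run, so each file ends up with the full million lines.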

Last edited by fazliturk; 09-03-2007 at 09:41 AM..
# 7  
Old 09-11-2007
Thanks, fazliturk.
I'm going to try this pretty soon and will let you know how it goes. Thanks again!