Edit a large file in place


 
# 1  
Old 08-29-2007

Folks,
I have a file with 50 million records in 2 columns, and I need to do the following:
1. Generate some random numbers of a fixed length.
2. Replace the second column of randomly chosen rows with the random numbers.

I tried using a little Perl to generate the random numbers and sed to do the replacement. The problem is that every substitution writes out all 50 million records, not just the changed one. I'd rather not generate the full output for each row update; I'd like to get the output once, after all the updates are done.
I was wondering if I could edit the file in place using sed. I did look for an in-place option, but I don't have the GNU version of sed.

Any thoughts?

Thanks
V
# 2  
Old 08-30-2007
Here is an initial attempt at your first requirement:
>>1. Generate some random numbers of a fixed length.

i=00000000
echo $RANDOM$i | cut -c 1-8

The above generates an 8-digit number for each of your 50 million records; I'm not sure, though, how often a number will repeat!
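One caveat with that approach: $RANDOM in ksh only ranges from 0 to 32767, so after padding and cutting, the leading digits are far from uniformly distributed. A more uniform fixed-length number can be sketched with awk's rand() (assuming a POSIX awk with srand()/rand()):

```shell
# One uniform 8-digit random number, zero-padded on the left.
# srand() seeds from the time of day, so repeated runs within the
# same second repeat the value; for bulk generation, produce all
# the numbers inside a single awk run instead.
awk 'BEGIN { srand(); printf "%08d\n", int(rand() * 100000000) }'
```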

I'm not clear about the second requirement. Please be more specific.

-ilan
# 3  
Old 08-30-2007
Hi ilan,
Thanks for taking this up. I have the first piece figured out: I can generate a random number using a small Perl script I downloaded off the net. It's the second part I have a problem with, so I'll try to describe it better.

I have 50 million records with 2 columns; both columns are present in every record.

Step 1: Generate a random value (this is the part I figured out above).
Step 2: Locate a random record among the 50 million.
Step 3: Replace the value in the second column with the value generated in Step 1.
Step 4: Go back to Step 1, generate a new value, look for another random record, replace it, and so on, about a million times.

I want to be able to do this in place, because every time I replace a record using awk, it outputs the whole 50 million records including that change; I have to redirect the output to another file, rename it to the original, and start over for the next iteration.
What I need is a way to edit the file in place in a loop, identifying random records and changing the second column a million times.
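Without GNU sed there is no true in-place edit anyway (even GNU sed's -i writes a temporary file behind the scenes). The usual pattern is to make all the changes in one pass, write to a temp file, and rename once at the end; a minimal sketch, with placeholder filenames and a hard-coded demo edit:

```shell
# Tiny demo input standing in for the 50-million-record file.
printf '12123|12345678\n42142|23442253\n' > origfile

# Do ALL replacements in a single awk pass, then swap the result in
# with mv; the && means a failed pass never clobbers the original.
awk -F'|' -v OFS='|' 'NR == 2 { $2 = "99999999" } { print }' \
    origfile > origfile.tmp &&
mv origfile.tmp origfile
```

The point is that one pass plus one rename replaces a million separate rewrite-and-rename cycles.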


The high-level requirement is:
Given a file of 50 million records, I have to generate a file with the same 50 million records, except that 1 million of them have a second column that differs from the first file. Maybe there is an easier way to do this, but I am stumped right now.


Thanks,
V
# 4  
Old 08-31-2007
Could you please post some sample input and output data so that we can be clearer about the requirement?
# 5  
Old 09-03-2007
12123|12345678
42142|23442253
52315|32250205
....
....
...
....
....
around 50 million records in all.

Now I want to choose records at random and change the value of the second column.

For example, if the second record is chosen at random, I change its 2nd column to a random value:

12123|12345678
42142|53988989
52315|32250205
....
....
...
....
....

The same operation is repeated 1 million times, each time choosing a different record at random.
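For what it's worth, the whole job can be done in a single awk pass: pick the target line numbers up front in a BEGIN block, then rewrite the file once, replacing column 2 only on the chosen lines. A sketch at demo scale (for the real file you would use total=50000000 and want=1000000; filenames are placeholders):

```shell
# Demo input: 20 records, second column deliberately non-numeric so
# a replacement is always visible.
awk 'BEGIN { for (i = 1; i <= 20; i++) printf "%05d|AAAAAAAA\n", i }' > origfile

awk -v total=20 -v want=5 '
BEGIN {
    srand()
    while (n < want) {                      # choose distinct random line numbers
        r = int(rand() * total) + 1
        if (!(r in pick)) { pick[r] = 1; n++ }
    }
    FS = OFS = "|"
}
NR in pick { $2 = sprintf("%08d", int(rand() * 100000000)) }  # new 8-digit value
{ print }
' origfile > newfile
```

Holding a million chosen line numbers in an awk array costs a few tens of megabytes at most, and the data file itself is read and written exactly once.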
# 6  
Old 09-03-2007
I tried the following code on AIX, in ksh.
The code is long, but there is only one small while loop.
Let's say your original file is origfile.

step 1.

sed 's/|/ /g' origfile > tempfile

# If you don't have sed, you must change the "|" to a blank some other way.
# After this, each record looks like: 12123 12345678

awk '{ printf "%08d %s\n", NR, $0 }' tempfile > origfile
rm tempfile

# Each record now carries a zero-padded record number; the padding keeps
# the lexical order used by sort and join identical to numeric order:
# 00000001 12123 12345678
# 00000002 42142 23442253

step 2.
# Produce 1 million random 8-digit numbers and save them to RandNumbersFile.

step 3.
# Produce 1 million random record numbers (zero-padded to 8 digits, between
# 00000001 and your record count) and save them to RandRecordsFile.
sort -u RandRecordsFile > tempfile
mv tempfile RandRecordsFile

# You can produce 1 million numbers, but after sort -u there can be fewer,
# because duplicates collapse. Every line in this file must be unique; the
# command above arranges that.

let "NeededLine=1000000-$(wc -l < RandRecordsFile)"

# This shows how many new record numbers you still need after the sort.

counter=0
while [ $counter -lt $NeededLine ]
do
    # Produce a random record number in $RandomRecord (add your code here).
    grep -x "$RandomRecord" RandRecordsFile > /dev/null
    if [ $? -ne 0 ]
    then
        echo "$RandomRecord" >> RandRecordsFile
        let "counter=counter+1"
    fi
done
sort -u RandRecordsFile > tempfile
paste -d' ' tempfile RandNumbersFile > RandomRecordsFile
rm tempfile

# After this, RandomRecordsFile looks like:
# 00000001 12345678
# 00000027 53988989
# The first field is the record number, the second the new value for the
# original second field.

join -v1 origfile RandomRecordsFile > tempfile             # unmatched lines
join -o 1.1,1.2,2.2 origfile RandomRecordsFile >> tempfile # matched lines
sort -u tempfile > origfile                                # back into record order
# If you need the | format back, add these lines:
cut -d' ' -f2,3 origfile > tempfile
sed 's/ /|/g' tempfile > origfile
rm tempfile

So the whole script is:

# Produce 1 million random 8-digit numbers and save them to RandNumbersFile.
# Produce 1 million random record numbers (zero-padded) in RandRecordsFile.

cp yourfile origfile
sed 's/|/ /g' origfile > tempfile
awk '{ printf "%08d %s\n", NR, $0 }' tempfile > origfile
sort -u RandRecordsFile > tempfile
mv tempfile RandRecordsFile
let "NeededLine=1000000-$(wc -l < RandRecordsFile)"
counter=0
while [ $counter -lt $NeededLine ]
do
    # Produce a random record number in $RandomRecord (add your code here).
    grep -x "$RandomRecord" RandRecordsFile > /dev/null
    if [ $? -ne 0 ]
    then
        echo "$RandomRecord" >> RandRecordsFile
        let "counter=counter+1"
    fi
done
sort -u RandRecordsFile > tempfile
paste -d' ' tempfile RandNumbersFile > RandomRecordsFile
join -v1 origfile RandomRecordsFile > tempfile
join -o 1.1,1.2,2.2 origfile RandomRecordsFile >> tempfile
sort -u tempfile > origfile
cut -d' ' -f2,3 origfile > tempfile
sed 's/ /|/g' tempfile > origfile
rm tempfile
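If it helps, the two "produce 1 million random numbers" placeholders can be filled with a single awk run. This sketch assumes 50,000,000 records and an awk with srand()/rand(); zero-padding both files to a fixed 8-digit width keeps sort's lexical order consistent with numeric record order, which join relies on:

```shell
awk 'BEGIN {
    srand()
    for (i = 0; i < 1000000; i++) {
        # New 8-digit values for the second column:
        printf "%08d\n", int(rand() * 100000000) > "RandNumbersFile"
        # Record numbers in 1..50000000, zero-padded to 8 digits:
        printf "%08d\n", int(rand() * 50000000) + 1 > "RandRecordsFile"
    }
}'
```

In awk, `> "file"` opens the file once and keeps appending for the rest of the run, so each file ends up with the full million lines.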

Last edited by fazliturk; 09-03-2007 at 09:41 AM..
# 7  
Old 09-11-2007
Thanks, fazliturk.
I'm going to try this pretty soon and will let you know how it goes. Thanks again!