Optimised way to search & replace a value on one line of a very large file (file size is 24 GB).
Hi Experts,
I had to edit a particular value in the header line of a very large file (24 GB), so I needed to search for and replace that value in the file. I managed to do it, but it took a long time to complete. Can anyone please tell me how to do it in an optimised way?
Thanks in advance.
Manish
Steps which I followed:
1. head -1 original_file > temp
2. sed -n '2,$p' original_file >> temp
3. mv temp original_file
AFAIK, not in the general case. But if your new first line has the same number of bytes as the old one, or is shorter and can be padded with spaces, then I believe you can do it quickly with low-level programming: open the file for read/write, read some bytes (512, for example) into a buffer, change them, rewind, and write the buffer back.
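A rough shell sketch of that read, change, write-back idea, assuming the replacement has exactly the same byte length as the original value. The demo file and its contents are made up for illustration, and it copies out only the header line rather than a fixed 512-byte buffer.

```shell
#!/bin/sh
# Stand-in for the huge file; the contents are made up.
file=$(mktemp)
printf 'HDR|20110901|daily extract\nrow 1\nrow 2\n' > "$file"

old=$(mktemp)
new=$(mktemp)

# Copy only the header line out, and build the edited version of it.
head -n 1 "$file" > "$old"
sed 's/20110901/20110902/' "$old" > "$new"

# Only write back if the byte count is unchanged; otherwise the edit
# would run into the data that follows the header.
if [ "$(wc -c < "$old")" -eq "$(wc -c < "$new")" ]; then
    # conv=notrunc: overwrite the start of the file, leave the rest alone.
    dd if="$new" of="$file" conv=notrunc 2>/dev/null
else
    echo "header length changed, not writing" >&2
fi

head -n 1 "$file"
```

Only the header line is read and written, so the cost should not depend on how large the rest of the file is.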
There is no fundamental operation for inserting or deleting data in the middle of a file. You have to rewrite the entire file after the edit.
A 24 gigabyte file in 11 minutes is 37 megabytes per second, which is actually a pretty impressive transfer rate! You've probably maxed out your disk or bus speed, so changing the program won't help significantly. It might help to write the output to a different disk than the one you're reading from.
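The rate quoted above can be checked with shell integer arithmetic, treating 24 GB as 24 × 1024 MiB (an assumption about the units) and using the 11-minute run time:

```shell
#!/bin/sh
size_mib=$(( 24 * 1024 ))        # 24 GB expressed in MiB
seconds=$(( 11 * 60 ))           # 11 minutes in seconds
rate=$(( size_mib / seconds ))   # integer division, rounds down
echo "$rate MiB/s"               # prints "37 MiB/s"
```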
If you could use yazu's suggestion of always keeping the string the same length, so the data afterwards doesn't need to be rewritten, that would let the edit happen in a fraction of a second...
In this case we are searching for and replacing exactly 8 characters, e.g. 20110901 with 20110902. We monitored the server's performance during the run and it was very good; it didn't swap out memory. Still, it took that long. I think 11 minutes is still a lot on Linux. Please correct me if I am wrong.
While doing this we are exactly searching & replacing 8 characters, like 20110901 to 20110902.
Could you show us the first few lines of the file, and the data you wish replaced? If the data is always the same length and always in the same place, you can use dd to write it in...
---------- Post updated at 11:42 AM ---------- Previous update was at 11:37 AM ----------
An example:
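The command itself is not shown here. A minimal sketch of a dd call of this shape, assuming the 8-character value starts at byte offset 65, with a made-up demo file standing in for the real one:

```shell
#!/bin/sh
# Demo file: 65 filler bytes, then the value, then the rest of the data.
file=$(mktemp)
printf '%065d' 0 > "$file"
printf '20110901 rest of header\nrow 1\n' >> "$file"

# Overwrite 8 bytes starting at offset 65, touching nothing else.
printf '20110902' | dd of="$file" bs=1 seek=65 conv=notrunc 2>/dev/null

# Show the 8 bytes now stored at offset 65.
dd if="$file" bs=1 skip=65 count=8 2>/dev/null
echo
```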
The 'bs=1' tells it to work with a block size of 1 byte, which lets us seek exactly 65 bytes into the file with seek=65. The conv=notrunc is important: it tells dd not to truncate the file but to just overwrite data that's already there.
---------- Post updated at 12:06 PM ---------- Previous update was at 11:42 AM ----------
Another method needing BASH 3.0 or newer:
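The code for this method is not shown here. One possible sketch of the same idea, not necessarily what was posted, opens the file read/write with the shell's `<>` redirection (which does not truncate), skips the bytes before the value, and writes the replacement through the same descriptor. The demo file and the 5-byte prefix are assumptions.

```shell
#!/bin/sh
# Demo file; the value starts right after the 5-byte prefix "date=".
file=$(mktemp)
printf 'date=20110901\nrow 1\nrow 2\n' > "$file"

# Open the file for reading and writing on descriptor 3, without truncating.
exec 3<>"$file"

# Consume the 5 bytes before the value; this advances the offset of fd 3.
dd bs=1 count=5 <&3 >/dev/null 2>&1

# Write the same-length replacement at the current offset.
printf '20110902' >&3

# Close the descriptor.
exec 3>&-

head -n 1 "$file"
```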
Both methods are able to edit early lines in the file as long as their length doesn't change, without having to read or write data afterwards at all.
The dd version would be more reliable and portable, provided you always know where the data to replace is.
---------- Post updated at 12:27 PM ---------- Previous update was at 12:06 PM ----------
Another thing you could do is just keep the header always separate from the huge file. When you need to feed it into something, use sed or awk or whatever to get the modified header, and cat out the rest of the file. (One of the rare useful uses of cat.)
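A small sketch of the separate-header approach, with hypothetical file names `header.txt` and `body.dat`. In practice the combined stream would be piped straight into whatever consumes it rather than written to a file.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'HDR 20110901\n' > "$dir/header.txt"     # tiny header file
printf 'row 1\nrow 2\n' > "$dir/body.dat"       # stand-in for the huge body

# Emit the modified header, then the untouched body, as one stream.
{
    sed 's/20110901/20110902/' "$dir/header.txt"
    cat "$dir/body.dat"
} > "$dir/combined.out"

head -n 1 "$dir/combined.out"
```

Editing the header then only ever touches the small file, whatever the length of the new value.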
Last edited by Corona688; 09-02-2011 at 04:22 PM..