Optimised way for search & replace a value on one line in a very huge file (File Size is 24 GB). Post 302552343 by Corona688 on Friday 2nd of September 2011 02:27:04 PM
09-02-2011
Quote:
Originally Posted by manishkomar007
Thanks Corona688...!!

While doing this we are exactly searching & replacing 8 characters, like 20110901 to 20110902.
Could you show us the first few lines of the file, and the data you wish replaced? If the data is always the same length and always in the same place, you can use dd to write it in...

---------- Post updated at 11:42 AM ---------- Previous update was at 11:37 AM ----------

An example:
Code:
$ cat textdata
This is line 1
This is line 2
This is the data I want replaced >>11111111<<
This is another line
etc etc until end of file.
$ printf "%s" 22222222 | dd conv=notrunc of=textdata seek=65 bs=1
$ cat textdata
This is line 1
This is line 2
This is the data I want replaced >>22222222<<
This is another line
etc etc until end of file.

The 'bs=1' tells dd to use a block size of 1 byte, which lets us seek exactly 65 bytes into the file with seek=65. The conv=notrunc is important: it tells dd not to truncate the output file, so it only overwrites the data that's already there and leaves the rest of the file untouched.
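If you don't want to count the 65 bytes by hand, the offset can be computed. The following is only a sketch: it assumes GNU grep, whose -b and -o options (not in POSIX) print the byte offset of the match itself, and it uses a throwaway copy of the sample file in a temporary directory. The variable names are made up for the example.

```shell
#!/bin/sh
# Build a small sample file in a scratch directory (illustration only).
workdir=$(mktemp -d)
file="$workdir/textdata"
printf '%s\n' 'This is line 1' 'This is line 2' \
    'This is the data I want replaced >>11111111<<' \
    'This is another line' 'etc etc until end of file.' > "$file"
size_before=$(wc -c < "$file")

# GNU grep: -b prints the byte offset, -o prints only the matched text,
# -m 1 stops reading after the first matching line.
match=$(grep -b -o -m 1 '>>[0-9]\{8\}<<' "$file" | head -n 1)
offset=${match%%:*}       # offset of the first '>' of the match
offset=$((offset + 2))    # skip '>>' to land on the first digit

# Overwrite the 8 digits in place without truncating the file.
printf '%s' 22222222 | dd conv=notrunc of="$file" seek="$offset" bs=1 2>/dev/null
sed -n 3p "$file"
```

grep still has to read the file up to the first match, so this is cheap only when the data sits near the start of the file, as a header does.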

---------- Post updated at 12:06 PM ---------- Previous update was at 11:42 AM ----------

Another method, which needs bash 3.0 or newer for the =~ regex operator:

Code:
#!/bin/bash

exec 5<hugedata     # FD 5: read-only, used for searching
exec 6<>hugedata    # FD 6: read-write, used for overwriting in place

# Read lines one at a time from both file descriptors.
# When we find the line we want on FD 5, FD 6 is still positioned at the
# start of that same line, so writing to FD 6 overwrites it in place.
# IFS= and -r stop read from trimming whitespace or eating backslashes,
# either of which would change the line length and corrupt the file.
while IFS= read -r -u 5 LINE
do
        # Match strings like >>12345678<< anywhere in the line and
        # save it in BASH_REMATCH in three segments:  ...>>, 11111111, <<...
        if [[ $LINE =~ ^(.*\>\>)([0-9]+)(\<\<.*)$ ]]
        then
                NEWLINE="${BASH_REMATCH[1]}22222222${BASH_REMATCH[3]}"

                if [ "${#NEWLINE}" -ne "${#LINE}" ]
                then
                        echo "Error, new line would be different length" >&2
                        exit 1
                fi

                # Overwrite the line with a line of same length
                printf '%s\n' "${NEWLINE}" >&6
                exec 6>&-
                exec 5<&-

                echo "Found and replaced ${BASH_REMATCH[2]} with 22222222" >&2
                exit 0
        else
                IFS= read -r -u 6 LINE  # Keep FD 5 and FD 6 in sync
        fi
done

echo "Warning, didn't find any data to replace" >&2
exit 1

Code:
$ cat hugedata
This is line 1
This is line 2
This is the data I want replaced >>11111111<<
This is another line
etc etc until end of file.
$ ./datarep2.sh
$ cat hugedata
This is line 1
This is line 2
This is the data I want replaced >>22222222<<
This is another line
etc etc until end of file.
$

Both methods can edit lines near the start of the file, as long as the line length doesn't change, without reading or writing any of the data that comes after them.


The dd version would be more reliable and portable if you always know where the data to replace is.
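Since dd writes blindly at whatever offset it is given, it may be worth reading the bytes back first and only overwriting when they match what you expect. A minimal sketch, assuming the offset (65) and old value from the example above; the variable names and the temporary sample file are just for illustration:

```shell
#!/bin/sh
# Build a small sample file in a scratch directory (illustration only).
workdir=$(mktemp -d)
file="$workdir/textdata"
printf '%s\n' 'This is line 1' 'This is line 2' \
    'This is the data I want replaced >>11111111<<' \
    'This is another line' 'etc etc until end of file.' > "$file"

offset=65
old=11111111
new=22222222

# Read back exactly as many bytes as we intend to overwrite.
current=$(dd if="$file" bs=1 skip="$offset" count="${#old}" 2>/dev/null)

if [ "$current" = "$old" ] && [ "${#new}" -eq "${#old}" ]
then
        printf '%s' "$new" | dd conv=notrunc of="$file" bs=1 seek="$offset" 2>/dev/null
        status=replaced
else
        echo "Refusing to write: found '$current' at offset $offset" >&2
        status=skipped
fi
echo "$status"
```

With bs=1, skip= counts bytes on the input side the same way seek= counts bytes on the output side, so the same offset works for both the check and the write.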

---------- Post updated at 12:27 PM ---------- Previous update was at 12:06 PM ----------

Another thing you could do is always keep the header separate from the huge file. When you need to feed the data into something, use sed or awk or whatever to produce the modified header, and cat out the rest of the file. (One of the rare useful uses of cat.)
Code:
( sed 's/orig/replacement/' < header ; cat restoffile ) | programusinghugefile
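Getting to that layout means splitting the file once. A rough sketch of how that might look, assuming the header is the first three lines (header_lines is a made-up variable, and the sample file stands in for the real one); the one-time split does have to copy the whole file:

```shell
#!/bin/sh
# Build a small stand-in for the huge file (illustration only).
workdir=$(mktemp -d)
printf '%s\n' 'This is line 1' 'This is line 2' \
    'This is the data I want replaced >>11111111<<' \
    'This is another line' 'etc etc until end of file.' > "$workdir/hugefile"

header_lines=3   # assumption: the value to change lives in the first 3 lines

# One-time split into a small header and the bulk of the data.
head -n "$header_lines" "$workdir/hugefile" > "$workdir/header"
tail -n +"$((header_lines + 1))" "$workdir/hugefile" > "$workdir/restoffile"

# Every later run only edits the tiny header and streams the rest unchanged.
# Here the result is captured; in real use it would be piped to the consumer.
combined=$( { sed 's/11111111/22222222/' "$workdir/header"; cat "$workdir/restoffile"; } )
printf '%s\n' "$combined"
```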


Last edited by Corona688; 09-02-2011 at 03:22 PM.
 

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.