replace issue with large files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting replace issue with large files
# 1  
Old 11-27-2009
replace issue with large files

I have the following problem:
I have two files: S containing sentences (one in each row) and W containing files (one in each row). It might look like this:

S:
a b c apple d.
e f orange g.
h banana i j.

W:
orange
banana
apple

My task is to replace in S all words that appear in W with "X". How can I do that efficiently, if both W and S are large (100K lines)?

The output should look like this:
a b c X d.
e f X g.
h X i j.

Could you help me please?
# 2  
Old 11-27-2009
awk:
Code:
awk ' FILENAME=="W" {arr[$0]++}
        FILENAME=="S" {
            for(i=1; i<=NF; i++) { 
                 if($i in arr) {
                     printf(" X ") 
                 } 
                 else {
                     printf("%s ", $i  )  
                 }    
            }
            printf("\n")
        } ' W S > newfile

# 3  
Old 11-27-2009
Code:
awk 'NR==FNR{a[$0];next}{for(i=0;++i<=NF;){$i=($i in a)?"X":$i}}1' W S > newfile

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Large File masking incorrectly happening Ç delimeter issue

The OS version is Red Hat Enterprise Linux Server release 6.10 I have a script to mask some columns with **** in a data file which is delimeted with Ç , I am using awk for the masking , when I try to mask a small file the awk works fine and masks the required column , but when the file is... (6 Replies)
Discussion started by: LinuxUser8092
6 Replies

2. Shell Programming and Scripting

Large search replace using sed results in memory problem.

I have one big file of size 9GB (big_file.txt). This big file has sentences and paragraphs like any usual English document. I have another file consisting of replacement strings for sed to use. The file name is replace.sed and each entry in one line looks like this: s/\<shout\>/shout/g s/\<b is... (2 Replies)
Discussion started by: shoaibjameel123
2 Replies

3. Shell Programming and Scripting

Performance issue in Grepping large files

I have around 300 files(*.rdf,*.fmb,*.pll,*.ctl,*.sh,*.sql,*.prog) which are of large size. Around 8000 keywords(which will be in the file $keywordfile) needed to be searched inside those files. If a keyword is found in a file..I have to insert the filename,extension,catagoery,keyword,occurrence... (8 Replies)
Discussion started by: millan
8 Replies

4. UNIX for Dummies Questions & Answers

Large file data handling issue

I have a single record large file, semicolon ';' and pipe '|' separated. I am doing a vi on the file. It is throwing an error "File to long" I need to actually remove the last | symbol from this file. sed -e 's/\|*$//' filename is working fine for small files. But not working on this big... (13 Replies)
Discussion started by: Gurkamal83
13 Replies

5. Shell Programming and Scripting

Severe performance issue while 'grep'ing on large volume of data

Background ------------- The Unix flavor can be any amongst Solaris, AIX, HP-UX and Linux. I have below 2 flat files. File-1 ------ Contains 50,000 rows with 2 fields in each row, separated by pipe. Row structure is like Object_Id|Object_Name, as following: 111|XXX 222|YYY 333|ZZZ ... (6 Replies)
Discussion started by: Souvik
6 Replies

6. Solaris

How to safely copy full filesystems with large files (10Gb files)

Hello everyone. Need some help copying a filesystem. The situation is this: I have an oracle DB mounted on /u01 and need to copy it to /u02. /u01 is 500 Gb and /u02 is 300 Gb. The size used on /u01 is 187 Gb. This is running on solaris 9 and both filesystems are UFS. I have tried to do it using:... (14 Replies)
Discussion started by: dragonov7
14 Replies

7. Shell Programming and Scripting

Divide large data files into smaller files

Hello everyone! I have 2 types of files in the following format: 1) *.fa >1234 ...some text... >2345 ...some text... >3456 ...some text... . . . . 2) *.info >1234 (7 Replies)
Discussion started by: ad23
7 Replies

8. Shell Programming and Scripting

highly specific search and replace for a large number of files

hey guys, I have a directory with about 600 files. I need to find a specific word inside a command and replace only that instance of the word in many files. For example, lets say I have a command called 'foo' in many files. One of the input arguments of the 'foo' call is 'bar'. The word 'bar'... (5 Replies)
Discussion started by: ksubrama
5 Replies

9. UNIX for Dummies Questions & Answers

large files?

How do we check 'large files' is enabled on a Unix box -- HP-UX B11.11 (2 Replies)
Discussion started by: ranj@chn
2 Replies

10. UNIX for Dummies Questions & Answers

Large files

I am trying to understand the webserver log file for an error which has occured on my live web site. The webserver access file is very big in size so it's not possible to open this file using vi editor. I know the approximate time the error occured, so i am interested in looking for the log file... (4 Replies)
Discussion started by: sehgalniraj
4 Replies
Login or Register to Ask a Question