How to replace multiple words together?


 
# 1  
08-25-2015

I am looking for a smart Perl solution that replaces multiple words in one go. My awk script works, but it is very slow on big files.

Code:
awk 'NR==FNR {a[$1]=$2;next} {for ( i in a) gsub(i,a[i])}1' new_word_list.txt oldfile > newfile

new_word_list.txt looks like this:
Code:
AB733  ST756
AB734  ST219
AB11    SG119

Column 1 is the old word and column 2 is the new word.

Last edited by Scrutinizer; 08-25-2015 at 01:43 PM. Reason: code tags
# 2  
08-25-2015
What does the text being replaced look like? If it is regular enough to handle more intelligently than with a brute-force search, that could help a lot. If not, probably not.

A brute-force search in Perl won't be much faster than a brute-force search in awk; the problem is the brute-force search itself.

How big is new_word_list.txt? Does the first column ever contain special characters? If you have GNU awk, it might be possible to build a single combined regex and loop over match().
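The question about special characters matters because the loop in the first post hands each old word to gsub() as a regular expression, which can match anywhere in the line. A small demonstration with made-up entries (not taken from the real list): with the entry AB11 SG119, a header >AB110 becomes >SG1190, and an entry such as A.1 would also rewrite AB1, because . matches any character. The function name brute_force is hypothetical; its body is the original one-liner.

```shell
# Hypothetical wrapper around the one-liner from the first post, used only to
# show how it treats keys: each old word is applied as a regex to every line.
brute_force() {
  # $1 = word list (old new), $2 = file to rewrite
  awk 'NR==FNR {a[$1]=$2; next} {for (i in a) gsub(i, a[i])} 1' "$1" "$2"
}
```

Usage: brute_force new_word_list.txt oldfile > newfile. Note that every input line pays for one gsub() call per list entry, which with 5870 entries is 5870 regex substitutions per line.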
# 3  
08-25-2015
The text consists of big files of protein sequences, like this:

Code:
>AB733
------MDRGCRKENVAVDKRVREAGLRPTRQRIALADLLFAKGDRHLSAEELHEEAQAA
GVPVSL
>AB734
MGFDEKMDLDGRKENVTQAGLLRILAVEGAKTYFDTNTSDHHHFYIEGENRIFDIDSGP
>AB11
DLGCRVRLR_PKRRD

The file new_word_list.txt contains 5870 lines.

Last edited by Scrutinizer; 08-25-2015 at 01:43 PM. Reason: code tags
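In this sample the IDs appear only on header lines starting with >, always as the whole first word. If that holds for the real files, there is no need to search at all: each header ID can be looked up in the array directly, one hash lookup per header line instead of one gsub() per list entry per line. A minimal sketch under that assumption; the function name rename_headers is hypothetical, and sequence lines are printed unchanged.

```shell
# Hypothetical helper: rename FASTA header IDs with one array lookup per header.
# Assumes each ID to replace is the first word of a line starting with ">",
# exactly as in the sample; all other lines are printed unchanged.
rename_headers() {
  awk '
  NR == FNR { a[$1] = $2; next }          # first file: old ID -> new ID
  /^>/ {
      id = substr($1, 2)                  # header ID without the leading ">"
      if (id in a)                        # exact lookup, so AB11 never touches AB110
          $0 = ">" a[id] substr($0, length($1) + 1)   # keep any text after the ID
  }
  { print }
  ' "$1" "$2"
}
```

Usage: rename_headers new_word_list.txt oldfile > newfile. If the old words can also occur inside sequence lines or later in a header, this shortcut does not apply and a combined-regex approach such as Corona688's is needed.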
# 4  
08-25-2015
Perhaps something like this:

Code:
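# Usage (the file name replace.awk is only an example): awk -f replace.awk new_word_list.txt oldfile > newfile

# First file: store each old->new pair and append the old word to the alternation R.
NR==FNR {       A[$1]=$2 ; R=R"|"$1 ; next      }

# Once, at the start of the second file: drop the leading "|" from R.
FNR==1  {       RE=substr(R,2)  }

{
        # Replace the leftmost token and rescan; loops forever if a new word contains an old one.
        while(match($0,RE)) {
                W=substr($0,RSTART,RLENGTH);
                $0=substr($0, 1, RSTART-1) A[W] substr($0, RSTART+RLENGTH);
        }
} 1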

It builds one giant alternation regex in R (AB733|AB734|AB11|...), which is used to check whether a line contains any token of interest. This may be faster than a brute-force search, since the regex engine can test all the alternatives together (as a tree) instead of trying each token in turn. It also avoids calling gsub once per token. It could probably be improved further, too, by continuing the search from the end of the last replacement instead of rescanning the entire line each time.

Last edited by Corona688; 08-25-2015 at 01:04 PM.
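The refinement Corona688 mentions, continuing the search after each replacement instead of rescanning the whole line, could look roughly like the sketch below. It assumes the old words contain only letters and digits, so they are safe to use as regex alternatives and every matched piece of text is a key in A; the function name replace_tokens is hypothetical. Because replaced text is never searched again, it also terminates when a new word happens to contain an old one, which the rescanning loop above would not.

```shell
# Hypothetical helper: forward-scanning variant of the combined-regex approach.
# Assumes the old words (column 1 of the list) contain only letters and digits.
replace_tokens() {
  awk '
  NR == FNR {                                   # first file: old -> new pairs
      if (NF >= 2) { A[$1] = $2; R = R "|" $1 } # skip blank or incomplete lines
      next
  }
  FNR == 1 { RE = substr(R, 2) }                # build the alternation once
  {
      out = ""; rest = $0
      # Copy the text before each match, append the replacement, and keep
      # searching only in the part of the line not yet processed.
      while (RE != "" && match(rest, RE)) {
          out = out substr(rest, 1, RSTART - 1) A[substr(rest, RSTART, RLENGTH)]
          rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
  }
  ' "$1" "$2"
}
```

Usage: replace_tokens new_word_list.txt oldfile > newfile. Each line is still scanned once against the full alternation, so the regex engine's handling of 5870 alternatives remains the main cost.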
# 5  
08-25-2015
Hi.

The thread "Perl script to read string from file#1 and find/replace in file#2" contains Perl code for multiple replacements, written by Aia and MIG.

That code was far faster than a Perl script I had written long ago that produced the same results.

Some modification and testing on your part will probably be needed.

Good luck ... cheers, drl

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed parser behaving strange on replacing multiple words in multiple files

I have 4000 files like $cat clus_grp_seq10_g.phy 18 1002 anig_OJJ65951_1 ATGGTTTCGCAGCGTGATAGAGAATTGTTTAGGGATGATATTCGCTCGCGAGGAACGAAGCTCAATGCTGCCGAGCGCGAGAGTCTGCTAAGGCCATATCTGCCAGATCCGTCTGACCTTCCACGCAGGCCACTTCAGCGGCGCAAGAAGGTTCCTCG aver_OOF92921_1 ... (1 Reply)
Discussion started by: sammy777888

2. Shell Programming and Scripting

Replace particular words in file based on if finds another words in that line

Hi All, I need help replacing particular words in a file if another word is found in that line, i.e. my self is peter@king. i am staying at north sydney. we all are peter@king. How do I replace peter with sham if it finds @king in any line of that file? Please help me... (8 Replies)
Discussion started by: Rajib Podder

3. Shell Programming and Scripting

How to replace words in file?

Hi Guys, I have a text where Ram is used 10 times, and now I want to replace every Ram with Shyam. How do I do it? (6 Replies)
Discussion started by: aaditya321

4. UNIX for Dummies Questions & Answers

Replace the words in the file to the words that user type?

Hello, I would like to change a setting in a file to the value that the user inputs. For example, by default it is ONBOOT=ON. When the user keys in "YES", it should become ONBOOT=YES -------------- This code only adds the entire user input, but doesn't replace it. How do I go about... (5 Replies)
Discussion started by: malfolozy

5. Shell Programming and Scripting

Replace specific words with nothing

Hi, I have a file that contains information about genes, exons and introns, each made into a single string. I just want to get the gene name alone, without any extra information. intergenic_Nedd4_exon_0_F Gapvd1_intron_24_R Gapvd1_exon_25_R My output file should be intergenic_Nedd4 Gapvd1... (13 Replies)
Discussion started by: raj_k

6. Shell Programming and Scripting

Shell script to find out words, replace them and count words

Hello, I'd like your help with a bash script which: 1. finds, inside the html file (attached to my post), the code number of the Latest Stable Kernel, 2. finds the link which leads to the download location of the Latest Stable Kernel version (the right link should lead to the file... (3 Replies)
Discussion started by: alex83

7. Shell Programming and Scripting

Replace words with the first characters

Hello folks, I have a simple request but I can't find a simple solution. Here is my problem. I have some dates, and I need to replace the months with only their first 3 characters (jan for january, feb for february, ... all in lower case) ~$ echo '3 october 2010' | sed 3 oct 2010 I thought of something... (8 Replies)
Discussion started by: tukuyomi

8. Shell Programming and Scripting

Replace all the words containing a particular pattern

Hi, please help with this one. I have a file... I need to replace every word containing a particular pattern, say "xyz", replacing the entire word containing xyz with abc. Before: xyzwork connect connect xyz disconnect raxyz After the operation I want something like below: abc... (4 Replies)
Discussion started by: ningy

9. Shell Programming and Scripting

want to replace some words with other from a list

I have a list file which contains both the words to be replaced and the words to replace them with... I need to run a script which will accept the list name and replace all the occurrences... e.g. the list file contains something like hi=bye go=come Now I want to replace the words hi with bye and go... (3 Replies)
Discussion started by: dixitked