Replacing text based on replacement tables

# 1  
Old 05-17-2009

Dear all, I would be grateful for your advice.
The need is (I guess) simple for UNIX experts. Basically, there are replacement tables that would be used to replace text strings in the data (large volumes).

An example table (a "config file"):
VIFIS1_1_PE1836 VIBRIO_FISCHERI
VIPAR1_1_PE1662 VIBRIO_PARAHAEMOLYTICUS
VIPAR1_1_PE2235 VIBRIO_PARAHAEMOLYTICUS
VIPAR1_2_PE1355 VIBRIO_PARAHAEMOLYTICUS
VIVUL1_1_PE1801 VIBRIO_VULNIFICUS

Every instance of a first-column string should be replaced with its counterpart in the same row. The replacement needs to be done in other files but be based on such a config file. The format of this table is not fixed, so I can change it to whatever suits a replacement script. Column contents will vary, so the script should be more or less universal, relying only on the relative positions of the query string and the replacement string.

Can anyone think of an easy way to do it? Thanks very much!
# 2  
Old 05-17-2009
Use Perl. Read the config file into a hash with column 1 as the key and column 2 as its value. Then loop over the target files and use the hash to make the substitutions.
# 3  
Old 05-17-2009
Quote:
Originally Posted by roussine
Basically, there are replacement tables that would be used to replace text strings in the data (large volumes).

An example table (a "config file"):
VIFIS1_1_PE1836 VIBRIO_FISCHERI
VIPAR1_1_PE1662 VIBRIO_PARAHAEMOLYTICUS
VIPAR1_1_PE2235 VIBRIO_PARAHAEMOLYTICUS
VIPAR1_2_PE1355 VIBRIO_PARAHAEMOLYTICUS
VIVUL1_1_PE1801 VIBRIO_VULNIFICUS
The following command line works under the assumption that every key in the first table column is a string made only of characters in the class [A-Z0-9_]. The only assumption about the format of the input file text.txt is that each key must be delimited by at least one character outside that class. This requirement lets us break the input into "words", so we can do a table lookup and replace only the keys that actually occur, instead of blindly trying to replace each and every key.
Natural-language text should be fine as input.

Code:
awk -F'[^A-Z0-9_]+' 'FNR==NR{a[$1]=$2;next}{for(n=1;n<=NF;n++)if($n in a)gsub($n,a[$n])}1' table text.txt > result.txt

If the input file has some known structure then perhaps the code can be made more efficient.
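
For comparison, here is a rough Python sketch of the same word-lookup idea, under the same assumption that keys consist only of characters in [A-Z0-9_]. The function names load_table and replace_words are made up for illustration. One difference from the awk line: the substitution here touches only whole words, whereas gsub() replaces every occurrence of a matched key anywhere on the line.

```python
import re

def load_table(lines):
    """Map first column to second column; short or blank lines are skipped."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table

# A "word" is a maximal run of the characters allowed in a key.
WORD = re.compile(r"[A-Z0-9_]+")

def replace_words(text, table):
    """Replace each whole word that is a key; leave everything else untouched."""
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), text)

table = load_table(["VIFIS1_1_PE1836 VIBRIO_FISCHERI",
                    "VIVUL1_1_PE1801 VIBRIO_VULNIFICUS"])
print(replace_words("gene=VIFIS1_1_PE1836; see (VIVUL1_1_PE1801).", table))
```

Each word costs one dictionary lookup, so the running time should not grow with the number of rows in the table.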
# 4  
Old 05-17-2009
Associative arrays in awk work well. Even on gram negative bacteria.

Code:
awk 'FILENAME=="table" {arr[$1]=$2}
     FILENAME=="bigfile" { for(i=1; i<=NF; i++) {if($i in arr) {$i=arr[$i]}}
                           print $0
                         } ' table bigfile > newbigfile

# 5  
Old 05-17-2009
Quote:
Originally Posted by jim mcnamara
Code:
awk 'FILENAME=="table" {arr[$1]=$2}
     FILENAME=="bigfile" { for(i=1; i<=NF; i++) {if($i in arr) {$i=arr[$i]}}
                           print $0
                         } ' table bigfile > newbigfile

Unfortunately this works only when bigfile has whitespace around each bacterial identifier. Another quirk is that awk rebuilds any line in which a field is assigned, so runs of whitespace on those lines are collapsed to single spaces in the output.

It may be good enough for the original poster, depending on the structure of the input files he/she is planning to process.
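
To illustrate the whitespace point, here is a small Python sketch (not a fix to the awk above) that does the same field-by-field lookup but keeps the original spacing. The name replace_fields is made up for this example, and the identifiers still have to be surrounded by whitespace.

```python
import re

def replace_fields(line, table):
    """Swap whitespace-delimited fields found in table, keeping the spacing."""
    # A capturing group makes re.split return the separators as well,
    # so joining the parts back together restores the original whitespace.
    parts = re.split(r"(\s+)", line)
    return "".join(table.get(part, part) for part in parts)

table = {"VIVUL1_1_PE1801": "VIBRIO_VULNIFICUS"}
print(repr(replace_fields("blah\t VIVUL1_1_PE1801   blah\n", table)))
```

The separators are never keys in the table, so they pass through unchanged.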

Last edited by colemar; 05-17-2009 at 08:44 PM.. Reason: s/virus/bacteria/
# 6  
Old 05-17-2009
If your system has Python:
Code:
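import fileinput

# Build the lookup table: first column is the key, last column its replacement.
table = {}
with open("table") as f:
    for line in f:
        fields = line.split()
        if fields:
            table[fields[0]] = fields[-1]

# Rewrite "file" in place, keeping a backup copy as "file.txt".
for line in fileinput.input("file", inplace=True, backup=".txt"):
    line = line.rstrip("\n")
    for key, value in table.items():
        line = line.replace(key, value)
    print(line)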
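
A possible variation, sketched under my own assumptions rather than taken from the thread: calling replace() once per key scans every line once for each table row, and text that has already been substituted gets scanned again by later keys. Compiling all keys into one alternation handles each line in a single pass. The helper name make_replacer and the second table entry are made up for illustration.

```python
import re

def make_replacer(table):
    """Return a function that applies every table entry in one pass."""
    if not table:
        return lambda text: text
    # Longest keys first, so a key that is a prefix of another cannot win.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)

table = {"VIPAR1_1_PE1662": "VIBRIO_PARAHAEMOLYTICUS",
         "VIPAR1_1_PE16": "SHORTER_KEY"}  # made-up entry, for illustration only
replace = make_replacer(table)
print(replace("x VIPAR1_1_PE1662 y VIPAR1_1_PE16 z"))
```

Like the loop above, this still matches keys as plain substrings, so a key embedded in a longer identifier that is not itself in the table would be replaced too.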

# 7  
Old 05-17-2009
And in case you have Perl:

Code:
$ 
$ cat config.txt
VIFIS1_1_PE1836 VIBRIO_FISCHERI
VIPAR1_1_PE1662 VIBRIO_PARAHAEMOLYTICUS
VIPAR1_1_PE2235 VIBRIO_PARAHAEMOLYTICUS
VIPAR1_2_PE1355 VIBRIO_PARAHAEMOLYTICUS
VIVUL1_1_PE1801 VIBRIO_VULNIFICUS      
$                                      
$ cat bigfile.txt                      
blah blah blah VIFIS1_1_PE1836 blah blah
blah blah blah                          
blah blah VIVUL1_1_PE1801 blah blah VIPAR1_1_PE2235
blah blah                                          
$                                                  
$
$ perl -ne 'BEGIN { open(CF,"config.txt");
>                   while (<CF>) {@f=split; $x{$f[0]}=$f[1]}
>                   close(CF)
>                 }
> { chomp; @y=split/ /; $buf="";
>   foreach $i (@y) {$buf .= (defined($x{$i}) ? $x{$i} : $i)." "};
>   $buf=~s/ $//; print "$buf\n";
> }' bigfile.txt
blah blah blah VIBRIO_FISCHERI blah blah
blah blah blah
blah blah VIBRIO_VULNIFICUS blah blah VIBRIO_PARAHAEMOLYTICUS
blah blah
$
$

tyler_durden