Search, replace string in file1 with string from (lookup table) file2?


Login or Register for Dates, Times and to Reply

 
Thread Tools Search this Thread
# 1  
Search, replace string in file1 with string from (lookup table) file2?

Hello: I have another question. Please consider the following two sample, tab-delimited files:

File_1:

Abf1 YKL112w
Abf1 YAL054c
Abf1 YGL234w
Ace2 YKL150w
Ace2 YNL328c
Cup9 YDR441c
Cup9 YDR442w
Cup9 YEL040w



File 2:


ABF1 YKL112W
ACE2 YLR131C
CUP9 YPL177C
...

File_2 is a “lookup table;” I want to replace $1 in File_1 with the matching $2 field in File_2, additionally adding a middle column containing the string “tf”, and a column of “ones” (“1” in the first column position), all tab-delimited.

Additionally, it would be ideal if the case could be ignored for the search / replace, but that the alphabetical output be all uppercase [a-z] converted to [A-Z].

FYI, these are yeast genes; in addition to numbers and letters, some of the genes will contain dashes (e.g., YBR162W-A), but none will contain commas, semicolons, spaces, etc.

Output File_3:

1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...

This is related to (but different from) my earlier query,

https://www.unix.com/shell-programmin...#post302183287

Here, the first column is a “dummy” weight value, to maintain “field compatibility,” with my earlier file, as shown in this example:

1 a gi b
1 a pp a
1 a pp c
1 t gi u
1 t gi w
1 t gi x
1 t pp z
2 a pp d
2 a pp e
2 t gi v
2 t gi z
3 a pp b
3 t gi y
...

Ultimately, I will end up with a file like this, with $1 = weight, $2 = gene1, $3 = association, $4 = gene2:


1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...
1 YBL012C gi YCL045C
1 YBL012C pp YBL012C
5 YBL012C pp YHR039C-A
1 YLR363W-A gi YNL143C
4 YLR363W-A gi YPR123C
1 YLR363W-A gi YLR467W
1 YLR363W-A pp YNR073C
2 YBL012C pp YGL232W
2 YBL012C pp YOR102W
2 YLR363W-A gi YFL066C
2 YLR363W-A gi YNR073C
3 YBL012C pp YCL045C
3 YLR363W-A gi YKL100C
...

Thank you - Once again, *very* much appreciated!

Sincerely, Greg S. :-)
# 2  
Quote:
Originally Posted by gstuart
Hello: I have another question. Please consider the following two sample, tab-delimited files:

File_1:

Abf1 YKL112w
Abf1 YAL054c
Abf1 YGL234w
Ace2 YKL150w
Ace2 YNL328c
Cup9 YDR441c
Cup9 YDR442w
Cup9 YEL040w



File 2:


ABF1 YKL112W
ACE2 YLR131C
CUP9 YPL177C
...

File_2 is a “lookup table;” I want to replace $1 in File_1 with the matching $2 field in File_2, additionally adding a middle column containing the string “tf”, and a column of “ones” (“1” in the first column position), all tab-delimited.

Additionally, it would be ideal if the case could be ignored for the search / replace, but that the alphabetical output be all uppercase [a-z] converted to [A-Z].

FYI, these are yeast genes; in addition to numbers and letters, some of the genes will contain dashes (e.g., YBR162W-A), but none will contain commas, semicolons, spaces, etc.

Output File_3:

1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...
This should give the desired output:

Code:
 awk '
FNR==NR{a[tolower($1)]=$2;next} 
tolower($1) in a{print "1 " a[tolower($1)] " tf " toupper($2)}
' "File_2" "File_1"

Regards
# 3  
This is absolutely wonderful! ... :-)

Here is my understanding of Franklin52's code:

Unix Manuals - AWK Reference

# == is “is equal”

tolower(string): Return the string with all upper case characters replaced with their lower case equivalents.

toupper(string): Return the string with all lower case characters replaced with their upper case equivalents.

FNR: Record number in input file.

NR: Number of records processed.

Thus, the above script translates (? - please correct me if I am mistaken) as

awk’
FNR==NR{a[tolower($1)]=$2;next}

while the record number (line) equals the total number of records (is true), do all of the following:
get $1 (the common gene name - converted to LOWERcase - required since the corresponding field in File_1 is lowercase; otherwise, it will fail to “match” - linux is case-sensitive) in the lookup file (File_2), set (change it) to the (already uppercase) systematic gene name ($2) in the same lookup table, then read the next record number (line);

tolower($1) in a{print "1 " a[tolower($1)] " tf " toupper($2)}

now, for each $1 in File_2 (now set to uppercase $2, from the lookup table), in the second file (File_1, the one to be converted), print
“1”, $2 from File_2; “tf”, $2 from File_1 (returned as uppercase, to convert the trailing lowercase c, w, -a, etc.)

' "File_2" "File_1"

File_1 = file to be processed (converted)
File_2 = “lookup file” ("common_to_systematic.tab)

?!


This works brilliantly!! Thank you so much, Franklin52!!

Have a super weekend! ... Greg :-)
# 4  
Renaming multiple files from a lookup table

Is it possible to modify the script above to rename files based on a lookup table?

e.g.:
Current New
A87324.jpg A1372365.jpg
A89732.jpg A98274.jpg
A130347.jpg A73689.jpg
...

Thanks,

Rick
# 5  
Code:
#!/bin/ksh

while read current new x
do
   mv "${current}" "${new}"
done < /path/to/lookupFile

# 6  
hi guys!!

I am new to shell script.. i wanted to know abt sed command and how it work?

here is what i want do, i want to search original string in export.txt file which is:
export mibs =\opt\mymibs\

i want to replace it by
export mibs =\opt\new_mibs\

Please help with it

thanks in advance

Last edited by allrise123; 05-23-2009 at 06:17 PM..
# 7  
Quote:
Originally Posted by vgersh99
Code:
#!/bin/ksh

while read current new x
do
   mv "${current}" "${new}"
done < /path/to/lookupFile

hi vgersh99,
could u solve my above query? thanks

Last edited by allrise123; 05-23-2009 at 08:26 PM..
Login or Register for Dates, Times and to Reply

Previous Thread | Next Thread
Thread Tools Search this Thread
Search this Thread:
Advanced Search

Test Your Knowledge in Computers #522
Difficulty: Medium
11,223 = 0x2BD7
True or False?

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Search partial string in a file and replace the string - UNIX

I have the below string which i need to compare with a file and replace this string in the file which matches closely. Can anyone help me on this. string(Scenario 1)- user::r--,user::ourfrd:r-- String(Scenario 2)- user::r-- File **** # file: /local/Desktop/myfile # owner: me # group:... (6 Replies)
Discussion started by: sarathy_a35
6 Replies

2. Shell Programming and Scripting

Lookup value of file1 in file2 using a key

Trying to use awk to match each line in file1 with line in file2 using $1 and $2 and print. File2 is tab-delimeted as is the output and if there is no match then it is skipped. The awk below executes but the output is empty. I think file1 is being split on the : and being saved in array c which... (3 Replies)
Discussion started by: cmccabe
3 Replies

3. Shell Programming and Scripting

awk to search field2 in file2 using range of fields file1 and using match to another field in file1

I am trying to use awk to find all the $2 values in file2 which is ~30MB and tab-delimited, that are between $2 and $3 in file1 which is ~2GB and tab-delimited. I have just found out that I need to use $1 and $2 and $3 from file1 and $1 and $2of file2 must match $1 of file1 and be in the range... (6 Replies)
Discussion started by: cmccabe
6 Replies

4. Shell Programming and Scripting

Match part of string in file2 based on column in file1

I have a file containing texts and indexes. I need the text between (and including ) INDEX and number "1" alone in line. I have managed this: awk '/INDEX/,/1$/{if (!/1$/)print}' file1.txt It works for all indexes. And then I have second file with years and indexes per year, one per line... (3 Replies)
Discussion started by: phoebus
3 Replies

5. UNIX for Dummies Questions & Answers

if matching strings in file1 and file2, add column from file1 to file2

I have very limited coding skills but I'm wondering if someone could help me with this. There are many threads about matching strings in two files, but I have no idea how to add a column from one file to another based on a matching string. I'm looking to match column1 in file1 to the number... (3 Replies)
Discussion started by: pathunkathunk
3 Replies

6. Shell Programming and Scripting

How to retrieve a number or string from file1 and redirect into file2 in perl script?

hello forum members, I am siva ,As i am new to perl scripting i looking help from forum members. i need a sample program are command for pattern matching. I have file name infile1 which some data, I need to search the particular number are string in the file which repeats n number of... (0 Replies)
Discussion started by: workforsiva
0 Replies

7. Shell Programming and Scripting

search from file1 and replace into file2

I have 2 files: file1.txt: 1|15|XXXXXX||9630716||0096000||30/04/2012|E|O|X||||20120525135617-30.04.2012|PAT66OLM|STA||||00001|STA_0096000_YYYPPPXTMEX00_20120525135617_02_P.pdf|... (2 Replies)
Discussion started by: pparthiv
2 Replies

8. Shell Programming and Scripting

how to find string from file1 in file2

hi; i am looking for simple search script that find string from file1 in file 2 file 1 contain a loot of string like: 204080111111111 204080222222222 204080333333333 in each row and i would like to take the first row for example 204080111111111 from file1 and find it in file2 when it... (1 Reply)
Discussion started by: kpinto
1 Replies

9. Shell Programming and Scripting

Search & replace fields from file1 to file2

hi, I have two xml files with the name source.xml and tobe_replaced.xml. Sample data: source.xml contains: <?xml version="1.0"?> <product description="prod1" product_info="some/info"> <product description="prod2" product_info="xyz/allinfo"> <product description="abc/partialinfo"... (2 Replies)
Discussion started by: dragon.1431
2 Replies

10. UNIX for Dummies Questions & Answers

string replacement using a lookup table

Dear all thanks for helping in advance.. Know this should be fairly simple but I failed in searching for an answer. I have a file (replacement table) containing two columns, e.g.: ACICJ ACIDIPHILIUM ACIF2 ACIDITHIOBACILLUS ACIF5 ACIDITHIOBACILLUS ACIC5 ACIDOBACTERIUM ACIC1 ACIDOTHERMUS... (10 Replies)
Discussion started by: roussine
10 Replies

Featured Tech Videos