
04-11-2008
Quote:
Originally Posted by gstuart
Hello: I have another question. Please consider the following two sample, tab-delimited files:
File_1:
Abf1 YKL112w
Abf1 YAL054c
Abf1 YGL234w
Ace2 YKL150w
Ace2 YNL328c
Cup9 YDR441c
Cup9 YDR442w
Cup9 YEL040w
…
File_2:
…
ABF1 YKL112W
ACE2 YLR131C
CUP9 YPL177C
...
File_2 is a “lookup table”: I want to replace $1 in File_1 with the matching $2 field from File_2, add a middle column containing the string “tf”, and add a leading column of ones (“1” in the first column position), with all fields tab-delimited.
It would also be ideal if case were ignored for the search / replace, and if all letters in the output were converted to uppercase ([a-z] to [A-Z]).
FYI, these are yeast genes; in addition to numbers and letters, some of the genes will contain dashes (e.g., YBR162W-A), but none will contain commas, semicolons, spaces, etc.
Output File_3:
1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...
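This should give the desired output. The first rule loads File_2 into an array keyed on the lowercased name; the second prints a line for each File_1 row whose name is found in that array: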
Code:
awk '
BEGIN{OFS="\t"}                                  # tab-delimited output, as requested
FNR==NR{a[tolower($1)]=toupper($2);next}         # first file: build the lookup table
tolower($1) in a{print 1, a[tolower($1)], "tf", toupper($2)}
' "File_2" "File_1"
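To try the approach on a small scale, here is a minimal sketch that builds two sample files from a few of the rows in the question and runs the same two-pass awk over them. The scratch directory, the `map` array name, and the `out` variable are illustrative assumptions, not part of the original command; output fields are joined with tabs via OFS.

```shell
# Hypothetical scratch directory; sample rows copied from the question.
tmp=$(mktemp -d)
printf 'Abf1\tYKL112w\nAbf1\tYAL054c\nAce2\tYKL150w\nCup9\tYDR441c\n' > "$tmp/File_1"
printf 'ABF1\tYKL112W\nACE2\tYLR131C\nCUP9\tYPL177C\n' > "$tmp/File_2"

# Pass 1 (FNR==NR is true only for the first file) loads the lookup table,
# keyed on the lowercased name. Pass 2 prints four tab-separated columns
# for every File_1 row whose lowercased name is in the table.
out=$(awk '
BEGIN { OFS = "\t" }
FNR == NR { map[tolower($1)] = toupper($2); next }
tolower($1) in map { print 1, map[tolower($1)], "tf", toupper($2) }
' "$tmp/File_2" "$tmp/File_1")

printf '%s\n' "$out"
rm -rf "$tmp"
```

For these four input rows this prints four lines, the first being 1, YKL112W, tf, YKL112W separated by tabs. Rows in File_1 whose name has no entry in File_2 are silently skipped, so it may be worth comparing line counts if every row is expected to match.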
Regards