Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts here.
| Thread | Thread Starter | Forum | Replies | Last Post |
| Perl: Search for string on line then search and replace text | Crypto | Shell Programming and Scripting | 4 | 01-04-2008 06:24 AM |
| Need to replace all occurences of a search string using sed | mjs3221 | Shell Programming and Scripting | 2 | 12-06-2006 10:09 PM |
| String Search & Replace | IwishIknewC | UNIX for Dummies Questions & Answers | 1 | 03-25-2006 02:28 AM |
| Search and replace string between 2 points | whited05 | Shell Programming and Scripting | 3 | 10-11-2005 11:05 AM |
| string search replace | krishna | UNIX for Advanced & Expert Users | 1 | 12-19-2001 09:49 AM |
Search, replace string in file1 with string from (lookup table) file2?
Hello: I have another question. Please consider the following two sample tab-delimited files:

File_1:
Abf1 YKL112w
Abf1 YAL054c
Abf1 YGL234w
Ace2 YKL150w
Ace2 YNL328c
Cup9 YDR441c
Cup9 YDR442w
Cup9 YEL040w
…

File_2:
…
ABF1 YKL112W
ACE2 YLR131C
CUP9 YPL177C
...

File_2 is a “lookup table”: I want to replace $1 in File_1 with the matching $2 field in File_2, additionally adding a middle column containing the string “tf” and a column of ones (“1”) in the first column position, all tab-delimited. Ideally, case would be ignored for the search/replace, but all letters in the output would be uppercase ([a-z] converted to [A-Z]). FYI, these are yeast genes; in addition to numbers and letters, some of the gene names contain dashes (e.g., YBR162W-A), but none contain commas, semicolons, spaces, etc.

Output File_3:
1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...

This is related to (but different from) my earlier query, “Molecular biologist requires help re: search / replace script”. Here, the first column is a “dummy” weight value, to maintain “field compatibility” with my earlier file, as shown in this example:
1 a gi b
1 a pp a
1 a pp c
1 t gi u
1 t gi w
1 t gi x
1 t pp z
2 a pp d
2 a pp e
2 t gi v
2 t gi z
3 a pp b
3 t gi y
...

Ultimately, I will end up with a file like this, with $1 = weight, $2 = gene1, $3 = association, $4 = gene2:
1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...
1 YBL012C gi YCL045C
1 YBL012C pp YBL012C
5 YBL012C pp YHR039C-A
1 YLR363W-A gi YNL143C
4 YLR363W-A gi YPR123C
1 YLR363W-A gi YLR467W
1 YLR363W-A pp YNR073C
2 YBL012C pp YGL232W
2 YBL012C pp YOR102W
2 YLR363W-A gi YFL066C
2 YLR363W-A gi YNR073C
3 YBL012C pp YCL045C
3 YLR363W-A gi YKL100C
...

Thank you! Once again, *very* much appreciated!
Sincerely,
Greg S. :-)
Code:
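awk '
BEGIN { OFS = "\t" }                                 # tab-delimited output, as requested
FNR==NR { a[tolower($1)] = toupper($2); next }       # File_2: key = lowercased common name
tolower($1) in a { print "1", a[tolower($1)], "tf", toupper($2) }   # File_1: matched lines only
' "File_2" "File_1"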
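To try this end to end, here is a minimal sketch, assuming a POSIX shell with `mktemp` and `awk` available. It writes the sample rows from the question into File_1 and File_2 inside a throwaway directory, then runs the same lookup/replace logic with tab-separated output (`OFS` set to a tab) and saves the result as File_3. The `YLR439W` row in the question's sample output comes from File_1 lines that were elided ("…"), so it is not reproduced here.

```shell
#!/bin/sh
# End-to-end run on the sample rows from the question.
# Uses a temporary directory so no existing files are touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# File_1: common name <TAB> target gene (mixed case, as in the question).
# printf reuses its format string until all arguments are consumed.
printf '%s\t%s\n' Abf1 YKL112w Abf1 YAL054c Abf1 YGL234w \
    Ace2 YKL150w Ace2 YNL328c \
    Cup9 YDR441c Cup9 YDR442w Cup9 YEL040w > File_1

# File_2: lookup table, common name <TAB> systematic name
printf '%s\t%s\n' ABF1 YKL112W ACE2 YLR131C CUP9 YPL177C > File_2

# Lookup table is read first (FNR == NR), then File_1 is converted
awk '
BEGIN { OFS = "\t" }
FNR == NR { a[tolower($1)] = toupper($2); next }
tolower($1) in a { print "1", a[tolower($1)], "tf", toupper($2) }
' File_2 File_1 > File_3

cat File_3
```

File_3 then holds eight tab-delimited lines, starting with `1 YKL112W tf YKL112W` and ending with `1 YPL177C tf YEL040W`.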
This is absolutely wonderful! ... :-)
Here is my understanding of Franklin52's code (from Unix Manuals - AWK Reference):
== means “is equal to”.
tolower(string): returns the string with all uppercase characters replaced by their lowercase equivalents.
toupper(string): returns the string with all lowercase characters replaced by their uppercase equivalents.
FNR: record (line) number within the current input file; it starts again at 1 for each new file.
NR: total number of records read so far, counted across all input files.

Thus, the above script translates as follows (? - please correct me if I am mistaken):

FNR==NR{...}: FNR equals NR only while awk is reading the first file on the command line, File_2 (the lookup table), because FNR restarts at 1 when the second file begins while NR keeps counting. For each line of File_2, take $1 (the common gene name), convert it to lowercase, and use it as a key in array a, storing the systematic gene name ($2) as its value; then “next” skips to the following line, so the second rule is never applied to File_2. Lowercasing is needed because awk string comparison is case-sensitive and the two files capitalize the common names differently (Abf1 vs. ABF1); otherwise nothing would match.

tolower($1) in a{print ...}: this rule is only reached for lines of File_1. If the lowercased $1 (e.g., abf1) is a key in a, print “1”, the systematic name looked up from File_2, “tf”, and $2 from File_1 converted to uppercase (to convert the trailing lowercase c, w, -a, etc.). Lines of File_1 whose $1 has no entry in File_2 are skipped.

' "File_2" "File_1": File_2 = the “lookup file” (common_to_systematic.tab), listed first; File_1 = the file to be processed (converted).

This works brilliantly!! Thank you so much, Franklin52!! Have a super weekend! ... Greg :-)
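The FNR/NR behaviour described above can be checked directly. This is a small sketch, assuming a POSIX shell and awk, that prints FILENAME, FNR and NR for every line of two short cuts of the question's sample files; the labels “first-file” and “later-file” are only illustrative names, not anything awk defines.

```shell
#!/bin/sh
# Trace FNR and NR while awk reads two files, to show why FNR == NR
# selects only the first file named on the command line.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Two-line lookup table and three-line data file (rows from the question)
printf '%s\t%s\n' ABF1 YKL112W ACE2 YLR131C > File_2
printf '%s\t%s\n' Abf1 YKL112w Ace2 YKL150w Cup9 YDR441c > File_1

# FNR restarts at 1 when File_1 begins; NR keeps counting across files.
awk '{ print FILENAME, FNR, NR, (FNR == NR ? "first-file" : "later-file") }' \
    File_2 File_1 > trace.txt
cat trace.txt
```

For File_2 the trace shows FNR equal to NR (1 1, then 2 2); on File_1, FNR restarts at 1 while NR continues at 3. One caveat of the idiom: if the lookup file is empty, FNR == NR also holds for every line of File_1, so File_1 would be loaded into the array and nothing printed. Testing `FILENAME == ARGV[1]` instead avoids that edge case.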