Search, replace string in file1 with string from (lookup table) file2? Post: 302184384

Forum: Shell Programming and Scripting. Post 302184384 by Franklin52 on Friday 11th of April 2008 10:08:12 AM

Quote:

Originally Posted by gstuart

Hello: I have another question. Please consider the following two sample, tab-delimited files:

File_1:

Abf1 YKL112w
Abf1 YAL054c
Abf1 YGL234w
Ace2 YKL150w
Ace2 YNL328c
Cup9 YDR441c
Cup9 YDR442w
Cup9 YEL040w
...

File_2:

...
ABF1 YKL112W
ACE2 YLR131C
CUP9 YPL177C
...

File_2 is a "lookup table"; I want to replace $1 in File_1 with the matching $2 field in File_2, additionally adding a middle column containing the string "tf", and a column of "ones" ("1" in the first column position), all tab-delimited.

Additionally, it would be ideal if case could be ignored for the search / replace, and if all letters in the output were converted to uppercase ([a-z] to [A-Z]).

FYI, these are yeast genes; in addition to numbers and letters, some of the genes will contain dashes (e.g., YBR162W-A), but none will contain commas, semicolons, spaces, etc.

Output File_3:

1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...

This should give the desired output:

Code:

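 awk '
BEGIN{OFS="\t"}   # the question asks for tab-delimited output
# File_2 is read first: map each lowercased name to its uppercased ID
FNR==NR{a[tolower($1)]=toupper($2);next}
# File_1: print a row for every name found in the lookup table
tolower($1) in a{print 1, a[tolower($1)], "tf", toupper($2)}
' "File_2" "File_1"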
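
To try this without touching real data, here is a small self-contained sketch. It wraps a variant of the awk program above (with OFS set to a tab) in a shell function and runs it on a few rows from the question. The function name lookup_replace and the temporary directory are my own assumptions for illustration; they are not part of the original request.

```shell
# lookup_replace LOOKUP_FILE DATA_FILE
# Hypothetical helper name: replaces $1 of DATA_FILE with the matching
# $2 of LOOKUP_FILE, ignoring case, and prints tab-delimited uppercase rows.
lookup_replace() {
    awk '
    BEGIN { OFS = "\t" }
    # First file: remember lowercased name -> uppercased ID
    FNR == NR { map[tolower($1)] = toupper($2); next }
    # Second file: print only the rows whose name is in the table
    tolower($1) in map { print 1, map[tolower($1)], "tf", toupper($2) }
    ' "$1" "$2"
}

# Sample rows taken from the question, written with real tabs
# into a scratch directory (assumed location).
workdir=$(mktemp -d)
printf 'Abf1\tYKL112w\nAbf1\tYAL054c\nAce2\tYKL150w\nCup9\tYDR441c\n' > "$workdir/File_1"
printf 'ABF1\tYKL112W\nACE2\tYLR131C\nCUP9\tYPL177C\n' > "$workdir/File_2"

lookup_replace "$workdir/File_2" "$workdir/File_1" > "$workdir/File_3"
cat "$workdir/File_3"
```

Note that the lookup file has to come first on the command line, and that names in File_1 with no entry in File_2 are dropped silently.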

Regards

Franklin52
