Shell Programming and Scripting: post 302184136 by gstuart on Thursday 10th of April 2008 03:24:50 PM
Search, replace string in file1 with string from (lookup table) file2?

Hello: I have another question. Please consider the following two sample, tab-delimited files:

File_1:

Abf1 YKL112w
Abf1 YAL054c
Abf1 YGL234w
Ace2 YKL150w
Ace2 YNL328c
Cup9 YDR441c
Cup9 YDR442w
Cup9 YEL040w
...


File_2:

...
ABF1 YKL112W
ACE2 YLR131C
CUP9 YPL177C
...

File_2 is a “lookup table”: I want to replace $1 in File_1 with the matching $2 field from File_2, add a middle column containing the string “tf”, and prepend a column of ones (“1” in the first column position), all tab-delimited.

Ideally, the search/replace would ignore case, and every letter in the output would be uppercase ([a-z] converted to [A-Z]).

FYI, these are yeast genes; in addition to numbers and letters, some of the genes will contain dashes (e.g., YBR162W-A), but none will contain commas, semicolons, spaces, etc.

Output File_3:

1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...
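
One possible approach, sketched here with awk rather than offered as a definitive answer: read File_2 into an array keyed on the uppercased name, then rewrite each File_1 line from that array. The function name build_tf_edges is hypothetical. The sketch assumes both files are tab-delimited with at least two fields per data line, and it drops File_1 names that have no entry in File_2, since the post does not say how unmatched names should be handled.

```shell
#!/bin/sh
# Sketch only: build_tf_edges is a hypothetical helper name.
# Arguments: $1 = lookup table (File_2), $2 = file to rewrite (File_1).
build_tf_edges() {
    awk '
        BEGIN { FS = "\t"; OFS = "\t" }
        # First file (lookup table): store the uppercased ID under the uppercased name.
        NR == FNR { if (NF >= 2) map[toupper($1)] = toupper($2); next }
        # Second file: print weight 1, looked-up ID, "tf", uppercased target.
        # Lines with fewer than two fields or an unknown name are skipped (assumption).
        NF >= 2 && (toupper($1) in map) { print 1, map[toupper($1)], "tf", toupper($2) }
    ' "$1" "$2"
}
```

It would be called as build_tf_edges File_2 File_1 > File_3. Uppercasing both the stored key and the probed key is what makes the match case-insensitive, so "Abf1" in File_1 finds "ABF1" in File_2.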

This is related to (but different from) my earlier query,

https://www.unix.com/shell-programmin...#post302183287

Here, the first column is a “dummy” weight value, included to maintain “field compatibility” with my earlier file, as shown in this example:

1 a gi b
1 a pp a
1 a pp c
1 t gi u
1 t gi w
1 t gi x
1 t pp z
2 a pp d
2 a pp e
2 t gi v
2 t gi z
3 a pp b
3 t gi y
...

Ultimately, I will end up with a file like this, with $1 = weight, $2 = gene1, $3 = association, $4 = gene2:


1 YKL112W tf YKL112W
1 YKL112W tf YAL054C
1 YKL112W tf YGL234W
1 YLR131C tf YKL150W
1 YLR131C tf YNL328C
1 YLR131C tf YLR439W
1 YPL177C tf YDR441C
1 YPL177C tf YDR442W
1 YPL177C tf YEL040W
...
1 YBL012C gi YCL045C
1 YBL012C pp YBL012C
5 YBL012C pp YHR039C-A
1 YLR363W-A gi YNL143C
4 YLR363W-A gi YPR123C
1 YLR363W-A gi YLR467W
1 YLR363W-A pp YNR073C
2 YBL012C pp YGL232W
2 YBL012C pp YOR102W
2 YLR363W-A gi YFL066C
2 YLR363W-A gi YNR073C
3 YBL012C pp YCL045C
3 YLR363W-A gi YKL100C
...
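
A rough sketch of how the combined file above might be assembled, assuming each input is already a four-column, tab-delimited file in the weight / gene1 / association / gene2 layout. The function name merge_edge_files is hypothetical. It concatenates its arguments in the order given, uppercases only the two gene columns, and silently skips any line that does not have exactly four fields (an assumption, made so that placeholder lines such as "..." do not end up in the output).

```shell
#!/bin/sh
# Sketch only: merge_edge_files is a hypothetical helper name.
# Arguments: one or more four-column, tab-delimited files.
merge_edge_files() {
    awk '
        BEGIN { FS = "\t"; OFS = "\t" }
        # Skip anything that is not a complete four-field record (assumption).
        NF != 4 { next }
        # Keep weight and association as they are; uppercase the two gene names.
        { print $1, toupper($2), $3, toupper($4) }
    ' "$@"
}
```

For example, merge_edge_files File_3 earlier_file > combined, where earlier_file and combined are placeholder names for the file from the earlier query and the final result.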

Thank you - Once again, *very* much appreciated!

Sincerely, Greg S. :-)
 
