To get an output by combining fields from two different files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting To get an output by combining fields from two different files
# 1  
Old 09-26-2008
Power To get an output by combining fields from two different files

Hi guys,
I couldn't find solution to this problem. If anyone knows please help me out.
your guidance is highly appretiated.

I have two files -

FILE1 has the following 7 columns ( - has been added to make columns visible enough else columns are separated by single space)

155.34 - leg - 1 - 344 - TC200232 - 292 - 930
152.88 - leg - 1 - 344 -TC215306 - 2 - 743
123.94 - leg - 1 - 344 -TC210135 - 423 - 1148

FILE2
>TC200232.pep
AYNGFNNSNIIRDGVAIINSSGALKLTNRSYNVIGHAFHPNPVPIFNSSTKNVTSFSTYF
VFAIVPLEKTSGGFGFA
>TC210135.pep
GFGDFGKDSNFESQIALYGDAKVVNGGIQMSGSMGFSAGRILNKKPFKLIDGNPRKMVSF
SLHFVFSLSRENGDGFAFVMVPIGYPFDVFDGGSFGLLGNRKMKFLAVEFDTFMDEKYGD
VNDNHVGVDLSS
>TC215306.pep
PRLKQDLTLVGSVIVSDEKKSVQIPDPEREGDDLKHLVGRAIYSSPIR

I want an output like this - FILE3 - which is same as FILE2 but the line starting with '>' should also contain (region 292 to 930 of SEQ) where 292 and 930 are the corresponding columns 6 and 7 of FILE1 for the common id i.e. TC200232 (present in both the files)

>TC200232.pep (region 292 to 930 of SEQ)
AYNGFNNSNIIRDGVAIINSSGALKLTNRSYNVIGHAFHPNPVPIFNSSTKNVTSFSTYF
VFAIVPLEKTSGGFGFA
>TC210135.pep (region 423 to 1148 of SEQ)
GFGDFGKDSNFESQIALYGDAKVVNGGIQMSGSMGFSAGRILNKKPFKLIDGNPRKMVSF
SLHFVFSLSRENGDGFAFVMVPIGYPFDVFDGGSFGLLGNRKMKFLAVEFDTFMDEKYGD
VNDNHVGVDLSS
>TC215306.pep (region 2 to 743 of SEQ)
PRLKQDLTLVGSVIVSDEKKSVQIPDPEREGDDLKHLVGRAIYSSPIR

Last edited by smriti_shridhar; 09-26-2008 at 08:03 AM.. Reason: formatting
# 2  
Old 09-26-2008
Use nawk or /usr/xpg4/bin/awk on Solaris:

Code:
awk 'NR == FNR { _[$5] = $6" to "$7; next }
($2 in _ && $0 = $0" (region "_[$2]" of SEQ)")||1
' file1 FS='[>.]' file2

# 3  
Old 09-26-2008
Thanks a lot !

I must say it was brilliant.. Smilie Thanks..

I'll be happy if u can explain the code a little bit.

Thanks in advance.
# 4  
Old 09-26-2008
I'll try.
While reading the first non-empty input file (NR == FNR, check the man pages for the meaning of these internal AWK variables) build the associative array named _ : the $5 value as key, the value of $6, the string " to " and the value of $7 as array value.
While reading the next input file if $2 is one of the keys of the _ associative array, then append to the current record the string " (region ", the value of _[$2](remeber the _ array?) and the string " of SEQ)". Otherwise just print the record: || 1 (logical OR and 1 witch in the AWK language means true and therefor triggers the default action which is print the current record).

Hope this helps.
# 5  
Old 10-03-2008
Thanks

Thanks for making a tough effort. It has really helped me. Smilie
# 6  
Old 10-22-2008
Bug problem with a different output

Hi,

There is a problem as im getting same header (line begining with >) for two records in file2.

But this time the order of IDs ( eg - TC200232 ) will remain same in both the files. So I just want to add column6 and 7 of file1 into file2 one after the other without matching the IDs as shown in the output file.


FILE1

155.34 - leg - 1 - 344 - TC200232 - 292 - 930
152.88 - leg - 1 - 344 -TC200232 - 2 - 743
123.94 - leg - 1 - 344 -TC215306 - 423 - 1148

FILE2

>TC200232.pep
AYNGFNNSNIIRDGVAIINSSGALKLTNRSYNVIGHAFHPNPVPIFNSSTKNVTSFSTYF
VFAIVPLEKTSGGFGFA
>TC200232.pep
GFGDFGKDSNFESQIALYGDAKVVNGGIQMSGSMGFSAGRILNKKPFKLIDGNPRKMVSF
SLHFVFSLSRENGDGFAFVMVPIGYPFDVFDGGSFGLLGNRKMKFLAVEFDTFMDEKYGD
VNDNHVGVDLSS
>TC215306.pep
PRLKQDLTLVGSVIVSDEKKSVQIPDPEREGDDLKHLVGRAIYSSPIR

OUTPUT FILE

>TC200232.pep (region 292 to 930 of SEQ)
AYNGFNNSNIIRDGVAIINSSGALKLTNRSYNVIGHAFHPNPVPIFNSSTKNVTSFSTYF
VFAIVPLEKTSGGFGFA
>TC200232.pep (region 2 to 743 of SEQ)
GFGDFGKDSNFESQIALYGDAKVVNGGIQMSGSMGFSAGRILNKKPFKLIDGNPRKMVSF
SLHFVFSLSRENGDGFAFVMVPIGYPFDVFDGGSFGLLGNRKMKFLAVEFDTFMDEKYGD
VNDNHVGVDLSS
>TC215306.pep (region 423 to 1148 of SEQ)
PRLKQDLTLVGSVIVSDEKKSVQIPDPEREGDDLKHLVGRAIYSSPIR

Thanks Smilie

Last edited by smriti_shridhar; 10-22-2008 at 02:24 AM.. Reason: correction
# 7  
Old 10-22-2008
Code:
awk 'NR == FNR { _[++c] = $6" to "$7; next }
(/^>/ && $0 = $0" (region "_[++c]" of SEQ)")||1
' file1 FS='[>.]' c=0 file2

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Merge two text files by two fields and mixed output

Hello, I'm back again looking for your precious help- This time I need to merge two text files with matching two fields, output only common records with mixed output. Let's look at the example: FILE1 56153;AAA0708;3;TEST1TEST1; 89014;BBB0708;3;TEST2TEST2; 89014;BBB0708;4;TEST3TEST3; ... (7 Replies)
Discussion started by: emare
7 Replies

2. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

3. Shell Programming and Scripting

Combining columns from multiple files into one single output file

Hi, I have 3 files with one column value as shown File: a.txt ------------ Data_a1 Data_a2 File2: b.txt ------------ Data_b1 Data_b2 Data_b3 Data_b4 File3: c.txt ------------ Data_c1 Data_c2 Data_c3 Data_c4 Data_c5 (6 Replies)
Discussion started by: vfrg
6 Replies

4. Shell Programming and Scripting

awk help: Match data fields from 2 files & output results from both into 1 file

I need to take 2 input files and create 1 output based on matches from each file. I am looking to match field #1 in both files (Userid) and create an output file that will be a combination of fields from both file1 and file2 if there are any differences in the fields 2,3,4,5,or 6. Below is an... (5 Replies)
Discussion started by: ambroze
5 Replies

5. Shell Programming and Scripting

Formatting and combining fields of the input file

Hi, I have a file of the following format: AV 103 AV 104 AV 105 AV 308 AV 517 BN 210 BN 211 BN 212 BN 218 and the desired output is : AV 103-105 3 AV 308 1 AV 517 1 BN 210-212 3 (5 Replies)
Discussion started by: rochitsharma
5 Replies

6. Shell Programming and Scripting

Parsing fields from class list files to use output with newusers command

Hello I am trying to develop a shell script that takes a text file such as this... E-mail@ Soc.Sec.No. *--------Name-----------* Class *School.Curriculum.Major.* Campus.Phone JCC2380 XXX-XX-XXXX CAREY, JULIE C JR-II BISS CPSC BS INFO TECH 412/779-9445 JAC1936 XXX-XX-XXXX... (7 Replies)
Discussion started by: crimputt
7 Replies

7. Shell Programming and Scripting

AWK Compare files, different fields, output

Hi All, Looking for a quick AWK script to output some differences between two files. FILE1 device1 1.1.1.1 PINGS device1 2.2.2.2 PINGS FILE2 2862 SITE1 device1-prod 1.1.1.1 icmp - 0 ... (4 Replies)
Discussion started by: stacky69
4 Replies

8. Shell Programming and Scripting

AWK Matching Fields and Combining Files

Hello! I am writing a program to run through two large lists of data (~300,000 rows), find where rows in one file match another, and combine them based on matching fields. Due to the large file sizes, I'm guessing AWK will be the most efficient way to do this. Overall, the input and output I'm... (5 Replies)
Discussion started by: Michelangelo
5 Replies

9. Shell Programming and Scripting

combining fields in awk

I am using: ps -A -o command,%cpu to get process and cpu usage figures. I want to use awk to split up the columns it returns. If I use: awk '{print "Process: "$1"\nCPU Usage: "$NF"\n"}' the $NF will get me the value in the last column, but if there is more than one word in the... (2 Replies)
Discussion started by: json4639
2 Replies

10. Shell Programming and Scripting

combining fields in two text fields

Can someone tell me how to do this using sed, awk, or any other basic shell scripting? Basically I have two text files with the following contained in each file: File A: a b c d e f g h i File B: 1 2 3 I want the final outcome to look like this: a b c 1 d e f 2 g h i 3 How... (3 Replies)
Discussion started by: shocker
3 Replies
Login or Register to Ask a Question