Connecting 3 files


 
# 1  
Old 11-26-2013
Connecting 3 files

Hi,

I'm trying to do a functional categorization of a species, and I need to join 3 files for that. For each record in file 3, I want to look up its code in file 1.
The code is given within brackets []: for example, OR is the code for At1g31340 and J is the code for At2g36170.

Then I would like to find the description of the code from file 2.
The R in any code can be ignored unless it is [R] itself.
For example [OR], [JR] should be treated as [O] and [J] but [R] is treated as [R].
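
The rule above can be sketched as a small shell helper. This is only an illustration: the function name normalize_code is made up here, and it assumes a code is a short run of capital letters inside one pair of brackets.

```shell
# Hypothetical helper (not part of any tool): strip the brackets,
# then drop every R unless the code is exactly [R].
normalize_code() {
    printf '%s\n' "$1" | awk '{
        gsub(/\[|\]/, "")                  # "[OR]" -> "OR"
        if (length($0) > 1) gsub(/R/, "")  # "OR" -> "O"; a lone "R" is kept
        print
    }'
}

normalize_code '[OR]'   # prints O
normalize_code '[JR]'   # prints J
normalize_code '[R]'    # prints R
```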

The actual File 1 is downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/COG/KOG/kog . I have attached a subset for testing (pasting it inline messes up the formatting).

The actual File 2 is downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/COG/KOG/fun.txt . I have attached it for testing (pasting it inline messes up the formatting).

File 3 is a list of ids to be searched
Code:
At1g53930
At2g36170

Expected output with a pipe (|) delimiter
Code:
At1g53930|O|Posttranslational modification, protein turnover, chaperones| CELLULAR PROCESSES AND SIGNALING
At2g36170|J|Translation, ribosomal structure and biogenesis| INFORMATION STORAGE AND PROCESSING

# 2  
Old 11-26-2013
Here is something that you can start working with:
Code:
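awk '
        # file3.txt (first file): remember every id to look up
        NR == FNR {
                A[$1] = $1
                next
        }
        # file2.txt: a non-blank line without brackets is a category heading
        NF && !/\[/ && FILENAME == "file2.txt" {
                v = $0
        }
        # file2.txt: a line such as " [J] Translation, ..." describes one code
        /\[/ && FILENAME == "file2.txt" {
                d = $0
                gsub ( /\[|\]/, "", $1 )
                # ignore R unless the code is [R] itself
                if ( length ($1) > 1 )
                        gsub ( /R/, "", $1 )
                gsub ( /.*\[[A-Z]\][ ]*/, "", d )
                A[$1] = d OFS v
                next
        }
        # file1.txt: a record header such as "[OR] KOG0001 ..." sets the current code
        /\[/ && FILENAME == "file1.txt" {
                i = $1
                gsub ( /\[|\]/, "", i )
                # ignore R unless the code is [R] itself
                if ( length (i) > 1 )
                        gsub ( /R/, "", i )
        }
        # file1.txt: print id, code and description when the id was requested
        A[$NF] && FILENAME == "file1.txt" {
                print $NF, i, A[i]
        }
' OFS=\| file3.txt file2.txt file1.txt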
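
Note that the program above compares FILENAME with the literal names file1.txt and file2.txt, so it prints nothing if the files are named or referenced differently (for example ./file2.txt). Below is a minimal sketch of the same join that identifies the files by their position on the command line instead. The function name kog_join and the variable names are made up for this sketch, and it assumes none of the three files is empty (an empty file would not advance the counter) and that record headers in the KOG file start with "[" in the first column.

```shell
# Hypothetical wrapper: kog_join ID_LIST FUN_FILE KOG_FILE
kog_join() {
    awk '
        FNR == 1 { f++ }                      # a new input file starts
        f == 1 { if (NF) want[$1]; next }     # 1st file: ids to look up
        f == 2 && NF && !/\[/ {               # 2nd file: category heading
            group = $0
            sub(/[ \t\r]+$/, "", group)
            next
        }
        f == 2 && /\[/ {                      # 2nd file: " [J] description"
            code = $1
            gsub(/\[|\]/, "", code)
            text = $0
            sub(/^[ \t]*\[[A-Z]\][ \t]*/, "", text)
            sub(/[ \t\r]+$/, "", text)        # drop trailing blanks
            desc[code] = text OFS group
            next
        }
        f == 3 && /^\[/ {                     # 3rd file: record header
            cur = $1
            gsub(/\[|\]/, "", cur)
            if (length(cur) > 1) gsub(/R/, "", cur)   # [OR] -> O, [R] stays R
            next
        }
        f == 3 && NF && ($NF in want) {       # 3rd file: member line
            print $NF, cur, desc[cur]
        }
    ' OFS='|' "$1" "$2" "$3"
}
```

With the sample files from this thread it would be called as kog_join file3.txt file2.txt file1.txt. Multi-letter codes that do not reduce to a single letter once R is removed have no entry in file 2, so their description field comes out empty.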

# 3  
Old 11-26-2013
This doesn't produce any output, even with the sample data.
# 4  
Old 11-26-2013
Quote:
Originally Posted by newbie83
This doesn't produce any output, even with the sample data.
Not sure why! This is what I get.

Sample input data:
Code:
$ cat file1.txt
[OR] KOG0001 Ubiquitin and ubiquitin-like proteins
  ath:  At1g31340
  ath:  At1g53930
  ath:  At1g53950
[J] KOG0002 60s ribosomal protein L39
  ath:  At2g36170
  ath:  At3g02190

$ cat file2.txt
INFORMATION STORAGE AND PROCESSING
 [J] Translation, ribosomal structure and biogenesis
 [A] RNA processing and modification
 [K] Transcription
 [L] Replication, recombination and repair
 [B] Chromatin structure and dynamics

CELLULAR PROCESSES AND SIGNALING
 [D] Cell cycle control, cell division, chromosome partitioning
 [Y] Nuclear structure
 [V] Defense mechanisms
 [T] Signal transduction mechanisms
 [M] Cell wall/membrane/envelope biogenesis
 [N] Cell motility
 [Z] Cytoskeleton
 [W] Extracellular structures
 [U] Intracellular trafficking, secretion, and vesicular transport
 [O] Posttranslational modification, protein turnover, chaperones

METABOLISM
 [C] Energy production and conversion
 [G] Carbohydrate transport and metabolism
 [E] Amino acid transport and metabolism
 [F] Nucleotide transport and metabolism
 [H] Coenzyme transport and metabolism
 [I] Lipid transport and metabolism
 [P] Inorganic ion transport and metabolism
 [Q] Secondary metabolites biosynthesis, transport and catabolism

POORLY CHARACTERIZED
 [R] General function prediction only
 [S] Function unknown

$ cat file3.txt
At1g53930
At2g36170

Output:
Code:
$ ./prog.awk
At1g53930|O|Posttranslational modification, protein turnover, chaperones |CELLULAR PROCESSES AND SIGNALING
At2g36170|J|Translation, ribosomal structure and biogenesis |INFORMATION STORAGE AND PROCESSING

# 5  
Old 11-26-2013
Strange indeed !!

Code:
# cat file1.txt
[OR] KOG0001 Ubiquitin and ubiquitin-like proteins
  ath:  At1g31340
  ath:  At1g53930
  ath:  At1g53950
[J] KOG0002 60s ribosomal protein L39
  ath:  At2g36170
  ath:  At3g02190

# cat file2.txt
INFORMATION STORAGE AND PROCESSING
 [J] Translation, ribosomal structure and biogenesis
 [A] RNA processing and modification
 [K] Transcription
 [L] Replication, recombination and repair
 [B] Chromatin structure and dynamics

CELLULAR PROCESSES AND SIGNALING
 [D] Cell cycle control, cell division, chromosome partitioning
 [Y] Nuclear structure
 [V] Defense mechanisms
 [T] Signal transduction mechanisms
 [M] Cell wall/membrane/envelope biogenesis
 [N] Cell motility
 [Z] Cytoskeleton
 [W] Extracellular structures
 [U] Intracellular trafficking, secretion, and vesicular transport
 [O] Posttranslational modification, protein turnover, chaperones

METABOLISM
 [C] Energy production and conversion
 [G] Carbohydrate transport and metabolism
 [E] Amino acid transport and metabolism
 [F] Nucleotide transport and metabolism
 [H] Coenzyme transport and metabolism
 [I] Lipid transport and metabolism
 [P] Inorganic ion transport and metabolism
 [Q] Secondary metabolites biosynthesis, transport and catabolism

POORLY CHARACTERIZED
 [R] General function prediction only
 [S] Function unknown
[root@scalemp fescue_genome]# more file2.txt
INFORMATION STORAGE AND PROCESSING
 [J] Translation, ribosomal structure and biogenesis
 [A] RNA processing and modification
 [K] Transcription
 [L] Replication, recombination and repair
 [B] Chromatin structure and dynamics

CELLULAR PROCESSES AND SIGNALING
 [D] Cell cycle control, cell division, chromosome partitioning
 [Y] Nuclear structure
 [V] Defense mechanisms
 [T] Signal transduction mechanisms
 [M] Cell wall/membrane/envelope biogenesis
 [N] Cell motility
 [Z] Cytoskeleton
 [W] Extracellular structures
 [U] Intracellular trafficking, secretion, and vesicular transport
 [O] Posttranslational modification, protein turnover, chaperones

METABOLISM
 [C] Energy production and conversion
 [G] Carbohydrate transport and metabolism
 [E] Amino acid transport and metabolism
 [F] Nucleotide transport and metabolism
 [H] Coenzyme transport and metabolism
 [I] Lipid transport and metabolism
 [P] Inorganic ion transport and metabolism
 [Q] Secondary metabolites biosynthesis, transport and catabolism

POORLY CHARACTERIZED
 [R] General function prediction only
 [S] Function unknown


# cat file3.txt
At1g53930
At2g36170
# ./prog.awk
#

# 6  
Old 11-26-2013
Did you make any modifications in the awk program that I posted?

Another suggestion: if you are on SunOS / Solaris, try using nawk or /usr/xpg4/bin/awk instead.
# 7  
Old 11-26-2013
No, I copied and pasted your code. I'm on Linux, and awk has usually worked fine until now. Let me try on another machine.

---------- Post updated at 03:31 PM ---------- Previous update was at 03:17 PM ----------

It worked perfectly on another machine, thanks a ton!
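
The thread does not establish why the first machine printed nothing. For anyone hitting the same symptom, here is a rough sketch of two checks that are often worth making: which awk binary is actually being run, and whether any input file contains carriage returns (DOS line endings), which would stop the ids in file3.txt from matching the ids in file1.txt. The helper name has_cr is made up for this sketch, and whether either check applies to the case above is an assumption.

```shell
# Hypothetical helper: succeeds if the file contains a carriage return.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# Show which awk the shell resolves; on Linux it is often gawk or mawk.
command -v awk || echo "awk not found"

# Report input files with DOS line endings (file names as used in this thread).
for f in file1.txt file2.txt file3.txt; do
    [ -f "$f" ] || continue
    if has_cr "$f"; then
        echo "$f: contains carriage returns"
    fi
done
```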