Parsing of Newick tree format files


 
# 1  
Old 11-21-2017

Dear all,

I have more than 5000 files in Newick tree format (generally used to represent phylogenetic relationships), each looking something like this:

Code:
(apple_OJJ_1:0.1,banana_OJJ_1:0.4,(((cat_OJJ_1:0.1,(dog_EHA_1:0.2,(elephant_OJJ_1:0.03,fish_CBF_1:0.032):0.088):0.24):0.123,(goat_EIT_1:0.00,hen_EED_1:0.00):0.31):0.05,
((ink_EAU_1:0.16,(jug_OJJ_1:0.9,(kite_OOG0_1:0.08,(lion_OJJ7_1:0.0,(monkey_OJI_1:0.00,(nest_OJ_1:0.000,owl_GAA_1:0.00):0.00):0.05):0.084):0.09):0.0484):0.0179,
(parrot_EAW_1:0.195,(queen_RT92_1:0.0291,rat_EAW_1:0.0156):0.1430):0.083):0.03243):0.0951);

I want to convert the files to the following format:

Code:
(apple_OJJ_1:0.1,banana_OJJ_1:0.4,(((cat_OJJ_1:0.1,(dog_EHA_1:0.2,(elephant_OJJ_1:0.03,fish_CBF_1:0.032):0.088):0.24):0.123,(goat_EIT_1:0.00,hen_EED_1:0.00):0.31):0.05,
((ink_EAU_1:0.16,(jug_OJJ_1:0.9,(kite_OOG0_1:0.08,(lion_OJJ7_1:0.0,(monkey_OJI_1:0.00,(nest_OJ_1:0.000,owl_GAA_1:0.00):0.00):0.05):0.084):0.09):0.0484):0.0179,
(parrot_EAW_1:0.195,(queen_RT92_1:0.0291,rat_EAW_1:0.0156)#1:0.1430):0.083):0.03243):0.0951);

The only change in the result file is the added #1, placed right after the closing parenthesis of the (queen_RT92_1,rat_EAW_1) pair on the last line. I want to add this #1 in all the files. I am only interested in the pair of parentheses ( ) that contains queen_xyz.
The task is to find the queen_xyz ID, which may be paired with any other fruit/object/animal in the other files. Once queen is found, I need to determine which pair of parentheses it sits in and then add #1 after the first closing parenthesis ) of that queen-containing ( ), as shown in the dummy file above.
Sorry for the long dummy file, but it reflects the original files.
# 2  
Old 11-21-2017
Code:
sed 's@\(queen_[^)]*\)\([)]\)@\1\2#1@' file
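
The substitution finds queen_, skips every character up to the next ), and writes #1 right after that parenthesis. That fits the sample, where queen's partner is a single leaf. If the partner were itself a parenthesised group, the first ) after queen_ would close that inner group rather than the one holding queen. A minimal sketch of a depth-aware variant follows, assuming the queen clade sits on a single line; the function name tag_queen_parent is made up for illustration and is not part of the thread.

```shell
# Sketch: insert "#1" after the ")" that closes the clade directly
# containing the queen_* leaf, skipping over any nested groups.
# Assumption: the queen leaf and its closing ")" are on the same line.
tag_queen_parent() {
    awk '
    {
        start = index($0, "queen_")
        if (start == 0) { print; next }      # no queen on this line
        depth = 0
        for (i = start; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "(") {
                depth++                       # entering a nested sister group
            } else if (c == ")") {
                if (depth == 0) {             # this one closes the queen clade
                    $0 = substr($0, 1, i) "#1" substr($0, i + 1)
                    break
                }
                depth--                       # leaving a nested sister group
            }
        }
        print
    }'
}
```

It reads standard input, so it can be used as tag_queen_parent < file. On the sample file it should give the same result as the sed command.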

# 3  
Old 11-22-2017
Quote:
Originally Posted by Yoda
Code:
sed 's@\(queen_[^)]*\)\([)]\)@\1\2#1@' file

Thanks, this is working. Could you please also suggest how to change it if I want #1 placed directly after the queen ID, like this:

Code:
(queen_RT92_1#1:0.0291,rat_EAW_1:0.0156)

rather than where I asked for it before. Thanks for your support.
# 4  
Old 11-22-2017
Code:
sed 's@\(queen_[^:]*\)\([:]\)@\1#1\2@' file
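
Here the match stops at the first : after queen_, so #1 lands between the ID and its branch length. To run it over all of the >5000 files, a loop along these lines might do. The .nwk extension, the .tagged.nwk output names, and both function names are assumptions for illustration; none of them come from the original post.

```shell
# Same substitution as above, wrapped so it reads standard input.
tag_queen_leaf() {
    sed 's@\(queen_[^:]*\)\([:]\)@\1#1\2@'
}

# Sketch: process every *.nwk file in the given directory and write the
# result next to it as *.tagged.nwk, leaving the originals untouched.
# (File extension and output naming are hypothetical.)
tag_all_trees() {
    dir=$1
    for f in "$dir"/*.nwk; do
        [ -e "$f" ] || continue                  # glob matched nothing
        case $f in *.tagged.nwk) continue ;; esac # skip earlier output
        tag_queen_leaf < "$f" > "${f%.nwk}.tagged.nwk"
    done
}
```

Calling tag_all_trees with the directory that holds the trees processes each file once. Writing to new files instead of editing in place makes it easy to compare a few results before replacing anything.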
