Removing distances from Newick tree format


 
# 1  
Old 02-12-2011

I have a large number of files containing data that look like this:
Code:
(ID31:0.01682,(ID-123:0.00000,(ID_24:0.00000,ID&890:0.00000):0.00000):0.00000,ID12876:0.00000);
(ID_24:-0.00052,(ID31:0.01697,(ID-123:-0.00059,ID&890:0.03528):0.00037):0.00027,ID12876:0.03484);

I need to find every ":" in the expression and delete it along with the number that follows, up to the next ",", "(" or ")". The ";" marks the end of an expression; the next expression should then be processed and placed on the following line. I should end up with something like this:
Code:
(ID31,(ID-123,ID_24,ID&890)),ID12876);
(ID_24,(ID31,(ID-123,ID&890)),ID12876);

Thanks in advance!

Last edited by Scott; 02-12-2011 at 11:16 AM.. Reason: Please use CODE tags, not QUOTE tags when posting CODE
# 2  
Old 02-12-2011
I think your expected output is missing a bracket.

Code:
$ sed "s/:[0-9\.-]*//g" file1
(ID31,(ID-123,(ID_24,ID&890)),ID12876);
(ID_24,(ID31,(ID-123,ID&890)),ID12876);
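For reference, the original request ("erase everything after the colon until a comma or bracket") can also be written as a negated character class instead of a list of numeric characters. This is only a sketch: the function name `strip_branch_lengths` and the `.nwk` extension in the usage comment are made up for illustration, and it assumes labels never contain a ":" themselves. It would also catch distances written in scientific notation such as `1e-05`.

```shell
# Hypothetical helper: delete each ":" plus everything after it,
# stopping at the next "," "(" ")" or ";".
# Reads the files named as arguments, or stdin if none are given.
strip_branch_lengths() {
  sed 's/:[^,();]*//g' "$@"
}

# Possible usage over many files (the .nwk extension is an assumption):
#   for f in *.nwk; do
#     strip_branch_lengths "$f" > "${f%.nwk}.nodist.nwk"
#   done
```

On the two sample lines from the first post this should give the same result as the command above.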

This User Gave Thanks to Scott For This Post:
# 3  
Old 02-12-2011
It works perfectly!

Sorry about missing one of the brackets.
# 4  
Old 02-12-2011
No need to apologise, I just wondered why my output lines were the same length, and yours weren't :)
# 5  
Old 02-12-2011
Little problem!

Scottn,
I am processing the input file using your code
Code:
sed "s/:[0-9\.-]*//g" infile

However, in the output file the expression is not on a single line. So I tried to linearize it by running the following commands one after the other:
Code:
sed -e :a -e '/^>/!N;s/\n[^>]/ /;ta' infile

Code:
sed 's! !!g' infile

I am using the second one because the output from the first one contains gaps/blanks that are not supposed to be there. Unfortunately, my output file is slightly different from the expected file. I am uploading five files: the input, the output I get using your code, the file I get after linearizing it, my final output, and the expected output.
I will be really thankful if you could please take a look at it.
# 6  
Old 02-13-2011
Code:
sed "s/:[0-9\.-]*//g" infile.txt | awk '{printf "%s", $0}'
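Note that the pipeline above joins the whole file into one line. If a file holds several trees, a variation that treats ";" as the record separator keeps each tree on its own row, as asked for in the first post. This is a rough sketch under a few assumptions: the function name `linearize_trees` is made up, labels contain no whitespace or ":", and distances use only digits, ".", "-", "+" and "e"/"E".

```shell
# Hypothetical helper: one output line per tree.
# Reads the files named as arguments, or stdin if none are given.
linearize_trees() {
  awk 'BEGIN { RS = ";" }               # each tree ends at ";"
  {
    gsub(/[ \t\r\n]/, "")               # drop line breaks and blanks inside a tree
    gsub(/:[0-9.eE+-]*/, "")            # drop ":" and the distance after it
    if (length($0) > 0) print $0 ";"    # skip the empty tail after the last ";"
  }' "$@"
}

# Possible usage:
#   linearize_trees infile.txt > outfile.txt
```

Doing the joining and the stripping in a single awk pass also avoids the separate step for removing stray blanks.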

This User Gave Thanks to rdcwayx For This Post:
# 7  
Old 02-13-2011
Perfect!

Thanks!