Could you please repost an example of the input file, as well as an example of the corresponding output file you expect?
Should we assume that a link between 2 chromosomes has no "order" (A-D could be considered the same as D-A)?
(Or should it be treated like a vector, so that the direction, A-D vs D-A, does matter?)
OK sir ctsgnb! The input is (just the beginning, because the original file contains more than 100,000 lines):
and the output file must look like this:
This is one group even if the IDs in bold characters don't share more than 80% identity.
A very simple case is when you have an A--B--C association: A and C don't share enough identity to be grouped directly, but together they form one continuous group. I don't know if I'm clear, ctsgnb.
Thanks again for your help
Last edited by vgersh99; 01-03-2012 at 01:41 PM. Reason: fixed code tags
I'm sorry, I think that sometimes I'm not very clear!
What I want is to group together chromosome sequences that are very close, based on sequence identity. The number of output lines will depend on the number of groups the code defines. Does that make sense?
In the last example, the output must be one line.
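One way to read this requirement (an assumption, since the thread has not settled it yet) is that every pair whose identity is above the 80% threshold counts as a link, and a group is everything reachable through a chain of links, as in the A--B--C case. That is a connected-components problem, which a union-find (disjoint-set) structure handles in one pass regardless of line order. The sketch below assumes whitespace-separated lines of the form `ID1 ID2 identity` (like the `A D 90` lines used later in this thread), treats `X Y` and `Y X` as the same link, and keeps IDs that only appear in below-threshold pairs as single-member groups. `group_ids` and its `threshold` parameter are hypothetical names.

```python
import sys
from typing import Dict, Iterable, List


def find(parent: Dict[str, str], x: str) -> str:
    # Walk up to the root of x's group, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_ids(lines: Iterable[str], threshold: float = 80.0) -> List[List[str]]:
    """Group IDs that are chained together by links with identity > threshold."""
    parent: Dict[str, str] = {}
    order: List[str] = []  # first-seen order, so the output is stable
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        a, b, identity = fields[0], fields[1], float(fields[2])
        for x in (a, b):
            if x not in parent:  # every ID starts as its own group
                parent[x] = x
                order.append(x)
        if identity > threshold:  # assumed rule: only above-threshold pairs link
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra  # merge; "A B" and "B A" have the same effect
    groups: Dict[str, List[str]] = {}
    for x in order:
        groups.setdefault(find(parent, x), []).append(x)
    return list(groups.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 group_ids.py pairs.txt
    with open(sys.argv[1]) as fh:
        for members in group_ids(fh):
            print(" ".join(members))
```

It prints one group per line. Because merging does not depend on which group is "current", the order of lines in the input file does not change which IDs end up together: for `A B 90`, `B C 85`, `A C 70` it still reports `A B C` as one group even though A and C are below the threshold.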
1) Under which condition should the algorithm switch to building another group? (As soon as we meet an X-Y link that is below the threshold? Something else?)
2) Does the order matter inside a line?
(In other words: is it correct to assume that X-Y can be treated the same way as Y-X?)
3) Does the order matter between lines? (I think it does, in order to preserve the chaining of pairs... is that correct?)
---------- Post updated at 10:09 AM ---------- Previous update was at 09:54 AM ----------
Let's start with a "kind of" pseudo-code:
Let's say we are going to build some groups:
G[1]
G[2]
...
Let's start with G[1]
While scanning your input file line by line:
if G[1] is empty, then set G[1]=$1" "$2
if G[1] is not empty, let's check the scanned line:
if $1 is in G[1] and $2 is not: add $2 to that group
if $2 is in G[1] and $1 is not: add $1 to that group
if both are in it: ignore it and process the next line (or should we consider it a break in the sequence and start a new group?)
if neither is in it: build the next group: G[++c]=$1 FS $2
Is that algorithm correct?
If so, the following input:
A D 90
E D 90
C F 90
D C 90
would generate 2 group sequences:
A D E
C F D
and not:
A D E C F
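To make the difference concrete, here is a literal Python transcription of the pseudo-code above. It assumes that each scanned line is compared only with the group currently being built (`G[c]`) and, as in the pseudo-code, it ignores the identity column. `greedy_groups` is a hypothetical name. On the four example lines it reproduces the two groups shown, because `D C` is only checked against the current group `C F`.

```python
from typing import List


def greedy_groups(lines: List[str]) -> List[List[str]]:
    """Literal reading of the pseudo-code: only the latest group is checked."""
    groups: List[List[str]] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        a, b = fields[0], fields[1]
        if not groups:  # G[1] is empty: start it with $1 $2
            groups.append([a, b])
            continue
        current = groups[-1]  # the group being built, G[c]
        if a in current and b not in current:
            current.append(b)
        elif b in current and a not in current:
            current.append(a)
        elif a in current and b in current:
            pass  # both already present: ignore the line
        else:
            groups.append([a, b])  # neither present: G[++c] = $1 FS $2
    return groups


for g in greedy_groups(["A D 90", "E D 90", "C F 90", "D C 90"]):
    print(" ".join(g))  # prints "A D E", then "C F D"
```

The result depends on line order: D appears in both groups because the earlier group is never revisited, which is exactly the break-or-switch question that needs an answer before coding.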
So before coding, you must think about what logic and what condition should apply for breaking the sequence and/or switching to a new group.
Thanks in advance for clarifying your requirements first.