Full Discussion: Grouping matches by cols
Post 302234605 by gbalsu in Shell Programming and Scripting on Wednesday 10th of September 2008 02:09:47 AM
Thank you both, Annihilannic and cfajohnson. I cannot try your code now but will do so first thing in the morning.

The results are from pairwise comparisons of genes: a table of gene1 (A or B, the first column) matching gene2 (B or A, the second column) above a particular % identity cutoff. I filtered the output of an analysis down to highly similar genes. Based on many years of doing gene analysis daily, I have a reasonable idea that above this cutoff the gene functions are either very similar or identical.

The rule is that the first time a pair is seen, the first element of the pair becomes the name of the group. I am just using a FIFO (first in, first out) scheme here. Scientifically it does not matter whether A or B (gene1 or gene2) is used as the label. What matters is that once a group label has been assigned, it is used consistently, so that the same gene is never assigned to a different group. (Annihilannic trapped my mistakes smartly; very nice of you, special thanks. I will learn to stop working when my eyes are really blurry and my brain is fried.)

In my first example we have A A, A B and B B. That is because A matches B; otherwise we would only have A A and B B.
Since A A comes first in the list (A matches itself), we name the group A. Reading further, we reach either B B or A B. If A and B match, A B or B A will come before B B, so B is assigned to group A: A was seen first and already has a label, and when we then see B matching itself, we still assign B to group A.
Alternatively, B B will be seen without either A B or B A (if B does not match A, in which case we only have A A and B B), and B will then be assigned to group B.
So although B matches itself (B B), it also matches A (as shown by either A B or B A), and A B has already been assigned to A, so B's group will be A. If a real data set lists the pairs in the order B B, A B, A A, no harm is done: B becomes the group label. Scientifically it would not matter even if we reversed the process and used the second-column match as the label, but we would then need to apply the same grouping rule consistently to all later matches involving A and B.
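The first-seen labelling rule described above can be sketched in Python. This is a hypothetical illustration, not code posted in the thread: it assumes the pairs arrive in file order with gene1 and gene2 as the first two whitespace-separated fields (extra fields such as a % identity value are ignored). The post does not say what should happen when a pair links two genes that already carry different labels; as an assumption, this sketch merges such groups with a union-find structure and keeps the label of the group that was created first, so every gene still ends up with exactly one label.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (gene1, gene2) from whitespace-separated lines.

    Assumed format: gene1 and gene2 are the first two fields; any extra
    fields (for example a % identity value) are ignored, blank lines skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[0], fields[1]


def group_pairs(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every gene to a group label using a first-seen (FIFO) rule.

    A pair of two unseen genes starts a group named after its first element;
    a gene first seen next to an already-grouped gene joins that group.
    Assumption: if a pair links two existing groups, they are merged
    (union-find) and the older group's label is kept.
    """
    parent: Dict[str, str] = {}   # gene -> parent gene in the union-find forest
    created: Dict[str, int] = {}  # gene -> order in which it was first seen

    def find(gene: str) -> str:
        # Follow parent links up to the root.
        root = gene
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the way directly at the root.
        while gene != root:
            nxt = parent[gene]
            parent[gene] = root
            gene = nxt
        return root

    for a, b in pairs:
        for gene in (a, b):
            if gene not in parent:
                parent[gene] = gene
                created[gene] = len(created)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The root seen earlier keeps its name as the group label.
            if created[root_a] > created[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {gene: find(gene) for gene in parent}
```

For example, `group_pairs(read_pairs(open("pairs.txt")))` on a hypothetical two-column file returns a dict from gene to group label. For the order A A, A B, B B it gives `{'A': 'A', 'B': 'A'}`, and for B B, A B, A A it gives `{'B': 'B', 'A': 'B'}`, matching the cases walked through above.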

Sorry, this "lecture" was unintended.
More questions? I will be very happy to answer.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

join cols from multi files into one file

Hi, fields in files 1, 2, 3 and 4 are pipe ("|") separated. Say I want to grep col1 from File1, col3 from File2 and col4 from File3 and print them to File4 in the following order: col3|col1|col4. What is the best way of doing this? Thanks (2 Replies)
Discussion started by: vbshuru

2. Shell Programming and Scripting

awk - print formatted without knowing no of cols

Hi, I want to print(f) the content of a file, but I don't know how many columns it has (i.e. it changes each time my script is run). The number of columns is constant throughout the file. Any suggestions? (8 Replies)
Discussion started by: bistru

3. Shell Programming and Scripting

How to find number of Cols in a file ?

Hi, I have a requirement where the file is comma separated. Each record seems to have a different number of columns. How can I detect, row by row, how many columns are present? Thanks in advance. (2 Replies)
Discussion started by: videsh77

4. Shell Programming and Scripting

sort and split file by 2 cols (1 col after the other)

Dear All, I am a newbie to shell scripting so this one is really over my head. I have a text file with five fields as below: 76576.867188 6232.454102 2.008904 55.000000 3 76576.867188 6232.454102 3.607231 55.000000 4 76576.867188 6232.454102 1.555146 65.000000 3 76576.867188 6232.454102... (19 Replies)
Discussion started by: Ghetz

5. Programming

Curses not updating LINES/COLS

I'm working with an extremely outdated and old system at work. We do not have ncurses, but we do have curses. I need to make a user interface for users connecting with xterm. One issue I've encountered is if the user resizes the window, I'd like to provide functionality to redraw the screen with... (4 Replies)
Discussion started by: nwboy74

6. Shell Programming and Scripting

awk -- print combinations for 2 cols

Dear all, could you please help me with awk? I have this input: Input: a d b e c f. The number of lines is unknown before reading the file. I need to print all possible combinations between the two columns like this: Output: a d b d c d a e b e c e a f (2 Replies)
Discussion started by: irrevocabile

7. Shell Programming and Scripting

Join txt files with diff cols and rows

I am a new user of Unix/Linux, so this question might be a bit simple! I am trying to join two (very large) files that have different numbers of cols and rows. I want to keep 'all' rows and 'all' cols from both files in the joined file, and the primary key variables are in the rows.... (1 Reply)
Discussion started by: BNasir

8. Shell Programming and Scripting

Compare 2 files and print matches and non-matches in separate files

Hi all, I have two files, chap.txt and complex.txt. chap.txt looks like this: a d l m r k complex.txt looks like this: a c d e l m n j a d l p q r c p r m ......... (7 Replies)
Discussion started by: AshwaniSharma09

9. Shell Programming and Scripting

Bitwise comparison of cols

Hello, I want to compute the bitwise number of matches in pairwise fashion for all columns. The problem is I have 18486955 rows and 750 columns. Please help with code, I believe this will take a lot of time, is there a way of tracking progress? Input Org1 Org2 Org3 A A T A ... (9 Replies)
Discussion started by: ritakadm

10. Shell Programming and Scripting

Getting cut to ignore cols in middle of records

I recently had to remove a number of columns from a sorted copy of a file, but couldn't get the cut command to take fields out, just what to keep. This is the only thing I could find as an example, but could it be simplified? tstamp=`date +%H%M%S` grep -v "T$" filename |egrep -v "^$" |sort... (3 Replies)
Discussion started by: wbport
Template::Plugin::Table(3)				User Contributed Perl Documentation				Template::Plugin::Table(3)

NAME
Template::Plugin::Table - Plugin to present data in a table

SYNOPSIS
    [% USE table(list, rows=n, cols=n, overlap=n, pad=0) %]

    [% FOREACH item IN table.row(n) %]
       [% item %]
    [% END %]

    [% FOREACH item IN table.col(n) %]
       [% item %]
    [% END %]

    [% FOREACH row IN table.rows %]
       [% FOREACH item IN row %]
          [% item %]
       [% END %]
    [% END %]

    [% FOREACH col IN table.cols %]
       [% col.first %] - [% col.last %] ([% col.size %] entries)
    [% END %]

DESCRIPTION
The "Table" plugin allows you to format a list of data items into a virtual table. When you create a "Table" plugin via the "USE" directive, simply pass a list reference as the first parameter and then specify a fixed number of rows or columns.

    [% USE Table(list, rows=5) %]
    [% USE table(list, cols=5) %]

The "Table" plugin name can also be specified in lower case, as shown in the second example above. You can also specify an alternative variable name for the plugin, as per regular Template Toolkit syntax.

    [% USE mydata = table(list, rows=5) %]

The plugin then presents a table-based view of the data set. The data isn't actually reorganised in any way but is available via the "row()", "col()", "rows()" and "cols()" methods as if formatted into a simple two-dimensional table of rows x columns.

So if we had a sample "alphabet" list containing the letters "a" to "z", the above "USE" directives would create plugins that represented the following views of the alphabet.

    [% USE table(alphabet, ... %]

    rows=5                  cols=5
    a  f  k  p  u  z        a  g  m  s  y
    b  g  l  q  v           b  h  n  t  z
    c  h  m  r  w           c  i  o  u
    d  i  n  s  x           d  j  p  v
    e  j  o  t  y           e  k  q  w
                            f  l  r  x

We can request a particular row or column using the "row()" and "col()" methods.

    [% USE table(alphabet, rows=5) %]

    [% FOREACH item = table.row(0) %]
       # [% item %] set to each of [ a f k p u z ] in turn
    [% END %]

    [% FOREACH item = table.col(2) %]
       # [% item %] set to each of [ k l m n o ] in turn
    [% END %]

Data in rows is returned from left to right, columns from top to bottom. The first row/column is 0. By default, rows or columns that are shorter than the others are padded with the undefined value to the same size as all other rows or columns. For example, the last row (row 4) in the first example would contain the values "[ e j o t y undef ]". The Template Toolkit will safely accept these undefined values and print an empty string. You can also use the IF directive to test if the value is set.
    [% FOREACH item = table.row(4) %]
       [% IF item %]
          Item: [% item %]
       [% END %]
    [% END %]

You can explicitly disable the "pad" option when creating the plugin to return shortened rows/columns where the data is empty.

    [% USE table(alphabet, cols=5, pad=0) %]

    [% FOREACH item = table.col(4) %]
       # [% item %] set to each of 'y z'
    [% END %]

The "rows()" method returns all rows in the table as a reference to a list of rows (themselves list references). The "row()" method, when called without any arguments, calls "rows()" to return all rows in the table. The same applies to "cols()" and "col()".

    [% USE table(alphabet, cols=5) %]

    [% FOREACH row = table.rows %]
       [% FOREACH item = row %]
          [% item %]
       [% END %]
    [% END %]

The Template Toolkit provides the "first", "last" and "size" virtual methods that can be called on list references to return the first/last entry or the number of entries in a list. The following example shows how we might use this to provide an alphabetical index split into 3 roughly even parts.

    [% USE table(alphabet, cols=3, pad=0) %]
    [% FOREACH group = table.col %]
       [ [% group.first %] - [% group.last %] ([% group.size %] letters) ]
    [% END %]

This produces the following output:

    [ a - i (9 letters) ]
    [ j - r (9 letters) ]
    [ s - z (8 letters) ]

We can also use the general-purpose "join" virtual method, which joins the items of the list using the connecting string specified.

    [% USE table(alphabet, cols=5) %]
    [% FOREACH row = table.rows %]
       [% row.join(' - ') %]
    [% END %]

Data in the table is ordered downwards rather than across but can easily be transformed on output. For example, to format our data in 5 columns with data ordered across rather than down, we specify "rows=5" to order the data as such:

    a  f  .  .
    b  g  .
    c  h
    d  i
    e  j

and then iterate down through each column (a-e, f-j, etc.) printing the data across.

    a  b  c  d  e
    f  g  h  i  j
    .  .  .
Example code to do so would be much like the following:

    [% USE table(alphabet, rows=5) %]
    [% FOREACH cols = table.cols %]
       [% FOREACH item = cols %]
          [% item %]
       [% END %]
    [% END %]

Output:

    a  b  c  d  e
    f  g  h  i  j
    .  .  .

In addition to a list reference, the "Table" plugin constructor may be passed a reference to a Template::Iterator object or subclass thereof. The Template::Iterator get_all() method is first called on the iterator to return all remaining items. These are then available via the usual Table interface.

    [% USE DBI(dsn,user,pass) -%]

    # query() returns an iterator
    [% results = DBI.query('SELECT * FROM alphabet ORDER BY letter') %]

    # pass into Table plugin
    [% USE table(results, rows=8 overlap=1 pad=0) -%]

    [% FOREACH row = table.cols -%]
       [% row.first.letter %] - [% row.last.letter %]:
          [% row.join(', ') %]
    [% END %]

AUTHOR
Andy Wardley <abw@wardley.org> <http://wardley.org/>

COPYRIGHT
Copyright (C) 1996-2007 Andy Wardley. All Rights Reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO
Template::Plugin

perl v5.12.1                      2009-05-20                      Template::Plugin::Table(3)
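The column-major layout described in the Template::Plugin::Table page above (items fill each column top to bottom, and short rows or columns are padded with undef unless pad=0) can be modelled in a few lines of Python to check the row/column arithmetic. This is a hypothetical model, not the plugin's Perl implementation: `VirtualTable` and its methods are illustrative names, Python's `None` stands in for undef, the `overlap` option is not modelled, and edge cases (for example, asking for more columns than the items can fill) may behave differently in the real plugin.

```python
import math
from typing import List, Optional, Sequence


class VirtualTable:
    """Hypothetical model of the row()/col() views of Template::Plugin::Table.

    Items are never copied into a grid; each lookup computes a column-major
    index, so item i sits in column i // nrows and row i % nrows.
    """

    def __init__(self, items: Sequence, rows: Optional[int] = None,
                 cols: Optional[int] = None, pad: bool = True):
        if (rows is None) == (cols is None):
            raise ValueError("specify exactly one of rows or cols")
        size = rows if rows is not None else cols
        if size < 1:
            raise ValueError("rows/cols must be at least 1")
        self.items = list(items)
        n = len(self.items)
        if rows is not None:
            self.nrows, self.ncols = rows, math.ceil(n / rows)
        else:
            self.ncols, self.nrows = cols, math.ceil(n / cols)
        self.pad = pad

    def col(self, c: int) -> List:
        # Column c holds the nrows consecutive items starting at c * nrows.
        start = c * self.nrows
        cells = self.items[start:start + self.nrows]
        if self.pad:
            cells += [None] * (self.nrows - len(cells))
        return cells

    def row(self, r: int) -> List:
        # Row r takes the r-th item of every column, left to right.
        cells = []
        for c in range(self.ncols):
            i = c * self.nrows + r
            if i < len(self.items):
                cells.append(self.items[i])
            elif self.pad:
                cells.append(None)  # None plays the role of undef
        return cells

    def rows(self) -> List[List]:
        return [self.row(r) for r in range(self.nrows)]

    def cols(self) -> List[List]:
        return [self.col(c) for c in range(self.ncols)]


alphabet = [chr(ord("a") + i) for i in range(26)]

by_rows = VirtualTable(alphabet, rows=5)
print(by_rows.row(0))  # ['a', 'f', 'k', 'p', 'u', 'z']
print(by_rows.row(4))  # ['e', 'j', 'o', 't', 'y', None]
print(by_rows.col(2))  # ['k', 'l', 'm', 'n', 'o']

# Alphabetical index in 3 parts (cols=3, pad=0 in the template version).
for group in VirtualTable(alphabet, cols=3, pad=False).cols():
    print(f"[ {group[0]} - {group[-1]} ({len(group)} letters) ]")
```

The loop at the end prints the same three index lines as the template example ([ a - i (9 letters) ], [ j - r (9 letters) ], [ s - z (8 letters) ]). Calling `by_rows.cols()` returns the columns a-e, f-j and so on, which is the "ordered across" view the page builds with rows=5.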
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.