Sponsored Content
Top Forums Shell Programming and Scripting Clustering data by matching columns Post 302609903 by geetu on Tuesday 20th of March 2012 02:39:56 PM
Old 03-20-2012
Clustering data by matching columns

I am stuck with by DNA clustering analysis. I thought this forum will be a great help with data manipulations. Please help me.

I have a table with 91 columns. First I want to trim the table to only having rows where the column values are single characters which are A,T,G,C or 0. So any row having column values such as AA,AAG, AATG , Y, K etc has to be filtered out.
I figured out the regular expression will be something like [0ATGC]

Next I want to compare all the columns pairwise and group the columns which have the exact same values.The intermediate table output is not required.

Example input for 6 columns

Code:
Col1 Col2 Col3 Col4 Col5 Col6
A G G G T A
A Y R R TT A
A G T T T A
A G G T 0 A
A 0 R T TT AGGGGTT

Trimmed table (no output reqd)

Code:
 Col1 Col2 Col3 Col4 Col5 Col6
A G G G T A
A G T T T A
A G G T 0 A

Clustering output (desired output)

Code:
Col1,Col6 
Col2
Col3
Col4
Col5

 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

matching columns from two files

Hey, I have two files that have exactly the same format. They are both tab-delimited and contain 12 columns. However the # of rows vary. What I want to do is match columns # 5,6 and 7 between the two files. If they do match exactly (based on numbers) then I want the whole row from file 2 to... (1 Reply)
Discussion started by: phil_heath
1 Replies

2. UNIX for Dummies Questions & Answers

Suggestion to convert data in rows to data in columns

Hello everyone! I have a huge dataset looking like this: nameX nameX 0 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 ............... nameY nameY 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 ..... nameB nameB 0 1 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 ..... (can be several thousands of codes) and I need... (8 Replies)
Discussion started by: kush
8 Replies

3. UNIX for Dummies Questions & Answers

Matching corresponding columns in two different files

Hi to all, I have two separated files: FILE1 "V1" "V2" "V3" Mary James Nicole Robert Francisco Sophie Nancy Antony Matt Josephine Louise Rose Mark Simon Charles FILE2 "V1" "V2" "V3"... (2 Replies)
Discussion started by: eleonoral
2 Replies

4. UNIX for Dummies Questions & Answers

matching columns

Hello experts, I have this problem, I need to match values based on two files, this is what I have: file1 1.1 1.2 1.3 5.5 1.4 1.5 1.6 file2 1 a 2 B 3 C 4 D 5 z (7 Replies)
Discussion started by: Gery
7 Replies

5. Shell Programming and Scripting

Common records after matching on different columns

Hi, I have the following files. cat 1.txt cat 2.txt output.txt The logic is as follows.... (10 Replies)
Discussion started by: jacobs.smith
10 Replies

6. Shell Programming and Scripting

Help with awk Matching columns from two files

Hello, I have two files as following: #bin chrom chromStart chromEnd name score strand observed 585 chr2 29442 29443 rs4637157 0 + C/T 585 chr2 33011 33012 rs13423995 0 + A/G 585 chr2 34502 34503 rs13386087 0 + ... (2 Replies)
Discussion started by: Homa
2 Replies

7. Shell Programming and Scripting

Join two files with matching columns

Hi, I need to join two files together with one common value in a column. I think I can use awk or join or a combination but I can't quite get it. Basically my data looks like this, with the TICKER columns matching up in each file File1 TICKER,column 1, column, 2, column, 3, column 4 ... (6 Replies)
Discussion started by: unkleruckus
6 Replies

8. Shell Programming and Scripting

Matching first 2 columns..

Hello All, I want to make a file which will have primarily lines of file2 but when first 2 fields matches with the file1 it should have those lines of file1.. example is as below.. file1 a;b;1 c;d f;e t;r;5 file2 b;g a;b c;d v;b f;e t;r (2 Replies)
Discussion started by: ailnilanjan
2 Replies

9. Shell Programming and Scripting

awk Matching Columns - Am I missing something?

I am using awk to match columns and output based on those matches. For some reason it is not printing matching columns, am I missing something? Operating system - windows with cygwin. Command that I am using: sed 's/]*,]*/,/g' $tempdir/file1 > $tempdir/file1.$$ && awk -F, 'FNR==NR{f2=$2... (7 Replies)
Discussion started by: dis0wned
7 Replies

10. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies
TABFUNC(1)						      General Commands Manual							TABFUNC(1)

NAME
tabfunc - convert table to functions for rcalc, etc. SYNOPSIS
tabfunc [ -i ] func1 [func2 ..] DESCRIPTION
Tabfunc reads a table of numbers from the standard input and converts it to an expression suitable for icalc(1), rcalc(1) and their cousins. The input must consist of a M x N matrix of real numbers, with exactly one row per line. The number of columns must always be the same in each line, separated by whitespace and/or commas, with no missing values. The first column is always the independent variable, whose value indexes all of the other elements. This value does not need to be evenly spaced, but it must be either monotonically increas- ing or monotonically decreasing. (I.e. it cannot go up and then down, or down and then up.) Maximum input line width is 4096 characters and the maximum number of data rows is 1024. Input lines not beginning with a numerical value will be silently ignored. The command-line arguments given to tabfunc are the names to be assigned to each column. Tabfunc then produces a single function for each column given. If there are some columns which should be skipped, the dummy name "0" may be given instead of a valid identifier. (It is not necessary to specify a dummy name for extra columns at the end of the matrix.) The -i option causes tabfunc to produce a description that will interpolate values in between those given for the independent variable on the input. EXAMPLE
To convert a small data table and feed it to rcalc for some calculation: rcalc -e `tabfunc f1 f2 < table.dat` -f com.cal AUTHOR
Greg Ward SEE ALSO
cnt(1), icalc(1), neaten(1), rcalc(1), rlam(1), total(1) RADIANCE
10/8/97 TABFUNC(1)
All times are GMT -4. The time now is 05:31 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy