Recode A/T/G/C to 0/1 using a reference column


 
# 1  
Old 07-19-2011

Hello,

I have a large file with 114 columns, over 6,000 rows, and a header row; the final 27 columns are coded as A/T/G/C. There is also a reference column, likewise coded A/T/G/C.

e.g. OLD_file
Code:
col1 col2 3 ref ... 27 28 29 30 ...
1 r 22 A ... G A G A ...
2 f 22 C ... T T C T ...
3 g 22 T ... T C T T ...
4 h 22 G ... G G G G ...
.
.
.

I want to create a new file in which the first 26 columns are unchanged but the final 27 columns are recoded to 0/1. The column labeled 'ref' is my reference column, and I want to compare each of the final columns against it, row by row:

For the ith column (from column 27 to the end) and the jth row (for all rows in the file), if the (i,j)th entry equals the jth entry of the reference column in OLD_file, the (i,j)th entry in NEW_file is 1; otherwise the (i,j)th entry in NEW_file is 0.

So the NEW_file would be recoded:
e.g. NEW_file
Code:
col1 col2 3 ref ... 27 28 29 30 ...
1 r 22 A ... 0 1 0 1 ...
2 f 22 C ... 0 0 1 0 ...
3 g 22 T ... 1 0 1 1 ...
4 h 22 G ... 1 1 1 1 ...
.
.
.




I wrote a loop in R for this, but it was too slow; I was hoping gawk would be faster. Any ideas?

Thanks a lot in advance!
# 2  
Old 07-19-2011
I don't know if I understood you correctly, but try this:
Code:
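awk 'NR>1{for(i=27;i<=NF;i++)$i=($i==$4)?1:0}1' file

It assumes that the ref column is the 4th one and that the first line is a header, which is printed unchanged (NR>1 skips it during recoding).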
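For files where the position of the reference column is not known in advance, a slightly more defensive variant can look it up by its header label. This is only a sketch: the function name recode and its START_COL and REF_NAME parameters are made up for illustration and are not from the thread. It passes the header line through untouched, and it saves the reference base before recoding so the result is still right if the reference column happens to lie inside the recoded range.

```shell
# recode FILE [START_COL] [REF_NAME]
# Hypothetical helper: START_COL (default 27) and REF_NAME (default "ref")
# are illustrative parameters, not something specified in the thread.
recode() {
    awk -v start="${2:-27}" -v refname="${3:-ref}" '
        NR == 1 {
            # Locate the reference column by its header label.
            for (i = 1; i <= NF; i++) if ($i == refname) ref = i
            if (!ref) { print "no column named " refname > "/dev/stderr"; exit 1 }
            print      # header goes out unchanged
            next
        }
        {
            r = $ref   # remember the reference base of this row
            # 1 if the cell matches the reference base, otherwise 0.
            for (i = start; i <= NF; i++) $i = ($i == r) ? 1 : 0
            print
        }
    ' "$1"
}
```

Usage would be something like recode OLD_file 27 ref > NEW_file. Two caveats: assigning to a field makes awk rebuild the line with single spaces, so a tab-delimited file needs -F'\t' and OFS set to a tab as well; and if the recoded block really is the last 27 of 114 columns, the start column would be 88 rather than 27, which the thread does not settle.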
This User Gave Thanks to bartus11 For This Post:
# 3  
Old 07-19-2011
Looks like it works!

Hey, thanks a lot!