I don't have time to test this thoroughly, so try it and let me know of any bugs:

Code:
nawk '
## First line of each pass over infile: count the pass and remember the header
( FNR == 1 ){
    f++
    header = $0
    next
}
( f == 2 ){ printf("%s\n", header) ; f++ }
## Now we process each record for CC GG etc and apply our rules to them
( f == 3 ) {
    for( fi = 2; fi <= NF; fi++ ){
        gsub(/00/, ".", $fi)
        gsub(/A[CGT]|C[GT]|GT/, "0", $fi)
        gsub(/AA/, "-1", $fi)
        gsub(/TT/, "1", $fi)
        ## When min = -1 and max = 0, then both CC and GG = 1;
        ## When min = 0 and max = 1, then both CC and GG = 1;
        ## When both the min and max = 0, then CC = -1 and GG = 1;
        ## When min = -1 and max = 1 NO RULE DEFINED
        ## NOTE: fn is never assigned, and the first pass stores its counts
        ## under cls[NF, $i], so the three cls lookups below never match.
        ## min and max stay unset (equal to 0): CC always becomes -1, GG always 1.
        if( $fi == "CC" || $fi == "GG" ){
            if( cls[fn, 0] ) { min = 0 ; max = 0 }
            if( cls[fn, -1] )
                min = -1
            if( cls[fn, 1] )
                max = 1
            if( ( min == 0 ) && ( max == 0 ) ){
                if( $fi == "CC" )
                    $fi = -1
                else
                    $fi = 1
            }
            if( ( min == -1 ) && ( max == 0 ) )
                $fi = 1
            if( ( min == 0 ) && ( max == 1 ) )
                $fi = 1
        }
    }
    print $0
}
( f == 1 ){ ## First pass of file
    for( i = 2; i <= NF; i++ ){
        cls[NF, $i]++
    }
}
' infile infile
INPUT

I have gone back to the original input file, as the one you list has "00" and "1" changed to "." in the first column.

Code:
cat infile
ID 1 2 3 4 5 6 7 8
83845676 AG AC AT GT CC AA CC CC
83846900 AA AA TT GG CC AG CC TT
83847041 AA 00 AT GT 00 AG CG CT
83847004 AG AA TT TT CC AG CG CT
83847085 AG CC AT GT CG AG CG CT
83847118 00 AA TT GG 00 GG CC CT
83847162 GG AA TT GT CG AG CG CT
83847165 AA AA 00 GG CC AG GG CT

OUTPUT

Code:
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 1 -1 0 -1 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 -1 0 0 0 0 0 0
83847118 . -1 1 1 . 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 . 1 -1 0 1 0

PS To enter code between code tags, highlight the code and then click on the # symbol on the toolbar just above the text box.

Good luck

---------- Post updated at 03:51 PM ---------- Previous update was at 07:17 AM ----------

I have had time to look at this in a little more detail and can see it needed a fix. I still can't get the output you require, but I am unsure whether that is because your example output is flawed, so I need you to look at the output and tell me whether it is wrong. I wrote the code to do the processing you want, but I have tried to add in danmero's code without really understanding whether it does what you want.

Here is the code with the fix:

Code:
nawk '
## First line of each pass over infile: count the pass and remember the header
( FNR == 1 ){
    f++
    header = $0
    next
}
( f == 2 ){ printf("%s\n", header) ; f++ }
## Now we process each record for CC GG etc and apply our rules to them
( f == 3 ) {
    tmp = $1    ## Save the ID so the substitutions below cannot alter it
    gsub(/00/, ".")
    gsub(/A[CGT]|C[GT]|GT/, "0")
    gsub(/AA/, "-1")
    gsub(/TT/, "1")
    $1 = tmp
    for( fi = 2; fi <= NF; fi++ ){
        ## When min = -1 and max = 0, then both CC and GG = 1;
        ## When min = 0 and max = 1, then both CC and GG = 1;
        ## When both the min and max = 0, then CC = -1 and GG = 1;
        ## When min = -1 and max = 1 NO RULE DEFINED
        ## NOTE: fn is never assigned, and the first pass stores its counts
        ## under cls[NF, $i], so the three cls lookups below never match.
        ## min and max stay unset (equal to 0): CC always becomes -1, GG always 1.
        if( $fi == "CC" || $fi == "GG" ){
            if( cls[fn, 0] ) { min = 0 ; max = 0 }
            if( cls[fn, -1] )
                min = -1
            if( cls[fn, 1] )
                max = 1
            if( ( min == 0 ) && ( max == 0 ) ){
                if( $fi == "CC" )
                    $fi = -1
                else
                    $fi = 1
            }
            if( ( min == -1 ) && ( max == 0 ) )
                $fi = 1
            if( ( min == 0 ) && ( max == 1 ) )
                $fi = 1
        }
    }
    print $0
}
( f == 1 ){ ## First pass of file
    for( i = 2; i <= NF; i++ ){
        cls[NF, $i]++
    }
}
' infile infile
Here is the input file:

Code:
ID 1 2 3 4 5 6 7 8
83845676 AG AC AT GT CC AA CC CC
83846900 AA AA TT GG CC AG CC TT
83847041 AA 00 AT GT 00 AG CG CT
83847004 AG AA TT TT CC AG CG CT
83847085 AG CC AT GT CG AG CG CT
83847118 00 AA TT GG 00 GG CC CT
83847162 GG AA TT GT CG AG CG CT
83847165 AA AA 00 GG CC AG GG CT

Here is the output:

Code:
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 1 -1 0 -1 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 -1 0 0 0 0 0 0
83847118 . -1 1 1 . 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 . 1 -1 0 1 0

The code I filched off danmero was based on your earlier spec:

Code:
Hello again,

Again, I apologize for the confusion. I made a mistake in the first post: the letters should be recoded to -1, 0, 1. This is the tricky part. I need to recode the letters on a per-column, alphabetical-order basis. There are several different combinations that can occur within a column:

AA, AC, CC = -1, 0, 1
AA, AG, GG = -1, 0, 1
AA, AT, TT = -1, 0, 1
CC, CG, GG = -1, 0, 1
CC, CT, TT = -1, 0, 1
GG, GT, TT = -1, 0, 1

Therefore anything with a mixed data point (AC, AG, AT, CG, CT, GT) will ALWAYS = 0, AA will ALWAYS = -1, and TT will ALWAYS = 1. The problem comes when recoding CC and GG. As you can see, in some rows CC will come first in the alphabet and will be recoded as -1 (when the combo is CC, CG, GG). However, in some columns CC does not come first in the alphabet and will be coded as 1 (when the combo is AA, AC, CC). The same problem occurs with GG. Is there any solution to this issue? I hope I explained it better this time!!

I don't understand this: you start by talking of columns and end talking of rows, so I am just assuming danmero understood you and posted code that did what you want. Let me know if this output is correct or not.

Cheers

Last edited by steadyonabix; 10-29-2009 at 06:21 PM.
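One possible reading of the spec quoted above is that each column holds at most two letters, and that the doubled letter which sorts first maps to -1 while the one which sorts last maps to 1. The sketch below follows that reading. It is not danmero's or steadyonabix's code: the function name `recode_genotypes` is made up for illustration, plain `awk` is used instead of `nawk`, and the handling of a column where only CC or only GG was seen (it falls back to -1) is an assumption, since the spec does not cover that case.

```shell
# recode_genotypes FILE
# Hypothetical helper: recode two-letter data points to -1/0/1, column by column.
# FILE is read twice, so it must be a regular file rather than a pipe.
recode_genotypes() {
    awk '
    # Header line of each pass: count the pass, print the header on pass 2 only
    FNR == 1 { if (++pass == 2) print; next }

    # Pass 1: remember the lowest and highest letter seen in each column
    pass == 1 {
        for (i = 2; i <= NF; i++) {
            if ($i == "00") continue            # missing value, carries no letters
            for (k = 1; k <= 2; k++) {
                c = substr($i, k, 1)
                if (lo[i] == "" || c < lo[i]) lo[i] = c
                if (hi[i] == "" || c > hi[i]) hi[i] = c
            }
        }
        next
    }

    # Pass 2: recode every data point using the letters found in its column
    {
        for (i = 2; i <= NF; i++) {
            a = substr($i, 1, 1)
            b = substr($i, 2, 1)
            if      ($i == "00") $i = "."       # missing
            else if (a != b)     $i = 0         # mixed data point
            else if (a == "A")   $i = -1        # AA is always -1
            else if (a == "T")   $i = 1         # TT is always 1
            else if (a == hi[i] && lo[i] != hi[i]) $i = 1   # CC or GG sorts last
            else                 $i = -1        # sorts first, or is the only letter (assumed)
        }
        print
    }
    ' "$1" "$1"
}
```

Called as `recode_genotypes infile` on the input file posted above, this reading gives GG = -1 in column 4 (where the column holds GG, GT, TT) and CC = 1 in column 2 (where it holds AA, AC, CC).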
You have to
Here is the original file:

Code:
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 CC -1 CC CC
83846900 -1 -1 1 GG CC 0 CC 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 CC 0 0 0
83847085 0 CC 0 0 0 0 0 0
83847118 . -1 1 GG . GG CC 0
83847162 GG -1 1 0 0 0 0 0
83847165 -1 -1 . GG CC 0 GG 0

Here is the output I need:

Code:
ID 1 2 3 4 5 6 7 8
83845676 0 0 0 0 -1 -1 -1 -1
83846900 -1 -1 1 -1 -1 0 -1 1
83847041 -1 . 0 0 . 0 0 0
83847004 0 -1 1 1 -1 0 0 0
83847085 0 1 0 0 0 0 0 0
83847118 . -1 1 -1 . 1 -1 0
83847162 1 -1 1 0 0 0 0 0
83847165 -1 -1 . -1 -1 0 1 0

Thanks!
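Going from the partly recoded file to the requested output only needs a decision for the leftover CC and GG entries. A minimal sketch, assuming the input already has AA, TT, mixed points and 00 recoded as in the "original file" above: if the column contains a 1 (a TT), the leftover letter pair sorts first and becomes -1; if it contains a -1 (an AA), it sorts last and becomes 1; otherwise the column is taken to be a CC, CG, GG column. The function name `resolve_cc_gg` is hypothetical, and a column holding both -1 and 1 alongside CC or GG is not covered by the spec, so the first matching rule simply wins here.

```shell
# resolve_cc_gg FILE
# Hypothetical helper: recode the remaining CC/GG entries of a partly recoded file.
# FILE is read twice, so it must be a regular file rather than a pipe.
resolve_cc_gg() {
    awk '
    # Header line of each pass: count the pass, print the header on pass 2 only
    FNR == 1 { if (++pass == 2) print; next }

    # Pass 1: note which columns already contain -1 (AA) or 1 (TT)
    pass == 1 {
        for (i = 2; i <= NF; i++) {
            if ($i == "-1") has_aa[i] = 1
            if ($i == "1")  has_tt[i] = 1
        }
        next
    }

    # Pass 2: decide each leftover CC or GG from what its column contains
    {
        for (i = 2; i <= NF; i++) {
            if ($i != "CC" && $i != "GG") continue
            if      (i in has_tt) $i = -1       # column is xx, xT, TT
            else if (i in has_aa) $i = 1        # column is AA, Ax, xx
            else                  $i = (($i == "CC") ? -1 : 1)   # column is CC, CG, GG
        }
        print
    }
    ' "$1" "$1"
}
```

Run as `resolve_cc_gg infile`, where `infile` holds the partly recoded table; every other value passes through unchanged.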