Read column and find differences...


 
# 1  
Old 10-29-2012

I have this file:

Code:
427	A	C	A/C	12
436	G	C	G/C	12
445	C	T	C/T	12
447	A	G	A/G	9
451	T	C	T/C	5
456	A	G	A/G	12
493	G	A	G/A	12


I want to read the first column and, for each id, find all other ids that differ from it by 10 or less:
Code:
427	A	C	A/C	12	436
436	G	C	G/C	12	427,445
445	C	T	C/T	12	436,447,451
447	A	G	A/G	9	445,451,456
451	T	C	T/C	5	445,447,456
456	A	G	A/G	12	451,447
493	G	A	G/A	12

The last column should look like the above: all ids that are within 10 bases (plus or minus) of that line's id. For example, for 436 the range is 426 to 446; the other ids in that range are 427 and 445, so I listed them in the 6th column.
# 2  
Old 10-29-2012
Assuming there are no duplicate values in field 1 and that the whole file fits in memory, this should work:

Code:
awk '
    { a[$1+0] = $0; }
    END {
        for( x in a )
        {
            printf( "%s", a[x] );
            sc = " ";
            for( i = x-10; i <= x + 10; i++ )
                if( i != x  &&  i in a )
                {
                    printf( "%s%d", sc, i );
                    sc = ", ";
                }
            printf( "\n" );
        }
    }
' infile
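
Note that `for( x in a )` visits the array keys in an unspecified order, so the output lines are not guaranteed to come out in input order. A minimal sketch of a variant that remembers the input order is shown here; the `order[]` array and the `near_ids` wrapper function are names made up for this illustration, and the separators are changed to a tab and a plain comma to match the requested output:

```shell
# Sample data from the question, written tab-separated.
printf '427\tA\tC\tA/C\t12\n436\tG\tC\tG/C\t12\n445\tC\tT\tC/T\t12\n447\tA\tG\tA/G\t9\n451\tT\tC\tT/C\t5\n456\tA\tG\tA/G\t12\n493\tG\tA\tG/A\t12\n' > infile

# near_ids is a hypothetical wrapper, used only to make the sketch reusable.
near_ids() {
    awk '
        # Store each line by id and remember the order the ids arrived in.
        { a[$1+0] = $0; order[NR] = $1+0 }
        END {
            for( n = 1; n <= NR; n++ )      # walk the ids in input order
            {
                x = order[n];
                printf( "%s", a[x] );
                sc = "\t";                  # tab before the first id, commas after
                for( i = x-10; i <= x+10; i++ )
                    if( i != x  &&  i in a )
                    {
                        printf( "%s%d", sc, i );
                        sc = ",";
                    }
                printf( "\n" );
            }
        }
    ' "$1"
}

near_ids infile
```

Because the inner loop counts upward from x-10, the ids listed on each line are already in ascending order; only the order of the lines themselves depended on the `for( x in a )` traversal.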


Last edited by agama; 10-29-2012 at 10:19 PM.. Reason: dropped infile with cut/paste
# 3  
Old 10-29-2012
How about:

Code:
awk 'FNR==NR{k[$1];next}
{v=x;
 for(i=$1-10;i<=$1+10;i++) if(i!=$1&&i in k) v=v","i;
 $(NF+1)=substr(v,2)} 1' OFS="\t" infile infile

# 4  
Old 10-29-2012
@agama : Thank you. The code works great; the only thing is that it's not printing in ascending order.

---------- Post updated at 09:22 PM ---------- Previous update was at 09:21 PM ----------

@Chubler_XL : Thank you, this works great! Can you explain how it works? I want to understand the code.
# 5  
Old 10-29-2012
It makes two passes over the file (this is why the filename is passed on the command line twice).

First pass stores all the IDs (field 1) in k[]:

Code:
FNR==NR{k[$1];next}

The second pass resets the string v (x is never assigned, so v=x makes v empty), then appends every id within 10 of the current id, excluding the current id itself:
Code:
v=x; for(i=$1-10;i<=$1+10;i++) if(i!=$1&&i in k) v=v","i;

The leading comma is then stripped from v, and the result is added as a new field at the end of the line:
Code:
$(NF+1)=substr(v,2)

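The trailing 1 is a pattern that is always true; with no action attached, awk applies its default action and prints the current line (i.e. the line that has just had v's contents appended).

OFS="\t" sets the output field separator to TAB. Assigning to a field makes awk rebuild the line using OFS, so the printed line is tab-delimited.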
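
Putting the pieces back together, the same one-liner can be laid out with comments roughly as follows. This is only a sketch: the `add_neighbours` wrapper is a name invented here so the command can be called on any file, and `v = ""` is written out explicitly in place of `v=x`.

```shell
# Sample data from the question, written tab-separated.
printf '427\tA\tC\tA/C\t12\n436\tG\tC\tG/C\t12\n445\tC\tT\tC/T\t12\n447\tA\tG\tA/G\t9\n451\tT\tC\tT/C\t5\n456\tA\tG\tA/G\t12\n493\tG\tA\tG/A\t12\n' > infile

# add_neighbours is a hypothetical wrapper; it passes the same file to awk twice.
add_neighbours() {
    awk '
        # Pass 1: FNR==NR holds only while reading the first file argument.
        # Referencing k[$1] is enough to create the key.
        FNR == NR { k[$1]; next }

        # Pass 2: gather every known id within 10 of this id, skipping itself.
        {
            v = ""
            for (i = $1 - 10; i <= $1 + 10; i++)
                if (i != $1 && (i in k))
                    v = v "," i
            # Drop the leading comma and append the list as a new last field.
            $(NF + 1) = substr(v, 2)
        }

        1   # always-true pattern: print the rebuilt line
    ' OFS="\t" "$1" "$1"
}

add_neighbours infile
```

One side effect worth knowing: a line with no neighbours (493 here) still gets an empty sixth field, so it ends with a trailing tab.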

Last edited by Chubler_XL; 10-29-2012 at 10:46 PM..
# 6  
Old 10-30-2012
Code:
$
$
$ cat f13
427     A       C       A/C     12
436     G       C       G/C     12
445     C       T       C/T     12
447     A       G       A/G     9
451     T       C       T/C     5
456     A       G       A/G     12
493     G       A       G/A     12
$
$
$
$ perl -lane '$x{$F[0]} = [ @F ];
              END {
                foreach $k (sort { $a <=> $b } keys %x) {
                  foreach $i ($k-10..$k+10) {
                    push (@y, $i) if defined $x{$i} and $i != $k;
                  }
                  printf ("%-7s %-7s %-7s %-7s %-7s %s\n",@{$x{$k}},join(",",@y));
                  @y=()
                }
              }' f13
427     A       C       A/C     12      436
436     G       C       G/C     12      427,445
445     C       T       C/T     12      436,447,451
447     A       G       A/G     9       445,451,456
451     T       C       T/C     5       445,447,456
456     A       G       A/G     12      447,451
493     G       A       G/A     12
$
$
$

tyler_durden
# 7  
Old 10-30-2012
Code:
awk 'FNR==NR{X[$1];next}
    {for(i in X)
    {if((i-$1)*(i-$1)<=100 && i != $1){a[$1]=a[$1]?a[$1]","i:$0"\t"i}
    }print a[$1]?a[$1]:$0
    }' file file


Last edited by pamu; 10-30-2012 at 06:28 AM.. Reason: small correction to change the order..