Count and merge using common column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Count and merge using common column
# 1  
Old 10-31-2012
Count and merge using common column

I have the following records from multiple files.

Code:
415     A       G
415     A       G
415     A       T
415     A       .
415     A       .
421     G       A
421     G       A,C
421     G       A
421     G       A
421     G       A,C
421     G       .
427     A       C
427     A       C
427     A       .
427     A       .

1) i wanted to remove the columns which have "." in third column
2) count the columns and merge based on first column

I want output like this

Code:
 
      3  2,1    415     A       G/T
      5  3,2    421     G       A/A,C
      2         427     A       C

first column "3 2,1 415 A G/T"

3 - how many times 415 is repeated
2,1 - if i count uniq it is giving two times of " 415 A G" and one time "415 A T" pattern. so i wanted to merge this and get final as " 3 2,1 415 G/T"


I used this command to count unique but unable to merge and combine the columns

Code:
cat file | awk '$3 ~/A|T|G|C/{print $0}'| sort | uniq -c

By using above code i am getting the following output

Code:
      2 415     A       G
      1 415     A       T
      3 421     G       A
      2 421     G       A,C
      2 427     A       C

# 2  
Old 10-31-2012
Think this does what you want

Code:
awk '
$3!="." {
    key=$1"\t"$2
    if(!(key in I)) D[++keys]=key
        if(!((key SUBSEP $3) in M)) {
            I[key]++
            K[key,I[key]]=$3
        }
    M[key,$3]++
    S[key]++
}
END {
    for(i=1;i<=keys;i++) {
            key=D[i]
        printf "%d\t", S[key]
        if(I[key]>1)
            for(j=1;j<=I[key];j++)
               printf "%s%d",(j>1?",":""),M[key,K[key,j]]
        printf "\t%s\t", key
        for(j=1;j<=I[key];j++)
               printf "%s%s",(j>1?"/":""),K[key,j]
        printf "\n"
    }
}' infile

This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 11-01-2012
Quote:
Originally Posted by empyrean
Code:
 
      3  2,1    415     A       G/T
      5  3,2    421     G       A/A,C
      2         427     A       C

I assume if second column is count of occurrences. then third row should contain 2 in second column.

This is lit bit nasty code..Smilie

Code:
awk 'function prnt() {
    split(A[S],ARR);
    n=split(ARR[3],ARR_2,"/");
    if(n>1){for(i=1;i<=n;i++){
    var=var?var","UN[ARR[1],ARR[2],ARR_2[i]]:UN[ARR[1],ARR[2],ARR_2[i]]}}
    print CN[S],var,A[S];var=""
    }
    {if($3 != "."){CN[$1]++;if(!UN[$1,$2,$3]++){A[$1]=A[$1]?A[$1]"/"$3:$0;}
    if(S != $1 && S){prnt()}
    {P=$0;S=$1}}}END{prnt()}' OFS="\t" file

This User Gave Thanks to pamu For This Post:
# 4  
Old 11-01-2012
@pamu your solution requires that input file is sorted, the solution I posted didn't assume this, mostly because empyrean's posted code required sort:

Code:
cat file | awk '$3 ~/A|T|G|C/{print $0}'| sort | uniq -c

This User Gave Thanks to Chubler_XL For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Merge multiple files with common header

Hi all, Say i have multiple files x1 x2 x3 x4, all with common header (date, time, year, age),, How can I merge them to one singe file "X" in shell scripting Thanks for your suggestions. (2 Replies)
Discussion started by: msarguru
2 Replies

2. Shell Programming and Scripting

Seperated by columns, merge in a file, sort them on common column

Hi All, I have 4 files in below format. I took them as an example. File 1: Cut from position 1-4 then 6-7 then 8-14 then rest left and make them as columns in one new file. Inserting character H to the initial of all line like HCTOT. CTOT 456787897 Low fever CTOR 556712345 High fever... (2 Replies)
Discussion started by: Mannu2525
2 Replies

3. Shell Programming and Scripting

Count common elements in a column

HI, I have a 3-column tab separated column (approx 1GB) in which I would like to count and output the frequency of all of the common elements in the 1st column. For instance: If my input was the following: dot is-big 2 dot is-round 3 dot is-gray 4 cat is-big 3 hot in-summer 5 My... (4 Replies)
Discussion started by: owwow14
4 Replies

4. Shell Programming and Scripting

Merge with common column

hi i have two files and i wanted to join them using common column. try to do this using "join" command but that did not help. File 1: 123 9a.vcf hy92.vcf hy90.vcf Index Ref Alt Ref Alt Ref Alt 315 14 0 7 4 ... (6 Replies)
Discussion started by: empyrean
6 Replies

5. Shell Programming and Scripting

file merge based on common columns

I have two files 1.txt 34, ABC, 7, 8, 0.9 35, CDE, 6.5, -2, 0.01 2.txt 34, ABC, 9, 6, -1.9 35, CDE, 8.5, -2.3, 5.01 So in both files common columns are 1 and 2 so final o/p should look like 34, ABC, 7, 8, 0.9, 9, 6, -1.9 35, CDE, 6.5, -2, 0.01, 8.5, -2.3, 5.01 I tried using... (3 Replies)
Discussion started by: manas_ranjan
3 Replies

6. UNIX for Dummies Questions & Answers

Merge rows with common column

Dear all I have big file with two columns A_AA960715 GO:0006952 A_AA960715 GO:0008152 A_AA960715 GO:0016491 A_AA960715 GO:0007165 A_AA960715 GO:0005618 A_AA960716 GO:0006952 A_AA960716 GO:0005618 A_AA960716... (15 Replies)
Discussion started by: AAWT
15 Replies

7. UNIX for Dummies Questions & Answers

Writing a loop to merge multiple files by common column

I have 100 data files labelled 250.1.txt through 250.100.txt. The second column of the data files partially match (there is about %90 overlap). Each data file has 4 columns. I want the merge all these text files by the matching values in the second column. In the output, the first column should... (1 Reply)
Discussion started by: evelibertine
1 Replies

8. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies

9. Shell Programming and Scripting

merge rows based on a common column

Hi guys, Please guide me if you have a solution to this problem. I have tried paste -s but it's not giving the desired output. I have a file with the following content- A123 box1 B345 bat2 C431 my_id A123 service C431 box1 A123 my_id I need two different outputs- OUTPUT1 A123... (6 Replies)
Discussion started by: smriti_shridhar
6 Replies

10. Shell Programming and Scripting

merge based on common, awk help

All, $ cat x.txt z 11 az x 12 ax y 13 ay $ cat y.txt ay TT ax NN Output required: y 13 ay TT x 12 ax NN (3 Replies)
Discussion started by: jkl_jkl
3 Replies
Login or Register to Ask a Question