count the unique records based on certain columns


 
# 1  
Old 08-23-2012

Hi everyone,

I have a file result.txt with records as following and another file mirna.txt with a list of miRNAs e.g. miR22, miR123, miR13 etc.

Gene Transcript miRNA

Gar Nm_111233 miR22
Gar Nm_123440 miR22
Gar Nm_129939 miR22
Hel Nm_233900 miR13
Hel Nm_678900 miR13
Bart Nm_178181 miR22
Gar Nm_789999 miR43

Now I want to count the number of unique genes for each miRNA listed in mirna.txt.


e.g.
miR22 2
miR13 1
miR15 0
miR43 1



Previously, I used the following command, but it counts every occurrence of the miRNA rather than the number of unique genes.

Code:
for gene in `cat mirna.txt`; do awk -v gene=$gene '{for(i=1; i<=NF; i++) if ($i==gene) c++} END {print c}' result.txt >> output.txt; done


Any help is appreciated. Thanks in advance.


Mic

Last edited by miclow; 08-23-2012 at 10:00 PM..
# 2  
Old 08-23-2012
This should count the ones present:

Code:
awk '
    NR > 1 && NF > 1 { c[$NF]++ }   # skip the header and blank lines; count every data row per miRNA (last field)
    END {
        for( x in c )
            print x, c[x];
    }
' result.txt
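
As written, this tallies one count per data row after the header, i.e. once per transcript rather than once per gene, so the same per-miRNA counts could also be produced with, for example:

Code:
awk 'NR > 1 && NF > 1 { print $NF }' result.txt | sort | uniq -c

(uniq -c puts the count first.)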

# 3  
Old 08-23-2012
Hi agama,

Thanks for your reply but the output I got using your script is:

miR43 1
miR13 2
miR22 3

I expect to get the following output instead
miR22 2
miR13 1
miR43 1
# 4  
Old 08-23-2012
Quote:
Originally Posted by miclow
Hi agama,

Thanks for your reply but the output I got using your script is:

miR43 1
miR13 2
miR22 3

I expect to get the following output instead
miR22 2
miR13 1
miR43 1
I misinterpreted your objective. Corrected code below:

Code:
awk '
    NR > 1 && NF > 1 {
        if( ! seen[$1 " " $NF]++ )   # true only the first time this gene/miRNA pair appears
            c[$NF]++                 # so each gene is counted once per miRNA
    }
    END {
        for( x in c )
            print x, c[x];
    }' input >output
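
The ! seen[$1 " " $NF]++ test is what removes the duplicates: the post-increment returns the old value, so the expression is true only the first time a given gene/miRNA pair turns up, and a gene with several transcripts is counted once per miRNA. The same idiom on its own is the usual awk trick for printing unique lines, for example:

Code:
printf 'a\na\nb\n' | awk '!seen[$0]++'

which prints a and b once each. On the sample data above the corrected script should therefore give miR22 2, miR13 1 and miR43 1, though the order of the lines produced by for (x in c) is not guaranteed.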


Last edited by agama; 08-24-2012 at 12:56 AM.. Reason: cut/paste truncation
# 5  
Old 08-24-2012
Either one of these should give you the counts you're looking for:
Code:
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | awk '{a[$1]++} END{for(e in a){print e FS a[e]}}'
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | uniq -c
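
Both pipelines first deduplicate the gene/miRNA pairs with sort -u on columns 1 and 3 and then count the remaining miRNA column. Two things to watch for: if the header line is still in the file it ends up counted as a miRNA called "miRNA", and uniq -c only counts correctly when equal values are adjacent, which the gene-first sort does not guarantee in general. A variant sketch that covers both, assuming the data is in result.txt as in the first post:

Code:
tail -n +2 result.txt | sort -u -k1,1 -k3,3 | awk 'NF {print $3}' | sort | uniq -c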

# 6  
Old 08-24-2012
A few changes so the script reads your list file and prints a zero for any miRNA that doesn't appear:


Code:
awk '
    NR != FNR { list[$1] = 1; next; }      # second file (mirna.txt): remember every miRNA in the list
    NR > 1 && NF > 1 {
        if( ! seen[$1 " " $NF]++ )         # first time this gene/miRNA pair appears
            c[$NF]++
    }
    END {
        for( x in list )
            printf( "%s %d\n", x, c[x] );  # %d prints 0 for list entries that were never counted
    }
'  results.txt mirna.txt >output-file
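
One note on the two-file read: FNR restarts at 1 for every input file while NR keeps counting across files, so NR != FNR is true only while the second file (mirna.txt here) is being read and list[] ends up holding every miRNA from the list; the %d format then turns an unset c[x] into 0. On the sample data this should print miR22 2, miR13 1, miR15 0 and miR43 1, in whatever order for (x in list) happens to use. A quick illustration of the FNR/NR behaviour, assuming both files exist:

Code:
awk 'FNR == 1 { print FILENAME, "starts at FNR=" FNR, "NR=" NR }' results.txt mirna.txt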

# 7  
Old 08-24-2012
Quote:
Originally Posted by agama
A few changes so the script reads your list file and prints a zero for any miRNA that doesn't appear:


Code:
awk '
    NR != FNR { list[$1] = 1; next; }
    NR > 1 && NF > 1 {
        if( ! seen[$1 " " $NF]++ )
            c[$NF]++
    }
    END {
        for( x in list )
            printf( "%s %d\n", x, c[x] );
    }
' results.txt mirna.txt >output-file


Hi agama,

Thank you very much. The script works perfectly and I got the desired output.

Cheers,
Mic

---------- Post updated at 03:58 PM ---------- Previous update was at 03:55 PM ----------

Quote:
Originally Posted by spacebar
Either one of these should give you the counts you're looking for:
Code:
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | awk '{a[$1]++} END{for(e in a){print e FS a[e]}}'

Code:
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | uniq -c


Hi spacebar,

Thanks for help!

Cheers,
Mic