count the unique records based on certain columns


Login or Register for Dates, Times and to Reply

 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting count the unique records based on certain columns
# 1  
count the unique records based on certain columns

Hi everyone,

I have a file result.txt with records as following and another file mirna.txt with a list of miRNAs e.g. miR22, miR123, miR13 etc.

Gene Transcript miRNA

Gar Nm_111233 miR22
Gar Nm_123440 miR22
Gar Nm_129939 miR22
Hel Nm_233900 miR13
Hel Nm_678900 miR13
Bart Nm_178181 miR22
Gar Nm_789999 miR43

Now I want to count the number of gene for each miRNA in mirna.txt


e.g.
miR22 2
miR13 1
miR15 0
miR43 1



Previously, I used the following command but it counts every occurence of miRNA.

for gene in `cat mirna.txt`; do awk -v gene=$gene '{for(i=1; i<=NF; i++) if ($i==gene) c++} END {print c}' result.txt>>output.txt; done;


Any help is appreciated. Thanks in advance.


Mic

Last edited by miclow; 08-23-2012 at 11:00 PM..
# 2  
This should count the ones present:

Code:
awk '
    NR > 1 && NF > 1 { c[$NF]++ }
    END {
        for( x in c )
            print x, c[x];
    }
'  result.txt

# 3  
Hi agama,

Thanks for your reply but the output I got using your script is:

miR43 1
miR13 2
miR22 3

I expect to get the following output instead
miR22 2
miR13 1
miR43 1
# 4  
Quote:
Originally Posted by miclow
Hi agama,

Thanks for your reply but the output I got using your script is:

miR43 1
miR13 2
miR22 3

I expect to get the following output instead
miR22 2
miR13 1
miR43 1
I interpreted your objective very incorrectly. Corrected code below:

Code:
 awk '
    NR > 1 && NF > 1 {
        if( ! seen[$1 " " $NF]++ )
            c[$NF]++
    }
    END {
        for( x in c )
            print x, c[x];
    }' input >output


Last edited by agama; 08-24-2012 at 01:56 AM.. Reason: cut/paste truncation
# 5  
Either one of these should give you the counts your looking for:
Code:
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | awk '{a[$2]++} END{for(e in a){print e FS a[e]}}'
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | uniq -c

This User Gave Thanks to spacebar For This Post:
# 6  
A few changes to use the contents of your list file to print those that didn't appear with a zero.


Code:
awk '
    NR != FNR { list[$1] = 1; next; }
    NR > 1 && NF > 1 {
        if( ! seen[$1 " " $NF]++ )
            c[$NF]++
    }
    END {
        for( x in list )
            printf( "%s %d\n", x, c[x] );
    }
'  results.txt mirna.txt >output-file

# 7  
Quote:
Originally Posted by agama
A few changes to use the contents of your list file to print those that didn't appear with a zero.


Code:
awk '
NR != FNR { list[$1] = 1; next; }
NR > 1 && NF > 1 {
if( ! seen[$1 " " $NF]++ )
c[$NF]++
}
END {
for( x in list )
printf( "%s %d\n", x, c[x] );
}
' results.txt mirna.txt >output-file


Hi agama,

Thank you very much. The script works perfectly and I got the output desired. Smilie

Cheers,
Mic

---------- Post updated at 03:58 PM ---------- Previous update was at 03:55 PM ----------

Quote:
Originally Posted by spacebar
Either one of these should give you the counts your looking for:
Code:
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | awk '{a[$2]++} END{for(e in a){print e FS a[e]}}'

Code:
sort -u -k1,1 -k3,3 t | cut -d' ' -f3 | uniq -c


Hi spacebar,

Thanks for help!

Cheers,
Mic
Login or Register for Dates, Times and to Reply

Previous Thread | Next Thread
Thread Tools Search this Thread
Search this Thread:
Advanced Search

Test Your Knowledge in Computers #68
Difficulty: Easy
Awk is abbreviated from the names of the developers - Aho, Weinberger, and Kernighan.
True or False?

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Insert Columns before the last Column based on the Count of Delimiters

Hi, I have a requirement where in I need to insert delimiters before the last column of the total delimiters is less than a specified number. Say if the delimiters is less than 139, I need to insert 2 columns ( with blanks) before the last field awk -F '' '{ if (NF-1 < 139)} END { "Insert 2... (5 Replies)
Discussion started by: arunkesi
5 Replies

2. Shell Programming and Scripting

Merge records based on multiple columns

Hi, I have a file with 16 columns and out of these 16 columns 14 are key columns, 15 th is order column and 16th column is having information. I need to concate the 16th column based on value of 1-14th column as key in order of 15th column. Here are the example file Input File (multiple... (3 Replies)
Discussion started by: Ravi Agrawal
3 Replies

3. Linux

To get all the columns in a CSV file based on unique values of particular column

cat sample.csv ID,Name,no 1,AAA,1 2,BBB,1 3,AAA,1 4,BBB,1 cut -d',' -f2 sample.csv | sort | uniq this gives only the 2nd column values Name AAA BBB How to I get all the columns of CSV along with this? (1 Reply)
Discussion started by: sanvel
1 Replies

4. Shell Programming and Scripting

Find and count unique date values in a file based on position

Hello, I need some sort of way to extract every date contained in a file, and count how many of those dates there are. Here are the specifics: The date format I'm looking for is mm/dd/yyyy I only need to look after line 45 in the file (that's where the data begins) The columns of... (2 Replies)
Discussion started by: ronan1219
2 Replies

5. Shell Programming and Scripting

Print unique records in 2 columns using awk

Is it possible to print the records that has only 1 value in 2nd column. Ex: input awex1 1 awex1 2 awex1 3 assww 1 ader34 1 ader34 2 output assww 1 (5 Replies)
Discussion started by: quincyjones
5 Replies

6. UNIX for Dummies Questions & Answers

How to count specific columns and merge with unique ones?

Hi. I am not sure the title gives an optimal description of what I want to do. I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns might differ. I want to count the number of times data occur in specific columns,... (0 Replies)
Discussion started by: JamesT
0 Replies

7. Shell Programming and Scripting

awk : extracting unique lines based on columns

Hi, snp.txt CHR_A SNP_A BP_A_st BP_A_End CHR_B BP_B SNP_B R2 p-SNP_A p-SNP_B 5 rs1988728 74904317 74904318 5 74960646 rs1427924 0.377333 0.000740085 0.013930081 5 ... (12 Replies)
Discussion started by: genehunter
12 Replies

8. Shell Programming and Scripting

using awk to count no of records based on conditions

Hi I am having files with date and time stamp as the folder names like 200906051400,200906051500,200906051600 .....hence everyday 24 files will be generated i need to do certain things on this 24 files daily file contains the data like 200906050016370 0 1244141195225298lessrv3 ... (13 Replies)
Discussion started by: aemunathan
13 Replies

9. Shell Programming and Scripting

Record count based on a keyword in the records

Hi, Am having files with many records, i need to count and display the number of records based on the keyword in one of the column of the records. for e.g THE FILE CONTAINS TWO RECORDS LIKE. 200903031143150 0 1236060795054357lessrv1 BSNLSERVICE1 BSNLSERVICE1 ... (4 Replies)
Discussion started by: aemunathan
4 Replies

Featured Tech Videos