counting using awk


 
# 8  
Old 05-30-2011
Quote:
Originally Posted by Diya123
Thanks for the quick reply. Now it didn't print anything in top.10 and placed everything in rest.90.

Also I need them to be printed side by side.

Chr1_1_50 50 ACXA Chr1_1_50 10 ACXA Chr_1_1_50 20 ACXA

so on...

for both 90% and 10%.

Thanks,


Diya
My tests with your sample file work fine. If you really need help, paste some real data as samples for file1, file2 and file3, and show the expected output format.

In fact, we only provide the idea. If you really understand what the awk code is doing, you should be able to adjust it to your request.
# 9  
Old 05-30-2011
Hi,

Thanks for the reply and sorry for bothering with so many questions.

When you say FNR<=10, it means it takes only the top 10 rows. I want the top 10% of rows. Do we use the same condition even in this case?

Diya
# 10  
Old 05-30-2011
Now I fully understand what you want to do.

Code:
awk 'NR==FNR{X=NR} NR>FNR && FNR<=int(X * 0.1) {print $1,$3} ' file1 <(sort -nrk2,2 file1) > top10.1
awk 'NR==FNR{X=NR} NR>FNR && FNR<=int(X * 0.1) {print $1,$3} ' file2 <(sort -nrk2,2 file2) > top10.2
awk 'NR==FNR{X=NR} NR>FNR && FNR<=int(X * 0.1) {print $1,$3} ' file3 <(sort -nrk2,2 file3) > top10.3
sort -u top10.1 top10.2 top10.3 > top.10.list

awk 'NR==FNR{a[$1 FS $2];next} {print > ($1 FS $3 in a?"top.10":"rest.90") }' top.10.list file1 file2 file3
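
To see why the first pass matters, here is a minimal sketch of the two-pass counting idiom used above, run on made-up data. The file names demo.txt, demo.sorted and top10.demo and the 20-row data set are assumptions for illustration only. The sorted copy is written to a file instead of using bash process substitution, so the sketch also runs under plain sh.

```shell
#!/bin/sh
# Hypothetical demo data: 20 rows with ID, count and symbol columns.
tmp=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
    printf 'row%d\t%d\tGENE\n' "$i" "$i"
    i=$((i + 1))
done > "$tmp/demo.txt"

# Sort by the count in column 2, highest first.
sort -k2,2nr "$tmp/demo.txt" > "$tmp/demo.sorted"

# First input (NR==FNR): remember the total number of rows in X.
# Second input: FNR restarts at 1, so FNR <= int(X * 0.1) keeps the top 10%.
awk 'NR==FNR { X = NR; next }
     FNR <= int(X * 0.1) { print $1, $3 }' "$tmp/demo.txt" "$tmp/demo.sorted" > "$tmp/top10.demo"

cat "$tmp/top10.demo"
```

With 20 rows, int(20 * 0.1) is 2, so only the two rows with the highest counts (row20 and row19) are printed.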

For your new format, with matching rows side by side (Chr1_1_50 50 ACXA Chr1_1_50 10 ACXA Chr_1_1_50 20 ACXA), try this code:
Code:
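awk 'NR==FNR{a[$1 FS $2];next}   # load the ID/symbol keys from top.10.list
     { k = $1 FS $3              # same key, built from columns 1 and 3 of the data files
       if (k in a) top10[k]  = (k in top10  ? top10[k]  FS : "") $0   # join rows side by side
       else        rest90[k] = (k in rest90 ? rest90[k] FS : "") $0   # no leading separator
     }
     END { for (i in top10)  print top10[i]  > "top.10"     # for-in order is unspecified
           for (i in rest90) print rest90[i] > "rest.90" }' top.10.list file1 file2 file3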


Last edited by rdcwayx; 05-30-2011 at 04:00 AM..
# 11  
Old 05-30-2011
Thank you..

I tested my files and it works perfectly fine.

Regards,

Diya
# 12  
Old 06-01-2011
Hi,

I was looking at this data this morning and realized something went wrong. (Not on your side, probably my explanation.)

When I mentioned the top 10% from files 1, 2 and 3, I meant that for every gene.

For instance, we need to pick the top 10% of counts for each gene (the value in column 3) from all three files.

The code above just picks the top 10% irrespective of the symbol in column 3.

How can I accomplish this?
# 13  
Old 06-01-2011
Please provide more detailed information with more data, and show me the top 10% you need to get from the file.
# 14  
Old 06-01-2011
hi,

My files are listed below

file 1

Code:
chrA_16256301_16256350	0	XXYZ
chrA_16256351_16256400	0	XXYZ
chrA_16256401_16256450	0	XXYZ
chrA_16256451_16256500	0	XXYZ
chrA_16256501_16256550	0	XXYZ
chrA_16256551_16256600	0	XXYZ
chrA_16256651_16256700	0	XXYZ
chrA_16256701_16256750	0	XXYZ
chrA_16256751_16256800	0	XXYZ
chrA_16256801_16256850	0	XXYZ
chrA_16256851_16256900	0	XXYZ
chrA_16256901_16256950	0	XXYZ
chrA_16256951_16257000	0	XXYZ
chrA_16257001_16257050	0	XXYZ
chrA_16257051_16257100	0	TTGC
chrA_16257101_16257150	0	TTGC
chrA_16257151_16257200	0	TTGC
chrA_16257201_16257250	0	TTGC
chrA_16257251_16257300	0	TTGC
chrA_16257301_16257350	0	TTGC
chrA_42399451_42399500	0	TTGC
chrA_42399501_42399550	0	TTGC
chrA_42399551_42399600	0	TTGC
chrA_42399601_42399650	0	TTGC
chrA_42399651_42399700	0	TTGC
chrA_42399701_42399750	0	TTGC
chrA_42399751_42399800	0	TTGC
chrA_42399801_42399850	0	TTGC
chrA_42399851_42399900	0	TTGC
chrA_42399901_42399950	0	TTGC
chrA_42399951_42400000	0	ABCD
chrA_42400001_42400050	0	ABCD
chrA_42400051_42400100	0	ABCD
chrA_42400101_42400150	0	ABCD
chrA_42400151_42400200	0	ABCD
chrA_42400201_42400250	0	ABCD
chrA_42400251_42400300	0	ABCD
chrA_42400301_42400350	0	ABCD
chrA_42400351_42400400	0	ABCD
chrA_42400401_42400450	0	ABCD
chrA_42400451_42400500	0	ABCD
chrA_42400501_42400550	0	ABCD
chrA_42400551_42400600	0	ABCD
chrA_42400601_42400650	0	ABCD
chrA_42400651_42400700	0	ABCD

Files 2 and 3 have the same rows as file 1; the only difference is the counts in column 2.

What I want is, for each unique symbol (e.g. ABCD), to get the top 10% of counts from all 3 files and output them to top.10.txt, and to output the remaining 90% of counts for each symbol to rest.90.txt.

We also need to consider the following:

if rows 1,5,7 are in the top 10% of counts in file 1, then those rows need to be pulled from files 2,3
if rows 1,6,8 are in the top 10% of counts in file 2, then those rows need to be pulled from files 1,3
if rows 2,6,9 are in the top 10% of counts in file 3, then those rows need to be pulled from files 1,2

The output top.10.txt should then have rows 1,2,5,6,7,8,9 from all 3 files, and the remaining rows should be written to rest.90.txt.

The code above sorts the rows by count and just separates the top 10% and the remaining 90% into 2 different outputs, whereas we need to do this separately for each symbol.


Thanks,

Diya
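
The per-symbol selection described above can be sketched as follows. This is only one possible approach, not a reply from the thread. It assumes whitespace-separated columns (ID, count, symbol), rounds 10% down with int() as the earlier code does (so a symbol with fewer than 10 rows contributes nothing), and breaks ties between equal counts arbitrarily. The demo data, the IDs r1 to r20 and the symbols AAA and BBB are made up for illustration.

```shell
#!/bin/sh
# Sketch only: per-symbol top 10%, assuming columns are ID, count, symbol.
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Hypothetical demo data: 20 rows per file, 10 for symbol AAA and 10 for BBB.
awk 'BEGIN { for (i = 1; i <= 20; i++) print "r" i, i, (i <= 10 ? "AAA" : "BBB") }' > file1
awk 'BEGIN { for (i = 1; i <= 20; i++) print "r" i, 21 - i, (i <= 10 ? "AAA" : "BBB") }' > file2
awk 'BEGIN { for (i = 1; i <= 20; i++) print "r" i, (i % 10 == 5 ? 100 : i), (i <= 10 ? "AAA" : "BBB") }' > file3

# Step 1: in each file, take the top 10% of rows within every symbol,
# then merge the selected ID/symbol pairs from all files into one list.
for f in file1 file2 file3; do
    # Group by symbol, highest count first inside each group.
    sort -k3,3 -k2,2nr "$f" |
    awk 'NR == FNR { n[$3]++; next }   # first input: count rows per symbol
         ++seen[$3] <= int(n[$3] * 0.1) { print $1, $3 }' "$f" -
done | sort -u > top.10.list

# Step 2: a row selected in any file is pulled from every file.
awk 'NR == FNR { keep[$1 FS $2]; next }
     { k = $1 FS $3; print > ((k in keep) ? "top.10.txt" : "rest.90.txt") }' top.10.list file1 file2 file3

cat top.10.list
```

With this demo data, file1 selects r10 and r20, file2 selects r1 and r11, and file3 selects r5 and r15, so top.10.txt receives those six rows from each of the three files and rest.90.txt receives everything else. Note that the NR == FNR test in step 2 assumes top.10.list is not empty.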