search and count


 
# 1  
Old 05-10-2012

Hi,

I have 2 files.

file1:
Code:
 ABC  1160  1260
DEF   1360 1580
DEF   2300 2800
XYZ  1600  2200

file2:
Code:
 chr1_1000_1050
chr1_1100_1150
chr3_1151_1200
chr3_1201_1250
chr6_1301_1350
chr6_1351_1400
chr6_1550_1600
chrX_1600_1650
chrX_1851_1900

For each row in file1, I want to count how many rows in file2 fall between its column 2 and column 3, and append that count to the row.

output
Code:
 ABC  1160  1260  2
DEF   1360 1580  2
DEF   2300 2800 0
XYZ  1600  2200  2

If this is not clear, I can explain it again in more detail.

Thanks,
# 2  
Old 05-10-2012
How about:

Code:
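awk 'NR==FNR{from[NR]=$2;to[NR]=$3;next}   # first file (file2): store each start and end
{c=0;for(i in to)                            # second file (file1): scan every stored interval
  if(from[i]<$3&&to[i]>$2||from[i]>$2&&to[i]<$3) c++   # strict overlap test
 print $0 OFS c }' FS="_" file2 FS="[ \t]*" file1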

# 3  
Old 05-11-2012
I am not sure if I am doing something wrong, but I get a syntax error. I have colored the text in red.

Code:
NR==FNR{from[NR]=$2;to[NR]=$3;next}{c=0;for(i in to)if(from[i]<$3&&to[i]>$2||from[i]>$2&&to[i]<$3) c++ print $0 OFS c }


Last edited by Scrutinizer; 05-11-2012 at 01:43 PM.. Reason: code tags
# 4  
Old 05-11-2012
If you must put two commands in a row, put a ; between them.

You'll also want to put { } around all the commands you wish to be in the for-loop, otherwise it will just take the first command after the for-loop.
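
Applied to the one-liner from post #3, the missing `;` could be added as in the sketch below. It assumes the sample file1 and file2 from post #1, recreated in a scratch directory so the snippet is self-contained. Here the loop body is a single `if`, so braces are optional, and the `print` stays outside the loop because it should run once per file1 line. `FS=" "` for file1 is a substitution for the original `FS="[ \t]*"`; it selects awk's default whitespace splitting.

```shell
# Scratch copies of the sample files from post #1.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
ABC  1160  1260
DEF   1360 1580
DEF   2300 2800
XYZ  1600  2200
EOF
cat > "$dir/file2" <<'EOF'
chr1_1000_1050
chr1_1100_1150
chr3_1151_1200
chr3_1201_1250
chr6_1301_1350
chr6_1351_1400
chr6_1550_1600
chrX_1600_1650
chrX_1851_1900
EOF

# Same program on one line: the ";" after c++ ends the for-loop body,
# so print runs once per file1 line, after the loop has finished.
result=$(awk 'NR==FNR{from[NR]=$2;to[NR]=$3;next}{c=0;for(i in to)if(from[i]<$3&&to[i]>$2||from[i]>$2&&to[i]<$3)c++;print $0 OFS c}' FS="_" "$dir/file2" FS=" " "$dir/file1")
printf '%s\n' "$result"
rm -r "$dir"
```

With the sample data this prints the four file1 lines followed by the counts 2, 2, 0 and 2, matching the output requested in post #1.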
# 5  
Old 05-14-2012
Hi,

I tried the above code on my original dataset and it does not give me the right output, although it runs perfectly on the example files. My original files are more complex, so I have changed the examples accordingly.

file1:
Code:
chr1    87333735        87334735
chr1    94522156        94523156
chr1    179102446       179103446
chr2    1230097 1231097
chr1    6342783 6343783
chr2    147131761       147132761
chr1    167787600       167788600
chr1    167853465       167854465
chr3    167867712       167868712
chr3    167870899       167871899

file2:
Code:
chr1	245025451	245025500
chr1	245025951	245026000
chr1	245026151	245026200
chr2	245027551	245027600
chr1	245027601	245027650
chr2	245027651	245027700
chr1	247003001	247003050
chr1	247047901	247047950
chr4	247048701	247048750
chr1	247050751	247050800
chr3	247051101	247051150
chr1	247061401	247061450
chr3	247071451	247071500

What I want: for each row in file2, take column 1 (chr1, chr2, etc.) and check whether the row falls within the interval given by columns 2 and 3 of a file1 row that has the same column 1. In other words, a chr1 row in file2 should be counted only against the chr1 rows of file1, and the number of matching rows should be written as column 4 of file1.

Let me know if this is not clear.

Thanks,
# 6  
Old 05-14-2012
For the new format of file2, change the last line above to: print $0 OFS c }' FS="[ \t]*" file2 file1

Note that none of the values in your test file2 (roughly 245 to 247 million) fall within the ranges in file1 (roughly 1.2 to 179 million), so all counts are zero in the output.
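
A quick way to check for this kind of mismatch is to print the smallest start and the largest end in each file. This is a minimal sketch, assuming whitespace-separated files with the start in column 2 and the end in column 3. The `span` function name is only for illustration, and the rows are a small subset copied from the files in the previous post.

```shell
# span FILE: print the smallest column-2 value and the largest column-3 value.
span() {
  awk 'NR==1{min=$2;max=$3}
       $2+0<min+0{min=$2}
       $3+0>max+0{max=$3}
       END{print min, max}' "$1"
}

dir=$(mktemp -d)
printf '%s\n' 'chr1 87333735 87334735' 'chr2 1230097 1231097' \
              'chr1 179102446 179103446' > "$dir/file1"
printf '%s\n' 'chr1 245025451 245025500' 'chr3 247071451 247071500' > "$dir/file2"

span1=$(span "$dir/file1")
span2=$(span "$dir/file2")
echo "file1: $span1"
echo "file2: $span2"
rm -r "$dir"
```

If the two printed ranges do not overlap at all, every count will be zero no matter how the comparison is written.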
# 7  
Old 05-14-2012
Thanks for the reply. But where does the code take the chr value in column 1 into account? When it is looking at rows that have chr1 in file2, it should only look at the chr1 rows in file1. The counts should be assigned only if column 1 matches.

Regards,
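
One way the column 1 check asked about here could be added is to store column 1 of file2 alongside each interval and compare it before testing the ranges. This is only a sketch, not code from the earlier replies. It assumes both files are whitespace-separated with the chromosome in column 1, the start in column 2 and the end in column 3, and it keeps the strict `<` / `>` overlap test from post #2. The sample rows are made up so that some counts are non-zero.

```shell
dir=$(mktemp -d)
# Made-up intervals (file1) and rows to count (file2), three columns each.
cat > "$dir/file1" <<'EOF'
chr1 1000 2000
chr2 1000 2000
chr3 5000 6000
EOF
cat > "$dir/file2" <<'EOF'
chr1 1100 1150
chr1 1900 2100
chr2 1500 1550
chr2 3000 3050
chr4 1200 1250
EOF

result=$(awk '
  # First file (file2): remember chromosome, start and end of every row.
  NR==FNR { chr[NR]=$1; from[NR]=$2; to[NR]=$3; next }
  # Second file (file1): count stored rows on the same chromosome that overlap.
  {
    c = 0
    for (i in to)
      if (chr[i] == $1 && from[i] < $3 && to[i] > $2) c++
    print $0, c
  }' "$dir/file2" "$dir/file1")
printf '%s\n' "$result"
rm -r "$dir"
```

On this made-up data the counts come out as 2 for the chr1 interval, 1 for chr2 and 0 for chr3, and the chr4 row is ignored because no file1 row shares its column 1. Every file1 row is compared against every stored file2 row, which is simple but may be slow on large files.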