Filter records based on 2nd file


 
# 1  
Old 11-18-2013

Hello,

I want to keep only those records of a file that fall within the ranges listed in a second file. First, the chr name (2nd col of the 1st file, 1st col of the 2nd file) needs to match. Then, if the 3rd col of the first file falls within any of the ranges given by the 2nd and 3rd cols of the range file for that chr, the record goes to the output.
All files are sorted from low to high.

File to be filtered looks like
Code:
9927    chr1    83      T       C
9927    chr1    92      A       C
9927    chr1    97      A       C
9927    chr2    262     C       G
9927    chr2    292     C       G
9927    chr2    367     C       G

Range file looks like

chr1    46    84
chr1    95    227
chr2    261  326

Filtered output

9927    chr1    83      T       C
9927    chr1    97      A       C
9927    chr2    262     C       G
9927    chr2    292     C       G

I have 758 files to filter. I think I can use a loop like the following,
once I have a magic_script to put inside it.

Code:
for file in *; do magic_script "$file" range_file > "${file}_filtered"; done


Last edited by ritakadm; 11-18-2013 at 03:09 PM..
# 2  
Old 11-18-2013
Hi,
Try this:
Code:
$ cat chr1.txt
9927    chr1    83      T       C
9927    chr1    92      A       C
9927    chr1    97      A       C
9927    chr2    262     C       G
9927    chr2    292     C       G
9927    chr2    367     C       G
$ cat chr2.txt
chr1    46    84
chr1    95    227
chr2    261  326

Code:
$ sed 's/  / /g' <(awk '{printf("xxxx %s %s\nyyyy %s %s\n",$1,$2,$1,$3)}' chr2.txt) chr1.txt | sort -k2 -n -k3 | sed -n '/xxxx/,/yyyy/{/xxxx\|yyyy/!p;}'
9927  chr1  83   T    C
9927  chr1  97   A    C
9927  chr2  262   C    G
9927  chr2  292   C    G

Regards.
This User Gave Thanks to disedorgue For This Post:
# 3  
Old 11-18-2013
Edit - Never mind, don't pay attention to this stupid question.


Is it acceptable to use a range file like this?
Quote:
Range file looks like
chr1 46 227
chr2 261 326
This User Gave Thanks to tukuyomi For This Post:
# 4  
Old 11-18-2013
Here is an awk based approach that might work:
Code:
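awk '
        # First file (range file): collect "start,end" pairs per chr
        NR == FNR {
                A[$1] = ($1 in A) ? A[$1] "," $2 "," $3 : $2 "," $3
                next
        }
        # Second file: print the record if col 3 lies in any range of its chr
        $2 in A {
                n = split ( A[$2], R, "," )
                for ( i = 1; i <= n; i += 2 )
                {
                        if ( $3 >= R[i] && $3 <= R[i+1] )
                        {
                                print $0
                                break   # print each record only once
                        }
                }
        }
' rangefile file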

This User Gave Thanks to Yoda For This Post:
# 5  
Old 11-18-2013
You could also try this awk code:
Code:
awk 'NR==FNR{A[++i,1]=$1;A[i,2]=$2;A[i,3]=$3;next}
{for(j=1;j<=i;j++)if(($2==A[j,1])&&($3>=A[j,2])&&($3<=A[j,3])){print;break}}
' filterfile datafile

This User Gave Thanks to tukuyomi For This Post:
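
None of the replies covers the loop over the 758 files, so here is a minimal sketch of how an awk filter like the ones above might be wrapped for that. The function names filter_by_range and filter_all and the separate output directory are assumptions made for illustration, not something from the thread. Range boundaries are treated as inclusive, as in the replies above.

```shell
#!/bin/sh
# Sketch only: filter_by_range and filter_all are hypothetical helper names.

# Print the records of data file $2 whose column 3 lies inside any range
# listed in range file $1 for the chr named in column 2 (bounds inclusive).
filter_by_range() {
    awk '
        NR == FNR {                 # first file: the range file
            n[$1]++                 # count of ranges seen for this chr
            lo[$1, n[$1]] = $2
            hi[$1, n[$1]] = $3
            next
        }
        {                           # second file: the data file
            for (i = 1; i <= n[$2]; i++)
                if ($3 + 0 >= lo[$2, i] + 0 && $3 + 0 <= hi[$2, i] + 0) {
                    print
                    break           # one match is enough, avoids duplicates
                }
        }
    ' "$1" "$2"
}

# Filter many files: $1 = range file, $2 = output directory,
# remaining arguments = data files. Results keep the input file names.
filter_all() {
    ranges=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for file in "$@"; do
        filter_by_range "$ranges" "$file" > "$outdir/$(basename "$file")"
    done
}
```

A call could look like `filter_all range_file filtered ./data/*`, where the `data` and `filtered` directory names are placeholders. Writing the results to a separate directory keeps the output files from being matched by the glob if the loop is run again.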