Matching position and output neighbors within 500 distant


 
# 1  
Old 01-15-2014
Hi,
I have been struggling to match positions and output their neighbors. Can you please help?

I have 2 files. They both have the same format (same number of columns), but the first file is a kind of subset of the second file.

The first file looks like this (tab delimited):

Code:
1 11567687 snpid20
1 153881 snpid1
2 56768799 snpid7
3 3156760 snpid4
3 1567687 snpid7

I want to look up every line of the first file in the second file.
For each one, I want to output every line of the second file whose second-column value is within plus or minus 500 of the second-column value in the first file.

Thanks
# 2  
Old 01-15-2014
With GNU grep
Code:
grep -F -f firstfile -A 500 -B 500 secondfile

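If the matching lines are exactly the same in both files, you can require a whole-line match by adding the -x option.
Adding the -n option prints line numbers and shows whether each line is a matched line (number followed by ":") or a neighbor line (number followed by "-").
Note that -A and -B count lines, not values in column 2, so this returns the 500 lines after and before each match.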
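To make the -n markers concrete, here is a minimal sketch on a throwaway five-line file. The function name demo_grep_context, the file contents, and the context size of 1 are made up for illustration; it assumes a grep that supports -A and -B (GNU grep does).

```shell
# Hypothetical demo: one key, one line of context on each side.
demo_grep_context() {
    tmp=$(mktemp -d) || return 1
    printf '%s\n' 'a' 'b' 'KEY' 'c' 'd' > "$tmp/second"   # stand-in for secondfile
    printf '%s\n' 'KEY' > "$tmp/first"                    # stand-in for firstfile
    # -F: fixed strings, -f: read patterns from a file, -n: show line numbers
    grep -n -F -f "$tmp/first" -A 1 -B 1 "$tmp/second"
    rm -rf "$tmp"
}

demo_grep_context
# prints:
# 2-b
# 3:KEY
# 4-c
```

The matched line carries a ":" after its number and the neighbor lines carry a "-". The neighbors are chosen by line distance in the file, so this approach only fits if "within 500" means 500 lines rather than 500 units of the second column.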
# 3  
Old 01-15-2014
awk based solution
Code:
 
 $ cat test.sh
 awk '
{
        if (NR == FNR)  ### reading 1st file
                        ### accumulate 2nd column values in array of key values
        {
                arr[NR] = $2;
        }
        else            ### reading 2nd file and check for value of 2nd column
                        ### to be within +/- 500 of any accumulated key values
        {
                maxval = $2 + 500;
                minval = $2 - 500;
                for ( x in arr )
                {
                        if(minval <= arr[x] && arr[x] <= maxval)
                                print $0;
                }
         }
}
' "$1" "$2"

Here is how I ran it
Code:
 
 $ cat a
1       11567687        snpid20
1       153881  snpid1
2       56768799        snpid7
3       3156760 snpid4
3       1567687 snpid7
$ cat b
1       11567600        snpid20
3       1000000 snpid7
 $ test.sh b a
 1       11567687        snpid20

Hope this is what you were looking for.
# 4  
Old 01-18-2014
Quote:
Originally Posted by migurus
awk based solution

Thanks
This almost works, except that it outputs every line within +/- 500 of any key, regardless of column 1. For example, when searching within +/- 500 of the key 11567687 from "1 11567687 snpid20", it should output only the lines from the second file that are in range and have 1 in column 1. When searching within +/- 500 of the key 3156760 from "3 3156760 snpid4", it should output only the lines from the second file that are in range and have 3 in column 1.
# 5  
Old 01-18-2014
Built on migurus' proposal, this (untested) may do what you want:
Code:
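awk     'NR == FNR      {arr1[NR] = $1          # file 1: remember column 1 ...
                         arr2[NR] = $2          # ... and the position in column 2
                         MAX=NR
                         next}

                        {maxval = $2 + 500      # file 2: window around this position
                         minval = $2 - 500
                         for ( i=1; i<=MAX; i++ ) {
                                 if ($1 != arr1[i]) continue    # column 1 must match
                                 if (minval <= arr2[i] && arr2[i] <= maxval) print
                                }
                        }
        ' "$1" "$2"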

If you're sure there are no near duplicates in file 1, you may want to break after the print.

Last edited by RudiC; 01-18-2014 at 09:57 AM.. Reason: typo
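
For larger files, one possible variation is to group the key positions by column 1, so that each line of the second file is compared only with keys sharing its column 1 value. The following is a sketch and not the code posted above: the function name neighbors, the optional third argument (the distance, defaulting to 500), and the array names are assumptions made for illustration. It also breaks after the first hit, so each line is printed at most once.

```shell
# neighbors KEYFILE DATAFILE [DISTANCE]   (hypothetical helper, DISTANCE defaults to 500)
neighbors() {
    awk -v dist="${3:-500}" '
        NR == FNR {                       # key file: group positions by column 1
            pos[$1, ++cnt[$1]] = $2
            next
        }
        {                                 # data file: scan only keys sharing column 1
            for (i = 1; i <= cnt[$1]; i++) {
                d = $2 - pos[$1, i]
                if (d >= -dist && d <= dist) {
                    print
                    break                 # print each data line at most once
                }
            }
        }
    ' "$1" "$2"
}
```

It would be called as neighbors firstfile secondfile, with the first file supplying the keys. Like the scripts above, it assumes whitespace-separated columns with a numeric position in column 2.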
 