UNIX for Dummies Questions & Answers: Matching position and output neighbors within 500 distant
Post 302884169 by fat on Saturday 18th of January 2014 03:12:38 AM
Quote:
Originally Posted by migurus
awk-based solution:
Code:

$ cat test.sh
#!/bin/sh
awk '
{
        if (NR == FNR)  ### reading 1st file
                        ### accumulate 2nd column values in array of key values
        {
                arr[NR] = $2;
        }
        else            ### reading 2nd file and check for value of 2nd column
                        ### to be within +/- 500 of any accumulated key values
        {
                maxval = $2 + 500;
                minval = $2 - 500;
                for ( x in arr )
                {
                        if (minval <= arr[x] && arr[x] <= maxval)
                        {
                                print $0;
                                break;  ### stop at the first hit so each line prints once
                        }
                }
        }
}
' "$1" "$2"

Here is how I ran it:
Code:

$ cat a
1       11567687        snpid20
1       153881  snpid1
2       56768799        snpid7
3       3156760 snpid4
3       1567687 snpid7
$ cat b
1       11567600        snpid20
3       1000000 snpid7
$ test.sh b a
1       11567687        snpid20

Hope this is what you were looking for.

Thanks.
This almost works, except that it outputs every value within +/- 500 of any key, regardless of column 1. For example, when searching within +/- 500 of the key 11567687 in "1 11567687 snpid20", it should output only those values from the second file that are within +/- 500 and have 1 in column 1. When searching within +/- 500 of the key 3156760 in "3 3156760 snpid4", it should output only those values from the second file that are within +/- 500 and have 3 in column 1.
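One way to add the column 1 check is sketched below. It assumes whitespace-separated input with the chromosome in column 1 and the position in column 2, and, as in migurus's script, that the key file is given first. The function name `near_positions` and the optional window argument are hypothetical additions for illustration, not something from the thread. Keys are grouped by column 1, so each data line is compared only against keys on its own chromosome.

```shell
# Hypothetical helper: near_positions KEYFILE DATAFILE [WINDOW]
# Prints each line of DATAFILE whose column 1 (chromosome) matches the
# column 1 of some KEYFILE line and whose column 2 (position) is within
# +/- WINDOW (default 500) of that KEYFILE line's column 2.
# Assumes KEYFILE is non-empty (the usual NR == FNR caveat in awk).
near_positions() {
    awk -v win="${3:-500}" '
    NR == FNR {                 # 1st file: store key positions,
        n[$1]++                 # grouped by chromosome (column 1)
        pos[$1, n[$1]] = $2
        next
    }
    ($1 in n) {                 # 2nd file: skip chromosomes with no keys
        for (i = 1; i <= n[$1]; i++) {
            d = $2 - pos[$1, i]
            if (d < 0) d = -d   # absolute distance to this key
            if (d <= win) {
                print           # print the line once, on its first match
                next
            }
        }
    }
    ' "$1" "$2"
}
```

With the sample files shown above, `near_positions b a` should print only `1 11567687 snpid20`. Unlike the original script, a key such as `2 1000` would no longer match a data line such as `1 1200`, because the two lines are on different chromosomes.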
 

VCF-ISEC(1)							   User Commands						       VCF-ISEC(1)

NAME
       vcf-isec - create intersections, unions, complements on bgzipped and tabix indexed VCF or tab-delimited files

SYNOPSIS
       vcf-isec [OPTIONS] file1.vcf file2.vcf ...

DESCRIPTION
       About: Create intersections, unions, complements on bgzipped and tabix indexed VCF or tab-delimited files. Note that lines from all files can be intermixed together on the output, which can yield unexpected results.

OPTIONS
       -C, --chromosomes <list|file>
              Process the given chromosomes (comma-separated list or one chromosome per line in a file).
       -c, --complement
              Output positions present in the first file but missing from the other files.
       -d, --debug
              Debugging information.
       -f, --force
              Continue even if the script complains about differing columns.
       -o, --one-file-only
              Print only entries from the left-most file. Without -o, all unique positions will be printed.
       -n, --nfiles [+-=]<int>
              Output positions present in this many (=), this many or more (+), or this many or fewer (-) files.
       -p, --prefix <path>
              If present, multiple files will be created with all possible isec combinations. (Suitable for Venn Diagram analysis.)
       -t, --tab <chr:pos:file>
              Tab-delimited file with indexes of chromosome and position columns. (1-based indexes)
       -w, --win <int>
              In repetitive sequences, the same indel can be called at different positions. Consider records this far apart as matching (be it a SNP or an indel).
       -h, -?, --help
              This help message.

EXAMPLES
       bgzip file.vcf; tabix -p vcf file.vcf.gz
       bgzip file.tab; tabix -s 1 -b 2 -e 2 file.tab.gz

vcf-isec 0.1.5                    July 2011                    VCF-ISEC(1)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.