Post 302888830 by data_miner on Monday 17th of February 2014 03:06:27 PM
Identify the overlapping and non overlapping regions

Code:
file1
chr	pos1	pos2	pos3	pos4
1)chr1	1000	2000	3000	4000 
2)chr1	1380	1480	6800	7800	
3)chr1	6700	7700	1200	2200	
4)chr2	8500	9500	5670	6670

Code:
file2
chr	pos1	pos2	pos3	pos4
1)chr2	8500	9500	5000	6000	
2)chr1	6700	7700	1200	2200
3)chr1	1380	1480	6700	7700
4)chr1	1000	2000	4900	5900

I have two input files, file1 and file2, each containing 5 columns. The first column contains the chromosome (the range is 1-19 and X; only chr1 and chr2 are shown in the example).
What I want to do is this:
Condition 1: if chr matches and the pos1-pos2 regions in both files overlap,
then I want to compare pos3 and pos4. If the pos3-pos4 regions also overlap, I want to write the file1 line to output_1file;
if the pos3-pos4 regions do not overlap, I want to write the line to output_2file.
So if we compare file1 with file2, the result should be:
Code:
output_1file
2)chr1	1380	1480	6800	7800
3)chr1	6700	7700	1200	2200
4)chr2	8500	9500	5670	6670

Code:
output_2file
1)chr1	1000	2000	3000	4000
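
One possible way to get this result is a short awk script. This is only a sketch, and it rests on several assumptions that the post does not confirm: the `1)` to `4)` row labels are not part of the real files, fields are whitespace-separated with one header line, coordinates are inclusive with start <= end, and "overlap" means sharing at least 1 bp. It also assumes that a file1 line goes to output_1file if any matching file2 line overlaps on both regions, and that a file1 line with no pos1-pos2 overlap at all is written to neither file. The function name `overlap` and the array names are made up for the example.

```shell
#!/bin/sh
# Recreate the sample inputs from the post (row labels dropped, header kept).
printf 'chr\tpos1\tpos2\tpos3\tpos4\n' > file1
printf '%s\t%s\t%s\t%s\t%s\n' \
    chr1 1000 2000 3000 4000 \
    chr1 1380 1480 6800 7800 \
    chr1 6700 7700 1200 2200 \
    chr2 8500 9500 5670 6670 >> file1

printf 'chr\tpos1\tpos2\tpos3\tpos4\n' > file2
printf '%s\t%s\t%s\t%s\t%s\n' \
    chr2 8500 9500 5000 6000 \
    chr1 6700 7700 1200 2200 \
    chr1 1380 1480 6700 7700 \
    chr1 1000 2000 4900 5900 >> file2

awk '
# True when the closed intervals [a1,a2] and [b1,b2] share at least 1 bp.
# The "+ 0" forces numeric rather than string comparison.
function overlap(a1, a2, b1, b2) {
    return (a1 + 0 <= b2 + 0) && (b1 + 0 <= a2 + 0)
}

FNR == 1 { next }                 # skip the header line of each file

NR == FNR {                       # first file named below (file2): keep in memory
    n++
    chr[n] = $1; s1[n] = $2; e1[n] = $3; s2[n] = $4; e2[n] = $5
    next
}

{                                 # second file named below (file1): classify each line
    first = 0; second = 0
    for (i = 1; i <= n; i++) {
        if (chr[i] != $1 || !overlap($2, $3, s1[i], e1[i]))
            continue
        first = 1                 # same chr and pos1-pos2 overlap
        if (overlap($4, $5, s2[i], e2[i])) { second = 1; break }
    }
    if (second)     print > "output_1file"
    else if (first) print > "output_2file"
}
' file2 file1
```

With the sample data this writes the three lines listed under output_1file and the single line listed under output_2file. The script compares every file1 line with every file2 line, which is fine for small files; for large inputs a dedicated interval tool would likely be a better fit.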

My definition of overlap:
The positions need not be exactly the same. The two regions should share a common region of at least 1 bp (base pair).
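
Read as a test on two regions, this definition can be sketched as follows. The sketch assumes that both ends of a region are inclusive and that start <= end, which the post does not state; the helper name `overlaps` is hypothetical.

```shell
# overlaps A_START A_END B_START B_END
# Succeeds (exit status 0) when the two inclusive regions share at least 1 bp.
overlaps() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

overlaps 1000 2000 1380 1480 && echo "overlap"      # second region lies inside the first
overlaps 1000 2000 2000 3000 && echo "overlap"      # only position 2000 is shared
overlaps 1000 2000 2001 3000 || echo "no overlap"   # adjacent, nothing shared
```

If the coordinates were half-open instead (end position excluded, as in BED files), the two comparisons would need to be strict (`-lt` in place of `-le`).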
 
