Sponsored Content
Top Forums Shell Programming and Scripting Assigning the names from overlapping regions Post 302820869 by anurupa777 on Thursday 13th of June 2013 02:52:56 PM
Old 06-13-2013
Assigning the names from overlapping regions

I have 2 files; file 1 having smaller positions that overlap with the positions with positions in file2.
file1
Code:
aaa 20 22 apple
aaa 18 25 banana
aaa 12 30 grapes
aaa 22 25 melon

file2
Code:
aaa 18 26 cdded
aaa 10 35 abcde

I want to get something like this
output
Code:
aaa 18 26 cdded banana melon  
aaa 10 35 abcde apple  banana grapes melon

 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk: union regions

Hi all, I have difficulty to solve the followign problem. mydata: StartPoint EndPoint 22 55 2222 2230 33 66 44 58 222 240 11 25 22 60 33 45 The union of above... (2 Replies)
Discussion started by: phoeberunner
2 Replies

2. UNIX for Dummies Questions & Answers

finding overlapping names in different txt files

Dear Gurus, I have 57 tab-delimited different text files, each one containing entries in 3 columns. The first column in each file contains names of objects. Some names are present in more than one file. I would like to find those names and store them in a separate text file, preferably with a... (6 Replies)
Discussion started by: Unilearn
6 Replies

3. UNIX for Dummies Questions & Answers

extract regions of file based on start and end position

Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is 3-columns list: headers name, start position, end position. I'd like to extract the sequence region of file1 specified in file2. Based on a post elsewhere, I found the code: awk... (2 Replies)
Discussion started by: pathunkathunk
2 Replies

4. Forum Support Area for Unregistered Users & Account Problems

Trouble Registering? Countries or Regions Abusing Forums

The forums have been seeing a sharp increase in spam bots, forum robots, and malicious registrations from certain countries. If you have been directed to this thread due to a "No Permission Error" when trying to register please post in this thread and request permission to register, including... (1 Reply)
Discussion started by: Neo
1 Replies

5. Shell Programming and Scripting

Obtain the names of the flanking regions

Hi I have 2 files; usually the end position in the file1 is the start position in the file2 and the end position in file2 will be the start position in file1 (flanks) file1 Id start end aaa1 0 3000070 aaa1 3095270 3095341 aaa1 3100822 3100894 aaa1 ... (1 Reply)
Discussion started by: anurupa777
1 Replies

6. Shell Programming and Scripting

Identify the overlapping and non overlapping regions

file1 chr pos1 pos2 pos3 pos4 1)chr1 1000 2000 3000 4000 2)chr1 1380 1480 6800 7800 3)chr1 6700 7700 1200 2200 4)chr2 8500 9500 5670 6670 file2 chr pos1 pos2 pos3 pos4 1)chr2 8500 9500 5000 6000 2)chr1 6700 7700 1200 2200 3)chr1 1380 1480 6700 7700 4)chr1 1000 2000 4900 5900 I... (2 Replies)
Discussion started by: data_miner
2 Replies

7. Shell Programming and Scripting

Extraction of upstream and downstream regions from long sequence file

Hello, here I am posting my query again with modified data input files. see my query is : i have two input files file1 and file2. file1 is smalldata.fasta >gi|546671471|gb|AWWX01449637.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig449636, whole genome shotgun sequence... (20 Replies)
Discussion started by: harpreetmanku04
20 Replies

8. UNIX for Dummies Questions & Answers

Retrieving names of files in a dir without overlapping

Hi, I have been trying to retrieve the names of files present in a directory one by one but the names of files are getting overlapped on one another. I tried the below command. ls -1 > filename please help me in getting the file names line by line without overlapping. I am using korn... (6 Replies)
Discussion started by: Pradhikshan
6 Replies

9. Shell Programming and Scripting

Extract Big and continuous regions

Hi all, I have a file like this I want to extract only those regions which are big and continous chr1 3280000 3440000 chr1 3440000 3920000 chr1 3600000 3920000 # region coming within the 3440000 3920000. so i don't want it to be printed in output chr1 3920000 4800000 chr1 ... (2 Replies)
Discussion started by: amrutha_sastry
2 Replies
funjoin(1)							SAORD Documentation							funjoin(1)

NAME
funjoin - join two or more FITS binary tables on specified columns SYNOPSIS
funjoin [switches] <ifile1> <ifile2> ... <ifilen> <ofile> OPTIONS
-a cols # columns to activate in all files -a1 cols ... an cols # columns to activate in each file -b 'c1:bvl,c2:bv2' # blank values for common columns in all files -bn 'c1:bv1,c2:bv2' # blank values for columns in specific files -j col # column to join in all files -j1 col ... jn col # column to join in each file -m min # min matches to output a row -M max # max matches to output a row -s # add 'jfiles' status column -S col # add col as status column -t tol # tolerance for joining numeric cols [2 files only] DESCRIPTION
funjoin joins rows from two or more (up to 32) FITS Binary Table files, based on the values of specified join columns in each file. NB: the join columns must have an index file associated with it. These files are generated using the funindex program. The first argument to the program specifies the first input FITS table or raw event file. If "stdin" is specified, data are read from the standard input. Subsequent arguments specify additional event files and tables to join. The last argument is the output FITS file. NB: Do not use Funtools Bracket Notation to specify FITS extensions and row filters when running funjoin or you will get wrong results. Rows are accessed and joined using the index files directly, and this bypasses all filtering. The join columns are specified using the -j col switch (which specifies a column name to use for all files) or with -j1 col1, -j2 col2, ... -jn coln switches (which specify a column name to use for each file). A join column must be specified for each file. If both -j col and -jn coln are specified for a given file, then the latter is used. Join columns must either be of type string or type numeric; it is illegal to mix numeric and string columns in a given join. For example, to join three files using the same key column for each file, use: funjoin -j key in1.fits in2.fits in3.fits out.fits A different key can be specified for the third file in this way: funjoin -j key -j3 otherkey in1.fits in2.fits in3.fits out.fits The -a "cols" switch (and -a1 "col1", -a2 "cols2" counterparts) can be used to specify columns to activate (i.e. write to the output file) for each input file. By default, all columns are output. If two or more columns from separate files have the same name, the second (and subsequent) columns are renamed to have an underscore and a numeric value appended. The -m min and -M max switches specify the minimum and maximum number of joins required to write out a row. The default minimum is 0 joins (i.e. all rows are written out) and the default maximum is 63 (the maximum number of possible joins with a limit of 32 input files). For example, to write out only those rows in which exactly two files have columns that match (i.e. one join): funjoin -j key -m 1 -M 1 in1.fits in2.fits in3.fits ... out.fits A given row can have the requisite number of joins without all of the files being joined (e.g. three files are being joined but only two have a given join key value). In this case, all of the columns of the non-joined file are written out, by default, using blanks (zeros or NULLs). The -b c1:bv1,c2:bv2 and -b1 'c1:bv1,c2:bv2' -b2 'c1:bv1,c2 - bv2' ... switches can be used to set the blank value for columns common to all files and/or columns in a specified file, respectively. Each blank value string contains a comma-separated list of col- umn:blank_val specifiers. For floating point values (single or double), a case-insensitive string value of "nan" means that the IEEE NaN (not-a-number) should be used. Thus, for example: funjoin -b "AKEY:???" -b1 "A:-1" -b3 "G:NaN,E:-1,F:-100" ... means that a non-joined AKEY column in any file will contain the string "???", the non-joined A column of file 1 will contain a value of -1, the non-joined G column of file 3 will contain IEEE NaNs, while the non-joined E and F columns of the same file will contain values -1 and -100, respectively. Of course, where common and specific blank values are specified for the same column, the specific blank value is used. To distinguish which files are non-blank components of a given row, the -s (status) switch can be used to add a bitmask column named "JFILES" to the output file. In this column, a bit is set for each non-blank file composing the given row, with bit 0 corresponds to the first file, bit 1 to the second file, and so on. The file names themselves are stored in the FITS header as parameters named JFILE1, JFILE2, etc. The -S col switch allows you to change the name of the status column from the default "JFILES". A join between rows is the Cartesian product of all rows in one file having a given join column value with all rows in a second file having the same value for its join column and so on. Thus, if file1 has 2 rows with join column value 100, file2 has 3 rows with the same value, and file3 has 4 rows, then the join results in 2*3*4=24 rows being output. The join algorithm directly processes the index file associated with the join column of each file. The smallest value of all the current columns is selected as a base, and this value is used to join equal-valued columns in the other files. In this way, the index files are traversed exactly once. The -t tol switch specifies a tolerance value for numeric columns. At present, a tolerance value can join only two files at a time. (A completely different algorithm is required to join more than two files using a tolerance, somethng we might consider implementing in the future.) The following example shows many of the features of funjoin. The input files t1.fits, t2.fits, and t3.fits contain the following columns: [sh] fundisp t1.fits AKEY KEY A B ----------- ------ ------ ------ aaa 0 0 1 bbb 1 3 4 ccc 2 6 7 ddd 3 9 10 eee 4 12 13 fff 5 15 16 ggg 6 18 19 hhh 7 21 22 fundisp t2.fits AKEY KEY C D ----------- ------ ------ ------ iii 8 24 25 ggg 6 18 19 eee 4 12 13 ccc 2 6 7 aaa 0 0 1 fundisp t3.fits AKEY KEY E F G ------------ ------ -------- -------- ----------- ggg 6 18 19 100.10 jjj 9 27 28 200.20 aaa 0 0 1 300.30 ddd 3 9 10 400.40 Given these input files, the following funjoin command: funjoin -s -a1 "-B" -a2 "-D" -a3 "-E" -b "AKEY:???" -b1 "AKEY:XXX,A:255" -b3 "G:NaN,E:-1,F:-100" -j key t1.fits t2.fits t3.fits foo.fits will join the files on the KEY column, outputting all columns except B (in t1.fits), D (in t2.fits) and E (in t3.fits), and setting blank values for AKEY (globally, but overridden for t1.fits) and A (in file 1) and G, E, and F (in file 3). A JFILES column will be output to flag which files were used in each row: AKEY KEY A AKEY_2 KEY_2 C AKEY_3 KEY_3 F G JFILES ------------ ------ ------ ------------ ------ ------ ------------ ------ -------- ----------- -------- aaa 0 0 aaa 0 0 aaa 0 1 300.30 7 bbb 1 3 ??? 0 0 ??? 0 -100 nan 1 ccc 2 6 ccc 2 6 ??? 0 -100 nan 3 ddd 3 9 ??? 0 0 ddd 3 10 400.40 5 eee 4 12 eee 4 12 ??? 0 -100 nan 3 fff 5 15 ??? 0 0 ??? 0 -100 nan 1 ggg 6 18 ggg 6 18 ggg 6 19 100.10 7 hhh 7 21 ??? 0 0 ??? 0 -100 nan 1 XXX 0 255 iii 8 24 ??? 0 -100 nan 2 XXX 0 255 ??? 0 0 jjj 9 28 200.20 4 SEE ALSO
See funtools(7) for a list of Funtools help pages version 1.4.2 January 2, 2008 funjoin(1)
All times are GMT -4. The time now is 02:04 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy