Post 302908629 by giuliangiuseppe on Wednesday 9th of July 2014 07:13:11 AM
Matching column and search closest elements

Hi all
I have a problem that I have not been able to solve.
Briefly, I have a file like this:

Code:
ID_1 chr1 100 -
ID_2 chr2 300 +

and another file like this:

Code:
name_1 chr1 150 no -
name_2 chr1 250 yes -
name_3 chr2 350 yes +
name_4 chr2 280 yes +

For each entry in file1, I would like to find the closest feature in file2, comparing the positions in column 3.
For instance, for the first entry in file1, I would like to find the element in file2 that is closest to "chr1 100" (the second column must match).
Moreover, I would like to consider only the elements in file2 whose 4th column is "yes" (or at least be able to choose this parameter) and whose 5th column matches the 4th column of the entry in file1 (or, again, be able to choose whether this is required).

The output file for the example above (when the 4th column must match) should look like this:
Code:
ID_1 chr1 100 - name_2 chr1 250 yes - 150 2
ID_2 chr2 300 + name_4 chr2 280 yes + -20 1

So I would like to output every entry in file1 together with its closest feature in file2 and report, in the last two columns, the distance between the two column-3 values and the position at which that feature is met: for example, for entry 1, the closest "yes" feature is the second one met.

I really hope that my explanation was clear.

If you need further information, let me know.
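To make the request concrete, here is a minimal Python sketch of one way it could be read; it is not a definitive solution. Several things are assumptions on my side: the function names (`load_features`, `closest_features`) and the options `require_flag` and `match_strand` are made up for illustration, the last output column is taken to be the rank of the chosen feature by absolute distance among all file2 features on the same chromosome (this reading reproduces the "2" and "1" in the example), and a file1 entry with no qualifying feature is printed unchanged.

```python
from collections import defaultdict


def load_features(file2_lines):
    """Group file2 rows by chromosome: chrom -> [(name, pos, flag, sign), ...]."""
    by_chrom = defaultdict(list)
    for line in file2_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        name, chrom, pos, flag, sign = fields[:5]
        by_chrom[chrom].append((name, int(pos), flag, sign))
    return by_chrom


def closest_features(file1_lines, file2_lines, require_flag="yes", match_strand=True):
    """Return one output line per file1 entry (hypothetical helper).

    require_flag: value the 4th column of file2 must have, or None to ignore it.
    match_strand: if True, the 5th column of file2 must equal the 4th of file1.
    """
    by_chrom = load_features(file2_lines)
    out = []
    for line in file1_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        ident, chrom, pos, sign = fields[0], fields[1], int(fields[2]), fields[3]
        # Assumption: rank is counted over every feature on the same chromosome,
        # nearest first; ties keep their file2 order because sorted() is stable.
        ranked = sorted(by_chrom.get(chrom, []), key=lambda f: abs(f[1] - pos))
        for rank, (name, fpos, flag, fsign) in enumerate(ranked, start=1):
            if require_flag is not None and flag != require_flag:
                continue
            if match_strand and fsign != sign:
                continue
            out.append(f"{ident} {chrom} {pos} {sign} "
                       f"{name} {chrom} {fpos} {flag} {fsign} {fpos - pos} {rank}")
            break
        else:
            # Assumption: entries without a qualifying feature are kept as-is.
            out.append(f"{ident} {chrom} {pos} {sign}")
    return out


if __name__ == "__main__":
    file1 = ["ID_1 chr1 100 -", "ID_2 chr2 300 +"]
    file2 = ["name_1 chr1 150 no -", "name_2 chr1 250 yes -",
             "name_3 chr2 350 yes +", "name_4 chr2 280 yes +"]
    for row in closest_features(file1, file2):
        print(row)
    # ID_1 chr1 100 - name_2 chr1 250 yes - 150 2
    # ID_2 chr2 300 + name_4 chr2 280 yes + -20 1
```

Passing `require_flag=None` or `match_strand=False` switches off the corresponding filter, which covers the "possibility to decide this parameter" part. For real files, the two lists can be replaced by open file objects, since both functions only iterate over lines.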
 
