Read file1 into a pair of indexed arrays. Then read file2 one line at a time into variables, walk through the arrays looking for the relationship you need, and when you find a hit, echo the desired variables. Maybe I have the roles of the two files backwards, but that is the general idea.
If the files are large, it would be faster to sort them (if they are not already sorted) and merge them, but a merge is a bit hard to write in shell. Also, you may need several reference lines to answer a single query line, so you are back to keeping at least a local array, or to seeking back before reading the next query line.
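As a rough illustration, here is the in-memory approach sketched in Python rather than shell. The original question is not shown here, so the line format (`name start end label`) and the relationship tested (same name and overlapping ranges) are assumptions; `parse` and `overlap_join` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (name, start, end, rest-of-line label)
Record = Tuple[str, int, int, str]


def parse(lines: Iterable[str]) -> List[Record]:
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rest = " ".join(fields[3:])
        records.append((fields[0], int(fields[1]), int(fields[2]), rest))
    return records


def overlap_join(file1_lines, file2_lines):
    # Assumed relationship: same name and overlapping closed ranges.
    reference = parse(file1_lines)  # file1 held in memory, like the array pair
    hits = []
    for q_name, q_start, q_end, q_rest in parse(file2_lines):  # one query line at a time
        for r_name, r_start, r_end, r_rest in reference:  # walk the whole array
            if q_name == r_name and r_start <= q_end and q_start <= r_end:
                hits.append((q_name, q_start, q_end, q_rest, r_rest))
    return hits
```

Every query line scans the whole reference list, so the cost grows as (lines in file1) × (lines in file2); that is why sorting and merging pays off for large inputs.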
Putting the data in an RDBMS, or using a JDBC driver or similar tool that treats a text file as an SQL table, turns this into a trivial SQL query.
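A minimal sketch of the SQL route, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for a full RDBMS. The table and column names (`f1`, `f2`, `name`, `pos_start`, `pos_end`, `label`) and the overlap condition are illustrative assumptions, not part of the original answer.

```python
import sqlite3


def overlap_query(file1_rows, file2_rows):
    # Each row is assumed to be (name, start, end, label).
    con = sqlite3.connect(":memory:")
    for table in ("f1", "f2"):
        con.execute(
            f"CREATE TABLE {table} (name TEXT, pos_start INTEGER, pos_end INTEGER, label TEXT)"
        )
    con.executemany("INSERT INTO f1 VALUES (?, ?, ?, ?)", file1_rows)
    con.executemany("INSERT INTO f2 VALUES (?, ?, ?, ?)", file2_rows)
    # Two closed ranges overlap when each one starts before the other ends.
    rows = con.execute(
        """
        SELECT f2.name, f2.pos_start, f2.pos_end, f2.label, f1.label
        FROM f2 JOIN f1
          ON f1.name = f2.name
         AND f1.pos_start <= f2.pos_end
         AND f2.pos_start <= f1.pos_end
        ORDER BY f2.name, f2.pos_start, f1.pos_start
        """
    ).fetchall()
    con.close()
    return rows
```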
Can anyone help? Where can I download a free version of FreeBSD?
I am trying to teach myself this OS and have all the documentation I need, but I don't have the OS itself.
If anyone can send me a link, I would be most appreciative!
Hi all,
I am having difficulty solving the following problem.
mydata:
StartPoint EndPoint
22 55
2222 2230
33 66
44 58
222 240
11 25
22 60
33 45
The union of the above...
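The rest of the question is cut off, but if "union" means merging the overlapping StartPoint/EndPoint ranges into disjoint ranges, a sort-and-sweep sketch looks like this. It assumes closed ranges and treats ranges that share an endpoint as overlapping; `interval_union` is a hypothetical name.

```python
def interval_union(intervals):
    """Merge overlapping (start, end) ranges into a sorted list of disjoint ranges."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the current run: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])  # start a new run
    return [tuple(pair) for pair in merged]
```

For the mydata values above it returns `[(11, 66), (222, 240), (2222, 2230)]`.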
Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is a 3-column list: header name, start position, end position. I'd like to extract the regions of the file1 sequences specified in file2.
Based on a post elsewhere, I found the code:
awk...
Discussion started by: pathunkathunk
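The awk code from that post is truncated, so here is an independent Python sketch of the same task. It assumes that the header name in file2 is the first word after `>` in file1, that coordinates are 1-based and inclusive, and that rows naming an unknown header are skipped; `read_fasta` and `extract_regions` are hypothetical names.

```python
def read_fasta(lines):
    """Return {first word of header: concatenated sequence}."""
    seqs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_regions(fasta_lines, region_lines):
    seqs = read_fasta(fasta_lines)
    out = []
    for line in region_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        seq = seqs.get(name)
        if seq is None:
            continue  # header not found in file1
        # 1-based inclusive [start, end] -> Python slice [start - 1, end)
        out.append((f"{name}:{start}-{end}", seq[start - 1:end]))
    return out
```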
I have 2 files; file1 has smaller position ranges that overlap with the ranges in file2.
file1
aaa 20 22 apple
aaa 18 25 banana
aaa 12 30 grapes
aaa 22 25 melon
file2
aaa 18 26 cdded
aaa 10 35 abcde
I want to get something like this
output
aaa 18 26 cdded banana...
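Because the desired output is cut off, the exact rule is a guess. This sketch assumes that a file1 range belongs to a file2 range when it has the same first field and lies entirely inside it, and it lists the matching file1 labels in order of start position (which happens to put banana first for cdded, consistent with the visible fragment). `contained_labels` is a hypothetical name.

```python
def contained_labels(file1_lines, file2_lines):
    # file1 rows assumed to be: name start end label
    small = []
    for line in file1_lines:
        f = line.split()
        if len(f) >= 4:
            small.append((f[0], int(f[1]), int(f[2]), f[3]))
    small.sort(key=lambda r: (r[0], r[1], r[2]))  # order labels by start position
    out = []
    for line in file2_lines:
        f = line.split()
        if len(f) < 4:
            continue
        name, start, end = f[0], int(f[1]), int(f[2])
        # Keep file1 ranges lying entirely inside this file2 range.
        labels = [lab for n, s, e, lab in small if n == name and start <= s and e <= end]
        out.append(" ".join(f[:4] + labels))
    return out
```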
Hi,
I have a complete genome FASTA file and a list of subsequence regions
in the format:
4353..5633
6795..9354
1034..14456
I want a script that can mask these regions in the single complete-genome FASTA file with the letter N.
Kindly help.
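A minimal Python sketch, assuming the FASTA file holds a single record, the `start..end` coordinates are 1-based and inclusive, and the masked sequence is rewrapped at 60 characters per line (an arbitrary choice). `parse_ranges`, `mask_sequence`, and `mask_fasta` are hypothetical names.

```python
import re

RANGE_RE = re.compile(r"^\s*(\d+)\.\.(\d+)\s*$")  # matches lines like "4353..5633"


def parse_ranges(lines):
    ranges = []
    for line in lines:
        m = RANGE_RE.match(line)
        if m:
            ranges.append((int(m.group(1)), int(m.group(2))))
    return ranges


def mask_sequence(seq, ranges):
    bases = list(seq)
    for start, end in ranges:
        # Clip to the sequence; 1-based inclusive -> 0-based half-open.
        for i in range(max(start, 1) - 1, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def mask_fasta(fasta_lines, range_lines, width=60):
    header = ""
    chunks = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            header = line
        elif line:
            chunks.append(line)
    masked = mask_sequence("".join(chunks), parse_ranges(range_lines))
    wrapped = [masked[i:i + width] for i in range(0, len(masked), width)]
    return [header] + wrapped
```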
Hello, I am posting my query again with modified input files.
My query is:
I have two input files, file1 and file2.
file1 is smalldata.fasta:
>gi|546671471|gb|AWWX01449637.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig449636, whole genome shotgun sequence...
Hi all,
I have a file like this. I want to extract only those regions which are big and continuous:
chr1 3280000 3440000
chr1 3440000 3920000
chr1 3600000 3920000 # this region lies within 3440000 3920000, so I don't want it printed in the output
chr1 3920000 4800000
chr1 ...
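One reading of "big and continuous" is: drop every region that lies completely inside another region on the same chromosome, as with the 3600000 3920000 line. A sketch under that assumption (merging adjacent regions would be a different operation); `drop_contained` is a hypothetical name.

```python
def drop_contained(lines):
    regions = []
    for line in lines:
        fields = line.split("#", 1)[0].split()  # ignore trailing "#" comments
        if len(fields) >= 3:
            regions.append((fields[0], int(fields[1]), int(fields[2])))
    # Sort by chromosome, start ascending, end descending, so a larger region
    # is always seen before any region it contains.
    regions.sort(key=lambda r: (r[0], r[1], -r[2]))
    kept = []
    for chrom, start, end in regions:
        # kept[-1] has the largest end so far on this chromosome; since its
        # start is <= ours, an end that does not reach past it means contained.
        if kept and kept[-1][0] == chrom and end <= kept[-1][2]:
            continue
        kept.append((chrom, start, end))
    return kept
```

On the four complete lines shown, it keeps 3280000-3440000, 3440000-3920000, and 3920000-4800000 and drops 3600000-3920000.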
I have a positions file with markers in col1 and positions defined by chromosome and location in col2 and col3:
m1 ch1 1
m2 ch1 5
m3 ch1 50
m4 ch2 567
m5 ch2 4567
m6 ch2 7766
m7 ch2 554433
m8 ch3 76
m9 ch3 456
m10 ch3 2315
Given a set of query markers, I would like to know what are the...
Discussion started by: jianp83
JOIN(1) General Commands Manual JOIN(1)
NAME
join - relational database operator
SYNOPSIS
join [ options ] file1 file2
DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 is `-', the standard
input is used.
File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in
each line.
There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field, then the rest of the line from file1, then the rest of the line from file2.
Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are discarded.
These options are recognized:
-an In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.
-e s Replace empty output fields by string s.
-jn m Join on the mth field of file n. If n is missing, use the mth field in each file.
-o list
Each output line comprises the fields specified in list, each element of which has the form n.m, where n is a file number and m is a
field number.
-tc Use character c as a separator (tab character). Every appearance of c in a line is significant.
SEE ALSO
sort(1), comm(1), awk(1)
BUGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort.
The conventions of join, sort, comm, uniq, look and awk(1) are wildly incongruous.
7th Edition April 29, 1985 JOIN(1)
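The pairing rule in DESCRIPTION (inputs sorted on the join field; one output line per pair of lines with identical join fields; output is the common field, then the rest of the file1 line, then the rest of the file2 line) amounts to a sort-merge join. A minimal Python sketch of that default behavior only; it does not implement -a, -e, -j, -o or -t, and `merge_join` is a hypothetical name.

```python
def merge_join(lines1, lines2):
    """Inner join on the first field of two inputs already sorted on that field."""
    # str.split() treats runs of blanks/tabs as one separator and drops
    # leading ones, matching join's default field separation.
    rows1 = [line.split() for line in lines1 if line.split()]
    rows2 = [line.split() for line in lines2 if line.split()]
    out = []
    i = j = 0
    while i < len(rows1) and j < len(rows2):
        k1, k2 = rows1[i][0], rows2[j][0]
        if k1 < k2:
            i += 1
        elif k1 > k2:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit every pair.
            i_end = i
            while i_end < len(rows1) and rows1[i_end][0] == k1:
                i_end += 1
            j_end = j
            while j_end < len(rows2) and rows2[j_end][0] == k1:
                j_end += 1
            for a in rows1[i:i_end]:
                for b in rows2[j:j_end]:
                    out.append(" ".join([k1] + a[1:] + b[1:]))
            i, j = i_end, j_end
    return out
```

Python compares strings by code point, which matches the ASCII collating sequence join expects for plain ASCII input.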