What I need to check is for each gene if column 2 of file 2 is present in column 2 of file 1 and column 3 of file 2 is present in column 3 of file 1 . If so then these two are supposed to be placed in column 4 and 5 of file 1. I just explained one example here. I have 15000 entries with different genes(column5 of file 2). so a for loop should be there to iterate..
o/p
Thanks,
Last edited by Franklin52; 02-11-2012 at 08:59 AM..
Reason: Adding code tags
Hi All, Prepare a perl script for extracting data from xml file. The xml data look like as
AC StartTime="1227858839" ID="88" ETime="1227858837" DSTFlag="false" Type="2" Duration="303" />
<AS StartTime="1227858849" SigPairs="119 40 98 15 100 32 128 18 131 23 70 39 123 20 120 27 100 17 136 12... (3 Replies)
Hi,
I'm using AWK to try to extract data from multiple files (*.txt). The script should look for a flag that occurs at a specific position in each file and it should return the data to the right of that flag.
I should end up with one line for each file, each containing 3 columns:... (8 Replies)
Hi,
Could someone please assist on a quick way of How to extract data from indexed files (ISAM files) maintained in an UNIX(AIX) server.The file data needs to be extracted in flat text file or CSV or excel format .
Usually we have programs in microfocus COBOL to extract data, but would like... (2 Replies)
Hello everyone, I'm new to this forum and i am new as a shell scripter.
my problem is to have html files in a directory and I would like to extract from these some data that lies between two different lines
Here's my situation
<td align="default"> oxidizability (mg / l):
data_to_extract... (6 Replies)
Hi!
I have one file with data that looks like this:
1 data data data data
2 data data data data
3 data data data data
.
.
.
1 data data data data
2 data data data data
3 data data data data
.
.
.
I would like to have awk to write each block to a separate file, like this:
1... (3 Replies)
Hi,
I want to make a query about extracting data from two files that both have data ranges.
the data that i want to extract; when there is matching between file1 column 2 is equal to file2 column2 , and file1 column 3 and column 4 is within the range of file2 columns 3 and 4. I would like rows... (1 Reply)
I am trying to extract common list of Organisms from different files
For example I took 3 files and showed expected result. In real I have more than 1000 files. I am aware about the useful use of awk and grep but unaware in depth so need guidance regarding it.
I want to use awk/ grep/ cut/... (7 Replies)
Hi,
I have directory with multiple files from which i need to extract portion of specif lines and insert it in a new file, the new file will contain a separate columns for each file data.
Example:
I need to extract Value_1 & Value_3 from all files and insert in output file as below:
... (2 Replies)
Hello,
Using the information in file 1, I would like to extract from file2 all rows which matchs in column 3.
file 1
1233
1230
1231
1232
file2
65733.00 19775.00 1220
65733.00 19793.00 1220
65733.00 19801.00 1220
65733.00 19809.00 1231
65733.00 19817.00 ... (2 Replies)
Discussion started by: jiam912
2 Replies
LEARN ABOUT DEBIAN
tabix
tabix(1) Bioinformatics tools tabix(1)NAME
bgzip - Block compression/decompression utility
tabix - Generic indexer for TAB-delimited genome position files
SYNOPSIS
bgzip [-cdhB] [-b virtualOffset] [-s size] [file]
tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]
DESCRIPTION
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the com-
mand-line. The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface. After indexing, tabix is
able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over
network if URI is given as a file name and in this case the index file will be downloaded if it is not present locally.
OPTIONS OF TABIX -p STR Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any
of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff]
-s INT Column of sequence name. Option -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval.
[1]
-b INT Column of start chromosomal position. [4]
-e INT Column of end chromosomal position. The end column can be the same as the start column. [5]
-S INT Skip first INT lines in the data file. [0]
-c CHAR Skip lines started with character CHAR. [#]
-0 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.
-h Print the header/meta lines.
-B The second argument is a BED file. When this option is in use, the input file may not be sorted or indexed. The entire input will
be read sequentially. Nonetheless, with this option, the format of the input must be specificed correctly on the command line.
-f Force to overwrite the index file if it is present.
-l List the sequence names stored in the index file.
EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;
tabix -p gff sorted.gff.gz;
tabix sorted.gff.gz chr1:10,000,000-20,000,000;
NOTES
It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL data-
bases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix directly works with a
lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design database schema or specialized binary formats.
Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do
not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work effi-
ciently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not the
least, tabix supports remote data retrieval. One can put the data file and the index at an FTP or HTTP server, and other users or even web
services will be able to get a slice without downloading the entire file.
AUTHOR
Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access
and in-memory caching.
SEE ALSO samtools(1)tabix-0.2.0 11 May 2010 tabix(1)