sorry this was my first post, so I didn't have much idea.
Regarding problem:
It is a particular gene position in the whole genome, now this gene is madeup of CDS, mRNA, Introns etc..the information is right below it like:
etc..until the information of next gene comes..
like: (say gene2)
So I have file with these 'gene' location, now I need to extract its sub-location, like whether its in CDS, mRNA or Intron(in case no match found).
The location of gene(that we need to find) is in separate file:
I have to read it line by line, extract gene position, then search it in the main gene info. (gff) file. like:
note: '..' denotes FROM postion 52698648 TO 52707224.
Last edited by reena2305; 06-28-2011 at 01:32 AM..
Good evening All,
I have a perl script to pull out all occurrences of a files beginning with xx and ending in .p. I will then loop through all 1K files in a directory. I can grep for xx*.p files but it gives me the entire line. I wish to output to a single colum with only the hits found. ... (3 Replies)
Create a script that copies files from one specified directory to another specified directory, in the order they were created in the original directory between specified times. Copy the files at a specified interval. (2 Replies)
Hi,
I am logging to a linux server through a user "user1" in /home directory.
There is a script in a directory in 'root' for which all permissions are available including the directory. This script when executed creates a file in the directory.
When the script is added to crontab, on... (1 Reply)
Hey,
I've been trying to break a massive fasta formatted file into files containing each gene separately. Could anyone help me? I've tried to use the following code but i've recieved errors every time:
for i in *.rtf.out
do
awk '/^>/{f=++d".fasta"} {print > $i.out}' $i
done (1 Reply)
I have a file with
<suit:run date="Trump Tue 06/19/2012 11:41 AM EDT" machine="garg-ln" build="19921" level="beta" release="6.1.5" os="Linux">
Need to find word "build" then
extract build number, which is 19921 also
release number, which is 6.1.5 then
concatenate them to one variable as... (6 Replies)
I have hundreds of files to process. In each file
I need to look for a pattern then
extract value(s) from next line and then
search for value(s) selected from point (2) in the same file at a specific position.
HEADER ELECTRON TRANSPORT 18-MAR-98 1A7V
TITLE CYTOCHROME... (7 Replies)
Hi
This is my third past and very impressed with previous post replies
Hoping the same for below query
How to find a existing file location and directory location in solaris box (1 Reply)
Hi,
I have a log file which is the output from a xml script :
<?xml version="1.0" ?>
<!DOCTYPE svc_result SYSTEM "MLP_SVC_RESULT_320.DTD">
<svc_result ver="3.2.0">
<slia ver="3.0.0">
<pos>
<msid type="MSISDN" enc="ASC">8093078040</msid>
<poserr>
... (4 Replies)
I have the following data set about the snps ID txt file
POS ID
78599583 rs987435
33395779 rs345783
189807684 rs955894
33907909 rs6088791
75664046 rs11180435
218890658 rs17571465
127630276 rs17011450
90919465 rs6919430
and a gene... (7 Replies)
I want to search a small string in a large string and find the locations of the string. For this I used grep "string" -ob <file name where the large string is stored>. Now this gives me the locations of that string. Now how do I store these locations in a text file.
Please use CODE tags as... (7 Replies)
Discussion started by: ANKIT ROY
7 Replies
LEARN ABOUT DEBIAN
tabix
tabix(1) Bioinformatics tools tabix(1)NAME
bgzip - Block compression/decompression utility
tabix - Generic indexer for TAB-delimited genome position files
SYNOPSIS
bgzip [-cdhB] [-b virtualOffset] [-s size] [file]
tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]
DESCRIPTION
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the com-
mand-line. The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface. After indexing, tabix is
able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over
network if URI is given as a file name and in this case the index file will be downloaded if it is not present locally.
OPTIONS OF TABIX -p STR Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any
of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff]
-s INT Column of sequence name. Option -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval.
[1]
-b INT Column of start chromosomal position. [4]
-e INT Column of end chromosomal position. The end column can be the same as the start column. [5]
-S INT Skip first INT lines in the data file. [0]
-c CHAR Skip lines started with character CHAR. [#]
-0 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.
-h Print the header/meta lines.
-B The second argument is a BED file. When this option is in use, the input file may not be sorted or indexed. The entire input will
be read sequentially. Nonetheless, with this option, the format of the input must be specificed correctly on the command line.
-f Force to overwrite the index file if it is present.
-l List the sequence names stored in the index file.
EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;
tabix -p gff sorted.gff.gz;
tabix sorted.gff.gz chr1:10,000,000-20,000,000;
NOTES
It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL data-
bases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix directly works with a
lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design database schema or specialized binary formats.
Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do
not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work effi-
ciently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not the
least, tabix supports remote data retrieval. One can put the data file and the index at an FTP or HTTP server, and other users or even web
services will be able to get a slice without downloading the entire file.
AUTHOR
Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access
and in-memory caching.
SEE ALSO samtools(1)tabix-0.2.0 11 May 2010 tabix(1)