Thanks for your replies. I am getting closer. My bad for not providing enough info. Let's say I have 10,000 lines of data, but only 4 to 10 of those lines start with "A". I only want to search for "five" in the lines that start with "A", so that I don't get results of ,,, for all 10,000 lines of data.
Last edited by vgersh99; 01-06-2015 at 12:57 PM..
Reason: once again - code tags, please!
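One way to limit the search to the lines that start with "A" is to put a line-selecting pattern in front of the awk action. The following is only a minimal sketch under assumptions: the sample data is made up, and the replacement text "5" is hypothetical, since the post does not say what "five" should become.

```shell
# Hypothetical sample data; only the lines starting with "A" should be searched.
input='A one five nine
B two five eight
A three four
C five five'

# Print only the lines that start with "A" and also contain "five".
matches=$(printf '%s\n' "$input" | awk '/^A/ && /five/')
printf '%s\n' "$matches"

# Replace "five" only on lines starting with "A"; the trailing 1 prints every
# line, so all other lines pass through unchanged.
# The replacement text "5" is an assumption for illustration.
replaced=$(printf '%s\n' "$input" | awk '/^A/ { gsub(/five/, "5") } 1')
printf '%s\n' "$replaced"
```

Because the `/^A/` pattern guards the action, lines that start with anything else are never searched, which keeps the other lines out of the results.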
Hi All,
I have a file that I need to be able to find a pattern match on a line, search that line for a text pattern, and replace that text.
An example of 4 lines in my file is:
1. MatchText_randomNumberOfText moreData ReplaceMe moreData
2. MatchText_randomNumberOfText moreData moreData... (4 Replies)
Hi all.
I have the following command that is successfully searching for any one of the strings on all lines of a file and replacing it with the instructed value.
cat inputFile | awk '{gsub(/aaa|bbb|ccc|ddd/,"1234")}1' > outputFile
This does in fact replace any occurrence of aaa, bbb,... (2 Replies)
I have a text file with more than 200 lines.
EX:
001010122 12000 BIB 12000 11200 1200003
001010122 2000 AND 12000 11200 1200003
001010122 12000 KVB 12000 11200 1200003
In the above file I want to search for the string KVB and add/replace... (1 Reply)
Dear All
I have a text file with more than 200 lines.
EX:
001010122 12000 BIB 12000 11200 1200003
001010122 2000 AND 12000 11200 1200003
001010122 12000 KVB 12000 11200 1200003
In the above file I want to search for the string KVB... (5 Replies)
Hi guys,
I have a text file named file1.txt that is formatted like this:
001 , ID , 20000
002 , Name , Brandon
003 , Phone_Number , 616-234-1999
004 , SSNumber , 234-23-234
005 , Model , Toyota
007 , Engine ,V8
008 , GPS , OFF
and I have file2.txt formatted like this:
... (2 Replies)
I have a file that contains a create statement like the one below:
create table emp
( empno integer,
empname char(50))
primary index(empno);
I need to find a statement that starts with create and ends with a semicolon (;). If one is found, insert the statement below before the create statement:
rename table emp to emp_rename;... (2 Replies)
I want to search a small string in a large string and find the locations of the string. For this I used grep "string" -ob <file name where the large string is stored>. Now this gives me the locations of that string. Now how do I store these locations in a text file.
Please use CODE tags as... (7 Replies)
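A possible way to store the offsets is to redirect the grep output to a file. This is only a sketch under assumptions: the file names and the sample text are hypothetical, and stripping the matched text with cut is optional.

```shell
# Hypothetical temporary files; the sample text stands in for the "large string".
haystack=$(mktemp)
locations=$(mktemp)
printf 'xxabcxxabc\n' > "$haystack"

# -o prints each match on its own line and -b prefixes it with its byte offset,
# in the form offset:match. cut keeps only the offset, and the redirect stores
# the result in a text file.
grep -ob 'abc' "$haystack" | cut -d: -f1 > "$locations"
cat "$locations"
```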
Hi dears,
I use the bash shell.
I have a file, INPUT.txt, shown below.
The number of columns varies: some rows have 12 columns and some have 11 (see the last column).
INPUT.txt
CodeGender Age Grade Dialect Session Sentence Start End Length Phonemic Phonetic
63 M 27 BS/BA TEHRANI 3 4 298320 310050... (2 Replies)
hi all,
I am trying to do this in shell/bash with sed/awk/grep.
I have two files, one containing one column, the other containing multiple columns (comma delimited).
file1.txt
abc12345
def12345
ghi54321
...
file2.txt
abc1,text1,texta
abc,text2,textb
def123,text3,textc
gh,text4,textd... (6 Replies)
Discussion started by: shogun1970
6 Replies
tabix
tabix(1) Bioinformatics tools tabix(1)
NAME
bgzip - Block compression/decompression utility
tabix - Generic indexer for TAB-delimited genome position files
SYNOPSIS
bgzip [-cdhB] [-b virtualOffset] [-s size] [file]
tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]
DESCRIPTION
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the
command line. The input data file must be position sorted and compressed by bgzip, which has a gzip(1)-like interface. After indexing,
tabix is able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also
works over a network if a URI is given as the file name; in this case the index file will be downloaded if it is not present locally.
OPTIONS OF TABIX
-p STR Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff]
-s INT Column of sequence name. Options -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval. [1]
-b INT Column of start chromosomal position. [4]
-e INT Column of end chromosomal position. The end column can be the same as the start column. [5]
-S INT Skip first INT lines in the data file. [0]
-c CHAR Skip lines starting with character CHAR. [#]
-0 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.
-h Print the header/meta lines.
-B The second argument is a BED file. When this option is in use, the input file may not be sorted or indexed. The entire input will be read sequentially. Nonetheless, with this option, the format of the input must be specified correctly on the command line.
-f Force overwriting of the index file if it is present.
-l List the sequence names stored in the index file.
EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;
tabix -p gff sorted.gff.gz;
tabix sorted.gff.gz chr1:10,000,000-20,000,000;
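The first command of the example sorts the records by position while keeping the "#" header lines at the top, because tabix requires position-sorted input. The sorting step alone can be tried without bgzip or tabix; the sketch below assumes a made-up four-line GFF sample, and the variable names are hypothetical.

```shell
# Hypothetical GFF sample: one "#" header line and three unsorted records.
sample=$(printf '##gff-version 3\nchr2\tsrc\tgene\t500\t900\t.\t+\t.\tID=g3\nchr1\tsrc\tgene\t2000\t2500\t.\t+\t.\tID=g2\nchr1\tsrc\tgene\t100\t400\t.\t-\t.\tID=g1\n')

# Emit the header lines first, then the records sorted by sequence name
# (column 1) and numerically by start position (column 4).
sorted=$(
  printf '%s\n' "$sample" | grep '^#'
  printf '%s\n' "$sample" | grep -v '^#' | sort -k1,1 -k4,4n
)
printf '%s\n' "$sorted"
```

In the example above, the same ordered stream is piped into bgzip instead of being stored in a variable.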
NOTES
It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL
databases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix directly works with
a lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design database schema or specialized binary formats.
Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do
not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work
efficiently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not
least, tabix supports remote data retrieval. One can put the data file and the index at an FTP or HTTP server, and other users or even web
services will be able to get a slice without downloading the entire file.
AUTHOR
Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access
and in-memory caching.
SEE ALSO
samtools(1)
tabix-0.2.0 11 May 2010 tabix(1)