Finding Nth Column Post: 302596476

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Finding nth occurrence in line and replacing it

Hi, I have several files with data that have to be imported to a database. These files contain records with separator characters. Some records are corrupt (2 separators are missing) and I need to correct them prior to importing them into the db. Example: ...

2. Shell Programming and Scripting

Editing 1st or nth column

Hi, I have a file which is pipe delimited: 100| alpha| tabgo|watch| |||| 444444 | alpha| tabgo|watch| |||| 444444 | sweden |tabgo|watch| |||| 444444 | US| tabgo|watch| |||| 444444 100| factory| tabgo|watch| |||| 444444 | ABC| tabgo|watch| |||| 444444 | launch| tabgo|watch| ||||...

3. UNIX for Dummies Questions & Answers

Finding nth line across multiple files

I have several files (around 50) that have a similar format. I need to extract the 5th line from every file and output that into a text file. So far, I have been able to figure out how to do it for a single file: $ awk 'NR==5' text1.txt > results.txt OR $ sed -n '5p' text1.txt > results.txt...

4. Shell Programming and Scripting

get 3rd column of nth line

Hi, I have a file.txt whose 9th, 10th and 11th lines are: RbsLocalCell=S2C1 maxPortIP 4 (this is the 9th line) RbsLocalCell=S3C1 maxPortIP 4 (this is the 10th line) RbsLocalCell=S1C1 ...

5. UNIX for Dummies Questions & Answers

finding the nth match

I have a file that has information for a person....each person gets 3 or more lines to describe them. I was hoping to match person 1, then person 2.....BUT I do not know how to tell grep I only want the first (2nd, 3rd or nth) match. The alternative is doing line by line logic, which is fine...

6. Shell Programming and Scripting

Using AWK to find top Nth values in Nth column

I have an awk script to find the maximum value of the 2nd column of a 2 column datafile, but I need to find the top 5 maximum values of the 2nd column. Here is the script that works for the maximum value. awk 'BEGIN { subjectmax=$1 ; max=0} $2 >= max {subjectmax=$1 ; max=$2} END {print...

7. Shell Programming and Scripting

Calculating average for every Nth line in the Nth column

Is there an awk script that can easily perform the following operation? I have a data file that is in the format of 1944-12,5.6 1945-01,9.8 1945-02,6.7 1945-03,9.3 1945-04,5.9 1945-05,0.7 1945-06,0.0 1945-07,0.0 1945-08,0.0 1945-09,0.0 1945-10,0.2 1945-11,10.5 1945-12,22.3...

8. Answers to Frequently Asked Questions

Finding the nth Particular Week in a Month - shell script

I see a lot of requests posted on the internet asking for the date of the nth occurrence of a weekday in a month, for example: what is the date of the 3rd Sunday in October, the 2nd Friday in June 2012, or the 4th Saturday in January 2011? The shell script below is used to find out the...

9. Shell Programming and Scripting

Taking nth column and putting its value in n+1 column using awk

10. UNIX for Dummies Questions & Answers

Getting the lines with nth column non-null

Hi, I have a huge list of archives (.gz). Each archive is about 40MB. A file is generated every minute so if I want to analyze the data for 1 hour I get already 60 files for example. These are text files, ';' separated, each line having about 300 fields (columns). What I need to do is to...
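Several of the threads listed above come down to pulling one field out of one line of a delimited file (the awk 'NR==5' idiom in item 3, the 3rd column of a given line in item 4, lines with a non-empty nth column in item 10). The following is a minimal Python sketch of that idea, not taken from any of the threads; the names nth_field and rows_with_nonempty_field are hypothetical, and splitting on whitespace is assumed to approximate awk's default field splitting.

```python
from typing import Iterable, List, Optional


def nth_field(lines: Iterable[str], line_no: int, col_no: int,
              sep: Optional[str] = None) -> Optional[str]:
    """Return field col_no of line line_no (both 1-based), or None if either is missing.

    sep=None splits on runs of whitespace, roughly like awk's default field
    splitting; pass sep="|" or sep=";" for pipe- or semicolon-delimited data.
    """
    for i, line in enumerate(lines, start=1):
        if i == line_no:
            fields = line.rstrip("\n").split(sep)
            return fields[col_no - 1] if 1 <= col_no <= len(fields) else None
    return None  # the input has fewer than line_no lines


def rows_with_nonempty_field(lines: Iterable[str], col_no: int, sep: str) -> List[str]:
    """Keep the lines whose field col_no (1-based) exists and is not blank."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= col_no and fields[col_no - 1].strip():
            kept.append(line)
    return kept
```

For instance, nth_field(open("file.txt"), 10, 3) would return the third whitespace-separated field of line 10 of a hypothetical file.txt.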

LEARN ABOUT DEBIAN

tabix

tabix(1)						       Bioinformatics tools							  tabix(1)

NAME

       bgzip - Block compression/decompression utility

       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS

       bgzip [-cdhB] [-b virtualOffset] [-s size] [file]

       tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]
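The bgzip synopsis accepts -b virtualOffset, but this page does not say what a virtual offset is. In the BGZF format as described in the SAM/BAM format specification, a virtual file offset combines the byte position where a compressed block starts (upper 48 bits) with a position inside that block's uncompressed data (lower 16 bits). The Python sketch below packs and unpacks offsets under that assumption; the helper names are hypothetical and not part of bgzip.

```python
from typing import Tuple


def make_virtual_offset(block_start: int, within_block: int) -> int:
    """Pack a BGZF virtual offset from a compressed block start and an uncompressed offset."""
    if not 0 <= within_block < 1 << 16:
        raise ValueError("offset inside the uncompressed block must fit in 16 bits")
    if not 0 <= block_start < 1 << 48:
        raise ValueError("compressed block start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(voffset: int) -> Tuple[int, int]:
    """Inverse of make_virtual_offset: (compressed block start, offset inside block)."""
    return voffset >> 16, voffset & 0xFFFF
```

Because the lower 16 bits address the uncompressed data, a reader can seek straight to one compressed block and then skip within it, which is what makes random access into a bgzip file possible.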

DESCRIPTION

       Tabix indexes a TAB-delimited genome position file in.tab.bgz and, when no region is given on the command line, creates an index file
       in.tab.bgz.tbi. The input data file must be position sorted and compressed by bgzip, which has a gzip(1)-like interface. After indexing,
       tabix can quickly retrieve the data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works
       over a network if a URI is given as the file name; in this case the index file is downloaded if it is not present locally.
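The region syntax above can be illustrated with a small Python sketch. It is not tabix's own parser: it assumes beginPos and endPos are 1-based and inclusive, strips thousands separators as in the chr1:10,000,000-20,000,000 example in the EXAMPLE section, and tests overlap by a plain comparison, whereas tabix answers the same question through its index. Region, parse_region and overlaps are hypothetical names.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    seq: str
    beg: int  # 1-based, inclusive (assumed)
    end: int  # 1-based, inclusive (assumed)


def parse_region(text: str) -> Region:
    """Parse "chr:beginPos-endPos"; separators as in 10,000,000 are allowed."""
    # Split on the last ':' so that sequence names containing ':' still work.
    seq, sep, span = text.rpartition(":")
    m = re.fullmatch(r"([\d,]+)-([\d,]+)", span)
    if not sep or not seq or m is None:
        raise ValueError(f"expected chr:beginPos-endPos, got {text!r}")
    beg, end = (int(g.replace(",", "")) for g in m.groups())
    if beg > end:
        raise ValueError("beginPos must not exceed endPos")
    return Region(seq, beg, end)


def overlaps(region: Region, seq: str, beg: int, end: int) -> bool:
    """True if a record on seq spanning [beg, end] (1-based, inclusive) overlaps region."""
    return seq == region.seq and beg <= region.end and end >= region.beg
```

A record overlaps the query whenever it starts no later than the region ends and ends no earlier than the region starts; records merely touching either boundary position count as overlapping.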

OPTIONS OF TABIX

       -p STR	 Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be combined with any of -s,
		 -b, -e, -c and -0. It is not used for data retrieval because this setting is stored in the index file. [gff]

       -s INT	 Column of sequence name. Options -s, -b, -e, -S, -c and -0 are all stored in the index file and are thus not used in data
		 retrieval. [1]

       -b INT	 Column of start chromosomal position. [4]

       -e INT	 Column of end chromosomal position. The end column can be the same as the start column. [5]

       -S INT	 Skip first INT lines in the data file. [0]

       -c CHAR	 Skip lines starting with character CHAR. [#]

       -0	 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.

       -h	 Print the header/meta lines.

       -B	 The second argument is a BED file. When this option is in use, the input file need not be sorted or indexed; the entire input is
		 read sequentially. Nonetheless, the format of the input must still be specified correctly on the command line.

       -f	 Force overwriting of the index file if it is already present.

       -l	 List the sequence names stored in the index file.
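How the column options fit together can be sketched in Python. The defaults mirror the bracketed values above ([1], [4], [5], [0], [#]); ColumnConfig and iter_intervals are hypothetical names, and three behaviours are assumptions rather than documented facts: lines counted by -S are dropped before the meta-character check, blank lines are ignored, and with -0 only the start column is shifted, since in 0-based, half-open files such as UCSC BED the end value already equals the 1-based inclusive end.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class ColumnConfig:
    seq_col: int = 1          # like -s: column of the sequence name (1-based)
    beg_col: int = 4          # like -b: column of the start position
    end_col: int = 5          # like -e: column of the end position (may equal beg_col)
    line_skip: int = 0        # like -S: number of leading lines to ignore
    meta_char: str = "#"      # like -c: lines starting with this are skipped
    zero_based: bool = False  # like -0: start positions are 0-based


def iter_intervals(lines: Iterable[str], cfg: ColumnConfig) -> Iterator[Tuple[str, int, int]]:
    """Yield (seq, beg, end) in 1-based, inclusive coordinates for each data line."""
    for i, line in enumerate(lines):
        if i < cfg.line_skip or not line.strip() or line.startswith(cfg.meta_char):
            continue  # skipped leading line, blank line, or meta line
        fields = line.rstrip("\n").split("\t")
        seq = fields[cfg.seq_col - 1]
        beg = int(fields[cfg.beg_col - 1])
        end = int(fields[cfg.end_col - 1])
        if cfg.zero_based:
            # Assumption: 0-based, half-open data (e.g. UCSC BED) only needs its start shifted.
            beg += 1
        yield seq, beg, end
```

A BED-like layout would then be ColumnConfig(seq_col=1, beg_col=2, end_col=3, zero_based=True), which is roughly the combination of settings the -p presets are meant to save you from typing.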

EXAMPLE

       (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
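The first command keeps the '#' header lines on top and sorts the remaining records by sequence name (-k1,1) and then numerically by start (-k4,4n), which is what "position sorted" means for GFF. A Python sketch of the same ordering follows, together with a check of the property assumed here to be what tabix needs (each sequence's records contiguous, starts never decreasing). The helper names are hypothetical; the sketch splits on TAB, Python compares strings by code point (which matches sort(1) only in the C locale), and ties on the start position may be ordered differently from sort's last-resort comparison.

```python
from typing import Iterable, List, Optional, Tuple


def sort_gff_for_tabix(lines: List[str]) -> List[str]:
    """Header ('#') lines first, then records by (column 1, numeric column 4)."""
    header = [ln for ln in lines if ln.startswith("#")]
    body = [ln for ln in lines if not ln.startswith("#")]

    def key(ln: str) -> Tuple[str, int]:
        fields = ln.split("\t")
        return fields[0], int(fields[3])

    return header + sorted(body, key=key)


def is_position_sorted(records: Iterable[Tuple[str, int]]) -> bool:
    """True if each sequence's records are contiguous and their starts never decrease."""
    finished = set()  # sequences whose block of records has already ended
    prev_seq: Optional[str] = None
    prev_beg = 0
    for seq, beg in records:
        if seq != prev_seq:
            if seq in finished:
                return False  # sequence reappears after another one started
            if prev_seq is not None:
                finished.add(prev_seq)
            prev_seq = seq
        elif beg < prev_beg:
            return False  # start position went backwards within a sequence
        prev_beg = beg
    return True
```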

NOTES

       It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL
       databases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix works directly
       with many widely used TAB-delimited formats such as GFF/GTF and BED. There is no need to design a database schema or specialized binary
       formats, and the data do not need to be duplicated in different formats. Secondly, tabix works on compressed data files while most SQL
       databases do not; the GenCode annotation GTF can be compressed down to 4% of its original size. Thirdly, tabix is fast. The same
       indexing algorithm is known to work efficiently for an alignment with a few billion short reads, a scale that SQL databases probably
       cannot handle easily. Last but not least, tabix supports remote data retrieval. One can put the data file and the index on an FTP or
       HTTP server, and other users or even web services will be able to get a slice without downloading the entire file.
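The comparison above mentions B-tree indexes "with or without binning". Tabix's own index is commonly described as reusing the hierarchical binning scheme of the BAM index. The sketch below transcribes the reg2bin/reg2bins computation given in the SAM/BAM format specification into Python, assuming 0-based, half-open coordinates. It only shows how an interval maps to candidate bins; it does not read a .tbi file and leaves out the linear index that the real format additionally keeps.

```python
from typing import List

# (index of the first bin on a level, log2 of that level's bin width), finest level first:
# 16 kb, 128 kb, 1 Mb, 8 Mb and 64 Mb bins; bin 0 spans the whole 512 Mb range.
LEVELS = [(4681, 14), (585, 17), (73, 20), (9, 23), (1, 26)]


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based, half-open interval [beg, end)."""
    end -= 1  # work with the last covered position
    for first_bin, shift in LEVELS:
        if beg >> shift == end >> shift:
            return first_bin + (beg >> shift)
    return 0


def reg2bins(beg: int, end: int) -> List[int]:
    """Every bin that may hold records overlapping [beg, end), coarsest level first."""
    end -= 1
    bins = [0]
    for first_bin, shift in reversed(LEVELS):
        bins.extend(range(first_bin + (beg >> shift), first_bin + (end >> shift) + 1))
    return bins
```

Under this convention a 1-based, inclusive region such as chr1:10,000,000-20,000,000 corresponds to reg2bins(9999999, 20000000); only records filed under those bins need to be examined.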

AUTHOR

       Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access
       and in-memory caching.

SEE ALSO

       samtools(1)

tabix-0.2.0							    11 May 2010 							  tabix(1)