02-07-2012
Finding Nth Column
How can I display every nth field in a "|"-delimited file?
For example, if I have a file with the data a|b|c|d|e|f|g|h|k|l|m|n
and I want to display every 3rd field, the output should be:
c
f
k
n
Please help me.
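Here is a minimal awk sketch. It assumes no field ever contains a quoted or escaped "|". It splits each line on "|" and prints fields n, 2n, 3n, and so on up to NF, one per line. The shell function name every_nth is a hypothetical wrapper, not part of any tool.

```shell
# every_nth N [FILE...]   (hypothetical helper name)
# Print every Nth "|"-separated field of each input line, one per line.
# With no FILE arguments, awk reads standard input.
every_nth() {
    n=$1
    shift
    # -F'|': a single-character FS other than space is used literally,
    # so "|" is not treated as regex alternation.
    awk -F'|' -v n="$n" '
        BEGIN { if (n + 0 < 1) exit 1 }   # a step below 1 would never advance
        { for (i = n; i <= NF; i += n) print $i }
    ' "$@"
}
```

For the sample line above, this prints c, f, k and n on separate lines:
$ printf 'a|b|c|d|e|f|g|h|k|l|m|n\n' | every_nth 3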
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
Hi,
I have several files with data that have to be imported into a database. These files contain records with separator characters. Some records are corrupt (2 separators are missing), and I need to correct them before importing them into the db.
Example:
... (5 Replies)
Discussion started by: stresing
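Before correcting the corrupt records, it can help to list them first. This sketch prints each line whose field count differs from the expected one. The post does not show the separator or the record layout, so both are parameters here. A record missing 2 separators shows up with 2 fewer fields than expected. bad_records is a hypothetical name.

```shell
# bad_records SEP EXPECTED [FILE...]   (hypothetical helper name)
# Print "line_number: record" for every line whose field count differs
# from EXPECTED. SEP is assumed to be a single literal character.
bad_records() {
    sep=$1
    expected=$2
    shift 2
    # FNR is the line number within the current input file.
    awk -F"$sep" -v want="$expected" 'NF != want { print FNR ": " $0 }' "$@"
}
```

In this example, 9 and data.txt are placeholders:
$ bad_records '|' 9 data.txt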
2. Shell Programming and Scripting
Hi,
I have a file which is pipe-delimited:
100| alpha| tabgo|watch| |||| 444444
| alpha| tabgo|watch| |||| 444444
| sweden |tabgo|watch| |||| 444444
| US| tabgo|watch| |||| 444444
100| factory| tabgo|watch| |||| 444444
| ABC| tabgo|watch| |||| 444444
| launch| tabgo|watch| ||||... (4 Replies)
Discussion started by: darshanw
3. UNIX for Dummies Questions & Answers
I have several files (around 50) with a similar format. I need to extract the 5th line from every file and write it to a text file. So far, I have figured out how to do it for a single file:
$ awk 'NR==5' text1.txt > results.txt
OR
$ sed -n '5p' text1.txt > results.txt... (6 Replies)
Discussion started by: oriqin
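For the many-files case, awk's FNR can replace NR. NR counts records across all input files, so NR==5 would match only once in total. FNR restarts at 1 for each file, so FNR==5 matches once per file. Here is a sketch with a hypothetical wrapper name:

```shell
# nth_line_each N FILE...   (hypothetical helper name)
# Print line N of every FILE. FNR is the record number within the
# current file, while NR keeps counting across all files.
nth_line_each() {
    n=$1
    shift
    awk -v n="$n" 'FNR == n' "$@"
}
```

This example assumes the files are named text1.txt, text2.txt, and so on, as in the post:
$ nth_line_each 5 text*.txt > results.txt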
4. Shell Programming and Scripting
Hi,
I have a file.txt whose 9th, 10th and 11th lines are:
RbsLocalCell=S2C1 maxPortIP 4 (this is 9th line)
RbsLocalCell=S3C1 maxPortIP 4 (this is 10th line)
RbsLocalCell=S1C1 ... (11 Replies)
Discussion started by: gc_sw
5. UNIX for Dummies Questions & Answers
I have a file with information about people; each person gets 3 or more lines describing them.
I was hoping to match person 1, then person 2, and so on, but I do not know how to tell grep that I only want the first (2nd, 3rd, or nth) match.
The alternative is doing line-by-line logic, which is fine... (8 Replies)
Discussion started by: countryStyle
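grep has no standard option for picking out only the nth match. GNU and BSD grep do have -m NUM, which stops after NUM matching lines, so grep -m 2 PATTERN file | tail -n 1 is one way to get the 2nd match. An alternative is a sketch in awk that counts matches and prints only the nth one. nth_match is a hypothetical name, and PATTERN is an awk extended regular expression.

```shell
# nth_match PATTERN N [FILE...]   (hypothetical helper name)
# Print only the Nth line matching PATTERN, then stop reading input.
# Matches are counted across all input. Note: awk -v processes
# backslash escapes, so any backslash in PATTERN must be doubled.
nth_match() {
    pat=$1
    n=$2
    shift 2
    awk -v pat="$pat" -v n="$n" '
        $0 ~ pat && ++count == n { print; exit }
    ' "$@"
}
```

For example, nth_match '^person' 2 people.txt prints the 2nd line starting with "person" (people.txt is a placeholder).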
6. Shell Programming and Scripting
I have an awk script to find the maximum value of the 2nd column of a 2-column data file, but I need to find the top 5 values of the 2nd column.
Here is the script that works for the maximum value.
awk 'BEGIN { subjectmax=$1 ; max=0} $2 >= max {subjectmax=$1 ; max=$2} END {print... (3 Replies)
Discussion started by: ncwxpanther
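The script in the post keeps only a single running maximum. This sketch uses a different technique from the poster's awk approach: sort the lines numerically on column 2 in descending order and keep the first N. It assumes blank-separated columns and that the whole line should be printed. top_n is a hypothetical name.

```shell
# top_n N [FILE...]   (hypothetical helper name)
# Print the N lines with the largest numeric value in column 2.
# Columns are assumed to be blank-separated (sort's default);
# for comma-separated data, add -t',' to the sort command.
top_n() {
    n=$1
    shift
    # -k2,2nr: key is field 2 only, compared numerically, reversed.
    sort -k2,2nr "$@" | head -n "$n"
}
```

For example, top_n 5 data.txt prints the 5 lines with the highest 2nd-column values (data.txt is a placeholder).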
7. Shell Programming and Scripting
Is there an awk script that can easily perform the following operation?
I have a data file that is in the format of
1944-12,5.6
1945-01,9.8
1945-02,6.7
1945-03,9.3
1945-04,5.9
1945-05,0.7
1945-06,0.0
1945-07,0.0
1945-08,0.0
1945-09,0.0
1945-10,0.2
1945-11,10.5
1945-12,22.3... (3 Replies)
Discussion started by: ncwxpanther
8. Answers to Frequently Asked Questions
I see a lot of requests posted on the internet asking for the date of the nth weekday in a month.
Examples:
What is the date of the 3rd Sunday in October?
What is the date of the 2nd Friday in June 2012?
What is the date of the 4th Saturday in January 2011? etc.
The below shell script is used to find out the... (1 Reply)
Discussion started by: itkamaraj
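The script in that post is truncated. This independent sketch is not that script; it computes the date arithmetically. First, find the weekday of the 1st of the month with Sakamoto's day-of-week method, which assumes the Gregorian calendar. Then step forward to the first matching weekday and add 7 days for each further week. The function name nth_weekday and its argument order are hypothetical.

```shell
# nth_weekday YEAR MONTH N WEEKDAY   (hypothetical helper name)
# WEEKDAY: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
# Prints the day of the month. Exits with status 1 and no output if
# the month has no such day (e.g. a 5th Sunday that does not exist).
nth_weekday() {
    awk -v y="$1" -v m="$2" -v n="$3" -v w="$4" 'BEGIN {
        y += 0; m += 0; n += 0; w += 0   # force numeric, e.g. "06" -> 6
        # Sakamoto: weekday of the 1st of the month, 0 = Sunday.
        split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")
        yy = (m < 3) ? y - 1 : y
        first = (yy + int(yy / 4) - int(yy / 100) + int(yy / 400) + t[m] + 1) % 7
        # First occurrence of weekday w, then (n - 1) more weeks.
        day = 1 + (w - first + 7) % 7 + (n - 1) * 7
        # Days in this month, with the Gregorian leap-year rule for February.
        split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ")
        last = dim[m]
        if (m == 2 && ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0)) last = 29
        if (n < 1 || day > last) exit 1
        print day
    }'
}
```

For the 2nd Friday in June 2012, this prints 8:
$ nth_weekday 2012 6 2 5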
9. Shell Programming and Scripting
Hello Members,
I need your expert opinion on how to tackle the following.
I have an input file that looks like this:
USS|AWCC|AFGAW|93|70
USSAA|Roshan TDCA|AFGTD|93|72,79
ALB|Vodafone|ALBVF|355|69
ALGEE|Wataniya (Nedjma)|DZAWT|213|50,550
I would like the output file in the following format:
... (7 Replies)
Discussion started by: umarsatti
10. UNIX for Dummies Questions & Answers
Hi,
I have a huge list of archives (.gz). Each archive is about 40MB. A file is generated every minute, so analyzing the data for just 1 hour already means 60 files.
These are text files, ';' separated, each line having about 300 fields (columns).
What I need to do is to... (11 Replies)
Discussion started by: Nenad
tabix(1) Bioinformatics tools tabix(1)
NAME
bgzip - Block compression/decompression utility
tabix - Generic indexer for TAB-delimited genome position files
SYNOPSIS
bgzip [-cdhB] [-b virtualOffset] [-s size] [file]
tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]
DESCRIPTION
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the command
line. The input data file must be position sorted and compressed by bgzip, which has a gzip(1)-like interface. After indexing, tabix is able
to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over a network
if a URI is given as the file name; in this case the index file will be downloaded if it is not present locally.
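Because tabix requires position-sorted input, a quick check before compressing and indexing can catch unsorted data early. Below is a sketch in awk; it is not part of tabix. It assumes the default GFF layout: TAB-separated, with the sequence name in column 1, the start position in column 4, and "#" meta lines. It checks that each sequence name forms one contiguous block and that start positions never decrease within that block. For other layouts, change $4 to the start column. check_sorted is a hypothetical name.

```shell
# check_sorted [FILE...]   (hypothetical helper name)
# Exit 0 if a TAB-delimited file is grouped by column 1 with
# non-decreasing column 4 within each group; otherwise report the
# first offending line on stderr and exit 1. Lines starting with
# "#" are skipped, like tabix's default -c '#'.
check_sorted() {
    awk -F'\t' '
        /^#/ { next }
        {
            if ($1 != prev) {
                if ($1 in seen) bad = 1    # sequence name reappears later
                seen[$1] = 1
                prev = $1
            } else if ($4 + 0 < last + 0) {
                bad = 1                    # start position went backwards
            }
            if (bad) {
                print "not sorted at line " FNR ": " $0 > "/dev/stderr"
                exit 1
            }
            last = $4
        }
    ' "$@"
}
```

For example, check_sorted in.sorted.gff exits 0 only if the file passes (the file name is a placeholder).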
OPTIONS OF TABIX
-p STR Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any
of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff]
-s INT Column of sequence name. Options -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval. [1]
-b INT Column of start chromosomal position. [4]
-e INT Column of end chromosomal position. The end column can be the same as the start column. [5]
-S INT Skip first INT lines in the data file. [0]
-c CHAR Skip lines starting with character CHAR. [#]
-0 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.
-h Print the header/meta lines.
-B The second argument is a BED file. When this option is in use, the input file need not be sorted or indexed; the entire input will be
read sequentially. Nonetheless, with this option, the format of the input must be specified correctly on the command line.
-f Force overwriting the index file if it is present.
-l List the sequence names stored in the index file.
EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;
tabix -p gff sorted.gff.gz;
tabix sorted.gff.gz chr1:10,000,000-20,000,000;
NOTES
It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL databases, or the
R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix works directly with many widely used
TAB-delimited formats such as GFF/GTF and BED. We do not need to design a database schema or specialized binary formats. Data do not need to be
duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do not. The GenCode annotation
GTF can be compressed down to 4% of its original size. Thirdly, tabix is fast. The same indexing algorithm is known to work efficiently for an
alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not least, tabix supports
remote data retrieval. One can put the data file and the index on an FTP or HTTP server, and other users or even web services will be able to
get a slice without downloading the entire file.
AUTHOR
Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access
and in-memory caching.
SEE ALSO
samtools(1)
tabix-0.2.0 11 May 2010 tabix(1)