I am trying to output all lines in a file where $7 is less than 30. The code below does create a result file, but it contains every line of the original file. The original file is tab-delimited; is that the problem? Thank you.
file.txt
Desired result.txt
---------- Post updated at 12:12 PM ---------- Previous update was at 11:59 AM ----------
It was the FS=OFS="'," .... it should be OFS="\t"; I guess I need to pay more attention. FS is the field separator and OFS is the output field separator, right? Thank you.
Last edited by cmccabe; 07-30-2015 at 02:14 PM..
Reason: added desired result
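For reference, a minimal sketch of the fix described above, assuming the file really is tab-delimited and that column 7 holds plain numbers. The sample rows and the file names sample.txt and result.txt are made up for illustration, since the original file.txt is not shown here.

```shell
work=$(mktemp -d)

# Hypothetical tab-delimited sample with the value of interest in column 7.
printf 'r1\tb\tc\td\te\tf\t12\n' >  "$work/sample.txt"
printf 'r2\tb\tc\td\te\tf\t45\n' >> "$work/sample.txt"
printf 'r3\tb\tc\td\te\tf\t29\n' >> "$work/sample.txt"

# FS is the input field separator, OFS the output field separator.
# Set both to a tab, then keep only the lines whose 7th field is below 30.
awk 'BEGIN { FS = OFS = "\t" } $7 < 30' "$work/sample.txt" > "$work/result.txt"

cat "$work/result.txt"
```

With FS set to a comma on a tab-delimited file, each line is a single field and $7 is empty, so the comparison `$7 < 30` is true for every line. That would explain why the whole file ended up in the result.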
Can somebody tell me how to print a given range of lines from a particular file with sed?
Input file format
AAAA
BBBB
CCCC
SDFFF
DDDD
DDDD
Command to print line 2 and 3 ?
BBBB
CCCC
And also please tell me how to assign column sum to variable.
I use the following command it... (1 Reply)
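One possible way to do both things asked here, as a sketch only. The first part uses the sample input from the question. The second part assumes the numbers to sum are in the first column of a whitespace-separated file; numbers.txt and its contents are invented for the example, because the poster's command is cut off.

```shell
work=$(mktemp -d)

# Sample input from the question.
printf 'AAAA\nBBBB\nCCCC\nSDFFF\nDDDD\nDDDD\n' > "$work/input.txt"

# -n suppresses automatic printing; 2,3p prints only lines 2 through 3.
lines=$(sed -n '2,3p' "$work/input.txt")
printf '%s\n' "$lines"

# Hypothetical numeric file: sum column 1 and capture the result in a shell variable.
printf '10 x\n20 y\n12 z\n' > "$work/numbers.txt"
total=$(awk '{ sum += $1 } END { print sum }' "$work/numbers.txt")
echo "$total"
```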
Is there a way to tell awk to ignore the first 11 lines of a file? For example, I have a csv file with all the heading information in the first lines. I want to split the file into 5-6 different files, but I want to retain the first 11 lines of the file in each.
As it is now I run this command:
... (8 Replies)
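A rough sketch of one approach, since the poster's own command is not shown. Testing NR against 11 skips the heading lines, and the same test can be used to save them and write them at the top of every piece. The variables hdr, per and prefix, the part_N.csv naming and the sample data are all assumptions for illustration.

```shell
work=$(mktemp -d)

# Hypothetical sample: 11 heading lines followed by 5 data rows.
awk 'BEGIN {
  for (i = 1; i <= 11; i++) print "header " i
  for (i = 1; i <= 5;  i++) print "row," i
}' > "$work/data.csv"

# Ignore the first 11 lines entirely.
body=$(awk 'NR > 11' "$work/data.csv")
printf '%s\n' "$body"

# Split the data rows into files of "per" rows each, repeating the heading in every file.
awk -v hdr=11 -v per=2 -v prefix="$work/part_" '
  NR <= hdr { header = header $0 ORS; next }   # collect the heading lines
  (NR - hdr - 1) % per == 0 {                  # time to start a new output file
      if (out != "") close(out)
      out = sprintf("%s%d.csv", prefix, ++n)
      printf "%s", header > out
  }
  { print > out }                              # data row goes to the current file
' "$work/data.csv"

ls "$work"
```

Splitting by a fixed row count is only one choice; splitting on the value of a column would need a different condition in place of the modulo test.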
Hi,
I need help in printing out the dates with the largest value in front of it using awk.
436 28/Feb/2008
436 27/Feb/2008
436 20/Feb/2008
422 13/Feb/2008
420 23/Feb/2008
409 21/Feb/2008
402 26/Feb/2008
381 22/Feb/2008
374 24/Feb/2008
360... (7 Replies)
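A sketch of a single-pass approach, assuming the count is in column 1 and the date in column 2 as in the sample above. It does not rely on the input being sorted. The file name counts.txt is made up, and only the complete rows from the question are used.

```shell
work=$(mktemp -d)

cat > "$work/counts.txt" <<'EOF'
436 28/Feb/2008
436 27/Feb/2008
436 20/Feb/2008
422 13/Feb/2008
420 23/Feb/2008
409 21/Feb/2008
402 26/Feb/2008
381 22/Feb/2008
374 24/Feb/2008
EOF

top=$(awk '
  { v = $1 + 0 }                                   # force a numeric comparison
  NR == 1 || v > max { max = v; best = $0; next }  # new maximum: restart the list
  v == max           { best = best ORS $0 }        # tie: append this line
  END                { print best }
' "$work/counts.txt")

printf '%s\n' "$top"
```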
I have a CSV file with a variable number of fields per record. How do I print lines of a certain number of fields only? Several permutations of the following (including the use of escape characters) have failed to retrieve the line I'm after (1,2,3,4)...
$ cat myfile
1,2,3,4
1,2,3
$ # Print... (1 Reply)
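One way this is commonly done, shown as a sketch: awk keeps the number of fields on the current line in NF, so a bare condition on NF prints only the matching records. This assumes a plain comma-separated file with no quoted fields containing commas.

```shell
work=$(mktemp -d)

# Sample file from the question.
printf '1,2,3,4\n1,2,3\n' > "$work/myfile"

# -F, splits on commas; a condition with no action prints the matching line.
four=$(awk -F, 'NF == 4' "$work/myfile")
printf '%s\n' "$four"
```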
How do I get the last NR of a csv file?
If I use the line
awk -F, '{print NR}' csvfile.csv
and there are 42 lines, I get:
...
39
40
41
42
How do I extract the last number, which in this case is 42?
---------- Post updated at 11:05 AM ---------- Previous update was at 10:57 AM... (1 Reply)
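A short sketch of the usual answer: in the END block NR still holds the number of the last record read, so printing it there gives a single number. The 42-line sample file is generated here only for illustration.

```shell
work=$(mktemp -d)

# Hypothetical 42-line CSV file.
awk 'BEGIN { for (i = 1; i <= 42; i++) print "a,b," i }' > "$work/csvfile.csv"

# Print NR once, after all input has been read.
last=$(awk -F, 'END { print NR }' "$work/csvfile.csv")
echo "$last"
```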
Hi,
I have a problem when doing calculations in awk.
I want to add up a few numbers and output the result.
testfile:
48844322.87
7500.00
10577415.87
3601951.41
586877.64
1947813.89
$ awk '{x=x+$1};END{print x}' testfile
6.55659e+07
The problem is the number format. It should show... (3 Replies)
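A sketch of one fix, assuming two decimal places are wanted. When print outputs a non-integer number, awk formats it with OFMT, which defaults to "%.6g", and that is where the exponent notation comes from. Using printf with an explicit format avoids it.

```shell
work=$(mktemp -d)

# testfile from the question.
cat > "$work/testfile" <<'EOF'
48844322.87
7500.00
10577415.87
3601951.41
586877.64
1947813.89
EOF

# printf with %.2f prints the sum in fixed notation with two decimals.
sum=$(awk '{ x += $1 } END { printf "%.2f\n", x }' "$work/testfile")
echo "$sum"
```

Setting OFMT in a BEGIN block would also change how print formats the number, but printf keeps the format next to the value it applies to.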
Hi All,
I have a file a.txt, content as mentioned below:
22454750
This data is in a control file, and
I have a variable called vCount which contains a number.
I need to extract the 22454750 from the above file and compare it with the variable vCount. If they match, fine; otherwise exit.
... (5 Replies)
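A minimal sketch of the comparison, assuming the number is the first field on the first non-empty line of a.txt. The helper name check_count is hypothetical, and the value given to vCount here is only a stand-in for whatever the real script computes.

```shell
work=$(mktemp -d)
printf '22454750\n' > "$work/a.txt"

vCount=22454750   # stand-in value for the example

# check_count FILE EXPECTED: succeeds when the number in FILE equals EXPECTED.
check_count() {
  file_count=$(awk 'NF { print $1; exit }' "$1")
  [ "$file_count" -eq "$2" ]
}

if check_count "$work/a.txt" "$vCount"; then
  echo "counts match"
else
  echo "counts differ" >&2   # a real script would probably exit 1 here
fi
```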
Hello,
I am new to AWK and to UNIX in general. I am hoping you can help me out here.
Here is my data:
root@ubuntu:~# cat circuits.list
WORD1
AA
BB
CC
DD
Active
ISP1
ISP NAME1
XX-XXXXXX1
WORD1
AA
BB
CC (9 Replies)
I want to check my data quality. I want to output the lines that contain non-numeric characters. I used the grep command:
grep '' myfile.csv
Since my file is a CSV file, I don't want to output lines just because they contain a comma. I also don't want to match "." or a space. But I still get lines like the following:... (8 Replies)
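The pattern in the command above did not survive, so this is only a guess at what was intended: a negated bracket expression that matches any character other than a digit, comma, dot or space. The sample rows are invented. Note that a value such as 1e5 is flagged too, because the letter e is not in the allowed set.

```shell
work=$(mktemp -d)

# Hypothetical CSV with two clean rows and two suspicious ones.
printf '1,2.5,3\n4, 5,6\n7,abc,9\n10,1e5,11\n' > "$work/myfile.csv"

# Print lines that contain at least one character outside digits, comma, dot and space.
bad=$(grep '[^0-9., ]' "$work/myfile.csv")
printf '%s\n' "$bad"
```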
Discussion started by: twotwo
tabix
tabix(1) Bioinformatics tools tabix(1)
NAME
bgzip - Block compression/decompression utility
tabix - Generic indexer for TAB-delimited genome position files
SYNOPSIS
bgzip [-cdhB] [-b virtualOffset] [-s size] [file]
tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]
DESCRIPTION
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the
command-line. The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface. After indexing, tabix is
able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over
network if URI is given as a file name and in this case the index file will be downloaded if it is not present locally.
OPTIONS OF TABIX
-p STR Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any
of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff]
-s INT Column of sequence name. Options -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval. [1]
-b INT Column of start chromosomal position. [4]
-e INT Column of end chromosomal position. The end column can be the same as the start column. [5]
-S INT Skip first INT lines in the data file. [0]
-c CHAR Skip lines starting with character CHAR. [#]
-0 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.
-h Print the header/meta lines.
-B The second argument is a BED file. When this option is in use, the input file may not be sorted or indexed. The entire input will
be read sequentially. Nonetheless, with this option, the format of the input must be specified correctly on the command line.
-f Force to overwrite the index file if it is present.
-l List the sequence names stored in the index file.
EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;
tabix -p gff sorted.gff.gz;
tabix sorted.gff.gz chr1:10,000,000-20,000,000;
NOTES
It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL
databases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix directly works with a
lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design database schema or specialized binary formats.
Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do
not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work
efficiently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not
least, tabix supports remote data retrieval. One can put the data file and the index on an FTP or HTTP server, and other users or even web
services will be able to get a slice without downloading the entire file.
AUTHOR
Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access
and in-memory caching.
SEE ALSO
samtools(1)

tabix-0.2.0 11 May 2010 tabix(1)