Shell Programming and Scripting: Help needed to validate and merge Two files
Post 302322071 by figaro on Tuesday 2nd of June 2009 05:44:58 PM
If this is going to be a routine task, you are probably far better off using a database, especially since your data is tab-delimited. Databases are more robust, particularly once the files get big. In that case, the problem boils down to loading your data and getting the SQL for the selection right.
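A minimal sketch of that approach, assuming two tab-delimited inputs with exactly two columns each (a unique key and a value). The table names `a` and `b`, the column names `id` and `value`, and the function names are made up for illustration and do not come from the original thread. It uses Python's standard `sqlite3` and `csv` modules with an in-memory database.

```python
import csv
import io
import sqlite3


def load_tsv(conn, table, handle):
    """Load a two-column tab-delimited stream into a new table.

    `table` is a fixed name chosen by this script, never user input,
    so formatting it into the SQL text is acceptable here.
    """
    conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, value TEXT)")
    rows = csv.reader(handle, delimiter="\t")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def merge_and_validate(left, right):
    """Return (merged rows, keys present only in the left input)."""
    conn = sqlite3.connect(":memory:")
    load_tsv(conn, "a", left)
    load_tsv(conn, "b", right)
    # Merge: keep keys that appear in both inputs.
    merged = conn.execute(
        "SELECT a.id, a.value, b.value FROM a "
        "JOIN b ON a.id = b.id ORDER BY a.id"
    ).fetchall()
    # Validate: report keys from the left input with no match on the right.
    only_left = [
        row[0]
        for row in conn.execute(
            "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id "
            "WHERE b.id IS NULL ORDER BY a.id"
        )
    ]
    conn.close()
    return merged, only_left


if __name__ == "__main__":
    first = io.StringIO("k1\tx\nk2\ty\n")
    second = io.StringIO("k1\tz\n")
    print(merge_and_validate(first, second))
```

For real files, pass handles opened with `open(path, newline="")` instead of the `io.StringIO` samples. The `PRIMARY KEY` constraint doubles as a validation step, since a duplicate key in either input raises `sqlite3.IntegrityError` during loading.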
 

tabix(1)                     Bioinformatics tools                     tabix(1)

NAME
       bgzip - Block compression/decompression utility
       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS
       bgzip [-cdhB] [-b virtualOffset] [-s size] [file]
       tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]

DESCRIPTION
       Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the command-line. The input data file must be position sorted and compressed by bgzip, which has a gzip(1)-like interface. After indexing, tabix is able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over a network if a URI is given as the file name; in this case the index file will be downloaded if it is not present locally.

OPTIONS OF TABIX
       -p STR    Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff]

       -s INT    Column of sequence name. Options -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval. [1]

       -b INT    Column of start chromosomal position. [4]

       -e INT    Column of end chromosomal position. The end column can be the same as the start column. [5]

       -S INT    Skip first INT lines in the data file. [0]

       -c CHAR   Skip lines starting with character CHAR. [#]

       -0        Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.

       -h        Print the header/meta lines.

       -B        The second argument is a BED file. When this option is in use, the input file may not be sorted or indexed. The entire input will be read sequentially. Nonetheless, with this option, the format of the input must be specified correctly on the command line.

       -f        Force to overwrite the index file if it is present.

       -l        List the sequence names stored in the index file.

EXAMPLE
       (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;
       tabix -p gff sorted.gff.gz;
       tabix sorted.gff.gz chr1:10,000,000-20,000,000;

NOTES
       It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL databases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix directly works with a lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design a database schema or specialized binary formats. Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work efficiently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not least, tabix supports remote data retrieval. One can put the data file and the index on an FTP or HTTP server, and other users or even web services will be able to get a slice without downloading the entire file.

AUTHOR
       Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access and in-memory caching.

SEE ALSO
       samtools(1)

tabix-0.2.0                        11 May 2010                        tabix(1)
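
To make the region format and the overlap test from the man page concrete, here is a rough Python sketch. It is not how tabix works internally: it does a plain linear scan with no index and no bgzip handling, and the function names are invented for illustration. It assumes 1-based inclusive coordinates, a region given in the full "chr:beginPos-endPos" form (commas allowed, as in the EXAMPLE section), and the GFF column defaults listed under the options (sequence name in column 1, start in column 4, end in column 5).

```python
def parse_region(region):
    """Split 'chr:beginPos-endPos' into (chrom, begin, end).

    Thousands separators such as '10,000,000' are stripped before
    conversion. Only the full three-part form is handled here.
    """
    chrom, _, span = region.partition(":")
    begin_text, _, end_text = span.replace(",", "").partition("-")
    return chrom, int(begin_text), int(end_text)


def overlapping_lines(lines, region, seq_col=1, beg_col=4, end_col=5,
                      meta_char="#"):
    """Return the data lines whose interval overlaps the region.

    Column numbers are 1-based, mirroring the -s, -b and -e options.
    Lines starting with meta_char are skipped, mirroring -c.
    """
    chrom, begin, end = parse_region(region)
    hits = []
    for line in lines:
        if line.startswith(meta_char):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[seq_col - 1] != chrom:
            continue
        start = int(fields[beg_col - 1])
        stop = int(fields[end_col - 1])
        # Two closed intervals overlap when each starts before the other ends.
        if start <= end and stop >= begin:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    sample = [
        "#header",
        "chr1\tsrc\tgene\t5\t15\t.",
        "chr1\tsrc\tgene\t30\t40\t.",
        "chr2\tsrc\tgene\t5\t15\t.",
    ]
    print(overlapping_lines(sample, "chr1:10-20"))
```

The scan reads every line, which is exactly the cost the index is meant to avoid on large files, so treat this only as a statement of what an overlap query returns.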