Diff command on two Fastq.gz files
Post 302876927 by jim mcnamara on Wednesday 27th of November 2013 08:27:27 AM
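There are VERY, VERY likely NO errors in the merge, because gzip stores a CRC-32 checksum of the uncompressed data in every compressed file, and gunzip, zcat, etc. verify that checksum when they decompress.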

If you are not sure what compression does and want to verify anyway, a better check is to decompress everything and compare:
Code:
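# decompress the two parts back to back, then the merged file
(gunzip -c file1.gz; gunzip -c file2.gz) > tmp1
gunzip -c bigfile > tmp2
# diff prints nothing if the two are identical
diff tmp1 tmp2
rm tmp1 tmp2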

gunzip -c is the same as zcat.
If compression were unreliable, everyone would be running diffs after every gzip, and probably nobody would use gzip or gunzip for anything.
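
If the files are large and you would rather not write two uncompressed copies to disk, something along these lines should also work. It is only a sketch: verify_merge is a made-up helper name, and the three arguments stand for your two parts and your merged file. It first lets gzip -t check the stored checksums, then compares cksum output of the decompressed streams instead of doing a byte-for-byte diff.

```shell
# Sketch only: verify_merge is a hypothetical helper, not a standard tool.
# Usage: verify_merge PART1.gz PART2.gz MERGED.gz
verify_merge() {
    # gzip -t decompresses each file without writing output and
    # fails if a stored checksum does not match
    gzip -t "$1" "$2" "$3" || return 1

    # checksum + byte count of the two parts decompressed back to back
    parts=$( { gunzip -c "$1"; gunzip -c "$2"; } | cksum )
    # checksum + byte count of the merged file decompressed
    merged=$(gunzip -c "$3" | cksum)

    # succeed only if both streams have the same checksum and length
    [ "$parts" = "$merged" ]
}
```

Call it as verify_merge file1.gz file2.gz bigfile and test the exit status, e.g. with && echo OK. A matching cksum is not a strict proof of identity the way diff or cmp is, but an accidental mismatch slipping through is extremely unlikely.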
 
