How to compare data from 2 zip files and capture the new records from file2 to a new file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting How to compare data from 2 zip files and capture the new records from file2 to a new file
# 8  
Old 12-05-2011
Quote:
Originally Posted by Corona688
Those should be .gz or .z, not .zip.
Code:
$ man zcat
 
GZIP(1)                                                                GZIP(1)
 
 
 
NAME
       gzip, gunzip, zcat - compress or expand files
 
SYNOPSIS
       gzip [ -acdfhlLnNrtvV19 ] [-S suffix] [ name ...  ]
       gunzip [ -acfhlLnNrtvV ] [-S suffix] [ name ...  ]
       zcat [ -fhLV ] [ name ...  ]
 
DESCRIPTION
       Gzip  reduces  the  size  of  the  named  files using Lempel-Ziv coding
       (LZ77).  Whenever possible, each file  is  replaced  by  one  with  the
       extension .gz, while keeping the same ownership modes, access and modi-
       fication times.  (The default extension is -gz for VMS,  z  for  MSDOS,
       OS/2  FAT, Windows NT FAT and Atari.)

Actual zip is difficult to use in a pipe chain.

Knowing what you have, I'd do this:

Code:
mkfifo data1 data2
zcat < a.gz > data1 &
zcat < b.gz > data2 &
 
awk 'BEGIN { while(getline <"data1") L[$0]=1 }; !L[$0]' data2
 
wait
rm -f data1 data2

---------- Post updated at 03:17 PM ---------- Previous update was at 03:13 PM ----------

A simplified version:

Code:
mkfifo data1
zcat < a.zip > data1 &
zcat < b.zip | awk 'BEGIN { while(getline <"data1") L[$0]=1 }; !L[$0]'
wait
rm data1

Thanks !! This code is working as expected. I tried with less than a million records in each file and it gave me the output in less than a minute. But when i tried with 27 million records in each file, it is still executing from an hour. Will this consume lot of disk space ? Is there a way to get the output faster ?
# 9  
Old 12-05-2011
Quote:
Originally Posted by koneru
But when i tried with 27 million records in each file, it is still executing from an hour. Will this consume lot of disk space ? Is there a way to get the output faster ?
It has to hold the complete, uncompressed contents of "a" in memory to tell if any lines from "b" exist in it. How else would it know, when it can't make any assumptions like ordering? This doesn't take disk space but takes as much memory as it needs to hold "a" uncompressed.

That can't be simplified or sped up without sorting the input files first -- which takes time itself, and would alter the order of output.

Last edited by Corona688; 12-05-2011 at 02:57 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare two files and write data to second file using awk

Hi Guys, I wanted to compare a delimited file and positional file, for a particular key files and if it matches then append the positional file with some data. Example: Delimited File -------------- Byer;Amy;NONE1;A5218257;E5218257 Byer;Amy;NONE1;A5218260;E5218260 Positional File... (3 Replies)
Discussion started by: Ajay Venkatesan
3 Replies

2. Shell Programming and Scripting

awk - compare records of 1 file with 3 files

hi.. I want to compare records present in 1 file with those in 3 other files and print those records of file 1 which are not present in any of the files. for eg - file1 file2 file3 file4 1 1 5 7 2 2 6 9 3 4 5 6 7 8 9 ... (3 Replies)
Discussion started by: Abhiraj Singh
3 Replies

3. Shell Programming and Scripting

Compare and find records of file1 not in file2

hi.. i am using solaris system and ksh and using nawk to get records of file1 not in file2(not line by line comparison). code i am using is nawk 'NR==FNR{a++} !a {print"line:" FNR"->" $0} ' file2 file1 same command with awk runs perfectly on darwin kernel(mac) but in solaris it does line by... (2 Replies)
Discussion started by: Abhiraj Singh
2 Replies

4. Shell Programming and Scripting

Compare multiple files, identify common records and combine unique values into one file

Good morning all, I have a problem that is one step beyond a standard awk compare. I would like to compare three files which have several thousand records against a fourth file. All of them have a value in each row that is identical, and one value in each of those rows which may be duplicated... (1 Reply)
Discussion started by: nashton
1 Replies

5. Shell Programming and Scripting

Compare two files with different number of records and output only the Extra records from file1

Hi Freinds , I have 2 files . File 1 |nag|HYd|1|Che |esw|Gun|2|hyd |pra|bhe|3|hyd |omu|hei|4|bnsj |uer|oeri|5|uery File 2 |nag|HYd|1|Che |esw|Gun|2|hyd |uer|oi|3|uery output : (9 Replies)
Discussion started by: i150371485
9 Replies

6. Shell Programming and Scripting

Compare values in two files. For matching rows print corresponding values from File 1 in File2.

- I have two files (File 1 and File 2) and the contents of the files are mentioned below. - I am trying to compare the values of Column1 of File1 with Column1 of File2. If a match is found, print the corresponding value from Column2 of File1 in Column5 of File2. - I tried to modify and use... (10 Replies)
Discussion started by: Santoshbn
10 Replies

7. Shell Programming and Scripting

Compare a common field in two files and append a column from File 1 in File2

Hi Friends, I am new to Shell Scripting and need your help in the below situation. - I have two files (File 1 and File 2) and the contents of the files are mentioned below. - "Application handle" is the common field in both the files. (NOTE :- PLEASE REFER TO THE ATTACHMENT "Compare files... (2 Replies)
Discussion started by: Santoshbn
2 Replies

8. Shell Programming and Scripting

Based on num of records in file1 need to check records in file2 to set some condns

Hi All, I have two files say file1 and file2. I want to check the number of records in file1 and if its atleast 2 (i.e., 2 or greater than 2 ) then I have to check records in file2 .If records in file2 is atleast 1 (i.e. if its not empty ) i have to set some conditions . Could you pls... (3 Replies)
Discussion started by: mavesum
3 Replies

9. UNIX for Dummies Questions & Answers

Count records in a zip file

Hello, I searched the forums on the keywords in the title I used above, but I did not find the answer: Is it possible to count records in a .zip file on an AIX machine if i don't have pkunzip installed? From all the research I'm reading in google and the reading of pkunzip in Unix.com,... (3 Replies)
Discussion started by: tekster757
3 Replies

10. Shell Programming and Scripting

Compare data in 2 files and delete if file exist

Hi there, I have written a script called "compare" (see below) to make comparison between 2 files namely test_put.log and Output_A0.log #!/bin/ksh while read file do found="no" while read line do echo $line | grep $file > /dev/null if then echo $file found found="yes" break fi... (3 Replies)
Discussion started by: lweegp
3 Replies
Login or Register to Ask a Question