How to compare data from 2 zip files and capture the new records from file2 to a new file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting How to compare data from 2 zip files and capture the new records from file2 to a new file
# 1  
Old 12-01-2011
How to compare data from 2 zip files and capture the new records from file2 to a new file

I have 2 zip files which have about 20 million records in each file. file 2 will have additional records than file 1. I want to compare the records in both the files and capture the new records from file 2 into another file file3. Please help me with a command/script which provides me the desired result in a faster way.

Example:-

File1 - a.zip

1,2,3

File 2 - b.zip

2,3,4
1,2,3

Required output

2,3,4
# 2  
Old 12-01-2011
faster way than what? What have you already done?

What do the contents of the zip file look like? As in, what filenames?
# 3  
Old 12-01-2011
The contents of the zip file look like what i have mentioned in the example.
# 4  
Old 12-01-2011
Quote:
Originally Posted by Corona688
faster way than what? What have you already done?

What do the contents of the zip file look like? As in, what filenames?
To get a file out of a .zip, you need a filename, not just the zip file, because zip can hold more than one file.

If the .zip files don't contain filenames, they're not really .zip, so what are they?
# 5  
Old 12-01-2011
I am able to compare uncompressed files using awk, but want a way to compare compressed files.

Each .zip file has one text file within it and resides in different directories. Both the .zip files have the same name a.zip and both the text files within the two zip files have the same name a.txt

zcat /home/20111201/a.zip

1,2,3

zcat /home/20111202/a.zip

2,3,4
1,2,3
# 6  
Old 12-01-2011
Those should be .gz or .z, not .zip.
Code:
$ man zcat

GZIP(1)                                                                GZIP(1)



NAME
       gzip, gunzip, zcat - compress or expand files

SYNOPSIS
       gzip [ -acdfhlLnNrtvV19 ] [-S suffix] [ name ...  ]
       gunzip [ -acfhlLnNrtvV ] [-S suffix] [ name ...  ]
       zcat [ -fhLV ] [ name ...  ]

DESCRIPTION
       Gzip  reduces  the  size  of  the  named  files using Lempel-Ziv coding
       (LZ77).  Whenever possible, each file  is  replaced  by  one  with  the
       extension .gz, while keeping the same ownership modes, access and modi-
       fication times.  (The default extension is -gz for VMS,  z  for  MSDOS,
       OS/2  FAT, Windows NT FAT and Atari.)

Actual zip is difficult to use in a pipe chain.

Knowing what you have, I'd do this:

Code:
mkfifo data1 data2
zcat < a.gz > data1 &
zcat < b.gz > data2 &

awk 'BEGIN { while(getline <"data1") L[$0]=1 }; !L[$0]' data2

wait
rm -f data1 data2

---------- Post updated at 03:17 PM ---------- Previous update was at 03:13 PM ----------

A simplified version:

Code:
mkfifo data1
zcat < a.zip > data1 &
zcat < b.zip | awk 'BEGIN { while(getline <"data1") L[$0]=1 }; !L[$0]'
wait
rm data1

# 7  
Old 12-01-2011
See this as well, maybe it will be helpful:


The Power of Z Commands – Zcat, Zless, Zgrep, Zdiff Examples (link removed)
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare two files and write data to second file using awk

Hi Guys, I wanted to compare a delimited file and positional file, for a particular key files and if it matches then append the positional file with some data. Example: Delimited File -------------- Byer;Amy;NONE1;A5218257;E5218257 Byer;Amy;NONE1;A5218260;E5218260 Positional File... (3 Replies)
Discussion started by: Ajay Venkatesan
3 Replies

2. Shell Programming and Scripting

awk - compare records of 1 file with 3 files

hi.. I want to compare records present in 1 file with those in 3 other files and print those records of file 1 which are not present in any of the files. for eg - file1 file2 file3 file4 1 1 5 7 2 2 6 9 3 4 5 6 7 8 9 ... (3 Replies)
Discussion started by: Abhiraj Singh
3 Replies

3. Shell Programming and Scripting

Compare and find records of file1 not in file2

hi.. i am using solaris system and ksh and using nawk to get records of file1 not in file2(not line by line comparison). code i am using is nawk 'NR==FNR{a++} !a {print"line:" FNR"->" $0} ' file2 file1 same command with awk runs perfectly on darwin kernel(mac) but in solaris it does line by... (2 Replies)
Discussion started by: Abhiraj Singh
2 Replies

4. Shell Programming and Scripting

Compare multiple files, identify common records and combine unique values into one file

Good morning all, I have a problem that is one step beyond a standard awk compare. I would like to compare three files which have several thousand records against a fourth file. All of them have a value in each row that is identical, and one value in each of those rows which may be duplicated... (1 Reply)
Discussion started by: nashton
1 Replies

5. Shell Programming and Scripting

Compare two files with different number of records and output only the Extra records from file1

Hi Freinds , I have 2 files . File 1 |nag|HYd|1|Che |esw|Gun|2|hyd |pra|bhe|3|hyd |omu|hei|4|bnsj |uer|oeri|5|uery File 2 |nag|HYd|1|Che |esw|Gun|2|hyd |uer|oi|3|uery output : (9 Replies)
Discussion started by: i150371485
9 Replies

6. Shell Programming and Scripting

Compare values in two files. For matching rows print corresponding values from File 1 in File2.

- I have two files (File 1 and File 2) and the contents of the files are mentioned below. - I am trying to compare the values of Column1 of File1 with Column1 of File2. If a match is found, print the corresponding value from Column2 of File1 in Column5 of File2. - I tried to modify and use... (10 Replies)
Discussion started by: Santoshbn
10 Replies

7. Shell Programming and Scripting

Compare a common field in two files and append a column from File 1 in File2

Hi Friends, I am new to Shell Scripting and need your help in the below situation. - I have two files (File 1 and File 2) and the contents of the files are mentioned below. - "Application handle" is the common field in both the files. (NOTE :- PLEASE REFER TO THE ATTACHMENT "Compare files... (2 Replies)
Discussion started by: Santoshbn
2 Replies

8. Shell Programming and Scripting

Based on num of records in file1 need to check records in file2 to set some condns

Hi All, I have two files say file1 and file2. I want to check the number of records in file1 and if its atleast 2 (i.e., 2 or greater than 2 ) then I have to check records in file2 .If records in file2 is atleast 1 (i.e. if its not empty ) i have to set some conditions . Could you pls... (3 Replies)
Discussion started by: mavesum
3 Replies

9. UNIX for Dummies Questions & Answers

Count records in a zip file

Hello, I searched the forums on the keywords in the title I used above, but I did not find the answer: Is it possible to count records in a .zip file on an AIX machine if i don't have pkunzip installed? From all the research I'm reading in google and the reading of pkunzip in Unix.com,... (3 Replies)
Discussion started by: tekster757
3 Replies

10. Shell Programming and Scripting

Compare data in 2 files and delete if file exist

Hi there, I have written a script called "compare" (see below) to make comparison between 2 files namely test_put.log and Output_A0.log #!/bin/ksh while read file do found="no" while read line do echo $line | grep $file > /dev/null if then echo $file found found="yes" break fi... (3 Replies)
Discussion started by: lweegp
3 Replies
Login or Register to Ask a Question