How to compare data from 2 zip files and capture the new records from file2 to a new file
Post 302579362 by Corona688 on Monday 5th of December 2011 01:47:14 PM
Old 12-05-2011
Quote:
Originally Posted by koneru
But when I tried it with 27 million records in each file, it has been running for an hour and is still going. Will this consume a lot of disk space? Is there a way to get the output faster?
It has to hold the complete, uncompressed contents of "a" in memory to tell if any lines from "b" exist in it. How else would it know, when it can't make any assumptions like ordering? This doesn't take disk space but takes as much memory as it needs to hold "a" uncompressed.
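
A minimal sketch of the in-memory approach described above, assuming it is the usual awk lookup-table idiom (the original command is not quoted in this post). The function name `new_lines` is made up for illustration, and both arguments are assumed to be plain-text files; with compressed input under bash, the same awk program could read from something like `<(gzip -dc a.gz)` instead.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file "b" ($2) that do not
# appear anywhere in file "a" ($1), preserving the order of "b".
new_lines() {
    awk '
        # While reading the first file, remember every line as an array key.
        # This array is what grows to roughly the uncompressed size of "a".
        FILENAME == ARGV[1] { seen[$0]; next }

        # While reading the second file, print only lines never seen in "a".
        !($0 in seen)
    ' "$1" "$2"
}
```

Called as `new_lines a.txt b.txt > new.txt`, this needs no temporary disk space, but its memory use scales with the size of the first file.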

That can't be simplified or sped up without sorting the input files first -- which takes time itself, and would alter the order of output.
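
The sort-first alternative mentioned above could look like the following sketch. It is not taken from the thread: `new_lines_sorted` is a hypothetical name, plain-text inputs are assumed, and `mktemp` is assumed to be available. `comm -13` prints only the lines unique to its second input.

```shell
#!/bin/sh
# Hypothetical helper: print lines of file "b" ($2) that are not in
# file "a" ($1), by sorting both files and comparing them with comm.
new_lines_sorted() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1

    # comm needs both inputs sorted in the same collation order;
    # LC_ALL=C forces plain byte order for sort and comm alike.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"

    # -1 drops lines only in "a", -3 drops lines common to both,
    # which leaves the lines found only in "b", in sorted order.
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted"

    rm -f "$a_sorted" "$b_sorted"
}
```

Here the comparison itself only walks two sorted streams, so it does not need to hold either file in memory. The cost is the two sorts, plus output that comes back in sorted order rather than in the order of "b".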

Last edited by Corona688; 12-05-2011 at 02:57 PM.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare data in 2 files and delete if file exist

Hi there, I have written a script called "compare" (see below) to make comparison between 2 files namely test_put.log and Output_A0.log #!/bin/ksh while read file do found="no" while read line do echo $line | grep $file > /dev/null if then echo $file found found="yes" break fi... (3 Replies)
Discussion started by: lweegp
3 Replies

2. UNIX for Dummies Questions & Answers

Count records in a zip file

Hello, I searched the forums on the keywords in the title I used above, but I did not find the answer: Is it possible to count records in a .zip file on an AIX machine if i don't have pkunzip installed? From all the research I'm reading in google and the reading of pkunzip in Unix.com,... (3 Replies)
Discussion started by: tekster757
3 Replies

3. Shell Programming and Scripting

Based on num of records in file1 need to check records in file2 to set some condns

Hi All, I have two files, say file1 and file2. I want to check the number of records in file1, and if it's at least 2 (i.e., 2 or more) then I have to check the records in file2. If the number of records in file2 is at least 1 (i.e. if it's not empty) I have to set some conditions. Could you pls... (3 Replies)
Discussion started by: mavesum
3 Replies

4. Shell Programming and Scripting

Compare a common field in two files and append a column from File 1 in File2

Hi Friends, I am new to Shell Scripting and need your help in the below situation. - I have two files (File 1 and File 2) and the contents of the files are mentioned below. - "Application handle" is the common field in both the files. (NOTE :- PLEASE REFER TO THE ATTACHMENT "Compare files... (2 Replies)
Discussion started by: Santoshbn
2 Replies

5. Shell Programming and Scripting

Compare values in two files. For matching rows print corresponding values from File 1 in File2.

- I have two files (File 1 and File 2) and the contents of the files are mentioned below. - I am trying to compare the values of Column1 of File1 with Column1 of File2. If a match is found, print the corresponding value from Column2 of File1 in Column5 of File2. - I tried to modify and use... (10 Replies)
Discussion started by: Santoshbn
10 Replies

6. Shell Programming and Scripting

Compare two files with different number of records and output only the Extra records from file1

Hi Friends, I have 2 files. File 1 |nag|HYd|1|Che |esw|Gun|2|hyd |pra|bhe|3|hyd |omu|hei|4|bnsj |uer|oeri|5|uery File 2 |nag|HYd|1|Che |esw|Gun|2|hyd |uer|oi|3|uery output : (9 Replies)
Discussion started by: i150371485
9 Replies

7. Shell Programming and Scripting

Compare multiple files, identify common records and combine unique values into one file

Good morning all, I have a problem that is one step beyond a standard awk compare. I would like to compare three files which have several thousand records against a fourth file. All of them have a value in each row that is identical, and one value in each of those rows which may be duplicated... (1 Reply)
Discussion started by: nashton
1 Reply

8. Shell Programming and Scripting

Compare and find records of file1 not in file2

Hi, I am using a Solaris system with ksh, and using nawk to get records of file1 that are not in file2 (not a line-by-line comparison). The code I am using is nawk 'NR==FNR{a[$0]++} !a[$0] {print"line:" FNR"->" $0} ' file2 file1. The same command with awk runs perfectly on the Darwin kernel (Mac), but in Solaris it does line by... (2 Replies)
Discussion started by: Abhiraj Singh
2 Replies

9. Shell Programming and Scripting

awk - compare records of 1 file with 3 files

hi.. I want to compare records present in 1 file with those in 3 other files and print those records of file 1 which are not present in any of the files. for eg - file1 file2 file3 file4 1 1 5 7 2 2 6 9 3 4 5 6 7 8 9 ... (3 Replies)
Discussion started by: Abhiraj Singh
3 Replies

10. Shell Programming and Scripting

Compare two files and write data to second file using awk

Hi Guys, I wanted to compare a delimited file and positional file, for a particular key files and if it matches then append the positional file with some data. Example: Delimited File -------------- Byer;Amy;NONE1;A5218257;E5218257 Byer;Amy;NONE1;A5218260;E5218260 Positional File... (3 Replies)
Discussion started by: Ajay Venkatesan
3 Replies
DIFFSTAT(1)						      General Commands Manual						       DIFFSTAT(1)

NAME
       diffstat - make histogram from diff-output

USAGE
       diffstat [options] [file-specifications]

SYNOPSIS
       This program reads the output of diff and displays a histogram of the insertions, deletions, and modifications per-file.

DESCRIPTION
       Diffstat is a program that is useful for reviewing large, complex patch files. It reads from one or more input files which contain output from diff, producing a histogram of the total lines changed for each file referenced. If the input filename ends with .bz2, .Z or .gz, diffstat will read the uncompressed data via a pipe.

       Diffstat recognizes the most popular types of output from diff:

       unified
              preferred by the patch utility.

       context
              best for readability, but not very compact.

       default
              not good for much, but simple to generate.

       Diffstat detects the lines that are output by diff to tell which files are compared, and then counts the markers in the first column that denote the type of change (insertion, deletion or modification). These are shown in the histogram as "+", "-" and "!" characters.

       If no filename is given on the command line, diffstat reads the differences from the standard input.

OPTIONS
       -c     prefix each line of output with "#", making it a comment-line for shell scripts.

       -f format
              specify 0 for concise, 1 for normal output.

       -k     suppress the merging of filenames in the report.

       -n number
              specify the minimum width used for filenames. If you don't specify this, diffstat uses the length of the longest filename, after stripping common prefixes.

       -p number
              override the logic that strips common pathnames, simulating the patch "-p" option.

       -u     suppress the sorting of filenames in the report.

       -V     prints the current version number

       -w number
              specify the maximum width of the histogram. The plot will never be shorter than 10 columns, just in case the filenames get too large.

ENVIRONMENT
       Diffstat runs in a portable UNIX(R) environment.

FILES
       Diffstat is a single binary module, which uses no auxiliary files.

BUGS
       Diffstat makes a lot of assumptions about the format of a diff file. There's no easy way to determine the degree of overlap between the "before" and "after" displays of modified lines.

SEE ALSO
       diff (1).

AUTHOR
       Thomas Dickey <dickey@invisible-island.net>.

DIFFSTAT(1)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.