Problem comparing 2 files with lot of data
Post 302128283 by rafisha on Monday 23rd of July 2007 10:40:53 PM

Hello everyone, here's the scenario

I have two files, each with around 1,300,000 lines, and each line has a single column (a phone number). I have to get the phone numbers that are in file1 but not in file2. I could get them through Oracle, but my boss does not want that (he said the query would take hours to finish and would tie up server resources, or something like that), so he gave me the files with the phone numbers instead.

First I tried to solve the problem with some Perl scripting, but it took about 10 minutes just to read the files, and because of my poor programming skills I tried to do the search with a double foreach, something like this:

# SOME1, SOME2 and OUT_FILE are filehandles that were opened earlier
@file1 = <SOME1>;
@file2 = <SOME2>;
chomp(@file1, @file2);   # strip the trailing newline from every line
$n = 0;
$flag = 1; # if $flag == 0 then the element is in file2

foreach $row1 (@file1)
{
    foreach $row2 (@file2)
    {
        # compare as strings so that leading zeros are kept
        if ($row1 eq $row2)
        {
            $flag = 0;
        }
    }
    if ($flag)
    {
        $anArray[$n] = $row1;
        $n++;
    }
    $flag = 1;
}

if ($n > 0)
{
    foreach $row3 (@anArray)
    {
        print OUT_FILE "$row3\n";
    }
}



The data from the files is like this:


FILE1
----------------------------
1234567890
0987654321
2345678901
9012345678


FILE2
----------------------------
1234567890
0987654321
2345678901


OUT_FILE must be
----------------------------
9012345678



But this solution will take ages to finish, so now I am thinking of using awk or another language. I really don't know which one is better for this problem or what algorithm I should use (besides, I have never used awk or shell scripting; I'm new to UNIX). I was thinking of sorting the files and then doing a binary search, but I have some doubts about it, so I feel really lost now.

Thanks for your help
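
One possible approach with awk, as a minimal sketch: it assumes each file holds exactly one phone number per line and that the numbers from file2 fit in memory. The function name only_in_first is a placeholder chosen for illustration and does not come from the post. The idea is to load file2 into an associative array and then print every line of file1 that is not a key in it, so each file is read once instead of every pair of lines being compared.

```shell
#!/bin/sh
# only_in_first FILE1 FILE2: print the lines of FILE1 that do not appear in FILE2.
# (only_in_first is a hypothetical helper name used for illustration.)
only_in_first() {
    # awk reads FILE2 first (it is passed as the first operand, ARGV[1]):
    # every line of it becomes a key of the array "seen".
    # For FILE1, a line is printed only when it is not a key of "seen".
    awk 'FILENAME == ARGV[1] { seen[$0]; next }
         !($0 in seen)' "$2" "$1"
}
```

It could be called as: only_in_first file1 file2 > out_file. Lines are compared as strings, so leading zeros are kept, and the output stays in the order of file1.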
 

comm(1)                          User Commands                          comm(1)

NAME
     comm - select or reject lines common to two files

SYNOPSIS
     comm [-123] file1 file2

DESCRIPTION
     The comm utility reads file1 and file2, which must be ordered in the current collating sequence, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files.

     If the input files were ordered according to the collating sequence of the current locale, the lines written will be in the collating sequence of the original lines. If not, the results are unspecified.

OPTIONS
     The following options are supported:

     -1    Suppresses the output column of lines unique to file1.
     -2    Suppresses the output column of lines unique to file2.
     -3    Suppresses the output column of lines duplicated in file1 and file2.

OPERANDS
     The following operands are supported:

     file1    A path name of the first file to be compared. If file1 is -, the standard input is used.
     file2    A path name of the second file to be compared. If file2 is -, the standard input is used.

USAGE
     See largefile(5) for the description of the behavior of comm when encountering files greater than or equal to 2 Gbyte (2**31 bytes).

EXAMPLES
     Example 1: Printing a list of utilities specified by files

     If file1, file2, and file3 each contain a sorted list of utilities, the command

          example% comm -23 file1 file2 | comm -23 - file3

     prints a list of utilities in file1 not specified by either of the other files. The entry:

          example% comm -12 file1 file2 | comm -12 - file3

     prints a list of utilities specified by all three files. And the entry:

          example% comm -12 file2 file3 | comm -23 - file1

     prints a list of utilities specified by both file2 and file3, but not specified in file1.

ENVIRONMENT VARIABLES
     See environ(5) for descriptions of the following environment variables that affect the execution of comm: LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH.

EXIT STATUS
     The following exit values are returned:

     0     All input files were successfully output as specified.
     >0    An error occurred.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

     +-----------------------------+-----------------------------+
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     +-----------------------------+-----------------------------+
     |Availability                 |SUNWesu                      |
     +-----------------------------+-----------------------------+
     |CSI                          |enabled                      |
     +-----------------------------+-----------------------------+
     |Interface Stability          |Standard                     |
     +-----------------------------+-----------------------------+

SEE ALSO
     cmp(1), diff(1), sort(1), uniq(1), attributes(5), environ(5), largefile(5), standards(5)

SunOS 5.10                         3 Mar 2004                           comm(1)
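
Applied to the phone-number question above, comm could be used roughly as follows. This is only a sketch under stated assumptions: one phone number per line, a mktemp that supports -d, and a helper name (phones_only_in_first) invented for illustration. Both inputs are sorted in the same locale that comm then runs in (LC_ALL=C here), because comm requires its inputs to be ordered in the current collating sequence.

```shell
#!/bin/sh
# phones_only_in_first FILE1 FILE2: print lines found in FILE1 but not in FILE2.
# (hypothetical helper name; not part of comm or of the original post)
phones_only_in_first() {
    # Scratch directory for the sorted copies (assumes mktemp supports -d).
    tmpdir=$(mktemp -d) || return 1
    # Sort both inputs in the C locale, then compare them in the same locale.
    # -2 drops lines unique to FILE2, -3 drops lines common to both files,
    # so only the column of lines unique to FILE1 is written.
    LC_ALL=C sort "$1" > "$tmpdir/a.sorted" &&
    LC_ALL=C sort "$2" > "$tmpdir/b.sorted" &&
    LC_ALL=C comm -23 "$tmpdir/a.sorted" "$tmpdir/b.sorted"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

It could be called as: phones_only_in_first file1 file2 > out_file. Unlike a hash lookup, this keeps memory use low because sort can spill to disk, but the result comes out in sorted order rather than in the original order of file1.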