The UNIX and Linux Forums  

07-23-2007
rafisha
Registered User
Join Date: Jul 2007
Location: Monterrey, Mexico
Posts: 4
Problem comparing 2 files with a lot of data

Hello everyone, here's the scenario:

I have two files, each with around 1,300,000 lines, and each line holds a single column (a phone number). I have to get the phone numbers that are in file1 but not in file2. I could get them through Oracle, but my boss does not want that (he said the query would take hours to finish and would tie up server resources, or something like that), so he gave me the files with the phone numbers instead.

First I tried to solve the problem with some Perl scripting, but it took about 10 minutes just to read the files, and because of my poor programming skills I tried to do the search with a double foreach, which compares every line of file1 against every line of file2. Something like this:

# SOME1, SOME2 and OUT_FILE are filehandles assumed to be opened earlier
chomp(@file1 = <SOME1>);   # strip trailing newlines so lines compare cleanly
chomp(@file2 = <SOME2>);
$n = 0;
$flag = 1; # if $flag is 0 then the element is in file2

foreach $row1 (@file1)
{
    foreach $row2 (@file2)
    {
        if ($row1 eq $row2) # string comparison, so leading zeros are kept
        {
            $flag = 0;
        }
    }
    if ($flag)
    {
        $anArray[$n] = $row1;
        $n++;
    }
    $flag = 1;
}

if ($n > 0)
{
    foreach $row3 (@anArray)
    {
        print OUT_FILE "$row3\n";
    }
}



The data from the files is like this:


FILE1
----------------------------
1234567890
0987654321
2345678901
9012345678


FILE2
----------------------------
1234567890
0987654321
2345678901


OUT_FILE must be
----------------------------
9012345678



But this solution will take ages to finish, since it makes roughly 1,300,000 x 1,300,000 comparisons, so now I am thinking of using awk or another language. I really don't know which one is better for this problem or what algorithm I should use (besides, I have never used awk or shell scripting; I'm new to UNIX). I was thinking of sorting the files and then doing a binary search, but I have some doubts about it, so I feel really lost now.

Thanks for your help
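
One way to act on the sort-first idea, offered as a rough sketch under stated assumptions rather than as the accepted answer to this thread: once both files are sorted, the standard comm utility can walk them side by side in a single pass, so a merge-style comparison takes the place of the binary search. The function name only_in_first is hypothetical, and the sketch assumes one phone number per line and that mktemp is available on the system.

```shell
#!/bin/sh
# Print the lines that appear in the first file but not in the second.
# only_in_first is a hypothetical helper name, not something from the post.
only_in_first() {
    sorted1=$(mktemp) || return 1
    sorted2=$(mktemp) || { rm -f "$sorted1"; return 1; }

    # LC_ALL=C makes sort and comm agree on a plain byte ordering.
    # -u drops duplicate phone numbers within each file.
    LC_ALL=C sort -u "$1" > "$sorted1"
    LC_ALL=C sort -u "$2" > "$sorted2"

    # comm reads both sorted files once; -2 hides lines found only in the
    # second file and -3 hides lines common to both, leaving file1-only lines.
    LC_ALL=C comm -23 "$sorted1" "$sorted2"

    rm -f "$sorted1" "$sorted2"
}
```

Called as only_in_first file1 file2 > out_file, this prints 9012345678 for the sample data above. The numbers are treated as text, so leading zeros survive, but the output comes out in sorted order rather than in the original order of file1.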
 
