Comparing 2 huge text files


 
# 8  
Old 05-18-2011
@ygemici

I got the following error:
Code:
# nawk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' f2 f1
nisha@SYSTEMS.NYFIX.COM
rdpena@SYSTEMS.NYFIX.COM
service/backups-ora@SYSTEMS.NYFIX.COM
ivanr@SYSTEMS.NYFIX.COM
nasapova@SYSTEMS.NYFIX.COM
tpulay@SYSTEMS.NYFIX.COM
rsueno@SYSTEMS.NYFIX.COM
peterd@SYSTEMS.NYFIX.COM
casehan@SYSTEMS.NYFIX.COM
akrapivi@SYSTEMS.NYFIX.COM
# egrep -v $(sed -n 's/uid: \(.*\)/\1/p' f2 | sed ':a N;s/\n/|/;ta') f1
Label too long: :a N;s/\n/|/;ta
# uname -a
SunOS <anonymized> 5.10 Generic_141414-01 sun4u sparc SUNW,Sun-Fire-V490
#

I got the same error with a semicolon before the N:
Code:
# egrep -v $(sed -n 's/uid: \(.*\)/\1/p' f2  |sed ':a;N;s/\n/|/;ta') f1
Label too long: :a;N;s/\n/|/;ta

# 9  
Old 05-18-2011
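Hmm, yes, that is Solaris. Its sed seems to take everything after ':' up to the end of the line as the label name, so I split the script into separate -e parts: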
Code:
# egrep -v $(sed -n 's/uid: \(.*\)/\1/p' access.ldif |sed -e ':a' -e '$!N;s/\n/|/' -e 'ta') k5login

# 10  
Old 05-18-2011
@ygemici :

Yup, those '-e' options make it work just fine.

But:

1) I wonder whether it would still work if the files are huge, so that the uid1|uid2|... string becomes very long.

2) By the way, if one uid is a prefix of another, such as nisha and nishap, then grep -v nisha also filters out nishap, which is not the intended behaviour.
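
On both points, here is a minimal sketch, assuming a grep that accepts -f (not checked on that Solaris box). Writing one anchored pattern per uid into a file keeps the pattern list off the command line, where argument-length limits could bite, and the ^uid@ anchors stop nisha from matching nishap. The sample files and the patterns file name are made up for illustration, and uids containing regex metacharacters would need escaping.

```shell
# Hypothetical sample data standing in for access.ldif and k5login.
tmp=$(mktemp -d)
cat > "$tmp/access.ldif" <<'EOF'
dn: uid=nisha,ou=people
uid: nisha
dn: uid=ivanr,ou=people
uid: ivanr
EOF
cat > "$tmp/k5login" <<'EOF'
nisha@SYSTEMS.NYFIX.COM
nishap@SYSTEMS.NYFIX.COM
ivanr@SYSTEMS.NYFIX.COM
EOF

# Unanchored alternation (nisha|ivanr): "nisha" also matches "nishap",
# so nishap is wrongly filtered out. "|| true" because grep exits 1
# when it selects no lines.
pattern=$(sed -n 's/^uid: \(.*\)/\1/p' "$tmp/access.ldif" | paste -s -d '|' -)
unanchored=$(grep -E -v "$pattern" "$tmp/k5login" || true)

# Anchored patterns (^uid@), one per line, read from a file with -f.
sed -n 's/^uid: \(.*\)/^\1@/p' "$tmp/access.ldif" > "$tmp/patterns"
anchored=$(grep -v -f "$tmp/patterns" "$tmp/k5login" || true)

printf 'unanchored: [%s]\n' "$unanchored"
printf 'anchored:   [%s]\n' "$anchored"
```

The nawk version earlier in the thread should not have either problem, since "x in a" is an exact key lookup rather than a pattern match.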

Last edited by ctsgnb; 05-18-2011 at 09:33 AM..
# 11  
Old 05-20-2011
Hi ctsgnb,

I have an additional requirement:

Code:
awk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login

With the script above, I need to save the output to a temporary file, say k5login-temp, and then compare that file against another LDIF file, say bny.ldif. I'm having a hard time modifying the script.

---------- Post updated at 03:51 PM ---------- Previous update was at 03:00 PM ----------

Ah, I think I got it. Testing now...
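
For reference, one way the two-step comparison could look. This is only a sketch under assumptions: the sample file contents are invented, and k5login-final is a hypothetical name for the second result. The awk program is the same one from the thread, run twice.

```shell
# Hypothetical sample data; the real files would be used in place.
tmp=$(mktemp -d)
printf 'uid: nisha\n' > "$tmp/access.ldif"
printf 'uid: ivanr\n' > "$tmp/bny.ldif"
cat > "$tmp/k5login" <<'EOF'
nisha@SYSTEMS.NYFIX.COM
ivanr@SYSTEMS.NYFIX.COM
rdpena@SYSTEMS.NYFIX.COM
EOF

# Same awk program as in the thread: print entries whose uid part
# (text before "@") is not a uid in the LDIF file given first.
prog='NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}'

# Step 1: entries of k5login that are missing from access.ldif.
awk "$prog" "$tmp/access.ldif" "$tmp/k5login" > "$tmp/k5login-temp"

# Step 2: of those, the entries that are also missing from bny.ldif.
awk "$prog" "$tmp/bny.ldif" "$tmp/k5login-temp" > "$tmp/k5login-final"

cat "$tmp/k5login-final"
```

Note that the NR==FNR idiom assumes the first file is not empty. With an empty LDIF file, the second file would be treated as the first one and nothing would be printed.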
# 12  
Old 05-23-2011
Hi Awk Masters,

Code:
awk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login

With the script above, instead of printing the output, I would like to automatically delete from the k5login file the entries that do not exist in the LDIF file.

Can anyone revise the script above?

Thanks in advance.
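
A possible approach, as a sketch only: awk cannot edit a file in place, so the test is inverted to keep the entries whose uid is in the LDIF file, the result goes to a new file, and that file then replaces k5login. The sample data and the k5login.bak and k5login.new names are assumptions for illustration.

```shell
# Hypothetical sample data; the real files would be used in place.
tmp=$(mktemp -d)
printf 'uid: nisha\nuid: ivanr\n' > "$tmp/access.ldif"
cat > "$tmp/k5login" <<'EOF'
nisha@SYSTEMS.NYFIX.COM
rdpena@SYSTEMS.NYFIX.COM
ivanr@SYSTEMS.NYFIX.COM
EOF

# Keep a backup before rewriting the file.
cp "$tmp/k5login" "$tmp/k5login.bak"

# Inverted test: print only lines whose uid part IS in access.ldif,
# then replace k5login with the filtered copy if awk succeeded.
awk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(x in a) print}' \
    "$tmp/access.ldif" "$tmp/k5login" > "$tmp/k5login.new" &&
    mv "$tmp/k5login.new" "$tmp/k5login"

cat "$tmp/k5login"
```

Unlike the original, this prints the whole line rather than $1, so nothing is lost if a line contains spaces.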

Last edited by linuxgeek; 05-24-2011 at 02:35 AM.. Reason: re-phrase