Comparing 2 huge text files


 
# 1  
Old 05-18-2011

I have these 2 files:

k5login

access.ldif

Code:
dn: uid=jjamnik,ou=People,dc=prod,dc=nyfix,dc=com
cn: Josh Jamnik
objectClass: account
objectClass: posixAccount
objectClass: top
userPassword:: e0tFUkJFUk9TfWpqYW1uaWtAU1lTVEVNUy5OWUZJWC5DT00=
loginShell: /bin/bash
gidNumber: 409
gecos: Josh Jamnik
structuralObjectClass: account
entryUUID: b8a40bfa-3056-102a-89a3-93bef80cd7c4
creatorsName: cn=Manager,dc=prod,dc=nyfix,dc=com
createTimestamp: 20060212210311Z
uid: jjamnik
uidNumber: 6503
homeDirectory: /home/prodbus/jjamnik
entryCSN: 20090819203245Z#00000e#00#000000
modifiersName: cn=Manager,dc=prod,dc=nyfix,dc=com
modifyTimestamp: 20090819203245Z

dn: uid=nishap,ou=People,dc=prod,dc=nyfix,dc=com
cn: Nisha Patel
objectClass: account
objectClass: posixAccount
objectClass: top
userPassword:: e0tFUkJFUk9TfW5pc2hhcEBTWVNURU1TLk5ZRklYLkNPTQ==
loginShell: /bin/bash
gidNumber: 409
gecos: Nisha Patel
structuralObjectClass: account
entryUUID: 874cc37a-3057-102a-89a9-93bef80cd7c4
creatorsName: cn=Manager,dc=prod,dc=nyfix,dc=com
createTimestamp: 20060212210858Z
uid: nishap
uidNumber: 6506
homeDirectory: /home/prodeng/nishap

dn: uid=sanwar,ou=People,dc=prod,dc=nyfix,dc=com
cn: Sohel Anwar
objectClass: account
objectClass: posixAccount
objectClass: top
userPassword:: e0tFUkJFUk9TfXNhbndhckBTWVNURU1TLk5ZRklYLkNPTQ==
loginShell: /bin/bash
uidNumber: 6514
gecos: Sohel Anwar
structuralObjectClass: account
entryUUID: 1078797a-305b-102a-89bb-93bef80cd7c4
creatorsName: cn=Manager,dc=prod,dc=nyfix,dc=com
createTimestamp: 20060212213417Z
uid: sanwar
gidNumber: 410
homeDirectory: /home/network/sanwar
entryCSN: 20090610030006Z#000000#00#000000
modifiersName: cn=Manager,dc=prod,dc=nyfix,dc=com
modifyTimestamp: 20090610030006Z

I need to compare k5login against access.ldif. The output should list the uids from k5login that do not exist in access.ldif, like this:

I hope someone can help with this.

Last edited by zaxxon; 05-18-2011 at 08:25 AM.. Reason: code tags
# 2  
Old 05-18-2011
Code:
awk 'NR==FNR{sub("@.*","");a[$1];next}/^uid:/&&!($2 in a)' k5login access.ldif

... oops, no: that one displays the entries from access.ldif that do not exist in k5login ...

... here is the command to get the entries from k5login that do not exist in access.ldif:

Code:
awk 'NR==FNR{if (/^uid:/) a[$2];next}{sub("@.*","");if(!($1 in a)) print $1}' access.ldif k5login

If you need to display the full address (including the @ part):
Code:
nawk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login

If you are running on a SunOS / Solaris platform, use nawk or /usr/xpg4/bin/awk instead of awk.

... note that your example contains two different names: nisha and nishap.
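
For readers unfamiliar with the two-file awk idiom, the one-liner above can be spelled out as a commented script. This is only a sketch: the two sample files in a scratch directory are made up for illustration (the uids come from the listing in post #1), and it assumes k5login holds one entry per line in the form user@REALM.

```shell
# Hypothetical sample data in a scratch directory (not the real files).
tmp=$(mktemp -d)
printf 'uid: jjamnik\nuid: nishap\nuid: sanwar\n' > "$tmp/access.ldif"
printf 'jjamnik@SYSTEMS.NYFIX.COM\nnisha@SYSTEMS.NYFIX.COM\n' > "$tmp/k5login"

# Same logic as the one-liner, one step per line.
missing=$(awk '
    # First file (access.ldif): NR == FNR holds only while reading it.
    NR == FNR {
        if (/^uid:/) seen[$2]   # referencing seen[$2] creates the key
        next
    }
    # Second file (k5login): strip the realm, then test membership.
    {
        user = $0
        sub("@.*", "", user)
        if (!(user in seen)) print $1
    }
' "$tmp/access.ldif" "$tmp/k5login")

echo "$missing"
rm -rf "$tmp"
```

Because only the uid values are held in memory, the approach should scale to large files as long as the uid list itself fits in memory.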

Last edited by ctsgnb; 05-18-2011 at 08:57 AM..
# 3  
Old 05-18-2011
Thanks for your reply, but I'm getting only one result. Can we use awk?
Code:
(csi15,root)# nawk 'NR==FNR{sub("@.*","");a[$1];next}/^uid:/&&!($2 in a)' k5login access.ldif
uid: nishap

---------- Post updated at 07:32 PM ---------- Previous update was at 07:29 PM ----------

I'm getting this syntax error:
Code:
(csi15,root)# pwd
/usr/xpg4/bin
(csi15,root)# awk 'NR==FNR{sub("@.*","");a[$1];next}/^uid:/&&!($2 in a)' k5login access.ldif
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: bailing out near line 1

Moderator's Comments:
Please use [code] and [/code] tags when posting code, data, or logs to preserve formatting and enhance readability, thanks.

Last edited by zaxxon; 05-18-2011 at 08:39 AM.. Reason: code tags
# 4  
Old 05-18-2011
When you change into that directory, you have to put ./ in front of the command to execute the file from the directory you are in. Otherwise the shell will run whichever command it finds first via your PATH variable. Also, ctsgnb said to use nawk; I am not sure whether there is an awk link or binary in that directory.
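
The PATH behaviour described above can be checked with a small experiment. This is a sketch only: `mytool` is a made-up script name standing in for /usr/xpg4/bin/awk, and it assumes PATH does not contain "." or an empty entry.

```shell
# Hypothetical demo: a scratch directory containing a script named "mytool".
tmp=$(mktemp -d)
printf '#!/bin/sh\necho "local copy"\n' > "$tmp/mytool"
chmod +x "$tmp/mytool"

cd "$tmp"

# A bare name is looked up in PATH only; the current directory is not searched.
if command -v mytool >/dev/null 2>&1; then
    bare="found via PATH"
else
    bare="not found via PATH"
fi

# An explicit ./ prefix (or a full path) bypasses the PATH search.
explicit=$(./mytool)

echo "mytool   -> $bare"
echo "./mytool -> $explicit"

cd - >/dev/null
rm -rf "$tmp"
```

In the same way, `command -v awk` shows which awk a bare `awk` will actually run, regardless of the current directory.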
# 5  
Old 05-18-2011
I updated my previous post,

please try

Code:
/usr/xpg4/bin/awk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login

or
Code:
nawk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login

or
Code:
awk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login

Make sure you are in the directory where your access.ldif and k5login files are located.

Use nawk or /usr/xpg4/bin/awk if you are on a SunOS / Solaris platform.

Last edited by ctsgnb; 05-18-2011 at 09:04 AM..
# 6  
Old 05-18-2011
@ctsgnb ... awesome! You make my life easier... life is great!

I will review my awk, since I have already forgotten it.

thanks!
# 7  
Old 05-18-2011
A simpler solution:
Code:
# egrep -v "^($(sed -n 's/^uid: \(.*\)/\1/p' access.ldif | sed ':a;N;s/\n/|/;ta'))@" k5login
nisha@SYSTEMS.NYFIX.COM
rdpena@SYSTEMS.NYFIX.COM
service/backups-ora@SYSTEMS.NYFIX.COM
ivanr@SYSTEMS.NYFIX.COM
nasapova@SYSTEMS.NYFIX.COM
tpulay@SYSTEMS.NYFIX.COM
rsueno@SYSTEMS.NYFIX.COM
peterd@SYSTEMS.NYFIX.COM
casehan@SYSTEMS.NYFIX.COM
akrapivi@SYSTEMS.NYFIX.COM
....
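
The command above works by joining all uid values into one alternation pattern (uid1|uid2|...) and passing it to egrep -v. The sketch below walks through those two steps on made-up sample data. As assumptions for illustration, it uses `paste -s` in place of the sed loop (the label syntax in the post looks GNU-specific) and `grep -E` in place of egrep.

```shell
# Hypothetical sample data (uids taken from the listing in post #1).
tmp=$(mktemp -d)
printf 'uid: jjamnik\nuid: nishap\nuid: sanwar\n' > "$tmp/access.ldif"
printf 'jjamnik@SYSTEMS.NYFIX.COM\nnisha@SYSTEMS.NYFIX.COM\n' > "$tmp/k5login"

# Step 1: extract the uid values, one per line, then join them with "|".
pattern=$(sed -n 's/^uid: \(.*\)/\1/p' "$tmp/access.ldif" | paste -s -d '|' -)
echo "$pattern"

# Step 2: print the k5login lines whose user part matches none of the uids.
# Anchoring with ^( ... )@ keeps a uid from matching inside a longer name.
missing=$(grep -E -v "^($pattern)@" "$tmp/k5login")
echo "$missing"

rm -rf "$tmp"
```

With a very large access.ldif the joined pattern may exceed the command-line length limit of the system; the awk approach earlier in the thread does not build such a pattern and so avoids that limit.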
