Full Discussion: Comparing 2 huge text files
Post 302523191 by ctsgnb on Wednesday 18th of May 2011 06:57:27 AM
Code:
awk 'NR==FNR{sub("@.*","");a[$1];next}/^uid:/&&!($2 in a)' k5login access.ldif

... oops, no: that one displays the entries from access.ldif that do not exist in k5login ...
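For anyone new to the idiom: NR==FNR is true only while awk is reading the first file named on the command line, so that block builds a lookup table before the second file is scanned. Here is the one-liner above expanded with comments (a sketch, assuming k5login holds one user@REALM principal per line and access.ldif holds "uid: name" lines):

Code:
awk '
  NR==FNR {                 # true only while reading the 1st file (k5login)
    sub("@.*","")           # strip the @REALM suffix from the record
    a[$1]                   # remember the bare user name as an array key
    next                    # skip the second rule for k5login lines
  }
  /^uid:/ && !($2 in a)     # in access.ldif: match uid: lines whose name
                            # was not seen in k5login (default action: print)
' k5login access.ldif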

... here you go, to get the entries from k5login that do not exist in access.ldif:

Code:
awk 'NR==FNR{if (/^uid:/) a[$2];next}{sub("@.*","");if(!($1 in a)) print $1}' access.ldif k5login
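A quick sanity check with made-up sample data (the file contents below are invented for illustration; real files will be much larger):

Code:
$ cat k5login
nisha@EXAMPLE.COM
rahul@EXAMPLE.COM
$ cat access.ldif
uid: nishap
uid: rahul
$ awk 'NR==FNR{if (/^uid:/) a[$2];next}{sub("@.*","");if(!($1 in a)) print $1}' access.ldif k5login
nisha

nisha is reported because access.ldif only contains nishap.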

If you need to display the full mail address:
Code:
nawk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login
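This variant keeps $0 intact and only strips the copy x, so the untrimmed entry is printed. With the same made-up sample data as above (on Linux, plain awk works the same way):

Code:
$ nawk 'NR==FNR{if (/^uid:/) a[$2];next}{x=$0;sub("@.*","",x);if(!(x in a)) print $1}' access.ldif k5login
nisha@EXAMPLE.COM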

If running on a SunOS / Solaris platform, use nawk or /usr/xpg4/bin/awk instead of awk.
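For example, the second one-liner on Solaris would be invoked as:

Code:
/usr/xpg4/bin/awk 'NR==FNR{if (/^uid:/) a[$2];next}{sub("@.*","");if(!($1 in a)) print $1}' access.ldif k5login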

... note that in your example nisha and nishap are different entries.

Last edited by ctsgnb; 05-18-2011 at 08:57 AM..
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

comparing text files

I am comparing text files containing rows of numbers, copied from a Windows to a Unix box. Is there any way of checking, let's say, 4 text documents and seeing only the differences (or the missing rows of numbers) with simple commands, let's say with a batch file FROM ABSOLUTE... (2 Replies)
Discussion started by: sjumma
2 Replies

2. Solaris

Huge (repeated Entry) text files

Somebody HELP! I have a huge log file (TEXT), 76298035 bytes. It's a logfile of IMEIs and IMSIs that I get from my EIR node. Here is what the contents of the file look like: 000000, 1 33016382000913 652020100423994 1 33016382002353 652020100430743 1 33017035101003 652020100441736... (4 Replies)
Discussion started by: axl
4 Replies

3. AIX

comparing within text files

Hi! Some looping problem here... I have a 2-column text file: 4835021 20060903FAL0132006 4835021 20060904FAL0132006 4835021 20060905FAL0132006 4835023 20060903FAL0132006 4835023 20061001HAL0132006 4835023 ... (3 Replies)
Discussion started by: d3ck_tm
3 Replies

4. UNIX for Dummies Questions & Answers

comparing Huge Files - Performance is very bad

Hi All, Can you please help me in resolving the following problem? My requirement is like this: 1) I have two files, YESTERDAY_FILE and TODAY_FILE. Each one has nearly two million records. 2) I need to check each record of TODAY_FILE against YESTERDAY_FILE. If it exists we can skip that by... (5 Replies)
Discussion started by: madhukalyan
5 Replies

5. Shell Programming and Scripting

Comparing two huge files

Hi, I have two files, file A and file B. File A is an error file and file B is a source file. In the error file, the first line is the actual error and the second line gives information about the record (client ID) that throws the error. I need to compare the first field (which doesn't start with '//') of... (11 Replies)
Discussion started by: kmkbuddy_1983
11 Replies

6. UNIX for Advanced & Expert Users

Best way to search for patterns in huge text files

I have the following situation: a text file with 50000 string patterns: abc2344536 gvk6575556 klo6575556 .... and 3 text files each with more than 1 million lines: ... 000000 abc2344536 46575 0000 000000 abc2344536 46575 4444 000000 abc2344555 46575 1234 ... I... (8 Replies)
Discussion started by: andy2000
8 Replies

7. Shell Programming and Scripting

comparing two text files

Hi All, I have two files of the following formats. File 1 (this is a big file): >AB_1 gi|229194403|ref|ZP_04321208.1| group II intron reverse transcriptase/maturase gdfjafhlkhlnlklaklskckcfhhahgfahajfkkallalfafafa >AB_2 gi|229194404|ref|ZP_04321209.1| gfksjgfkjsfjslfslfslhf >AB_3... (1 Reply)
Discussion started by: Lucky Ali
1 Replies

8. Shell Programming and Scripting

Comparing two huge files on field basis.

Hi all, I have two large files and I want a field-by-field comparison for each record in them. All fields are tab separated. file1: Email SELVAKUMAR RAMACHANDRAN Email SHILPA SAHU Web NIYATI SONI Web NIYATI SONI Email VIINII DOSHI Web RAJNISH KUMAR Web ... (4 Replies)
Discussion started by: Suman Singh
4 Replies

9. Shell Programming and Scripting

How to fix line breaks and format text for huge files?

Hi, I need to correct line breaks for huge files (more than 1MM records in a file) and then format them properly. Except for the header and trailer, each record starts with 'D'. Requirement: Scan the whole file except the header and trailer records and see if any of the records start with... (19 Replies)
Discussion started by: kikionline
19 Replies

10. Shell Programming and Scripting

Perl: Need help comparing huge files

What do I need to do to have the below Perl program load 205 million record files into the hash? It currently works on smaller files, but not on huge files. Any idea what I need to modify to make it work with huge files: #!/usr/bin/perl $ot1=$ARGV[0]; $ot2=$ARGV[1]; open(mfileot1,... (12 Replies)
Discussion started by: mrn6430
12 Replies