Compare two files

# 1  
12-02-2013
Compare two files

Hi all,

I need help on the following.
I have two files:
File1.txt
Code:
< 2233122266196246529, NOT_USED, NOT_USED, NOT_USED, 2, NOT_USED, Y, N, 0, (VPN) 284016910526692, -1, 0, -1, NOT_USED, 2013-04-12T01:48:43.645+02:00 / KCC script, C1C, (VPNPRO) mtel-MVPN_(VPN
ACC) 193708_A359887654295, NOT_USED, -1, N, 959992017453809664, 0, 933861691161804800, <NULL>, <NULL>, <NULL>, 2233122266196246528, <NULL>, <NULL>, <NULL> >

File2.txt
Code:
< (VPN) 284016911195423, (VPNPRO) mtel-MVPN_(VPNACC) 193708_666 >

I need to compare all rows from file1.txt with file2.txt and extract every row of file2.txt where the bold parts (here 193708) are equal and the number after the underscore (_666) has 3 or 4 digits.

Thank you in advance.
# 2  
12-02-2013
Giving us two lines from file1.txt (sometimes called File1.txt) and one line from file2.txt (sometimes called File2.txt) makes us guess about a lot.

Are the Bold tags actually in the input files?

Is the value to be compared always at the start of the 2nd field on even lines in File1.txt and at the start of the 6th field in every line in File2.txt?

Or, is there some other way we are supposed to identify the fields to be compared?
# 3  
12-02-2013
Hi Don,

Thank you for your reply.
First, ignore the capital letters; the two files are file1.txt and file2.txt.
Both files have more than 1 million rows, and the bold tags are in both of them. The rows in the files start with < and end with >.
The examples in my previous post are 1 row from file1.txt and 1 row from file2.txt.
Below is a larger sample of both files.
file1.txt
Code:
< 2233122266196246529, NOT_USED, NOT_USED, NOT_USED, 2, NOT_USED, Y, N, 0, (VPN) 284016910526692, -1, 0, -1, NOT_USED, 2013-04-12T01:48:43.645+02:00 / KCC script, C1C, (VPNPRO) mtel-MVPN_(VPN ACC) 193708_A359887654295, NOT_USED, -1, N, 959992017453809664, 0, 933861691161804800, <NULL>, <NULL>, <NULL>, 2233122266196246528, <NULL>, <NULL>, <NULL> >
< 1119872920737644544, NOT_USED, NOT_USED, NOT_USED, 2, NOT_USED, Y, N, 0, (VPN) 284019910340503, -1, 0, -1, NOT_USED, 2010-04-08T16:52:44.104+03:00 / ID = 53269236, C1C, (VPNPRO) mtel-MVPN_(VPNACC) 15819_A359884985869, <NULL>, -1, N, 959992017453809664, 0, 933861691161804800, <NULL>, <NULL>, <NULL>, 1119872919840063488, <NULL>, <NULL>, <NULL> >
< 1119872930367766528, NOT_USED, NOT_USED, NOT_USED, 2, NOT_USED, Y, N, 0, (VPN) 284019910340498, -1, 0, -1, NOT_USED, 2010-04-08T16:52:53.609+03:00 / ID = 53269240, C1C, (VPNPRO) mtel-MVPN_(VPNACC) 15820_A359884987574, <NULL>, -1, N, 959992017453809664, 0, 933861691161804800, <NULL>, <NULL>, <NULL>, 1119872929495351296, <NULL>, <NULL>, <NULL> >

file2.txt
Code:
< (VPN) 284015910031285, (VPNPRO) mtel-MVPN_(VPNACC) 15819_A359889099175 >
< (VPN) 284010910530070, (VPNPRO) mtel-MVPN_(VPNACC) 15820_A359889526870 >
< (VPN) 999993080285668, (VPNPRO) mtel-MVPN_(VPNACC) 15819_200 >
< (VPN) 999993080285669, (VPNPRO) mtel-MVPN_(VPNACC) 15820_201 >

I need to search file2.txt for every row of file1.txt and extract the rows of file2.txt where the bold parts are equal and the number after the bold part in file2.txt has 3 or 4 digits.
In other words, I need rows 3 and 4 of file2.txt in the example above.
# 4  
12-02-2013
Code:
perl -nle '$hash{$1}++ if /\(VPN\s*ACC\)\s+(\d+)_/;
END { open(my $fh, "<", "file2.txt") or die "file2.txt: $!";
      while (<$fh>) { chomp; print if /\s(\d+)_\d{3,4}\s+>/ && $hash{$1} } }' file1.txt

# 5  
12-02-2013
Based on some assumptions/guesses (as Don Cragun alluded to), you could also try:
Code:
awk     'NR==FNR        {sub (/_.*$/, "", $25);T[$25]; next}
                        {for (i in T) if (match ($6, i"_[0-9][0-9][0-9]$")) print}
        ' file1 file2
< (VPN) 999993080285668, (VPNPRO) mtel-MVPN_(VPNACC) 15819_200 >
< (VPN) 999993080285669, (VPNPRO) mtel-MVPN_(VPNACC) 15820_201 >

# 6  
12-02-2013
Since the OP says that file1.txt and file2.txt actually do contain BOLD tags (i.e., [B] and [/B]) marking the digit strings to be matched, and awk's match function treats the text in the variable i as an extended regular expression rather than as literal text, RudiC's code doesn't seem to work. I didn't get any output from pravin27's perl script either (I assume for the same reason, but I haven't dug into it). And given that we have been shown some lines with "VPN ACC" and some with "VPNACC", some with "KCC script," and some with "ID = <digit_string>,", I'm not confident that the bolded string will always be in field 25 of file1.txt. Note also that the OP said he was looking for an underscore followed by 3 or 4 digits (not just 3) in file2.txt after the matching bolded digit string (although none of the sample input had 4 digits).
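
To see the match() problem described above in isolation, here is a small sketch (the function name regex_vs_literal is just for illustration) that tests the same key with match(), which compiles its second argument as an extended regular expression, and with index(), which searches for a literal substring. The key and text are passed through ENVIRON rather than -v so that awk does not process backslash escapes in them.

```shell
# Hypothetical helper: report whether KEY is found in TEXT by match()
# (KEY treated as an ERE) and by index() (KEY treated as literal text).
# Usage: regex_vs_literal KEY TEXT
regex_vs_literal() {
    KEY=$1 TEXT=$2 awk 'BEGIN {
        key = ENVIRON["KEY"]; text = ENVIRON["TEXT"]
        # In an ERE, "[B]" is a bracket expression matching the single
        # character B, and "[/B]" matches either "/" or "B".
        print "match:", (match(text, key) ? "yes" : "no")
        # index() returns the position of key as a plain substring, or 0.
        print "index:", (index(text, key) ? "yes" : "no")
    }'
}
```

For example, regex_vs_literal '[B]15819[/B]' '(VPNACC) [B]15819[/B]_200' prints "match: no" and then "index: yes". When the key is used literally, index() or an exact array lookup such as (key in T) avoids the problem.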

The following seems to work in my tests, but we could probably do something more efficient if we had a better definition of the input file record formats:
Code:
awk '
BEGIN { bds = "[[]B[]]([0-9])+[[]/B[]]" # ERE to match a bolded digit string.
}
# If a line in the 1st file ...
FNR == NR {
        # contains a bolded string of digits, save that bolded string.
        if(match($0, bds))
                bs[substr($0, RSTART, RLENGTH)]
printf("NR=%d, bs[%s] created\n", NR, substr($0, RSTART, RLENGTH))
        next
}
# If a line in the 2nd input file contains a bolded string followed by an
# underscore, 3 or 4 digits, and a space...
/[[]\/B[]]_([0-9]){3,4} / {
        # print the line if the bolded digit string in this line appeared in
        # the first file.
        if(match($0, bds) && substr($0, RSTART, RLENGTH) in bs) print
}' file1.txt file2.txt

With the sample data given in message #3 in this thread, it produces the output:
Code:
NR=1, bs[193708] created
NR=2, bs[15819] created
NR=3, bs[15820] created
< (VPN) 999993080285668, (VPNPRO) mtel-MVPN_(VPNACC) 15819_200 >
< (VPN) 999993080285669, (VPNPRO) mtel-MVPN_(VPNACC) 15820_201 >

Note that any bold text in the output above comes from the BOLD tags in file1.txt and file2.txt; I did not massage the output HTML to highlight the matched fields as they are displayed in this forum. The unindented printf statement in the script (printf("NR=%d, bs[%s] created\n", ...)) is a debugging aid that produces the NR=... lines in the output. Remove that printf to get just the output requested.

As always, if you want to run this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

I am still not confident that this will work. If relatively long lines are split (as they were in the 1st message in this thread, but not in the 3rd), there is nothing to prevent the bolded string from being split across two lines in file1.txt. I could modify the script to work even if this happens, but given the poor specification of the input file formats, I didn't take the extra time to do that.
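
If records do turn out to be wrapped, one option is to rejoin them before doing any comparison. A minimal sketch, assuming every record ends with a blank followed by ">" (as all the sample rows do) and that each line break replaced a single space (as with "(VPN" / "ACC)" in the 1st message); the function name join_records is hypothetical:

```shell
# Hypothetical helper: rejoin records that were wrapped across several
# physical lines. Assumes every record ends with a blank followed by ">"
# and that each line break replaced a single space.
join_records() {
    awk '
    {
        # Append this physical line to the record being built.
        rec = (rec == "") ? $0 : rec " " $0
        # A record is complete once it ends with a blank and ">"; a field
        # such as <NULL> has no blank before its ">", so it does not count.
        if (rec ~ /[ \t]>[ \t]*$/) {
            print rec
            rec = ""
        }
    }
    # Flush a final record that never got its closing ">".
    END { if (rec != "") print rec }
    ' "$@"
}
```

Usage: join_records file1.txt > file1.joined.txt (the output name is only an example), then run the comparison on the joined file. Keying on the trailing " >" rather than a leading "<" is deliberate, because a continuation line could itself start with a field like <NULL>.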
# 7  
12-03-2013
Thank you all for your replies.

@RudiC and pravin27: Your suggestions don't work for me.
@Don: Sorry, Don, I didn't understand your question in post #2 about the bold tags.
I only made those parts bold in my post for your convenience; in the files they are plain text. Sorry for my misunderstanding.
The two files are output from SQL commands.
Each line of file1.txt consists of 30 comma-separated fields, and each line of file2.txt has two comma-separated fields. The lines aren't split; they look like the ones in the 3rd post in this thread.
With a comma as the separator, the strings to compare are in the 17th field of file1.txt and the 2nd field of file2.txt.
Code:
awk -F"," '{print $17}' file1.txt
 (VPNPRO) mtel-MVPN_(VPN ACC) 193708_A359887654295
 (VPNPRO) mtel-MVPN_(VPNACC) 15819_A359884985869
 (VPNPRO) mtel-MVPN_(VPNACC) 15820_A359884987574

Code:
awk -F"," '{print $2}' file2.txt
 (VPNPRO) mtel-MVPN_(VPNACC) 15819_A359889099175 >
 (VPNPRO) mtel-MVPN_(VPNACC) 15820_A359889526870 >
 (VPNPRO) mtel-MVPN_(VPNACC) 15819_200 >
 (VPNPRO) mtel-MVPN_(VPNACC) 15820_201 >

Thank you for your time, Don, and sorry again for my misunderstanding.
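
With the layout described in this post (one record per line, the compared string in comma-field 17 of file1.txt and comma-field 2 of file2.txt), a minimal awk sketch might look like the one below. It assumes the key is the digit string before the "_" in the last word of that field (193708, 15819, 15820 in the samples), and the name extract_matches is just for illustration. It stores the keys in an array and looks each one up directly instead of looping over every saved key for each line of file2.txt, which matters with more than a million rows per file. The 3-or-4-digit test is spelled [0-9][0-9][0-9][0-9]? rather than {3,4} because some older awk implementations do not support interval expressions.

```shell
# Hypothetical helper: print the rows of the 2nd file whose key also occurs
# in the 1st file and whose suffix after "_" is exactly 3 or 4 digits.
# Assumed layout (from post #7): one record per line, the compared string is
# comma-field 17 in the 1st file and comma-field 2 in the 2nd file.
extract_matches() {
    awk -F',' '
    # Return the last blank-separated word of s, after dropping a trailing
    # " >" record terminator (present in comma-field 2 of file2.txt).
    function lastword(s,    n, w) {
        sub(/[ \t]*>[ \t]*$/, "", s)
        n = split(s, w, " ")
        return (n > 0) ? w[n] : ""
    }
    # 1st file: remember each key, e.g. "15819" from "15819_A359884985869".
    NR == FNR {
        if (split(lastword($17), part, "_") >= 2) keys[part[1]]
        next
    }
    # 2nd file: the word must be KEY_SUFFIX with a 3- or 4-digit SUFFIX
    # and a KEY seen in the 1st file.
    {
        if (split(lastword($2), part, "_") == 2 &&
            part[2] ~ /^[0-9][0-9][0-9][0-9]?$/ && (part[1] in keys))
            print
    }
    ' "$1" "$2"
}
```

Usage: extract_matches file1.txt file2.txt > result.txt (the output name is only an example). On Solaris, replace awk with nawk or /usr/xpg4/bin/awk as noted earlier in the thread. If some field can itself contain a comma, the field numbers shift and this sketch will silently miss rows, so it is worth confirming that every line really has 30 and 2 fields first.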
