Compare files

# 1  
Old 09-11-2014
Compare files

Hello all,

I have two files, pdflist and xmllist, with contents like those below:

Content of pdflist:
Code:
a.pdf
b.pdf
d.pdf

Content of xmllist:
Code:
a.xml
b.xml
c.xml


In the above lists, d.pdf does not have its pair d.xml.
Similarly, c.xml does not have its pair c.pdf.
I need to separate the paired xml/pdf files from the rest.

i.e. my output should be two files, where one contains all the paired files and the other contains the rest of the files.

Contents of pair.txt:
Code:
a.pdf
a.xml
b.pdf
b.xml

Contents of other.txt:
Code:
d.pdf
c.xml

Is there a way to achieve this?

Last edited by rbatte1; 09-11-2014 at 07:34 AM.. Reason: Changed ICODE tags to just CODE tags
# 2  
Old 09-11-2014
Hi,

You can use "comm"; for details, see:
Code:
man comm

Regards

Dave
# 3  
Old 09-11-2014
You could do this in a few steps with sed and grep:-
Code:
#!/bin/ksh

printf '%s\n' 's/\.xml$/\\.pdf$/' 's/^/^/' > /tmp/xml.sed   # Set up editing commands (dots escaped so they match literally)
sed -f /tmp/xml.sed xmllist > /tmp/edited_xmllist           # Run the edit (-f reads the script file) to convert records
                                                            # to suffix .pdf and put beginning and end markers for grep to read.

grep -vf /tmp/edited_xmllist pdflist                        # Get records from pdflist that do not match the edited_xmllist file.

Similarly you can do it with the files the other way around and change the edit.

The two edits to the input file are:-
  1. Change the .xml to .pdf and add an end-of-line marker $ so that a.xml does not match a.pdfz
  2. Add a start-of-line marker ^ so that c.xml does not match not_c.pdf
Perhaps someone can smarten this up. I'm still at an early stage of learning sed.
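
To illustrate why the two markers matter, here is a minimal sketch. The scratch directory and the sample names in it are made up for this example and are not part of the original files.

```shell
#!/bin/sh
# Illustration only: the names written to $tmp/names are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' 'c.pdf' 'not_c.pdf' 'c.pdfz' > "$tmp/names"

# Unanchored pattern: the dot matches any character and the pattern may
# match anywhere in the line, so all three names are selected.
unanchored=$(grep -c 'c.pdf' "$tmp/names")

# Anchored pattern with an escaped dot: only the exact name is selected.
anchored=$(grep -c '^c\.pdf$' "$tmp/names")

echo "unanchored matches: $unanchored"
echo "anchored matches:   $anchored"
rm -rf "$tmp"
```

If the pattern file held plain names rather than regular expressions, grep -Fx (fixed strings, whole-line match) should give the same exact-match behaviour without any markers.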


I hope that this helps,
Robin
# 4  
Old 09-11-2014
Hi anijan,

Ignore what I said about "comm"; I looked at your output data instead of the input data.

Regards

Dave
# 5  
Old 09-11-2014
Hello again anijan,

I too have just looked again at the output you require and I've only considered half the job.

From what I have described above, you get the pdf files not listed in xmllist. You will need to capture this into the file /tmp/unsorted_other.txt and then:-
  • Run it without the -v flag on the grep capturing the output to /tmp/unsorted_pair.txt
  • Run it with the files and edit reversed to get the xml files not listed in pdflist and append this to /tmp/unsorted_other.txt
  • Run it a fourth time, reversed file names and without the -v flag to get the xml files that are matched by pdflist, appending the output to /tmp/unsorted_pair.txt
  • Run sort /tmp/unsorted_other.txt > other.txt
  • Run sort /tmp/unsorted_pair.txt > pair.txt
I think that should get you what you want. If it doesn't, post the output/errors in CODE tags for us to review.
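
The four runs listed above could be sketched roughly as follows. This is only one way to put the steps together: the scratch directory and the pattern file names (xml_as_pdf.pat, pdf_as_xml.pat) are assumptions made for the example, and the sample data is copied from the first post.

```shell
#!/bin/sh
# Sketch of the four grep runs; sample data copied from the first post.
# Working in a scratch directory is an assumption made for this example.
work=$(mktemp -d) && cd "$work" || exit 1
printf '%s\n' a.pdf b.pdf d.pdf > pdflist
printf '%s\n' a.xml b.xml c.xml > xmllist

# Turn each name into an anchored pattern for the opposite extension,
# e.g. a.xml becomes ^a\.pdf$
sed -e 's/\.xml$/\\.pdf$/' -e 's/^/^/' xmllist > xml_as_pdf.pat
sed -e 's/\.pdf$/\\.xml$/' -e 's/^/^/' pdflist > pdf_as_xml.pat

{
  grep -f xml_as_pdf.pat pdflist     # pdf files that have an xml partner
  grep -f pdf_as_xml.pat xmllist     # xml files that have a pdf partner
} | sort > pair.txt

{
  grep -vf xml_as_pdf.pat pdflist    # pdf files without a partner
  grep -vf pdf_as_xml.pat xmllist    # xml files without a partner
} | sort > other.txt
```

Because both output files are sorted, other.txt lists c.xml before d.pdf, which differs from the order shown in the first post.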



Robin

Last edited by rbatte1; 09-11-2014 at 08:21 AM.. Reason: Clarity
# 6  
Old 09-11-2014
Try
Code:
awk     'NR==FNR        {PDF[$1]++; next}
         $1 in PDF      {print $1".pdf" > "pair.txt"
                         print          > "pair.txt"
                         delete PDF[$1]
                         next}
                        {print $0 > "other.txt"}
         END            {for (i in PDF) print i".pdf" > "other.txt"}
        ' FS="." pdflist xmllist

Contents of pair.txt:
Code:
a.pdf
a.xml
b.pdf
b.xml

Contents of other.txt:
Code:
c.xml
d.pdf

# 7  
Old 09-12-2014
Thank you Dave, Robin and RudiC. All your inputs were very useful. I was able to achieve my output with comm and awk commands..!
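
The thread does not show the commands that were finally used. A comm-based approach might look something like the sketch below; the scratch directory and the intermediate file names (pdf.base, xml.base) are assumptions, and the sample data is copied from the first post.

```shell
#!/bin/sh
# Sketch of a comm-based approach; not necessarily what was actually run.
work=$(mktemp -d) && cd "$work" || exit 1
printf '%s\n' a.pdf b.pdf d.pdf > pdflist
printf '%s\n' a.xml b.xml c.xml > xmllist

# comm needs sorted input, so strip the extensions and sort the base names.
sed 's/\.pdf$//' pdflist | sort > pdf.base
sed 's/\.xml$//' xmllist | sort > xml.base

# -12 keeps only base names present in both lists; print both members.
comm -12 pdf.base xml.base | awk '{ print $0 ".pdf"; print $0 ".xml" }' > pair.txt

# -23 keeps names only in the pdf list, -13 names only in the xml list.
{
  comm -23 pdf.base xml.base | sed 's/$/.pdf/'
  comm -13 pdf.base xml.base | sed 's/$/.xml/'
} > other.txt
```

Comparing base names rather than full names is what makes comm usable here, since the two input lists never share a complete line.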
 