Find diff between two patterns in two files and append


 
# 1  
Old 11-01-2012

Hi,

I'm a newbie at programming in Unix, and I seem to have a task that is more than I can handle. I'm trying to learn awk, by the way (but in the end, I just need something that works). My goal is to compare two files and output the differences between them. From what I've read, I could do a line-by-line comparison, but I don't understand how that would work when the lines are not aligned. I also learned how to do a pattern search, but I don't know how to compare patterns. The diff command gives me the differences, which is cool. However, what I want is to compare the text between two patterns side by side and either output the differences or append what is extra in file 2 to file 1 within the matching sections. Let me provide an example to be clearer...

Edit: I thought of an alternative. Search for what is the same in file 1 and file 2, and delete it from file 2. Seems much easier.

2 input files
Code:
input1.txt

# P: 90-175
# Q: ask111
# R:
# P: 510-579
# Q: ask111
# R:
yu    x1520    100.00    14
# P: 580-659
# Q: ask111
# R:
# P: 660-724
# Q: ask111
# R:
# P: 700-765
# Q: ask111
# R:
yu    a340    100.00    15
yu    b340    100.00    15
yu    c340    100.00    15
# P: 740-803
# Q: ask111
# R:
yu    z1838    100.00    15

input2.txt

# P: 90-175
# Q: ask244
# R:
# P: 510-579
# Q: ask244
# R:
yu    x1520    100.00    14
yu    v2838    100.00    15
yu    w999    100.00    15
# P: 580-659
# Q: ask244
# R:
# P: 660-724
# Q: ask244
# R:
# P: 700-765
# Q: ask244
# R:
yu    a340    100.00    15
yu    c340    100.00    15
yu    z1838    100.00    15
# P: 740-803
# Q: ask244
# R:
yu    z1838    100.00    15

Output should be one of two things.
Ideally, the extra lines are appended to file 1 in the right section (this makes the job so easy, but I can imagine it being hard to do with awk, for example):
Code:
# P: 90-175
# Q: ask
# R:
# P: 510-579
# Q: ask
# R:
yu    x1520    100.00    14
yu    v2838    100.00    15
yu    w999    100.00    15
# P: 580-659
# Q: ask
# R:
# P: 660-724
# Q: ask
# R:
# P: 700-765
# Q: ask
# R:
yu    a340    100.00    15
yu    b340    100.00    15
yu    c340    100.00    15
yu    z1838    100.00    15
# P: 740-803
# Q: ask
# R:
yu    z1838    100.00    15

Or else the extra lines are output with proper headings so I can manually append them to the right sections.
For example:
Code:
# P: 510-579
# Q: ask
# R:
yu    v2838    100.00    15
yu    w999    100.00    15
# P: 700-765
# Q: ask
# R:
yu    z1838    100.00    15

So confused...

Last edited by legato22; 11-01-2012 at 05:42 PM..
# 2  
Old 11-01-2012
With comm (and sort) you can select the lines that are common to both files, only in a, or only in b, in any combination, with tab indentation marking which is which (see the comm man page). Think of the files as sets of lines: for files that are sorted and unique, comm can tell you what is in a but not b, in both a and b, or in b but not a.
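A minimal sketch of that set view, assuming plain text files and a POSIX shell with sort, comm and mktemp available. The function name only_in_second is made up for illustration.

```shell
# only_in_second A B: print the lines that occur in file B but not in file A.
only_in_second() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # comm needs sorted input; -u also removes duplicate lines.
    LC_ALL=C sort -u "$1" > "$a_sorted"
    LC_ALL=C sort -u "$2" > "$b_sorted"
    # -1 suppresses lines only in A, -3 suppresses lines common to both.
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```

Called as only_in_second input1.txt input2.txt on the sample files from post #1, this should report the v2838 and w999 lines plus the changed "# Q:" line, but not the extra z1838 line under 700-765, because an identical line already exists in another section of input1.txt. That is the multilevel problem.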

Now, if there is a multilevel structure within the file, you need to preprocess it so that the detail lines (the ones under "# R:" above) carry their category as a prefix. If the order of the lines after "# R:" matters, you also need to number the lines within each group, but that accentuates minor differences: one inserted line renumbers every line after it, so they all show up as changed.

Diff can process two files for differences and offers several output formats. There is even a diff3: if you have a common starting point and files that two developers made from it, you can merge their changes by feeding its -e output into ex, to make the fourth corner. Of course, if they changed the same lines, it has more trouble. Time to have a local coding standard or a diff-friendly code beautifier.
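The preprocessing step could look roughly like this in awk. It is only a sketch: it assumes every section starts with a "# P: range" line as in the sample files, and both the "|" separator and the function name prefix_sections are arbitrary choices made for illustration.

```shell
# prefix_sections FILE: print each data line tagged with its "# P:" range.
prefix_sections() {
    awk '
        /^# P:/ { key = $3; next }      # remember the current section, e.g. 510-579
        /^#/    { next }                # skip the "# Q:" and "# R:" header lines
        NF      { print key "|" $0 }    # tag each non-empty data line with its section
    ' "$1"
}
```

Once every data line carries its section, two identical lines in different sections no longer compare as equal, so the tagged output of both files can be sorted and passed to comm.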

Last edited by DGPickett; 11-01-2012 at 06:16 PM..
# 3  
Old 11-01-2012
Thanks, good idea... preprocessing the data. Let me think about how to do it. I tried the comm command. It would be fine, but the main issue is the multilevel structure. For now, I'm using an awk script to find what is present in file 2 but not in file 1, then manually going back to file 1 and pasting it under the appropriate sections.
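The script itself was not posted. A common awk idiom for "lines in file 2 that are not in file 1" looks like the sketch below, which may or may not match what was actually used. The function name new_lines is made up for illustration.

```shell
# new_lines FILE1 FILE2: print the lines of FILE2 that do not occur in FILE1.
new_lines() {
    awk '
        NR == FNR { seen[$0] = 1; next }   # first file: remember every line
        !($0 in seen)                      # second file: print lines not remembered
    ' "$1" "$2"
}
```

Unlike comm, this needs no sorting and keeps the order of file 2, but it ignores sections in the same way. One known caveat: if the first file is empty, the NR == FNR test is also true for the second file, so nothing is printed.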
# 4  
Old 11-02-2012
If you store one file in appropriate associative arrays and then update them from the second file, you can then disgorge the updated data as a normalized file. Deletes would come down to how updates are handled, for example by having each new section replace the whole corresponding old section. You build a string key for your associative array just like the prefix needed for the comm solution. Think of a system that appears to have just one disk with just one big directory, where / is just another file name character. It works OK because this directory is a random hash container.

Processing the data into a working form with hierarchy prefixes on every line means you can sort and comm them but, as I say, if the order is important, you need to number them at that level of the prefix. After processing for changes, you can reverse the process to make a normalized output file with hierarchy. (The funny thing about communication protocols is that each one essentially adds more prefixes to the message, like html in http in tcp in ip in ethernet.)
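Here is one way the associative-array idea might be sketched in awk. The assumptions: sections are identified by the range on the "# P:" line, header lines are kept from whichever file introduces a section first, and updates only add lines (nothing is deleted). The names merge_sections and src are invented for this example.

```shell
# merge_sections FILE1 FILE2: print FILE1 with the extra data lines of FILE2
# appended to the matching "# P:" sections.
merge_sections() {
    awk '
        /^# P:/ {                              # section start, in either file
            key = $3                           # e.g. 510-579
            if (!(key in owner)) { owner[key] = src; order[++n] = key }
        }
        /^#/ {                                 # header line (# P:, # Q:, # R:)
            if (owner[key] == src) head[key] = head[key] $0 "\n"
            next
        }
        NF && !((key, $0) in seen) {           # data line not yet stored for this section
            seen[key, $0] = 1
            body[key] = body[key] $0 "\n"
        }
        END {                                  # write sections in first-seen order
            for (i = 1; i <= n; i++) printf "%s%s", head[order[i]], body[order[i]]
        }
    ' src=1 "$1" src=2 "$2"
}
```

Running merge_sections input1.txt input2.txt > merged.txt on the sample files should give the first desired output from post #1, except that the "# Q:" lines stay as they are in input1.txt.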
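The whole round trip (prefix, sort, comm, then strip the prefixes again) might be sketched like this. It assumes the section layout of the sample files, uses an arbitrary "|" separator, and does not number lines, so the original order within a section is not preserved. The function name section_additions is hypothetical.

```shell
# section_additions FILE1 FILE2: print, under "# P:" headings, the data lines
# that FILE2 has in a section but FILE1 does not have in that same section.
section_additions() {
    old=$(mktemp) || return 1
    new=$(mktemp) || return 1
    # Tag every data line with the range of the section it belongs to.
    tag='/^# P:/ { key = $3; next }  /^#/ { next }  NF { print key "|" $0 }'
    awk "$tag" "$1" | LC_ALL=C sort -u > "$old"
    awk "$tag" "$2" | LC_ALL=C sort -u > "$new"
    # Keep lines only in the second file, then turn the prefixes back into headings.
    LC_ALL=C comm -13 "$old" "$new" |
    awk -F'|' '
        $1 != last { print "# P: " $1; print "# R:"; last = $1 }   # new section heading
        { print substr($0, length($1) + 2) }                        # data line without prefix
    '
    rm -f "$old" "$new"
}
```

On the sample files this should produce something close to the second desired output from post #1. Two differences to expect: the "# Q:" lines are not carried along, and sections come out in sort order rather than file order.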

Last edited by DGPickett; 11-02-2012 at 01:16 PM..
# 5  
Old 11-02-2012
Thanks for the ideas. I'll prob work more on it this weekend. For now I gotta get the practical work done, so I'm sticking to doing it manually, unfortunately. I'm pretty new to programming. Took some Java in high school, but that's basically it. I just discovered these shell functions/programming. My code is pretty scrappy, but it works for simple tasks. I work around my poor coding skills by writing shell scripts that sequentially perform simple tasks on the output file from each previous step.

Last edited by legato22; 11-02-2012 at 01:25 PM..
# 6  
Old 11-02-2012
There are many great tutorials online, so as often as I read man pages, for shell functions I often google up a tutorial on the specific area of shell I want to use.

You seem to have a first level key after P with a payload of an attribute Q and multiple lines of R. So, when you have the P, you prefix the Q and all the R lines with it. I am not sure what the rules are for merging Q or deleting R. In XML, an item might be:
Code:
<P low="700" hi="765" Q="ask">
 <R f1="yu" f2="z1838" f3="100.00" f4="15"/>
</P>
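For illustration only, a file in the sample layout could be turned into that XML shape with a few lines of awk. This sketch assumes each section has its "# P:" line followed by a "# Q:" line, that data lines have four whitespace-separated fields, and that no values need XML escaping. The function name to_xml is made up.

```shell
# to_xml FILE: print each "# P:" section as a <P> element with <R> children.
to_xml() {
    awk '
        function close_p() { if (open) print "</P>"; open = 0 }
        /^# P:/ { close_p(); split($3, r, "-"); low = r[1]; hi = r[2]; next }
        /^# Q:/ { printf "<P low=\"%s\" hi=\"%s\" Q=\"%s\">\n", low, hi, $3; open = 1; next }
        /^#/    { next }                # the "# R:" marker has no content of its own
        NF      { printf " <R f1=\"%s\" f2=\"%s\" f3=\"%s\" f4=\"%s\"/>\n", $1, $2, $3, $4 }
        END     { close_p() }           # close the last section
    ' "$1"
}
```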


Last edited by DGPickett; 11-02-2012 at 02:11 PM..
 