Compare two strings and get "exact" difference only


 
# 1  
Old 02-09-2012

Hi all;

I'm pretty green at Perl programming. I've been searching high and low for a Perl (preferably) or Unix shell script that will compare two CSV strings that sit side by side on the same line of a file, separated by the "|" character, and report ONLY the exact change. Note that 19 is not a line number; it's just a numeric field. If it helps, we can assume that both CSV strings have the same fields, which we can call:
rule, client, location, script, destination, enabled, search, compress

Also, rule can never be altered or changed; only fields 2 to 8 can be modified.

So with all that, here's an example. If I have a file (call it file1) with this content:
Code:
19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder> <wonderwoman>is<cute>,YES|19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder>,YES
300,gjv,mary,hadalittlelamb,its,flease,<was>whaite<assnow> <and>everywhere<that> <mary went>thelambwas<sure to go>,YES|300,gjv,mary,hadalittlelamb,its,flease,<was>white<assnow> <and>everywhere<that> <marywent>thelambwas<suretogo> <thefarmer>inthe<dell> <hewas>neverto<comeout>,YES
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <untilone>day<hewenttoofar>,YES|3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <hewas>caught<andsentaway>,YES

Output I am looking for (to be written to file2):
Code:
-------------------------------------
AUDIT REPORT
 
CHANGED:
field1:19 
BEFORE:
19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder> <wonderwoman>is<cute>,YES
AFTER:
19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder>,YES
SPECIFICS:
<wonderwoman>is<cute> ----> removed
 
CHANGED:
field1:300
BEFORE:
300,gjv,mary,hadalittlelamb,its,flease,<was>whaite<assnow> <and>everywhere<that> <mary went>thelambwas<sure to go>,YES
AFTER:
300,gjv,mary,hadalittlelamb,its,flease,<was>white<assnow> <and>everywhere<that> <marywent>thelambwas<suretogo> <thefarmer>inthe<dell> <hewas>neverto<comeout>,YES
SPECIFICS:
<thefarmer>inthe<dell> ----> added
<hewas>neverto<comeout> ----> added
 
CHANGED:
field1:3012
BEFORE:
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <untilone>day<hewenttoofar>,YES
AFTER:
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <hewas>caught<andsentaway>,YES
SPECIFICS:
<untilone>day<hewenttoofar> ----> removed
<hewas>caught<andsentaway> ----> added
 
SUMMARY
Rule was changed for 2 clients: gmp,gjv
Total number of rules changes: 3
Rules changed:
gmp: 19
gjv: 300,3012
--------------------------------------------------------

Notes: notice in the SUMMARY that even though 2 records changed for client gjv, it appears only once in the line "Rule was changed for 2 clients:", but both records are accounted for in the line "Total number of rules changes:".
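The counting rule in the note above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name summary is made up, each input line is assumed to be "before|after", and the rule and client are taken from fields 1 and 2 of the "before" half.

```shell
#!/usr/bin/env bash
# Sketch: build the SUMMARY block from lines of the form "before|after".
# Assumption: "summary" is a hypothetical helper name, not part of any tool.

summary() {   # reads the data on standard input
  awk -F'|' '
    $1 "" != $2 "" {                  # force a string comparison
      split($1, f, ",")               # f[1] = rule, f[2] = client
      total++                         # every changed record counts
      if (!(f[2] in rules)) {         # first change seen for this client
        order[++n] = f[2]
        rules[f[2]] = f[1]
      } else {                        # client already listed: append the rule
        rules[f[2]] = rules[f[2]] "," f[1]
      }
    }
    END {
      for (i = 1; i <= n; i++) list = list (i > 1 ? "," : "") order[i]
      print "Rule was changed for " (n + 0) " clients: " list
      print "Total number of rules changes: " (total + 0)
      print "Rules changed:"
      for (i = 1; i <= n; i++) print order[i] ": " rules[order[i]]
    }'
}

# Usage (file1 as described in this post):
#   summary < file1 >> file2
```

Clients are listed in the order they first appear, so each client is printed once while every changed record still adds to the total.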

Thanks
G

Last edited by jim mcnamara; 02-09-2012 at 01:15 PM.. Reason: added code tags; please use them
# 2  
Old 02-09-2012
Here is a skeleton for one possible solution:
Code:
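#!/bin/ksh
# Skeleton for ksh: "cmd | read" sets variables in the current shell there,
# which is not the case in bash.
IFS='|'
# Each input line is "before|after".
while read -r mBefore mAfter; do
  if [[ "${mBefore}" != "${mAfter}" ]]; then
    # Turn the commas into pipes so read splits the eight CSV fields.
    echo "${mBefore}" | sed 's/,/|/g' | read -r mB1 mB2 mB3 mB4 mB5 mB6 mB7 mB8
    echo "${mAfter}" | sed 's/,/|/g' | read -r mA1 mA2 mA3 mA4 mA5 mA6 mA7 mA8
    # Repeat this test for each field that may change.
    if [[ "${mB1}" != "${mA1}" ]]; then
      echo "Field 1 was changed."
      echo "Before: ${mB1}"
      echo "After: ${mA1}"
    fi
  fi
done < Input_File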

Change it according to your requirements.
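One way to take the skeleton toward the SPECIFICS section is to compare the groups inside the "search" field (field 7) instead of the whole field. The sketch below is only a starting point under stated assumptions: the function name specifics is made up, and a group is assumed to look like <open>text<close>, with spaces allowed inside the angle brackets.

```shell
#!/usr/bin/env bash
# Sketch: report which <open>text<close> groups were removed or added.
# Assumptions: "specifics" is a hypothetical helper name; the two arguments
# are the "search" fields and contain no backslashes (awk -v would
# interpret them as escape sequences).

specifics() {   # usage: specifics "before-field" "after-field"
  awk -v before="$1" -v after="$2" '
    BEGIN {
      re = "<[^>]*>[^<]*<[^>]*>"      # one <open>text<close> group
      nb = 0; na = 0

      s = before                      # collect the groups of the old value
      while (match(s, re)) {
        g = substr(s, RSTART, RLENGTH)
        b[++nb] = g; inb[g] = 1
        s = substr(s, RSTART + RLENGTH)
      }

      s = after                       # collect the groups of the new value
      while (match(s, re)) {
        g = substr(s, RSTART, RLENGTH)
        a[++na] = g; ina[g] = 1
        s = substr(s, RSTART + RLENGTH)
      }

      # A group present on one side only is a removal or an addition.
      for (i = 1; i <= nb; i++) if (!(b[i] in ina)) print b[i] " ----> removed"
      for (i = 1; i <= na; i++) if (!(a[i] in inb)) print a[i] " ----> added"
    }'
}

# Example with the "search" fields of rule 3012 from the sample data:
specifics '<was>always<causing> <trouble>around<town> <untilone>day<hewenttoofar>' \
          '<was>always<causing> <trouble>around<town> <hewas>caught<andsentaway>'
```

For rule 3012 this prints the removed group followed by the added one. Because the comparison is exact, rule 300 would also list the whaite/white and "mary went"/marywent groups as removed and added, which is more than the requested report shows.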
# 3  
Old 02-10-2012
Thanks for your time...I will try it out.
Sincere regards
Giuliano

---------- Post updated at 02:53 PM ---------- Previous update was at 08:24 AM ----------

Not really working as I intended.
Thanks

# 4  
Old 02-11-2012
Hi.

I tend to use whatever is "standard" or available on systems before I start coding a custom solution. So here is an example using the dwdiff utility. The output is not in the format you desired, but it was quick to put together. The key parts are the dwdiff call and the slight reformatting with sed:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate difference by "word", dwdiff.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dwdiff

FILE=${1-data1}

pl " Sample of data file $FILE:"
cut -c1-50 $FILE

pl " Results:"
dwdiff -s -3 -d"," <( cut -d"|" -f1 $FILE ) <( cut -d"|" -f2 $FILE ) |
sed -e 's/^ *//' -e 's/\] *{/\]\n{/'

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
dwdiff 1.8.2

-----
 Sample of data file data1:
19,gmp,charlie,brown,is,funny,<super>man<isalive> 
300,gjv,mary,hadalittlelamb,its,flease,<was>whaite
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causi

-----
 Results:
old: 54 words  47 87% common  1 1% deleted  6 11% changed
new: 52 words  47 90% common  0 0% inserted  5 9% changed
======================================================================
[-<wonderwoman>is<cute>-]
======================================================================
[-<was>whaite<assnow>-]
{+<was>white<assnow>+}
======================================================================
[-<mary went>thelambwas<sure to go>-]
{+<marywent>thelambwas<suretogo> <thefarmer>inthe<dell> <hewas>neverto<comeout>+}
======================================================================
[-<untilone>day<hewenttoofar>-]
{+<hewas>caught<andsentaway>+}
======================================================================

Square brackets mark deletions and curly braces mark insertions. Both markers can be changed.
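If the wording from the first post is wanted, the marker lines in the results above could be rewritten with one more sed pass. This is only a sketch: the function name dwdiff_to_specifics is made up, and it assumes the default markers with one marker per line, as in the output shown.

```shell
#!/usr/bin/env bash
# Sketch: turn "[-text-]" and "{+text+}" lines into the requested wording.
# Assumption: "dwdiff_to_specifics" is a hypothetical helper name; input is
# the already reformatted dwdiff output, one marker per line.

dwdiff_to_specifics() {   # reads the reformatted dwdiff output on stdin
  sed -n \
    -e 's/^\[-\(.*\)-]$/\1 ----> removed/p' \
    -e 's/^{+\(.*\)+}$/\1 ----> added/p'
}

# Usage, appended to the pipeline in the script:
#   dwdiff ... | sed -e 's/^ *//' -e 's/\] *{/\]\n{/' | dwdiff_to_specifics
```

Separator and statistics lines are dropped because only matching lines are printed. An insertion that spans several groups stays on a single line, so it would still need splitting to list each group separately.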

A web page for dwdiff is at dwdiff - ghalkes:~#

Best wishes ... cheers, drl