Diffing words - percentages


 
# 1  
01-04-2013

Is there a way to do the following?

Say I have two words:

Code:
WelcomeMattTom

and 

WelcomeMTom

How can I compare the two words to find out how alike they are, as a percentage?

For example, how similar is WelcomeMTom to WelcomeMattTom?

Still not clear?

Say I introduce a third word, WelcomeMattTomm: how similar is WelcomeMattTomm to WelcomeMattTom?

I'm looking for a way to do this in bash/awk, something like this:

Code:
./script.sh <firstword>  <secondword>
98%

which would mean <secondword> is 98% similar to <firstword>.

OS: Linux
# 2  
01-05-2013
You want a string-similarity algorithm.

Here is a good article explaining one approach (its examples are in Java):
How to Strike a Match
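"How to Strike a Match" is usually associated with comparing adjacent letter pairs (the Sørensen-Dice coefficient). Assuming that is the approach, a minimal bash sketch might look like this. The function name `pairsim` is hypothetical, the comparison is case-sensitive, and the whole argument is treated as a single word; the score is 200 × shared pairs / (pairs in word 1 + pairs in word 2), rounded down.

```shell
#!/usr/bin/env bash
# Sketch: letter-pair similarity (Sorensen-Dice coefficient), assumed to be the
# idea behind "How to Strike a Match". pairsim is a hypothetical helper name.
# Case-sensitive; treats each argument as one word. Needs bash 4+ (local -A).

pairsim() {
    local a=$1 b=$2 i p shared=0
    local n1=$(( ${#a} > 1 ? ${#a} - 1 : 0 ))   # number of pairs in $a
    local n2=$(( ${#b} > 1 ? ${#b} - 1 : 0 ))   # number of pairs in $b
    local -A count=()                          # multiset of $a's pairs

    # Neither word has a pair (both shorter than 2 chars): compare directly.
    if (( n1 + n2 == 0 )); then
        if [[ $a == "$b" ]]; then echo "100%"; else echo "0%"; fi
        return
    fi

    # Count each adjacent 2-character pair of the first word.
    for (( i = 0; i < n1; i++ )); do
        p=${a:i:2}
        count[$p]=$(( ${count[$p]:-0} + 1 ))
    done

    # Match the second word's pairs, using up each counted pair only once.
    for (( i = 0; i < n2; i++ )); do
        p=${b:i:2}
        if (( ${count[$p]:-0} > 0 )); then
            count[$p]=$(( ${count[$p]} - 1 ))
            shared=$(( shared + 1 ))
        fi
    done

    # 2 * shared / total pairs, as an integer percentage (rounded down).
    echo "$(( 200 * shared / (n1 + n2) ))%"
}
```

For example, `pairsim WelcomeMattTom WelcomeMTom` prints `78%`: the words have 13 and 10 pairs, 9 of which are shared. Counting pairs rather than single letters means that word order matters, but a single insertion only breaks a pair or two.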

Levenshtein distance may be the most likely candidate for you:
Levenshtein distance - Wikipedia, the free encyclopedia
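A minimal pure-bash sketch of Levenshtein distance, using the usual two-row dynamic-programming table. The names `levenshtein` and `similarity` are hypothetical. Turning the distance into a percentage by dividing by the longer word's length is one assumed choice among several. The work is proportional to the product of the two lengths, so this is fine for single words but slow for long strings in bash.

```shell
#!/usr/bin/env bash
# Sketch: Levenshtein (edit) distance via the two-row dynamic-programming table.
# levenshtein and similarity are hypothetical helper names, not from the thread.

# Print the minimum number of single-character insertions, deletions and
# substitutions needed to turn $1 into $2.
levenshtein() {
    local a=$1 b=$2 i j cost del ins sub
    local m=${#a} n=${#b}
    local -a prev cur

    # Row 0: an empty prefix of $a becomes b's first j chars with j insertions.
    for (( j = 0; j <= n; j++ )); do prev[j]=$j; done

    for (( i = 1; i <= m; i++ )); do
        cur[0]=$i                       # delete all i chars of a's prefix
        for (( j = 1; j <= n; j++ )); do
            if [[ ${a:i-1:1} == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
            del=$(( prev[j] + 1 ))      # drop a[i-1]
            ins=$(( cur[j-1] + 1 ))     # insert b[j-1]
            sub=$(( prev[j-1] + cost )) # substitute, or keep a match
            cur[j]=$(( del < ins ? del : ins ))
            if (( sub < cur[j] )); then cur[j]=$sub; fi
        done
        prev=("${cur[@]}")
    done
    echo "${prev[n]}"
}

# Assumed normalisation: 100 * (1 - distance / longer length), rounded down.
similarity() {
    local d max=$(( ${#1} > ${#2} ? ${#1} : ${#2} ))
    if (( max == 0 )); then echo "100%"; return; fi
    d=$(levenshtein "$1" "$2")
    echo "$(( 100 * (max - d) / max ))%"
}
```

With this normalisation, `similarity WelcomeMattTom WelcomeMTom` prints `78%` (distance 3, longer word 14 characters) and `similarity WelcomeMattTomm WelcomeMattTom` prints `93%` (distance 1, longer word 15 characters).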

Here is the Perl module WordNet::Similarity:
WordNet::Similarity - search.cpan.org

You have to download this module and part of its parent module, too; the documentation includes examples. You will have to work out the percentage calculation yourself from the results a module like this returns, or roll your own (see the first article above). I would recommend doing some reading (above) before messing with this: similarity algorithms can do interesting and sometimes confusing things, IMO.
Post by jim mcnamara (thanked by 1 user)
# 3  
01-05-2013
Hi.

You might start here: Levenshtein distance - Wikipedia, the free encyclopedia. It has an explanation of edit distance, some pseudocode, and a number of references. I found it with a Google search for "distance between 2 strings".
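The edit distance described there is defined by the recurrence below, where a and b are the two words, lev(i, j) is the distance between the first i characters of a and the first j characters of b, and [a_i ≠ b_j] is 1 when the characters differ and 0 otherwise. The last two lines work through the thread's examples, assuming the distance is normalised by the longer word's length (one common choice, not the only one):

```latex
\begin{aligned}
\operatorname{lev}(i,0) &= i, \qquad \operatorname{lev}(0,j) = j,\\
\operatorname{lev}(i,j) &= \min\begin{cases}
  \operatorname{lev}(i-1,j) + 1 & \text{(delete } a_i\text{)}\\
  \operatorname{lev}(i,j-1) + 1 & \text{(insert } b_j\text{)}\\
  \operatorname{lev}(i-1,j-1) + [a_i \neq b_j] & \text{(substitute or match)}
\end{cases}\\[4pt]
\operatorname{lev}(\texttt{WelcomeMattTom}, \texttt{WelcomeMTom}) &= 3
  \quad\text{(delete ``att''; the lengths differ by 3, so no fewer edits work)}\\
1 - \frac{3}{\max(14,\,11)} &= \frac{11}{14} \approx 78.6\%, \qquad
1 - \frac{1}{\max(15,\,14)} = \frac{14}{15} \approx 93.3\%
\end{aligned}
```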

Google is your friend.

Best wishes ... cheers, drl

(edit 1: similar to Jim's reply)
Post by drl (thanked by 1 user)
# 4  
01-05-2013
Oh wow, thanks guys!

I thought there'd be a quick fix for this, but I guess I was wrong. lol
# 5  
01-05-2013
Here is some GNU awk code for calculating Levenshtein distance. I hope this helps.
Post by Yoda (thanked by 2 users)
# 6  
01-05-2013
If you just want a quick and dirty estimation:
Code:
#!/bin/bash
# d = number of characters that diff reports as deleted (<) or inserted (>)
d=$(diff <(echo "$1" |sed 's/./&\n/g') <(echo "$2" |sed 's/./&\n/g') |grep -c '^[<>]')
echo $((100-100*d/(${#1}+${#2})))%
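A variant of binlib's diff-based estimate, offered as a sketch with two small changes: `fold -w1` puts one character per line (the `sed 's/./&\n/g'` form relies on GNU sed treating `\n` in the replacement as a newline), and two empty words no longer cause a division by zero. The function name `diffsim` is hypothetical. It assumes plain ASCII words and a `diff` whose default output is the traditional format with `<` and `>` lines, as GNU diff produces.

```shell
#!/usr/bin/env bash
# Sketch of binlib's diff-based estimate with two changes: fold -w1 puts one
# character per line, and two empty words no longer divide by zero.
# diffsim is a hypothetical name. Assumes ASCII input and GNU diff's normal format.

diffsim() {
    local d total=$(( ${#1} + ${#2} ))
    if (( total == 0 )); then echo "100%"; return; fi

    # Count characters that diff reports as removed (<) or added (>).
    # "|| true" keeps grep's exit status 1 (no matches) from aborting under set -e.
    d=$(diff <(printf '%s\n' "$1" | fold -w1) \
             <(printf '%s\n' "$2" | fold -w1) | grep -c '^[<>]' || true)

    # Changed characters as a share of all characters in both words.
    echo "$(( 100 - 100 * d / total ))%"
}
```

For example, `diffsim WelcomeMattTom WelcomeMTom` prints `88%`: diff reports 3 deleted characters out of 14 + 11 = 25, and 100 * 3 / 25 = 12.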

Post by binlib (thanked by 1 user)
# 7  
01-05-2013
Quote:
Originally Posted by binlib
If you just want a quick and dirty estimation:
Code:
d=$(diff <(echo "$1" |sed 's/./&\n/g') <(echo "$2" |sed 's/./&\n/g') |grep -c '^[<>]')
echo $((100-100*d/(${#1}+${#2})))%


Oh my! This one does exactly what I wanted. I knew there had to be a much simpler way. Thank you so much!

And thanks to everyone else who responded. I really appreciate your help. Thank you!

Last edited by SkySmart; 01-05-2013 at 09:09 PM.
