10-01-2012
Identical means exact, which should mean the checksums match. Similarity is a really difficult problem; search for Levenshtein distance or the Wagner-Fischer algorithm.
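To make the "identical" half concrete, here is a minimal sketch in PHP (the language of the man page further down) that compares two files by checksum. The helper name files_identical() and the choice of SHA-256 are my own assumptions, not something from the thread; any digest you trust would do.

```php
<?php
// Sketch: treat two files as identical when their SHA-256 digests match.
// files_identical() is a hypothetical helper name, not a PHP built-in.
function files_identical(string $pathA, string $pathB): bool
{
    // hash_file() returns false when a file cannot be read
    $hashA = hash_file('sha256', $pathA);
    $hashB = hash_file('sha256', $pathB);

    // an unreadable file never counts as a match
    return $hashA !== false && $hashA === $hashB;
}
```

A matching checksum only answers "exactly the same or not"; a single extra space makes the digests differ, which is why similarity needs a distance measure instead.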
LEARN ABOUT PHP
levenshtein
LEVENSHTEIN(3) 1 LEVENSHTEIN(3)
levenshtein - Calculate Levenshtein distance between two strings
SYNOPSIS
int levenshtein (string $str1, string $str2)
int levenshtein (string $str1, string $str2, int $cost_ins, int $cost_rep, int $cost_del)
DESCRIPTION
The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform $str1 into
$str2. The complexity of the algorithm is O(m*n), where n and m are the lengths of $str1 and $str2 (rather good when compared to
similar_text(3), which is O(max(n,m)**3), but still expensive).
In its simplest form the function will take only the two strings as parameters and will calculate just the number of insert, replace and
delete operations needed to transform $str1 into $str2.
A second variant will take three additional parameters that define the cost of insert, replace and delete operations. This is more general
and adaptive than variant one, but not as efficient.
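The O(m*n) cost comes from filling a table with one cell per pair of prefixes. The following is a rough sketch of that dynamic program (the Wagner-Fischer algorithm), keeping only two rows of the table at a time. The function name wagner_fischer() and its parameter names are hypothetical; this is an illustration of the technique, not the code PHP itself uses, and like levenshtein() it compares bytes rather than multibyte characters.

```php
<?php
// Sketch of the Wagner-Fischer dynamic program using two rows.
// wagner_fischer() is a hypothetical name chosen for this example.
function wagner_fischer(string $a, string $b, int $ins = 1, int $rep = 1, int $del = 1): int
{
    $m = strlen($a);
    $n = strlen($b);

    // $prev[$j]: cost of turning the empty prefix of $a into the first $j bytes of $b
    $prev = [];
    for ($j = 0; $j <= $n; $j++) {
        $prev[$j] = $j * $ins;
    }

    for ($i = 1; $i <= $m; $i++) {
        // turning the first $i bytes of $a into the empty string takes $i deletions
        $curr = [$i * $del];
        for ($j = 1; $j <= $n; $j++) {
            $same = $a[$i - 1] === $b[$j - 1];
            $curr[$j] = min(
                $prev[$j] + $del,                   // delete a byte of $a
                $curr[$j - 1] + $ins,               // insert a byte of $b
                $prev[$j - 1] + ($same ? 0 : $rep)  // keep or replace
            );
        }
        $prev = $curr;
    }

    return $prev[$n];
}

echo wagner_fischer('kitten', 'sitting'), "\n"; // 3
```

The two nested loops visit each of the m*n cells once, which is where the stated complexity comes from; keeping two rows instead of the full table reduces memory to O(n).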
PARAMETERS
o $str1
- One of the strings being evaluated for Levenshtein distance.
o $str2
- One of the strings being evaluated for Levenshtein distance.
o $cost_ins
- Defines the cost of insertion.
o $cost_rep
- Defines the cost of replacement.
o $cost_del
- Defines the cost of deletion.
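As a small illustration of the cost parameters, the sketch below calls the five-argument form in the order given above (insert, replace, delete). The words and cost values are arbitrary choices made for this example.

```php
<?php
// Default costs: one insertion turns "cat" into "cart".
var_dump(levenshtein('cat', 'cart'));          // int(1)

// Insertion made expensive (5); replacement and deletion stay at 1.
// "cart" is one byte longer, so at least one insertion is unavoidable.
var_dump(levenshtein('cat', 'cart', 5, 1, 1)); // int(5)

// The reverse direction only needs a deletion, so with unequal costs
// the distance is no longer symmetric.
var_dump(levenshtein('cart', 'cat', 5, 1, 1)); // int(1)
```

Unequal costs are handy when one kind of typo is more plausible than another, but note that the result then depends on which string is passed first.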
RETURN VALUES
This function returns the Levenshtein distance between the two argument strings, or -1 if one of the argument strings is longer than the
limit of 255 characters.
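Because -1 is easy to mistake for a very good match, callers may want to check lengths up front. The sketch below is one possible guard; bounded_levenshtein() and LEV_MAX_LEN are hypothetical names. As far as I know, PHP 8.0 and later removed the 255-character limit, so the guard matters mainly on older versions; treat that version detail as an assumption to verify against your own installation.

```php
<?php
// Sketch: refuse inputs beyond the documented 255-character limit.
// bounded_levenshtein() and LEV_MAX_LEN are hypothetical names.
const LEV_MAX_LEN = 255;

function bounded_levenshtein(string $a, string $b): ?int
{
    // levenshtein() works on bytes, so strlen() is the matching length check
    if (strlen($a) > LEV_MAX_LEN || strlen($b) > LEV_MAX_LEN) {
        return null; // caller has to fall back to another comparison
    }

    $distance = levenshtein($a, $b);

    // defensive: never hand the -1 error value back as a distance
    return $distance < 0 ? null : $distance;
}
```

Returning null instead of -1 forces the caller to handle the "too long" case explicitly rather than comparing it against real distances.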
EXAMPLES
Example #1
levenshtein(3) example
<?php
// input misspelled word
$input = 'carrrot';
// array of words to check against
$words = array('apple', 'pineapple', 'banana', 'orange',
               'radish', 'carrot', 'pea', 'bean', 'potato');
// no shortest distance found, yet
$shortest = -1;
// loop through words to find the closest
foreach ($words as $word) {
    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);
    // check for an exact match
    if ($lev == 0) {
        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;
        // break out of the loop; we've found an exact match
        break;
    }
    // if this distance is less than or equal to the shortest distance
    // found so far, OR if no shortest distance has been recorded yet
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest = $word;
        $shortest = $lev;
    }
}
echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}
?>
The above example will output:
Input word: carrrot
Did you mean: carrot?
SEE ALSO
soundex(3), similar_text(3), metaphone(3).
PHP Documentation Group LEVENSHTEIN(3)