Looking to find files that are similar. Post 302708591 by jim mcnamara on Monday 1st of October 2012 08:23:25 PM
Identical means exact, which should mean the checksums match. Similarity is a really difficult problem. Google for Levenshtein distance or the Wagner-Fischer algorithm.
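To make the two ideas concrete, here is a rough Python sketch (not something from the original thread). It assumes SHA-256 as the checksum and a plain, unweighted edit distance computed with the Wagner-Fischer dynamic programming table. The names `sha256_of`, `levenshtein` and `similarity` are made up for illustration, and the ratio in `similarity` is just one possible way to turn a distance into a score.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def levenshtein(a, b):
    """Edit distance between sequences a and b (Wagner-Fischer, two rows)."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, item_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # replace (or keep) the item
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical score in [0, 1]: 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Two files are identical when `sha256_of(file_a) == sha256_of(file_b)`. For similarity you could feed `similarity` the two files' lists of lines and pick your own cut-off. The table costs time proportional to the product of the two lengths, so this is only practical for small inputs.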
 

FTFF(1) 						      General Commands Manual							   FTFF(1)

NAME
       ftff - fault tolerant file find utility

SYNOPSIS
       ftff [-#fFhIpq][-t#][start_directory] file_to_find

DESCRIPTION
       ftff recursively descends the directory hierarchy and reports all objects in the file system with a name that approximately matches the given filename. ftff achieves fault tolerance by calculating the so called Weighted Levenshtein Distance. The Levenshtein Distance is defined as the minimum number of character insertions, deletions and replacements that transform a string A into a string B.

       ftff behaves like 'find start_directory -name file_to_find -print' with the following differences:

       - ftff is fault tolerant
       - ftff is NOT case sensitive
       - the level of fault tolerance can be adjusted by specifying the optional parameter tolerance. A tolerance of 0 specifies exact match.

OPTIONS
       -h     Prints a little help/usage information.

       -f     Follow symbolic links on directories. Note: a symbolic link like "somewhere -> .." causes naturally an endless loop. By default ftff does not follow symbolic links to directories.

       -F     Classify the file type by appending a character to each file name. This character is:
              '*' for regular files that are executable
              '/' for directories
              '@' for symbolic links
              '|' for FIFOs
              '=' for sockets

       -p     print the actual distance value in front of the filename. This value is equal to the number of insertions, deletions and replacements necessary to transform the file that was found into the search key (the file_to_find).

       -q     keep quiet and do not print any warning about non readable directories.

       -# or -t#
              Set the fault tolerance level to #. The fault tolerance level is an integer in the range 0-255. It specifies the maximum number of errors permitted in finding the approximate match. The default tolerance is (strlen(searchpattern) - number of wildcards)/6 + 1

       -I     Do case sensitive search (default is case in-sensitive)

       file_to_find
              The filename to search for. '*' and '?' can be used as wildcards. '?' denotes one single character. '*' denotes an arbitrary number of characters.

       start_directory
              The directory to start the search. The current directory is the default.

       The last argument to ftff is not parsed for options as the program needs at least one file-name argument. This means that ftff -x will not complain about a wrong option but search for the file named -x.

EXAMPLE
       ftff samething

       This will e.g. find a file called something or sameting or sum-thing or ...

       To find all files that start with any prefix, have something like IOComm in between and end on a two letter suffix:

       ftff '*iocomm.??'

       To find all files that exactly start with the prefix DuPeg:

       ftff -0 'dupeg*'

BUGS
       The wildcards '?' and '*' can not be escaped. These characters function always as wildcards. This is however not a big problem since there is normally hardly any file that has these characters in its name.

AUTHOR
       Guido Socher (guido@linuxfocus.org)

SEE ALSO
       whichman(1), find(1)

Search utilities                August 1998                FTFF(1)
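The behaviour described in the man page can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not ftff's actual code: it uses a plain, unweighted Levenshtein distance (the man page does not give ftff's weights), it does not implement the '*' and '?' wildcards in the matching itself, and the names `default_tolerance` and `approx_find` are invented for illustration. The default tolerance follows the formula quoted under OPTIONS, with integer division assumed.

```python
import os


def levenshtein(a, b):
    """Unweighted edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # replacement
        previous = current
    return previous[-1]


def default_tolerance(pattern):
    """(strlen(searchpattern) - number of wildcards)/6 + 1, integer division assumed."""
    wildcards = pattern.count("*") + pattern.count("?")
    return (len(pattern) - wildcards) // 6 + 1


def approx_find(start_directory, name, tolerance=None, case_sensitive=False):
    """Return sorted (distance, path) pairs for names within the tolerance."""
    if tolerance is None:
        tolerance = default_tolerance(name)
    key = name if case_sensitive else name.lower()
    matches = []
    # os.walk does not follow symbolic links to directories by default,
    # which mirrors the documented ftff default (no -f).
    for root, dirs, files in os.walk(start_directory):
        for entry in dirs + files:
            candidate = entry if case_sensitive else entry.lower()
            distance = levenshtein(candidate, key)
            if distance <= tolerance:
                matches.append((distance, os.path.join(root, entry)))
    return sorted(matches)
```

For the search key `samething` the default tolerance works out to 9 // 6 + 1 = 2, which is enough to reach `something`, `sameting` and `sum-thing` from the EXAMPLE section. Calling `approx_find(".", "samething")` corresponds roughly to `ftff -p samething`, and passing `tolerance=0` asks for an exact (case insensitive) name match.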