Duplicate filename algorithm Post: 302405835

10 More Discussions You Might Find Interesting

1. Programming

Algorithm problem

Looking for an algorithm to compute the number of days between two given dates I came across a professor's C program located here: http://cr.yp.to/2001-275/struct1.c I was wondering if anyone could tell me where the value 678882 in the line int d = dateday - 678882; comes from and also the...

2. Shell Programming and Scripting

algorithm

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 21444 tomusr 213M 61M sleep 29 10 1:20:46 0.1% java/43 21249 root 93M 44M sleep 29 10 1:07:19 0.2% java/56 is there anyway i can use a command to get the total of the SIZE? 306M (Derive from...

3. UNIX for Dummies Questions & Answers

Report of duplicate files based on part of the filename

I have the files logged in the file system with names in the format of : filename_ordernumber_date_time eg: file_1_12012007_1101.txt file_2_12022007_1101.txt file_1_12032007_1101.txt I need to find out all the files that are logged multiple times with same order number. In the above eg, I...

4. Shell Programming and Scripting

gzcat into awk and then change FILENAME and process new FILENAME

I am trying to write a script that prompts users for date and time, then process the gzip file into awk. During the ksh part of the script another file is created and needs to be processed with a different set of pattern matches then I need to combine the two in the end. I'm stuck at the part...

5. Shell Programming and Scripting

Filename from splitting files to have the same filename of the original file with counter value

Hi all, I have a list of xml file. I need to split the files to a different files when see the <ko> tag. The list of filename are B20090908.1100-20090908.1200_CDMA=1,NO=2,SITE=3.xml B20090908.1200-20090908.1300_CDMA=1,NO=2,SITE=3.xml B20090908.1300-20090908.1400_CDMA=1,NO=2,SITE=3.xml ...

6. Programming

Please help me to develop algorithm

Hi guys , in my study book from which I re-learn C is task to generate all possible characters combination from numbers entered by the user. I know this algorithm must use combinatorics to calculate all permutations. Problem is how to implement algortihm. // This program reads the four numbers...

7. Shell Programming and Scripting

Find duplicate based on 'n' fields and mark the duplicate as 'D'

Hi, In a file, I have to mark duplicate records as 'D' and the latest record alone as 'C'. In the below file, I have to identify if duplicate records are there or not based on Man_ID, Man_DT, Ship_ID and I have to mark the record with latest Ship_DT as "C" and other as "D" (I have to create...

8. UNIX for Dummies Questions & Answers

banker's algorithm.. help

i'm doing banker's algorithm.. got some error there but i cant fix it.. please help!! #!/bin/bash echo "enter no.of resources: " read n1 echo -n "enter the max no .of resources for each type: " for(( i=0; i <$n1; i++ )) do read ${t} done echo -n "enter no .of...

9. UNIX for Dummies Questions & Answers

to extract all the part of the filename before a particular word in the filename

Hi All, Thanks in Advance I am working on a shell script. I need some assistance. My Requirement: 1) There are some set of files in a directory like given below OTP_UFSC_20120530000000_acc.csv OTP_UFSC_20120530000000_faf.csv OTP_UFSC_20120530000000_prom.csv...

10. Programming

to extract all the part of the filename before a particular word in the filename

Hi All, Thanks in Advance I am working on a shell script. I need some assistance. My code: if then set "subscriber" "promplan" "mapping" "dedicatedaccount" "faflistSub" "faflistAcc" "accumulator"\ "pam_account"; for i in 1 2 3 4 5 6 7 8;...

LEARN ABOUT DEBIAN

rdfind

rdfind(1)							      rdfind								 rdfind(1)

NAME

       rdfind - finds duplicate files

SYNOPSIS

       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION

       rdfind finds duplicate files across and/or within several directories. It calculates checksums only if necessary. rdfind runs in
       O(N log N) time, where N is the number of files.
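
       The "checksum only if necessary" idea can be illustrated with a short Python sketch. It assumes a simple two-step strategy: group files
       by size first, then checksum only the files that share a size with at least one other file. This is an illustration under that
       assumption, not rdfind's actual implementation; the function name find_duplicates is hypothetical, and sha1 is used because it is one of
       the two checksum types listed under OPTIONS.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(paths):
    """Return lists of paths with identical content (illustrative only)."""
    # Step 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # Step 2: checksum only the files that share a size with another file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size, so no checksum is needed
        by_digest = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            by_digest[digest].append(path)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

       The sketch reads each candidate file whole, which is fine for small files; a real tool would read in chunks.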

       If two (or more) equal files are found, the program decides which of them is the original and the rest are considered duplicates. This is
       done by ranking the files against each other and deciding which has the highest rank. See section RANKING for details.

       If you need finer control over the ranking, use a preprocessor that sorts the file names into the desired order and then run the program
       through xargs. See EXAMPLES for how to use find and xargs in conjunction with rdfind.

       To include files or directories whose names start with -, prefix them with ./ (for example, rdfind ./-) so that they are not mistaken for
       options.

RANKING

       Given two or more equal files, the one with the highest rank is selected to be the original and the rest are duplicates. The ranking rules
       are given below and are applied in order until an original has been found. Given two files A and B which have equal content, the ranking
       is as follows:

       If A was found while scanning an input argument earlier than B, A is higher ranked.

       If A was found at a depth lower than B, A is higher ranked (A is closer to the root).

       If A was found earlier than B, A is higher ranked.

       The last rule is needed when two files are found in the same directory (obviously not given in separate arguments, otherwise the first rule
       applies). It gives the files the same order in which the operating system delivers them while listing the directory. This is operating
       system specific behaviour.
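
       The three rules above amount to a lexicographic comparison, which can be sketched in Python as a sort key. The Found record and the
       pick_original function are hypothetical names for illustration; they are not part of rdfind.

```python
from typing import NamedTuple


class Found(NamedTuple):
    path: str
    arg_index: int  # position of the input argument that led to the file
    depth: int      # directory depth below that argument (0 = closest to root)
    order: int      # discovery order as delivered by the operating system


def pick_original(equal_files):
    """Return (original, duplicates) for files already known to be equal."""
    # Rule 1: earlier input argument; rule 2: lower depth; rule 3: found earlier.
    ranked = sorted(equal_files, key=lambda f: (f.arg_index, f.depth, f.order))
    return ranked[0], ranked[1:]
```

       Because tuples compare element by element, a later rule only matters when all earlier rules tie.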

OPTIONS

       Searching options etc:

       -ignoreempty true|false
	      Ignore empty files. Default is true.

       -followsymlinks true|false
	      Follow symlinks. Default is false.

       -removeidentinode true|false
	      removes items found which have identical inode and device ID. Default is true.

       -checksum md5|sha1
	      what type of checksum to be used: md5 or sha1. Default is md5.

       Action options:

       -makesymlinks true|false
	      Replace duplicate files with symbolic links

       -makehardlinks true|false
	      Replace duplicate files with hard links

       -makeresultsfile true|false
	      Make a results file results.txt (default) in the current directory.

       -outputname name
	      Make the results file name to be "name" instead of the default results.txt.

       -deleteduplicates true|false
	      Delete (unlink) files.

       General options:

       -sleep Xms
	      sleeps X milliseconds between reading each file, to reduce load. Default is 0 (no sleep). Note that only a few values are  supported
	      at present: 0,1-5,10,25,50,100 milliseconds.

       -n, -dryrun
	      displays what would have been done; does not actually delete or link anything.

       -h, -help, --help
	      displays brief help message.

       -v, -version, --version
	      displays version number.

EXAMPLES

       Search for duplicate files in home directory and a backup directory:
	      rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
	      rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
	      find . -type d -name foo -print0 |xargs -0 rdfind

FILES

       results.txt (the default name; it can be changed with the -outputname option, see above). The results file contains one row per duplicate
       file found, along with a header row explaining the columns. A text label describes why the file is considered a duplicate:

       DUPTYPE_UNKNOWN some internal error

       DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the original.

       DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing the directory in the same input argument as the original)

       DUPTYPE_OUTSIDE_TREE the file is found during processing another input argument than the original.
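
       A script that consumes the results file might tally the rows by label, as in the Python sketch below. The exact column layout is not
       specified here, so the sketch makes two assumptions: header or comment rows start with '#', and the DUPTYPE_* label is the first
       whitespace-separated field. Check both against an actual results file before relying on this. The function name count_duptypes is
       hypothetical.

```python
from collections import Counter


def count_duptypes(lines):
    """Count rows per DUPTYPE_* label in results-file lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumption: header/comment rows start with '#'; skip them and blanks.
        if not fields or fields[0].startswith("#"):
            continue
        # Assumption: the duplicate-type label is the first column.
        counts[fields[0]] += 1
    return counts
```

       It could be called as count_duptypes(open("results.txt")), since a file object iterates over its lines.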

ENVIRONMENT

DIAGNOSTICS

EXIT VALUES

       0 on success, nonzero otherwise.

BUGS/FEATURES
       When specifying the same directory twice, it keeps the first encountered as the most important (original), and the rest as duplicates. This
       might not be what you want.

       The -makesymlinks option creates absolute links. This might not be what you want. To create relative links instead, you may use the
       symlinks (2) command, which is able to convert absolute links to relative links.
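
       As an alternative illustration of the same conversion, the Python sketch below rewrites one absolute symbolic link as a relative one
       using only the standard library. The function name make_relative is hypothetical. The sketch assumes that no directory in the link's own
       path is itself a symbolic link, and it is not atomic, because the link is removed before it is recreated.

```python
import os


def make_relative(link_path):
    """Rewrite an absolute symlink at link_path as a relative one."""
    target = os.readlink(link_path)
    if not os.path.isabs(target):
        return target  # already relative; nothing to do
    # Express the target relative to the directory that contains the link.
    link_dir = os.path.dirname(os.path.abspath(link_path))
    relative = os.path.relpath(target, start=link_dir)
    os.remove(link_path)
    os.symlink(relative, link_path)
    return relative
```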

       Older versions unfortunately contained a misspelling of the word occurrence. This is now corrected (since 1.3), which might affect user
       scripts parsing the output file written by rdfind.

       There are lots of enhancements left to do. Please contribute!

SECURITY CONSIDERATIONS

       Avoid manipulating the directories while rdfind is reading them. rdfind is quite brittle in that case. In particular, when deleting or
       making links, rdfind can be subject to a symlink attack. Use with care!

AUTHOR

       Paul Dreik 2006, reachable at rdfind@pauldreik.se. Rdfind can be found at http://rdfind.pauldreik.se/

       Do you find rdfind useful? Drop me a line! It is always fun to hear from people who actually use it and what data collections they run it
       on.

THANKS

       Several persons have helped with suggestions and improvements: Niels Moller, Carl Payne and Salvatore Ansani. Thanks also to you who tested
       the program and sent me feedback.

VERSION

       1.3.1 (release date 2012-05-07) svn id: $Id: rdfind.1 766 2012-05-07 17:26:17Z pauls $

COPYRIGHT

       This program is distributed under GPLv2 or later, at your option.

SEE ALSO

       md5sum(1), find(1), symlinks(2)

May 2012							       1.3.1								 rdfind(1)
