Duplicate file remover using md5sum--good enough?
Post 302449565 by Michael Stora on Tuesday 31st of August 2010 02:48:41 AM
Quote:
Originally Posted by rdcwayx
First, if your system has no md5sum or stat command, or they are not on the default path, the command risks deleting all files under the current folder.
Wow! Thanks for pointing this out. I'll do something to detect this and abort the script.
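
One possible way to do that check, as a minimal sketch: test each required tool with `command -v` before anything gets deleted. The helper name `require_cmds` is made up for illustration and is not part of the original script.

```shell
# Hypothetical helper (name is illustrative): returns non-zero and
# prints a message if any of the named commands is not found on $PATH.
require_cmds() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "required command not found: $cmd" >&2
            return 1
        fi
    done
}
```

Near the top of the script it could be called as `require_cmds md5sum stat || exit 1`, so the script aborts before any file is touched.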

Quote:
Second, the stat command is useless in your script: if the MD5 key is the same, the file size should be the same.
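Not strictly true: two files of different sizes can still collide under MD5 or SHA, so an equal checksum does not guarantee an equal size. There is probably no practical risk, though.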
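
If the residual collision risk ever matters, a byte-for-byte comparison is a stronger check than comparing sizes. A minimal sketch using the standard `cmp` utility follows; the function name `same_content` is an assumption for illustration only.

```shell
# Hypothetical helper: succeeds only if the two files are byte-for-byte
# identical. cmp -s prints nothing and reports the result via exit status.
same_content() {
    cmp -s -- "$1" "$2"
}
```

It would only need to run on pairs whose checksums already match, for example `same_content "$keep" "$candidate" && rm -- "$candidate"`, so the extra disk reads stay limited to suspected duplicates.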

Quote:
The first find and while loop can be replaced by:

Code:
find . -type f -exec md5sum {} \;  > /tmp/file_list

For example, I got this output:

Code:
$ cat /tmp/file_list
d41d8cd98f00b204e9800998ecf8427e *./abc
323ba8e2da815f896181c53564c4b1d2 *./abcd/abc
323ba8e2da815f896181c53564c4b1d2 *./abcd/def
ae70e0b0a0077a006942c876250bc0f5 *./infile
5f1b0a73a2b4dc51bcad52c357d55d19 *./outfile
323ba8e2da815f896181c53564c4b1d2 *./xyx

Yes, but I want to keep the first file in alphabetical basename order, not alphabetical directory name order. That will require more than just the checksum command.

-exec can only call a single command or script (which really slows things down with disk access), and it cannot call a function, because find resolves the command through $PATH and knows nothing about shell builtins or functions. Exporting the function might work, but only if -exec starts a bash that then calls it. It is also subject to race conditions, while piping the completed output of find is not. However, the -type flag helps remove the if-then-else statement:
Code:
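find "$dir" -type f | while IFS= read -r path; do           # IFS= keeps leading/trailing whitespace in names
    name=${path##*/}                                         # basename
    sum=$(sha1sum < "$path")                                 # read via stdin so the file name is never escaped in the output
    printf '%s,%s\\%s\n' "${sum%% *}" "$name" "$path"        # sha1sum,basename\path
done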
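
A rough sketch of how those `sha1sum,basename\path` lines could then be used to keep the first file per checksum in basename order and list the rest. The function name `list_later_duplicates` is hypothetical, and the sketch assumes no newlines in file names and that every path contains at least one slash (true when find starts from a directory).

```shell
# Hypothetical helper: reads lines of the form  sha1sum,basename\path
# on stdin and prints the path of every file except the first one
# (in byte-wise basename order) for each checksum.
list_later_duplicates() (
    prev=
    # Sort by checksum first, then by everything after the first comma.
    LC_ALL=C sort -t, -k1,1 -k2 |
    while IFS= read -r line; do
        sum=${line%%,*}            # text before the first comma
        rest=${line#*,}            # basename\path
        name=${rest##*/}           # last path component equals the basename
        path=${rest#"$name"\\}     # drop the leading  basename\  prefix
        if [ "$sum" = "$prev" ]; then
            printf '%s\n' "$path"
        fi
        prev=$sum
    done
)
```

The output is only a list of candidates. Reviewing it before feeding it to something like `while IFS= read -r f; do rm -- "$f"; done` seems safer than deleting in the same pass.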

Quote:
Second while loop can be replaced by: (no need to sort it)

Code:
 awk 'a[$1]++ {gsub(/^\*/,"",$2); print "rm ", $2}' /tmp/file_list |sh

Is using $1 and $2 in this way robust to whitespace in the file name or path? I suspect it is not, but I don't know enough awk to be sure.
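
For what it's worth, awk splits fields on runs of whitespace by default, so `$2` holds only the part of the path up to the first blank, and piping the result to sh splits it again. A hedged sketch of one way around that: cut the path out by position instead of by field. It assumes standard md5sum lines (32 hex digits, a space, a space or `*`, then the name) and no backslashes or newlines in file names, since md5sum escapes those. The function name `list_dupes_from_md5` is invented for illustration.

```shell
# Hypothetical helper: prints the path of every file whose checksum
# has already been seen, taking the path by character position so
# that whitespace inside it survives.
list_dupes_from_md5() {
    awk '{
        sum  = substr($0, 1, 32)   # the 32-character MD5 checksum
        path = substr($0, 35)      # skip checksum, space, and the " " or "*" flag
        if (seen[sum]++) print path
    }' "$@"
}
```

Called as `list_dupes_from_md5 /tmp/file_list`, it only prints the duplicate paths. Removing them with a quoted `rm -- "$f"` in a `while IFS= read -r f` loop avoids handing file names to sh for re-parsing.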

Mike

Last edited by Michael Stora; 08-31-2010 at 04:01 AM..
 
