Delete duplicate files from one of two directory structures


 
Posted 10-15-2009

Hello everyone,

I have been struggling to clean up a backup mess I created by manually duplicating a directory structure and then working in both copies.
The two structures are now significantly different and contain on the order of 15k files, most of which are duplicates.
I am now trying to merge those directories and have looked at FSlint, Meld, diff, and fdupes.
While all of those are good tools, none of them does quite what I need, so I am looking for a way to keep manual work to a minimum by deleting duplicates from the second directory structure. I will then sort and merge the remaining files by hand.
The closest I have come is fslint's findup (Linux.com :: Tidy up your filesystem with FSlint), which returns a list of duplicate files, with each group of duplicates separated by an empty line.
Since a listed group of duplicates may lie entirely within one of the two directory structures, I cannot be sure that I only delete files that are also present in the first structure (dir1).
I'm afraid I am not making myself very clear, so here is an example:

dir1/path/dup1
dir2/somepath/dup1 <-- delete
dir2/path/to/dup1 <-- delete
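
To make the example concrete, here is a rough Python sketch of the filtering I have in mind. It assumes the duplicate listing has one path per line with groups separated by empty lines, as described above. The names parse_groups, is_under and select_deletions are made up for illustration and are not part of FSlint. It only returns the dir2 paths from groups that also contain a dir1 path; it deletes nothing.

```python
from pathlib import PurePosixPath


def parse_groups(listing):
    """Split a listing into groups of paths separated by empty lines."""
    groups, current = [], []
    for line in listing.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def is_under(path, root):
    """True if path lies inside root (lexical check, no filesystem access)."""
    try:
        PurePosixPath(path).relative_to(root)
        return True
    except ValueError:
        return False


def select_deletions(listing, keep_root, delete_root):
    """Return paths under delete_root whose group also has a path under keep_root."""
    deletions = []
    for group in parse_groups(listing):
        if any(is_under(p, keep_root) for p in group):
            deletions.extend(p for p in group if is_under(p, delete_root))
    return deletions
```

Groups that contain only dir2 paths are left alone, so files that exist nowhere in dir1 are never selected.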

One approach might be to first delete the duplicates within dir2 and then compare what is left with dir1.
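
A different way to do the comparison with dir1, without any of the tools mentioned, would be to hash file contents directly. The following is only a sketch under my own assumptions: files count as duplicates when their SHA-256 digests match, the two trees do not overlap, and symbolic links are skipped. The function names are hypothetical. It defaults to a dry run so nothing is removed until the returned list has been checked.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def regular_files(root):
    """Yield every regular, non-symlink file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def delete_copies(keep_root, delete_root, dry_run=True):
    """Find files under delete_root whose content also exists under keep_root.

    With dry_run=True the matches are only reported; with dry_run=False
    they are removed. Returns the sorted list of matching paths.
    """
    known = {file_digest(p) for p in regular_files(keep_root)}
    matches = []
    for path in regular_files(delete_root):
        if file_digest(path) in known:
            matches.append(path)
            if not dry_run:
                os.remove(path)
    return sorted(matches)
```

Called as delete_copies("dir1", "dir2"), this would list every file in dir2 that has a copy somewhere in dir1, regardless of how many copies exist inside dir2, so a separate first pass over dir2 is not needed for this variant.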

Any help appreciated!
 