Delete duplicate files from one of two directory structures


 
# 1  
Old 10-15-2009

Hello everyone,

I have been struggling to clean up a backup mess I created by manually duplicating a directory structure and then working in both copies.
The two structures now differ significantly and contain on the order of 15,000 files, most of which are duplicates.
Now I am trying to merge those dirs and had a look at FSlint, Meld, diff, and fdupes.
While all of those are good tools, none of them does quite what I need, so I am looking for a way to keep manual work to a minimum by deleting duplicates from the second directory structure. I will have to sort and merge the remaining files by hand.
The closest I have come is fslint's findup (Linux.com :: Tidy up your filesystem with FSlint), which returns a list of duplicate files, with groups separated by empty lines.
Since a listed group of duplicates may lie entirely within one of the two directory structures, I cannot be sure that a file I delete from dir2 is also present in dir1.
I can't make myself really clear, I'm afraid, so here's an example:

dir1/path/dup1
dir2/somepath/dup1 <-- delete
dir2/path/to/dup1 <-- delete
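
The rule in the example above could be sketched roughly as follows in Python. This assumes the duplicate listing has the shape described in the post (one path per line, groups separated by empty lines); the function names `split_groups`, `is_under`, and `select_deletions` are made up for illustration and are not part of FSlint. A file under dir2 is selected only when its group also contains a file under dir1, so duplicates that exist only inside dir2 are left alone.

```python
from pathlib import PurePosixPath


def split_groups(text: str) -> list[list[str]]:
    """Split a duplicate listing into groups separated by blank lines."""
    groups, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def is_under(path: str, root: PurePosixPath) -> bool:
    """True if path is root itself or lies somewhere below it."""
    p = PurePosixPath(path)
    return p == root or root in p.parents


def select_deletions(text: str, keep_root: str, prune_root: str) -> list[str]:
    """Return files under prune_root whose group also has a file under keep_root."""
    keep, prune = PurePosixPath(keep_root), PurePosixPath(prune_root)
    selected = []
    for group in split_groups(text):
        if any(is_under(p, keep) for p in group):
            selected.extend(p for p in group if is_under(p, prune))
    return selected


if __name__ == "__main__":
    # Hypothetical listing: the first group matches the example above,
    # the second group has copies only inside dir2 and is left alone.
    listing = (
        "dir1/path/dup1\n"
        "dir2/somepath/dup1\n"
        "dir2/path/to/dup1\n"
        "\n"
        "dir2/a/only\n"
        "dir2/b/only\n"
    )
    for path in select_deletions(listing, "dir1", "dir2"):
        print(path)
```

The sketch only produces a list of candidates; reviewing that list before deleting anything seems prudent.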

One approach might be to first delete the duplicates within dir2 and then compare what remains with dir1.

Any help appreciated!
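
For what it is worth, the goal described in this post could also be approached without a duplicate finder, by comparing file contents directly. The following is a minimal Python sketch under stated assumptions: "duplicate" is taken to mean byte-identical content (compared via SHA-256), the two trees do not overlap, and symbolic links are skipped. The names `file_digest`, `iter_files`, and `prune_duplicates` are hypothetical. It defaults to a dry run that only reports what would be removed.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_files(root):
    """Yield regular, non-symlink files below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def prune_duplicates(keep_root, prune_root, dry_run=True):
    """List (and, if dry_run is False, delete) files under prune_root
    whose content also exists somewhere under keep_root."""
    keep_digests = {file_digest(p) for p in iter_files(keep_root)}
    matched = []
    for path in iter_files(prune_root):
        if file_digest(path) in keep_digests:
            matched.append(path)
            if not dry_run:
                os.remove(path)
    return sorted(matched)


if __name__ == "__main__":
    # Dry run: print the files in dir2 that have an identical copy in dir1.
    for path in prune_duplicates("dir1", "dir2", dry_run=True):
        print(path)
```

Note that all empty files share the same digest, so an empty file in dir2 would be matched if dir1 contains any empty file. Files that exist only in dir2, including duplicates within dir2 itself, are left for the manual merge.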
 


