Post 302362236 by procreator on Thursday 15th of October 2009 10:33:06 AM (UNIX for Dummies Questions & Answers)
Delete duplicate files from one of two directory structures

Hello everyone,

I have been struggling to clean up a backup mess I created by manually duplicating a directory structure and then working in both copies.
The structures now differ significantly and contain on the order of 15k files, most of which are duplicates.
I am now trying to merge those directories and have looked at FSlint, Meld, diff, and fdupes.
While all of those are good tools, none of them does quite what I need, so I am looking for a way to reduce the manual work to a minimum by deleting duplicates from the second directory structure. I will then sort and merge the remaining files by hand.
The closest I have come is FSlint's findup (Linux.com :: Tidy up your filesystem with FSlint), which returns a list of duplicate files, with groups separated by empty lines.
Since a group of duplicates may lie entirely within one of the directory structures, I cannot be sure that a file I delete from dir2 is also present in dir1.
I can't make myself really clear, I'm afraid, so here's an example:

dir1/path/dup1
dir2/somepath/dup1 <-- delete
dir2/path/to/dup1 <-- delete
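
The selection rule in this example could be sketched in Python roughly as follows. This is only an illustration: it assumes the findup listing has one path per line with groups separated by blank lines, and the helper names (parse_groups, is_under, deletable) are made up for this sketch and are not part of FSlint.

```python
import os


def parse_groups(listing):
    """Split a findup-style listing into groups of paths (blank line = new group)."""
    groups, current = [], []
    for line in listing.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def is_under(path, root):
    """True if path lies inside root (purely lexical check on normalised paths)."""
    path = os.path.normpath(path)
    root = os.path.normpath(root)
    return path == root or path.startswith(root + os.sep)


def deletable(groups, keep_root, prune_root):
    """Pick prune_root files, but only from groups that also hold a keep_root copy."""
    chosen = []
    for group in groups:
        # A group with no member under keep_root is left alone entirely,
        # so content that exists only in prune_root is never selected.
        if any(is_under(p, keep_root) for p in group):
            chosen.extend(p for p in group if is_under(p, prune_root))
    return chosen
```

Feeding the three paths above in as one group, deletable(groups, "dir1", "dir2") selects only the two dir2 entries. A group whose members all live under dir2 yields nothing.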

One approach might be to first delete the duplicates within dir2 and then compare what remains with dir1.

Any help appreciated!
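
For what it is worth, a direct comparison of the two trees could look something like the Python sketch below. It is a rough illustration, not a tested tool. It assumes the two trees do not overlap, treats files as duplicates when their sizes and SHA-256 digests match, and defaults to a dry run. The function names are invented for this example.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_files(root):
    """Yield the path of every regular file below root (symlinks are skipped)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def duplicates_in_second(keep_root, prune_root):
    """Return files under prune_root whose content also exists under keep_root."""
    # Index keep_root by size first so that only plausible matches get hashed.
    by_size = {}
    for path in walk_files(keep_root):
        by_size.setdefault(os.path.getsize(path), []).append(path)

    digests_by_size = {}  # size -> set of digests, filled in on first use
    doomed = []
    for path in walk_files(prune_root):
        size = os.path.getsize(path)
        if size not in by_size:
            continue
        if size not in digests_by_size:
            digests_by_size[size] = {file_digest(p) for p in by_size[size]}
        if file_digest(path) in digests_by_size[size]:
            doomed.append(path)
    return sorted(doomed)


def prune(keep_root, prune_root, dry_run=True):
    """Delete (or, with dry_run, only report) duplicates found under prune_root."""
    doomed = duplicates_in_second(keep_root, prune_root)
    for path in doomed:
        if dry_run:
            print("would delete:", path)
        else:
            os.remove(path)
    return doomed
```

Calling prune("dir1", "dir2") only prints what would go, and prune("dir1", "dir2", dry_run=False) actually removes the files. Nothing under dir1 is ever touched, and files that exist only in dir2 are left for manual sorting.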
 

scandir(3)						     Library Functions Manual							scandir(3)

NAME
       scandir, alphasort - Scans or sorts directory contents

LIBRARY
       Standard C Library (libc.a)

SYNOPSIS
       #include <sys/types.h>
       #include <sys/dir.h>

       int scandir (
               char *dir_name,
               struct dirent * (*name_list[ ]),
               int (*select) ( struct dirent *dir ),
               int (*compare) ( struct dirent **dir1, struct dirent **dir2 ) );

       int alphasort (
               struct dirent **dir1,
               struct dirent **dir2 );

PARAMETERS
       dir_name   Points to the directory name.
       name_list  Points to the array of pointers to directory entries.
       select     Points to a user-supplied function that is called by the scandir() function to select which entries to include in the array.
       compare    Points to a user-supplied function that sorts the completed array.
       dir1       Points to a dirent structure.
       dir2       Points to a dirent structure.

DESCRIPTION
       The scandir() function reads the directory pointed to by the dir_name parameter. It then uses the malloc() function to create an array of pointers to directory entries. The scandir() function returns the number of entries in the array and, through the name_list parameter, a pointer to the array.

       The select parameter points to a user-supplied function that the scandir() function calls to select which entries to include in the array. The selection routine is passed a pointer to a directory entry and returns a nonzero value for a directory entry that is included in the array. If the select parameter is a null value, all directory entries are included.

       The compare parameter points to a user-supplied function that is passed to the qsort() function to sort the completed array. If the compare parameter is a null value, the array is not sorted.

       The memory allocated to the array can be deallocated by freeing each pointer in the array, and the array itself, with the free() function.

       The alphasort() function alphabetically compares the two dirent structures pointed to by the dir1 and dir2 parameters. This function can be passed as the compare parameter to either the scandir() function or the qsort() function. A user-supplied subroutine may also be used.

RETURN VALUES
       The scandir() function returns -1 if the directory cannot be opened for reading or if the malloc() function cannot allocate enough memory to hold all the data structures. If successful, the scandir() function returns the number of entries found.

       The alphasort() function returns the following values:

       Less than 0 (zero):     The dirent structure pointed to by the dir1 parameter is lexically less than the dirent structure pointed to by the dir2 parameter.
       0 (zero):               The dirent structures pointed to by the dir1 parameter and the dir2 parameter are equal.
       Greater than 0 (zero):  The dirent structure pointed to by the dir1 parameter is lexically greater than the dirent structure pointed to by the dir2 parameter.

RELATED INFORMATION
       Functions: malloc(3), opendir(3), qsort(3)