10-15-2009
Delete duplicate files from one of two directory structures
Hello everyone,
I have been struggling to clean up a backup mess I created by manually duplicating a directory structure and then working in both copies.
The two structures now differ significantly and contain on the order of 15,000 files, most of which are duplicates.
Now I am trying to merge those directories and have looked at FSlint, Meld, diff, and fdupes.
While all of those are good tools, none of them does quite what I need, so I am looking for a way to keep manual work to a minimum by deleting duplicates from the second directory structure. I will then sort/merge the remaining files by hand.
The closest I have come is fslint's findup (see Linux.com :: Tidy up your filesystem with FSlint), which returns a list of duplicate files, with groups separated by empty lines.
Since the duplicates listed may also lie entirely within one of the two directory structures, I cannot be sure that a file I delete from dir2 is also present in dir1.
I'm afraid I'm not making myself very clear, so here's an example:
dir1/path/dup1
dir2/somepath/dup1 <-- delete
dir2/path/to/dup1 <-- delete
One approach might be to first delete the duplicates within dir2 and afterwards compare what is left with dir1.
Any help appreciated!
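
To make the goal concrete, here is a rough sketch of the behaviour I am after, written in Python rather than with any of the tools above. It assumes that "duplicate" means byte-identical content (compared via SHA-256), regardless of file name or location. The function names (`file_digest`, `iter_files`, `duplicates_in_second`, `prune`) are made up for illustration. It only reports what it would delete unless `dry_run=False` is passed.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_files(root):
    """Yield the path of every regular file below root (symlinks are skipped)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def duplicates_in_second(keep_root, prune_root):
    """Return sorted paths under prune_root whose content also exists under keep_root."""
    keep_real = os.path.realpath(keep_root)
    prune_real = os.path.realpath(prune_root)
    # Refuse to run if one tree contains the other (or they are the same tree);
    # otherwise every file would count as a duplicate of itself.
    if os.path.commonpath([keep_real, prune_real]) in (keep_real, prune_real):
        raise ValueError("the two directories must not contain each other")
    keep_hashes = {file_digest(p) for p in iter_files(keep_root)}
    return sorted(p for p in iter_files(prune_root) if file_digest(p) in keep_hashes)


def prune(keep_root, prune_root, dry_run=True):
    """Report (default) or delete files under prune_root that also exist under keep_root."""
    victims = duplicates_in_second(keep_root, prune_root)
    for path in victims:
        if dry_run:
            print("would delete:", path)
        else:
            os.remove(path)
    return victims
```

Calling `prune("dir1", "dir2")` would list both `dir2/somepath/dup1` and `dir2/path/to/dup1` from the example, while files that are duplicated only inside dir2 are left alone. Nothing under dir1 is ever removed.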