03-13-2006
remove duplicate files in a directory
Hi ppl.
I have to check for duplicate files in a directory .
the directory has following files
/the/folder /containing/the/file
a1.yyyymmddhhmmss
a1.yyyyMMddhhmmss
b1.yyyymmddhhmmss
b2.yyyymmddhhmmss
c.yyyymmddhhmmss
d.yyyymmddhhmmss
d.yyyymmddhhmmss
where the date time stamp can be different for the same file hence the risk of duplicate files.
How do i make the validate so that there are no duplicate files to be processed .
Anubha
Last edited by asinha63; 03-14-2006 at 10:25 AM..
Reason: to make the issue clearer
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
I need help with a script which accepts one argument and goes through all the files under a directory and prints a list of possible duplicate files As its output, it prints zero or more lines, each one containing a space-separated list of filenames. All the files listed on one line have the same... (1 Reply)
Discussion started by: trueman82
1 Replies
2. Shell Programming and Scripting
Hi,
is it possible to remove all duplicate lines from all txt files in a specific folder?
This is too hard for me maybe someone could help.
lets say we have an amount of textfiles 1 or 2 or 3 or... maximum 50
each textfile has lines with text.
I want all lines of all textfiles... (8 Replies)
Discussion started by: lowmaster
8 Replies
3. Shell Programming and Scripting
Hi
I have been struggling with a script for removing duplicate messages from a shared mailbox.
I would like to search for duplicate messages based on the “Message-ID” string within the messages files.
I have managed to find the duplicate “Message-ID” strings and (if I would like) delete... (1 Reply)
Discussion started by: spangberg
1 Replies
4. Shell Programming and Scripting
Hi all.
Am doing continuous backup of mailboxes using rsync.
So whenever a new mail arrives it is automatically copied on backup server.
When a new mail arrives it is named as xyz:2, when it is read by the email client an S is appended xyz:2,S
Eventually , 2 copies of the same file exist on... (7 Replies)
Discussion started by: coolatt
7 Replies
5. Shell Programming and Scripting
Hello,
I wrote a basic script that works however I am was wondering if it could be sped up. I am comparing files over ssh to remove the file from the source server directory if a match occurs. Please Advise me on my mistakes.
#!/bin/bash
for file in `ls /export/home/podcast2/"$1" ` ; do
... (5 Replies)
Discussion started by: jaysunn
5 Replies
6. Shell Programming and Scripting
Dear All,
I have multiple files having number of records, consist of more than 10 columns some column values are duplicate and i want to remove these duplicate values from these files.
Duplicate values may come in different files.... all files laying in single directory..
Need help to... (3 Replies)
Discussion started by: arvindng
3 Replies
7. Shell Programming and Scripting
Hello again, I am wanting to remove all duplicate blocks of XML code in a file. This is an example:
input:
<string-array name="threeItems">
<item>item1</item>
<item>item2</item>
<item>item3</item>
</string-array>
<string-array name="twoItems">
<item>item1</item>
<item>item2</item>... (19 Replies)
Discussion started by: raidzero
19 Replies
8. Shell Programming and Scripting
Hi,
In a directory, e.g. ~/corpus is a lot of files and subdirectories. Some of the files are named:
12345___PP___0902___AA.txt
12346___PP___0902___AA. txt
12347___PP___0902___AA. txt
The amount of files varies. I need to keep the highest (12347___PP___0902___AA. txt) and remove... (5 Replies)
Discussion started by: corfuitl
5 Replies
9. Windows & DOS: Issues & Discussions
So, I have text files,
one "fail.txt"
And one
"color.txt"
I now want to use a command line (DOS) to remove ANY line that is PRESENT IN BOTH from each text file.
Afterwards there shall be no duplicate lines. (1 Reply)
Discussion started by: pasc
1 Replies
10. Shell Programming and Scripting
TARGET_DIR='/media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04/'
REGEX='{4}-{2}-{2}_{2}:{2}' # regular expression that match to: date '+%Y-%m-%d_%H:%M'
LATEST_FILE="$(ls "$TARGET_DIR" | egrep "^${REGEX}$" | tail -1)"
find "$TARGET_DIR" ! -name "$LATEST_FILE" -type f -regextype egrep -regex... (7 Replies)
Discussion started by: drew77
7 Replies
LEARN ABOUT CENTOS
hardlink
hardlink(1) General Commands Manual hardlink(1)
NAME
hardlink - Consolidate duplicate files via hardlinks
SYNOPSIS
hardlink [-c] [-n] [-v] [-vv] [-h] directory1 [ directory2 ... ]
DESCRIPTION
This manual page documents hardlink, a program which consolidates duplicate files in one or more directories using hardlinks.
hardlink traverses one or more directories searching for duplicate files. When it finds duplicate files, it uses one of them as the mas-
ter. It then removes all other duplicates and places a hardlink for each one pointing to the master file. This allows for conservation of
disk space where multiple directories on a single filesystem contain many duplicate files.
Since hard links can only span a single filesystem, hardlink is only useful when all directories specified are on the same filesystem.
OPTIONS
-c Compare only the contents of the files being considered for consolidation. Disregards permission, ownership and other differ-
ences.
-f Force hardlinking across file systems.
-n Do not perform the consolidation; only print what would be changed.
-v Print summary after hardlinking.
-vv Print every hardlinked file and bytes saved. Also print summary after hardlinking.
-h Show help.
AUTHOR
hardlink was written by Jakub Jelinek <jakub@redhat.com>.
Man page written by Brian Long.
Man page updated by Jindrich Novy <jnovy@redhat.com>
BUGS
hardlink assumes that its target directory trees do not change from under it. If a directory tree does change, this may result in hardlink
accessing files and/or directories outside of the intended directory tree. Thus, you must avoid running hardlink on potentially changing
directory trees, and especially on directory trees under control of another user.
hardlink(1)