Remove duplicate files in same directory


 
# 1  
Old 02-03-2010

Hi all.

I am doing continuous backup of mailboxes using rsync, so whenever a new mail arrives it is automatically copied to the backup server.
When a new mail arrives it is named xyz:2, and when it is read by the email client an S is appended, giving xyz:2,S.
Eventually, two copies of the same file exist on the backup server under different names, while on the mail server only xyz:2,S exists.

e.g
on mail server:
xyz:2,RS

on backup server:
xyz:2,
xyz:2,S
xyz:2,RS

So in one directory I can have 3 copies of file xyz and 2 copies of file abc.
Can anyone help me remove the older files (e.g. xyz:2,) and keep only the most recent one on the backup server?

Thanks.
# 2  
Old 02-03-2010
I am very new to Unix, but will it help if you use:
ls -rt | tail -1 ==> this will get you the most recent file
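For what it's worth, here is that one-liner exercised on a couple of throwaway files (names and timestamps are hypothetical; GNU touch -d assumed), just to show which name comes out:

```shell
# ls -t sorts by modification time, newest first; -r reverses that,
# so the last line of "ls -rt" is the most recently modified file.
dir=$(mktemp -d)
touch -d '2020-01-01 10:00' "$dir/xyz:2,"
touch -d '2020-01-01 11:00' "$dir/xyz:2,S"
newest=$(ls -rt "$dir" | tail -1)
echo "$newest"    # prints xyz:2,S
```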
# 3  
Old 02-03-2010
rsync with --delete?

Code:
     --delete                delete extraneous files from dest dirs
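For illustration, an invocation using it might look like this (paths and host name are hypothetical; note that --delete mirrors deletions too, so anything removed on the source also disappears from the backup):

```shell
# Mirror the maildir to the backup host, deleting anything on the
# backup that no longer exists on the mail server.
rsync -av --delete /var/vmail/ backup:/var/backup/vmail/
```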

# 4  
Old 02-03-2010
Quote:
Originally Posted by rdcwayx
rsync with --delete?

Code:
     --delete                delete extraneous files from dest dirs

If a mail file is deleted from the mail server, using --delete will delete it on the backup server as well.
I run rsync regularly, and files on the backup server are deleted by a script only when they reach a certain age, e.g. 10 days.

---------- Post updated at 06:39 PM ---------- Previous update was at 05:58 PM ----------

Quote:
Originally Posted by Anu_1
I am very new in unix but will it help if you use..
ls -rt| tail -1 ==> this will get you the most current file
I tried it on the following files:

Quote:
1265199975.P6583Q0M174865.ecs,S=623:2,S
1265199975.P6583Q0M174865.ecs,S=623:2,F
1265199975.P6583Q0M174865.ecs,S=623:2,
1265198625.P6233Q0M875762.ecs,S=639:2,F
1265198625.P6233Q0M875762.ecs,S=639:2,FS
1265198625.P6233Q0M875762.ecs,S=639:2,S
Output is:
Quote:
1265199975.P6583Q0M174865.ecs,S=623:2,
Still no luck.

Can anyone please help? Thanks.
# 5  
Old 02-03-2010

coolatt, if I'm reading this right, you want to delete from the backup server, for each xyz:2,* group, all but the most recent file in that group.

I recreated your directory with the following files. Please note the order in which they were created (ls -rt):

1265199975.P6583Q0M174865.ecs,S=623:2,
1265199975.P6583Q0M174865.ecs,S=623:2,F
1265198625.P6233Q0M875762.ecs,S=639:2,S
1265199975.P6583Q0M174865.ecs,S=623:2,S
1265198625.P6233Q0M875762.ecs,S=639:2,FS
1265198625.P6233Q0M875762.ecs,S=639:2,F

I created a script looking like this ($FILEDIR being the directory where you have the files that are to be checked, on the backup server):

Code:
#!/bin/bash

# Oldest first, so the newest file of each group is the last line.
ls -rt "$FILEDIR" > filelist.txt

# One entry per group: everything before the ":2," flag marker.
awk -F ':2,' '{print $1}' filelist.txt | sort -u > searchbase.txt

while read -r line
do
        # head --lines=-1 (GNU) prints all but the last, i.e. newest, file.
        grep "$line" filelist.txt | head --lines=-1
done < searchbase.txt

Please let me know if this outputs the correct files (it does for me).

If it does, I reckon simply adding a | xargs rm after the head command should delete the older files.

BTW, the script also works if you have just one file for a group (say the original xyz:2, file), in the sense that it will not delete that lone backup.
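Building on that, here is a self-contained sketch with the deletion wired in. It builds its own throwaway directory (hypothetical names and mtimes) so the logic can be tried safely; point FILEDIR at the real backup directory instead. Two details worth noting: grep -F matches the base name literally, and rm needs the directory prefix, since ls prints bare file names:

```shell
#!/bin/bash
# Sketch only: create a sample directory with staggered mtimes.
FILEDIR=$(mktemp -d)
touch -d '2020-01-01 10:00' "$FILEDIR/xyz:2,"
touch -d '2020-01-01 11:00' "$FILEDIR/xyz:2,S"
touch -d '2020-01-01 12:00' "$FILEDIR/xyz:2,RS"
touch -d '2020-01-01 10:30' "$FILEDIR/abc:2,S"

# Oldest first, so the newest file of each group is the last line.
ls -rt "$FILEDIR" > filelist.txt

# One entry per group: everything before the ":2," flag marker.
awk -F ':2,' '{print $1}' filelist.txt | sort -u > searchbase.txt

while read -r base
do
    # head -n -1 (GNU) drops the last, i.e. newest, entry of the group,
    # so only the older duplicates reach rm.
    grep -F "$base" filelist.txt | head -n -1 | while read -r old
    do
        rm -- "$FILEDIR/$old"   # rm needs the directory prefix
    done
done < searchbase.txt

ls "$FILEDIR"   # only abc:2,S and xyz:2,RS remain
```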
# 6  
Old 02-04-2010
Thanks, cmf1985, for the script.

I ran your script on a directory containing the following files:

Quote:
1265198625.P6233Q0M875762.ecs,S=639:2,F
1265198625.P6233Q0M875762.ecs,S=639:2,FS
1265198625.P6233Q0M875762.ecs,S=639:2,S
1265199975.P6583Q0M174865.ecs,S=623:2,
1265199975.P6583Q0M174865.ecs,S=623:2,F
1265199975.P6583Q0M174865.ecs,S=623:2,S
1265201980.P7044Q0M234565.ecs,S=623:2,S
1265201997.P7058Q0M121781.ecs,S=639:2,S
1265202446.P7209Q0M203877.ecs,S=623:2,S
1265202446.P7209Q1M203877.ecs,S=623:2,S
1265202446.P7209Q2M203877.ecs,S=623:2,S
1265202446.P7209Q3M203877.ecs,S=623:2,S
1265202957.P7339Q0M799626.ecs,S=639:2,F
1265202957.P7339Q1M799626.ecs,S=639:2,S
1265202957.P7339Q2M799626.ecs,S=639:2,
1265202957.P7339Q3M799626.ecs,S=639:2,S
I got the following output:

Quote:
1265198625.P6233Q0M875762.ecs,S=639:2,S
1265198625.P6233Q0M875762.ecs,S=639:2,FS
1265199975.P6583Q0M174865.ecs,S=623:2,S
1265199975.P6583Q0M174865.ecs,S=623:2,F
However, as you can see, it didn't work for the following group:

1265202446.P7209Q1M203877.ecs,S=623:2,S
1265202446.P7209Q2M203877.ecs,S=623:2,S
1265202446.P7209Q3M203877.ecs,S=623:2,S

---------- Post updated at 02:22 PM ---------- Previous update was at 12:23 PM ----------

It is not working as expected.
I found another problem when I ran the script:

On the mail server I have the following email files (output of ls -ltc):

Quote:
-rw------- 1 vmail mail 1019 2010-02-04 11:54 1265269890.P4829Q0M586604.ecs,S=1019:2,S

-rw------- 1 vmail mail 619 2010-02-04 11:53 1265269831.P4792Q0M213777.ecs,S=619:2,S

-rw------- 1 vmail mail 645 2010-02-04 11:52 1265269852.P4803Q0M751985.ecs,S=645:2,S
On the backup server I have the following email files (output of ls -ltc):

Quote:
-rw------- 1 500 mail 619 2010-02-04 11:55 1265269831.P4792Q0M213777.ecs,S=619:2,S

-rw------- 1 500 mail 1019 2010-02-04 11:55 1265269890.P4829Q0M586604.ecs,S=1019:2,FS

-rw------- 1 500 mail 645 2010-02-04 11:54 1265269852.P4803Q0M751985.ecs,S=645:2,S

-rw------- 1 500 mail 619 2010-02-04 11:53 1265269831.P4792Q0M213777.ecs,S=619:2,F

-rw------- 1 500 mail 1019 2010-02-04 11:53 1265269890.P4829Q0M586604.ecs,S=1019:2,S

-rw------- 1 500 mail 619 2010-02-04 11:52 1265269831.P4792Q0M213777.ecs,S=619:2,

-rw------- 1 500 mail 645 2010-02-04 11:52 1265269852.P4803Q0M751985.ecs,S=645:2,
After running the script on the backup server:

Quote:
1265269831.P4792Q0M213777.ecs,S=619:2,S

1265269831.P4792Q0M213777.ecs,S=619:2,F

1265269852.P4803Q0M751985.ecs,S=645:2,S

1265269890.P4829Q0M586604.ecs,S=1019:2,S
The script must keep the 3 files that exist on both the mail server and the backup server, and delete the rest from the backup server.

But I think the problem is associated with the timestamps of the files.

Please advise. Thanks.
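If the backup copies are getting transfer-time mtimes rather than the originals' (a guess, based on the listings above), making sure rsync preserves modification times might help; a hypothetical invocation:

```shell
# -a implies -t (preserve modification times), so "ls -rt" on the
# backup would reflect when the mails were written on the mail server,
# not when rsync copied them.  Paths and host name are hypothetical.
rsync -a /var/vmail/ backup:/var/backup/vmail/
```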

Last edited by coolatt; 02-04-2010 at 02:28 AM..
# 7  
Old 02-04-2010
I'm beginning to get a bit confused about what you want... Do you want to have the same files on the mail server and the backup server (as in, get a list of all the files currently on the mail server and delete from the backup server anything that's not in that list)?