Remove Duplicate Files On Remote Servers


 
# 1  
Old 03-09-2010
Remove Duplicate Files On Remote Servers

Hello,
I wrote a basic script that works however I am was wondering if it could be sped up. I am comparing files over ssh to remove the file from the source server directory if a match occurs. Please Advise me on my mistakes.

Code:
#!/bin/bash

for file in `ls /export/home/podcast2/"$1" ` ; do

    if [ "`ssh server1.stm ls /export/home/podcast/data2/$1/$file`" = "/export/home/podcast/data2/$1/$file" ]
        then

         rm -f /export/home/podcast2/$1/$file


    fi
done

I would execute the script as:
Code:
Prompt>./shellscript.sh arg1



Thanks,

Jaysunn

Last edited by jaysunn; 03-09-2010 at 09:48 AM. Reason: added how I execute the code.
# 2  
Old 03-09-2010
Quote:
Originally Posted by jaysunn
I am comparing files over ssh to remove the file from the source server directory if a match occurs.
You are connecting to the server once for each file, which is slow; you should connect only once. There are many ways of doing it; here is one:

Code:
ssh server ls remotedir | ( cd localdir && xargs -d"\n" rm )

Anyway, this whole idea is a bit weird. If this involves syncing files, rsync is the right tool.
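If it does, something along these lines might work, assuming your rsync is new enough to support --remove-source-files (a rough sketch, not tested against your layout; run it with -n first to see what it would transfer and delete):

Code:
# Push everything under the local directory to server1, deleting each
# local file only after it has been transferred successfully.
# Note: this moves new files too, not just ones already on the server.
rsync -av --remove-source-files /export/home/podcast2/"$1"/ \
    server1.stm:/export/home/podcast/data2/"$1"/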
# 3  
Old 03-09-2010
Whoa,
This is exactly what I am looking for. I will have to test it, but thanks for the reply.

Jaysunn

---------- Post updated at 09:38 AM ---------- Previous update was at 09:31 AM ----------

I have modified the script to work with my variables. However, I am getting an xargs error.

I am on RHEL4 with bash. I checked the man page and did not see the -d option.


Code:
#!/bin/bash
server=podcast01.stm


ssh $server ls /export/home/podcast/data2/"$1" | ( cd /export/home/podcast2/"$1" && xargs -d"\n" echo rm -f)


Code:
[root@podcast2 bin]# ./remove_dups.sh kmox
xargs: invalid option -- d
Usage: xargs [-0prtx] [-E eof-str] [-e[eof-str]] [-I replace-str]
       [-i[replace-str]] [-L max-lines] [-l[max-lines]] [-n max-args]
       [-s max-chars] [-P max-procs] [--null] [--eof[=eof-str]]
       [--replace[=replace-str]] [--max-lines[=max-lines]] [--interactive]
       [--max-chars=max-chars] [--verbose] [--exit] [--max-procs=max-procs]
       [--max-args=max-args] [--no-run-if-empty] [--version] [--help]
       [command [initial-arguments]]


Thanks
# 4  
Old 03-09-2010
Quote:
Originally Posted by jaysunn
I have modified the script to work with my variables. However I am getting a xargs error.

I am on RHEL4 with bash. I checked the man page and did not see the -d option.
The -d option sets the delimiter that separates input items. If your xargs does not have this option, you can omit it; it will work fine except for filenames containing spaces.
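For example, your line from post #3 with -d dropped (untested; leave the echo in until the output looks right):

Code:
ssh $server ls /export/home/podcast/data2/"$1" | ( cd /export/home/podcast2/"$1" && xargs echo rm -f )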
# 5  
Old 03-09-2010
Quote:
Originally Posted by tokland
The -d option sets the delimiter that separates input items. If your xargs does not have this option, you can omit it; it will work fine except for filenames containing spaces.
That's understating the matter. It will not work properly for filenames with spaces, tabs, newlines, single quotes, and double quotes.

You can improve the robustness of the pipeline by passing the output of ssh through
Code:
tr '\n' '\0'

and using xargs' -0 option. This will render it impervious to any characters except embedded newlines in filenames (which I assume is very unlikely to occur unless someone has been drinking and admining). If you retool to use `find -print0`, then there'd be no need for the tr filtering and even embedded newlines would be handled properly.
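Putting that together with the pipeline from post #3, a sketch might look like this (untested; the echo stays in place until the output has been verified):

Code:
#!/bin/bash
# Same paths as before; tr converts the newline-separated ls output into
# NUL-separated items so xargs -0 can cope with spaces, tabs and quotes.
# Filenames with embedded newlines would still be mishandled.
server=podcast01.stm

ssh "$server" ls /export/home/podcast/data2/"$1" |
    tr '\n' '\0' |
    ( cd /export/home/podcast2/"$1" && xargs -0 echo rm -f )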

Also, the rm command in the original post needs some quoting to prevent field splitting damage.
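That is, the argument should be quoted, using the same variables as in the first post:

Code:
rm -f "/export/home/podcast2/$1/$file"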

Regards,
Alister
# 6  
Old 03-10-2010
Quote:
Originally Posted by alister
That's understating the matter. It will not work properly for filenames with spaces, tabs, newlines, single quotes, and double quotes.

Code:
tr '\n' '\0'

and using xargs' -0 option.
Absolutely.

Actually, I have this tr command aliased (lineto0) but I didn't remember it.
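(Presumably something along these lines; this is only a guess at how the alias is defined:)

Code:
alias lineto0="tr '\n' '\0'"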