Remove duplicate files


 
# 1  
Old 03-27-2012
Remove duplicate files

Hi,

In a directory, e.g. ~/corpus, there are a lot of files and subdirectories. Some of the files are named:

Code:
 
12345___PP___0902___AA.txt
12346___PP___0902___AA.txt
12347___PP___0902___AA.txt

The number of files varies. I need to keep the highest one (12347___PP___0902___AA.txt) and move the others to a new directory. Could you help me? Is there a script for this?
Thank you for your help in advance.
# 2  
Old 03-27-2012
Off the top of my head, test before using in the real world!!
Code:
latest=0
for i in $(find ~/corpus -name \*___PP___0902___AA.txt); do
    current=$(basename $i | cut -d_ -f1)
    if [ $current -lt $latest ] ; then
        #rm $i
        echo we would delete $i" # I HAVE NOT TESTED THIS CODE
    else
        latest=$current
    fi
done
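
For what it's worth, the ID extraction can be sanity-checked on its own first, using one of the example names from the question:
Code:
$ basename 12345___PP___0902___AA.txt | cut -d_ -f1
12345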

# 3  
Old 03-27-2012
Thanks for the reply! Is that bash? Sorry, I am new to Linux.
Can I replace *___PP___0902___AA.txt with *.txt? Because there are files named
Code:
12345___PP___0902___AA.txt
12346___PP___0902___AA.txt
12347___PP___0902___AA.txt
12345___PP___0903___AA.txt
12346___PP___0903___AA.txt
12347___PP___0903___AA.txt
...

And another question: what if I need to keep the file with the smallest ID?

Thanks a lot!
# 4  
Old 03-27-2012
Yes, it's bash.
You could certainly use \*.txt in the find command.
If you need to build a list of everything that is not the highest (or lowest), you could use an array or a temp file, and set the first file you see to be both the highest and the lowest.
The rm command should be replaced by writing the names of the files to be moved into your array/temp file.
Then post-process with a mv command for everything in the array/temp file, possibly recreating any directory structure underneath with a mkdir -p $new_dir/$(dirname $i). A rough sketch of that approach follows.
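
Untested, and the backup location and pattern are just placeholders, but something like this keeps the lowest ID in each run and queues everything else for moving:
Code:
#!/bin/bash
# Rough, untested sketch -- adjust the pattern and destination, then test on a copy first.
new_dir="$HOME/corpus_backup"   # placeholder destination

lowest=""        # ID of the file we currently intend to keep
keep=""          # path of that file
to_move=()       # everything else ends up here

for i in $(find ~/corpus -name '*___PP___0902___AA.txt'); do
    current=$(basename "$i" | cut -d_ -f1)
    if [ -z "$lowest" ] || [ "$current" -lt "$lowest" ]; then
        # found a new lowest ID: the previous keeper becomes a move candidate
        [ -n "$keep" ] && to_move+=("$keep")
        lowest=$current
        keep=$i
    else
        to_move+=("$i")
    fi
done

for f in "${to_move[@]}"; do
    dest="$new_dir/$(dirname "$f")"
    mkdir -p "$dest"         # recreate the directory structure underneath
    echo mv "$f" "$dest"     # drop the echo once the output looks right
done

Flipping the -lt to -gt would keep the highest ID instead.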
# 5  
Old 03-28-2012
I am sorry, but it does not work. I cannot run it.

Code:
dupl.sh: line 8: unexpected EOF while looking for matching `"'
dupl.sh: line 13: syntax error: unexpected end of file

# 6  
Old 03-28-2012
Code:
latest=0
back_up="$HOME/backup" # or whatever makes sense for your use case
for i in $(find ~/corpus -name \*.txt); do
    current=$(basename $i | cut -d_ -f1)
    if [ $current -lt $latest ] ; then
        if [ ! -d $back_up/$(dirname $i) ] ; then
            #mkdir -p $back_up/$(dirname $i)
            echo "We would have created the $back_up/$(dirname $i) directory"
        fi
        # mv $i $back_up/$(dirname $i)
        echo "We would have moved $i to $back_up/$(dirname $i)" # I HAVE NOT TESTED THIS CODE
    else
        latest=$current
    fi
done

Strangely enough, untested stream-of-consciousness code doesn't work. Fortunately, the parser explained why it baulked: there was a terminating quote on a string with no opening quote.
Try the above and then modify it as outlined.
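
Once the echoed paths look right, the commented-out mkdir -p and mv lines inside the loop can be enabled (and the echo lines dropped), i.e.:
Code:
mkdir -p $back_up/$(dirname $i)
mv $i $back_up/$(dirname $i)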