To Find Duplicate files using latest in Linux


# 1  
To Find Duplicate files using latest in Linux

I have tried the following code, but I couldn't achieve what I want with it.

Code:
    #!/usr/bin/bash
    # Build a sorted list of all .xml files under the current directory.
    find ./ -type f \( -iname "*.xml" \) | sort > fileList
    # Drop the list file itself and this script from the list.
    sed -i '/\.\/fileList/d' fileList
    NAMEOFTHISFILE=$(printf '%s\n' "$0" | sed -e 's/[]\/()$*.^|[]/\\&/g')
    sed -i "/$NAMEOFTHISFILE/d" fileList
    cp fileList auxFileList
    while IFS= read -r FILENAME
    do
        sed -i '1d' auxFileList
        #echo "Comparing $FILENAME with :"
        # Compare the current file with every file that follows it in the list.
        while IFS= read -r COMPFILENAME
        do
            # cmp -s exits 0 only when the two files are byte-identical.
            if cmp -s "$FILENAME" "$COMPFILENAME"
            then
                awk 'BEGIN { FS="_" } { printf("%03d\n", $2) }' "$FILENAME" | sort | awk '{ printf("data_%d_box\n", $1) }'
                #echo "$FILENAME AND $COMPFILENAME are identical"
                #rm -r $FILENAME
            fi
            #echo "  $COMPFILENAME"
        done < auxFileList
    done < fileList
    rm -f fileList auxFileList
    printf '\n\n'

This code selects all the files initially. I have to amend it so that only the most recently modified file of each filename pattern is picked, for example:
Code:
    File 1: AAA_555_0000 
    File 2: AAAA_123_123 
    File 3: AAAA_452_452 [latest]
    
    File 4: BBB_555_0000 
    File 5: BBB_555_555 
    File 6: BBB_999_999 [latest]
    
    File 7: CCC_555_0000 
    File 8: CCC_000_000 
    File 9: CCC_000_111 [latest]

The script has to pick the latest file of each filename pattern in the folder, compare the files, and delete the duplicates.

I would appreciate it if you could help me with this logic.

Thanks much!
# 2  
Script for removing the Duplicate files except latest in filename series

I have a folder with a series of files whose names follow patterns like the ones below.
Code:
./ARCZ00300_010117_1504690829222.xml
./ARCZ00300_010117_1507101655366.xml [latest]
./ARCZ00301_010117_1504691829478.xml
./ARCZ00301_010117_1507101655591.xml  [latest]
./ARCZ00302_010117_1504691451495.xml
./ARCZ00302_010117_1507101656182.xml  [latest]
./ARCZ00303_010117_1504691526615.xml
./ARCZ00303_010117_1507101657147.xml  [latest]
./ARCZ00304_010117_1504691981689.xml
./ARCZ00304_010117_1507101657249.xml  [latest]
./ARCZ00305_010117_1507101657610.xml
./ARCZ00306_010117_1507101658585.xml
./ARCZ00307_010117_1504691981668.xml
./ARCZ00307_010117_1507101658940.xml  [latest]
./ARCZ00577_010117_1504692004529.xml
./ARCZ00580_010117_1504691562602.xml
./ARCZ00580_010117_1507101892930.xml  [latest]


The script has to pick the latest file of each filename pattern in the folder, compare the files, and delete the duplicates.

I would appreciate it if you could help me with this logic.

Thanks much!
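
The "pick the latest file of each pattern" step could be sketched roughly as below. This is only an illustration, not a tested solution for the real folder: the function name latest_per_series is made up, and it assumes that every name has the form PREFIX_DATE_TIMESTAMP.xml, that timestamps within a series have the same number of digits (so name order matches age), and that names contain no newlines.

```shell
#!/bin/bash
# Print the newest file of every series in a directory (default: current one).
# Hypothetical helper; assumes names like PREFIX_DATE_TIMESTAMP.xml, so that
# reverse name order puts the newest file of each series first.
latest_per_series() {
    local dir=${1:-.} last= file prefix
    ( cd "$dir" && printf '%s\n' *.xml ) | sort -r | while IFS= read -r file
    do
        [ -e "$dir/$file" ] || continue   # the glob matched nothing
        prefix=${file%%_*}                # text before the first underscore
        if [ "$prefix" != "$last" ]
        then
            printf '%s\n' "$file"         # first hit of a series is its newest
            last=$prefix
        fi
    done
}
```

Calling latest_per_series . in the folder above should list only the files marked [latest], plus the files that are the only member of their series.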
# 3  
Maybe this would come closer to what you want:
Code:
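#!/bin/bash
# List the files in reverse name order so that the newest timestamp of each
# series comes first; any later file with the same prefix is an older copy.
ls -r *.xml | while read -r file
do      if [ "$last" = "${file%%_*}" ]
        then    echo rm -- "$file"
        else    last=${file%%_*}
        fi
done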

If that gives you the list of files you want to remove, remove the echo in front of rm and run the script again.
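
The script above decides by file name only. If "duplicate" is meant as "same content" (an assumption, since the thread does not say so explicitly), a variant that also compares each older file with the newest one of its series might look like this sketch. The function name list_identical_older is hypothetical, and the same naming assumptions apply: PREFIX_DATE_TIMESTAMP.xml, equal-length timestamps, no newlines in names.

```shell
#!/bin/bash
# Dry run: print an rm command for every older file of a series whose content
# is byte-identical to the newest file of that series.
# Assumption: "duplicate" means identical content; names sort by age.
list_identical_older() {
    local dir=${1:-.} last= newest= file prefix
    ( cd "$dir" && printf '%s\n' *.xml ) | sort -r | while IFS= read -r file
    do
        [ -e "$dir/$file" ] || continue   # the glob matched nothing
        prefix=${file%%_*}
        if [ "$prefix" != "$last" ]
        then
            # First file of a series in reverse order: the one to keep.
            last=$prefix
            newest=$file
        elif cmp -s "$dir/$newest" "$dir/$file"
        then
            # cmp -s exits 0 only when both files are byte-identical.
            echo rm -- "$dir/$file"
        fi
    done
}
```

Run list_identical_older . first and check the printed commands; older files whose content differs from the newest one are left alone. As with the script above, dropping the echo turns the dry run into the real deletion.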