Post 303007616 by gold2k8 on Saturday 18th of November 2017 11:40:56 PM
To Find Duplicate files using latest in Linux

I have tried the following code, but with it I couldn't achieve what I want.

Code:
    #!/usr/bin/bash
    # Build a sorted list of all XML files below the current directory.
    find ./ -type f \( -iname "*.xml" \) | sort -n > fileList
    # Drop the list file and this script from the list.
    sed -i '/\.\/fileList/d' fileList
    NAMEOFTHISFILE=$(echo "$0" | sed -e 's/[]\/()$*.^|[]/\\&/g')
    sed -i "/$NAMEOFTHISFILE/d" fileList
    cp fileList auxFileList
    while IFS= read -r FILENAME
    do
        # Remove the current file so that each pair is compared only once.
        sed -i '1d' auxFileList
        #echo "Comparing $FILENAME with :"
        #Read the aux file and compare current file with every other element in the file
        while IFS= read -r COMPFILENAME
        do
            # cmp -s exits 0 when the two files are byte-identical.
            if cmp -s -- "$FILENAME" "$COMPFILENAME"
            then
                awk ' BEGIN { FS="_" } { printf( "%03d\n",$2) }' "$FILENAME" | sort | awk ' { printf( "data_%d_box\n", $1)  }'
                #echo "$FILENAME AND $COMPFILENAME are identical"
                #rm -r $FILENAME
            fi
            #echo "  $COMPFILENAME"
        done<auxFileList
    done<fileList
    rm -f fileList auxFileList
    printf '\n\n'

This code selects all the files initially. I have to amend it so that only the most recently modified file of each filename pattern is picked, for example:
Code:
    File 1: AAA_555_0000 
    File 2: AAAA_123_123 
    File 3: AAAA_452_452 [latest]
    
    File 4: BBB_555_0000 
    File 5: BBB_555_555 
    File 6: BBB_999_999 [latest]
    
    File 7: CCC_555_0000 
    File 8: CCC_000_000 
    File 9: CCC_000_111 [latest]

The script has to pick the latest file of each filename pattern in the folder, compare those files, and delete the duplicates.

I would appreciate it if you could help me with this logic.

Thanks much!
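
One possible way to approach the logic described above is sketched below. It is not a tested solution for the poster's data, and it rests on several assumptions that are not stated in the post: the "pattern" is taken to be the part of the file name before the first underscore, "latest" means the newest modification time, all files sit in one directory (no recursion), and file names contain no newlines. The function names `latest_per_prefix` and `report_duplicates` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: pattern = text before the first "_",
# "latest" = newest modification time, one flat directory.

# Print the most recently modified file of each prefix in directory $1.
# If several files of a prefix share the newest mtime, all of them are printed.
latest_per_prefix() {
    local dir=$1 file other base prefix newest
    for file in "$dir"/*_*; do
        [ -f "$file" ] || continue
        base=${file##*/}        # strip the directory part
        prefix=${base%%_*}      # keep everything before the first "_"
        newest=yes
        for other in "$dir/$prefix"_*; do
            [ -f "$other" ] || continue
            # -nt is true when $other was modified later than $file
            if [ "$other" -nt "$file" ]; then
                newest=no
                break
            fi
        done
        if [ "$newest" = yes ]; then
            printf '%s\n' "$file"
        fi
    done
}

# Read file names on stdin. For every file whose content is byte-identical
# to an earlier file in the list, print "duplicate<TAB>first-seen file".
report_duplicates() {
    local -a seen=()
    local file other
    while IFS= read -r file; do
        for other in "${seen[@]}"; do
            if cmp -s -- "$file" "$other"; then
                printf '%s\t%s\n' "$file" "$other"
                continue 2      # go on with the next name from stdin
            fi
        done
        seen+=("$file")
    done
}
```

Running `latest_per_prefix . | report_duplicates` only prints the duplicate pairs and deletes nothing. Once the output looks right, the names in the first column could be passed to `rm`.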
 

FileList(3pm)						User Contributed Perl Documentation					     FileList(3pm)

NAME
       File::RsyncP::FileList - Perl interface to rsync file list encoding
       and decoding.

SYNOPSIS
           use File::RsyncP::FileList;

           $fileList = File::RsyncP::FileList->new({
               preserve_uid        => 1,
               preserve_gid        => 1,
               preserve_links      => 1,
               preserve_devices    => 1,
               preserve_hard_links => 0,
               always_checksum     => 0,
               remote_version      => 26,
           });

           # decoding an incoming file list
           while ( !$fileList->decodeDone && !$fileList->fatalError ) {
               $data .= readMoreDataFromRemoteRsync();
               $bytesDone = $fileList->decode($data);
               $data = substr($data, $bytesDone) if ( $bytesDone > 0 );
           }
           $fileList->clean;

           # create (encode) a file list
           $fileList->encode({
               name  => $filePath,
               dev   => $stat[0],
               inode => $stat[1],
               mode  => $stat[2],
               uid   => $stat[4],
               gid   => $stat[5],
               rdev  => $stat[6],
               mtime => $stat[9],
           });
           $data = $fileList->encodeData;

           # get file information, for file number 5:
           $fileInfo = $fileList->get(5);

           # utility functions
           $numberOfFiles = $fileList->count;
           $gotFatalError = $fileList->fatalError;

DESCRIPTION
       The File::RsyncP::FileList module is used to encode and decode file
       lists using the same format as Rsync.

       The sender side of Rsync creates a list of all the files that are
       going to be sent. This list is sent in a compact format to the
       receiver side. Each side then sorts the list and removes duplicate
       entries. From this point on, all files are referred to by their
       integer index into the sorted file list.

       A new file list object is created by calling
       File::RsyncP::FileList->new. An object can be used to decode or
       encode a file list. There is no mechanism to reset the state of a
       file list: you should create a new object each time you need to do a
       new decode or encode.

       The new() function takes a hashref of options, which correspond to
       various rsync command-line switches. These must exactly match the
       arguments to the remote rsync, otherwise the file list format will
       not be compatible and decoding will fail.

           $fileList = File::RsyncP::FileList->new({
               preserve_uid        => 1,       # --owner
               preserve_gid        => 1,       # --group
               preserve_links      => 1,       # --links
               preserve_devices    => 1,       # --devices
               preserve_hard_links => 0,       # --hard-links
               always_checksum     => 0,       # --checksum
               remote_version      => 26,      # remote protocol version
           });

   Decoding
       The decoding functions take a stream of bytes from the remote rsync
       and convert them into an internal data structure. Rather than store
       the file list as a native perl list of hashes (which occupies too
       much memory for large file lists), the same internal data structure
       as rsync is used. Individual file list entries can be returned with
       the get() function.

       File list data read from the remote rsync should be passed to the
       decode() function. The data may be read and processed in arbitrary
       sized chunks. The decode() function returns how many bytes were
       actually processed. It is the caller's responsibility to remove that
       number of bytes from the input argument, preserving the remaining
       bytes for the next call to decode().

       The decodeDone() function returns true when the file list is
       complete. The fatalError() function returns true if there was a
       non-recoverable error while decoding.

       The clean() function needs to be called after the file list decode
       is complete. The clean() function sorts the file list and removes
       repeated entries. Skipping this step will produce unexpected
       results: since files are referred to using integers, each side will
       refer to different files if the file lists are not sorted and purged
       in exactly the same manner.

       A typical decode loop looks like:

           while ( !$fileList->decodeDone && !$fileList->fatalError ) {
               $data .= readMoreDataFromRemoteRsync();
               $bytesDone = $fileList->decode($data);
               $data = substr($data, $bytesDone) if ( $bytesDone > 0 );
           }
           $fileList->clean;

       After clean() is called, the number of files in the file list can be
       found by calling count(). Files can be fetched by calling the get()
       function, with an index from 0 to count()-1:

           $fileInfo = $fileList->get(5);

       The get() function returns a hashref with various entries:

           name      path name of the file (relative to rsync dir):
                     equal to dirname/basename
           basename  file name, without directory
           dirname   directory where file resides
           sum       file MD4 checksum (only present if --checksum specified)
           uid       file user id
           gid       file group id
           mode      file mode
           mtime     file modification time
           size      file length
           dev       device number on which file resides
           inode     file inode
           link      link contents if the file is a sym link
           rdev      major/minor device number if file is char/block special

       Various fields will only have valid values if the corresponding
       options are set (eg: uid if preserve_uid is set, dev and inode if
       preserve_hard_links is set etc).

       For example, to dump out each hash you could do this:

           use Data::Dumper;
           my $count = $fileList->count;
           for ( my $i = 0 ; $i < $count ; $i++ ) {
               print("File $i is: ");
               print Dumper($fileList->get($i));
           }

   Encoding
       The encode() function is used to build a file list in preparation
       for encoding and sending a file list to a remote rsync. The encode()
       function takes a hashref argument with the parameters for one file.
       It should be called once for each file. The parameter names are the
       same as those returned by get().

       In this example the matching stat() values are shown:

           $fileList->encode({
               name  => $filePath,
               dev   => $stat[0],
               inode => $stat[1],
               mode  => $stat[2],
               uid   => $stat[4],
               gid   => $stat[5],
               rdev  => $stat[6],
               size  => $stat[7],
               mtime => $stat[9],
           });

       It is not necessary to specify basename and dirname; these are
       extracted from name. You only need to specify the parameters that
       match the options given to new(). You can also specify sum and link
       as necessary.

       To compute the encoded file list data the encodeData() function
       should be called. It can be called every time encode() is called, or
       once at the end of all the encode() calls. It returns the encoded
       data that should be sent to the remote rsync:

           $data = $fileList->encodeData;

       It is recommended that encodeData() be called frequently to avoid
       the need to allocate large internal buffers to hold the entire
       encoded file list. Since encodeData() does not know when the last
       file has been encoded, it is the caller's responsibility to add the
       final null byte (eg: pack("C", 0)) to the data to indicate the end
       of the file list data.

       After all the file list entries are processed you should call
       clean():

           $fileList->clean;

       This ensures that each side (sender/receiver) has identical sorted
       file lists.

   Utility functions
       The count() function returns the total number of files in the
       internal file list (either decoded or encoded).

       The fatalError() function returns true if a fatal error has occurred
       during file decoding. It should be called in the decode loop to make
       sure no error has occurred.

AUTHOR
       File::RsyncP::FileList was written by Craig Barratt
       <cbarratt@users.sourceforge.net> based on rsync 2.5.5.

       Rsync was written by Andrew Tridgell <tridge@samba.org> and Paul
       Mackerras. It is available under a GPL license. See
       http://rsync.samba.org

LICENSE
       This program is free software; you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License in
       the LICENSE file along with this program; if not, write to the Free
       Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
       02111-1307 USA.

SEE ALSO
       See <http://perlrsync.sourceforge.net> for File::RsyncP's
       SourceForge home page.

       See File::RsyncP and File::RsyncP::FileIO for more precise examples
       of using File::RsyncP::FileList.

       Also see BackupPC's lib/BackupPC/Xfer/RsyncFileIO.pm for other
       examples.

perl v5.14.2                      2010-07-25                     FileList(3pm)