Script to compare files in 2 folders and delete the large file
Post 302999860 by jim mcnamara on Thursday 29th of June 2017 12:58:03 AM
If I understand correctly:

You have one base filename with several different extensions (or, in Windows terms, file types),
for example filename.aa, filename.qb, filename.abcd, and maybe more.

If this is correct, you need to group all of the complete filenames by just the part before the first dot in the filename.
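
To make the grouping key concrete, here is a small sketch of how the part before the first dot can be cut out of a full path with shell parameter expansion. The function name short_name is made up for this example and the paths are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: reduce a full path to the part of the
# file name that comes before the first dot.
short_name() {
    base=${1##*/}                  # drop the directory part, like basename
    printf '%s\n' "${base%%.*}"    # drop everything from the first dot onward
}

short_name /path/to/directory1/filename.aa      # prints: filename
short_name /path/to/directory2/filename.abcd    # prints: filename
```

Both calls give the same key, which is what lets the later steps treat the two files as one group.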

What you need for input is
Code:
 the filename with no directory name and without a type
 size of the file in bytes
 the full filename  (directory/filename.filetype)

The output has to be the full filename, and maybe the size, but only for the largest file (in bytes) in each group.
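
As an illustration of one input record, the sketch below builds the three fields for a single file. The name record_for is made up, and it uses wc -c for the size on the assumption that GNU stat may not be available everywhere; it also assumes there are no spaces in the path.

```shell
#!/bin/sh
# Hypothetical helper: print "shortfile size fullname" for one file.
# Assumption: the path contains no whitespace.
record_for() {
    fname=$1
    base=${fname##*/}                           # file name without directory
    size=$(wc -c < "$fname" | tr -d ' ')        # size in bytes, padding removed
    printf '%s %s %s\n' "${base%%.*}" "$size" "$fname"
}

# usage (the path is a placeholder):
# record_for /path/to/directory1/filename.aa
```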

You then LOOK at the output to make sure you did not screw up somehow, right?

Then finally you feed the full filenames in the output file to the rm command.

So:
Code:
# get all the filenames in one place -> /tmp/list
find /path/to/directory1 /path/to/directory2 -type f > /tmp/list
#  you now have all the file names
#
# rewrite /tmp/list to have the correct values
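while IFS= read -r fname   # fname is the complete file name
do
      shortfile=$(basename "$fname")
      shortfile=${shortfile%%.*}       # strip everything from the first dot on
      size=$(stat -c '%s' "$fname")    # size in bytes (GNU stat syntax)
      
      printf '%s %s %s\n' "$shortfile" "$size" "$fname"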
done < /tmp/list > /tmp/next

# /tmp/next has the data, so let's sort and aggregate it - assuming no spaces in the file or directory names
# sort by shortfile, then numerically by size, so the largest file of each group comes last

sort -k1,1 -k2,2n -o /tmp/next /tmp/next

# aggregate
# awk fields are $1 - shortfile, $2 - size,  $3 - fullname
awk '{ 
         arr[$1]=$3 " " $2  # note that the last values to be stored for shortfile
                            # come from the last time shortfile is in the file,
                            # which after the sort is the largest one
        }
         END { for (i in arr) {print arr[i]} }
        ' /tmp/next > /tmp/final
        
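# delete ONLY after you check /tmp/final
# (a shortfile that occurs only once is listed too, so look carefully)
# each line is "fullname size"; read the size into its own variable so rm never sees it
while read -r fname size
do 
     rm -- "$fname"
done < /tmp/final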
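
For the checking step, a dry run can help. The sketch below is only one way to do it: delete_listed and DRY_RUN are names made up for this example, and it assumes each line of the list is "fullname size" with no spaces in the names.

```shell
#!/bin/sh
# Hypothetical helper: remove every path listed in a file ("fullname size" per line).
# With DRY_RUN=1 (the default in this sketch) it only prints what it would remove.
delete_listed() {
    list=$1
    while read -r fname size    # size is read only to keep it away from rm
    do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'would remove: %s\n' "$fname"
        else
            rm -- "$fname"
        fi
    done < "$list"
}

# usage:
# delete_listed /tmp/final              # just shows the candidates
# DRY_RUN=0 delete_listed /tmp/final    # really deletes
```

Defaulting to the dry run means a forgotten variable can only print, never delete.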

This code is meant more for learning than for production use. Others will show you how to make it more efficient. You need to understand this one first.
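
As a hint of what a more compact version might look like, here is a rough sketch that does the same selection in one pipeline, without temporary files. largest_per_name is a made-up name. The sketch assumes GNU stat and no whitespace in any path. Like the script above, it also lists a short name that occurs only once, so the output still needs to be checked before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper: print the full path of the largest file for each short name.
# Assumptions: GNU stat (-c), no whitespace in any path.
largest_per_name() {
    find "$@" -type f -exec stat -c '%s %n' {} + |
    awk '{
        n = split($2, parts, "/")        # last path component is the file name
        key = parts[n]
        sub(/\..*$/, "", key)            # keep only the part before the first dot
        if (!(key in max) || $1 + 0 > max[key]) {
            max[key] = $1 + 0            # remember the biggest size seen so far
            path[key] = $2
        }
    }
    END { for (k in path) print path[k] }'
}

# print the candidates for review; nothing is deleted here
# largest_per_name /path/to/directory1 /path/to/directory2 > /tmp/final
```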