Script to compare files in 2 folders and delete the large file


# 1  
06-28-2017

Hello, my first thread here.

I've been searching and fiddling around for about a week and I cannot find a solution.

I have been converting all of my home videos to HEVC and sometimes the files end up smaller and sometimes they don't. I am currently comparing all the video files manually and it takes up quite a bit of time.

I was wondering if there is a script that can check the 2 folders and delete the larger of the 2 files and keep the smaller one.

I have the original videos in one directory and the converted in another directory. The filenames are always the same but sometimes the extensions are different.

E.g., the destination output file will always have the .mkv extension, but the original may have .avi, .mpg, .mp4, etc. The filenames themselves will always be the same.
# 2  
06-28-2017
Welcome to the forum!

What you request is certainly possible and may have been posted, at least in part, in these fora. Did you try a search with your keywords? When "fiddling around", what did you attempt, and with what tools? Where did it fail, or where did you get stuck?

Please get into the habit of providing decent context info for your problem.
It is always helpful to support a request with system info like OS and shell, related environment (variables, options), preferred tools, adequate (representative) sample input and desired output data and the logic connecting the two, and, if any, system (error) messages verbatim, to avoid ambiguities and keep people from guessing.
# 3  
06-28-2017
I searched many forums before deciding to join here. I usually find a solution but this one has proven to be hard to find.

If the answer has already been posted I apologize for creating another post about it.

I am running Ubuntu 17.04 Server on my encoding machine, which sits in my basement, and I access it through SSH. I am thinking of using diff along with a bash script to determine whether the original or the re-encoded file is smaller and then have it delete the larger of the two files.

I got as far as playing around with diff a little bit but I am not a script writer so I have no idea how to implement what I want to do into an efficient script.
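A note on the approach: diff (like cmp) compares the contents of two files, so it can tell you that they differ but not which one is larger. The byte size is what matters here, and GNU stat -c '%s' (the same call used later in this thread) prints it directly. A minimal sketch, assuming GNU coreutils as on Ubuntu; larger_file is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: decide which of two files is larger by byte size instead of by content.
# Uses GNU coreutils stat (as on Ubuntu); BSD/macOS stat takes different options.

# larger_file A B: print the path of the larger file (on a tie, A is printed)
larger_file() {
    local a=$1 b=$2 size_a size_b
    size_a=$(stat -c '%s' -- "$a") || return 1
    size_b=$(stat -c '%s' -- "$b") || return 1
    if [ "$size_b" -gt "$size_a" ]; then
        printf '%s\n' "$b"
    else
        printf '%s\n' "$a"
    fi
}

# Demo with two throwaway files of 5 and 19 bytes
tmp=$(mktemp -d)
printf 'short' > "$tmp/original.avi"
printf 'much longer content' > "$tmp/reencoded.mkv"
larger_file "$tmp/original.avi" "$tmp/reencoded.mkv"   # prints .../reencoded.mkv
rm -rf "$tmp"
```

Whichever path it prints is the candidate for deletion; the smaller file is the one to keep.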

I will try searching the forum again.
# 4  
06-29-2017
IF I understand:

You have one filename with several different extensions (or, in Windows, file types),
for example filename.aa, filename.qb, filename.abcd, and maybe more.

If this is correct, you need to group all of the complete filenames by just the part before the dot in the filename.
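One detail matters for names with more than one dot, such as home.video.01.avi (the naming that shows up later in this thread): "the part before the dot" has to mean the part before the last dot. In bash, ${name%.*} removes only the last extension, while ${name%%.*} cuts everything from the first dot. A small illustration; strip_ext is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: strip the directory and the extension with bash parameter expansion.
#   ${var##*/} removes the longest prefix matching "*/" (the directory part)
#   ${var%.*}  removes the shortest suffix matching ".*" (only the last extension)
#   ${var%%.*} removes the longest suffix matching ".*" (everything from the first dot)

# strip_ext PATH: print the file name without its directory and last extension
strip_ext() {
    local name=${1##*/}
    printf '%s\n' "${name%.*}"
}

name=home.video.01.avi
echo "${name%.*}"     # home.video.01
echo "${name%%.*}"    # home
strip_ext /home/josh52180/originals/home.video.01.avi   # home.video.01
```

With %% every file in this thread would collapse into the single group "home".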

What you need for input is
Code:
 the filename with no directory name and without a type
 size of the file in bytes
 the full filename  (directory/filename.filetype)

The output has to be the full filename (and maybe the size), but only for the largest file of each group.

You then LOOK at the output to make sure you did not screw up somehow, right?

Then finally you feed the full filenames in the output file to the rm command.

So:
Code:
# get all the filenames in one place -> /tmp/list
find /path/to/directory1 /path/to/directory2 -type f > /tmp/list
#  you now have all the file names
#
# rewrite /tmp/list to have the correct values
while read fname   # fname is the complete file name
do
      shortfile=$(basename $fname)
      shortfile=${shortfile%%.*}
      size=(stat -c '%s' $fname)
      
      print " $shortfile $size $fname"
done < /tmp/list > /tmp/next

# /tmp/next has the data, so let's sort and aggregate it -  assuming no spaces in the shortfile name
# sort by shortfile

sort -k1 -k2n -o /tmp/next /tmp/next

# aggregate
# awk fields are $1 - shortfile, $2 - size,  $3 - fullname
awk '{ 
         arr($1)=$3 " " $2  # note that the last values to be stored for shortfile
                                   # come from  the last time shortfile is in the file
                                   
        }
         END { for (i in arr) {print arr(i)} }
        ' /tmp/next > /tmp/final
        
# delete ONLY after you check /tmp/final
while read fname
do 
     rm $fname
done < /tmp/final
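A caution on the deletion step: each line of /tmp/final holds two fields, the full name and the size (the awk step prints $3 " " $2), so "read fname" puts both into fname and the unquoted $fname hands rm the size as a second argument. A minimal review-first sketch that reads the two fields separately and only prints what it would remove unless told otherwise; delete_listed and its "yes" switch are hypothetical names, and paths containing spaces are not supported (the same assumption as the awk step):

```shell
#!/bin/bash
# Sketch: review-first removal of the files listed in a "fullname size" file such as
# /tmp/final. Paths containing spaces are not supported (same assumption as the awk step).
# delete_listed and the "yes" switch are illustrative names.

# delete_listed LIST [yes]: print what would be removed; remove only when $2 is "yes"
delete_listed() {
    local list=$1 really=${2:-no} path size
    while read -r path size; do
        [ -n "$path" ] || continue                # skip blank lines
        if [ "$really" = yes ]; then
            rm -v -- "$path"
        else
            echo "would remove: $path ($size bytes)"
        fi
    done < "$list"
}

# Demo with a throwaway file and list
tmp=$(mktemp -d)
printf 'xyz' > "$tmp/home.video.01.avi"
echo "$tmp/home.video.01.avi 3" > "$tmp/final"
delete_listed "$tmp/final"          # dry run: prints only
delete_listed "$tmp/final" yes      # actually removes the file
rm -rf "$tmp"
```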

This code is meant more for learning than for production use. Others will show you how to make it more efficient, but you need to understand this one first.
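One detail worth knowing about the sort line: a key written as -k1 runs from field 1 to the end of the line, so the whole line is compared as text and the numeric -k2n tie-breaker never gets a chance. Writing -k1,1 -k2,2n restricts the first key to the name and the second to the size. A small demonstration with two made-up lines for the same name (LC_ALL=C is set only to make the text comparison predictable):

```shell
#!/bin/bash
# Sketch: how the sort key syntax changes the order of two lines for the same name.
# -k1 means "field 1 through the end of the line"; -k1,1 means "field 1 only".
# The two input lines are made up for illustration.

data='home.video.01 12345 /o/home.video.01.avi
home.video.01 9999 /r/home.video.01.mkv'

echo "with -k1 -k2n (whole line compared as text):"
printf '%s\n' "$data" | LC_ALL=C sort -k1 -k2n

echo "with -k1,1 -k2,2n (name, then numeric size):"
printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2n
```

With -k1 -k2n the 9999-byte line sorts last, so an awk step that keeps the last line per name would pick the smaller file; with -k1,1 -k2,2n the 12345-byte line is last, as intended.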
This User Gave Thanks to jim mcnamara For This Post:
# 5  
06-29-2017
Hello, thank you for your help. I have run into a snag, though. Here is what I get when I attempt to create /tmp/next.

When executed without sudo:
Code:
josh52180@MediaBox:~$ ./next.sh
Error: no such file "home stat /home/josh52180/originals/home.video.01.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.02.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.03.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.04.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.05.mov"
Error: no such file "home stat /home/josh52180/originals/home.video.06.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.07.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.08.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.09.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.10.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.11.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.12.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.13.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.14.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.15.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.16.mov"
Error: no such file "home stat /home/josh52180/originals/home.video.17.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.18.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.19.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.01.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.02.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.03.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.04.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.05.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.06.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.07.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.08.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.09.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.10.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.11.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.12.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.13.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.14.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.15.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.16.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.17.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.18.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.19.mkv"

When executed with sudo:
Code:
josh52180@MediaBox:~$ sudo ./next.sh
./next.sh: 5: ./next.sh: Syntax error: "(" unexpected (expecting "done")
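Both symptoms most likely trace back to the line size=(stat -c '%s' $fname) in the code from post #4. In bash, name=( ... ) is an array assignment, so stat never runs and $size expands to the first element, the literal word stat; that is why every message contains "stat" before the path. The "Error: no such file" text itself does not appear to come from bash: print is not a bash builtin, and on Ubuntu an external print command (from the mailcap tools) is probably what ran. The sudo error is in the format used by dash, Ubuntu's /bin/sh, which has no arrays at all; starting the script with #!/bin/bash avoids that. A short demonstration of the two assignment forms:

```shell
#!/bin/bash
# Sketch: the difference between an array assignment and command substitution.
# The #!/bin/bash line matters: /bin/sh on Ubuntu is dash, which has no arrays.

f=$(mktemp)
printf 'hello' > "$f"                 # a 5-byte test file

size=(stat -c '%s' "$f")              # array assignment: elements are stat, -c, %s, <path>
echo "array form:     $size"          # $size means element 0, the word "stat"

unset size                            # start clean before the corrected assignment
size=$(stat -c '%s' "$f")             # command substitution: runs stat, keeps its output
echo "corrected form: $size"          # prints 5

rm -f "$f"
```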


Last edited by Josh52180; 06-29-2017 at 09:56 AM.. Reason: additional information
# 6  
06-29-2017
Three typos made this a mess. I also added a check for zero-length files; you can remove it.
Apologies. Thanks, Rudi, for spotting the problem.

Code:
# get all the filenames in one place -> /tmp/list
find /path/to/directory1 /path/to/directory2 -type f > /tmp/list
#  you now have all the file names
#
# rewrite /tmp/list to have the correct values
while IFS= read -r fname   # fname is the complete file name
do
      shortfile=$(basename "$fname")
      shortfile=${shortfile%.*}   # strip only the last extension: home.video.01.avi -> home.video.01
      size=$(stat -c '%s' "$fname")
      [ "$size" -eq 0 ] && continue # skip zero-length files
      echo "$shortfile $size $fname"
done < /tmp/list > /tmp/next

# /tmp/next has the data, so let's sort and aggregate it - assuming no spaces in the file names
# sort by shortfile (field 1 only), then numerically by size (field 2) within each shortfile

sort -k1,1 -k2,2n -o /tmp/next /tmp/next

# aggregate
# awk fields are $1 - shortfile, $2 - size,  $3 - fullname
awk '{ 
         arr[$1]=$3 " " $2  # a later line for the same shortfile overwrites an earlier one;
                            # sizes are sorted ascending, so the largest file is kept
                                   
        }
         END { for (i in arr) {print arr[i]} }
        ' /tmp/next > /tmp/final
        
# removed the rm stuff for now
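Since the two directories pair up one-to-one (each re-encoded .mkv has an original with the same base name), an alternative to the list-sort-aggregate approach is to walk the re-encoded files and compare each against its original directly. This also avoids one risk of the aggregate approach: a name that has only one file (for example when an encode failed or has not run yet) would have that single file reported as "the largest". A minimal sketch, assuming GNU stat and the originals/reencoded layout shown in post #5; prune_larger and its "yes" switch are hypothetical names:

```shell
#!/bin/bash
# Sketch: for each re-encoded .mkv, find the original with the same base name
# (any extension), compare byte sizes, and report or delete the larger file.
# Only complete pairs are considered; a file without a partner is never touched.
# Assumes GNU stat (Ubuntu). prune_larger and the "yes" switch are illustrative names.

# prune_larger ORIG_DIR REENC_DIR [yes]
prune_larger() {
    local orig_dir=$1 reenc_dir=$2 really=${3:-no}
    local new name base f orig s_new s_orig larger smaller

    for new in "$reenc_dir"/*.mkv; do
        [ -f "$new" ] || continue                # no .mkv files: the glob stays literal
        name=${new##*/}                          # e.g. home.video.01.mkv
        base=${name%.mkv}                        # e.g. home.video.01

        orig=
        for f in "$orig_dir/$base".*; do         # the original may be .avi, .mpg, .mp4, .mkv ...
            if [ -f "$f" ]; then orig=$f; break; fi
        done
        [ -n "$orig" ] || continue               # no original found: leave it alone

        s_new=$(stat -c '%s' -- "$new")
        s_orig=$(stat -c '%s' -- "$orig")
        [ "$s_new" -gt 0 ] || continue           # empty output suggests a failed encode: skip

        if [ "$s_new" -lt "$s_orig" ]; then
            smaller=$new;  larger=$orig
        else
            smaller=$orig; larger=$new           # on a tie the original is kept
        fi

        echo "keep $smaller   delete $larger"
        if [ "$really" = yes ]; then
            rm -- "$larger"
        fi
    done
}

# Demo in a throwaway directory (file contents are just filler bytes)
demo=$(mktemp -d)
mkdir "$demo/originals" "$demo/reencoded"
printf '%0100d' 0 > "$demo/originals/home.video.01.avi"   # 100 bytes
printf '%040d' 0  > "$demo/reencoded/home.video.01.mkv"   # 40 bytes
printf '%030d' 0  > "$demo/originals/home.video.02.mp4"   # 30 bytes
printf '%090d' 0  > "$demo/reencoded/home.video.02.mkv"   # 90 bytes

prune_larger "$demo/originals" "$demo/reencoded"          # dry run: only prints
prune_larger "$demo/originals" "$demo/reencoded" yes      # removes the larger file of each pair
rm -rf "$demo"
```

In line with the advice above to look at the output first, run it once without yes and check the list before letting it delete anything.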
