Script to compare files in 2 folders and delete the larger file
Hello, my first thread here.
I've been searching and fiddling around for about a week and I cannot find a solution.
I have been converting all of my home videos to HEVC and sometimes the files end up smaller and sometimes they don't. I am currently comparing all the video files manually and it takes up quite a bit of time.
I was wondering if there is a script that can check the 2 folders and delete the larger of the 2 files and keep the smaller one.
I have the original videos in one directory and the converted in another directory. The filenames are always the same but sometimes the extensions are different.
E.g., the output file will always have the .mkv extension, but the original may have .avi, .mpg, .mp4, etc. The filenames themselves will always be the same.
What you request is certainly possible and may have been posted, at least in part, in these fora. Did you try a search with your keywords? When "fiddling around", what did you attempt, and with what tools? Where did the attempts fail, or where did you get stuck?
Please get used to providing decent context info for your problem.
It is always helpful to support a request with system info like OS and shell, related environment (variables, options), preferred tools, adequate (representative) sample input and desired output data and the logic connecting the two, and, if they exist, system (error) messages verbatim, to avoid ambiguities and keep people from guessing.
I searched many forums before deciding to join here. I usually find a solution but this one has proven to be hard to find.
If the answer has already been posted I apologize for creating another post about it.
I am running Ubuntu 17.04 Server on my encoding machine which sits in my basement and I access it through SSH. I am thinking of utilizing diff along with a bash script to determine whether the original or re-encoded file is smaller and then have it delete the larger of the 2 files.
I got as far as playing around with diff a little bit but I am not a script writer so I have no idea how to implement what I want to do into an efficient script.
You have one filename with several different extensions (or, in Windows, file types).
Example: filename.aa, filename.qb, filename.abcd, and maybe more.
If this is correct you need to aggregate all of the complete filenames by just the part before the dot in the filename.
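As a small illustration of "the part before the dot", here is a sketch in bash. The function name stem is made up for this example, and it assumes the file type is whatever follows the last dot, which matters when the name itself contains dots (home.video.01.avi, for instance).

```shell
#!/bin/bash
# Hypothetical helper: print a file name without its directory and
# without its last extension.
stem() {
    local name=${1##*/}          # drop everything up to the last slash
    printf '%s\n' "${name%.*}"   # drop the shortest trailing ".something"
}

stem "/home/josh52180/originals/home.video.01.avi"   # prints home.video.01
stem "/home/josh52180/reencoded/home.video.01.mkv"   # prints home.video.01

# For contrast: %% removes the longest match, so everything after the
# FIRST dot is lost and all "home.video.NN" files collapse to one key.
name="home.video.01.avi"
printf '%s\n' "${name%%.*}"                          # prints home
```

Both files of a pair then share the same key, which is what the aggregation needs.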
What you need for input is
Code:
the filename with no directory name and without a type
size of the file in bytes
the full filename (directory/filename.filetype)
Output has to be the full filename and maybe the size, but only for the largest file in bytes.
You then LOOK at the output to make sure you did not screw up somehow, right?
Then finally you feed the full filenames in the output file to the rm command.
So:
Code:
# get all the filenames in one place -> /tmp/list
find /path/to/directory1 /path/to/directory2 -type f > /tmp/list
# you now have all the file names
#
# rewrite /tmp/list to have the correct values
while read fname # fname is the complete file name
do
shortfile=$(basename $fname)
shortfile=${shortfile%%.*}
size=(stat -c '%s' $fname)
print " $shortfile $size $fname"
done < /tmp/list > /tmp/next
# /tmp/next has the data, so let's sort and aggregate it - assuming no spaces in the shortfile name
# sort by shortfile
sort -k1 -k2n -o /tmp/next /tmp/next
# aggregate
# awk fields are $1 - shortfile, $2 - size, $3 - fullname
awk '{
arr($1)=$3 " " $2 # note that the last values to be stored for shortfile
# come from the last time shortfile is in the file
}
END { for (i in arr) {print arr(i)} }
' /tmp/next > /tmp/final
# delete ONLY after you check /tmp/final
while read fname
do
rm $fname
done < /tmp/final
This code is meant more to learn from than production. Others will show you how to make it more efficient. You need to understand this one first.
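For comparison, a more direct variant might look like the sketch below. It is not a tested production script. It assumes GNU stat (as on Ubuntu), that every converted file ends in .mkv, and that the original has the same name with some other extension. The names larger_of_pairs, orig_dir and enc_dir are made up for this example. It only prints the larger file of each pair and deletes nothing.

```shell
#!/bin/bash
# Sketch: for each re-encoded .mkv, look for an original with the same
# name (ignoring the extension) and print whichever file is larger.
# Files without a counterpart are never printed.
larger_of_pairs() {
    local orig_dir=$1 enc_dir=$2 enc stem orig enc_size orig_size
    for enc in "$enc_dir"/*.mkv
    do
        [ -f "$enc" ] || continue               # no .mkv files at all
        stem=${enc##*/}                         # strip the directory
        stem=${stem%.*}                         # strip the last extension
        for orig in "$orig_dir/$stem".*
        do
            [ -f "$orig" ] || continue          # no matching original
            enc_size=$(stat -c '%s' "$enc")
            orig_size=$(stat -c '%s' "$orig")
            [ "$enc_size" -gt 0 ] || continue   # ignore empty (failed) encodes
            if [ "$orig_size" -gt "$enc_size" ]
            then
                printf '%s\n' "$orig"
            else
                printf '%s\n' "$enc"            # also chosen when sizes are equal
            fi
            break                               # only the first match is considered
        done
    done
}

# Example call; review the output before deleting anything:
# larger_of_pairs /path/to/directory1 /path/to/directory2 > /tmp/final
```

On a tie the re-encoded copy is listed, so the original would be the one kept.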
Hello, thank you for your help. I have run into a snag though. Here is what I get when I attempt to create /tmp/next.
When executed without sudo:
Code:
josh52180@MediaBox:~$ ./next.sh
Error: no such file "home stat /home/josh52180/originals/home.video.01.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.02.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.03.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.04.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.05.mov"
Error: no such file "home stat /home/josh52180/originals/home.video.06.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.07.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.08.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.09.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.10.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.11.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.12.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.13.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.14.mp4"
Error: no such file "home stat /home/josh52180/originals/home.video.15.mkv"
Error: no such file "home stat /home/josh52180/originals/home.video.16.mov"
Error: no such file "home stat /home/josh52180/originals/home.video.17.mpg"
Error: no such file "home stat /home/josh52180/originals/home.video.18.avi"
Error: no such file "home stat /home/josh52180/originals/home.video.19.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.01.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.02.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.03.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.04.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.05.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.06.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.07.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.08.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.09.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.10.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.11.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.12.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.13.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.14.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.15.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.16.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.17.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.18.mkv"
Error: no such file "home stat /home/josh52180/reencoded/home.video.19.mkv"
Three typos made a mess of this. I also added a check for zero-length files; you can remove it if you don't need it.
Apologies. Thanks, Rudi, for spotting the problem.
Code:
# get all the filenames in one place -> /tmp/list
find /path/to/directory1 /path/to/directory2 -type f > /tmp/list
# you now have all the file names
#
# rewrite /tmp/list to have the correct values
while read -r fname # fname is the complete file name
do
shortfile=$(basename "$fname")
shortfile=${shortfile%.*} # strip only the last extension; %% would cut home.video.01.avi down to "home"
size=$(stat -c '%s' "$fname")
[ "$size" -eq 0 ] && continue # skip zero-length files
echo "$shortfile $size $fname"
done < /tmp/list > /tmp/next
# /tmp/next has the data, so let's sort and aggregate it - assuming no spaces in any file name
# sort by shortfile only, then numerically by size (a bare -k1 would compare the rest of the line as text)
sort -k1,1 -k2,2n -o /tmp/next /tmp/next
# aggregate
# awk fields are $1 - shortfile, $2 - size, $3 - fullname
awk '{
count[$1]++ # how many files share this shortfile
arr[$1]=$3 " " $2 # note that the last values to be stored for shortfile
# come from the last time shortfile is in the file, i.e. the largest size
}
END { for (i in arr) if (count[i] > 1) {print arr[i]} } # skip files that have no counterpart
' /tmp/next > /tmp/final
# removed the rm stuff for now
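Once /tmp/final has been checked by eye, the deletion step could look roughly like the sketch below. It assumes each line of /tmp/final is "full path, space, size" and that no path contains whitespace. The function name delete_listed and its "delete" argument are made up for this example. By default it only prints what it would remove.

```shell
#!/bin/bash
# Sketch: remove the files named in a reviewed list such as /tmp/final.
# Expected line format: "<full path> <size>" (no whitespace in the path).
delete_listed() {
    local list=$1 mode=${2:-dry-run} path size
    while read -r path size
    do
        if [ ! -f "$path" ]
        then
            echo "skipping, not a file: $path" >&2
            continue
        fi
        if [ "$mode" = "delete" ]
        then
            rm -- "$path"
        else
            echo "would remove: $path ($size bytes)"
        fi
    done < "$list"
}

# Example calls (commented out so nothing is deleted by accident):
# delete_listed /tmp/final          # dry run, prints only
# delete_listed /tmp/final delete   # really removes the listed files
```

Running the dry run first and comparing a few sizes by hand is cheap insurance before the real deletion.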