Need to delete a large set of files (close to 100K) from a directory based on an input file


 
# 8  
Old 08-20-2012
Quote:
Originally Posted by Chubler_XL
Well any file in the original filelist that isn't in the error list was removed.

Can you post the first few lines of the error file ( head /tmp/errors.list ) so we can see the format? A simple awk script should be able to produce the list of removed files.
Below is the output from the errors.list file

12345678.jpg: No such file or directory
12348765.jpg: No such file or directory
87654321.jpg: No such file or directory
87651234.jpg: No such file or directory
-
-
-

Thanks
# 9  
Old 08-20-2012
try:
Code:
awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' errors.list filelist.txt > removed.list
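To see what the one-liner does, here is a small self-contained demo on throwaway data. The error lines use the format posted above; the temporary directory and the two extra names (11111111.jpg, 22222222.jpg) are made up for illustration. On Solaris, substitute nawk for awk.

```shell
# Demo of the two-file awk idiom on sample data (hypothetical temp dir).
demo_dir=$(mktemp -d)

# filelist.txt: the names that were handed to rm.
printf '%s\n' 12345678.jpg 11111111.jpg 87654321.jpg 22222222.jpg \
  > "$demo_dir/filelist.txt"

# errors.list: what rm wrote to stderr, in the format shown in the thread.
printf '%s\n' \
  '12345678.jpg: No such file or directory' \
  '87654321.jpg: No such file or directory' \
  > "$demo_dir/errors.list"

# First file (NR==FNR): remember the text before the first colon.
# Second file: print every line whose name was not remembered.
awk -F: 'NR==FNR { d[$1]++; next } !($0 in d)' \
  "$demo_dir/errors.list" "$demo_dir/filelist.txt" > "$demo_dir/removed.list"

cat "$demo_dir/removed.list"
```

Only the two names that produced no error end up in removed.list. Note that this relies on the file names containing no colon, since the colon is the field separator.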

# 10  
Old 08-20-2012
Quote:
Originally Posted by Chubler_XL
try:
Code:
awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' errors.list filelist.txt > removed.list

I am getting the error below. By the way, please also include something in this script to handle files that were not found.
Code:
awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt
awk: syntax error near line 1
awk: bailing out near line 1

Thanks

---------- Post updated at 05:02 PM ---------- Previous update was at 05:00 PM ----------

Quote:
Originally Posted by Chubler_XL
try:
Code:
awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' errors.list filelist.txt > removed.list

Below is the full command, with the output redirected to a file:
Code:
awk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt > /tmp/removed.list
awk: syntax error near line 1
awk: bailing out near line 1

Thanks

# 11  
Old 08-20-2012
If you're on Solaris you will need to use nawk instead of awk.
# 12  
Old 08-20-2012
Quote:
Originally Posted by Chubler_XL
If you're on Solaris you will need to use nawk instead of awk.
Thank you for the update. I used nawk as suggested, but both of the files below are empty:

0 Aug 20 22:05 error.txt
0 Aug 20 22:06 removed.list
Code:
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt > /tmp/removed.list

So in the command below, the first file is the errors file, the second is the list of files to be deleted, and the third is the list of deleted files.

Can you confirm?

Thanks

---------- Post updated at 05:18 PM ---------- Previous update was at 05:11 PM ----------

Quote:
Originally Posted by Chubler_XL
If you're on Solaris you will need to use nawk instead of awk.
By the way, if I run the command below, it doesn't delete the files as required.
Code:
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.txt > /tmp/removed.list

Thanks

# 13  
Old 08-20-2012
Quote:
Originally Posted by prash358
So in the command below, the first file is the errors file, the second is the list of files to be deleted, and the third is the list of deleted files.

Can you confirm?
Yes, that is correct. The awk command only produces a list of what was removed by the xargs command run earlier; it does not delete anything itself.

Unfortunately your rm doesn’t support -v (verbose), so the best way to get a list of deleted files is to process the error log after the fact. We can deduce that any file in the removal list that is not in the error log was removed.


The complete script would be:
Code:
xargs rm < /tmp/file.list 2> /tmp/error.txt
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.list > /tmp/removed.list
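One caveat with the NR==FNR test: if the error file is empty (rm had nothing to complain about), NR still equals FNR while awk reads the second file, so every name is stored and nothing is printed. That would be consistent with the zero-byte removed.list reported earlier in the thread, although that is an inference and not something confirmed here. A minimal sketch of a variant that compares FILENAME with ARGV[1] instead, shown next to the original on made-up data (the temp dir and file names are hypothetical; use nawk on Solaris):

```shell
# Hypothetical working directory and sample names for the demo.
work=$(mktemp -d)
printf '%s\n' a.jpg b.jpg > "$work/file.list"
: > "$work/error.txt"        # empty error log: every rm succeeded

# Original test: with an empty first file, NR==FNR is also true for
# file.list, so all names are swallowed and nothing is printed.
orig_out=$(awk -F: 'NR==FNR { d[$1]++; next } !($0 in d)' \
  "$work/error.txt" "$work/file.list")

# Variant: FILENAME==ARGV[1] is only true while reading the error log,
# so an empty log simply means no name is excluded.
awk -F: 'FILENAME==ARGV[1] { d[$1]++; next } !($0 in d)' \
  "$work/error.txt" "$work/file.list" > "$work/removed.list"

echo "original: [$orig_out]"
cat "$work/removed.list"
```

With a non-empty error log both forms behave the same; the variant only changes the empty-log case.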

# 14  
Old 08-20-2012
Quote:
Originally Posted by Chubler_XL
Yes, that is correct. The awk command only produces a list of what was removed by the xargs command run earlier; it does not delete anything itself.

Unfortunately your rm doesn’t support -v (verbose), so the best way to get a list of deleted files is to process the error log after the fact. We can deduce that any file in the removal list that is not in the error log was removed.


The complete script would be:
Code:
xargs rm < /tmp/file.list 2> /tmp/error.txt
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.list > /tmp/removed.list

Thanks for the quick response.

---------- Post updated at 05:36 PM ---------- Previous update was at 05:33 PM ----------

Quote:
Originally Posted by prash358
Thanks for the quick response.
By the way, the nawk command worked for me on Solaris. What can we use on Linux?

Is there a way to combine these two commands in a shell script?
Code:
xargs rm < /tmp/file.list 2> /tmp/error.txt
nawk -F: 'NR==FNR{d[$1]++;next} !($0 in d)' /tmp/error.txt /tmp/file.list > /tmp/removed.list

Thanks
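
One possible way to wrap this in a script that behaves the same on Solaris and Linux is sketched below. It is not the xargs/awk approach from the posts above but a swapped-in alternative: GNU rm on Linux words its errors differently (rm: cannot remove ...), so splitting the error log on the first colon would likely need adjusting there. The sketch avoids parsing rm's messages by checking each name before removing it, and it also records the files that were not found. The function name remove_listed and the output file names are hypothetical. It runs one rm per file, so it will be slower than xargs for 100K names, and like the original it expects to be run from the directory that holds the files.

```shell
# remove_listed LIST MISSING REMOVED   (hypothetical helper)
#   LIST:    one file name per line
#   MISSING: receives names that did not exist
#   REMOVED: receives names that were actually deleted
remove_listed() {
    list=$1 missing=$2 removed=$3
    : > "$missing"
    : > "$removed"

    while IFS= read -r f; do
        if [ ! -e "$f" ]; then
            # File not found: log it instead of calling rm.
            printf '%s\n' "$f" >> "$missing"
        elif rm -- "$f" 2>/dev/null; then
            # rm succeeded: record the name as removed.
            printf '%s\n' "$f" >> "$removed"
        fi
        # If rm fails for another reason (e.g. permissions),
        # the name is written to neither list.
    done < "$list"
}
```

Usage would look like: cd into the image directory, then run remove_listed /tmp/file.list /tmp/missing.list /tmp/removed.list. On Linux with GNU coreutils, rm -v is another option for getting the removed names directly.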
