To Delete the duplicates using Part of File Name


 
# 1  
Old 01-22-2018

I am using the script below to delete duplicate files, but it does not work for directories with more than 10k files: ls -t fails with "Argument list too long". I tried to replace ls -t with

Code:
find . -type f \( -iname "*.xml" \) -printf '%T@ %p\n' | sort -rg | sed -r 's/[^ ]* //' | awk 'BEGIN{FS="_"}{if (++dup[$1] >= 2) print}'

but I am not getting the same outcome as with ls -t, and my logic is not working either. The original script:
Code:
#!/bin/bash
for i in `ls -t *xml|awk 'BEGIN{FS="_"}{if (++dup[$1] >= 2) print}'`;
do
rm $i 
done

The file names look like this:

Code:
AECZ00205_010917_1506689024063.xml
AECZ00205_010917_1506689024064.xml
AECZ00205_010917_1506689024066.xml [Latest]
AECZ00207_010917_1506690865368.xml
AECZ00207_010917_1506690865369.xml
AECZ00207_010917_1506690865364.xml [Latest]
AECZ00209_010917_1506707811518.xml
AECZ00209_010917_1506707811519.xml
AECZ00209_010917_1506707811529.xml [Latest]

# 2  
Old 01-22-2018
Untested, but should be close to what you need. If the list of rm commands produced by the following looks correct, remove the echo and run it again to actually remove the files:
Code:
#!/bin/bash
ls -t | awk -F_ '/xml$/ && ++dup[$1] >= 2' | while IFS= read -r i
do
	echo rm "$i"
done

One could also try:
Code:
#!/bin/bash
ls -t | awk -F_ '/xml$/{if($1 in dup) print; else dup[$1]}' | while IFS= read -r i
do
	echo rm "$i"
done

which, with lots of files, consumes a little bit less memory.
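
A minimal sketch of how the first filter behaves, for anyone who wants to try it without touching real files. Piping plain ls -t (no *xml glob) avoids the "Argument list too long" error because the shell no longer expands thousands of names onto one command line; awk does the .xml filtering instead. Here printf stands in for ls -t, and the function name older_duplicates is a hypothetical helper, not something from the posts above.

```shell
#!/bin/bash
# Hypothetical helper: reads names newest-first on stdin and prints every
# name ending in "xml" whose prefix (the text before the first "_") has
# already been seen, i.e. everything except the newest file per prefix.
older_duplicates() {
    awk -F_ '/xml$/ && ++dup[$1] >= 2'
}

# Stand-in for `ls -t`: sample names, assumed to be listed newest first.
printf '%s\n' \
    AECZ00205_010917_1506689024066.xml \
    AECZ00205_010917_1506689024064.xml \
    AECZ00207_010917_1506690865364.xml \
    AECZ00205_010917_1506689024063.xml \
    AECZ00207_010917_1506690865369.xml |
    older_duplicates
```

With this input the first name seen for each prefix (…066.xml and …364.xml) is kept, and the other three names are printed as candidates for rm.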

Last edited by Don Cragun; 01-22-2018 at 09:44 PM. Reason: Add second alternative.
# 3  
Old 01-22-2018
Code:
find . -type f \( -iname "*.xml" \) -printf '%T@ %p\n' | sort -rg | sed -r 's/[^ ]* //' | awk '{w=$0; sub(".*/", "", w); sub("_[0-9_][0-9_]*.*", "", w);} !a[w]++'

# 4  
Old 01-22-2018
Quote:
Originally Posted by rdrtx1
Code:
find . -type f \( -iname "*.xml" \) -printf '%T@ %p\n' | sort -rg | sed -r 's/[^ ]* //' | awk '{w=$0; sub(".*/", "", w); sub("_[0-9_][0-9_]*.*", "", w);} !a[w]++'

How can I delete the duplicates using this?
# 5  
Old 01-22-2018
Like Don Cragun stated, be careful before you run rm:
Code:
find . -type f \( -iname "*.xml" \) -printf '%T@ %p\n' |
   sort -rg |
   sed -r 's/[^ ]* //' |
   awk '{w=$0; sub(".*/", "", w); sub("_[0-9_][0-9_]*.*", "", w);} a[w]++' | while IFS= read -r f
   do
      echo "rm -f $f" 
   done > rm_file

Verify that rm_file lists only the files that actually need to be deleted, then run:
Code:
sh rm_file

The command in the earlier post was meant to identify the latest file for each prefix (the files you want to keep?), in case you just wanted to move or copy those files to a new directory and keep all the data.
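
The difference between the two awk programs can be tried on a few sample paths. This is only a sketch: the function names keep_latest and list_older and the directory names are made up for illustration, and the input is assumed to be sorted newest first already, as the find | sort -rg | sed stages would deliver it.

```shell
#!/bin/bash
# Both helpers build the same key w from each path:
#   sub(".*/", "", w)               strips the directory part
#   sub("_[0-9_][0-9_]*.*", "", w)  cuts from the first "_" that is followed
#                                   by a digit or "_" to the end of the name
# so ./dir/AECZ00205_010917_1506689024066.xml becomes AECZ00205.

# Hypothetical helper: !a[w]++ is true only the first time a key appears,
# so this prints the newest path for each key (the files to keep).
keep_latest() {
    awk '{w=$0; sub(".*/", "", w); sub("_[0-9_][0-9_]*.*", "", w);} !a[w]++'
}

# Hypothetical helper: a[w]++ is true from the second appearance onward,
# so this prints every older path for a key (the candidates for rm).
list_older() {
    awk '{w=$0; sub(".*/", "", w); sub("_[0-9_][0-9_]*.*", "", w);} a[w]++'
}

# Sample paths, assumed newest first.
sample=$(printf '%s\n' \
    ./in/AECZ00205_010917_1506689024066.xml \
    ./old/AECZ00205_010917_1506689024064.xml \
    ./in/AECZ00207_010917_1506690865364.xml \
    ./in/AECZ00207_010917_1506690865369.xml)

echo "keep:";   printf '%s\n' "$sample" | keep_latest
echo "remove:"; printf '%s\n' "$sample" | list_older
```

Because the directory part is stripped before the key is compared, files with the same prefix in different subdirectories count as duplicates of each other; that may or may not be what is wanted.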
# 6  
Old 01-23-2018
It is working perfectly. Could you please explain what that awk command does? I have to brief my coworker.
# 7  
Old 01-23-2018
Quote:
Originally Posted by gold2k8
It is working perfectly. Could you please explain what that awk command does? I have to brief my coworker.
Which one? One of the two in post #2 that only processes xml files in the current directory using ls -t? Or the one in post #5 that uses find, sort, and sed to process all .xml files in the entire file hierarchy rooted in the current directory?