Post 302331720 by spangberg on Tuesday 7th of July 2009 12:02:02 PM
Remove duplicate files based on text string?

Hi

I have been struggling with a script for removing duplicate messages from a shared mailbox.
I would like to search for duplicate messages based on the “Message-ID” string within the messages files.

I have managed to find the duplicate “Message-ID” strings and (if I want to) delete the files in which they were found.
My problem is how to preserve one copy of each file.

My script so far:

--------------------
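#!/bin/tcsh
set dir=/my/maildir

# Print every "Message-ID: <" line, keep the repeated ones (uniq -d),
# then list and delete every file that contains them. This removes ALL
# copies of a duplicated message, which is the problem described above.
foreach file (`grep -h "Message-ID: <" $dir/* | uniq -d | xargs -i \grep -l "{}" $dir/*`)
    rm -f "$file"
end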

--------------------

Any ideas?

Thanks // Tomas
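A note on the first script: its pipeline feeds unsorted grep output to uniq -d, and uniq only compares each line with the line directly before it, so duplicates that are not adjacent are never reported. The solved version further down pipes through sort first, which avoids this. A small demonstration with made-up Message-IDs:

```shell
#!/bin/sh
# Three header lines; the duplicate pair is NOT adjacent.
ids='Message-ID: <a@example.com>
Message-ID: <b@example.com>
Message-ID: <a@example.com>'

# uniq -d compares each line only with the previous one, so this prints nothing.
printf '%s\n' "$ids" | uniq -d

# After sorting, the two identical lines are adjacent and uniq -d reports them:
# prints "Message-ID: <a@example.com>"
printf '%s\n' "$ids" | sort | uniq -d
```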

---------- Post updated at 06:02 PM ---------- Previous update was at 10:18 AM ----------

FYI, solved:
-------------------
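#!/bin/tcsh
set maildir=/my/maildir
# For each Message-ID header line that occurs in more than one file...
foreach dupstring ("`grep -m 1 -h -R "^Message-ID:" $maildir/ | sort | uniq -d`")
    # ...list the files containing it (-F: match the line literally, not as a regex),
    # drop the first file from that list so one copy survives, and delete the rest.
    grep -F -l -R -- "$dupstring" $maildir/ | sed 1d | xargs -I "{}" \rm -f "{}"
end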
-------------------

// Tomas
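For comparison, a minimal sketch of the same keep-one-copy idea in POSIX sh (tcsh has no functions), working file by file instead of re-grepping the whole maildir once per duplicated ID. This is not Tomas's script; the function name dedup_by_message_id and the DRY_RUN variable are hypothetical, and it assumes filenames contain no newlines, that the first Message-ID header in a file identifies the message, and that it does not matter which copy survives (here, the one whose path sorts first).

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: keep one file per Message-ID under
# a directory and delete the other copies.
# Assumptions: filenames contain no newlines; the first Message-ID header in
# each file identifies the message; the survivor is whichever copy sorts first
# by path. Set DRY_RUN=1 to list the duplicates instead of deleting them.
dedup_by_message_id() {
    dir=$1
    seen=$(mktemp) || return 1            # one line per Message-ID kept so far
    find "$dir" -type f | sort | while IFS= read -r file; do
        # Header names are case-insensitive (Message-ID, Message-Id, ...).
        # grep -m is a GNU/BSD extension, as in the original script.
        id=$(grep -m 1 -i '^message-id:' "$file" |
             sed 's/^[^:]*:[[:space:]]*//' | tr -d '\r')
        [ -n "$id" ] || continue          # no Message-ID: leave the file alone
        if grep -F -q -x -- "$id" "$seen"; then
            # A copy with this ID was already kept: this file is a duplicate.
            if [ "${DRY_RUN:-0}" = 1 ]; then
                printf '%s\n' "$file"
            else
                rm -f -- "$file"
            fi
        else
            printf '%s\n' "$id" >> "$seen"   # first copy: remember it, keep it
        fi
    done
    rm -f "$seen"
}

# Review first, then delete:
#   DRY_RUN=1 dedup_by_message_id /my/maildir
#   dedup_by_message_id /my/maildir
```

Comparing with grep -F -x against a file of IDs already seen keeps the match literal and exact, so characters such as "." or "+" inside a Message-ID cannot cause false matches. The lookup is linear per file, which is fine for a mailbox-sized directory but would want a sort-based approach for very large trees.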
 

CYR_EXPIRE(8)                 System Manager's Manual                CYR_EXPIRE(8)

NAME
       cyr_expire - expire messages and duplicate delivery database entries

SYNOPSIS
       cyr_expire [ -C config-file ] [ -D delete-duration ] -E expire-duration
                  [ -X expunge-duration ] [ -x ] [ -p mailbox-prefix ] [ -v ] [ -a ]

DESCRIPTION
       Cyr_expire is used to expire messages and duplicate delivery database
       entries. Cyr_expire also cleanses mailboxes of partially expunged
       messages (when using the "delayed" expunge mode). The expiration of
       messages is controlled by the /vendor/cmu/cyrus-imapd/expire mailbox
       annotation, which specifies the age (in days) of messages in the given
       mailbox that should be deleted. Any duplicate delivery database entries
       which correspond to the mailbox are also deleted at the same frequency.

       The value of the /vendor/cmu/cyrus-imapd/expire annotation is inherited
       by all children of the given mailbox, so an entire mailbox tree can be
       expired by setting a single annotation on the root of that tree. If a
       mailbox does not have a /vendor/cmu/cyrus-imapd/expire annotation set
       on it (and does not inherit one), then no messages are expired from the
       mailbox.

       Cyr_expire reads its configuration options out of the imapd.conf(5)
       file unless specified otherwise by -C.

OPTIONS
       -C config-file
              Read configuration options from config-file.

       -D delete-duration
              Remove previously deleted mailboxes older than delete-duration
              (when using the "delayed" delete mode). The value can be a
              floating point number, and may have a suffix to specify the unit
              of time. If there is no suffix, the value is a number of days.
              Valid suffixes are d (days), h (hours), m (minutes) and
              s (seconds).

       -E expire-duration
              Prune the duplicate database of entries older than
              expire-duration. This value is only used for entries which do
              not have a corresponding /vendor/cmu/cyrus-imapd/expire mailbox
              annotation. Format is the same as delete-duration.

       -X expunge-duration
              Expunge previously deleted messages older than expunge-duration
              (when using the "delayed" expunge mode). Format is the same as
              delete-duration.

       -x     Do not expunge messages even if using delayed expunge mode. This
              reduces the I/O load considerably, allowing you to run
              cyr_expire frequently to clean up the duplicate database without
              overloading your server.

       -p mailbox-prefix
              Only find mailboxes starting with this prefix, e.g.
              "user.justgotspammedlots".

       -v     Enable verbose output.

       -a     Skip the annotation lookup, so all /vendor/cmu/cyrus-imapd/expire
              annotations are ignored entirely. It behaves as if they were not
              set, so only expire-duration is considered for all mailboxes.

FILES
       /etc/imapd.conf

SEE ALSO
       imapd.conf(5), cyrmaster(8)

CMU                             Project Cyrus                      CYR_EXPIRE(8)
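The duration format shared by -D, -E and -X (a number, optionally fractional, with an optional d/h/m/s suffix, defaulting to days) can be made concrete with a small converter. This is a hypothetical helper for illustration only; cyr_expire does its own parsing and nothing here reflects its code. It assumes the value is a plain decimal number with a leading digit and at most one suffix letter.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cyrus): convert a cyr_expire-style duration
# such as "7", "1.5d", "12h", "30m" or "90s" into seconds, following the rules
# in the OPTIONS section: no suffix means days; d/h/m/s select the unit.
# Prints nothing and returns non-zero for anything else.
duration_to_seconds() {
    printf '%s\n' "$1" | awk '
        /^[0-9]+(\.[0-9]+)?[dhms]?$/ {
            unit = substr($0, length($0), 1)
            if (unit ~ /[dhms]/) num = substr($0, 1, length($0) - 1)
            else { num = $0; unit = "d" }        # no suffix: days
            mult = (unit == "d") ? 86400 : (unit == "h") ? 3600 : (unit == "m") ? 60 : 1
            printf "%.0f\n", num * mult
            ok = 1
        }
        END { exit (ok ? 0 : 1) }'
}
```

For example, duration_to_seconds 1.5d prints 129600 and duration_to_seconds 7 prints 604800.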
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.