07-07-2009
Remove duplicate files based on text string?
Hi
I have been struggling with a script for removing duplicate messages from a shared mailbox.
I would like to search for duplicate messages based on the “Message-ID” string within the message files.
I have managed to find the duplicate “Message-ID” strings and (if I wanted to) delete the files in which they were found.
My problem is how to preserve one copy of each file.
My script so far:
--------------------
#!/bin/tcsh
set dir=/my/maildir
foreach file (`grep -h "Message-ID: <" $dir/* | uniq -d |xargs -i \grep -l "{}" $dir/*`)
rm -f "$file"
end
--------------------
Any ideas?
Thanks // Tomas
---------- Post updated at 06:02 PM ---------- Previous update was at 10:18 AM ----------
Fyi, solved
-------------------
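#!/bin/tcsh
set maildir=/my/maildir
# Loop over every Message-ID line that appears in more than one file
foreach dupstring ("`grep -m 1 -h -R "^Message-ID:" $maildir/ | sort | uniq -d`")
    # List the files containing that ID (-F: match it literally), keep the first, delete the rest
    grep -l -R -F "$dupstring" $maildir/ | sed 1d | xargs -I "{}" \rm -f "{}"
end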
-------------------
// Tomas
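For anyone finding this later: the same idea can also be done in a single pass, without grepping the whole directory again for every duplicate ID. Below is a rough sketch in plain sh, assuming one message per file and a header spelled exactly `Message-ID:`. The function name `dedupe_by_message_id` and its `list`/`delete` modes are made up for this example and are not part of the script above.

```shell
#!/bin/sh
# Sketch only: keep the first file (in sorted path order) for each
# Message-ID and list or delete every later file with the same ID.
# dedupe_by_message_id is a hypothetical helper, not from the original post.
dedupe_by_message_id() {
    maildir=$1
    mode=${2:-list}          # "list" prints duplicates, "delete" removes them
    seen=$(mktemp) || return 1

    find "$maildir" -type f | sort | while IFS= read -r file; do
        # Take the first Message-ID header line; skip files that have none.
        id=$(sed -n '/^Message-ID:/{p;q;}' "$file")
        [ -n "$id" ] || continue

        if grep -qxF -- "$id" "$seen"; then
            # This ID was already recorded, so the file is a duplicate.
            if [ "$mode" = delete ]; then
                rm -f -- "$file"
            else
                printf '%s\n' "$file"
            fi
        else
            # First time this ID is seen: remember it and keep the file.
            printf '%s\n' "$id" >> "$seen"
        fi
    done

    rm -f "$seen"
}
```

Calling `dedupe_by_message_id /my/maildir` only prints the files that would go; adding `delete` as a second argument removes them. It is probably wise to run the listing mode first and check the output before deleting anything.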