Help with parsing mailbox folder list (identify similar folders)


 
# 1  
01-21-2011

List sample:
Code:
user/xxx/Archives/2010 
user/xxx/BLARG 
user/xxx/BlArG 
user/xxx/Burton 
user/xxx/DAY 
user/yyy/Trainees/Nutrition interns 
user/yyy/Trainees/Primary Care 
user/yyy/Trainees/Psychiatric NP interns 
user/yyy/Trainees/Psychiatric residents 
user/yyy/Trainees/Psychology externs 
user/yyy/Trainees/psychology eXterns
user/zzz/Goose/moose
user/zzz/Goose/mouse
user/zzz/Goose/Moose
user/zzz/Goose/Moose/goose
user/zzz/Goose/Moose/Goose
user/aaa/Boo
user/aaa/Bah/boo
user/aaa/boo/boo
user/aaa/boo/boO
user/aaa/bOo/boo
user/bbb/Zoo
user/bbb/Zoo/boo
user/bbb/ooo/boo
user/bbb/ooo/bOo

I'm helping to migrate a mail server from a case-sensitive folder namespace to a case-insensitive one.
The case-sensitive namespace could hold folders such as "test", "TEST" and "Test" as distinct folders;
the new system will allow only one of them (at the same level, per user).

The current system will _not_ allow two folders at the same level to have identical names, so where the output below appears to show the opposite (e.g. Trainees),
it means Trainees is a parent folder:
Code:
user/yyy/Trainees/Nutrition interns 
user/yyy/Trainees/Primary Care 
user/yyy/Trainees/Psychiatric NP interns 
user/yyy/Trainees/Psychiatric residents 
user/yyy/Trainees/Psychology externs 
user/yyy/Trainees/psychology eXterns

Given this behavior, I don't care about Trainees repeating: the fact that it is identical and at the same level indicates it's a parent folder.

But I do care about "Psychology externs" and "psychology eXterns", because they share a parent.

In this example:
Code:
user/zzz/Goose/moose
user/zzz/Goose/mouse
user/zzz/Goose/Moose
user/zzz/Goose/Moose/goose
user/zzz/Goose/Moose/Goose

I don't care about the repeated "/Goose/"; it's a parent with children.
But I do care about "moose", because it's in the same container ("/Goose/") as "/Moose/", so the new system will not allow it.

Similarly, I care about "Goose/Moose/goose" and "Goose/Moose/Goose" because "goose" and "Goose" are in the same "Goose/Moose/" container,
and again this is unacceptable to the new system.

For each user, I'd like to identify the folder path levels that are identical except for case, e.g.:
Code:
user/xxx/BLARG* 
user/xxx/BlArG*

user/yyy/Trainees/Psychology externs* 
user/yyy/Trainees/psychology eXterns*

user/zzz/Goose/moose*
user/zzz/Goose/Moose*
user/zzz/Goose/Moose/goose*
user/zzz/Goose/Moose/Goose*

user/aaa/Boo*
user/aaa/boo/boo*
user/aaa/boo/boO*
user/aaa/bOo*/boo

user/bbb/ooo/boo*
user/bbb/ooo/bOo*

Again, I don't care about folder paths that are completely identical (case included) as this will indicate it's a parent folder.
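A minimal sketch of the check described above, assuming the folder list is a plain text file with one path per line. The function name `find_case_conflicts` is hypothetical (nobody in the thread wrote it). It records every prefix of every path (user, user/xxx, user/xxx/BLARG, ...), keeps each exact spelling only once so identical parents are ignored, and then reports spellings whose lowercased form occurs more than once. Stripping trailing blanks is an assumption, since the sample above carries some that look like copy/paste debris. On Solaris 10 you may need `nawk` for `tolower()`.

```shell
#!/bin/sh
# find_case_conflicts (hypothetical helper): read mailbox folder paths, one per
# line, on stdin and print every path prefix whose spelling collides
# case-insensitively with a differently-cased prefix.
# Assumption: trailing blanks are copy/paste debris, not part of folder names.
find_case_conflicts() {
    awk '
    {
        sub(/[ \t\r]+$/, "")            # drop trailing blanks (assumption)
        if ($0 == "") next
        # Record every prefix: user, user/xxx, user/xxx/BLARG, ...
        n = split($0, part, "/")
        prefix = part[1]
        seen[prefix] = 1
        for (i = 2; i <= n; i++) {
            prefix = prefix "/" part[i]
            seen[prefix] = 1
        }
    }
    END {
        # seen[] keeps each exact spelling once, so a parent folder that
        # repeats with identical case is never reported.
        for (p in seen) count[tolower(p)]++
        for (p in seen) if (count[tolower(p)] > 1) print p
    }' | LC_ALL=C sort
}
```

Usage would be `find_case_conflicts < folders.txt` (the file name is a placeholder). Because it expands every prefix, it also reports parent levels such as `user/aaa/bOo` that never appear on a line of their own, and every path beneath a colliding level.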

Any ideas or working pseudocode?

Thanks for any info. I hope this was clear and that I didn't miss any edge cases.

Bill

Last edited by Scott; 01-23-2011 at 11:53 AM.. Reason: Code tags
# 2  
01-23-2011
It's not clear what you are asking for.
Are you asking for a mapping strategy?

I would encourage users to rename, themselves, everything that differs only by case, and then adopt a straightforward rule for those who ignore you. Maybe something like this:

* Keep everything the same until there is a conflict
* Resolve the first conflicting name by adding a trailing underscore
* Add a trailing digit after the underscore if there are multiple conflicts
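The rule above could be applied one folder level at a time. The sketch below is hypothetical and not part of the reply: `propose_renames` reads the sibling names of a single folder, one per line, and prints `old<TAB>new`. It assumes input order decides which spelling keeps its name, and reads "a trailing digit" as `_2`, `_3`, ... for the third and later spellings. It does not check whether a generated name such as `TEST_` already exists.

```shell
#!/bin/sh
# propose_renames (hypothetical helper): read sibling folder names, one per
# line, and print "old<TAB>new" following the keep / underscore / digit rule.
# Assumptions: input order decides who keeps the name; the third and later
# spellings get _2, _3, ...; generated names are not checked for new clashes.
propose_renames() {
    awk '
    {
        n = ++count[tolower($0)]
        if (n == 1)      to = $0                  # first spelling is kept
        else if (n == 2) to = $0 "_"              # first conflict: add "_"
        else             to = $0 "_" (n - 1)      # later conflicts: "_" + digit
        print $0 "\t" to
    }'
}
```

For example, `printf '%s\n' test TEST Test | propose_renames` prints the tab-separated pairs `test test`, `TEST TEST_` and `Test Test_2`.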
# 3  
01-23-2011
One way of finding all the problem directories:

We first create a list of all relevant directories.
Then extract every path that is duplicated once case is ignored, and search the original list again for all case-insensitive matches of each one.
This is reasonably efficient for a large number of directories and a moderate number of case-insensitive duplicates.

Code:
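# List every directory (following symlinks), sorted
find /parent_directory/ -follow -type d -print | sort >/tmp/myworkfile1
# Lowercase the list and keep only names that now appear more than once
cat /tmp/myworkfile1 | tr '[:upper:]' '[:lower:]' | sort | uniq -d >/tmp/myworkfile2
# For each such name, print every original spelling. IFS= and -r keep
# spaces and backslashes intact; -F stops names being read as regexes.
cat /tmp/myworkfile2 | while IFS= read -r dir
do
        grep -Fix -- "${dir}" /tmp/myworkfile1
done
rm -f /tmp/myworkfile1
rm -f /tmp/myworkfile2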
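If the number of case-insensitive duplicates grows, running grep once per duplicate rescans the whole file each time. One possible alternative (a sketch, not methyl's code) reads the sorted list twice with awk: the first pass counts each lowercased path, the second prints every original line whose lowercased form was seen more than once. The helper name `case_dupes` is hypothetical, and it prints lines in the file's order rather than grouping them by duplicate as the grep loop does.

```shell
#!/bin/sh
# case_dupes (hypothetical helper): print every line of FILE whose lowercased
# form occurs more than once in FILE. The file is read twice: NR == FNR is
# true only while awk is reading the first copy.
case_dupes() {
    awk 'NR == FNR { count[tolower($0)]++; next }
         count[tolower($0)] > 1' "$1" "$1"
}
```

It would take the place of the `uniq -d` and grep steps, e.g. `case_dupes /tmp/myworkfile1` after the `find ... | sort` line.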


Footnote:
It always helps to know what Operating System and version you have and what Shell you prefer.
The code posted should work with most versions of unix or Linux with Bourne-like Shell (sh, bash, ksh etc.).

For the benefit of the "UUOC" police, I prefer left-to-right processing and have yet to find anything faster than "cat" for placing text records on a pipeline.

Last edited by methyl; 01-23-2011 at 05:13 PM.. Reason: Layout
# 4  
01-25-2011
Thanks!

I use Bash on Solaris 10, OS X 10.6, or (if necessary) Red Hat Enterprise Linux 5.

tr '[:upper:]' '[:lower:]' | sort | uniq

was what I needed; from there it was pretty clear which ones were duplicates.

Your UUOC is fine by me, definitely not an egregious case.

Thanks again!

Bill

I think the bit about identical parent folder paths was unnecessary and confusing (apologies). The paths still need to be unique, and uniqueness is determined by the entire path, regardless of which leading components they share.

Last edited by spacegoose; 01-25-2011 at 08:22 PM..
# 5  
01-25-2011
Glad the code works.
I had a comparable problem some years ago when consolidating multiple smaller servers into one large server where many users had accounts on more than one of the original computers ... and were not consistent in the upper/lower case naming of their directories.