convert email headers' encoding?


 
# 1  
Old 12-08-2008
hi all -

first, huge thanks to anyone who might be able to help me out with this. it's fairly esoteric, but it seems like there has to be an answer for me...

* the environment:

mac os x 10.5.x server
communigate pro (mail server)
bash script (read on)

* the brief:

my script is meant to parse a spam folder; it puts together a nicely-formatted summary email of all messages that have arrived in the past 24 hours, showing only the From: and Subject: lines. mechanically speaking, it works great.

* the problem:

encodings. some character sets (russian/cyrillic; japanese; presumably chinese) break my script pretty badly - a mailer will display them properly in the From or Subject line, but in the body of my email, it just shows them as garbage, i assume because my emails are using another character set. for example:

Subject: =?koi8-r?B?UmU6IMvVxMEg0M/FxMnNIM/UxNnIwdTYPw==?=

the script is smart enough to find the encoding and run the whole message through iconv - but that doesn't seem to help with the header lines, only the email body. which is ignored by the script, so...yeah.

* the question:

does anyone know of a way to properly convert these header lines, ideally into something like utf-8? alternatively, would it help if i specified some text encoding in the summary email itself instead?

for what it's worth, when the lines are displayed in the summaries, i've stripped out the Subject: and From: part, leaving only the actual subject and from text in place. in case that matters...

thanks for reading,
-john.
# 2  
Old 12-09-2008
When I ran into character-set issues on HP-UX (which uses roman8), I used a .mailrc file containing:
set crt=21
set encoding=8bit
set charset=iso-8859-1
#

It might be worth investigating.
# 3  
Old 12-09-2008
hi vbe -

i'll check that out. in the meantime, i tried changing the charset in the emails themselves from us-ascii to utf-8 (which i think would accomplish pretty much the same thing), with no effect.

i also realized that i could've provided a little more info - sorry, folks. the accounts all have .mdir mailboxes (as opposed to .mbox) - so each message is its own rfc 822-compliant textfile. that means the script is plowing through sometimes hundreds of files per account, and pulling only what it needs (in this case, from, subject, and a couple of other things that are irrelevant to this problem).

for each message, it takes that info and writes it all to one line in a temp file, then moves on to the next. when it's processed all the messages for that account, it reads back the file it just finished writing (which consists of the from & subject lines plus that other info, like a from name and its spam score), one line at a time, and clunks those bits of info into the body of the summary email.

i guess it's a little more complicated than i remembered - but again, the mechanics are working fine; it's just this charset thing that's broken.

thanks again to anyone with a tip,
-john.
# 4  
Old 12-09-2008
It's no wonder switching to UTF-8 "doesn't work", because email messages must be composed entirely of ASCII and anything else must be encoded. UTF-8 is no exception to this rule (I still think using UTF-8 is better than the legacy encodings; it just isn't related to your issue).

The subject header you quoted has been encoded as required by MIME. You can find the details in RFC 2047 itself:
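To make the "anything else must be encoded" rule concrete, here is a minimal Python sketch (Python is only my choice for illustration; nothing in this thread uses it). It encodes a non-ASCII subject into the RFC 2047 form with the standard library's email.header module and then decodes it again. The sample text and variable names are assumptions.

```python
from email.header import Header, decode_header

# Hypothetical subject text, chosen only for illustration.
subject = "отдых"

# Header.encode() returns an ASCII-only RFC 2047 encoded-word.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# Decoding gives back the raw bytes plus the charset named in the header.
raw, charset = decode_header(encoded)[0]
print(raw.decode(charset))
```

The point is that the header on the wire is plain ASCII whatever charset the message body declares, so changing the body charset alone will not turn an encoded header into readable text.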

http://www.rfc-editor.org/rfc/rfc2047.txt

I don't think you can easily find a shell script that does MIME decoding for you. Even with Perl, a set of custom modules would need to be installed to parse all that properly. If you are willing to use PHP for this parsing, it is likely the easiest route, because support is built in and you save a lot of module installation. As an example, parsing the sample you quoted:

Code:
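<?php

// Actually in PHP 5, iconv_mime_decode() is the easiest way.

// Assume base64 ("B") encoding; quoted-printable ("Q") is not handled here.
$mstring = '=?koi8-r?B?UmU6IMvVxMEg0M/FxMnNIM/UxNnIwdTYPw==?=';
$array = array();
// Capture the charset and the encoded text; the "B" marker is case-insensitive.
if (!preg_match('/^=\?([^?]+)\?B\?([^?]+)\?=$/i', $mstring, $array)) {
    die("Not a base64 MIME encoded-word\n");
}
list(, $charset, $encoded) = $array;
$str = base64_decode($encoded);
// Convert from the declared charset to UTF-8 for display.
echo iconv($charset, "UTF-8", $str);

?>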

So on my terminal, I got:

Code:
Re: куда поедим отдыхать?

I'm not sure what it says, but it looks properly decoded.
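If PHP is not available, roughly the same decoding can be sketched in Python with the standard library's email.header.decode_header. This is only an illustration under my own assumptions: the function name decode_mime_header is hypothetical, and falling back to latin-1 for an unknown charset is my choice, not something from this thread.

```python
from email.header import decode_header

def decode_mime_header(raw_value):
    """Decode an RFC 2047 header value into a Unicode string."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Unknown charset label: assumed fallback, keeps the script running.
                parts.append(chunk.decode("latin-1"))
        else:
            # Headers without encoded-words come back as plain str.
            parts.append(chunk)
    return "".join(parts)

print(decode_mime_header("=?koi8-r?B?UmU6IMvVxMEg0M/FxMnNIM/UxNnIwdTYPw==?="))
```

A shell script could call something like this once per extracted From or Subject line before writing the summary, provided the summary email itself is then sent as UTF-8.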
# 5  
Old 12-10-2008
well, then...time to learn some php!

i'll see if i can't roll your code into something that works in my environment.

thanks for your help!

-john.
# 6  
Old 12-10-2008
My code was meant to show you the general process of MIME decoding (mostly the concept). It is not good enough for production use. Parsing a real-world email message is likely to be somewhat more complex because of the variations that exist.

To be frank, if you can get hold of PHP 5, as indicated in the inline comment, the simplest approach would be to use the iconv_mime_decode() function, which is a one-stop shop for what you want. There was an (intentional) flaw in my posted code: it didn't handle the case where the encoding is quoted-printable, which is also supported by MIME. For simplicity, I only posted the part that decodes Base64, because that was what the sample you posted used.

Once you have a grasp of the concepts, you can check other languages or tools to see whether they suit your environment better than PHP. As a PHP install is typically pretty big, it may not necessarily be suitable in all deployment environments (say, on very limited storage space).
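To illustrate the quoted-printable case that the posted code leaves out, here is a hand-rolled sketch in Python (my choice of language, not something used in this thread) that handles both the "B" and "Q" forms. The regular expression and the function names are assumptions made for illustration, and the sketch ignores rarer variations such as a language suffix on the charset.

```python
import base64
import quopri
import re

# One RFC 2047 encoded-word: =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

def _decode_word(match):
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # header=True makes "_" decode to a space, as the "Q" form requires.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    return raw.decode(charset, errors="replace")

def decode_header_value(value):
    # Whitespace between two adjacent encoded-words is not part of the text.
    value = re.sub(r"(\?=)\s+(=\?)", r"\1\2", value)
    return ENCODED_WORD.sub(_decode_word, value)

print(decode_header_value("=?iso-8859-1?Q?caf=E9_au_lait?="))
print(decode_header_value("=?koi8-r?B?UmU6IMvVxMEg0M/FxMnNIM/UxNnIwdTYPw==?="))
```

Text outside the encoded-words is passed through unchanged, so a header that mixes plain ASCII with encoded parts still comes out in one piece.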