International character set support

One of the best (or worst) things about developing an E-Mail application is dealing with internationalization issues. The early Internet protocols tend to assume that 7-bit US-ASCII English text is the default, and that everything else is the exceptional case. To work around this, many character sets and encoding formats have been created over the years. Today, we've at least managed to consolidate all the character sets into a standard known as Unicode.

Now the BlackBerry API only claims to support the following character sets:

In addition, BlackBerry devices localized for East Asia have been found to support GB2312 as well.

To make matters more interesting, Vietnamese characters will only display correctly if they are represented as decomposed Unicode characters (a base letter followed by separate combining marks). Meanwhile, most Vietnamese text on the Internet is represented as canonically composed Unicode characters.
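
On desktop Java (SE 6 and later), the difference between the two forms can be seen with java.text.Normalizer. This is only an illustration, since that class is not part of the BlackBerry API. The class and method names below are made up for the example, and canonical decomposition (NFD) is used just to show the idea.

```java
import java.text.Normalizer;

// Hypothetical demo class; requires Java SE 6+, not available on BlackBerry.
class VietnameseNormalization {
    /** Splits precomposed characters into a base letter plus combining marks (NFD). */
    static String decompose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // U+1EBF: "e" with circumflex and acute, as a single precomposed code point
        String composed = "\u1EBF";
        String decomposed = decompose(composed);

        // Prints U+0065, U+0302, U+0301 (base letter, circumflex, acute), one per line
        for (int i = 0; i < decomposed.length(); i++) {
            System.out.printf("U+%04X%n", (int) decomposed.charAt(i));
        }
    }
}
```

One character on the wire becomes three in the decomposed form, which is why the conversion needs its own lookup data.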

Users of LogicMail have occasionally run into issues where their non-English E-Mails do not render correctly. The most recent example involved a message that used the CP1251 character set for Cyrillic. However, I suspect that other similar issues have been noticed and have gone unreported.

Assuming you haven't fallen asleep yet, you are probably wondering whether I can just wave my hand over the code and make your E-Mail render legibly. Well, I can, but there are some drawbacks. Basically, I need to include two things in the LogicMail code:

  1. Unicode translation tables for all unsupported character sets
  2. Unicode compatibility decomposition normalization (NFKD) tables

Believe it or not, Java SE 6 already includes libraries for both. The BlackBerry API, however, does not. Including them in their complete form would dramatically increase the size of the LogicMail application, most likely by an unreasonable amount.
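
To give a rough idea of what a translation table for a single character set looks like, here is a minimal sketch for CP1251. It is not LogicMail code: the class name is hypothetical, and the table is deliberately incomplete. It only covers ASCII, the contiguous Cyrillic block at 0xC0-0xFF, and the two IO letters. Everything else in 0x80-0xBF would need explicit entries and is mapped to U+FFFD here.

```java
// Hypothetical sketch of a table-driven CP1251 decoder (partial table).
class Cp1251Decoder {
    // Index is the unsigned byte value; entry is the Unicode character.
    private static final char[] TABLE = new char[256];

    static {
        // 0x00-0x7F is plain ASCII; mark the rest as unknown until filled in.
        for (int i = 0; i < 256; i++) {
            TABLE[i] = (i < 0x80) ? (char) i : '\uFFFD';
        }
        // 0xC0-0xFF maps in order onto U+0410-U+044F (the basic Cyrillic alphabet).
        for (int i = 0xC0; i <= 0xFF; i++) {
            TABLE[i] = (char) (0x0410 + (i - 0xC0));
        }
        TABLE[0xA8] = '\u0401'; // Cyrillic capital letter IO
        TABLE[0xB8] = '\u0451'; // Cyrillic small letter IO
        // A complete table would also list the remaining 0x80-0xBF entries.
    }

    /** Decodes CP1251 bytes into a Java string using the lookup table. */
    static String decode(byte[] data) {
        char[] out = new char[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = TABLE[data[i] & 0xFF]; // mask to treat the byte as unsigned
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // The Russian word for "hello" encoded as CP1251
        byte[] raw = { (byte) 0xCF, (byte) 0xF0, (byte) 0xE8,
                       (byte) 0xE2, (byte) 0xE5, (byte) 0xF2 };
        System.out.println(decode(raw));
    }
}
```

A single-byte character set like this costs a few hundred bytes of table data. Multi-byte East Asian sets and the normalization tables are far larger, which is where the size concern comes from.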

So what do I want to know from you? Which unsupported character sets do you normally receive E-Mails in? If I can scope the effort to just a few translation tables, the size penalty may be acceptable. (If you don't know where to look, check the raw message headers, in particular the charset parameter of the Content-Type header.)
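
For reference, the header in question typically looks like `Content-Type: text/plain; charset="windows-1251"`. The sketch below shows one way to pull the charset name out of such a header value. The class and method names are hypothetical, and it assumes a single, already-unfolded header value rather than a full MIME parser.

```java
// Hypothetical helper; assumes the header value is already unfolded onto one line.
class CharsetSniffer {
    private static final String KEY = "charset=";

    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        // Case-insensitive search for "charset="
        int start = -1;
        for (int i = 0; i + KEY.length() <= contentType.length(); i++) {
            if (contentType.regionMatches(true, i, KEY, 0, KEY.length())) {
                start = i + KEY.length();
                break;
            }
        }
        if (start < 0) {
            return null;
        }
        // The value runs up to the next parameter separator, if any.
        int end = contentType.indexOf(';', start);
        if (end < 0) {
            end = contentType.length();
        }
        String value = contentType.substring(start, end).trim();
        // Strip optional surrounding double quotes.
        if (value.length() >= 2 && value.charAt(0) == '"'
                && value.charAt(value.length() - 1) == '"') {
            value = value.substring(1, value.length() - 1);
        }
        return value;
    }
}
```

For example, `CharsetSniffer.charsetOf("text/html; charset=KOI8-R; format=flowed")` returns `KOI8-R`.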

  • Posted: 2009-02-13 17:40 (Updated: 2009-02-13 17:40)