Thursday, May 20, 2010
One of the biggest problems facing report developers is how to efficiently support multiple languages across multiple output formats (HTML, PDF, Excel...). BIRT provides support for multiple languages, but the support is both OS and output format dependent. Having gone through this process a couple of times, I have been asked to share my experiences. Hopefully this will simplify the task if you run into similar issues.
What I knew going in was that fonts are handled differently depending on the type of output. HTML fonts are rendered in the user's browser, PDF fonts are rendered on the server by BIRT, and chart fonts are rendered on the server by Java AWT. On top of all that, fonts in the development environment are rendered by SWT. Each of these situations has its own way of configuring fonts.
For HTML reports, the user must have the appropriate fonts installed on their machine.
For PDF reports, BIRT renders the fonts on the server using information in the various fontsConfigXXX.xml files located in plugins/org.eclipse.birt.report.engine.fonts_XXX. The base file is fontsConfig.xml and there are additional files to handle specific operating systems. There are some useful comments in fontsConfig.xml that explain how this works. I know these files are used for PDF generation and I think they are also used for rendering fonts in the development environment.
From a previous font adventure, I knew that BIRT uses AWT to render charts, so font configuration is done in the lib folder of the JRE by manipulating various properties files. Documentation on this seems to be a bit sketchy, but this site has some information.
My customer's problem had to do with PDF output. Thai characters were displaying as question marks in their BIRT reports. This was happening on all the operating systems they ran, including Windows, Solaris and Linux. The characters looked fine in HTML output, but the PDF rendered them incorrectly. Their system administrator stated that they had full language packs installed on all of the machines.
I started my investigations on their Windows box, running Windows Server 2003. The first thing I looked at was the fontsConfig.xml file in plugins/org.eclipse.birt.report.engine.fonts_2.2.1.v20070823. I noticed there was a block element for Thai in the commented-out section:
<block name="Thai" start="e00" end="e7f" index="27" font-family="Font-Family"/>
I realized this has the rather nice effect of dynamically switching fonts in mid-string whenever characters within this range are encountered. There's no need to use a special font in the report design and that's particularly important in multi-lingual environments.
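To make the idea concrete, here is a rough sketch of what range-based font switching amounts to. This is not BIRT's code; the class FontRunSplitter, its Run type, and the default family "Serif" are hypothetical names made up for illustration. Only the range (e00 to e7f) comes from the block element above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT how BIRT is implemented internally.
final class FontRunSplitter {
    // Range taken from the Thai <block> element: start="e00" end="e7f".
    static final int THAI_START = 0x0E00;
    static final int THAI_END = 0x0E7F;

    /** A stretch of text that would be drawn with a single font family. */
    static final class Run {
        final String fontFamily;
        final String text;

        Run(String fontFamily, String text) {
            this.fontFamily = fontFamily;
            this.text = text;
        }

        @Override
        public String toString() {
            return fontFamily + ":" + text;
        }
    }

    /** Splits text into runs, switching family whenever a character enters or leaves the Thai range. */
    static List<Run> split(String text, String defaultFamily, String thaiFamily) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentFamily = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String family = (cp >= THAI_START && cp <= THAI_END) ? thaiFamily : defaultFamily;
            // Close the current run when the family changes mid-string.
            if (currentFamily != null && !family.equals(currentFamily)) {
                runs.add(new Run(currentFamily, current.toString()));
                current.setLength(0);
            }
            currentFamily = family;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (currentFamily != null) {
            runs.add(new Run(currentFamily, current.toString()));
        }
        return runs;
    }

    public static void main(String[] args) {
        // "Serif" is a placeholder default; "Angsana New" stands in for whatever the block element names.
        for (Run r : split("Total: \u0E23\u0E27\u0E21 100", "Serif", "Angsana New")) {
            System.out.println(r.fontFamily + " -> \"" + r.text + "\"");
        }
    }
}
```

The sample string mixes Latin text, three Thai characters, and digits, so it comes out as three runs with only the middle one assigned to the Thai family. The report design itself never has to mention a Thai font.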
The only problem was the font family. The sample element didn't tell me what it should be, and I couldn't tell by looking at the font files in C:\windows\fonts. I tried downloading several font viewers, but none of them told me what I needed to know.
Then I had the good luck to check on my laptop, which runs Windows 7. In Windows 7, the font listing in Windows Explorer shows much more information about the fonts, including a column called "Designed For". I found several fonts that were "designed for" Thai, and Angsana New was one that also existed on their server. I used Angsana New for the font family in the block element, restarted their web application, and that fixed it!
Next I had to perform this feat on their Solaris box.
Of course I tried Angsana New right off the bat, but it wasn't going to be that easy.
I started exploring the installed packages on my own Ubuntu box by searching for "font" in the Synaptic package manager. I found all kinds of programs and spent a good amount of time reading their man pages and trying them out, but the only one I found that told me which fonts support which character ranges was gucharmap, the Gnome version of the character map program.
The left side has a list of "Scripts" and one of them was Thai. Clicking on that showed the Thai character set (which is relatively small). I found that holding the right mouse button down over a character would display its font name. On my Ubuntu box the font was Waree, which worked for my machine but did not solve the client's issue.
Using gucharmap on the client's machine showed that their server was using AngsanaUPC. A quick modification to the fontsConfigXXX.xml to use AngsanaUPC and the problem was resolved. The main lesson I learned from this was how to find font families that support specific blocks of Unicode characters.
On Windows, the Windows 7 Explorer works nicely. I am unsure how to handle this if you are using XP. On Linux, gucharmap worked, but the results depended on the particular Linux distro. The only downside to this approach was the need for X Window access. I will continue looking for a good command line utility that can be used to research installed fonts on servers. If anyone has found a utility like this, I would love to hear about it.
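One possible stopgap, since any BIRT server already has a JRE on it, is a few lines of Java that ask AWT which installed font families have glyphs for a given Unicode range. This is only a sketch: the class name FontCoverage is made up, and it reports what Java AWT can see, which may not match exactly what BIRT's PDF rendering finds through its fontsConfig files. Logical families such as Dialog or Serif may also show up, because Java maps them onto physical fonts.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: lists AWT-visible font families with glyphs in a Unicode range.
final class FontCoverage {

    /** Number of assigned (defined) code points in [start, end]. */
    static int countDefined(int start, int end) {
        int n = 0;
        for (int cp = start; cp <= end; cp++) {
            if (Character.isDefined(cp)) {
                n++;
            }
        }
        return n;
    }

    /** Number of assigned code points in [start, end] that the font has a glyph for. */
    static int countDisplayable(Font font, int start, int end) {
        int n = 0;
        for (int cp = start; cp <= end; cp++) {
            if (Character.isDefined(cp) && font.canDisplay(cp)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Arguments are hex, like the start/end attributes of a <block> element; default is Thai.
        int start = args.length > 0 ? Integer.parseInt(args[0], 16) : 0x0E00;
        int end = args.length > 1 ? Integer.parseInt(args[1], 16) : 0x0E7F;
        int total = countDefined(start, end);

        // Keep the best coverage seen for each family (a family can have several faces).
        Map<String, Integer> best = new TreeMap<>();
        for (Font f : GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts()) {
            int covered = countDisplayable(f, start, end);
            if (covered > 0) {
                best.merge(f.getFamily(), covered, Math::max);
            }
        }
        for (Map.Entry<String, Integer> e : best.entrySet()) {
            System.out.println(e.getKey() + "  " + e.getValue() + "/" + total);
        }
    }
}
```

On Java 11 or later this can be run straight from the source file with something like java -Djava.awt.headless=true FontCoverage.java e00 e7f, so no X display should be needed. Each output line is a family name followed by how many of the range's assigned characters it covers, which gives a short list of candidates to try in the font-family attribute.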