IET Gen Development Blog: Understanding and editing UTF-8 files

Tuesday, 15 October 2013

Understanding and editing UTF-8 files

One of the design goals of Rapide was to support internationalisation, so that an application's user interface can be displayed in multiple languages. The user's locale is detected automatically, and the appropriate locale-specific translated text strings are displayed instead of the default ones.

We chose UTF-8 as the encoding standard for the files that Rapide generates and uses, for example the GUI XML definition files and the string properties files.

Here are some links to various articles if you want to know more about UTF-8:

http://www.utf-8.com/

http://www.icu-project.org/docs/papers/forms_of_unicode/

http://en.wikipedia.org/wiki/UTF-8

http://www.joelonsoftware.com/articles/Unicode.html

http://www.utf8everywhere.org/

When editing the files to create locale-specific translations, you must use an editor that understands and preserves the UTF-8 encoding. For basic editing we use Notepad++, which can edit files in UTF-8 and also contains conversion utilities. On opening an existing file, always check that the encoding has been set to UTF-8. Many editors will attempt to auto-detect the encoding, but if the file contains only basic ASCII characters (below X'80'), an ASCII file and a UTF-8 file are byte-for-byte identical, so the editor will probably not set the encoding to UTF-8.
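If you are unsure how an editor will interpret a file, you can inspect the raw bytes yourself. The following is a minimal Python sketch, not part of Rapide; the function name `classify_bytes` and the file name in the usage note are made up for illustration. It reports whether a file's content is pure ASCII, valid UTF-8 with multi-byte characters, or not valid UTF-8 at all.

```python
def classify_bytes(data: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'not utf-8' for the given raw bytes.

    Hypothetical helper for illustration only.
    """
    # Every byte below 0x80: the content is the same in ASCII and UTF-8,
    # so an editor has nothing to base a UTF-8 auto-detection on.
    if data.isascii():
        return "ascii"
    # At least one byte is 0x80 or above: check that the bytes form
    # valid UTF-8 sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"
```

To check a file, read it in binary mode and pass the bytes in, for example `classify_bytes(open("strings_fr.properties", "rb").read())`. A result of `'ascii'` is the case where you have to set the editor's encoding to UTF-8 by hand before adding translated text.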

One way around this is to insert a BOM (byte order mark, the three bytes X'EF BB BF' in UTF-8) at the start of the file to indicate the encoding. However, a BOM is not recommended for UTF-8, and we have decided not to include one in the files generated by Rapide.
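Some editors add a BOM when saving a file as UTF-8, so it can be worth checking a file after editing it. The sketch below is a hedged Python illustration, not a Rapide utility; the helper names `has_utf8_bom` and `strip_utf8_bom` are assumptions made for this example. It detects a UTF-8 BOM at the start of the file content and removes it.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Both functions work on the raw bytes of a file, so read and write the file in binary mode when using them. Content without a BOM is returned unchanged.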
