UTF-8 support (still a work in progress)

https://github.com/utunnels/Cataclysm-DDA/tree/utf8

I updated the SDL version to support UTF-8 strings, which should make localization easier. There are some heavy modifications in those pseudo-curses functions, so I probably need to do more testing to make sure there are no serious bugs. If you want to try it, make sure your font actually supports your characters, and save your files (.cpp, .json, .h, etc.) as UTF-8.

The code calculates the x position based on character “cell” width, not to be confused with byte offset (this is the part I’m not too sure about, because byte offsets might be hard-coded somewhere in the project). It uses Markus Kuhn’s free wcwidth() implementation, so if you want more details, check wcwidth.c.

Link to original file: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
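A minimal sketch of the “cells, not bytes” idea, assuming well-formed UTF-8 input: decode each code point and add up its cell width instead of counting bytes. The names `decode_utf8`, `cell_width` and `utf8_display_width` are hypothetical (the branch’s own helper is `utf8_width()` and may work differently), and `cell_width` only covers the main wide ranges from Kuhn’s `mk_wcwidth()`; unlike the real thing, it does not report combining characters as zero width.

```cpp
#include <cstddef>
#include <string>

// Decode the UTF-8 sequence starting at s[i] and advance i past it.
// Malformed or truncated input yields U+FFFD and skips one byte
// (overlong forms and surrogates are not rejected in this sketch).
char32_t decode_utf8(const std::string &s, std::size_t &i)
{
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    if (lead < 0x80)                len = 1;
    else if ((lead & 0xE0) == 0xC0) len = 2;
    else if ((lead & 0xF0) == 0xE0) len = 3;
    else if ((lead & 0xF8) == 0xF0) len = 4;
    if (len == 0 || i + len > s.size()) { ++i; return 0xFFFD; }  // bad or truncated
    char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));    // payload bits of lead byte
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char c = static_cast<unsigned char>(s[i + k]);
        if ((c & 0xC0) != 0x80) { ++i; return 0xFFFD; }         // missing continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    i += len;
    return cp;
}

// Simplified subset of Kuhn's mk_wcwidth(): control characters are -1,
// the main East Asian wide/fullwidth ranges are 2, everything else is 1.
// Unlike mk_wcwidth(), combining characters are NOT reported as 0 here.
int cell_width(char32_t cp)
{
    if (cp == 0) return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return -1;
    const bool wide =
        (cp >= 0x1100 && cp <= 0x115F) ||                  // Hangul Jamo initials
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) ||  // CJK ... Yi
        (cp >= 0xAC00 && cp <= 0xD7A3) ||                  // Hangul syllables
        (cp >= 0xF900 && cp <= 0xFAFF) ||                  // CJK compatibility ideographs
        (cp >= 0xFE30 && cp <= 0xFE6F) ||                  // CJK compatibility forms
        (cp >= 0xFF00 && cp <= 0xFF60) ||                  // fullwidth forms
        (cp >= 0xFFE0 && cp <= 0xFFE6) ||
        (cp >= 0x20000 && cp <= 0x2FFFD) ||
        (cp >= 0x30000 && cp <= 0x3FFFD);
    return wide ? 2 : 1;
}

// On-screen width of a UTF-8 string in cells (not bytes);
// control characters contribute nothing.
int utf8_display_width(const std::string &s)
{
    int width = 0;
    for (std::size_t i = 0; i < s.size();) {
        const int w = cell_width(decode_utf8(s, i));
        if (w > 0) width += w;
    }
    return width;
}
```

With this, “天” (U+5929, bytes E5 A4 A9) is 3 bytes but 2 cells, and “aü” is also 3 bytes but 2 cells, which is exactly the mismatch that byte-based layout code gets wrong.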

[hr]

Some screenshots:

Chinese characters:
Tested on Windows 7 (C:\windows\fonts\MingLiU.ttc)

Cyrillic characters (Google Translate output, so ignore the wording):
Tested on Windows 7 (C:\windows\fonts\lucon.ttf)

Uhm… I may be doing something wrong: http://i.imgur.com/wwdq60o.png Those are supposed to be üöäß.

Did you save the file as UTF-8?
What font are you using?

BTW, fixed a line length limit bug.

Yep, I saved it as UTF-8… Wait, I just realized the problem: I’m not using SDL at all. For some reason it starts in the console even though I compiled with Tiles=1.

LOL I see.

But wait, I seem to have committed a bad modification a few minutes ago, so do not check out that version. :slight_smile:

Also, I realized there is more to do. For example, the word-wrapping function in output.cpp needs to be reworked, because it uses byte length to calculate width.
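A rough sketch of what a width-aware wrap could look like: measure each word in cells rather than bytes before deciding whether it fits. This assumes well-formed UTF-8 and uses a hypothetical, simplified width table instead of wcwidth(); `wrap_by_cells`, `cells` and `cp_width` are illustrative names, not the functions in output.cpp. Words are only split on ASCII whitespace, so languages that don’t put spaces between words are not handled.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for wcwidth(): 2 cells for a few East Asian wide
// ranges, 1 cell for everything else (combining marks are not handled).
int cp_width(char32_t cp)
{
    return ((cp >= 0x1100 && cp <= 0x115F) || (cp >= 0x2E80 && cp <= 0xA4CF) ||
            (cp >= 0xAC00 && cp <= 0xD7A3) || (cp >= 0xF900 && cp <= 0xFAFF) ||
            (cp >= 0xFF00 && cp <= 0xFF60) || (cp >= 0xFFE0 && cp <= 0xFFE6)) ? 2 : 1;
}

// Display width in cells of a UTF-8 string (assumes well-formed input).
int cells(const std::string &s)
{
    int w = 0;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = lead < 0xC0 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (i + len > s.size()) len = s.size() - i;        // truncated tail
        char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        w += cp_width(cp);
        i += len;
    }
    return w;
}

// Greedy word wrap by display cells. Words are split on ASCII whitespace
// (newlines included); a word wider than `width` gets a line of its own
// and is not broken further in this sketch.
std::vector<std::string> wrap_by_cells(const std::string &text, int width)
{
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    int line_w = 0;
    while (words >> word) {
        const int w = cells(word);
        if (!line.empty() && line_w + 1 + w > width) {  // +1 for the joining space
            lines.push_back(line);
            line.clear();
            line_w = 0;
        }
        if (!line.empty()) { line += ' '; ++line_w; }
        line += word;
        line_w += w;
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

With a limit of 7 cells, “天天 ab” stays on one line (4 + 1 + 2 = 7 cells), while a byte-based check would see 6 + 1 + 2 = 9 and wrap it too early.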

Looking through wcwidth.c, Unicode characters can be “wide” both in their extent on the screen and in the number of bytes needed to encode them? Are we going to try to handle this on the map, or disallow those characters? It should simply be a matter of not setting any map-drawable characters to those values, and possibly adding a check if there are user-settable characters.

In other parts of the display that use normal flowed text it’s not an issue, but for the map and other parts of the screen that require fixed-width characters, I’m not sure how we could make that work.
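If user-settable map symbols ever happen, the check could be as small as this sketch: accept a string only if it encodes exactly one code point that occupies one cell. `is_single_cell_symbol` is a hypothetical name, and its width rule is a simplified assumption (a few wide ranges, control characters and the basic combining-mark block), not the full wcwidth() table.

```cpp
#include <cstddef>
#include <string>

// Hypothetical validator for a user-settable map symbol: the string must
// be exactly one UTF-8 encoded code point that takes exactly one cell.
bool is_single_cell_symbol(const std::string &s)
{
    if (s.empty()) return false;
    const unsigned char lead = static_cast<unsigned char>(s[0]);
    const std::size_t len = lead < 0x80 ? 1 : (lead >> 5) == 0x6 ? 2
                          : (lead >> 4) == 0xE ? 3 : (lead >> 3) == 0x1E ? 4 : 0;
    if (len == 0 || s.size() != len) return false;   // bad lead byte or extra chars
    char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char c = static_cast<unsigned char>(s[k]);
        if ((c & 0xC0) != 0x80) return false;        // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return false;  // control characters
    if (cp >= 0x0300 && cp <= 0x036F) return false;            // zero-width combining marks
    const bool wide = (cp >= 0x1100 && cp <= 0x115F) ||        // simplified wide ranges
                      (cp >= 0x2E80 && cp <= 0xA4CF) ||
                      (cp >= 0xAC00 && cp <= 0xD7A3) ||
                      (cp >= 0xF900 && cp <= 0xFAFF) ||
                      (cp >= 0xFF00 && cp <= 0xFF60) ||
                      (cp >= 0xFFE0 && cp <= 0xFFE6);
    return !wide;
}
```

So “@” and “ü” would be accepted, while “天” (2 cells) or “ab” (two characters) would be rejected before they ever reach the map.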

I like this. There are a lot of non-English speakers who might like the game. It would help expand the coder base too.

[quote=“Kevin Granade, post:6, topic:1975”]Looking through wcwidth.c, unicode chars can be “wide” in both extent on the screen as well as number of bytes to display? Are we going to try and handle this on the map, or disallow those characters? It should simply be a matter of refraining from setting any map-drawable characters to those values, and possibly having a check for it if there are user-settable characters.

In other parts of the display that use normal flowed text it’s not an issue, but for the map and other parts of the screen that require fixed-width characters, I’m not sure how we could make that work.[/quote]

Yeah, maps don’t use them, so it is not an issue.
As for other parts, the code already checks the characters at the current cursor position. For example, if it wants to display a character with a width of 2, it erases the two cells starting at the current position. In some special cases, for example when a width-1 character and a width-2 character sit at that position, it clears 3 cells, inserts the new character and fills the rest with a space, so that the characters after the current position keep their positions unchanged.
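A rough model of that overwrite rule, assuming a row is stored as one string per cell, with an empty string marking the right half of a wide character. `Row`, `put` and `clear_wide_at` are hypothetical names, not the branch’s actual data structures, and the caller is assumed to pass the glyph’s width (for example from wcwidth()).

```cpp
#include <string>
#include <vector>

// Hypothetical model of one row of the pseudo-curses window. Each cell holds
// the UTF-8 text drawn there; an empty string marks the right half of the
// 2-cell character in the cell to its left.
struct Row {
    std::vector<std::string> cells;

    explicit Row(int width) : cells(width, " ") {}

    int size() const { return static_cast<int>(cells.size()); }

    // If column `col` is covered by a wide character, replace both of its
    // halves with spaces so no half of it is left dangling.
    void clear_wide_at(int col) {
        if (cells[col].empty()) {                                  // right half
            cells[col] = " ";
            if (col > 0) cells[col - 1] = " ";
        } else if (col + 1 < size() && cells[col + 1].empty()) {   // left half
            cells[col] = " ";
            cells[col + 1] = " ";
        }
    }

    // Draw one character occupying `width` (1 or 2) cells at column `x`.
    // Wide characters partly covered by it become spaces, so everything
    // further right keeps its column.
    void put(int x, const std::string &glyph, int width) {
        if (x < 0 || width < 1 || width > 2 || x + width > size()) return;
        for (int c = x; c < x + width; ++c) clear_wide_at(c);
        cells[x] = glyph;
        if (width == 2) cells[x + 1] = "";
    }

    // Concatenate the row; right-half markers contribute no bytes.
    std::string str() const {
        std::string out;
        for (const std::string &c : cells) out += c;
        return out;
    }
};
```

For example, starting from “a天” in a 4-cell row, drawing “天” at column 0 covers “a” and the left half of the old “天”; the orphaned right half becomes a space, so the row reads “天” plus two spaces and column 3 is untouched.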

[hr]

That said, yesterday I still ran into an item-description word-wrapping issue that I haven’t solved yet. Maybe I need to investigate other areas of the project…

I have already moved the Unicode-related functions to catacharset.cpp.

Well, I posted some screenshots.

Oh this looks nice :o

Does this work with the gettext stuff that was added to git master this weekend? AFAIK it uses UTF-8 for the text, so it might (hopefully) be fairly straightforward to merge.

Merged. But I haven’t tried that feature yet.

This is awesome, good work. I suppose all we really need now, with this and the curses implementation being pretty close to done, is an actual architecture in place to handle the translation work.

At the very least we should consider if we want to ship the game with available languages, and allow the user to toggle between them, or if we want to compile to a specific language (to keep the compiled executable smaller and have the correct language by default when a user downloads it) and just keep all the languages in the source code but not the final product.

And, obviously, we will need to figure out a good way to actually store and manage the translations…

@GlyphGryph: the system is already there; all it needs now is the translations.

Initial translators create “.po” files, which get stored in the git repo in “lang/po/<language_code>.po”, for example lang/po/en_NZ.po.

For use, these are compiled to .mo files (instructions in TRANSLATING.md).

After that, when the game is started it automatically checks the user’s current language, and searches for a translation in that language. If it finds one, all the translatable strings are replaced with the ones provided by the translator :).

Currently the lang/po directory doesn’t exist because there are no translations yet, and git refuses to track an empty directory.

Good thing about the .po files: they are text files, so git can track them like any other code file.

Here’s how it looks in Cygwin.

Should I create a pull request?

Besides the SDL curses code, there are still many changes that handle word sizing and wrapping. Also, some languages don’t use spaces as word separators (so maybe there should be options for that in the future).

It will be good to have the word wrapping functions for the terminal interface as well.

I checked out your utf8 branch on GitHub to test it; there were a couple of minor compile errors:

  • catacharset.cpp needs "#include "
  • utf8_width() has “unsigned ch” defined twice (just makes a warning, no real effect)
  • output.h needs “std::string word_rewrap (const std::string &in, int width);”

With these changes it seems to work fine :). I even tested the Linux SDL version with the gettext translation file I’m using, and it works as expected. There are missing glyphs all over the place because no font has all the test characters I used, but with a real language it would work.

How are you doing the Chinese translation? Does it have all the strings replaced in the source code?

Thank you, I’ll try to correct them.

I’m not the translator. I heard they have done most of the strings, leaving the NPC and special game mode strings untranslated.

But I assure you, sometimes it is not just a matter of replacing A with B. Grammar can become a problem when the sentence reads oddly in the original word order.

Yes, many places in the code that assume English word order will need to be fixed. Most cases use string substitution, though, so it’s not too bad, as the translator can just move the “%s” or “%d” into the correct place. For example, the season and day are printed using “%s, day %d”, where %s is the season name (translatable) and %d the day (number). The translator can then change it to “%s, %d天”, giving what you have in the earlier screenshot.
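A small sketch of why this works: the code always passes the same arguments (season, day), and the translated format string alone decides where they appear. `translate` here is a hypothetical in-memory stand-in for gettext() (which likewise returns the original string when there is no translation), and `season_day_text` is an illustrative name.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for gettext(): look the English format string up in
// a tiny in-memory catalog and fall back to the original if there is no entry.
const char *translate(const char *msgid)
{
    static const std::map<std::string, std::string> catalog = {
        {"%s, day %d", "%s, %d\xE5\xA4\xA9"},  // the translator's "%s, %d天"
    };
    const auto it = catalog.find(msgid);
    return it == catalog.end() ? msgid : it->second.c_str();
}

// The code only ever supplies (season, day); their placement in the
// output comes from the (possibly translated) format string.
std::string season_day_text(const char *season, int day)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, translate("%s, day %d"), season, day);
    return buf;
}
```

Here `season_day_text("Spring", 3)` gives “Spring, 3天”, while any string missing from the catalog, such as “New Game”, comes back unchanged.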

Another example in the current code: “Fungus stalks burst through %s’s hands!”, so the translator just moves the %s to wherever makes sense in the translated sentence. Might be a problem with gender, but sometimes it can be worked around by using a gender-neutral translation.

Some strings with two substitutions will need to be changed. For example, in “The %s fires its %s!” the order might change. For these, the code can be changed to “The %1$s fires its %2$s!”, and it will use the correct argument even if the translator changes the order.
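A quick way to see the numbered placeholders in action, assuming a POSIX C library such as glibc: “%1$s”/“%2$s” are a POSIX extension to printf, not something ISO C/C++ requires, and some Windows runtimes only offer it through separate functions such as `_printf_p`. `fire_message` and the reordered format below are made-up examples, not strings from the game.

```cpp
#include <cstdio>
#include <string>

// Builds the message from a (possibly translated) format string.
// "%1$s"/"%2$s" are POSIX positional conversions; once one conversion in a
// format string is numbered, all of them must be.
std::string fire_message(const char *fmt, const char *monster, const char *gun)
{
    char buf[128];
    std::snprintf(buf, sizeof buf, fmt, monster, gun);
    return buf;
}
```

Calling it with “The %1$s fires its %2$s!” gives “The turret fires its rifle!”, and a hypothetical translation that needs the weapon first, “%2$s fired by the %1$s!”, gives “rifle fired by the turret!” with the same argument order in the code.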

I didn’t know you could use it like that. O_o

Me neither, until I started worrying about translations :slight_smile:

Hi yobbo.
I compiled the .po file into a .mo file, but the text just didn’t show correctly (lang\mo\zh_CN\LC_MESSAGES\cataclysm-dda.mo).
I guess there is still some encoding issue, or do I have to change some settings?

I edited some JSON files directly, and the text showed correctly.

You can check the attachment. I only edited “New Game”.


Edit:

Oh, this fixed the problem:

bind_textdomain_codeset("cataclysm-dda", "UTF-8");
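For reference, a sketch of the whole gettext start-up with that fix in place. The domain name and the lang/mo directory come from the posts above, and the calls are the standard libintl ones, but where exactly the game makes them is an assumption, and `init_translations`/`tr` are hypothetical names. Without bind_textdomain_codeset(), gettext converts messages to the charset of the current locale (on Windows usually a legacy code page), which is presumably why the UTF-8 renderer showed garbage.

```cpp
#include <clocale>
#include <libintl.h>
#include <string>

// Sketch of gettext start-up for the "cataclysm-dda" domain, assuming the
// compiled catalogs live in lang/mo/<lang>/LC_MESSAGES/cataclysm-dda.mo.
// On glibc these functions are part of libc; on MinGW/Cygwin link -lintl.
void init_translations()
{
    std::setlocale(LC_ALL, "");                  // pick up the user's locale
    bindtextdomain("cataclysm-dda", "lang/mo");  // where to look for .mo files
    // Return UTF-8 regardless of the locale's own charset, which is what the
    // UTF-8 aware SDL/curses output code expects.
    bind_textdomain_codeset("cataclysm-dda", "UTF-8");
    textdomain("cataclysm-dda");                 // default domain for gettext()
}

// Look up a translation; gettext() returns msgid itself when none is found.
std::string tr(const char *msgid)
{
    return gettext(msgid);
}
```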