author | Denys Vlasenko | 2010-02-01 15:58:08 +0100 |
---|---|---|
committer | Denys Vlasenko | 2010-02-01 15:58:08 +0100 |
commit | 698dca5805117f470ef19488428c8a5f795b9e0c (patch) | |
tree | 1ca510d7308f0a019ab8b0ba38be9183d438dfd6 /docs | |
parent | c8e18ca12c66bc95a30a7d41a7aff245c352d2c2 (diff) | |
download | busybox-698dca5805117f470ef19488428c8a5f795b9e0c.zip busybox-698dca5805117f470ef19488428c8a5f795b9e0c.tar.gz |
add unicode.txt
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/unicode.txt | 56 |
1 files changed, 56 insertions, 0 deletions
diff --git a/docs/unicode.txt b/docs/unicode.txt
new file mode 100644
index 0000000..019d12f
--- /dev/null
+++ b/docs/unicode.txt
@@ -0,0 +1,56 @@
+	Unicode support in busybox
+
+There are several scenarios where we need to handle unicode
+correctly.
+
+	Shell input
+
+We want to correctly handle input of unicode characters.
+There are several problems with it. Just handling input
+as sequence of bytes would break any editing. This was fixed
+and now lineedit operates on the array of wchar_t's.
+But we also need to handle the following problematic moments:
+
+* It is unreasonable to expect that output device supports
+  _any_ unicode chars. Perhaps we need to avoid printing
+  those chars which are not supported by output device.
+  Examples: chars which are not present in the font,
+  chars which are not assigned in unicode,
+  combining chars (especially trying to combine bad pairs:
+  a_chinese_symbol + "combining grave accent" = ??!)
+
+* We need to account for the fact that unicode chars have
+  different widths: 0 for combining chars, 1 for usual,
+  2 for ideograms (are there 3+ wide chars?).
+
+* Bidirectional handling. If user wants to echo a phrase
+  in Hebrew, he types: echo "srettel werbeH"
+
+	Editors
+
+This case is a bit similar to "shell input", but unlike shell,
+editors may encounter many more unexpected unicode sequences
+(try to load a random binary file...), and they need to preserve
+them, unlike shell which can afford to drop bogus input.
+
+
+	more, less
+
+.
+
+	ls (multi-column display)
+
+.
+
+	top, ps
+
+.
+
+	Filename display (in error messages and elsewhere)
+
+.
+
+
+
+TODO: write an email to Asmus Freytag (asmus@unicode.org),
+author of http://unicode.org/reports/tr11/
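The width rule from the "Shell input" notes (0 for combining chars, 1 for usual, 2 for ideograms) can be illustrated with a small C sketch. This is not busybox code: sketch_width() and sketch_columns() are hypothetical names, and the table covers only two Unicode blocks, so it is an assumption-laden illustration rather than a usable width table. A real implementation would need the full data described in TR11, or the C library's wcwidth(). Unprintable code points are assumed to be shown as a single '?' placeholder column, which is one possible way to "avoid printing" them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from busybox): column width of one code point.
 * Returns -1 for code points this sketch treats as unprintable.
 * The ranges below are a deliberately tiny, incomplete table.
 */
int sketch_width(uint32_t cp)
{
	if (cp == 0)
		return 0;
	if (cp < 0x20 || (cp >= 0x7f && cp < 0xa0))
		return -1;	/* C0/C1 control characters and DEL */
	if (cp >= 0x0300 && cp <= 0x036f)
		return 0;	/* Combining Diacritical Marks block */
	if (cp >= 0x4e00 && cp <= 0x9fff)
		return 2;	/* CJK Unified Ideographs block */
	return 1;		/* assumption: everything else is one column */
}

/*
 * Hypothetical helper: total columns needed to display n code points.
 * An unprintable code point is counted as one column, on the assumption
 * that the caller prints a '?' placeholder instead of the character.
 */
size_t sketch_columns(const uint32_t *s, size_t n)
{
	size_t cols = 0;
	size_t i;

	assert(s != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int w = sketch_width(s[i]);
		cols += (w < 0) ? 1 : (size_t)w;
	}
	return cols;
}
```

For the sequence 'a', U+0300 (combining grave accent), U+4E2D, U+0007 (BEL), sketch_columns() returns 1 + 0 + 2 + 1 = 4. Keeping the cursor position in columns rather than in characters is what lets a line editor erase and redraw such input correctly.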