Gentoo Forums
Gentoo & UTF-8: Should I care?
Master One
l33t


Joined: 25 Aug 2003
Posts: 754
Location: Austria

Posted: Thu Feb 24, 2005 9:22 pm    Post subject: Gentoo & UTF-8: Should I care?

While searching for more info about setting up Gentoo with UTF-8 support, I just saw here that the Gentoo forums will be switched to UTF-8 in the (near?) future.

I have already compiled glibc with the userlocales "de_AT@euro/ISO-8859-15" + "de_AT.UTF-8/UTF-8", but I still have LANG="de_AT@euro" set in /etc/env.d/99local.

Do I have to do anything to get full UTF-8 support in KDE? (I assume not, because I compiled my whole system with the USE flags "cjk" and "unicode", and the system locale doesn't have any influence on X and the desktop environment in use, right?)

I don't want to use any special characters (except the German umlauts) on the console, nor do I want to convert any filenames or file contents to UTF-8. Am I right, then, that I don't need to change any of my system settings: LANG="de_AT@euro" in /etc/env.d/99local, KEYMAP="de-latin1-nodeadkeys" in /etc/conf.d/keymaps, the settings in /etc/rc.conf (UNICODE="no", CONSOLEFONT="lat9w-16"), or the kernel NLS settings (Default NLS Option iso-8859-15, and <M> for NLS UTF8)?

"There is a bug in Linux kernels (all versions up to 2.6.10 at the time of writing), which is affecting UTF-8 locales using dead keys (like Czech, Polish, Slovak, Spanish...). Linux kernel does not support unicode for dead keys." (quote from here)

So why should anyone try to use a full UTF-8 system, including UTF-8 in console?

As I have compiled glibc with the userlocales mentioned above, I already have this output:
Code:
# locale -a
C
de_AT@euro
de_AT.iso885915@euro
de_AT.utf8
POSIX

But if I had not compiled glibc with "de_AT.UTF-8/UTF-8", why would "localedef -i de_AT -f UTF-8 de_AT.utf8" (taken from here) have the same effect? (Wouldn't this make recompiling glibc completely unnecessary?)
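
Whichever way a locale ends up on the system (generated during the glibc build or compiled afterwards with localedef), the C library either accepts its name at runtime or it does not. A minimal sketch of that check, not taken from the guide: the helper name `locale_available` is made up for illustration, and "de_AT.UTF-8" is just the locale discussed in this post.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as a locale name."""
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the C library does not know the requested locale.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "de_AT.UTF-8"):
        print(candidate, locale_available(candidate))
```

The result for "de_AT.UTF-8" depends on the machine it runs on, which is the point: it reports what is actually installed, independent of how it was installed.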

In the "Using UTF-8 with Gentoo" guide, the author does not seem to set a global locale at all, only individual settings in ~/.profile for each user, because he does not recommend setting UTF-8 for the root user. So can I leave LANG="de_AT@euro" system-wide and skip ~/.profile altogether, since my only normal user accesses the box through X with KDE?

That guide keeps confusing me with its ncurses recompile and "revdep-rebuild --soname libncurses.so.5"; I don't think I have to do either, since I have had "unicode" in my USE flags right from the beginning.

So I am a little lost, or maybe I have just read too much about this topic today; I am also not really familiar with it.

Only the goal is clear: I simply want everything to be shown correctly using KDE, even after the Gentoo forums are switched to UTF-8.

Any comments?
_________________
Las torturas mentales de la CIA
cazort
Guru


Joined: 19 Sep 2004
Posts: 343
Location: Lancaster, PA

Posted: Thu Feb 24, 2005 10:17 pm

UTF-8 is basically "backwards compatible" with the basic western encoding, so even if you don't have support, you should still be able to read all the characters with the exception of ones that are part of UTF-8 but not part of whatever encoding you are using. So I wouldn't worry too much. But UTF-8 is so cool and versatile that I've been building it into all my systems for the past several years, and I would recommend everyone to do the same!
Master One
l33t


Joined: 25 Aug 2003
Posts: 754
Location: Austria

Posted: Thu Feb 24, 2005 11:08 pm

cazort wrote:
UTF-8 is basically "backwards compatible" with the basic western encoding, so even if you don't have support, you should still be able to read all the characters with the exception of ones that are part of UTF-8 but not part of whatever encoding you are using. So I wouldn't worry too much. But UTF-8 is so cool and versatile that I've been building it into all my systems for the past several years, and I would recommend everyone to do the same!

Can you be more precise, please?

So what exactly did you build into your systems?

I guess it's not about console support, so did you do anything other than what I have done?

Or do you mean that you built UTF-8-only systems?

I really would like to benefit from UTF-8, but I am not sure whether I am on the right track.
_________________
Las torturas mentales de la CIA
orzetto
Apprentice


Joined: 05 Mar 2003
Posts: 165
Location: Magdeburg, Germany

Posted: Sun Feb 27, 2005 12:30 pm

Master One wrote:
Can you be more precise, please?

So what exactly did you build into your systems?


When cazort says that UTF-8 is backwards compatible, I assume he means "compatible with ASCII". To sum it up, ASCII (which contains by far the most common characters in use, such as the whole Latin alphabet without accents, basic control characters, punctuation and so on) uses 7 of the 8 available bits, for a grand total of 128 (2^7) characters.

With the 8th bit set you get 128 more characters, which is how the various ISO-8859-* encodings (among them the various Latin-[1-9], Hebrew, Arabic, Cyrillic) are built, as well as others such as KOI8-R.

What UTF-8 does is use the 8th bit as a flag that tells the system "this byte is part of a multi-byte sequence, check the neighbouring bytes too". This way, ASCII characters are encoded exactly the same in UTF-8 as in ASCII, and a gigantic number of characters can still be encoded. However, if you have characters from, say, Latin-3, e.g. ĉ, a UTF-8 interpreter will show gibberish for them. The same happens the other way around. Still, since ASCII characters are very common, if your language is based on the Latin alphabet you will probably be able to read the text even though some accented characters are corrupted.
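
To make the point about the 8th bit concrete, here is a small sketch (an illustration, not part of the original post) that encodes ĉ both ways and then decodes each result with the wrong codec. The sample string "Gentoo" and the variable names are arbitrary; the codecs are the ones shipped with Python.

```python
ascii_text = "Gentoo"
accented = "ĉ"  # U+0109, present in Latin-3 (ISO-8859-3)

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Latin-3 stores the character in one byte with the 8th bit set.
latin3_bytes = accented.encode("iso8859_3")
# UTF-8 needs two bytes for it, and both have the 8th bit set.
utf8_bytes = accented.encode("utf-8")
high_bits = [bool(b & 0x80) for b in utf8_bytes]

# Decoding with the wrong codec is where the gibberish comes from.
latin3_read_as_utf8 = latin3_bytes.decode("utf-8", errors="replace")
utf8_read_as_latin3 = utf8_bytes.decode("iso8859_3")

if __name__ == "__main__":
    print(same_bytes, latin3_bytes.hex(), utf8_bytes.hex(), high_bits)
    print(repr(latin3_read_as_utf8), repr(utf8_read_as_latin3))
```

The lone Latin-3 byte is not a valid UTF-8 sequence, so a UTF-8 decoder can only substitute a replacement character; in the other direction the two UTF-8 bytes are shown as two unrelated Latin-3 characters.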

UTF-8 is probably the best encoding of Unicode: it implies minimal expansion of mostly-ASCII files (pure ASCII does not grow at all, whereas with UCS-2, used in Windows, ASCII text files double in size since every character is represented by 2 bytes), compatibility with ASCII files (try opening an ASCII file as UCS-2 and you should see a lot of CJK characters), and minimal "gibberization" of Latin-based text.
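
Both claims in the paragraph above can be checked in a few lines. This is only a sketch under one assumption: Python has no "ucs-2" codec, so UTF-16 little-endian is used as a stand-in, which is identical to UCS-2 for the characters involved here. The sample text is arbitrary.

```python
ascii_text = "Gentoo Linux"  # 12 ASCII characters

utf8_bytes = ascii_text.encode("utf-8")
# Stand-in for UCS-2 (assumption: same result for plain ASCII text).
ucs2_bytes = ascii_text.encode("utf-16-le")

# Reading the plain ASCII bytes as if they were 2-byte code units
# fuses each pair of letters into one unrelated character.
misread = ascii_text.encode("ascii").decode("utf-16-le")

if __name__ == "__main__":
    print(len(utf8_bytes), len(ucs2_bytes))
    print(misread)
```

The 2-byte encoding is exactly twice as long, and the misread string has half as many characters as the original, most of them from the CJK blocks.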
_________________
Why is everybody always generalising?
h.u.n.t.e.r
n00b


Joined: 04 Jun 2003
Posts: 63
Location: Belgium

Posted: Sun Jul 03, 2005 8:43 pm

I have the same problem. I started with nl_BE@euro and ISO-8859-15. This weekend I switched to UTF-8. I did everything as explained in the guide.
Problem: dead keys do not work in KDE applications, but they do work in GNOME apps (e.g. Eclipse, Gaim, etc.)!

---------------------------

Strange: when I JUST leave the /etc/env.d/02locale file empty, so that everything defaults to "POSIX", the dead keys work.
So what's in that file? --> LANG="nl_BE@euro.UTF-8" AND LC_ALL="nl_BE@euro.UTF-8"

------------------------------

Furthermore -->

localhost workspace # locale -a
C
POSIX
de_DE
de_DE@euro
en_HK
en_PH
en_US
en_US.utf8
fr_FR
fr_FR@euro
nl_BE
nl_BE.utf8
nl_BE.utf8@euro.UTF-8
nl_BE.utf8@euro.utf8
nl_BE@euro
nl_BE@euro.UTF-8
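
One thing that may be worth checking in the list above: glibc documents locale names as language[_territory][.codeset][@modifier], i.e. the codeset comes before the modifier. Several of the names shown put "@euro" first. The sketch below only tests that ordering; the regular expression and the helper `parse_locale` are illustrative and deliberately simplified (for example, they do not handle the special names "C" and "POSIX"), and whether the ordering is actually the cause of the dead-key problem is not established here.

```python
import re

# language[_territory][.codeset][@modifier], simplified for illustration.
LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(name):
    """Split a locale name into its parts, or return None if the parts
    do not appear in the documented order."""
    match = LOCALE_RE.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ("nl_BE.UTF-8@euro", "nl_BE@euro.UTF-8", "nl_BE.utf8"):
        print(name, "->", parse_locale(name))
```

With this pattern, "nl_BE.UTF-8@euro" splits cleanly, while "nl_BE@euro.UTF-8" is rejected because the codeset follows the modifier.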

----------------------------



Does the following have an influence? I ask BECAUSE using en_US.UTF-8 instead of nl_BE@euro.UTF-8 works normally!
But I do want MY language settings :(


localhost workspace # ls /usr/X11R6/lib/X11/locale/ | grep UTF
el_GR.UTF-8
en_US.UTF-8
ja_JP.UTF-8
ko_KR.UTF-8
pt_BR.UTF-8
th_TH.UTF-8
zh_CN.UTF-8
zh_TW.UTF-8


-----------------------------------

And last but not least -->
http://gentoo-wiki.com/HOWTO_Make_your_system_use_unicode/utf-8#Kernel_Bugs
Could this be my problem? I mean, why is it still working just fine under GNOME then?



Thanks for the advice!
Deathwing00
Bodhisattva


Joined: 13 Jun 2003
Posts: 4087
Location: Dresden, Germany

Posted: Sun Jul 03, 2005 10:33 pm

Moved from Installing Gentoo to Other Things Gentoo.
rofro
Apprentice


Joined: 21 Jun 2004
Posts: 234
Location: Piaseczno, Poland

Posted: Wed Sep 28, 2005 5:41 pm

read this

Quote:
It could be that es_ES defaults to UTF-8 when unicode support is enabled

_________________
Linux #358594
gentoo bug comment 175808#c26
You either must have patience or contribute to open source. There is only one guaranteed way to have open source do what you want it to do, and that's write it yourself.
All times are GMT