Gentoo Forums
HOWTO: Using UTF-8 on Gentoo (edited)
gna
n00b
Joined: 19 Mar 2003
Posts: 38
Location: Beijing
Posted: Sat Jul 24, 2004 9:43 am

I think all the packages mentioned in the howto have ebuilds. Can you be a bit more precise about what kind of ebuild?

skyfolly
Apprentice
Joined: 16 Jul 2003
Posts: 245
Location: Dongguan & Hong Kong, PRC
Posted: Tue Jul 27, 2004 8:52 am

gna wrote:
I think all the packages mentioned in the howto have ebuilds. Can you be a bit more precise about what kind of ebuild?


Like Chinese UTF-8 ones. The Chinese UTF-8 locale cannot be found with
Code:
locale
either. Sorry, I am a bloody old newbie and don't know much about it. There is one article on switching to Chinese UTF-8, but that guy seemed to fail at it too. I am fed up with GB-2312 and Big5.

People have to switch their fonts and locale to use UTF-8, and the fonts never display correctly.
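
For anyone checking the same thing: plain `locale` only prints the current settings, while `locale -a` lists the locales that are actually installed. A minimal sketch of such a check follows; the helper name `has_utf8_locale` is made up for illustration and `zh_CN` is just an example argument.

```shell
# has_utf8_locale LANG_TERRITORY
# Reads a locale list (as printed by `locale -a`) on stdin and succeeds
# if it contains a UTF-8 variant of the given locale, e.g. zh_CN.
# The codeset may be spelled "utf8" or "UTF-8", so both spellings are
# accepted, case-insensitively; an optional @modifier is allowed.
has_utf8_locale() {
    grep -i -q -E "^$1\.utf-?8(@.*)?\$"
}

# Usage on a real system:
#   locale -a | has_utf8_locale zh_CN && echo "zh_CN UTF-8 locale is installed"
```

If the locale is missing, glibc's `localedef -i zh_CN -f UTF-8 zh_CN.UTF-8` (run as root) is the usual way to generate it.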
_________________
I am the only being whose doom
No tongue would ask no eye would mourn
I never caused a thought of gloom
A smile of joy since I was born.
emily bronte

Gatak
Apprentice
Joined: 04 Jan 2004
Posts: 174
Posted: Sun Aug 22, 2004 10:19 am

I have one problem with UTF-8. It is that I cannot mount a WindowsXP share with UTF-8. All extended characters come out very wrong, or simply missing.

But if I mount a Samba share from WindowsXP, UTF-8 works.

I tried with mount -o iocharset=utf8 with no luck.

EDIT: It works now with:
Code:
mount -t smbfs  -o iocharset=utf8,codepage=cp850

gna
n00b
Joined: 19 Mar 2003
Posts: 38
Location: Beijing
Posted: Sun Aug 22, 2004 12:28 pm

Actually you are still using samba to mount your Windows XP partition. That is what the
Code:
-t smbfs
means. Without samba it should be
Code:
-t ntfs
or
Code:
-t vfat
depending on whether you are using an ntfs or a fat32 partition for windows.

Also, I think you need the appropriate code page support either compiled as modules or built into your kernel for mount to be able to use iocharset correctly. See File Systems -> Native Language Support in the kernel configuration.
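
One way to see which of those NLS options a kernel was built with is to look them up in its .config. This is only a sketch: the helper name `nls_status` is hypothetical, and the path /usr/src/linux/.config assumes the usual Gentoo kernel source symlink.

```shell
# nls_status CONFIG_FILE OPTION...
# For each kernel option, prints "OPTION=y" (built in), "OPTION=m"
# (module) or "OPTION=missing" (not set or not present in the file).
nls_status() {
    cfg=$1
    shift
    for opt in "$@"; do
        val=$(sed -n "s/^$opt=\([ym]\)\$/\1/p" "$cfg")
        echo "$opt=${val:-missing}"
    done
}

# Usage:
#   nls_status /usr/src/linux/.config \
#       CONFIG_NLS_UTF8 CONFIG_NLS_CODEPAGE_850 CONFIG_NLS_ISO8859_1
```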

Gatak
Apprentice
Joined: 04 Jan 2004
Posts: 174
Posted: Sun Aug 22, 2004 12:40 pm

I think you misunderstood what I wanted to do. I am not mounting a partition, but a Windows share over the network.
Code:
mount -t smbfs //windowsmachine/share /mnt/win -o username=blah,password=blah,iocharset=utf8,codepage=cp850


What is odd is that the codepage option is needed. The purpose of Unicode is to provide a single universal character set, so that no codepage translations should ever be necessary between applications and systems.

gna
n00b
Joined: 19 Mar 2003
Posts: 38
Location: Beijing
Posted: Mon Aug 23, 2004 10:51 pm

I have tried this on a Win2k share and am also having similar problems.
Why did you choose cp850?
Is cp850 the default codepage on your Windows XP?
What is the default NLS in your kernel?

thanks

Gatak
Apprentice
Joined: 04 Jan 2004
Posts: 174
Posted: Mon Aug 23, 2004 10:58 pm

The codepage should be irrelevant when using UTF-8 (Unicode). This is the whole point with Unicode.

My default NLS in the kernel is UTF-8.

Windows XP and Windows 2000 are using Unicode for SMB shares, not single-byte codepages. This is why it is so strange when Samba required me to choose one.

cp850 is a "western latin-1" codepage so this is why I tested it. Windows 2000/XP uses codepages for non-Unicode applications only.

Normally, a character is stored as 8 bits, which allows 256 different values. Naturally, 256 characters aren't enough to cover all languages and all systems. Therefore codepages were developed, so applications could know which character a specific byte stands for.

If two users were to talk to each other over the net, their systems would need to use the same codepage or characters would end up wrong.

Unicode was developed to remedy this. Unicode is large enough to describe most (all?) languages in the world, so the need for other codepages is removed. The biggest remaining problem is having full Unicode fonts. The fullest one I know of is Arial Unicode MS. It has about 55000 characters defined.
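
The codepage problem is easy to reproduce by hand: the same byte decodes to different characters depending on which codepage is assumed. The sketch below assumes `iconv` and `od` are installed; `decode_byte` is a made-up helper name.

```shell
# decode_byte OCTAL CODEPAGE
# Interprets one byte (given as three octal digits) in CODEPAGE and
# prints the hex bytes of the resulting character encoded as UTF-8.
decode_byte() {
    printf "\\$1" | iconv -f "$2" -t UTF-8 | od -An -tx1 | tr -d ' \n'
    echo
}

# Byte 0xE6 (octal 346) under two different codepages:
decode_byte 346 CP850        # c2b5 = U+00B5, the micro sign
decode_byte 346 ISO-8859-1   # c3a6 = U+00E6, the letter ae
```

With UTF-8 on both ends the byte sequence itself identifies the character, which is why no codepage should need to be negotiated.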

gna
n00b
Joined: 19 Mar 2003
Posts: 38
Location: Beijing
Posted: Tue Aug 24, 2004 3:06 am

I agree that it should not be necessary to specify a codepage and, preferably, also no iocharset. It seems that that is the way it is intended to work. Why that is not working is either a bug or a configuration error.

Two more suggestions:

In the kernel configuration check
File Systems -> Network File Systems -> SMB File System support -> Use a default NLS -> utf8
It seems you can specify two default NLS's in the kernel, one for smbfs and one for other stuff.

Also try using the cifs filesystem. Just replace smbfs with cifs in your mount command (assuming it is configured in the kernel). cifs doesn't have a codepage option and is supposed to have better international support than smbfs. cifs is now recommended over smbfs for everything except old SMB systems. Documentation is in /usr/src/linux/fs/cifs/README
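
As a rough sketch, the cifs version of the mount command could be assembled like this. The helper name `cifs_mount_cmd` is hypothetical, and the share, mount point and user name are the placeholder values already used in this thread. It only prints the command, since the actual mount needs root; the password is left out so that mount asks for it.

```shell
# cifs_mount_cmd SHARE MOUNTPOINT USER
# Prints a mount command for a cifs share with UTF-8 file names.
# Note that there is no codepage= option here, only iocharset=.
cifs_mount_cmd() {
    echo "mount -t cifs $1 $2 -o username=$3,iocharset=utf8"
}

cifs_mount_cmd //windowsmachine/share /mnt/win blah
```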

If you can't get it to work then it might be good to ask a question on the linux cifs mailing list and/or file a bug report.

Leo Lausren
Apprentice
Joined: 24 Feb 2004
Posts: 198
Location: Denmark
Posted: Tue Aug 24, 2004 6:25 am

ecatmur wrote:
Hmm, I have to use unicode_start to get the UTF-8 characters to work...

I made a script that echoes the \E%G to the terminals at boot, called /etc/init.d/unicode. It probably needs some work to be of general use.
Code:

#!/sbin/runscript
# Switch the virtual consoles to UTF-8 mode at boot.
conf=/etc/env.d/02locale

# Using devfs (or udev with devfs-style names)?
if [ -e /dev/.devfsd ] || [ -e /dev/.udev -a -d /dev/vc ]; then
  device=/dev/vc/
else
  device=/dev/tty
fi

depend() {
        need localmount
        after keymaps
        before consolefont
}

# Set ${encoding} to the codeset part of the configured locale.
checkconfig() {
  if [ -r ${conf} ]; then
          . ${conf}
          encoding=
          [ -n "${LC_ALL}" ]      && encoding=${LC_ALL#*.}      && return 0
          [ -n "${LC_MESSAGES}" ] && encoding=${LC_MESSAGES#*.} && return 0
          [ -n "${LANG}" ]        && encoding=${LANG#*.}        && return 0
  fi
  eend 1 "Locale is not configured, please fix ${conf}"
  return 1
}

start() {
        ebegin "Setting consoles to UTF-8"
        # Stop here if no locale could be read from ${conf}.
        checkconfig || return 1
        if [ "${encoding}" = "UTF-8" -o "${encoding}" = "utf-8" ]; then
                # Put the keyboard mapping into Unicode mode.
                dumpkeys | loadkeys --unicode
                # ESC % G selects UTF-8 on each virtual console.
                for ((i=1; i <= ${RC_TTY_NUMBER}; i++)); do
                        echo -ne "\033%G" > ${device}${i}
                done
                eend 0
        else
                eend 1 "UTF-8 is not required"
        fi
}
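
One thing to watch with the comparison in start(): locale names spell the codeset in several ways ("UTF-8", "utf8", sometimes with an @modifier after it). A possible way to normalize the name before comparing is sketched below; `locale_encoding` and `is_utf8_locale` are illustrative names, not part of the script above.

```shell
# locale_encoding LOCALE
# Prints the codeset part of a locale name, lower-cased and with
# hyphens removed, so "en_US.UTF-8" and "it_IT.utf8" both give "utf8".
# Fails if the name has no codeset part at all (e.g. "C" or "en_US").
locale_encoding() {
    case $1 in
        *.*) ;;
        *) return 1 ;;
    esac
    enc=${1#*.}       # drop "language_TERRITORY."
    enc=${enc%%@*}    # drop an optional "@modifier"
    printf '%s\n' "$enc" | tr 'A-Z' 'a-z' | tr -d '-'
}

# is_utf8_locale LOCALE
# Succeeds if the locale name denotes a UTF-8 locale.
is_utf8_locale() {
    [ "$(locale_encoding "$1")" = "utf8" ]
}

# Usage:
#   is_utf8_locale "$LANG" && echo "UTF-8 console wanted"
```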

_________________
Blog: common sense – nonsense

max4ever
Tux's lil' helper
Joined: 29 Jul 2004
Posts: 87
Location: almost in hell
Posted: Thu Sep 02, 2004 8:54 pm

Umm, so if I did this
Code:
linuxoid max # cat /etc/env.d/99locale
LANG=it_IT.utf8
LC_CTYPE=it_IT.utf8
does this mean that anywhere in Linux I can now see any character from any language, as long as the terminal or the software supports UTF-8? I'm having problems getting my Linux to show Romanian-specific letters in KDE and mplayer...
_________________
Stop posting your PC's hardware as your signature.

Gatak
Apprentice
Joined: 04 Jan 2004
Posts: 174
Posted: Thu Sep 02, 2004 9:19 pm

Only if the application you use has a font which includes these characters, and only if the application supports UTF-8.

max4ever
Tux's lil' helper
Joined: 29 Jul 2004
Posts: 87
Location: almost in hell
Posted: Fri Sep 03, 2004 11:27 am

Hmm, and how can I find out if a font has "support" for those characters? For example, I'm having problems with mplayer showing subtitles correctly... Can you suggest a font with UTF-8 support and antialiasing?
_________________
Stop posting your PC's hardware as your signature.

Gatak
Apprentice
Joined: 04 Jan 2004
Posts: 174
Posted: Fri Sep 03, 2004 11:41 am

You can try to load the font in a character map program. I think there is one in Gnome. It allows you to see which characters exist in the font. Then you have to use that font in mplayer.

But remember, the subtitles that you load in mplayer may not be encoded with UTF-8, but some other local encoding. Mplayer would need to support that one.
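
If the subtitle file turns out to be in a legacy encoding, one option is to convert it to UTF-8 first. A minimal sketch, assuming `iconv` is installed and that the file is ISO-8859-2 (a common encoding for Romanian text, but that is a guess about the file in question); the helper name and file names are made up.

```shell
# sub_to_utf8 ENCODING INPUT OUTPUT
# Converts a subtitle file from ENCODING to UTF-8.
sub_to_utf8() {
    iconv -f "$1" -t UTF-8 "$2" > "$3"
}

# Usage:
#   sub_to_utf8 ISO-8859-2 movie.srt movie.utf8.srt
#   mplayer -utf8 -sub movie.utf8.srt movie.avi
```

mplayer also has a -subcp option that names the subtitle codepage directly, which avoids the separate conversion step.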

andrewski
Guru
Joined: 30 Apr 2004
Posts: 366
Location: Royersford, PA, USA
Posted: Sat Oct 02, 2004 2:50 am

It'd be great if you could post a bit on the various fonts that are necessary to complete the effort to actually "see" UTF, i.e. console font, *term font. In all my searching, I haven't been able to figure that one out!

Also, where does CONSOLETRANSLATION from /etc/rc.conf come in? Perhaps that's necessary to seal the deal, as it were?

Thanks for a nice howto.

obmun
n00b
Joined: 17 Jan 2004
Posts: 66
Location: Europe (Spain)
Posted: Sat Oct 02, 2004 11:22 am

@andrewski:

Forget about UTF-8 in the console. It won't work completely (compose characters won't work). For more info take a look at this post. There I have some info about console fonts. Essentially, you have to use a console font with a Unicode map. It's also good to have a font that makes use of the full 512 available glyphs (and not just one with 256).

CONSOLETRANSLATION tells setfont which translation map to use to translate program output from 8 bit to the UTF-8 the kernel expects (the kernel is always in UTF-8; it always expects to receive Unicode chars) when you're not using UTF-8. If apps are already sending UTF-8 chars, the translation map is not needed, and therefore CONSOLETRANSLATION should be commented out if you're using UTF-8 as your default encoding.
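
In /etc/rc.conf terms, that advice would look roughly like the excerpt below. The keymap and font names are assumptions for illustration only (lat9w-16 is one of the kbd console fonts that ships with a Unicode map); use whatever matches your keyboard and language.

```shell
# /etc/rc.conf (excerpt), for a system whose default encoding is UTF-8
KEYMAP="us"
CONSOLEFONT="lat9w-16"
# No 8-bit translation map is needed when programs already emit UTF-8:
#CONSOLETRANSLATION="8859-1_to_uni"
```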

talon
n00b
Joined: 11 Jun 2003
Posts: 13
Posted: Thu Oct 07, 2004 10:36 pm    Post subject: gtk utf-8

My major problem in porting my machine to UTF-8 was that all gtk-1 apps didn't display characters correctly. After a long time of experimenting I figured out how to do it right. You have to add the following lines to your ~/.gtkrc.mine:
Code:

style "gtk-default" {
fontset = "-*-luxi sans-medium-r-normal--10-*-*-*-p-*-iso10646-1,\
-*-luxi sans-medium-r-normal--10-*-*-*-p-*-iso10646-1,\
-*-r-*-iso10646-1,*"
}
class "GtkWidget" style "gtk-default"

Replace "luxi sans" with your favorite font and "10" with your preferred size. Even when you work with themes, they won't overwrite this file :D .

Haqqax
n00b
Joined: 11 Jul 2004
Posts: 35
Posted: Fri Oct 08, 2004 10:13 pm

Can anyone shed some light on how to force (or whether it can be done at all) KDE apps to work with Unicode Plane1 characters?
I have been testing a little over the last two days. I managed to create a font with just a few characters encoded in Plane 1 (they start at 0x12000; I am trying to make my Linux support Akkadian cuneiform). I installed it and used Perl to create a text file and an HTML file for tests. The HTML file has both plain-text characters and character entity references.
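
For anyone who wants to build similar test files without Perl, the UTF-8 bytes for a code point can be computed with plain shell arithmetic. This is only a sketch (the function names are made up, and it does no validation, e.g. it does not reject the surrogate range); it is not the script used for the tests described here.

```shell
# utf8_hex CODEPOINT
# Prints the UTF-8 encoding of a code point as space-separated hex bytes.
utf8_hex() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ]; then
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        printf '%02x %02x\n' \
            $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        printf '%02x %02x %02x\n' \
            $(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # Everything beyond the BMP (U+10000 and up) needs four bytes.
        printf '%02x %02x %02x %02x\n' \
            $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
}

# utf8_char CODEPOINT
# Writes the raw UTF-8 bytes of a code point to stdout.
utf8_char() {
    for h in $(utf8_hex "$1"); do
        printf "\\$(printf '%03o' "0x$h")"
    done
}

# Usage:
#   utf8_hex 0x12000                                  # f0 92 80 80
#   { utf8_char 0x12000; echo; } > plane1-test.txt
#   printf '<p>&#x12000;</p>\n' > plane1-test.html    # entity reference
```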

The only applications that process and display these files correctly are Firefox (it does display cuneiform texts :-) ) and Thunderbird (I sent a cuneiform e-mail to myself, and when it arrived it was displayed correctly :-) ). All the other applications, including but not limited to OpenOffice, Konqueror and the standard KDE apps, do not parse UTF from Plane 1 correctly (they split one code into 2 chars) and of course do not display the text correctly. I am particularly disappointed by OpenOffice in this matter.
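
A plausible explanation for the "one code split into 2 chars" symptom, offered as an assumption rather than something verified against these applications: programs that store text as 16-bit units represent every code point above 0xFFFF as a surrogate pair, and code that treats each 16-bit unit as a character then sees two of them. The arithmetic can be sketched like this (`utf16_units` is a made-up name):

```shell
# utf16_units CODEPOINT
# Prints the UTF-16 code unit(s) for a code point, in hex.
# BMP code points fit in one unit; everything above 0xFFFF becomes a
# high surrogate (D800..DBFF) followed by a low surrogate (DC00..DFFF).
utf16_units() {
    cp=$(( $1 ))
    if [ "$cp" -lt 65536 ]; then
        printf '%04X\n' "$cp"
    else
        v=$(( cp - 0x10000 ))
        printf '%04X %04X\n' \
            $(( 0xD800 + (v >> 10) )) $(( 0xDC00 + (v & 0x3FF) ))
    fi
}

utf16_units 0x4E2D     # 4E2D       (a Chinese character in the BMP)
utf16_units 0x12000    # D808 DC00  (first code point of the test font)
```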

Can my KDE be cured? Does my success with Firefox and Thunderbird mean that other GTK editors may work equally well?

gna
n00b
Joined: 19 Mar 2003
Posts: 38
Location: Beijing
Posted: Sat Oct 09, 2004 5:50 am

Actually this topic is of interest to me too. I know that a lot of applications ignore the supplementary planes. There is a UTF-8 project at freedesktop.org that is trying to make a list of non unicode compliant software. In particular they have a list of unicode software that doesn't work for the supplementary planes. Unfortunately this list is very short. But if you do find out something please report here and let us all know.

What software did you use to make your font? It would be helpful to know so that more people know how to do testing.

thanks

Haqqax
n00b
Joined: 11 Jul 2004
Posts: 35
Posted: Sat Oct 09, 2004 2:09 pm

Quote:
What software did you use to make your font?


I used FontForge.

I was really surprised (in a positive sense) by this program. I like it very much.

I was not able to successfully set up the encoding for my font from within the user interface, so I just opened the SFD file with Vim and updated the encoding manually:

Code:

Encoding: unicode4
UnicodeInterp: none


It is a new program to me, so maybe some other settings are also important. I noticed (trial and error :-) ) that if you make a mistake in "Encoding", FontForge will change it to "Custom".

I am still reading about the file format.

Quote:
list of unicode software that doesn't work for the supplementary planes


They only list Vim and Emacs? I would say Vim does a better job than the KDE editors and OpenOffice. I wonder whether it would work if I had a proper console font. I can see that Vim does know how many characters I have: it displays question marks instead of them, and it has no other choice because I only have a TrueType font for my encoding. OpenOffice 1.1.2 did not get that far. I am upgrading today to 1.1.3.

I do not have Emacs to test. I think one might try to use Thunderbird's editor to edit these texts (sooner or later other editors will support Plane 1 too). I will investigate this if I have some time :-) The other solution may be to build a console font and check whether medit can be used for editing. Building an IME for medit is extremely easy. I think this approach would be successful, but it does not meet my goal.

I would like to use cuneiform just like I use Chinese: without having to do a magic dance with special macros, hack too much with fonts, or use specialized editors. I want to open all the files in the editors I use for everyday work and input them with the IMEs I normally use.

numerodix
l33t
Joined: 18 Jul 2002
Posts: 743
Location: nl.eu
Posted: Sat Oct 09, 2004 7:40 pm

Ok, so I finally succeeded in getting this to work, my /etc/env.d/02locale now looks like this:

Code:
LC_CTYPE="no_NO.utf8"
LANG="en_US.utf8"


After restarting X (you may want to mention that without restarting it just won't work) I was relieved to find out that apparently both qt and gtk now recognize the character set, filenames displayed correctly in konqueror etc. It looks like the apps that I use in X are working fine in this respect.

What is still missing is unicode support in the console, that is outside of X. I'm not exactly sure what it takes to get filenames to display correctly, sometimes I have to run unicode_start, sometimes it seems to work without it. But input is still not working, that is the keys æøå. My /etc/rc.conf looks like this:

Code:
KEYMAP="no-latin1"
CONSOLEFONT="lat0-16"
CONSOLETRANSLATION="8859-1_to_uni"


While I use X 98% of the time, it's a little problematic to have this bug if anything has to be done from the shell. Any ideas?

[edit]The euro symbol is not working either, whatever I've done I've never been able to activate it.
_________________
undvd - ripping dvds should be as simple as unzip

Haqqax
n00b
Joined: 11 Jul 2004
Posts: 35
Posted: Sun Oct 10, 2004 12:57 am

gna wrote:
But if you do find out something please report here and let us all know.


Well, I did some additional tests and the results are very good.

I made a test IME for my Akkadian font in SCIM and IT WORKS. I can write Akkadian just like Chinese!

This could be useful in academic projects. If I send you a TTF font and you install it, I gain the ability to send you e-mails in Akkadian. Thunderbird will display them for you, you can save text files correctly, etc. And with SCIM, you can also write Akkadian back to me. If only there were a word-processing application, it would be so easy to write books, prepare tests for students, etc.

As I said, Firefox works well with Plane 1 (only deleting is a little broken: you have to backspace each character twice, as sometimes happened with Chinese on English systems in the old days, i.e. not all the bytes of the character are deleted at once). So, if PostgreSQL is Plane 1 ready (I have not checked yet), we might start to collaborate on some Akkadian data (dictionary, book, text repository, and not only Akkadian) already encoded in the future standard (Unicode has not accepted the Sumero-Akkadian cuneiform encoding yet), just like we can with English. We have everything in place. Even if the encoding finally changes, it would be a matter of minutes to write a script to fix the existing texts. I think I could build such a collaboration platform to be usable in a week, if someone would donate glyphs for the cuneiform font (I think a starting point might be the fonts created for TeX by Mr Piska. Or one might buy fonts from Michael Everson :-) ).

Well, the only problem now seems to be the retarded language support in Qt and KDE. I am extremely frustrated by this. Can someone write how non-BMP encodings are supported in GNOME applications?

PS: OpenOffice 1.1.3 is no better than 1.1.2 with support for Plane1 characters.

Gatak
Apprentice
Joined: 04 Jan 2004
Posts: 174
Posted: Sun Oct 10, 2004 1:05 am

I think most Gnome applications support Unicode, at least if compiled with accessibility support. In GEdit, for example, I can view all sorts of Unicode characters. I suppose I still need TrueType or OpenType fonts on the system that support Unicode.

Haqqax
n00b
Joined: 11 Jul 2004
Posts: 35
Posted: Sun Oct 10, 2004 1:23 am

Gatak wrote:
I think most Gnome applications support Unicode, at least if compiled with accessibility support. In GEdit, for example, I can view all sorts of Unicode characters. I suppose I still need TrueType or OpenType fonts on the system that support Unicode.


To be clear - there is no problem with BMP in KDE (chinese, IPA, arabic without vowels) - so Unicode is supported. I am interested in support for codes beyond 0xFFFF

Haqqax
n00b
Joined: 11 Jul 2004
Posts: 35
Posted: Sun Oct 10, 2004 4:59 pm

I've got one more question: are Hebrew niqud and Arabic vowels displayed correctly on your Gentoo boxes? On my box they are displayed, but are not positioned correctly on their characters.

And, of course, arabic ligatures are broken by the vowels.

Is it working for anyone?

obmun
n00b
Joined: 17 Jan 2004
Posts: 66
Location: Europe (Spain)
Posted: Mon Oct 11, 2004 3:07 pm

@numerodix:

Console and UTF-8? Bad mixture. Take a look at this post. There I analyze the problem. Conclusion? It's a kernel problem.
All times are GMT
Page 2 of 3

 