gna n00b
Joined: 19 Mar 2003 Posts: 38 Location: Beijing
Posted: Sat Jul 24, 2004 9:43 am
I think all the packages mentioned in the howto have ebuilds. Can you be a bit more precise about what kind of ebuild?
skyfolly Apprentice
Joined: 16 Jul 2003 Posts: 245 Location: Dongguan & Hong Kong, PRC
Posted: Tue Jul 27, 2004 8:52 am
gna wrote: | I think all the packages mentioned in the howto have ebuilds. Can you be a bit more precise about what kind of ebuild? |
Like Chinese UTF-8 ones. The Chinese UTF-8 locale cannot be found either. Sorry, I am a bloody old newbie and don't know much about this; there is one article on converting to Chinese UTF-8, but that guy seemed to fail at it too. I am fed up with GB2312 and Big5.
People have to convert their fonts and locale to use UTF-8, and the fonts never display correctly. _________________ I am the only being whose doom
No tongue would ask, no eye would mourn;
I never caused a thought of gloom,
A smile of joy, since I was born.
Emily Brontë
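For reference, switching to a Chinese UTF-8 locale is mostly a matter of generating the locale and pointing the environment at it. A minimal sketch of /etc/env.d/02locale, assuming the zh_CN.UTF-8 locale has already been built for glibc (check with `locale -a`; run `env-update && source /etc/profile` afterwards):

```shell
# /etc/env.d/02locale -- hypothetical Chinese UTF-8 settings
LANG="zh_CN.UTF-8"
LC_CTYPE="zh_CN.UTF-8"
```

Fonts are a separate problem from the locale: the locale only tells applications which encoding to use, not where to find glyphs for it.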
Gatak Apprentice
Joined: 04 Jan 2004 Posts: 174
Posted: Sun Aug 22, 2004 10:19 am
I have one problem with UTF-8: I cannot mount a Windows XP share with UTF-8. All extended characters come out wrong or are simply missing.
But if I mount a Samba share from Windows XP, UTF-8 works.
I tried mount -o iocharset=utf8 with no luck.
EDIT: It works now with: Code: | mount -t smbfs -o iocharset=utf8,codepage=cp850 |
gna n00b
Joined: 19 Mar 2003 Posts: 38 Location: Beijing
Posted: Sun Aug 22, 2004 12:28 pm
Actually you are still using Samba to mount your Windows XP share; that is what the -t smbfs means. Without Samba it would be -t ntfs or -t vfat, depending on whether you are using an NTFS or a FAT32 partition for Windows.
Also, I think you need the appropriate codepage modules compiled as modules or built into your kernel for mount to be able to use iocharset correctly. See File Systems -> Native Language Support
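For reference, the relevant kernel options look roughly like this on 2.6-era kernels (option names are from the NLS Kconfig; adjust which codepages you build to taste):

```shell
# File Systems -> Native Language Support (sketch of .config fragments)
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_850=m   # needed for codepage=cp850 mounts
CONFIG_NLS_UTF8=y           # needed for iocharset=utf8 mounts
```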
Gatak Apprentice
Joined: 04 Jan 2004 Posts: 174
Posted: Sun Aug 22, 2004 12:40 pm
I think you are misunderstanding what I wanted to do. I am not mounting a partition, but a Windows share over the network. Code: | mount -t smbfs //windowsmachine/share /mnt/win -o username=blah,password=blah,iocharset=utf8,codepage=cp850 |
What is odd is that the codepage option is needed at all. The purpose of Unicode is to provide a single universal character set, so that no codepage translation is ever necessary between applications and systems.
gna n00b
Joined: 19 Mar 2003 Posts: 38 Location: Beijing
Posted: Mon Aug 23, 2004 10:51 pm
I have tried this on a Win2k share and am having similar problems.
Why did you choose cp850?
Is cp850 the default codepage on your Windows XP?
What is the default NLS in your kernel?
Thanks
Gatak Apprentice
Joined: 04 Jan 2004 Posts: 174
Posted: Mon Aug 23, 2004 10:58 pm
The codepage should be irrelevant when using UTF-8 (Unicode); that is the whole point of Unicode.
My default NLS in the kernel is UTF-8.
Windows XP and Windows 2000 use Unicode for SMB shares, not single-byte codepages. That is why it is so strange that Samba required me to choose one.
cp850 is a "western Latin-1" codepage, which is why I tested it. Windows 2000/XP uses codepages for non-Unicode applications only.
Traditionally, a character is stored in 8 bits, which allows only 256 different characters. Naturally, 256 characters are not enough to cover all languages and systems, so codepages were developed to tell applications what each specific byte value should mean.
If two users were to talk to each other over the net, their systems would need to use the same codepage, or characters would come out wrong.
Unicode was developed to remedy this. Unicode is large enough to describe most (all?) written languages in the world, so the need for separate codepages disappears. The biggest remaining problem is getting complete Unicode fonts; the most complete one I know of is Arial Unicode MS, with about 55,000 characters defined.
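The codepage ambiguity described above is easy to demonstrate with iconv: the very same byte decodes to different characters depending on which 8-bit codepage you assume it was written in (byte 0xE6 here, given as the octal escape \346 for portability):

```shell
# One byte, two meanings -- the problem codepages created:
printf '\346' | iconv -f CP850 -t UTF-8       # micro sign under CP850
printf '\346' | iconv -f ISO-8859-1 -t UTF-8  # ae ligature under Latin-1
```

In a UTF-8 world the byte sequence itself identifies the character, so no such out-of-band agreement is needed.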
gna n00b
Joined: 19 Mar 2003 Posts: 38 Location: Beijing
Posted: Tue Aug 24, 2004 3:06 am
I agree that it should not be necessary to specify a codepage, and preferably no iocharset either; that seems to be the way it is intended to work. Why it is not working is either a bug or a configuration error.
Two more suggestions:
In the kernel configuration, check
File Systems -> Network File Systems -> SMB File System support -> Use a default NLS -> utf8
It seems you can specify two default NLSs in the kernel: one for smbfs and one for everything else.
Also try the cifs filesystem. Just replace smbfs with cifs in your mount command (assuming it is configured in the kernel). cifs does not have a codepage option and is supposed to have better international support than smbfs; it is now recommended over smbfs for everything except old SMB servers. Documentation is in /usr/src/linux/fs/cifs/README
If you can't get it to work, it might be worth asking on the linux-cifs mailing list and/or filing a bug report.
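A sketch of the suggested swap, based on the mount command posted earlier (share name, mount point and credentials are placeholders; this needs root and CIFS support compiled into the kernel):

```shell
# old: mount -t smbfs //windowsmachine/share /mnt/win -o username=blah,password=blah,iocharset=utf8,codepage=cp850
# new -- cifs has no codepage option at all:
mount -t cifs //windowsmachine/share /mnt/win -o username=blah,password=blah,iocharset=utf8
```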
Leo Lausren Apprentice
Joined: 24 Feb 2004 Posts: 198 Location: Denmark
Posted: Tue Aug 24, 2004 6:25 am
ecatmur wrote: | Hmm, I have to use unicode_start to get the UTF-8 characters to work... |
I made a script that echoes the \033%G escape sequence to the terminals at boot, called /etc/init.d/unicode. It probably needs some work to be of general use. Code: |
#!/sbin/runscript
conf=/etc/env.d/02locale

# Using devfs?
if [ -e /dev/.devfsd ] || [ -e /dev/.udev -a -d /dev/vc ]; then
	device=/dev/vc/
else
	device=/dev/tty
fi

depend() {
	need localmount
	after keymaps
	before consolefont
}

checkconfig() {
	if [ -r ${conf} ]; then
		. ${conf}
		encoding=
		[ -n "${LC_ALL}" ] && encoding=${LC_ALL#*.} && return 0
		[ -n "${LC_MESSAGES}" ] && encoding=${LC_MESSAGES#*.} && return 0
		[ -n "${LANG}" ] && encoding=${LANG#*.} && return 0
	fi
	eend 1 "Locale is not configured. Please fix ${conf}"
	return 1
}

start() {
	ebegin "Setting consoles to UTF-8"
	checkconfig || return 1
	if [ "${encoding}" = "UTF-8" -o "${encoding}" = "utf-8" ]; then
		# Load the keymap in Unicode mode, then switch every VT to UTF-8
		dumpkeys | loadkeys --unicode
		for ((i=1; i <= "${RC_TTY_NUMBER}"; i++)); do
			echo -ne "\033%G" > ${device}${i}
		done
		eend 0
	else
		eend 1 "UTF-8 is not required"
	fi
}
|
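To have the script run at boot it would be registered like any other Gentoo init script (hypothetical commands, matching the /etc/init.d/unicode name above):

```shell
rc-update add unicode boot    # run at boot, before consolefont per the depend() block
/etc/init.d/unicode start     # or start it immediately
```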
_________________ Blog: common sense – nonsense
max4ever Tux's lil' helper
Joined: 29 Jul 2004 Posts: 87 Location: almost in hell
Posted: Thu Sep 02, 2004 8:54 pm
Umm, so if I did this Code: | linuxoid max # cat /etc/env.d/99locale
LANG=it_IT.utf8
LC_CTYPE=it_IT.utf8 | does this mean that I can now see characters from any language anywhere in Linux, as long as the terminal or the software supports UTF-8? I'm having problems getting my Linux to show Romanian-specific letters in KDE and mplayer... _________________ Stop posting your PC's hardware as your signature.
Gatak Apprentice
Joined: 04 Jan 2004 Posts: 174
Posted: Thu Sep 02, 2004 9:19 pm
Only if the application you use has a font which includes these characters, and only if the application supports UTF-8.
max4ever Tux's lil' helper
Joined: 29 Jul 2004 Posts: 87 Location: almost in hell
Posted: Fri Sep 03, 2004 11:27 am
Hmm, and how can I find out whether a font has "support" for those characters? For example, I'm having problems with mplayer showing subtitles correctly. Can you suggest a font with UTF-8 coverage and antialiasing? _________________ Stop posting your PC's hardware as your signature.
Gatak Apprentice
Joined: 04 Jan 2004 Posts: 174
Posted: Fri Sep 03, 2004 11:41 am
You can try loading the font in a character map program; I think there is one in Gnome. It lets you see which characters exist in the font. Then you have to tell mplayer to use that font.
But remember, the subtitles you load in mplayer may not be encoded in UTF-8 but in some other local encoding, and mplayer would need to support that one.
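If the subtitles turn out to be in a legacy 8-bit encoding, converting them to UTF-8 up front sidesteps the mplayer question entirely. A self-contained sketch: the file name and the ISO-8859-2 source encoding are just examples (byte \261 is the Latin-2 letter ą, which appears in Romanian-area subtitle files):

```shell
# Create a fake one-line "subtitle" in ISO-8859-2, then convert it to UTF-8:
printf 'Dialog: \261\n' > movie.srt
iconv -f ISO-8859-2 -t UTF-8 movie.srt > movie.utf8.srt
cat movie.utf8.srt   # the same text, now valid UTF-8
```

In practice you would run only the iconv line against your real .srt file, with -f set to whatever encoding the subtitles actually use.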
andrewski Guru
Joined: 30 Apr 2004 Posts: 366 Location: Royersford, PA, USA
Posted: Sat Oct 02, 2004 2:50 am
It'd be great if you could post a bit about the various fonts that are necessary to actually "see" UTF-8, i.e. a console font and an *term font. In all my searching, I haven't been able to figure that one out!
Also, where does CONSOLETRANSLATION from /etc/rc.conf come in? Perhaps that's necessary to seal the deal, as it were?
Thanks for a nice howto.
obmun n00b
Joined: 17 Jan 2004 Posts: 66 Location: Europe (Spain)
Posted: Sat Oct 02, 2004 11:22 am
@andrewski:
Forget about UTF-8 on the console; it won't work completely (composing characters won't work). For more info take a look at this post, where I have some information about console fonts. Essentially you have to use a console font with a Unicode map. It is also good to have a font that makes use of the full 512 available glyphs (and not just 256).
CONSOLETRANSLATION tells setfont which translation map to use to translate program output from an 8-bit encoding to the UTF-8 the kernel expects (the kernel always expects to receive Unicode characters) when you are not using UTF-8. If applications are already sending UTF-8, the translation map is unnecessary, so CONSOLETRANSLATION should be commented out if UTF-8 is your default encoding.
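Putting that together, a UTF-8 console setup in /etc/rc.conf would look something like this (the font name is only an example of a font shipped with a Unicode map; the key point is that the translation line stays commented out):

```shell
KEYMAP="us"
CONSOLEFONT="LatArCyrHeb-16"         # a console font with a Unicode map
#CONSOLETRANSLATION="8859-1_to_uni"  # not needed when apps already emit UTF-8
```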
talon n00b
Joined: 11 Jun 2003 Posts: 13
Posted: Thu Oct 07, 2004 10:36 pm Post subject: gtk utf-8
My major problem in moving my machine to UTF-8 was that gtk-1 apps didn't display characters correctly. After a long time of experimenting I figured out how to do it right. You have to add the following lines to your ~/.gtkrc.mine:
Code: |
style "gtk-default" {
fontset = "-*-luxi sans-medium-r-normal--10-*-*-*-p-*-iso10646-1,\
-*-luxi sans-medium-r-normal--10-*-*-*-p-*-iso10646-1,\
-*-r-*-iso10646-1,*"
}
class "GtkWidget" style "gtk-default"
|
Replace "luxi sans" with your favorite font and the "10" with your preferred size. Even when you work with themes, they won't override this file.
Haqqax n00b
Joined: 11 Jul 2004 Posts: 35
Posted: Fri Oct 08, 2004 10:13 pm
Can anyone shed some light on how to force KDE apps to work with Unicode Plane 1 characters (or whether it can be done at all)?
I have been testing a little over the last two days. I managed to create a font with just a few characters encoded in Plane 1 (they start at 0x12000; I am trying to make my Linux support Akkadian cuneiform), installed it, and created a text file and an HTML file with Perl for testing. The HTML has both plain-text characters and character entity references.
The only applications that process and display these files correctly are Firefox (it does display cuneiform texts) and Thunderbird (I sent a cuneiform e-mail to myself, and when it arrived it was displayed correctly). All the other applications, including but not limited to OpenOffice, Konqueror and the standard KDE apps, do not parse UTF-8 from Plane 1 correctly (they split one code point into two characters) and of course do not display the text correctly. I am particularly disappointed by OpenOffice in this matter.
Can my KDE be cured? Does my success with Firefox and Thunderbird mean that other GTK editors may work equally well?
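The "split into two characters" symptom fits how Plane 1 works at the byte level: any code point at or above U+10000 needs four bytes in UTF-8 (and a surrogate pair in UTF-16, which is where toolkits of this era tend to slip up). A quick sketch for U+12000, the first cuneiform code point:

```shell
# U+12000 encodes as the four UTF-8 bytes F0 92 80 80 (written in octal here):
printf '\360\222\200\200\n'
printf '\360\222\200\200' | wc -c   # four bytes for a single character
```

An application that treats the surrogate pair as two independent characters will show (and delete) it as two garbage glyphs, which matches what you describe.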
gna n00b
Joined: 19 Mar 2003 Posts: 38 Location: Beijing
Posted: Sat Oct 09, 2004 5:50 am
Actually this topic is of interest to me too. I know that a lot of applications ignore the supplementary planes. There is a UTF-8 project at freedesktop.org that is trying to compile a list of non-Unicode-compliant software; in particular they have a list of Unicode software that does not work with the supplementary planes. Unfortunately this list is very short. But if you do find out something, please report back here and let us all know.
What software did you use to make your font? It would be helpful to know, so that more people can do testing.
Thanks
Haqqax n00b
Joined: 11 Jul 2004 Posts: 35
Posted: Sat Oct 09, 2004 2:09 pm
Quote: | What software did you use to make your font? |
I used FontForge.
I was really surprised (in a positive sense) by this program. I like it very much.
I was not able to successfully set up the encoding for my font from within the user interface, so I just opened the SFD file with Vim and updated the encoding manually:
Code: |
Encoding: unicode4
UnicodeInterp: none
|
It is a new program to me, so maybe some other settings are also important. I noticed (by trial and error) that if you make a mistake in "Encoding", FontForge will change it to "Custom".
I am still reading about the file format.
Quote: | list of unicode software that doesn't work for the supplementary planes |
They only list Vim and Emacs? I would say Vim does a better job than the KDE editors and OpenOffice. I wonder whether it would work if I had a proper console font. I can only see that Vim knows how many characters I have: it displays question marks instead of them, and it has no other choice, because I only have a TrueType font for my encoding. OpenOffice 1.1.2 did not get that far; I am upgrading to 1.1.3 today.
I do not have Emacs to test. I think one might try to use Thunderbird's editor to edit these texts (sooner or later other editors will support Plane 1 too); I will investigate this if I have some time. The other solution may be to build a console font and check whether medit can be used for editing. Building an IME for medit is extremely easy. I think this approach would be successful, but it does not meet my goal.
I would like to use cuneiform just like I use Chinese: without having to do a magic dance with special macros, hacking too much with fonts, or having to use specialized editors. I want to open all the files in the editors I use for everyday work and input text with the IMEs I normally use.
numerodix l33t
Joined: 18 Jul 2002 Posts: 743 Location: nl.eu
Posted: Sat Oct 09, 2004 7:40 pm
Ok, so I finally succeeded in getting this to work; my /etc/env.d/02locale now looks like this:
Code: | LC_CTYPE="no_NO.utf8"
LANG="en_US.utf8" |
After restarting X (you may want to mention that without a restart it just won't work) I was relieved to find that apparently both qt and gtk now recognize the character set; filenames display correctly in konqueror, etc. It looks like the apps I use in X are working fine in this respect.
What is still missing is Unicode support on the console, that is, outside of X. I'm not exactly sure what it takes to get filenames to display correctly; sometimes I have to run unicode_start, sometimes it seems to work without it. But input is still not working, that is, the keys æøå. My /etc/rc.conf looks like this:
Code: | KEYMAP="no-latin1"
CONSOLEFONT="lat0-16"
CONSOLETRANSLATION="8859-1_to_uni" |
While I use X 98% of the time, it's a little problematic to have this bug whenever anything has to be done from the shell. Any ideas?
[edit] The euro symbol is not working either; whatever I've tried, I've never been able to activate it. _________________ undvd - ripping dvds should be as simple as unzip
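For the console side, two things are worth checking: whether the terminal is actually in UTF-8 mode, and whether the euro sign's UTF-8 bytes reach it intact. A sketch (kbd_mode and unicode_start come from the kbd package and need a real virtual terminal, so they are shown commented out):

```shell
# kbd_mode        -- reports whether the console is in Unicode (UTF-8) mode
# unicode_start   -- switches keyboard and console to UTF-8
printf '\342\202\254\n'   # the euro sign, U+20AC, as its UTF-8 bytes E2 82 AC
```

If the printf line shows garbage on a VT, the console is still in 8-bit mode (or the console font has no euro glyph); if it shows garbage in an xterm, the terminal was not started with UTF-8 support.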
Haqqax n00b
Joined: 11 Jul 2004 Posts: 35
Posted: Sun Oct 10, 2004 12:57 am
gna wrote: | But if you do find out something please report here and let us all know. |
Well, I did some additional tests, and the results are very good.
I made a test IME for my Akkadian font in SCIM and IT WORKS. I can write Akkadian just like Chinese!
This could be usable in academic projects. If I send you a TTF font and you install it, I gain the ability to send you e-mails in Akkadian: Thunderbird will display them for you, you can save text files correctly, etc. And with SCIM you can also write Akkadian back to me. If only there were a word-processing application, it would be so easy to write books, prepare tests for students, etc.
As I said, Firefox works well with Plane 1 (only deleting is a little broken: you have to backspace each character twice, as sometimes happened with Chinese on English systems in the old days, i.e. not all the bytes of the character are deleted at once). So if PostgreSQL is Plane 1 ready (I have not checked yet), we could start to collaborate on some Akkadian data (a dictionary, book, or text repository, and not only Akkadian) already encoded in the future standard (Unicode has not accepted the Sumero-Akkadian cuneiform encoding yet), just like we can with English. We have everything in place. Even if the encoding finally changes, it would be a matter of minutes to write a script to fix the existing texts. I think I could build such a collaboration platform to be usable within a week, if someone would donate glyphs for the cuneiform font (a starting point might be the fonts created for TeX by Mr Piska, or one might buy fonts from Michael Everson).
Well, the only problem now seems to be the poor non-BMP support in Qt and KDE. I am extremely frustrated by this. Can someone describe how non-BMP encodings are supported in GNOME applications?
PS: OpenOffice 1.1.3 is no better than 1.1.2 at supporting Plane 1 characters.
Gatak Apprentice
Joined: 04 Jan 2004 Posts: 174
Posted: Sun Oct 10, 2004 1:05 am
I think most Gnome applications support Unicode, at least if compiled with accessibility support. In GEdit, for example, I can view all sorts of Unicode characters. I suppose you still need TrueType or OpenType fonts installed that cover the characters in question.
Haqqax n00b
Joined: 11 Jul 2004 Posts: 35
Posted: Sun Oct 10, 2004 1:23 am
Gatak wrote: | I think most Gnome applications support Unicode, at least if compiled with accessibility support. In GEdit, for example, I can view all sorts of Unicode characters. I suppose you still need TrueType or OpenType fonts installed that cover the characters in question. |
To be clear: there is no problem with the BMP in KDE (Chinese, IPA, Arabic without vowels), so Unicode as such is supported. I am interested in support for code points beyond 0xFFFF.
Haqqax n00b
Joined: 11 Jul 2004 Posts: 35
Posted: Sun Oct 10, 2004 4:59 pm
I've got one more question: are Hebrew niqqud and Arabic vowel marks displayed correctly on your Gentoo boxes? On my box they are displayed, but they are not positioned correctly over their base characters.
And, of course, the Arabic ligatures are broken by the vowels.
Is it working for anyone?
obmun n00b
Joined: 17 Jan 2004 Posts: 66 Location: Europe (Spain)
Posted: Mon Oct 11, 2004 3:07 pm
@numerodix:
Console and UTF-8? A bad mixture. Take a look at this post, where I analyze the problem. The conclusion? It's a kernel problem.