candamil Tux's lil' helper
Joined: 19 Mar 2012 Posts: 96
|
Posted: Mon Apr 02, 2012 9:10 am Post subject: Wrong charset in a tomcat servlet |
|
|
Hi guys, I hope you can help me with this problem. First of all, the servlet itself works fine: I have been using it for several months on several systems. It fails only now, on a new Gentoo installation, so the problem must be in the installation.
OK, let me explain. The servlet lists some items within a category, entered from a web browser, and stores them in a file whose name is the hash of the category name, computed with Java's hashCode() method. The servlet works in UTF-8, and the files are also created in UTF-8. I had some files created on the other systems, but when I tried to open one whose category name has an accent (Películas), the servlet wasn't able to find the file. I tried to create a new one and discovered that accents are not handled properly (so the charset is wrong).
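For anyone following along, the file-naming scheme described above can be reproduced outside the servlet. This is only a sketch in Python that mimics Java's `String.hashCode()` (a base-31 polynomial over the UTF-16 code units, wrapped to a signed 32-bit integer). The function name `java_hash_code` is made up for illustration and is not part of the servlet.

```python
def java_hash_code(text: str) -> int:
    """Mimic Java's String.hashCode() for a Python string."""
    h = 0
    # Java strings are sequences of UTF-16 code units, so hash those
    # rather than Python code points (they differ outside the BMP).
    data = text.encode("utf-16-be")
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF  # keep 32 bits, like a Java int
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 0x100000000 if h >= 0x80000000 else h


if __name__ == "__main__":
    print(java_hash_code("Pel\u00edculas"))  # "Películas" -> 1014027990
```

Because every character feeds into the result, any change in how the accented letter is decoded produces a completely different file name.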
I opened the new file with kwrite. It should contain this: Películas. If I set the encoding to UTF-8, I get this: PelÃculas, but if I set the encoding to ISO, I get this: PelÃÂculas. I have tried several encodings, and the word is wrong with all of them. It seems the word is written with one encoding and stored with another, so it comes out wrong. And because the word is wrong, the hash and therefore the file name are wrong too (556892427 instead of 1014027990).
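The two garbled forms are what you would expect if the UTF-8 bytes of the word were decoded as ISO-8859-1 somewhere before the file was written. A small Python sketch of that assumption follows; the variable names are illustrative only and nothing here is taken from the servlet.

```python
original = "Pel\u00edculas"  # "Películas"

# UTF-8 bytes misread as ISO-8859-1: the two bytes of the accented
# letter (0xC3 0xAD) turn into 'Ã' plus an invisible soft hyphen (U+00AD).
once = original.encode("utf-8").decode("iso-8859-1")

# Saving that already-wrong string as UTF-8 and then viewing the file
# as ISO-8859-1 garbles it a second time.
twice = once.encode("utf-8").decode("iso-8859-1")

# Reversing the first step recovers the original text.
repaired = once.encode("iso-8859-1").decode("utf-8")

if __name__ == "__main__":
    print(ascii(once))           # 'Pel\xc3\xadculas'
    print(ascii(twice))          # 'Pel\xc3\x83\xc2\xadculas'
    print(repaired == original)  # True
```

If this is what happens, `once` is the string that ends up being hashed, and Java's `hashCode()` of it works out to 556892427, the wrong file name above. That would suggest the text is mis-decoded on the way in (when the category arrives from the browser) rather than when the file is written, though this is a guess from the symptoms.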
The system is configured in UTF-8:
Code: |
candamil@nomada ~ $ locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=es_ES.UTF-8
|
and the browser (chromium) has set the encoding option to "Unicode (UTF-". This is the language support section of my kernel, 3.2.12-gentoo:
Code: |
│ │ --- Native language support │ │
│ │ (utf8) Default NLS Option │ │
│ │ < > Codepage 437 (United States, Canada) │ │
│ │ < > Codepage 737 (Greek) │ │
│ │ < > Codepage 775 (Baltic Rim) │ │
│ │ <*> Codepage 850 (Europe) │ │
│ │ < > Codepage 852 (Central/Eastern Europe) │ │
│ │ < > Codepage 855 (Cyrillic) │ │
│ │ < > Codepage 857 (Turkish) │ │
│ │ < > Codepage 860 (Portuguese) │ │
│ │ < > Codepage 861 (Icelandic) │ │
│ │ < > Codepage 862 (Hebrew) │ │
│ │ < > Codepage 863 (Canadian French) │ │
│ │ < > Codepage 864 (Arabic) │ │
│ │ < > Codepage 865 (Norwegian, Danish) │ │
│ │ < > Codepage 866 (Cyrillic/Russian) │ │
│ │ < > Codepage 869 (Greek) │ │
│ │ < > Simplified Chinese charset (CP936, GB2312) │ │
│ │ < > Traditional Chinese charset (Big5) │ │
│ │ < > Japanese charsets (Shift-JIS, EUC-JP) │ │
│ │ < > Korean charset (CP949, EUC-KR) │ │
│ │ < > Thai charset (CP874, TIS-620) │ │
│ │ < > Hebrew charsets (ISO-8859-8, CP1255) │ │
│ │ < > Windows CP1250 (Slavic/Central European Languages) │ │
│ │ < > Windows CP1251 (Bulgarian, Belarusian) │ │
│ │ <*> ASCII (United States) │ │
│ │ < > NLS ISO 8859-1 (Latin 1; Western European Languages) │ │
│ │ < > NLS ISO 8859-2 (Latin 2; Slavic/Central European Languages) │ │
│ │ < > NLS ISO 8859-3 (Latin 3; Esperanto, Galician, Maltese, Turkish) │ │
│ │ < > NLS ISO 8859-4 (Latin 4; old Baltic charset) │ │
│ │ < > NLS ISO 8859-5 (Cyrillic) │ │
│ │ < > NLS ISO 8859-6 (Arabic) │ │
│ │ < > NLS ISO 8859-7 (Modern Greek) │ │
│ │ < > NLS ISO 8859-9 (Latin 5; Turkish) │ │
│ │ < > NLS ISO 8859-13 (Latin 7; Baltic) │ │
│ │ < > NLS ISO 8859-14 (Latin 8; Celtic) │ │
│ │ <*> NLS ISO 8859-15 (Latin 9; Western European Languages with Euro) │ │
│ │ < > NLS KOI8-R (Russian) │ │
│ │ < > NLS KOI8-U/RU (Ukrainian, Belarusian) │ │
│ │ <*> NLS UTF-8 │ │
|
and this is my locale.gen file:
Code: |
nomada linux # more /etc/locale.gen
# /etc/locale.gen: list all of the locales you want to have on your system
#
# The format of each line:
# <locale> <charmap>
#
# Where <locale> is a locale located in /usr/share/i18n/locales/ and
# where <charmap> is a charmap located in /usr/share/i18n/charmaps/.
#
# All blank lines and lines starting with # are ignored.
#
# For the default list of supported combinations, see the file:
# /usr/share/i18n/SUPPORTED
#
# Whenever glibc is emerged, the locales listed here will be automatically
# rebuilt for you. After updating this file, you can simply run `locale-gen`
# yourself instead of re-emerging glibc.
#en_US ISO-8859-1
#en_US.UTF-8 UTF-8
#ja_JP.EUC-JP EUC-JP
#ja_JP.UTF-8 UTF-8
#ja_JP EUC-JP
#en_HK ISO-8859-1
#en_PH ISO-8859-1
#de_DE ISO-8859-1
#de_DE@euro ISO-8859-15
#es_MX ISO-8859-1
#fa_IR UTF-8
#fr_FR ISO-8859-1
#fr_FR@euro ISO-8859-15
#it_IT ISO-8859-1
es_ES.UTF-8 UTF-8
es_ES@euro ISO-8859-15
|
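One thing the outputs above do not show is which locale the Tomcat process itself runs with, since a service started at boot does not necessarily inherit a login shell's variables. As a rough sketch in Python (`effective_ctype` and `codeset` are hypothetical helper names), this is how the POSIX precedence LC_ALL, then LC_CTYPE, then LANG resolves to a character set for a given environment:

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character handling.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Unset or empty variables are skipped; the fallback is the "C" locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def codeset(locale_name):
    """Extract the codeset part of a locale name, or None if absent."""
    base = locale_name.split("@", 1)[0]  # drop a modifier such as @euro
    if "." in base:
        return base.split(".", 1)[1]
    return None


if __name__ == "__main__":
    current = effective_ctype(os.environ)
    print(current, codeset(current))
```

Running the same check with the environment of the service (rather than an interactive shell) would show whether it really sees `es_ES.UTF-8` or falls back to `C`.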
I think everything is configured exactly as in my old system, so I don't know where the problem is. You can check the files (the correct old one and the incorrect new one) in these links:
http://www.wupload.com/file/2683169542/correct.xml
http://www.wupload.com/file/2683169547/incorrect.xml
You just need to check the 4th line, <NombreCategoria>
Any help would be appreciated. Thank you. |
|
|
|
86me n00b
Joined: 20 Jul 2009 Posts: 20
|
Posted: Sun Apr 08, 2012 4:27 am Post subject: |
|
|
The xml links you listed are not publicly viewable.
Quote: | NOTE: Wupload does not allow files to be shared. We are a STORAGE ONLY product so you can only download your own files. If you have uploaded this file yourself, login first in order to download it. |
|
|
|
|
candamil Tux's lil' helper
Joined: 19 Mar 2012 Posts: 96
|
|