Author |
Message |
Singularity n00b
Joined: 28 May 2017 Posts: 9
|
Posted: Mon Oct 23, 2017 9:36 am Post subject: Multilingual filenames |
|
|
Hello,
I usually have to work with files whose names are written in different languages.
Usually they are:
English (Latin)
Slovak/Czech
Russian/Ukrainian (Cyrillic)
My locale is en_US.UTF-8
The problem is that every filename containing non-ASCII characters shows unreadable characters (squares, question marks).
E.g. Motivan�� list.pdf
How do I fix this?
Thanks in advance |
|
|
|
charles17 Advocate
Joined: 02 Mar 2008 Posts: 3684
|
Posted: Mon Oct 23, 2017 10:08 am Post subject: |
|
|
Where does the problem appear? Console or X? What if you list the containing directories in Firefox or another web browser? |
|
|
|
Singularity n00b
Joined: 28 May 2017 Posts: 9
|
Posted: Mon Oct 23, 2017 11:32 am Post subject: |
|
|
charles17 wrote: | Where does the problem appear? Console or X? What if you list the containing directories in Firefox or another web browser? |
The problem appears both in the terminal and in the DE. I currently use KDE as my main DE.
Last edited by Singularity on Mon Oct 23, 2017 12:19 pm; edited 1 time in total |
|
|
|
P.Kosunen Guru
Joined: 21 Nov 2005 Posts: 309 Location: Finland
|
Posted: Mon Oct 23, 2017 12:02 pm Post subject: |
|
|
The font could be the problem if it does not contain those characters. |
|
|
|
Singularity n00b
Joined: 28 May 2017 Posts: 9
|
Posted: Mon Oct 23, 2017 12:22 pm Post subject: |
|
|
P.Kosunen wrote: | The font could be the problem if it does not contain those characters. |
I've just edited my comment. I checked whether these file names are handled correctly by the terminal. They are not; the situation is the same.
I played with fonts, but that did not help either.
Maybe Kernel Native Language Support?
Any ideas? |
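One way to separate an encoding problem from a font problem is to check whether the bytes of a filename are valid UTF-8 at all. The following is only a minimal sketch, assuming `iconv` is installed; the helper name `is_valid_utf8` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the argument is valid UTF-8.
# iconv exits non-zero when it meets an invalid input sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Report every name in the current directory that is not valid UTF-8.
for name in *; do
    [ -e "$name" ] || continue
    is_valid_utf8 "$name" || printf 'not valid UTF-8: %s\n' "$name"
done
```

Names that get reported were probably written in a legacy encoding (CP1250, CP1251, KOI8-R and so on). Names that pass but still render as squares point more towards the font or the terminal.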
|
|
|
krinn Watchman
Joined: 02 May 2003 Posts: 7470
|
Posted: Mon Oct 23, 2017 10:12 pm Post subject: |
|
|
Isn't it because of the chosen language?
Code: | touch téléphone
LANG=C ls
't'$'\303\251''l'$'\303\251''phone'
LANG=fr_FR ls
téléphone |
|
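The escaped output above can be checked by hand: `\303\251` is the octal form of the two bytes 0xC3 0xA9, which is how UTF-8 encodes é. A small sketch, assuming nothing beyond a POSIX `printf`, `od` and `tr`:

```shell
#!/bin/sh
# Rebuild the name from the octal escapes that `LANG=C ls` printed.
name=$(printf 't\303\251l\303\251phone')

# Dump the bytes in hex; each é shows up as the byte pair c3 a9.
hex=$(printf '%s' "$name" | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$hex"   # prints 74c3a96cc3a970686f6e65
```

The name on disk is the same sequence of bytes in both cases; only the way `ls` presents those bytes depends on the locale.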
|
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
|
Posted: Tue Oct 24, 2017 3:51 am Post subject: |
|
|
Singularity wrote: | I checked whether these file names are handled correctly by the terminal. They are not; the situation is the same.
I played with fonts, but that did not help either. | Played with them in what way?
Singularity wrote: | Maybe Kernel Native Language Support?
Any ideas? | If your kernel lacks it, try enabling it.
krinn wrote: | Isn't it because of the chosen language?
Code: | touch téléphone
LANG=C ls
't'$'\303\251''l'$'\303\251''phone'
LANG=fr_FR ls
téléphone |
| No. Code: | $ touch téléphone
$ ls téléphone
téléphone
$ locale
LANG=C
LC_CTYPE=en_US.utf8
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE=C
LC_MONETARY=C
LC_MESSAGES=C
LC_PAPER=C
LC_NAME=C
LC_ADDRESS=C
LC_TELEPHONE=C
LC_MEASUREMENT=C
LC_IDENTIFICATION=C
LC_ALL=
|
|
|
|
|
krinn Watchman
Joined: 02 May 2003 Posts: 7470
|
Posted: Tue Oct 24, 2017 9:43 am Post subject: |
|
|
desultory, you are showing that you created the file while your system has support for the C locale.
It's different.
Look again:
Code: | touch téléphone
ls
téléphone
locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE=C
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=
|
But look at what happens when an unsupported locale is used:
Code: | LANG=stupidtest ls
't'$'\303\251''l'$'\303\251''phone'
|
It displays the UTF-8 bytes as octal escapes. With my console font (the default one in Gentoo) I can display the Cyrillic set myself, and I only have UTF-8 (French) support.
Look (taken from here)
Code: | touch "$(echo -e "\u0448\040A\u0470")"
ls -la
total 2404
drwxr-xr-x 2 root root 4096 24 oct. 11:34 .
drwxrwxrwt 9 root root 2453504 24 oct. 11:09 ..
-rw-r--r-- 1 root root 0 24 oct. 11:34 téléphone
-rw-r--r-- 1 root root 0 24 oct. 11:33 'ш AѰ'
|
|
|
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
|
Posted: Wed Oct 25, 2017 4:02 am Post subject: |
|
|
krinn wrote: | desultory, you are showing that you created the file while your system has support for the C locale.
It's different. | While you are correct in noting that there was a mistake in that post, you are incorrect as to what the mistake was. The C locale is not UTF aware, but my having left LC_CTYPE=en_US.utf8 in effect provided that awareness to the resultant locale set. |
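The interaction described here follows the POSIX precedence for locale variables: `LC_ALL` overrides everything, then the category variable such as `LC_CTYPE`, then `LANG`, and finally the C/POSIX default. The sketch below only mimics that lookup in shell; the function name `effective_locale` is invented for illustration, and an empty value is treated the same as unset.

```shell
#!/bin/sh
# Hypothetical helper: print the locale that applies to one category,
# e.g. `effective_locale LC_CTYPE`, following LC_ALL > LC_xxx > LANG > C.
effective_locale() (
    eval "category_value=\${$1:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
)

# The settings from the post above: everything C except LC_CTYPE.
# stderr is silenced because bash warns when en_US.utf8 is not installed.
(
    unset LC_ALL LC_MESSAGES
    LANG=C
    LC_CTYPE=en_US.utf8
    effective_locale LC_CTYPE     # prints en_US.utf8
    effective_locale LC_MESSAGES  # prints C
) 2>/dev/null
```

This is also why a single UTF-8 `LC_CTYPE` is enough to keep messages and date formats in the C locale while file names are still treated as UTF-8.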
|
|
|
Singularity n00b
Joined: 28 May 2017 Posts: 9
|
Posted: Wed Nov 01, 2017 11:40 am Post subject: |
|
|
I have enabled Native Language Support in the kernel, but nothing changed.
Please see below my locale settings.
Code: | eselect locale list
Available targets for the LANG variable:
[1] C
[2] POSIX
[3] de_DE.iso885915@euro
[4] de_DE@euro
[5] en_GB
[6] en_GB.iso88591
[7] en_GB.utf8 *
[8] en_US
[9] en_US.iso88591
[10] en_US.utf8
[11] ru_RU
[12] ru_RU.cp1251
[13] ru_RU.iso88595
[14] ru_RU.koi8r
[15] ru_RU.utf8
[16] russian
[17] sk_SK
[18] sk_SK.iso88592
[19] sk_SK.utf8
[20] slovak
[21] uk_UA
[22] uk_UA.koi8u
[23] uk_UA.utf8
[ ] (free form) |
Code: | locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE=C
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL= |
and
Code: | 02locale
LC_COLLATE="C"
LANG="en_GB.utf8" |
I need to keep the whole system, including messages, date format etc., in English but still be able to work with non-English file names.
What should I change in the configuration?
Thanks for your support |
|
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
|
Posted: Thu Nov 02, 2017 4:00 am Post subject: |
|
|
Singularity wrote: | What should I change in the configuration? |
desultory wrote: | The C locale is not UTF aware, but my having left LC_CTYPE=en_US.utf8 in effect provided that awareness to the resultant locale set. | Using en_GB.utf8 should have essentially identical effect, however your problem is not, at least directly, simply that.
For whatever reason your running shell is using the C locale for everything except LC_ALL, even LANG which is explicitly set to en_GB.utf8. Given the blank LC_ALL, this appears to be either deliberate, and possibly induced by attempting to solve your initial problem, or a result of having changed the system locale settings without updating your running shell(s). In short, given bash as your running shell, source ~/.bashrc and try the "téléphone" test again. |
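To check whether the running shell really lags behind the configured value, the `LANG` stored in the env.d file can be compared with the live one. The sketch below assumes the file is /etc/env.d/02locale in the simple `KEY="value"` form posted earlier; `configured_lang` is a made-up helper name. On Gentoo the usual way to reload after a change is `env-update && source /etc/profile`, which appears here only as a comment.

```shell
#!/bin/sh
# Hypothetical helper: print the LANG value stored in an env.d-style file.
configured_lang() {
    # Take the last LANG= line and strip optional double quotes.
    sed -n 's/^LANG="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1" | tail -n 1
}

file=/etc/env.d/02locale   # assumed path, as on a default Gentoo install
if [ -r "$file" ]; then
    want=$(configured_lang "$file")
    if [ "$want" != "${LANG:-}" ]; then
        printf 'configured LANG=%s, running shell has LANG=%s\n' \
            "$want" "${LANG:-}"
        # On Gentoo: env-update && source /etc/profile
    fi
fi
```

If the two values differ, the shell was most likely started before the change or something later in its startup files overrides `LANG`.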
|
|
|
Mr. T. Guru
Joined: 26 Dec 2016 Posts: 477
|
Posted: Thu Nov 02, 2017 11:37 am Post subject: |
|
|
These filenames may be stored in an encoding different from your system encoding. Some features are different depending on the environment.
Singularity wrote: | I usually have to work with files whose names are written in different languages. |
You may be interested in app-text/convmv, which converts filename encodings. I do not know how it is designed internally.
Linux NLS parameter wrote: | The default NLS used when mounting file system. Note, that this is the NLS used by your console, not the NLS used by a specific file system (if different) to store data (filenames) on a disk. |
I have read that the encoding is defined by the current locale, so I suppose the keyboard encoding is associated with the character set defined by the current locale.
N.B: Oddly, I modified the keyboard mapping from azerty to dvorak-r, edited some files and reinitialized the keymap but the control sequences are still interpreted using the dvorak keymap. |
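Converting a name from a legacy encoding to UTF-8, which is the job of the filename conversion tool mentioned above, can be sketched with plain `iconv`. This is only an illustration under assumptions: the source encoding must be known or guessed (CP1251 is just an example for Cyrillic names), and the helper names `to_utf8_name` and `rename_to_utf8` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print NAME re-encoded from encoding FROM to UTF-8.
to_utf8_name() {
    printf '%s' "$1" | iconv -f "$2" -t UTF-8
}

# Hypothetical helper: rename FILE in the current directory if its
# converted name differs. Prints the new name before renaming.
rename_to_utf8() {
    new=$(to_utf8_name "$1" "$2") || return 1
    [ "$new" = "$1" ] && return 0
    printf 'renaming to: %s\n' "$new"
    mv -i -- "$1" "$new"
}

# Example (not run here): rename_to_utf8 "$old_name" CP1251
```

Converting a name that is already UTF-8 a second time garbles it, so it is worth testing on a copy of one file first.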
|
|
|
deltoro05 n00b
Joined: 30 May 2024 Posts: 1
|
Posted: Thu May 30, 2024 2:06 pm Post subject: |
|
|
This URL https://unicode-table.com/ has changed to https://symbl.cc/
krinn wrote: | desultory, you are showing that you created the file while your system has support for the C locale.
It's different.
Look again:
Code: | touch téléphone
ls
téléphone
locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE=C
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=
|
But look at what happens when an unsupported locale is used:
Code: | LANG=stupidtest ls
't'$'\303\251''l'$'\303\251''phone'
|
It displays the UTF-8 bytes as octal escapes. With my console font (the default one in Gentoo) I can display the Cyrillic set myself, and I only have UTF-8 (French) support.
Look (taken from here)
Code: | touch "$(echo -e "\u0448\040A\u0470")"
ls -la
total 2404
drwxr-xr-x 2 root root 4096 24 oct. 11:34 .
drwxrwxrwt 9 root root 2453504 24 oct. 11:09 ..
-rw-r--r-- 1 root root 0 24 oct. 11:34 téléphone
-rw-r--r-- 1 root root 0 24 oct. 11:33 'ш AѰ'
|
|
|
|
|
|
|