Gentoo Forums
Singularity
n00b
Joined: 28 May 2017
Posts: 9

Posted: Mon Oct 23, 2017 9:36 am    Post subject: Multilingual filenames

Hello,

I usually have to work with files whose names are in different languages.

Usually they are:
English (Latin)
Slovak/Czech
Russian/Ukrainian (Cyrillic)

My locale is en_US.UTF-8.

The problem is that all filenames containing non-Latin characters show unreadable characters (squares, question marks).

For example: Motivan�� list.pdf

How do I fix this?

Thanks in advance

charles17
Advocate
Joined: 02 Mar 2008
Posts: 3684

Posted: Mon Oct 23, 2017 10:08 am

Where does the problem appear: on the console or in X? What happens if you list the containing directories in Firefox or another web browser?

Singularity
n00b
Joined: 28 May 2017
Posts: 9

Posted: Mon Oct 23, 2017 11:32 am

charles17 wrote:
Where does the problem appear: on the console or in X? What happens if you list the containing directories in Firefox or another web browser?


The problem appears both in the terminal and in the desktop environment. I currently use KDE as my main DE.


Last edited by Singularity on Mon Oct 23, 2017 12:19 pm; edited 1 time in total

P.Kosunen
Guru
Joined: 21 Nov 2005
Posts: 309
Location: Finland

Posted: Mon Oct 23, 2017 12:02 pm

The font could be the problem if it does not contain those characters.

Singularity
n00b
Joined: 28 May 2017
Posts: 9

Posted: Mon Oct 23, 2017 12:22 pm

P.Kosunen wrote:
The font could be the problem if it does not contain those characters.


I've just edited my comment. I tried
Code:
ls
to check whether these file names are handled correctly by the terminal. They are not; the situation is the same.

I also played with the fonts, but that did not help either.

Could it be the kernel's Native Language Support?

Any ideas?

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Mon Oct 23, 2017 10:12 pm

It's because of the chosen language, no?
Code:
touch téléphone
LANG=C ls
't'$'\303\251''l'$'\303\251''phone'
LANG=fr_FR ls
téléphone

desultory
Bodhisattva
Joined: 04 Nov 2005
Posts: 9410

Posted: Tue Oct 24, 2017 3:51 am

Singularity wrote:
I tried
Code:
ls
to check whether these file names are handled correctly by the terminal. They are not; the situation is the same.

I also played with the fonts, but that did not help either.
Played with in what way?
Singularity wrote:
Could it be the kernel's Native Language Support?

Any ideas?
If your kernel lacks it, try enabling it.

krinn wrote:
It's because of the chosen language, no?
Code:
touch téléphone
LANG=C ls
't'$'\303\251''l'$'\303\251''phone'
LANG=fr_FR ls
téléphone
No.
Code:
$ touch téléphone
$ ls téléphone
téléphone
$ locale
LANG=C
LC_CTYPE=en_US.utf8
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE=C
LC_MONETARY=C
LC_MESSAGES=C
LC_PAPER=C
LC_NAME=C
LC_ADDRESS=C
LC_TELEPHONE=C
LC_MEASUREMENT=C
LC_IDENTIFICATION=C
LC_ALL=

krinn
Watchman
Joined: 02 May 2003
Posts: 7470

Posted: Tue Oct 24, 2017 9:43 am

desultory, you are showing that you created the file while your system has support for the C locale.
That's different.

Look again:
Code:
touch téléphone
ls
téléphone
locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE=C
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=

But look at what happens when an unsupported locale is used:
Code:
LANG=stupidtest ls
't'$'\303\251''l'$'\303\251''phone'

It displays the escaped octal byte values of the UTF-8 name. With my console font (the default one in Gentoo) I can display the Cyrillic set myself, and I only have UTF-8 (French) support.
Look (taken from here):
Code:
touch "$(echo -e "\u0448\040A\u0470")"
ls -la
total 2404
drwxr-xr-x 2 root root    4096 24 oct.  11:34  .
drwxrwxrwt 9 root root 2453504 24 oct.  11:09  ..
-rw-r--r-- 1 root root       0 24 oct.  11:34  téléphone
-rw-r--r-- 1 root root       0 24 oct.  11:33 'ш AѰ'
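
A note on the escapes in the listings above: the \303\251 that ls prints are the octal values of the two bytes that encode é in UTF-8, which can be confirmed with od. A minimal sketch; name_bytes_octal is a hypothetical helper name, not something from this thread:

```shell
# Hypothetical helper: print every byte of a name in octal,
# the same notation GNU ls uses inside its $'...' escapes.
name_bytes_octal() {
    printf '%s' "$1" | od -An -v -to1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# "tél" written as explicit bytes so the example does not depend on the
# encoding of this script: é is the two bytes 303 251 in UTF-8.
sample=$(printf 't\303\251l')
name_bytes_octal "$sample"    # prints: 164 303 251 154
```

If a name shows other byte values for an accented letter (a single byte above 200 octal, for instance), it was most likely not stored as UTF-8.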

desultory
Bodhisattva
Joined: 04 Nov 2005
Posts: 9410

Posted: Wed Oct 25, 2017 4:02 am

krinn wrote:
desultory, you are showing that you created the file while your system has support for the C locale.
That's different.
While you are correct in noting that there was a mistake in that post, you are incorrect as to what the mistake was. The C locale is not UTF aware, but my having left LC_CTYPE=en_US.utf8 in effect provided that awareness to the resultant locale set.
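
The precedence behind this is defined by POSIX: for each category, a non-empty LC_ALL wins, then the category's own variable (here LC_CTYPE), then LANG. A rough sketch of that lookup follows. effective_locale is a made-up helper, and the final fallback to C is an assumption that matches glibc (POSIX leaves the default implementation-defined):

```shell
# Hypothetical helper: resolve the locale that applies to one category.
# Arguments: the values of LC_ALL, the category variable, and LANG.
effective_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # LC_ALL overrides everything
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # then the category's own variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"      # then LANG
    else
        printf 'C\n'            # assumed default (glibc behaviour)
    fi
}

# The settings from the earlier listing: LANG=C, LC_CTYPE=en_US.utf8, LC_ALL empty.
effective_locale "" "en_US.utf8" "C"    # prints en_US.utf8

# The same lookup for the character handling of the current shell.
effective_locale "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"
```

The locale command shows the same thing: values it prints in double quotes are implied by LANG or LC_ALL rather than set directly.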

Singularity
n00b
Joined: 28 May 2017
Posts: 9

Posted: Wed Nov 01, 2017 11:40 am

I have enabled Native Language Support in the kernel, but nothing changed.

Please see my locale settings below.

Code:
eselect locale list
Available targets for the LANG variable:
  [1]   C
  [2]   POSIX
  [3]   de_DE.iso885915@euro
  [4]   de_DE@euro
  [5]   en_GB
  [6]   en_GB.iso88591
  [7]   en_GB.utf8 *
  [8]   en_US
  [9]   en_US.iso88591
  [10]  en_US.utf8
  [11]  ru_RU
  [12]  ru_RU.cp1251
  [13]  ru_RU.iso88595
  [14]  ru_RU.koi8r
  [15]  ru_RU.utf8
  [16]  russian
  [17]  sk_SK
  [18]  sk_SK.iso88592
  [19]  sk_SK.utf8
  [20]  slovak
  [21]  uk_UA
  [22]  uk_UA.koi8u
  [23]  uk_UA.utf8
  [ ]   (free form)


Code:
locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE=C
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

and
Code:
02locale
LC_COLLATE="C"
LANG="en_GB.utf8"


I need to keep the whole system, including messages, date formats, etc., in English, but be able to work with non-English file names.

What should I change in the configuration?

Thanks for the support

desultory
Bodhisattva
Joined: 04 Nov 2005
Posts: 9410

Posted: Thu Nov 02, 2017 4:00 am

Singularity wrote:
What should I change in the configuration?
desultory wrote:
The C locale is not UTF aware, but my having left LC_CTYPE=en_US.utf8 in effect provided that awareness to the resultant locale set.
Using en_GB.utf8 should have an essentially identical effect; however, your problem is not, at least directly, simply that.

For whatever reason, your running shell is using the C locale for everything except LC_ALL, even for LANG, which is explicitly set to en_GB.utf8 in 02locale. Given the blank LC_ALL, this appears to be either deliberate (possibly induced by attempts to solve your initial problem) or a result of having changed the system locale settings without updating your running shell(s). In short, given bash as your running shell, source ~/.bashrc and try the "téléphone" test again.
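
One way to check for the second possibility (a shell started before the settings changed) is to compare the configured LANG with what the shell actually has. This is a sketch only: configured_lang and locale_is_stale are hypothetical helper names, and it assumes the settings live in /etc/env.d/02locale in the KEY="value" form shown earlier in the thread:

```shell
# Hypothetical helper: extract the LANG value from an env.d-style file.
configured_lang() {
    sed -n 's/^LANG="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1"
}

# Hypothetical helper: succeed (exit 0) when the shell's LANG ($2) differs
# from the LANG configured in the file ($1).
locale_is_stale() {
    [ "$(configured_lang "$1")" != "$2" ]
}

conf=/etc/env.d/02locale    # assumed location of the settings
if [ -r "$conf" ] && locale_is_stale "$conf" "${LANG:-}"; then
    echo "LANG in this shell differs from $conf; reload the environment"
fi
```

On Gentoo the usual way to reload after editing files under /etc/env.d is env-update && source /etc/profile; logging out and back in has the same effect.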

Mr. T.
Guru
Joined: 26 Dec 2016
Posts: 477

Posted: Thu Nov 02, 2017 11:37 am

These filenames may use a different encoding than your system encoding. Some behaviour differs depending on the environment.
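
A quick way to test this is to check whether the bytes of a name are valid UTF-8; names written by a system using a legacy 8-bit encoding usually are not. A sketch assuming iconv is installed; is_utf8_name is a hypothetical helper:

```shell
# Hypothetical helper: succeed when the argument is valid UTF-8.
# iconv exits non-zero when it meets a byte sequence that is invalid in the
# source encoding.
is_utf8_name() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Report every name in the current directory that is not valid UTF-8.
for f in ./*; do
    [ -e "$f" ] || continue
    is_utf8_name "$f" || printf 'not UTF-8: %s\n' "$f"
done
```

Names that are reported here cannot be displayed correctly by a UTF-8 locale, whatever font is in use.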

Singularity wrote:
I usually have to work with files whose names are in different languages.


You may be interested in app-text/convmv, which converts filename encodings. I do not know how things are designed internally.
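
For reference, convmv only previews the renames unless --notest is given, so a trial run is safe. The same idea can be sketched with plain iconv. Both the helper name name_to_utf8 and the source encoding ISO-8859-2 are assumptions for illustration; the real encoding of the files in question is not known from the thread:

```shell
# Hypothetical helper: convert a file name from a legacy encoding to UTF-8.
# $1 = source encoding, $2 = name
name_to_utf8() {
    printf '%s' "$2" | iconv -f "$1" -t UTF-8
}

# Illustrative sample: "č" is the single byte 350 (octal) in ISO-8859-2
# and becomes the two bytes 304 215 in UTF-8.
old=$(printf '\350.txt')
new=$(name_to_utf8 ISO-8859-2 "$old")

# Only print the rename that would happen; replace echo with the real
# command once the result looks right.
echo mv -- "$old" "$new"
```

With convmv the equivalent trial run would be convmv -f iso-8859-2 -t utf-8 -r . and adding --notest performs the renames.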

Linux NLS parameter wrote:
The default NLS used when mounting file system. Note, that this is the NLS used by your console, not the NLS used by a specific file system (if different) to store data (filenames) on a disk.


I have read that the encoding is defined by the current locale, so I suppose the keyboard encoding is associated with the character set defined by the current locale.

N.B.: Oddly, I changed the keyboard mapping from azerty to dvorak-r, edited some files, and reinitialized the keymap, but the control sequences are still interpreted using the dvorak keymap.

deltoro05
n00b
Joined: 30 May 2024
Posts: 1

Posted: Thu May 30, 2024 2:06 pm

This URL https://unicode-table.com/ has changed to https://symbl.cc/


krinn wrote:
desultory, you are showing that you created the file while your system has support for the C locale.
That's different.

Look again:
Code:
touch téléphone
ls
téléphone
locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE=C
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=

But look at what happens when an unsupported locale is used:
Code:
LANG=stupidtest ls
't'$'\303\251''l'$'\303\251''phone'

It displays the escaped octal byte values of the UTF-8 name. With my console font (the default one in Gentoo) I can display the Cyrillic set myself, and I only have UTF-8 (French) support.
Look (taken from here):
Code:
touch "$(echo -e "\u0448\040A\u0470")"
ls -la
total 2404
drwxr-xr-x 2 root root    4096 24 oct.  11:34  .
drwxrwxrwt 9 root root 2453504 24 oct.  11:09  ..
-rw-r--r-- 1 root root       0 24 oct.  11:34  téléphone
-rw-r--r-- 1 root root       0 24 oct.  11:33 'ш AѰ'

All times are GMT