Gentoo Forums
Accented characters in filenames
kramer2718
Tux's lil' helper


Joined: 24 Mar 2006
Posts: 78

Posted: Tue Aug 29, 2006 8:48 pm    Post subject: Accented characters in filenames

Hi. I run Gentoo (kernel 2.6.16) on an AMD64 machine.

I have several files originally from various Windows systems that have accented characters (from multiple languages). They don't display correctly and I can't type them. I have no idea what encoding they are in. Does anyone have any idea how to determine what the encoding is and how to display and type them properly? I don't think that I want to change my keyboard layout or locale as most of my work is conducted in American English.

Any thoughts?
_________________
God save us from His followers.
SirYes
Apprentice


Joined: 15 Jan 2006
Posts: 282
Location: Lodz, Poland

Posted: Tue Aug 29, 2006 9:56 pm

Depending on the file system where the data resides (NTFS / FAT32), you can mount the partition in question passing the iocharset, codepage or maybe even nls and/or utf8 options.

Example:
(NTFS)
# mount -t ntfs -o ro,nls=utf8 /dev/your_partition /mnt/your_mountpoint
# mount -t ntfs -o ro,iocharset=iso8859-2 /dev/your_partition /mnt/your_mountpoint

(FAT32)
# mount -t vfat -o rw,iocharset=utf8,codepage=852 /dev/your_partition /mnt/your_mountpoint
# mount -t vfat -o rw,iocharset=iso8859-1,codepage=850 /dev/your_partition /mnt/your_mountpoint
# mount -t vfat -o rw,iocharset=iso8859-2,codepage=852,utf8=true /dev/your_partition /mnt/your_mountpoint


See also:
* /usr/src/linux/Documentation/filesystems/ntfs.txt
* /usr/src/linux/Documentation/filesystems/vfat.txt
_________________
My blog: In search for ultimate programming language
kramer2718
Tux's lil' helper


Joined: 24 Mar 2006
Posts: 78

Posted: Tue Aug 29, 2006 11:08 pm

The data resides on a reiserfs partition. The files were uploaded from my buddy's Windows laptop via scp.
SirYes
Apprentice


Joined: 15 Jan 2006
Posts: 282
Location: Lodz, Poland

Posted: Thu Aug 31, 2006 5:22 pm

kramer2718 wrote:
The data resides on a reiserfs partition. The files were uploaded from my buddy's Windows laptop via scp.

So, you want to type the names of the files in the terminal, right? Two quick solutions come to mind:
  • Type the first few letters of the file's name and press the "Tab" key -- bash should auto-complete the name for you.
  • Open your terminal and do:
    Code:
    $ cd directory_where_files_reside
    $ ls > filenames.txt
    $ gedit filenames.txt &

    (assuming you have gedit installed; if not, replace it with the editor of your choice).
    Then you'll have all the names in a file, and you can copy and paste them into the terminal.

Heck, you can even rename the files in a batch:
Code:
NUM=1; for file in *; do echo mv -- "$file" "file-$NUM"; NUM=$((NUM + 1)); done

This is only a test run. If everything looks okay, remove the "echo" before "mv" and execute the command again:
Code:
NUM=1; for file in *; do mv -- "$file" "file-$NUM"; NUM=$((NUM + 1)); done

The quotes keep names containing spaces intact, and "--" protects names that start with a dash. If you want some other naming scheme, change "file-$NUM" to some other pattern, like "track-$NUM.mp3" or similar. You get the idea.
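If you'd rather keep each file's extension while numbering, the one-liner can be grown into a small function. This is only a sketch: the name rename_numbered and the "file-" prefix are made up for illustration, and it prints its plan unless you pass "run".

```shell
# Number the regular files in the current directory, keeping each extension.
# Usage: rename_numbered [run]   (without "run" it only prints the plan)
rename_numbered() {
    mode=${1:-dry}
    num=1
    for file in *; do
        [ -f "$file" ] || continue          # skip directories and the like
        case $file in
            *.*) ext=.${file##*.} ;;        # text after the last dot
            *)   ext= ;;                    # no extension at all
        esac
        target=file-$num$ext
        if [ -e "$target" ]; then
            echo "skip: $target already exists" >&2
        elif [ "$mode" = run ]; then
            mv -- "$file" "$target"
        else
            printf 'mv -- %s %s\n' "$file" "$target"
        fi
        num=$((num + 1))
    done
}
```

Run rename_numbered first to check the plan, then rename_numbered run to apply it. Existing targets are skipped rather than overwritten.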

This should help you out.
kramer2718
Tux's lil' helper


Joined: 24 Mar 2006
Posts: 78

Posted: Thu Aug 31, 2006 8:45 pm

Creative and helpful, but not really a complete solution.

1) How can I discover what encoding those files are in?
2) Can I set up my system to handle multiple encodings, including these?
3) I presume that once I have installed these encodings, I'll be able to type the characters using Ctrl, Alt, + character.

Does anyone know how Linux handles character encodings?
SirYes
Apprentice


Joined: 15 Jan 2006
Posts: 282
Location: Lodz, Poland

Posted: Fri Sep 01, 2006 8:46 am

kramer2718 wrote:
Does anyone know how Linux handles character encodings?

For a good start, see the Gentoo Linux Localization Guide.

As a small experiment, please type
Code:
locale
in your terminal to see what your current locale settings are.
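Two narrower queries are often enough here. A small sketch (the function names are just illustrative; "locale charmap" and "locale -a" are standard):

```shell
# Print the character encoding of the current locale, e.g. UTF-8 or ISO-8859-1.
show_charset() {
    locale charmap
}

# List installed locales whose names mention UTF-8, if there are any.
list_utf8_locales() {
    locale -a | grep -i 'utf-*8' || echo "no UTF-8 locale installed"
}
```

If show_charset reports something other than UTF-8, file names written in UTF-8 will look garbled in that session, and vice versa.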

Quote:
1) How can I discover what encoding those files are in?

Actually, when the files were copied, scp sent each file name as a string of bytes in its original encoding, which differed from the one your system uses. The receiving end just passed each string down to the file system in order to create a file with the same name. The meaning of each character was not taken into consideration at that point; the name was just a bunch of bytes passed from user space down to a lower-level layer. How those bytes are displayed now depends on your current locale.

I'm not sure if you can discover what encoding the file names are in now. With 8-bit code pages, the same byte value (234, for example) stands for a different character in different code pages, and there are literally dozens of old code pages. You'd probably have to extract the byte values of the "unknown" characters and compare them against existing code pages. Maybe there is some other way, but I don't know of one.
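That comparison can be semi-automated with iconv. A rough sketch, assuming a hand-picked list of candidate encodings (the list and the function names guess_encoding and is_utf8 are illustrative, not exhaustive): it prints the same raw name decoded under each candidate, so you can pick the one that reads correctly.

```shell
# Show how one raw file name would read under several legacy encodings.
# The output is UTF-8, so view it in a UTF-8 terminal.
guess_encoding() {
    name=$1
    for enc in ISO-8859-1 ISO-8859-2 CP1250 CP1252 CP850 CP852; do
        # iconv fails if the bytes are invalid in (or unknown to) the source encoding
        if decoded=$(printf '%s' "$name" | iconv -f "$enc" -t UTF-8 2>/dev/null); then
            printf '%-12s %s\n' "$enc" "$decoded"
        else
            printf '%-12s (not valid)\n' "$enc"
        fi
    done
}

# A name that is already valid UTF-8 needs no guessing.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}
```

For example, for f in *; do is_utf8 "$f" || guess_encoding "$f"; done walks a directory and only shows candidates for the names that are not valid UTF-8. The script cannot tell which reading is right; a human still has to recognize the word.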

Computers are not people. They don't assign a meaning to symbols such as letters. For computers a letter is just a combination of numbers (a byte or several bytes) which are transferred from one place to another. It's us that interpret letters and give them meaning. But you probably know that already. ;)
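To make that concrete, here is a small sketch (the helper name name_bytes is made up) that dumps the raw bytes behind a string. The same letter é is one byte (e9) in ISO-8859-1 and two bytes (c3 a9) in UTF-8.

```shell
# Print the bytes of a string as hex values on a single line.
name_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}
```

Running it over a directory, for f in *; do name_bytes "$f"; echo; done, shows exactly which byte values the "unknown" characters have, which is what you would look up in a code page table.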

Quote:
2) Can I set up my system to handle multiple encodings, including these?

Yes, there is something made just for this purpose. It's called Unicode, and it has room for more than a million characters, in contrast to the 8-bit encodings, which can describe a maximum of 256 each. UTF-8 is one encoding of Unicode, and both the Linux kernel (including the file systems it supports) and Gentoo have first-class support for it. However, it's not automatic, and you need to configure the system a bit in order to use it. Again, see the localization guide for more explanation.
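Once the system runs in UTF-8 and the original encoding of the names is known, the existing names can be converted too. A cautious sketch using only iconv and mv; the function name to_utf8_names is hypothetical, and the source encoding you pass in (CP1252 in the usage line) is a guess you have to verify yourself. The dedicated convmv tool does the same job more thoroughly.

```shell
# Rename every entry in the current directory from a legacy encoding to UTF-8.
# Usage: to_utf8_names ENCODING [run]   (without "run" it only prints the plan)
to_utf8_names() {
    from=$1
    mode=${2:-dry}
    for old in *; do
        [ -e "$old" ] || continue
        # Names that are already valid UTF-8 (including plain ASCII) are left alone.
        if printf '%s' "$old" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            continue
        fi
        # Skip names that cannot be decoded from the given encoding.
        new=$(printf '%s' "$old" | iconv -f "$from" -t UTF-8) || continue
        if [ -e "$new" ]; then
            echo "skip: target exists for $new" >&2
        elif [ "$mode" = run ]; then
            mv -- "$old" "$new"
        else
            # Only the new name is printed; the old one would show up garbled.
            printf 'would rename to %s\n' "$new"
        fi
    done
}
```

Try to_utf8_names CP1252 first and read the proposed names; if they look right, repeat with to_utf8_names CP1252 run. It does not descend into subdirectories.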

Quote:
3) I presume that once I have installed these encodings, I'll be able to type the characters using Ctrl, Alt, + character.

It depends on your keyboard settings. They are configured differently for console mode (the keymap in /etc/conf.d/keymaps, the font in /etc/conf.d/consolefont) and for graphical terminals (the keyboard layout is set in different ways in KDE, GNOME, etc.).

Personally I think that all electronics, computers, operating systems and programs should move away from ASCII to Unicode. Globally. Alas, this is only an optimistic wish... :?
kramer2718
Tux's lil' helper


Joined: 24 Mar 2006
Posts: 78

Posted: Fri Sep 01, 2006 10:17 pm

Is there a way that I can configure sftp or scp to handle filenames in the proper way? I know if I transfer those files from one Windows box to another, Windows displays the characters properly.
All times are GMT