i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
Posted: Tue Jan 26, 2010 10:17 am Post subject: ntfs-3g file names oddity
|
Hi,
I've been away from Windows for quite a while. Now I need to share some data with a Windows user and I'm a bit lost. FAT is quite limited and CD-ROMs are an unnecessary waste (or so I hope), so I figured the remaining choices were either NTFS, or ext2 with one of the ext2 drivers available for Windows.
I tried the ext2 solution first, because I'm more comfortable with it. My first idea was to create a small 16 MB FAT16 partition on an 8 GB pendrive to hold the ext2 driver, and assign the remaining space to ext2. The problem is that I later discovered the hard way that Windows (sigh) only gives a drive letter to one partition on a pendrive (Windows at its best, as always...). So the only solution in that direction would be to ship two pens instead of one. Beautiful.
The only option left is ntfs-3g. I formatted the drive again as a single NTFS partition and copied the files, and everything went well. But now Windows complains that it can't read some files. I found a few files with ':' in their names, so I guess the problem is "illegal" characters in the file names. Now, the natural question is why the hell ntfs-3g allows illegal file names to be created in the first place. The second natural question is: am I doing something wrong? Is there perhaps some configuration setting I'm not aware of that would prevent this, so that my NTFS volumes are actually useful to anyone besides me?
I know I can resort to shell scripting to fix this, as long as I know all the illegal characters. It's a pretty trivial thing to do. But that's not the question.
Any ideas? Thanks in advance.
Edit: http://ubuntuforums.org/archive/index.php/t-1253837.html
A bit exasperating. Well, I guess I'll have to resort to bash to do the work. If anyone has an idea, I'm still listening.
Last edited by i92guboj on Tue Jan 26, 2010 10:29 am; edited 1 time in total
|
|
xaviermiller Bodhisattva
Joined: 23 Jul 2004 Posts: 8723 Location: ~Brussels - Belgique
Posted: Tue Jan 26, 2010 10:28 am
|
Another problem of the same kind: two files with the same name but different capitalization (e.g. "hello.txt" and "HELLO.TXT"), which are the same file for Windows but different files for *NIX.
_________________
Kind regards,
Xavier Miller
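A quick way to spot this before handing the pen over is to fold every path to lowercase and look for repeats. This is a sketch, not something from the thread: `find_case_collisions` is a made-up name, and it assumes bash, file names without embedded newlines, and a `tr` that folds at least ASCII letters.

```bash
# Hypothetical helper: list paths under a directory that collide once case
# is ignored, e.g. "hello.txt" and "HELLO.TXT" both become "./hello.txt".
# Assumes no newlines in file names; tr may only fold ASCII letters.
find_case_collisions() {
    (cd "$1" && find . -print) |
        tr '[:upper:]' '[:lower:]' |
        sort |
        uniq -d
}
```

Each output line is the folded form of a name Windows would see more than once; rename one member of each group before copying.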
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 23066
Posted: Wed Jan 27, 2010 4:32 am Post subject: Re: ntfs-3g file names oddity
|
i92guboj wrote: "Now, the natural question is why the hell ntfs-3g allows illegal file names to be created in the first place."
Based on the NTFS-3G FAQ, it appears that the developers consider it a feature that NTFS-3G obeys the looser POSIX rules.
i92guboj wrote: "I know I can resort to shell scripting to fix this, as long as I know all the illegal characters."
When in doubt, be paranoid. The mention of pendrives suggests that each failed run carries a significant hassle, so the safest thing to do would be to restrict the names of the files you send to dots, dashes, lowercase letters, and digits. The lowercase restriction protects against confusing Windows with two files whose names differ only by case. The other restrictions keep you away from special characters, of which there are several (< > : " / \ | ? *, plus control characters).
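Hu's whitelist translates to a per-name filter along these lines. This is a sketch, not code from the thread: the name `paranoid_name` is hypothetical, it needs bash 4 or later for the `${var,,}` lowercase expansion, and it works on a single name component, not a full path.

```bash
# Hypothetical filter: keep only lowercase ASCII letters, digits, '.' and '-';
# every other character becomes '-'. Works on one path component.
# The allowed set is spelled out (not a-z) so locale collation can't widen it.
paranoid_name() {
    local allowed='abcdefghijklmnopqrstuvwxyz0123456789.-'
    local name=${1,,}    # lowercase first (bash 4+)
    printf '%s\n' "${name//[^$allowed]/-}"
}
```

For example, `paranoid_name 'My File: v2?.TXT'` prints `my-file--v2-.txt`. Different inputs can map to the same output, so a collision check is still needed before renaming.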
|
|
i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
Posted: Wed Jan 27, 2010 7:49 am
|
Thanks for all the input. I've read their FAQ. I can't say I agree with them, but oh well, it's their product and they can do whatever they want with it. This flaw does render it useless for anyone without scripting skills, though. But I don't want yet another war about that here.
I'll upload the script to my server once it's tested. I don't know when that will be, but I'll post a link here in case anyone else wants to actually use the ntfs-3g driver, because without a proper file name mangling mechanism it's useless for any practical purpose, unless all your file names follow the DOS conventions.
|
|
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Wed Jan 27, 2010 8:25 pm
|
Replace all unsupported punctuation (or just all punctuation except '.', if you can't be arsed) with a '-' or a '_' ('_' is the 'standard' P2P munging char); that should do as a quick hack.
I'm hoping case won't be an issue (I've yet to find a valid scenario where anyone would need two files with the exact same name in differing case!)
I still tend to make my filenames 8.3, and when I can't, alphanumerics with no spaces; that has saved me many times from the filenaming wars.
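Cyker's quick hack could be sketched with sed as below. The function name `munge_punct` is made up, and the exact policy (keep letters, digits, spaces, '.', '_' and '-'; turn everything else into '_') is an assumption about what "all punctuation except '.'" should mean. It leaves case alone, so it does nothing about case-only collisions.

```bash
# Hypothetical quick hack: replace every character that is not a letter,
# digit, space, '.', '_' or '-' with '_'. [:alnum:] follows the current
# locale, so accented letters survive in a UTF-8 locale.
# Assumes the name contains no newline (sed works line by line).
munge_punct() {
    printf '%s\n' "$1" | sed 's/[^[:alnum:] ._-]/_/g'
}
```

For example, `munge_punct 'Song: "Best" (Live)?.mp3'` prints `Song_ _Best_ _Live__.mp3`.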
|
|
i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
Posted: Wed Jan 27, 2010 9:01 pm
|
I try to keep it simple as well. However, that doesn't help when you have to share lots of files from multiple sources. Most people lack the background or the knowledge, or simply don't care.
|
|
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Wed Jan 27, 2010 9:56 pm
|
i92guboj wrote: "I try to keep it simple as well. However, that doesn't help when you have to share lots of files from multiple sources. Most people lack the background or the knowledge, or simply don't care."
Ugh, fair point.
Well, thank smeg filenames aren't generated automatically like they are in MS Word, or it'd be even worse, considering that tabs, CR and LF are all valid filename characters in POSIX!
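Names like that are easy to miss because they print as something else. A hedged sketch for finding them, assuming bash and a `find` whose `-name` patterns accept POSIX character classes (GNU find's do); `find_ctrl_names` is a hypothetical name.

```bash
# Hypothetical helper: list entries whose names contain control characters
# (tab, CR, LF, escape sequences...). -print0 with read -d '' keeps even
# newline-containing names in one piece, and printf %q prints them escaped
# so nothing reaches the terminal raw.
find_ctrl_names() {
    (cd "$1" && find . -name '*[[:cntrl:]]*' -print0) |
        while IFS= read -r -d '' f; do
            printf '%q\n' "$f"
        done
}
```

The same `-print0` / `read -r -d ''` pairing is what any renaming loop needs if it has to survive such names.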
|
|
i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
Posted: Wed Jan 27, 2010 10:06 pm
|
Yup.
I guess there are opinions in both directions. The ntfs-3g people seem to think that enforcing Windows-compatible file names is not a must, as long as the names can be stored on the NTFS volume. Users, on the other hand, just want to use the driver and aren't interested in technicalities. Both are valid points, I think, though I lean more toward the latter than the former. I don't care if my engine can handle bigger wheels if there's no way to fit them into my car's chassis. But, as I said, it's their project, so they have the right to do whatever they want.
|
|
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Wed Jan 27, 2010 10:25 pm
|
Aye, alas this is all opinions and politics. I just want my shit to work.
Soo... how's that script coming along?
|
|
i92guboj Bodhisattva
Joined: 30 Nov 2004 Posts: 10315 Location: Córdoba (Spain)
Posted: Wed Jan 27, 2010 11:03 pm
|
It doesn't have any error handling yet and it's mostly untested. I've pasted it below. You'll obviously need to change the mv line so it does the actual work; as pasted, it only prints the commands it would run. I paste it this way on purpose, because people like to blindly try untested stuff and then come back complaining. Be sure not to try it on any important files, though given its simplicity I'm fairly sure it should be harmless.
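Code:
```bash
#!/bin/bash
# Replace characters that Windows rejects in file names with '_'.
# Dry run: only prints the mv commands; drop the "echo" to do the real work.
# Uncomment this for extra output
#DEBUG="foo"
if [[ ! -d "$1" ]]; then
    echo "Please enter a valid path." >&2
    exit 1
fi
# -mindepth 1 leaves the starting directory itself alone.
# -depth lists a directory's contents before the directory, so renaming a
# directory never invalidates paths that are still waiting to be processed.
# -print0 with read -r -d '' keeps names with spaces, backslashes or newlines intact.
find "$1" -mindepth 1 -depth -print0 | while IFS= read -r -d '' O_FILENAME; do
    DIR="${O_FILENAME%/*}"      # parent directory, left untouched
    NAME="${O_FILENAME##*/}"    # only the last component is renamed
    [ -n "$DEBUG" ] && echo "* Processing $O_FILENAME"
    for CHAR in '<' '>' ':' '"' '\' '|' '?' '*'; do
        [ -n "$DEBUG" ] && echo "  - Using $CHAR"
        # Quoting $CHAR makes it match literally instead of as a glob
        N_NAME="${NAME//"$CHAR"/_}"
        if [[ "$NAME" != "$N_NAME" ]]; then
            [ -n "$DEBUG" ] && echo "  · NAME=$NAME"
            [ -n "$DEBUG" ] && echo "  · Filtered NAME=$N_NAME"
            NAME="$N_NAME"
        fi
    done
    if [[ "$O_FILENAME" != "$DIR/$NAME" ]]; then
        echo mv -- "$O_FILENAME" "$DIR/$NAME"
    fi
done
```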
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 23066
Posted: Thu Jan 28, 2010 3:50 am
|
i92guboj wrote: "I can't say I agree with them, but oh well, it's their product and they can do whatever they want with it. This flaw does render it useless for anyone without scripting skills, though."
Yes. It would be more useful if there were a mount option to enforce proper names, either by mapping them or by returning an error when a program attempts to create a name that will cause trouble later. If it were an optional feature, they might accept an enhancement that provided this behavior.
If you want to make your renaming script a bit safer, you could add -i to the mv so that the user is prompted before any move that would overwrite an existing file.
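The "return an error" variant can at least be approximated outside the driver with a check run before copying. This is not an ntfs-3g feature: `is_windows_safe_name` is a hypothetical name, it needs bash 4 for `${var^^}`, and the rules follow Microsoft's documented naming restrictions (reserved characters, control characters, no trailing space or dot, reserved device names such as CON, PRN, AUX, NUL, COMn and LPTn), erring on the strict side.

```bash
# Hypothetical check: return 0 if one path component looks safe for Windows,
# 1 otherwise. Deliberately strict: it also rejects DEL and COM0/LPT0.
is_windows_safe_name() {
    local name=$1
    [[ -n $name ]] || return 1
    case $name in
        # Characters Windows reserves in file names
        *'<'* | *'>'* | *':'* | *'"'* | *'/'* | *'\'* | *'|'* | *'?'* | *'*'*) return 1 ;;
        # Tabs, newlines and other control characters
        *[[:cntrl:]]*) return 1 ;;
        # Names may not end in a space or a dot (this also rejects "." and "..")
        *' ' | *'.') return 1 ;;
    esac
    # Reserved device names, with or without an extension, in any case
    local stem=${name%%.*}
    case ${stem^^} in
        CON | PRN | AUX | NUL | COM[0-9] | LPT[0-9]) return 1 ;;
    esac
    return 0
}
```

Run over a tree before copying, e.g. `for f in *; do is_windows_safe_name "$f" || echo "problem: $f"; done`, it flags names instead of silently renaming them.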
|
|
Cyker Veteran
Joined: 15 Jun 2006 Posts: 1746
Posted: Thu Jan 28, 2010 8:13 pm
|
To be fair, it's tricky because of all the edge cases (aside from upper and lower case being different, POSIX also allows non-printable characters; I've seen filename 'jokes' that reprogram your terminal when ls'd!), plus deciding what to do with filenames that become duplicates after mangling.
Samba is pretty much the only thing that tries to interoperate between the two namespaces, and it does very well in sane situations, but as soon as invalid chars come into the equation, it just reduces filenames to mangled 8.3 names, or possibly hides them in the case of duplicates (e.g. README.TXT and readme.txt).
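One way to deal with names that become duplicates after mangling is to append a counter instead of overwriting. A hedged sketch: `unique_name` is a hypothetical helper, and its existence check is only as case-insensitive as the filesystem it runs on, so on a case-sensitive mount it will not catch README.TXT versus readme.txt.

```bash
# Hypothetical helper: print a name that does not yet exist in directory $1,
# starting from $2 and appending -1, -2, ... before the extension if needed.
unique_name() {
    local dir=$1 name=$2
    local stem=$name ext=
    # Split off the extension, but leave dotfiles such as ".profile" whole
    if [[ $name == ?*.* ]]; then
        stem=${name%.*}
        ext=.${name##*.}
    fi
    local candidate=$name n=1
    while [[ -e $dir/$candidate ]]; do
        candidate=$stem-$n$ext
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

A renaming loop would then move the file to `"$DIR/$(unique_name "$DIR" "$NAME")"`; there is still a small window between the check and the mv, which `mv -i` covers.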
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 23066
Posted: Fri Jan 29, 2010 4:22 am
|
Cyker wrote: "To be fair, it's tricky because of all the edge cases"
Yes, it is tricky if you want to allow as much POSIX as possible without breaking Windows. However, it is theoretically straightforward to design a filter that ensures Windows compatibility, if you are willing to have it err on the side of portability by denying edge cases that would work on Windows but are difficult to recognize. The key would be to put the restrictions in the file creation path only, so that any existing files can still be accessed, but no potentially disruptive files can be created.
A mode taking the paranoid approach I described in my first post in this thread would be easy to implement by filtering at the per-character level. It would break some legitimate files, such as names with non-English characters, but it would ensure that every file written was accessible by Windows and that every failure was noticed by the sending program before the filesystem was handed to a Windows user. I think such a filter would be overzealous, and a logical evolution of it would be to relax the rules to allow, among other things, lowercase characters used in non-English languages.