slycordinator Advocate
Joined: 31 Jan 2004 Posts: 3065 Location: Korea
|
Posted: Wed Oct 16, 2019 5:49 am Post subject: [SOLVED] Python thinks encoding is ANSI_X3.4-1968; not utf8 |
|
|
python3 is borking on utf-8 files, complaining that the ascii codec can't decode them. When I check the default encoding that python thinks I have, it gives ANSI_X3.4-1968.
Code: | # python3 -c "import locale; print(locale.getpreferredencoding(False))"
ANSI_X3.4-1968 |
My locale is set as utf-8
Code: | # eselect locale list
Available targets for the LANG variable:
[1] C
[2] C.utf8
[3] en_US
[4] en_US.ansix341968
[5] en_US.utf8 *
[6] ko_KR.euckr
[7] ko_KR.utf8
[8] POSIX
[ ] (free form) |
Code: | # locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=en_US.utf8 |
Code: | # cat /etc/env.d/02locale
# Configuration file for eselect
# This file has been automatically generated.
LANG="en_US.utf8"
LC_ALL="en_US.utf8" |
The LC_ALL setting was added by hand, after which I ran "env-update && source /etc/profile". I read that the setting could affect this.
But the output/errors are the same before and after. python still thinks my default encoding is "ANSI" instead of utf8
_________________ My political stance/bias
slycordinator != slycoordinator
Last edited by slycordinator on Thu Oct 17, 2019 8:56 am; edited 1 time in total |
|
|
|
slycordinator Advocate
Joined: 31 Jan 2004 Posts: 3065 Location: Korea
|
Posted: Wed Oct 16, 2019 10:19 am Post subject: |
|
|
So, I created a 1-line file that's utf-8 (just random letters plus some random Korean characters). python thinks it's encoded as ANSI.
Code: | # file blah
blah: UTF-8 Unicode text
# python3
Python 3.6.9 (default, Oct 10 2019, 00:27:28)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f=open('blah','r')
>>> f.encoding
'ANSI_X3.4-1968' |
Code: | # python3 -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')
# python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968 |
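As a stopgap while the locale itself is being sorted out, the file can be read by naming the encoding explicitly, so that open() does not fall back to the locale's preferred encoding. This is only a minimal sketch: the helper name read_text_utf8 and the temporary file standing in for 'blah' are made up for illustration.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the locale's preferred encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Illustrative stand-in for the 'blah' test file: letters plus Korean characters.
sample = "abc 한글\n"
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

try:
    text = read_text_utf8(path)
finally:
    os.remove(path)

# Compare instead of printing the text, so this also runs on an ASCII-only stdout.
print(text == sample)
```

This works around the symptom only. Any open() call that omits encoding= will still use whatever locale.getpreferredencoding(False) reports.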
|
|
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Wed Oct 16, 2019 12:27 pm Post subject: |
|
|
A wild guess: did you install python without USE flag 'wide-unicode'? |
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3151
|
Posted: Wed Oct 16, 2019 4:43 pm Post subject: |
|
|
Python is windows-retarded when it comes to guessing encoding.
I've hit this problem a long long time ago....
My program would work just fine when attached to the terminal, and crash on the first accented letter in any other case (file, pipe, etc).
The solution from SO or something like that was to unload some module and then load it again... And then it would output utf-8 to that file, so I finally had accents without crashes. |
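For the pipe/file case described above, a less hacky option than reloading modules is to wrap the underlying binary stream in a text layer with an explicit encoding. The sketch below uses an in-memory buffer in place of a real pipe; the helper name utf8_writer is hypothetical.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is always encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


# An in-memory buffer stands in for a pipe or a redirected file here.
buf = io.BytesIO()
out = utf8_writer(buf)
out.write("café 한글\n")
out.flush()

# Printing the bytes repr stays ASCII-safe whatever the terminal encoding is.
print(buf.getvalue())
```

In a real script the same wrapper could be applied to sys.stdout.buffer. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") does the same job in place, and setting the PYTHONIOENCODING environment variable is another route.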
|
|
|
slycordinator Advocate
Joined: 31 Jan 2004 Posts: 3065 Location: Korea
|
Posted: Wed Oct 16, 2019 11:37 pm Post subject: |
|
|
mike155 wrote: | A wild guess: did you install python without USE flag 'wide-unicode'? |
There is no 'wide-unicode' USE flag for python3 |
|
|
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Thu Oct 17, 2019 12:49 am Post subject: |
|
|
slycordinator wrote: | There is no 'wide-unicode' USE flag for python3 |
You're right. Only Python 2 has this USE flag. I'm sorry! |
|
|
|
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
|
Posted: Thu Oct 17, 2019 1:31 am Post subject: |
|
|
I get ANSI_X3.4-1968 if I set LC_ALL to an invalid value. Have you run locale-gen recently? |
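ANSI_X3.4-1968 is the C library's name for plain ASCII, which is what the C/POSIX locale uses. A rough way to check from Python whether the locale named in the environment is actually accepted by the C library is to call locale.setlocale(locale.LC_CTYPE, "") and see whether it raises. The helper name diagnose_locale and the shape of its return value are assumptions made for this sketch.

```python
import locale
import os


def diagnose_locale():
    """Report what the environment requests and whether the C library accepts it."""
    requested = {name: os.environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")}
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        try:
            # "" means: use the locale named by the environment variables.
            locale.setlocale(locale.LC_CTYPE, "")
            accepted = True
        except locale.Error:
            # Raised when the requested locale is unknown to the C library,
            # for example because it was never generated.
            accepted = False
        encoding = locale.getpreferredencoding(False)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # put the old setting back
    return {"requested": requested, "accepted": accepted, "encoding": encoding}


print(diagnose_locale())
```

If "accepted" comes back False while the variables look right, the locale data itself is the likely suspect, which points at locale-gen rather than at Python.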
|
|
|
slycordinator Advocate
Joined: 31 Jan 2004 Posts: 3065 Location: Korea
|
Posted: Thu Oct 17, 2019 6:03 am Post subject: |
|
|
I see I'm getting "failed to set locale" and "not found: no such file or directory" output upon running locale-gen.
And for the secondary locales, it gives weird "unknown character" errors.
I'll see if rebuilding glibc makes it work.
Definitely strange that locale-gen gives "success" while outputting error upon error. |
|
|
|
slycordinator Advocate
Joined: 31 Jan 2004 Posts: 3065 Location: Korea
|
Posted: Thu Oct 17, 2019 6:11 am Post subject: |
|
|
I see what happened.
While copying over files that I had edited on a Windows box, the locale.gen file ended up with Windows (CRLF) line endings, and somehow locale-gen went looking for locale files whose names included the unprintable characters.
It's all good now. Although, it's still problematic that locale-gen succeeded with errors galore. |
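A quick way to spot this kind of damage is to look for carriage returns at the ends of lines and convert them to Unix newlines. This is a small sketch working on raw bytes; the function names and the sample entries are illustrative, not taken from the actual file in this thread.

```python
def crlf_line_numbers(data: bytes) -> list:
    """Return the 1-based numbers of lines that end in CRLF."""
    lines = data.split(b"\n")
    return [n for n, line in enumerate(lines, start=1) if line.endswith(b"\r")]


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to plain LF."""
    return data.replace(b"\r\n", b"\n")


# Made-up sample in the style of a locale.gen file, with two CRLF lines.
sample = b"en_US.UTF-8 UTF-8\r\nko_KR.UTF-8 UTF-8\r\n# comment\n"

print(crlf_line_numbers(sample))
print(to_unix_newlines(sample))
```

To check a real file, read it in binary mode (open(path, "rb")) so that Python does not translate the line endings before they can be inspected.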
|
|
|
|