On 15 December 2016 at 21:17, Toshio Kuratomi a.badger@gmail.com wrote:
On Mon, Dec 12, 2016 at 1:39 AM, Nick Coghlan ncoghlan@gmail.com wrote:
I don't anticipate any major concerns with downstream redistributors adding this behaviour, as the main thing that makes us nervous about globally changing the default upstream is the sheer variety of Linux distros out there, and the fact that folks are inclined to take their Linux integration bugs straight to bugs.python.org rather than first trying the issue tracker for their particular distro.
My one concern is precisely this variety. For instance, if I get a report that my application is raising a UnicodeError on RHEL7 when run under cron (which uses the C locale), I might then try to replicate the error on Fedora using the same LC_ALL=C locale. With this change I would fail to reproduce the error.
But with the current patch you *would* get a visible warning on stderr saying:
Python detected LC_CTYPE=C. Setting LC_ALL & LANG to C.UTF-8.
This is a variation on arguments about why individual sites should not change the default encoding via sitecustomize.py. The changes tend to make Python applications non-portable. I don't think it is as severe because we're still able to broadly classify things as "Fedora Python" vs "Upstream Python" (instead of "Python running at My Business" vs "Python running on the rest of the world") but it still is problematic.
Agreed, and my original idea upstream included an environment variable override to account for that case: http://bugs.python.org/issue28180#msg282964
I just forgot about that bit while writing the initial patch :(
As documented at https://docs.python.org/3/using/cmdline.html#environment-variables the normal convention for Python environment variable toggles is "A non-empty string setting enables it", so the name I'd suggest here is PYTHONALLOWCLOCALE.
The warning message would then change to:
Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set PYTHONALLOWCLOCALE to disable this behaviour)
and if the environment variable is already set:
Python detected LC_CTYPE=C, but PYTHONALLOWCLOCALE is set. Some applications may not work correctly.
Does that approach seem more reasonable than unilateral locale coercion with no off switch?
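To make the proposed behaviour concrete, here is a rough Python sketch of the decision logic. It is only an illustration: the actual check would live in C in the interpreter's startup code, the function name `coerce_c_locale` is hypothetical, and resolving the effective locale from environment variables (LC_ALL, then LC_CTYPE, then LANG, treating "POSIX" like "C") is a simplifying assumption rather than what the patch does.

```python
def coerce_c_locale(environ):
    """Hypothetical sketch: return (updated_environ, warning_or_None)."""
    env = dict(environ)
    # Assumed resolution order for LC_CTYPE: LC_ALL, then LC_CTYPE, then LANG.
    # Empty values count as unset; with nothing set we assume the C locale.
    effective = env.get("LC_ALL") or env.get("LC_CTYPE") or env.get("LANG") or "C"
    if effective not in ("C", "POSIX"):
        return env, None  # a real locale was requested, leave it alone
    # Usual Python convention: any non-empty string enables the toggle.
    if env.get("PYTHONALLOWCLOCALE"):
        return env, ("Python detected LC_CTYPE=C, but PYTHONALLOWCLOCALE is set. "
                     "Some applications may not work correctly.")
    env["LC_ALL"] = "C.UTF-8"
    env["LANG"] = "C.UTF-8"
    return env, ("Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 "
                 "(set PYTHONALLOWCLOCALE to disable this behaviour)")
```

With this shape, reproducing a cron-style failure on Fedora would just mean running with both LC_ALL=C and PYTHONALLOWCLOCALE=1 set.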
OTOH, if this is a stepping stone and proving ground for getting it into upstream Python then we just get this change a little early... that's, IMHO, a good thing.
Yeah, my goal is to standardise this upstream for 3.7, but I expect folks to be more willing to make it the default behaviour on *nix systems if at least some distros are willing to try it out in their releases of 3.6 first.
Perhaps what's needed is a locale on Fedora that allows people to select an ASCII encoding for Python which does not coincide with the C locale. This should satisfy both the case you mention, where *most* of the time the C locale is not a conscious choice of the ASCII encoding, and the one I'm pointing out: the need to select an ASCII-only encoding for debugging cross-platform scripts and applications.
As in an explicit "LANG=C.ASCII"? While I agree that would work, it's probably more complexity than is needed vs a dedicated off switch for the locale coercion.
On the other hand, if *glibc* were to some day start natively interpreting "no locale set" or an unqualified "C" locale as "C.UTF-8", then I agree a "C.ASCII" locale to explicitly opt in to the old behaviour would make sense.
[..]
As far as where we might add that check, I'd suggest the entry point for the `python3` binary itself, rather than in the shared library: https://hg.python.org/cpython/file/3.6/Programs/python.c#l46
I think the library is the appropriate place. Otherwise you end up with a python application failing when run under mod_wsgi[*]_ which you can't debug using the command line interpreter.
There's one pragmatic problem with that, and one that's a question of appropriate division of responsibilities in terms of understanding the runtime's context of use.
The pragmatic problem is that the main CPython binary calls https://docs.python.org/3/c-api/sys.html#c.Py_DecodeLocale to convert the command line arguments from char* to wchar_t* before it calls Py_Main, which means we have to override the locale *before* we hand over control to the dynamically linked library. Otherwise we end up in exactly the same situation that click complains about: by the time we find out there's a problem with the locale, some work has already been done using the wrong setting.
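As a small illustration of why the ordering matters, the following Python sketch mimics what happens to a UTF-8 command line argument when it is decoded under an ASCII-only locale versus a UTF-8 one. Py_DecodeLocale uses the surrogateescape error handler, which is what the sketch emulates; the helper name `decode_argv` is made up for this example, and passing the encoding explicitly stands in for whatever the active locale would select.

```python
def decode_argv(raw, encoding):
    """Emulate locale-based argv decoding with the surrogateescape handler."""
    return raw.decode(encoding, errors="surrogateescape")

raw_arg = b"caf\xc3\xa9"  # the UTF-8 bytes for "cafe" with an acute accent on the e

# Under the C locale (ASCII), the two non-ASCII bytes become lone surrogates.
under_c = decode_argv(raw_arg, "ascii")
# Under C.UTF-8, the same bytes decode to the intended four characters.
under_utf8 = decode_argv(raw_arg, "utf-8")

print(ascii(under_c))
print(ascii(under_utf8))

# The original bytes can still be recovered from the surrogate-escaped form,
# but the str value already in sys.argv is not the text the user typed.
assert under_c.encode("ascii", errors="surrogateescape") == raw_arg
```

Once the arguments have been decoded the first way, switching the locale afterwards doesn't go back and fix the strings that were already produced.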
The architectural problem is that when you embed CPython, it really is one of the embedding application's responsibilities to configure the locale such that the interpreter plays nice with the rest of the application. It's one thing to second guess the shell from directly inside a C-level main() function when we know POSIX makes some really old ASCII-centric assumptions and that developers are prone to writing "LANG=C" rather than "LANG=C.UTF-8" to turn off their locale settings, but something else entirely to second guess a GUI application like Blender (where arbitrary amounts of code may have already run before the CPython runtime gets initialised) or an application platform with its own environment management system like Apache httpd.
Cheers, Nick.