At 09:51 PM 1/4/2011 +1100, Graham Dumpleton wrote:
>Add another point. FWIW, these are coming up because of questions
>being asked on the python-dev IRC channel about PEP 3333.
>The issue, as it turned out, was that the PEP may not be clear enough
>in explaining that, where str is unicode, something like PATH_INFO,
>although unicode, is actually bytes decoded as ISO-8859-1, and needs
>to be re-encoded and then decoded to get it back to Unicode in the
>charset required before use.
>They were thinking that because it was unicode already they could use
>it as is and not need to do anything. I.e., they didn't realise that
>they need to do:
>
>  path_info = environ.get('PATH_INFO', '')
>  path_info = path_info.encode('ISO-8859-1').decode('UTF-8')
>
>for example, to get it interpreted as UTF-8 first. They were simply
>looking at concatenating new URL bits to the ISO-8859-1 variant from
>other unicode strings that weren't bytes represented as ISO-8859-1.
>In Python 2.X it was obvious that since it wasn't unicode that you had
>to decode it, but confusion may arise for Python 3.X if this
>requirement is not explicitly spelled out with a code example like
>the one above.
>We all may see it as obvious, and yes, perhaps it could be covered in
>separate articles or commentaries by people, but given this person was
>new to it, maybe it is deserving of more explanation in the PEP itself
>if they were confused.
It would be really awesome if somebody would write separate
Application Authors' and Middleware Authors' Guides to WSGI.
Unlike server authors, those authors don't need to know absolutely
everything in the PEP.
>It could also be that the PEP covers it adequately already. I am too
>tired to read through it again right now.
It's pretty prominently stated early on that NO strings in the spec
are really unicode, they're just bytes packed into unicode objects.
Obviously, no matter how prominently this is stated, some people will
still make this mistake, but if desired, we could always put some
additional info near the environ part of the spec for clarification.
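
As a rough sketch of the kind of clarification that could sit near the
environ section, the following shows the round trip for one value. The
helper name recode_environ_text and the sample environ dict are made up
for illustration, and UTF-8 is only an assumed target charset; nothing
here is taken from the PEP text.

```python
def recode_environ_text(value, charset="utf-8"):
    """Re-interpret a WSGI native string as text in the given charset.

    Hypothetical helper for illustration; not part of PEP 3333 or wsgiref.
    """
    # ISO-8859-1 maps code points 0-255 one-to-one onto bytes, so this
    # recovers the raw bytes the server originally received.
    raw = value.encode("iso-8859-1")
    # Decode those bytes using the charset the application expects.
    return raw.decode(charset)


# Simulate what a server hands to the application: the UTF-8 bytes of a
# path, packed into a str by decoding them as ISO-8859-1.
environ = {"PATH_INFO": "/caf\u00e9".encode("utf-8").decode("iso-8859-1")}

path_info = recode_environ_text(environ.get("PATH_INFO", ""))
```

Only after this step is it safe to concatenate path_info with ordinary
unicode strings; doing so with the raw environ value mixes two different
interpretations of the same str type.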
(It occurs to me in retrospect that I should probably have updated
wsgiref in the stdlib to check the bytesy-ness of strings used to
create Header objects. Too late for 3.2, though.)
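
One possible reading of such a "bytesy-ness" check, as a sketch only:
a string qualifies if every code point fits in a single byte, which is
the same as saying it can be encoded as ISO-8859-1. The function name
is_bytesy is hypothetical and this is not what wsgiref actually does.

```python
def is_bytesy(s):
    """Return True if every code point in s is in the range 0-255.

    Hypothetical check for illustration; not an existing wsgiref function.
    """
    try:
        # Encoding succeeds only when all code points fit in one byte.
        s.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True
```

A header value such as "text/html" passes, while a string containing a
character above U+00FF (for example the euro sign, U+20AC) does not.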