> One comment about that page: it incorrectly states that an AS
> string consists of "Mac-Roman" characters; AS strings actually use
> the user's primary encoding, as determined from their International
> system preferences. For most US, western European and Antipodean
> users this will be MacRoman, but will often be different for folks
> in other parts of the world.
You know, I thought that too, but then I read a little closer and
realized that it's mostly correct. In fact, it defines its own term
"Mac-encoded" to mean "text data in your primary encoding". It does,
however, subtly assume that the primary encoding is MacRoman by
referring to un-encodable characters as "non-Roman".
The only other problem (encoding-wise) is in its definition of the
"string" contents, where it says
"The string class basically stores one byte ([0..255]) per character.
The 128 first values are rendered according to the ASCII standard ...
The 128 larger values are rendered using a macintosh encoding, the
one that goes with the first language listed in your International ..."
In fact, the "string" class stores data encoded using the primary
encoding (which is indeed determined by the first language listed in
your International preference pane; that bit is fine.) Some
encodings are one byte per character, some mix one- and two-byte
sequences (MacJapanese, for instance), and I don't know if there are
any purely multi-byte encodings allowed these days.
The trick is that most of them are isomorphic to ASCII: bytes 0
through 127 mean the same thing everywhere. (Well, almost
everywhere. As the page points out, MacJapanese is not strictly
isomorphic -- 0x5C is a yen sign, not a backslash.) Some older Mac
encodings are completely different, such as MacArabic, but those
aren't supported these days except for import and export; the system
uses Unicode instead.
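The shared-low-half, divergent-high-half behavior is easy to check with the classic Mac codecs in Python's standard library (mac_roman and mac_cyrillic here; the specific byte 0x8E is an arbitrary example):

```python
# The low half (bytes 0-127) decodes identically under MacRoman and ASCII.
low = bytes(range(128))
assert low.decode("mac_roman") == low.decode("ascii")

# The high half is where Mac encodings diverge: the same byte names
# different characters depending on the encoding in effect.
high = b"\x8e"
print(high.decode("mac_roman"))     # an accented Latin letter
print(high.decode("mac_cyrillic"))  # a Cyrillic letter
assert high.decode("mac_roman") != high.decode("mac_cyrillic")
```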
AppleScript and Automator Engineering