• embrace unicode, because its use is only going to grow.
Programming languages all default to unicode by spec (e.g. any html/
systems all default to unicode encoding now (not sure about linux).
Even emacs, starting with emacs 23, uses unicode as default internal
• Unicode is about 2 things: ① a char set with an integer ID (code point) for each
char. ② several encodings for the char set, the most popular being utf-8
and utf-16 (the latter is what Mac and Windows use internally). (an encoding is a
standard that maps each char of a char set to a byte sequence)
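A minimal sketch of those two parts, written in Python purely for illustration (the choice of language and of the sample character, EN DASH U+2013, are assumptions, not something emacs itself does this way). `ord` gives the integer ID, and `str.encode` applies an encoding to get the byte sequence:

```python
def hex_bytes(data: bytes) -> str:
    """Format a byte sequence as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

# Sample character: EN DASH. Any other character would work the same way.
ch = "\u2013"

# Part 1: the char set assigns each char an integer ID (the code point).
code_point = ord(ch)
print(f"U+{code_point:04X}")      # U+2013

# Part 2: an encoding maps that char to a byte sequence.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")   # big-endian, no byte order mark
print(hex_bytes(utf8_bytes))      # e2 80 93
print(hex_bytes(utf16_bytes))     # 20 13
```

Same code point, two different byte sequences: the ID belongs to the char set, the bytes belong to the encoding.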
• in emacs, just put this in your init:
that should set all encodings to utf-8, and shouldn't cause you any
problem if all your current files and elisp files are ascii, because
ascii is a compatible subset of utf-8.
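The ascii/utf-8 compatibility can be checked with a short Python sketch (Python is used only for illustration, and the sample elisp line is a made-up stand-in for file content). It shows that every ascii char encodes to the same byte under both encodings, so a pure-ascii file reads back unchanged as utf-8:

```python
# Every ASCII code point (0-127) encodes to the same single byte in UTF-8.
ascii_text = "".join(chr(i) for i in range(128))
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)                 # True

# So the bytes of a pure-ASCII file decode identically either way.
elisp_line = b"(setq fill-column 80)"   # hypothetical file content
decoded = elisp_line.decode("utf-8")
print(decoded == elisp_line.decode("ascii"))   # True

# The difference only shows up with non-ASCII chars, which ASCII
# cannot encode at all.
try:
    "\u2013".encode("ascii")
    ascii_can_encode_dash = True
except UnicodeEncodeError:
    ascii_can_encode_dash = False
print(ascii_can_encode_dash)      # False
```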
• in emacs, call describe-char. That'll show the current char's
encoding as well as the byte sequence used for that particular encoding.
(this is emacs 24. Emacs 23 may not show the byte sequence... i don't
my unicode tutorial covers all these… feel free to ask me, or here, of
On May 25, 6:40 am, "Buchs, Kevin" <buchs.ke...@mayo...> wrote:
> Thanks, Xah and Eli, for contributing to my further understanding. I
> went to a specific website where I got the content I copied and pasted
> and I can see from the HTML that it has a charset=UTF-8, so I understand
> that is Unicode 8-bit. Using the C-u C-x =, I see that the particular
> character I pasted has a code point of 0x2013 (U+2013). I didn't see,
> however, what the UTF-8 encoding of that code point was. Should I be
> able to read that somewhere on the buffer of information I get with C-u
> C-x = ? I was poking around the www.unicode.org website, trying to
> understand how this U+2013 code point is encoded into UTF-8, but I
> haven't determined that yet.
> A fresh buffer in emacs for me on my Win-7 box has an encoding system of
> iso-latin-1-dos. The coding system used to open and save files is the
> So, help me piece together what happens as I paste the UTF-8 text into a
> buffer. First, the paste buffer must define that it is in UTF-8. Emacs
> reads this information and inserts it into the byte string that defines
> the buffer. Now, how does emacs record that it was a UTF-8 encoded
> character? Does it translate it into a different internal encoding
> instead of just recording the 8 bits transferred? Is this encoding used
> as a superset of all possible encoding systems that emacs supports?
> Now, Xah, you suggest I embrace Unicode. What does that mean? Would it
> involve marking all my lisp library files and my org-mode files with the
> file variable -*- coding: utf-8 -*- ? Or is there another way to go
> Unicode automatically?
> I assume that if my lisp library files are encoded utf-8, then I can
> paste that character from the web page into my call to replace-string in
> order to substitute the longer dash of Unicode U+2013 with an ascii
> hyphen or double hyphen. But, how does that really work? If the lisp
> file is encoded utf-8, then how can I put an ascii character in the
> replacement string?
> I would appreciate it if someone could help me open this new door in my
> brain a bit further.
> Kevin Buchs | Senior Engineer | SPPDG | 507-538-5459 |
> buchs.ke...@mayo...
> Mayo Clinic | 200 First Street SW | Rochester, MN 55905 |
> http://www.mayo.edu/sppdg
> -----Original Message-----
> With cursor on that character, type "C-u C-x =", and Emacs will show
> everything it knows about that character, including its canonical