On May 24, 4:49 pm, "Buchs, Kevin" <buchs.ke...@mayo...> wrote:
> I often paste content from web pages into an emacs org-mode buffer and I
> get the odd quote characters or dashes that are not ASCII. I created a
> lisp function to remove the unicode ones that are just 8 bits. Lately I
> am seeing that there are characters that are not being caught. They show
> up in emacs as the expected character. When I kill/yank them into lisp
> code, they are not being found. When I save the buffer, I am asked for
> coding and chose raw text. When the file is opened again, these
> characters are showing up as some sort of special symbol (dashed circle
> with flag off the top) followed by doubles/triples of \2xx. For example,
> the dash character I just stored was this sequence: circle-flag \200
> \231. Using Gnu/Linux od to dump them I get hex strings such as: 340 245
> 206 340 244 206 210 200 and for the dash mentioned above 342 200 231.
> I am very naive in regard to coding, so please excuse my ignorance. I
> would guess these are 16-bit (Unicode16) characters. Can someone
> enlighten me as to how I can determine what these characters are (after
> pasted into a buffer) and how I can code a function to replace them with
> ASCII equivalents? The only thing I could think of was hexl mode, but
> that didn't turn out well. Thanks.
better to embrace unicode than fight it.
what encoding you have when you paste is rather complex. I guess it
depends on the sources you copy from, as each web page can be in diff
charset and encoding then am not sure your OS do some translation in