FW: Diacritics on the Net

James Ruckle (jruckle@citynet.net)
Thu, 5 Sep 1996 19:36:08 -0400

----------
From: MishaGMCLA@aol.com[SMTP:MishaGMCLA@aol.com]
Sent: Monday, September 02, 1996 1:11 PM
To: jruckle@citynet.net
Cc: manuelf@scf-fs.usc.edu
Subject: Re: Diacritics on the Net

Hi, a friend referred your question to me, knowing I would tell you more
than you ever wanted to know about the question...

>This is a technical issue, but one for the entire list to address. Is
>there a way to send diacritical (Spanish) characters over the Internet?
>I am using Microsoft Rich Text format, but that doesn't help if you all
>can't read it. It would be nice to have a fairly universal way of
>sending accented characters, and the ASCII character set does include
>them, that's how I get them on my Web pages.

No, unfortunately, the original ASCII character set does *not* include
accented characters. ASCII values go from 0 to 127 ("7-bit ASCII"),
while the values from 128 to 255 ("high-bit characters", where the
eighth bit of the byte is set to 1 instead of 0), which include accents,
bullets, mathematical symbols, yen signs and a lot of other useful
stuff, were never standardized as part of ASCII.
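
A minimal Python sketch of the 7-bit versus high-bit distinction. The
helper name is_seven_bit and the sample words are illustrative
assumptions, and ISO-8859-1 is used only as one example of a character
set that fills the 128-255 range.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte falls in the ASCII range 0-127."""
    return all(b < 128 for b in data)

plain = "manana".encode("ascii")
accented = "mañana".encode("iso-8859-1")  # the ñ becomes one high-bit byte

print(is_seven_bit(plain))     # True
print(is_seven_bit(accented))  # False

# List the bytes that have the eighth bit set.
high_bytes = [b for b in accented if b >= 128]
print(high_bytes)              # [241]
```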

I don't know about UNIX or other systems, but there are two de facto
microcomputer standards, DOS/Windows and Macintosh. You may have
noticed that the DOS character set (I don't know whether it was
originated by IBM or Microsoft) has lower-case acute accents but not
upper-case, except for E (that's probably because French uses it, but
they were apparently less concerned with Spanish at the time). You will
also notice, if you're still using Alt-plus-numeric-keypad, that there's
no rhyme or reason to the numbers. More recent software has provided
more convenient ways of getting the accents where they belong.

The Macintosh developers used the high-bit characters differently -
since fonts and graphics were part of the Mac operating system from the
beginning, the Symbol font with a full range of mathematical characters
was available, and box-drawing characters were unnecessary, so the Mac
character set includes things like curly quotes, dashes, etc. The Mac
also has a much broader range of accented characters, including
upper-case.

Unfortunately, however, not one has the same value as the corresponding
character in the DOS set, so if PC users and Mac users "tildan sus
letras" and email them to each other, the result is not pretty.
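
The mismatch can be reproduced with a short Python sketch. It assumes
that Python's "cp437" and "mac_roman" codecs are fair stand-ins for the
DOS and Mac character sets described here; the sample phrase is made up
for illustration.

```python
text = "él está aquí"

dos_bytes = text.encode("cp437")      # DOS code page 437
mac_bytes = text.encode("mac_roman")  # classic Mac OS character set

# The same accented letter is stored as a different byte on each platform.
dos_e = "é".encode("cp437")[0]
mac_e = "é".encode("mac_roman")[0]
print(hex(dos_e), hex(mac_e))

# Reading one platform's bytes with the other platform's table
# replaces every accented letter with some unrelated character.
mac_read_as_dos = mac_bytes.decode("cp437")
dos_read_as_mac = dos_bytes.decode("mac_roman")
print(mac_read_as_dos)
print(dos_read_as_mac)
```

The unaccented letters survive in both directions because both sets
agree with ASCII below 128; only the high-bit characters are scrambled.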

The text attached to your message, accented correctly on the Mac,
appears as follows (I don't know how it's going to look when it gets to
you, but it'll be as strange as the PC-accented text was to me):

Esta es cuestion tecnica, pero una para todos los listeros. ?Hay manera
de mandar letras diacriticas por el Internet/InterRed? Estoy usando el
Formato Textual Rico de Microsoft, pero eso no vale si no pueden leerlo.
Seria bueno si hubiera metodo mas o menos universal de mandar letras
acentadas, y el ASCII las incluye, pues por eso las tienen mis paginas
de Tela.

What's more, different mailers and transmission systems process the
high-bit characters differently, which is why you get subject lines like
"FIESTA DEL =?iso-8859-1?Q?A=D1O" and email messages with things like
"much=EDsimo" in them.
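
Those odd-looking strings are quoted-printable escapes, where "=" plus
two hex digits stands for one byte. A minimal Python sketch of decoding
them with the standard library follows. The subject-line fragment quoted
above is cut off, so the closing "?=" on the encoded word below is an
assumption added to make it well-formed.

```python
import quopri
from email.header import decode_header

# Body text: "=ED" is the byte 0xED, which is i-acute in ISO-8859-1.
body = quopri.decodestring(b"much=EDsimo").decode("iso-8859-1")
print(body)  # muchísimo

# Header text: an encoded word has the form =?charset?Q?text?=
raw, charset = decode_header("=?iso-8859-1?Q?A=D1O?=")[0]
subject_word = raw.decode(charset)
print(subject_word)  # AÑO
```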

The moral of the story is that there's no universally supported standard
for email, which is (let's face it) a lowest-common-denominator system
to begin with. People who insist on tilding (like me) use apostrophes
next to the letter. Not gorgeous, but clear.

On the Web, however, there *is* a standard, ISO-8859-1, from the
International Organization for Standardization (ISO). It looks awful in
raw HTML, but browsers are supposed to translate it into the right
characters for whatever system the browser is running on. In HTML, the
codes all begin with & (ampersand) and end with ; (semicolon). (Any good
HTML or Website editor -- the one I use on the Mac is BBEdit -- can
convert text from your word processor into the clunkies below. The
command is probably "translate special characters" or something like
that. And if your Web pages don't use these codes, they won't look very
good to Mac or UNIX users.)

&aacute;
&eacute;
&iacute;
&oacute;
&uacute;

uppercase:

&Aacute;
&Eacute;
etc.

e~ne:

&ntilde; uppercase: &Ntilde;

u-dieresis (as in g"uero; the dieresis is called "umlaut" in German):

&uuml;

Unfortunately, not all browsers understand the inverted question mark
and exclamation point (they were apparently not in the original HTML 1.0
standard, but added later):

&iquest;
&iexcl;

You may also find the copyright character useful (the little circled c):
&copy;
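
For anyone whose editor lacks a "translate special characters" command,
here is a rough Python sketch of the same conversion using the standard
html module. The function name to_entities and the sample sentence are
illustrative assumptions; this is not what BBEdit or any other editor
actually does internally.

```python
import html
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where one exists."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 127 and code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(ch)  # plain ASCII (or no named entity): keep as-is
    return "".join(out)

encoded = to_entities("¿Dónde está el niño?")
print(encoded)  # &iquest;D&oacute;nde est&aacute; el ni&ntilde;o?

# A browser performs the reverse step; html.unescape imitates it.
decoded = html.unescape(encoded)
print(decoded)  # ¿Dónde está el niño?
```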

So, to make a very long story short, in email there really isn't a good
answer, while on the Web there's a somewhat clumsy but fully functional
answer.

If you need more information, email me back. (I've been around
computers -- and using accent marks -- for a long time...)

Misha Schutt
Reference Librarian
Burbank (Calif.) Public Library
mishagmcla@aol.com

James' addendum:

Thank you for the advice, although what I meant is that I do know how to
get diacritics on Web pages but couldn't be certain that my emails were
getting them across. My directory is
http://www.citynet.net/personal/ruckle/, and half the pages are in
Spanish, so I know that I can do those, with the exception you mentioned
of HTML 1.0 browsers. The more recent browsers will retrieve email
through your Web connection so that you can read it in hypertext and
even click on references like the one given above. This is probably what
we need to move towards globally, and if Lynx accepts HTML 2.0 (without
graphics) we'll have it made. The most universal standard appears to be
neither Micro nor Mac but (predictably) Unix and AOL's MIME. Since Unix
started out life as a clunky mainframe language used by the U.S.
Department of Defense in the mid-1970s, all computers "speak" it and
it's the basis for the Internet Protocol. AOL, with its zillion
customers on all sorts of platforms, had to develop a graphical standard
so people could send pictures, hence MIME. Before I get rebuked for
turning lasnet into a technical forum, I'll just say that the jury