<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head>
<title>UTF-8 Sampler</title>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body bgcolor="#ffffff" text="#000000">
<h3>UTF-8 Sampler</h3>
UTF-8 is an ASCII-preserving encoding method for <a href="unicode.html">Unicode</a> (ISO 10646), the Universal Character Set (UCS). The UCS encodes most of the world's writing systems in a single character set, allowing you to mix languages and scripts within a document without needing any tricks for switching character sets. This web page is encoded directly in UTF-8.
<p>
<a href="k95.html">Kermit 95</a> can display UTF-8 plain text in Windows NT, XP, or 2000 when using a monospace Unicode font like Lucida Console or Courier New. <a href="k95next.html">The forthcoming GUI version of Kermit 95</a> will be able to display it too, even in Windows 95, 98, and ME. <a href="ckermit.html">C-Kermit 7.0</a> and later can handle it too, <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">if you have a Unicode display</a>. As many languages as are representable in your font can be seen on the screen at the same time.
<p>
This, however, is a Web page. Some Web browsers can handle UTF-8, some can't. And those that can might not have a sufficiently populated font to work with. <a href="http://www.hclrss.demon.co.uk/unicode/fonts.html">CLICK HERE</a> for a survey of Unicode fonts.
<p>
First, the Euro symbol: €.
<p>
From the Anglo-Saxon <a href="http://www.ragweedforge.com/poems.html"><cite>Rune Poem</cite></a> (Rune version):
<p><blockquote>
ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ<br>
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ<br>
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬<br>
</blockquote>
<p>
From Laȝamon's <i><a href="http://mesl.itd.umich.edu/b/brut/">Brut</a></i> (<i>The Chronicles of England</i>, Middle English, West Midlands):
<p>
<blockquote>
An preost wes on leoden, Laȝamon was ihoten<br>
He wes Leovenaðes sone -- liðe him be Drihten.<br>
He wonede at Ernleȝe at æðelen are chirechen,<br>
Uppen Sevarne staþe, sel þar him þuhte,<br>
Onfest Radestone, þer he bock radde.
</blockquote>
<p>
From the <cite>Tagelied</cite> of <a href="http://www.gunnet.de/Wolframs-Eschenbach/wolfram.htm"> <b>Wolfram von Eschenbach</b></a> (Middle High German):
<p><blockquote>
Sîne klâwen durh die wolken sint geslagen,<br>
er stîget ûf mit grôzer kraft,<br>
ich sih in grâwen tägelîch als er wil tagen,<br>
den tac, der im geselleschaft<br>
erwenden wil, dem werden man,<br>
den ich mit sorgen în verliez.<br>
ich bringe in hinnen, ob ich kan.<br>
sîn vil manegiu tugent michz leisten hiez.<br>
</blockquote><p>
Some lines of <a href="http://users.hol.gr/~artemis/odysseas_elytis.htm"> <b>Odysseus Elytis</b></a> (Greek):
<blockquote>
Τη γλώσσα μου έδωσαν ελληνική<br>
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.<br>
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου.<br>
<p>
από το Άξιον Εστί<br>
του Οδυσσέα Ελύτη
</blockquote>
<p>
A stanza of <a href="http://www.ocf.berkeley.edu/%7Eleong/Russkaya%20Literatura/Aleksandr%20Sergeevich%20Pushkin.htm"><b>Pushkin</b></a>'s <cite>Bronze Horseman</cite> (Russian):<br>
<p><blockquote>
На берегу пустынных волн<br>
Стоял он, дум великих полн,<br>
И вдаль глядел. Пред ним широко<br>
Река неслася; бедный чёлн<br>
По ней стремился одиноко.<br>
По мшистым, топким берегам<br>
Чернели избы здесь и там,<br>
Приют убогого чухонца;<br>
И лес, неведомый лучам<br>
В тумане спрятанного солнца,<br>
Кругом шумел.<br>
</blockquote><p>
<a href="http://www.compling.hu-berlin.de/~johannes/mxedruli/"><b>Šota Rustaveli</b></a>'s Vepxis Ṭqaosani, #þ!Th, <cite>The Knight in the Tiger's Skin</cite> (Georgian):<p>
<blockquote>
ვეპხის ტყაოსანი შოთა რუსთაველი
<p>
ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა, ცეცხლს, წყალსა და მიწასა, ჰაერთა თანა მრომასა; მომცნეს ფრთენი და აღვფრინდე, მივჰხვდე მას ჩემსა ნდომასა, დღისით და ღამით ვჰხედვიდე მზისა ელვათა კრთომაასა.
</blockquote>
<p>
And from the sublime to the ridiculous, here is a <a href="http://hcs.harvard.edu/~igp/glass.html"> certain phrase</a> in an assortment of languages:
<p><blockquote>
<b>Greek</b>: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.<br>
<b>Sanskrit</b>: (NEEDED)<br>
<b>Etruscan</b>: (NEEDED)<br>
<b>Latin</b>: Vitrum edere possum; mihi non nocet.<br>
<b>Esperanto</b>: Mi povas manĝi vitron, ĝi ne damaĝas min.<br>
<b>French</b>: Je peux manger du verre, cela ne me fait pas mal.<br>
<b>Provençal</b>: Pòdi manjar de veire, me nafrariá pas.<br>
<b>Québécois</b>: J'peux bouffer d'la vitre, ça m'fa pas mal.<br>
<b>Walloon</b>: Dji pou magnî do vêre, çoula m' freut nén må.<br>
<b>Champenois</b>: (NEEDED)<br>
<b>Lorrain</b>: (NEEDED)<br>
<b>Picard</b>: (NEEDED)<br>
<b>Corsican</b>: (NEEDED)<br>
<b>Occitan</b>: (NEEDED)<br>
<b>Catalan</b>: Puc menjar vidre que no em fa mal.<br>
<b>Spanish</b>: Puedo comer vidrio, no me hace daño.<br>
<b>Basque</b>: Kristala jan dezaket, ez dit minik ematen.<br>
<b>Aragones</b>: Puedo minchar beire, no me'n fa mal.<br>
<b>Galician</b>: Eu podo xantar cristais e non cortarme.<br>
<b>Portuguese</b>: Posso comer vidro, não me faz mal.<br>
<b>Brazilian Portuguese</b>: Consigo comer vidro.
Não me machuca.<br>
<b>Cabo Verde Creole</b>: M' podê cumê vidru, ca ta maguâ-m'.<br>
<b>Papiamentu</b>: (NEEDED)<br>
<b>Italian</b>: Posso mangiare il vetro e non mi fa male.<br>
<b>Roman</b>: Me posso magna' er vetro, e nun me fa male.<br>
<b>Sicilian</b>: Puotsu mangiari u vitru, nun mi fa mali.<br>
<b>Milanese</b>: Sôn bôn de magnà el véder, el me fa minga mal.<br>
<b>Venetian</b>: Mi posso magnare el vetro, no'l me fa mae.<br>
<b>Rheto-Romance</b>: (NEEDED)<br>
<b>Romanian</b>: Pot să mănânc sticlă și ea nu mă rănește.<br>
<b>Pictish</b>: (NEEDED)<br>
<b>Breton</b>: (NEEDED)<br>
<b>Cornish</b>: Mý a yl dybry gwéder hag éf ny wra ow ankenya.<br>
<b>Welsh</b>: Dw i'n gallu bwyta gwydr, dwy e ddim yn gwneud dolur i mi.<br>
<b>Irish</b>: Tá mé in ann gloine a ithe; Ní chuireann sé isteach nó amach orm.<br>
<b>Scottish Gaelic</b>: S urrainn dhomh gloinne ithe; cha ghoirtich i mi.<br>
<b>Anglo-Saxon</b>: Ic mæg glæs eotan ond hit hearmiað me ne.<br>
<b>Middle English</b>: Ich canne glas eten and hit hirtiþ me nout.<br>
<b>English</b>: I can eat glass and it doesn't hurt me.<br>
<b>Norwegian (Nynorsk)</b>: Eg kan eta glas utan å skada meg.<br>
<b>Norwegian (Bokmål)</b>: Jeg kan spise glass uten å skade meg.<br>
<b>Icelandic</b>: Ég get borðað gler, það meiðir mig ekki.<br>
<b>Danish</b>: Jeg kan spise glas, det gør ikke ondt på mig.<br>
<b>Soenderjysk</b>: Æ ka æe glass uhen at det go mæ naue.<br>
<b>Frisian</b>: Ik kin glês ite, it docht me net sear.<br>
<b>Dutch</b>: Ik kan glas eten.
Het doet me geen pijn.<br>
<b>Afrikaans</b>: Ek kan glas eet, maar dit maak my nie seer nie.<br>
<b>German</b>: Ich kann Glas essen, ohne mir weh zu tun.<br>
<b>Lëtzebuergescht</b>: Ech kan Glas iessen, daat deet mir nët wei.<br>
<b>Schwäbisch</b>: I kå Glas frässa, ond des macht mr nix!<br>
<b>Bayrisch</b>: I koh Glos esa, und es duard ma ned wei.<br>
<b>Allemannisch</b>: I kaun Gloos essen, es tuat ma ned weh.<br>
<b>Schwyzerdütsch</b>: Ich chan Glaas ässe, das tuet mir nöd weeh.<br>
<b>Swedish</b>: Jag kan äta glas, det skadar mig inte.<br>
<b>Finnish</b>: Pystyn syömään lasia. Se ei koske yhtään.<br>
<b>Hungarian</b>: Meg tudom enni az üveget, nem lesz tőle bajom.<br>
<b>Estonian</b>: Ma võin klaasi süüa, see ei tee mulle midagi.<br>
<b>Latvian</b>: Es varu ēst stiklu, tas man nekaitē.<br>
<b>Lithuanian</b>: Aš galiu valgyti stiklą ir jis manęs nežeidžia<br>
<b>Croatian</b>: Ja mogu jesti staklo i ne boli me.<br>
<b>Czech</b>: Mohu jíst sklo, neublíží mi.<br>
<b>Slovak</b>: Môžem jesť sklo. Nezraní ma.<br>
<b>Polish</b>: Mogę jeść szkło i mi nie szkodzi.<br>
<b>Albanian</b>: Unë mund të ha qelq dhe nuk më gjen gjë.<br>
<b>Slovenian</b>: Lahko jem steklo, ne da bi mi škodovalo.<br>
<b>Serbian</b>: Mogu jesti staklo bez da mi škodi.<br>
<b>Serbian</b>: Могу јести стакло без да ми шкоди.<br>
<b>Macedonian</b>: Можам да јадам стакло, а не ме штета.<br>
<b>Russian</b>: Я могу есть стекло, это мне не вредит.<br>
<b>Ukrainian</b>: Я можу їсти шкло, й воно мені не пошкодить.<br>
<b>Bulgarian</b>: Аз могъ да ям стъкло, а не ме боли.<br>
<b>Armenian</b>: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։<br>
<b>Georgian</b>: მინას ვჭამ და არა მტკივა.<br>
<b>Turkish</b>: Cam yiyebilirim, bana zararı dokunmaz.<br>
<b>Marathi</b>: मी काच खाऊ शकतो, मला ते दुखत नाही.<br>
<b>Hindi</b>: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती.<br>
<b>Farsi</b>: .من می توانم بدونِ احساس درد شيشه بخورم<br>
<b>Pashto</b><a href="#note1">(1)</a>: زه شيشه خوړلې شم هغه ما نه خوږوي<br>
<b>Arabic</b><a href="#note1">(1)</a>: <span dir="RTL" lang=AR>أنا قادر على أكل الزجاج و هذا لا يؤلمني.</span><br>
<B>Hebrew</B><a href="#note1">(1)</a>: <SPAN dir=rtl lang=HE>אני יכול לאכול זכוכית וזה לא מזיק לי.</SPAN><br>
<B>Yiddish</B><a href="#note1">(1)</a>: <SPAN dir=rtl lang=JI>איך קען עסן גלאָז און עס טוט מיר נישט װײ.</SPAN><br>
<b>Ladino</b>: (NEEDED)<br>
<b>Twi</b>: Metumi awe tumpan, ɜnyɜ me hwee.<br>
<b>Yoruba</b><a href="#note2">(2)</a>: Mo lè je̩ dígí, kò ní pa mí lára.<br>
<b>Malay</b>: Saya boleh makan kaca dan ia tidak mencederakan saya.<br>
<b>Tagalog</b>: Kaya kong kumain nang bubog at hindi ako masaktan.<br>
<b>Chamorro</b>: Siña yo' chumocho krestat, ti ha na'lalamen yo'.<br>
<b>Javanese</b>: Aku isa mangan beling tanpa lara.<br>
<B>Vietnamese</B>: Tôi có thể ăn thủy tinh mà không hại gì.<BR>
<b>Chinese</b>: 我能吞下玻璃而不伤身体。<br>
<b>Japanese</b>: 私はガラスを食べられます。それは私を傷つけません。<br>
<b>Korean</b>: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요<br>
<b>Thai</b>: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ<br>
<b>Lojban</b>: mi kakne le nu citka le blaci .iku'i le se go'i na xrani mi<br>
</blockquote>
<p>
For testing purposes, some of these are repeated in a <b>monospace font</b> . . .
<p>
<blockquote>
<pre>
<b>Euro Symbol</b>: €.
<b>Greek</b>: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
<b>Icelandic</b>: Ég get borðað gler, það meiðir mig ekki.
<b>Polish</b>: Mogę jeść szkło, i mi nie szkodzi.
<b>Romanian</b>: Pot să mănânc sticlă și ea nu mă rănește.
<b>Ukrainian</b>: Я можу їсти шкло, й воно мені не пошкодить.
<b>Armenian</b>: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։
<b>Georgian</b>: მინას ვჭამ და არა მტკივა.
<b>Hindi</b>: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती.
<B>Hebrew</B><a href="#note1">(1)</a>: <SPAN dir=rtl lang=HE>אני יכול לאכול זכוכית וזה לא מזיק לי.</SPAN>
<b>Arabic</b><a href="#note1">(1)</a>: <span dir="RTL" lang=AR>أنا قادر على أكل الزجاج و هذا لا يؤلمني.</span>
<b>Japanese</b>: 私はガラスを食べられます。それは私を傷つけません。
<b>Thai</b>: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
</pre>
</blockquote>
<b>Notes:</b>
<ol>
<li><a name="note1">Correct right-to-left display of these languages depends on the capabilities of your browser.</a> The period should appear on the left.
<li><a name="note2">The third word is Latin letter small 'j' followed by small 'e' with U+0329, Combining Vertical Line Below.</a> This displays correctly only if your Unicode font includes the U+0329 glyph and your browser supports combining diacritical marks.
</ol>
<p>
<i>(Additions, corrections, completions,</i> <a href="mailto:kermit@columbia.edu"><i>gratefully accepted</i></a><i>.)</i>
<p>
Other Unicode samplers:
<ul>
<li><a href="http://www.unicode.org/unicode/standard/WhatIsUnicode.html">What Is Unicode?</a>
<li><a href="http://www.trigeminal.com/samples/provincial.html">Anyone can be provincial!</a>
<li><a href="http://www.macchiato.com/unicode/Unicode_transcriptions.html">Transcriptions of "Unicode"</a>
<li><a href="http://www.geocities.com/i18nguy/unicode-example.html">Example Unicode Usage for Business Applications</a>
</ul>
<p>
[ <a href="k95.html">Kermit 95</a> ] [ <a href="ckermit.html">C-Kermit</a> ] [ <a href="index.html">Kermit Home</a> ] [ <a href="http://www.hclrss.demon.co.uk/unicode/fonts.html">Unicode Fonts</a> ]
<hr>
<ADDRESS>
UTF-8 Sampler / <a href="index.html">The Kermit Project</a> / <a href="http://www.columbia.edu">Columbia University</a> / <a href="mailto:kermit@columbia.edu">kermit@columbia.edu</a> / 7 Dec 2001
</ADDRESS>
</body>
</html>