Making Cents of Yens and Euros: Web 2.0 Internationalization
Achim Ruopp
[email protected]
http://www.digitalsilkroad.net/
© Copyright 2007 Achim Ruopp
Web 2.0 Expo 2007

Demo
A Currency Converter Application – before and after Web 2.0 Internationalization

Agenda
Introduction to Web Internationalization (i18n)
  • Selecting and Persisting User Preferences
  • Locales and Locale Identifiers
  • Unicode
  • Localization – Model and Tools
Multi-lingual Syndication
  • RSS
  • Atom
Client-side Scripting
  • Javascript Internationalization
  • Ajax
International Web Services Design
  • REST
  • SOAP

Intro to Web Internationalization: Language and Location
[Figure: example language tags fr, en;q=0.8, en-US, da-DK]

Intro to Web Internationalization: User Preferences
Language
  • HTTP Accept-Language header
  • E.g.: en, fr-CA;q=0.8, fr;q=0.6
  • Language negotiation with the server
Locale
  • Cultural preferences for formatting, sorting etc.
  • Infer from Accept-Language header
  • Map IPv4 address to ccTLD (country code top-level domain)
      Public information accessible through libraries
        • E.g. Perl IP::Country CPAN module
      Commercial services offer more precision
Always provide option to change defaults
Store preferences in cookies

Intro to Web Internationalization: Internet Language Tags
IETF Language Tags (BCP 47)
  Language[-Language]*3[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]*
Examples
  • en-CA: English in Canada
  • zh-Hant-TW: Chinese written in traditional Chinese script as used in Taiwan
Obsoletes RFC 3066 & RFC 1766
  • Often still used in products/earlier standards
Internationalization Changes

Intro to Web Internationalization: POSIX Locales
Cross-platform API
  • Locale identifiers can have variations
      Un*x: en_US
      Windows: English_United States
  • Results can be platform-dependent
Basis for locale functionality in all scripting languages
Provides functionality for
  • Number Formatting: 1,000,000.23
  • Date/Time Formatting: 8 Μάρτιος 2007 12:00:00 μμ
  • Sorting
  • String processing (e.g.
upper-/lower-casing)
  • Some translated strings like weekdays, yes/no messages

Intro to Web Internationalization: International Components for Unicode
IBM Open Source project
Extensive locale data and APIs
  • Data vetted as part of Common Locale Data Repository (CLDR) project
Java and C++ APIs
Wrappers for scripting languages
  • PyICU (Python)
  • ICU4R (Ruby) – abandoned?
  • DIY – difficult because of API complexity and character encoding issues

Intro to Web Internationalization: Microsoft Internationalization APIs
Windows NLS API
Microsoft .NET Framework System.Globalization namespace
Similar set of data to ICU
  • Data vetted by Microsoft subsidiaries
APIs accessible from all Microsoft programming languages

Intro to Web Internationalization: Unicode 5.0
99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined
[Figure: map of the Unicode code space. Plane labels (00000 to 100000): Basic Multilingual Plane, Dead Languages & Math, Han Characters, Language Tags, Private Use. Basic Multilingual Plane labels (0000 to F000): Alphabets, Punctuation, Asian Languages, Han Characters, Yi, Hangul, Surrogates, Private Use, Legacy/Compatibility.]

Intro to Web Internationalization: Unicode Encoding Forms
Variable length: UTF-8/UTF-16
Fixed length: UTF-32
U+2122: ™: Trade Mark Sign
  UTF-8    0xE2 0x84 0xA2    11100010 10000100 10100010
  UTF-16   0x2122            00100001 00100010
  UTF-32   0x00002122        0…00100001 00100010

Intro to Web Internationalization: Unicode on the Web
XML processors are required to process UTF-8/UTF-16
Encoding declaration precedence
  1. HTTP Content-Type header charset declaration
  2. XML encoding declaration (XHTML)
  3. meta charset declaration in (X)HTML
  4. link element charset attribute
Approx.
4% of pages have encoding errors*
No real need for character references
  • E.g. write ü directly instead of &#xFC; or &uuml;
  • Exceptions: <, >, &, "
Use styles to control font selection
* source: Google presentation at IUC30

Demo
A Currency Converter Application – globalized but not localized

Intro to Web Internationalization: Localization Recommendations
Avoid translatable text in graphics
Make sure graphics are culturally neutral
Avoid absolute sizing
Use HTML flow layout
Write complete sentences

Intro to Web Internationalization: Localization Model and Tools
Text translation
  • Localization formats
      HTML with template library
        • W3C Internationalization Tag Set (tool support?)
      GNU gettext/PO
      XLIFF - XML Localization Interchange File Format
  • Localization tools
      OmegaT
      Open Language Tools (Sun)
      The WordForge Project: Pootle
      …
Searchability – Links/Sitemap

Demo
A Currency Converter Application – fully internationalized Web 1.0 application

Client-side Scripting: Javascript Internationalization
ECMAScript edition 3 added a range of internationalization features (1999)
  • Good support for Unicode processing
  • Set of locale-sensitive functions
      Dependent on host locale (i.e.
browser)
  • Set of locale-insensitive functions
  • No number or date/time parsing
Javascript libraries with additional internationalization functionality
  • dojo Toolkit (i18n contributed by IBM)
  • Microsoft AJAX Library

Client-side Scripting: AJAX Recommendations
Late globalization
  • Transmit data in locale-independent form with XMLHttpRequest
  • Might require some creative parsing/UI
Early localization
  • Text localization server-side
  • Browsers are missing a message-catalog facility
  • Dynamically created page content is invisible to search engines

Multi-lingual Syndication: RSS 2.0
Character encoding
  • RSS 2.0 is an XML application
  • XML encoding rules apply
Language
  • Element only on channel (feed), not on item
      Create one channel per language
  • Specified to comply with RFC 1766 language tags
Date/Time
  • In standard RFC 822 format (including 4-digit years)
      E.g. “Wed, 02 Oct 2002 08:00:00 EST”

Multi-lingual Syndication: Atom Syndication
More granular language marking
  • xml:lang can be applied to any human-readable text in the format
  • Aggregators need to deal with this
Better date/time format: RFC 3339
  • E.g. “2003-12-13T18:30:02-05:00”
Acknowledgement: Tim Bray

Demo
A Currency Converter Application – adding a syndication feed with exchange rate information

International Web Services Design: Service Patterns
  Pattern              Description
  Locale Neutral       Neutral data formats
  Client Influenced    Service reacts to client locale, e.g. HTTP Accept-Language
  Service Determined   Service is locale-specific and ignores client preference
  Data Driven          Service adjusts formatting and language to the locale the data refers to
Example values
  Request data: CAD
  Return data: 1.1785 CAD; (Accept-Language: de) Kanadischer Dollar; 03/08/2007 12:00pm EST; NOK norske kroner; CHF ?
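
The Client Influenced pattern depends on the service reading the Accept-Language header and picking the best language it actually supports. The following is a minimal Python sketch of that negotiation, not the demo application's code (which is Perl CGI). The function names `parse_accept_language` and `negotiate` are hypothetical, the header is assumed to use the standard HTTP weight syntax (`fr-CA;q=0.8`), and the fallback from a regional tag to its base language is a simplification of full BCP 47 matching.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    ranges = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        params = params.strip()
        quality = 1.0  # a tag without a weight counts as q=1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((quality, position, tag.strip().lower()))
    # Highest weight first; ties keep the order used in the header.
    ranges.sort(key=lambda r: (-r[0], r[1]))
    return [tag for quality, _, tag in ranges if quality > 0]


def negotiate(header, supported, default="en"):
    """Pick the first supported language, falling back from 'fr-ca' to 'fr'."""
    by_lowercase = {name.lower(): name for name in supported}
    for tag in parse_accept_language(header):
        candidate = tag
        while candidate:
            if candidate in by_lowercase:
                return by_lowercase[candidate]
            # Drop the last subtag: 'zh-hant-tw' -> 'zh-hant' -> 'zh' -> ''
            candidate = candidate.rpartition("-")[0]
    return default


if __name__ == "__main__":
    print(negotiate("en, fr-CA;q=0.8, fr;q=0.6", ["fr", "de"]))
```

Whatever the negotiation returns should only be a default: as the User Preferences slide recommends, the user should still be able to override it, with the choice stored in a cookie.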

International Web Services Design: REST
REST naturally ties into i18n features in HTTP/HTML/XML
  • Locale indicated with HTTP Accept-Language
  • Encoding and language marking in markup
Special caution for HTTP GET parameters
  • Locale-independent formatting recommended
  • Text parameters
      Encode in UTF-8 and escape in URIs
      IRI (Internationalized Resource Identifier) functionality might provide this for you

International Web Services Design: SOAP
Locale can be communicated in
  • Transport header (e.g. HTTP)
  • SOAP header
  • SOAP message body
Beware of automatically generated SOAP interfaces
  • Might be locale-dependent but not allow the locale to be specified
Use of XML Schema data types promotes locale-independence
Also consider localization of error messages

Conclusions
Unification
  • One code base
Customization
  • Localization and adaptation for locales
Next step: cross-language “leakage”
  • Provide views in multiple languages to the same (user-generated) data
  • Translate user-generated content
      Volunteers
      Machine Translation

Call for Contributions
Presentation and Perl CGI demo code
  • http://www.digitalsilkroad.net/web2expo
Add a version in your preferred language
  • Ruby on Rails
  • PHP
  • Python
  • …
Similar ASP.NET application
  • http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx

References
W3C Internationalization Activity
  • http://www.w3.org/International/
POSIX Locale
  • http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
International Components for Unicode
  • http://www-306.ibm.com/software/globalization/icu/
Unicode/Common Locale Data Repository
  • http://www.unicode.org/
Microsoft Internationalization APIs
  • http://msdn2.microsoft.com/en-us/library/ms776254.aspx
  • http://msdn2.microsoft.com/en-us/library/system.globalization.aspx
OmegaT
  • http://www.omegat.org/omegat/omegat_en/omegat.html
Open Language Tools
  • https://open-language-tools.dev.java.net/
The WordForge Project
  • http://www.wordforge.org/drupal/
Javascript
Internationalization
  • http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html
RSS 2.0
  • http://www.rssboard.org/rss-specification
Atom Syndication
  • http://www.atomenabled.org/developers/syndication
RSS 1.0
  • http://web.resource.org/rss/1.0/spec
W3C Web Services Internationalization Usage Scenarios
  • http://www.w3.org/TR/ws-i18n-scenarios/

Additional Slides

Multi-lingual Syndication: RSS 1.0
Character encoding
  • RSS 1.0 is an XML application
  • XML encoding rules apply
Complies with the RDF (Resource Description Framework) specification
  • Definition of language and date/time formats is left to RDF metadata formats
      Dublin Core Metadata Element Set
        Language: RFC 1766/ISO 639-2
        Date/Time: ISO 8601 (superset of RFC 3339)
  • Dublin Core also allows time periods to be specified!
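
The syndication slides name two date/time formats: RFC 822 style dates for RSS 2.0 and RFC 3339 timestamps for Atom. As a rough illustration of how they relate, the Python sketch below converts between the two using only the standard library. The helper names `rss_to_atom` and `atom_style_to_rss` are hypothetical, and the sketch assumes every date carries an explicit time zone, as both slide examples do.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def rss_to_atom(rss_date):
    """Convert an RSS 2.0 (RFC 822 style) date string to an RFC 3339 timestamp."""
    parsed = parsedate_to_datetime(rss_date)  # understands zone names such as EST
    return parsed.isoformat()


def atom_style_to_rss(moment):
    """Format a timezone-aware datetime the way RSS 2.0 expects it."""
    return format_datetime(moment)


if __name__ == "__main__":
    # The RSS 2.0 example date from the slides, rewritten for Atom.
    print(rss_to_atom("Wed, 02 Oct 2002 08:00:00 EST"))

    # The Atom example timestamp from the slides, rewritten for RSS 2.0.
    eastern = timezone(timedelta(hours=-5))
    print(atom_style_to_rss(datetime(2003, 12, 13, 18, 30, 2, tzinfo=eastern)))
```

The RFC 3339 form sorts correctly as plain text and avoids English day and month names, which is one way to read the "better date/time format" remark on the Atom slide.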