Differences
Below are the differences between two revisions of the page.
2_programmation:encodage:notion_d_encodage [2021/08/22 19:54] – Fixed a UTF-8 problem. Updated internal links. jejust | 2_programmation:encodage:notion_d_encodage [2021/12/20 07:41] (Current version) – Translated the page. Added links. yannick.tanguy | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
+ | Let us start by defining two notions: | ||
+ | * a character is an "atom" of a language or dialect: it may therefore be a letter in an alphabetic language, | ||
+ | * a glyph is a mark made on screen or paper that represents a character. | ||
+ | |||
+ | Of course, for reading to be possible, there must be some relationship between the glyph and the character. Thus, while the precise shape of the glyph can be affected by many other factors, such as the capabilities of the medium | ||
- | Let's start by defining two concepts, the // | + | Whenever a computer has to represent characters, the relationship must be defined between a set of numbers (manipulated by the | ||
- | //glyph//. The character is the abstract idea of the " | + | |
- | language or other dialogue: so it might be a letter in an alphabetic | + | |
- | language, a syllable in a syllabic language, or an ideogram in an | + | |
- | ideographic language. | + | |
- | paper which represents a character. | + | |
- | there must be some agreed relationship between the glyph and the character, | + | |
- | so while the precise shape of the glyph can be affected by many other factors, | + | |
- | such as the capabilities of the writing medium and the designer's style, | + | |
- | the essence | + | |
- | Whenever a computer has to represent characters, someone has to define | + | Of course, < | ||
- | the relationship between a set of numbers and the characters they | + | |
- | represent. | + | |
- | between a set of numbers and a set of things to be represented. | + | |
- | TeX of course deals in encoded characters all the time: the | + | ===== The | ||
- | characters presented to it in its input are encoded, and it emits | + | |
- | encoded characters in its DVI or PDF output. | + | |
- | encodings have rather different properties. | + | |
- | The TeX input stream was pretty unruly back in the days when Knuth | + | The stream | ||
- | first implemented the language. Knuth himself prepared | + | |
- | terminals that produced all sorts of odd characters, and as a result | + | |
- | TeX contains some provision for translating its input (however | + | |
- | encoded) to something regular. | + | |
- | keystrokes into a code appropriate for the user's language: the encoding used | + | |
- | is usually a national or international standard, though some operating systems | + | |
- | use "code pages" (as defined by Microsoft). | + | |
- | often contain characters that may not appear in the TeX system' | + | |
- | Somehow, these characters have to be dealt with --- so an input character like " | + | |
- | needs to be interpreted by TeX in a way that that at least mimics the way | + | |
- | it interprets '' | + | |
- | The TeX output stream is in a somewhat different situation: | + | Nowadays, the system | ||
- | characters in it are to be used to select glyphs from the fonts to be | + | |
- | used. Thus the encoding of the output stream is notionally a font | + | |
- | encoding | + | |
- | [[5_fichiers: | + | |
- | In principle, a fair bit of what appears in the output stream could be | + | |
- | direct transcription of what arrived in the input, but the output stream | + | |
- | also contains the product of commands in the input, and translations | + | |
- | of the input such as ligatures like | + | |
- | Font encodings became a hot topic when the | + | ===== The output encoding | ||
- | [[5_fichiers: | + | |
- | appeared, because of the possibility of suppressing | + | |
- | '' | + | |
- | quality of the hyphenation of text in inflected languages, which is | + | |
- | interrupted by the '' | + | |
- | [[3_composition: | + | |
- | To take advantage of the diacriticised characters represented in the | + | |
- | fonts, it is necessary to arrange that whenever the command sequence '' | + | |
- | has been input (explicitly, or implicitly via the sort of mapping of input | + | |
- | mentioned above), the character that codes the position of the " | + | |
- | Thus we could have the odd arrangement that the diacriticised character in | + | The output stream of < | ||
- | the TeX input stream is translated into TeX commands that would | + | |
- | generate something looking like the input character; this sequence of | + | |
- | TeX commands is then translated back again into a single | + | |
- | diacriticised glyph as the output is created. This is in fact | + | |
- | precisely what the LaTeX packages | + | |
- | [[ctanpkg> | + | |
- | the ISO Latin-1 input encoding and the T1 font encoding. | + | |
- | At first sight, it seems eccentric to have the first package do a thing, and | + | |
- | the second precisely undo it, but it doesn' | + | |
- | most font encodings can't match the corresponding input encoding | + | |
- | nearly so well, and the two packages provide the sort of symmetry the | + | |
- | LaTeX system needs. | + | |
+ | Font encodings became an important topic when the | ||
+ | |||
+ | ====== ====== | ||
+ | Thus, we are faced with a two-step mechanism: | ||
+ | * a character [[wpfr> | ||
+ | * this sequence of commands < | ||
+ | |||
+ | This is precisely what the packages < | ||
----- | ----- | ||
Ligne 77: | Ligne 34: | ||
{{htmlmetatags> | {{htmlmetatags> | ||
- | metatag-og: | + | metatag-og: |
metatag-og: | metatag-og: | ||
}} | }} | ||
- |