Glyph origins

In addition to characters lifted from the Wang word processing set, some of the glyphs may have originated from work Gates did with Microsoft BASIC for Commodore International ... but I can't find any full character maps of all PETSCII glyphs including 0-31 ... http://americanhistory.si.edu/collections/comphist/gates.htm

In terms of Commodore PET, they started with us from the very beginning. Because we helped Chuck Peddle, who was at Commodore at that time, really think about the design of the machine. Adding lots of fun characters to the character set, things like smiley faces, and suit symbols.
Hobart 19:20, 21 September 2006 (UTC)

Isn't this code page also known as PC-8?

Added with reference. —Coroboy (talk) 00:57, 15 November 2011 (UTC)Reply

null

The null character should be empty -- the IBM PC did not display "NULL" when a 0 byte was put into the frame buffer. The null, space, and blank characters (0x00, 0x20, and 0xFF) were visually indistinguishable.

I agree - so I changed it. -- 212.63.43.180 (talk) 21:12, 24 January 2008 (UTC)Reply
However, characters 0, 32, and 255 were used differently in IBM PC files. The way the table has been, the NULL, SP, and NBSP texts link to relevant articles where people can read what the specific functions of those characters were -- whereas leaving things blank conveys no information. AnonMoos (talk) 22:33, 24 January 2008 (UTC)Reply
I solved the issue by putting two sets of the table for values 0-31. Ricardo Cancho Niemietz (talk) 09:46, 29 January 2008 (UTC)Reply
Most of the changes were good, but now the English does need some clean-up... AnonMoos (talk) 11:24, 29 January 2008 (UTC)Reply
Sorry, I'm not a native English speaker (I'm from, and live in, Spain). Help would be appreciated. Ricardo Cancho Niemietz (talk) 12:54, 29 January 2008 (UTC)
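
For reference, a minimal sketch of how codes 0, 32 and 255 come out of a byte-to-Unicode conversion. It assumes Python's built-in `cp437` codec, which follows the vendor mapping and says nothing about what the hardware actually drew on screen.

```python
import unicodedata

# The three visually blank CP437 codes discussed above.
BLANKS = bytes([0x00, 0x20, 0xFF])

for code, ch in zip(BLANKS, BLANKS.decode("cp437")):
    # Control characters have no Unicode name, so supply a placeholder.
    name = unicodedata.name(ch, "<control>")
    print(f"{code:3d}  0x{code:02X}  U+{ord(ch):04X}  {name}")
```

All three would look blank on screen, but the codec keeps them as three distinct code points, which matches the point that they were used differently in files.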

Multiple Bases

While the table header rows and the Unicode values are in hexadecimal, the CP437 codes are in decimal. This makes it less obvious where the two encodings point to the same character. I would recommend making it entirely hexadecimal.

I disagree. I think it makes it much quicker to understand using the two different bases. ACED.wiki (talk) 18:40, 5 August 2020 (UTC)Reply
The numbers are only there because you can type them with Alt on Windows to get these characters, and apparently a lot of people rely on this Wikipedia page as a reference for what those numbers are. Since the number you type on Windows is in base 10, it must be shown in base 10 here. Spitzak (talk) 19:12, 5 August 2020 (UTC)

At least provide a link to the previous graphic that contained the decimal values for reference. It was most helpful and used regularly. — Preceding unsigned comment added by 69.59.97.64 (talk) 11:19, 26 January 2022 (UTC)Reply

The Alt code is in the tooltip. Spitzak (talk) 17:21, 26 January 2022 (UTC)
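
To make the three numbering schemes concrete, here is a small sketch that prints the decimal Alt code, the hexadecimal table position and the Unicode code point side by side. It assumes Python's built-in `cp437` codec; the helper name `describe` is made up for this illustration.

```python
def describe(code: int) -> str:
    """Return one row: decimal Alt code, hex table position, Unicode code point."""
    ch = bytes([code]).decode("cp437")
    return f"Alt+{code:<3d}  0x{code:02X}  U+{ord(ch):04X}  {ch}"

for code in (129, 156, 237):
    print(describe(code))
```

The decimal column is what gets typed on the numeric keypad; the hex column is the row and column of the table.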

The overloaded character number 237 in CP437

The character at position 237 in the CP437 table should change from U+03C6 GREEK SMALL LETTER PHI to U+03D5 GREEK PHI SYMBOL.

In CP437, this position was used as U+03D5 GREEK PHI SYMBOL in italics, U+2205 EMPTY SET, U+2300 DIAMETER SIGN and even as a surrogate for U+00F8 LATIN SMALL LETTER O WITH STROKE, but rarely as U+03C6 GREEK SMALL LETTER PHI, because its original IBM shape (seemingly just a circle with a slash) does not closely resemble that Greek lowercase letter.

Also, character 238 should really be changed to U+2208 ELEMENT OF. In addition to being used as U+03B5 GREEK SMALL LETTER EPSILON, it is used today on some dot-matrix ticket printers as U+20AC EURO SIGN, in the European countries where the euro is the official currency.

On the other hand, character 236 is U+221E INFINITY, not a Greek letter at all, so its background colour should be changed to grey.

As you can see, characters 236 to 253 in CP437 were primarily intended as maths symbols, so positions 237 and 238 are not "real" Greek letters. Despite that, many people have of course used these characters as Greek letters (to name angles and so on).

Another issue: character 235, U+03B4 GREEK SMALL LETTER DELTA, was also used as U+00F0 LATIN SMALL LETTER ETH, an Icelandic Latin character.

A popular maths program for MS-DOS in the late 1980s, "Derive", employs the full CP437 character set to display complex formulae, with very good results.

People are able to do incredible things with very limited means...

Yours Ricardo Cancho Niemietz (talk) 15:53, 25 January 2008 (UTC)Reply

I did the changes myself! :-D Ricardo Cancho Niemietz (talk) 14:00, 28 January 2008 (UTC)Reply
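
For comparison, a quick sketch of how one widely used conversion table reads positions 235 to 238. It assumes Python's built-in `cp437` codec, which picks a single Unicode reading per position and therefore cannot express the overloading described above; `cp437_name` is a hypothetical helper.

```python
import unicodedata

def cp437_name(code: int) -> str:
    """Code point and Unicode name that the cp437 codec assigns to one byte."""
    ch = bytes([code]).decode("cp437")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for code in range(235, 239):
    print(code, cp437_name(code))
```

Any alternative reading (empty set, diameter sign, element-of, euro) has to be applied by the software or the font, since the byte value alone does not say which one was meant.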

Codes for 16 and 17

I just added the image to the top, which is a printout of the code page in order using QEMU. I noticed a discrepancy - positions 16 and 17 are swapped around, relative to the codes given in this article. Note that in the image, there is a right arrow then a left arrow (in the top row). In the table in the article, there is first the left arrow (U+25C4), then the right arrow (U+25BA). Is there an error in the article? I can't find a source for the first 32 characters. EatMyShortz (talk) 12:48, 18 February 2009 (UTC)Reply

There are sources on the Microsoft site, the Unicode.org site, or if you insist on paper, you can look at Appendix C of The New Peter Norton Programmer's Guide to the IBM PC & PS/2 by Peter Norton and Richard Wilton (Microsoft Press, 1987 ISBN 1-55615-131-4). From what I can see if you place 16 and 17 side-by-side, they point at each other, as is also the case for 26 and 27... AnonMoos (talk) 09:33, 19 February 2009 (UTC)Reply

The characters at 0x10 and 0x11

Consider:
U+25BA : BLACK RIGHT-POINTING POINTER
U+25B6 : BLACK RIGHT-POINTING TRIANGLE
U+25C4 : BLACK LEFT-POINTING POINTER
U+25C0 : BLACK LEFT-POINTING TRIANGLE

Compare with the characters at 0x1E and 0x1F:
U+25B2 : BLACK UP-POINTING TRIANGLE
U+25BC : BLACK DOWN-POINTING TRIANGLE

I would recommend replacing 25BA with 25B6 and 25C4 with 25C0 in the table.

The Terminus font (which is designed to include all CP437 characters) does not include 25BA and 25C4, but it includes 25B6 and 25C0. (Reference: [1])

I won't make this edit, because I'm uncomfortable with non-ASCII characters in Firefox's text input box.
-- 'x' 92.225.64.211 (talk) 07:56, 25 May 2009 (UTC)Reply

Note that the following characters render properly in IE7, which has an incomplete graphic rendering character set:
U+25B2 ▲ Triangle up
U+25BA ► Triangle right
U+25BC ▼ Triangle down
U+25C4 ◄ Triangle left
The characters U+25B2 (▲), U+25B6 (▶), and U+25C0 (◀) are not rendered properly by IE7, being displayed as empty squares.
— Loadmaster (talk) 17:18, 26 May 2009 (UTC)Reply

The decision is not really up to us -- the standard equivalences recognized by Unicode are at http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT and http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/IBMGRAPH.TXT ... AnonMoos (talk) 18:39, 26 May 2009 (UTC)Reply

Thanks for the links. I'm not sure how "standard" IBMGRAPH.TXT really is.
Comparing the chart http://www.unicode.org/charts/PDF/U25A0.pdf with the rendering by an IBM PC, I believe the author of IBMGRAPH.TXT has made a mistake. I won't write to him though, since I'm perfectly happy with unicode.org hosting a suboptimal document, and wikipedia perpetuating its content.
-- 'x' 85.179.155.203 (talk) 05:59, 2 June 2009 (UTC)Reply
I'd be more comfortable with making clear where the suggestion comes from and clarifying that this information is provided principally from vendors and is not part of the Unicode standard nor referenced for historical accuracy (or indeed for current practice). "The standard equivalences recognized by Unicode" definitely overstates the case. -- Elphion (talk) 16:42, 5 April 2010 (UTC)Reply
The mappings on that site aren't even consistent. IIRC, they have at least 2 versions of the Macintosh encoding that don't agree on ¤ vs. €. DanBishop (talk) 20:12, 26 June 2011 (UTC)Reply
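
A short sketch of the two readings of the low codes under discussion. The `GRAPHIC` table is a hand-copied subset that is assumed to agree with the vendor mapping file linked above; it is an illustration only, not an authoritative source. Python's built-in `cp437` codec is used for the other reading and leaves codes below 0x20 as control characters.

```python
import unicodedata

# Hand-copied subset of the graphical readings (assumed, see lead-in).
GRAPHIC = {
    0x10: "\u25BA",  # BLACK RIGHT-POINTING POINTER
    0x11: "\u25C4",  # BLACK LEFT-POINTING POINTER
    0x1A: "\u2192",  # RIGHTWARDS ARROW
    0x1B: "\u2190",  # LEFTWARDS ARROW
    0x1E: "\u25B2",  # BLACK UP-POINTING TRIANGLE
    0x1F: "\u25BC",  # BLACK DOWN-POINTING TRIANGLE
}

for code, glyph in GRAPHIC.items():
    # The codec's own answer for the same byte is the bare control code.
    control = bytes([code]).decode("cp437")
    print(f"0x{code:02X}  codec: U+{ord(control):04X}  "
          f"graphic: U+{ord(glyph):04X} {unicodedata.name(glyph)}")
```

Swapping in the TRIANGLE code points proposed above would only mean changing the two entries for 0x10 and 0x11.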

ascii art

It would be nice to have a reference for the fact that this code page is often used for ASCII art —Preceding unsigned comment added by 85.146.181.17 (talk) 01:15, 1 July 2009 (UTC)

It was actually mainly BBS-type text graphics with ANSI.SYS terminal codes (or other control of video colors), which isn't quite the same thing as plain ASCII art... AnonMoos (talk) 17:43, 1 July 2009 (UTC)Reply
Also, many DOS applications made extensive use of these characters to represent menus and other GUI-like elements and rely on them to display properly. sPAzzMatiC 18:22, 6 August 2009 (UTC) —Preceding unsigned comment added by Spazzmatic (talkcontribs)
See box-drawing character... AnonMoos (talk) 11:25, 29 July 2014 (UTC)Reply

ess-zet/beta

The article mentions: "It has umlauts for German (Ä, ä, Ö, ö, Ü, ü), but sharp S (ß) must be represented with the beta symbol (β)." To me, it looks like it should be the other way around: the beta symbol is not in CP437, and a German sharp S should be used. Thoughts? —Preceding unsigned comment added by Nlhenk (talkcontribs) 11:51, 9 March 2010 (UTC)

The character is "overloaded" with multiple meanings, but the fact that it's found between alpha and gamma is a pretty good indication that it was originally intended as a beta...   -- AnonMoos (talk) 13:29, 5 April 2010 (UTC)Reply
The order doesn't really mean very much; in the upper set the order was pretty haphazard (e.g., delta and epsilon much farther down the list). The "Greek" section doesn't conform to the usual order anyway, and the characters were chosen primarily for non-Greek uses. The beta/eszett was almost certainly intended for double duty from the beginning, as were several other characters. -- Elphion (talk) 15:38, 5 April 2010 (UTC)Reply
FWIW, RFC 1345 defines character 0xE1 as β. The Windows MultiByteToWideChar function uses ß, while WideCharToMultiByte accepts both. DanBishop (talk) 20:19, 11 July 2010 (UTC)Reply
It is clear (for a scientist or engineer) that it was meant to be a beta. The symbols present are a selection of symbols that are nice to have when dealing with mathematics and geometry: alpha, beta, gamma for some angles, pi as in 3.14..., sum, standard deviation, mean, polar coordinates, a delta (as in step, probably), infinity, the empty set, an epsilon and a bunch of operators.
The reason MultiByteToWideChar turns it into an Eszett is probably because almost all text that actually uses the character is German text using it as a stand-in for Eszett. — Preceding unsigned comment added by 82.139.87.39 (talk) 02:07, 28 January 2012 (UTC)Reply
Of course it was meant to be a beta. But it was also meant to be an Eszett; Gates made clear that several characters were meant to do double duty, and this character was intended to give Germany, a market MS couldn't ignore, an acceptable Eszett. It's tolerably close to the Arial version of Eszett, and in those days of fixed characters, people were willing to put up with the ambiguity. -- Elphion (talk) 04:30, 28 January 2012 (UTC)
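
As an illustration of how a one-to-one conversion table handles this overloaded position, here is a sketch assuming Python's built-in `cp437` codec. Like the MultiByteToWideChar behaviour mentioned above, it decodes 0xE1 as ß; unlike WideCharToMultiByte, it does not accept β when encoding, so the hypothetical helper `to_cp437` folds β onto ß first.

```python
SHARP_S = "\u00DF"  # LATIN SMALL LETTER SHARP S
BETA = "\u03B2"     # GREEK SMALL LETTER BETA

# Decoding: the codec picks exactly one reading for byte 0xE1.
decoded = bytes([0xE1]).decode("cp437")

# Encoding: check whether the other reading is accepted as-is.
try:
    BETA.encode("cp437")
    beta_encodable = True
except UnicodeEncodeError:
    beta_encodable = False

def to_cp437(text: str) -> bytes:
    """Encode for CP437, treating beta as the same glyph as sharp S."""
    return text.replace(BETA, SHARP_S).encode("cp437")

print(repr(decoded), beta_encodable, to_cp437("\u03B1" + BETA + "\u0393"))
```

The folding step is a design choice of this sketch; it simply makes the "double duty" explicit on the encoding side.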

Is the image really EGA?

The image purports to show the characters in an EGA display. But the character size in the image is 9 × 16 pixels, the standard VGA size. Isn't EGA limited to 8 × 14? -- Elphion (talk) 00:28, 5 April 2010 (UTC)Reply

Of course, you are right. EGA had neither 16-dot-high glyphs nor a 9-dot-wide mode. File:Codepage-437.png is VGA. Incnis Mrsi (talk) 11:33, 5 April 2010 (UTC)

I like having this screenshot; would it be possible to make one for the other DOS codepages? DanBishop (talk) 01:51, 27 June 2011 (UTC)Reply

 
Like this? I could provide you with a simple C program that makes such BMPs (unfortunately, not PNGs) from PSF raster fonts. Incnis Mrsi (talk) 14:17, 27 June 2011 (UTC)
Yes, like that. DanBishop (talk) 23:29, 27 June 2011 (UTC)Reply

Entry on keyboards

The section "Entry on keyboards" says that "programs that support only Windows-1252" attempt to transliterate the "Greek" characters when entered via keyboard. Is there some support for this? (I've never seen this behavior -- what I get is always the corresponding Windows-1252 character.) -- Elphion (talk) 16:30, 5 April 2010 (UTC)Reply

Hearing no response, I've removed the statement from the article. -- Elphion (talk) 01:05, 28 June 2010 (UTC)Reply

The character at 0xE1

The table shows the character at position 0xE1 equating to U+00DF (LATIN SMALL LETTER SHARP S) which contradicts the text: "Table rows 14 and 15 (E and F), codes 224 to 255 (E0 to FF) are devoted to mathematical symbols, where the first twelve are a selection of Greek letters commonly used in physics." — Ksn (talk) 04:07, 20 November 2010 (UTC)Reply

It's actually ambiguous or "overloaded" (see section "Multiple-meaning character glyphs"). AnonMoos (talk) 11:50, 20 November 2010 (UTC)Reply

tan is not pink

I changed the text "and tan cells are international letters." to "and pink cells are international letters.", since the cells in question are pink. —Preceding unsigned comment added by 72.91.177.153 (talk) 00:45, 26 November 2010 (UTC)

windows 1253 vs codepage 437

The Windows operating system requires me to save as Unicode when typing text using code page 437. Do you also have to choose Unicode encoding to save text that uses chr codes from the code page 437 table? Paul188.25.109.227 (talk) 15:44, 29 April 2011 (UTC)

mean is when to enter a chr-code from the table and saving isn’t codepage437 to not use unicode ? —Preceding unsigned comment added by 188.25.109.227 (talk) 15:47, 29 April 2011 (UTC)Reply
It's not clear what you're asking. When you save a document, typically what is saved are the numerical codes of the characters. But the interpretation of those codes depends on the active code page. If a document created with one code page is displayed while another code page is active, many of the codes with the high bit set ("upper ASCII") will display as different characters. To get around that, you can save the text as Unicode (as WP does), so that each character is saved with its more or less unique code, and is therefore more or less unambiguous. I say "more or less" because some characters, even in Unicode, have multiple uses, and some may differ significantly in appearance from font to font. -- Elphion (talk) 04:57, 30 April 2011 (UTC)Reply
Saying "more or less" is extremely misleading. Unicode characters are (apart from private use characters) unique. A character is not the same thing as a glyph. The letter A will look different in different fonts, but that doesn't mean it's ambiguous. Even the CJK-unification opponents' views stem from this basic misunderstanding (often combined with a dose of rabid paranoid nationalism).
When you care about the exact look of a character, save an image or specify the font. Otherwise, for all intents and purposes, Unicode is unambiguous.
Saying "Unicode characters are unique" or "Unicode is unambiguous" is also misleading. The same character may have many different "meanings" -- the code charts give several readings for some characters. The problem is that in some user communities, the different readings are conventionally represented by different forms of the glyph, so the boundary between glyph and character is not necessarily clear cut. Ideally, Unicode would provide different codepoints for the different readings, and in some cases it has; but in many it has not. The solution of using an image is stop-gap at best; as a long-term solution it's a non-starter. -- Elphion (talk) 01:00, 28 November 2011 (UTC)Reply
"Some user communities do X" is always true, almost regardless of X. But at the end of the day, Unicode characters are for all intents and purposes unique and unambiguous. People who feel their characters are not in Unicode are mostly delusional, but the few that aren't shouldn't be using it or should use the PUA, that's what it's for. I state again, for the sake of clarity, that if, when you talk to someone who may not know in detail what Unicode is, the first thing you do is to zoom in on extremely marginal issues that are completely unrepresentative of the standard as a whole or its rational users, you are implanting an image in someone's mind that doesn't correspond to reality. — Preceding unsigned comment added by 82.139.87.39 (talk) 01:40, 28 January 2012 (UTC)
Well, I guess we'll just have to agree to disagree on this one. There are situations where you need one glyph and not the other. Theta vs var-theta is a good case in point -- even though there are separate codepoints in this case, the standard doesn't make clear how much leeway there is before one symbol turns into the other, and font designers are all over the map with that pair. Result: you can't predict which one the end user is going to see with either codepoint, especially in a web environment where you have little control over the actual font being used. -- Elphion (talk) 04:30, 28 January 2012 (UTC)Reply
When on Windows you hold down Alt and type 3 digits that are the code value in 437, it is immediately translated by Windows into the UTF-16/UCS-2 value. The fact that the number was in CP437 is lost at that point. Most applications (such as MS Word) will place the character into the file in UTF-16 or UTF-8 encoding. You usually have to do something special to make a file that is CP437 encoded nowadays. Spitzak (talk) 16:28, 1 May 2011 (UTC)
I think something slightly different is going on. The keyboard module converts the key sequence Ctrl + Alt + decimal code into a code in the range 0..255 and presents that to the active application, which can interpret it as it sees fit (testing the active codepage if it wants). For example, with the default code page (437), Notepad converts the character code to the equivalent Unicode value (and prompts you to save the doc in a Unicode format to avoid losing character information), while Command Prompt simply displays the character in the active code page and stores the code on redirection as the untranslated code. Thus if I execute "echo £>junk.txt" in Command Prompt (entering £ as Ctrl + Alt + 156) and open junk.txt in Notepad, what I see depends on the mode I use to open the file in Notepad: if ANSI mode, I see œ, which is the ANSI character with code 156. -- Elphion (talk) 19:17, 1 May 2011 (UTC)
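
The junk.txt example above can be reproduced without Windows. A minimal sketch, assuming the OEM code page is 437 and the ANSI code page is Windows-1252, and using Python's built-in codecs of those names:

```python
# The single byte that "echo £" stores when the OEM code page is 437.
RAW = bytes([156])

as_oem = RAW.decode("cp437")    # how Command Prompt interprets the byte
as_ansi = RAW.decode("cp1252")  # how an ANSI-mode reader interprets it

print(f"byte {RAW[0]}: OEM reading {as_oem!r}, ANSI reading {as_ansi!r}")
```

The stored byte never changes; only the code page used to interpret it does.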
When you hold alt and press some digits on the numeric keypad, the following messages are sent to the application:
One or more WM_SYSKEYDOWN(VK_MENU, left-alt, alt pressed)
For every digit one or more WM_SYSKEYDOWN(VK_INSERT...VK_PRIOR* or VK_NUMPAD0...VK_NUMPAD9, numpad-0...numpad-9*, alt pressed)
 * Note: not contiguous.
Followed by one corresponding WM_SYSKEYUP message.
This then results in WM_CHAR(Unicode codepoint, left-alt, alt not pressed)
If the number entered starts with one or more zeroes, the active code page will be used, otherwise the OEM code page will be used. If the number is not in the range 0...255, it is masked with 255 (FFh).
Most applications won't bother with the WM_SYSKEY* messages and only look at the WM_CHAR message. Also, this assumes the window is a Unicode window*, which nowadays is almost always the case even when talking about Notepad.
 * If the window is an ‘ANSI’ window (which isn't really ANSI) the character is translated to the best match in the active code page. For example if you entered a box drawing character that isn't present in the active code you'll often get a +. In a SBCS you'll get one WM_CHAR, in a MBCS you'll get one or more.
I'm pretty sure that you won't get the WM_CHAR unless you pass the WM_SYS* messages to the default window procedure.
I hadn't realized leading 0s made a difference (which explains a lot of puzzlement over the years, thanks!) So yes: a leading 0 gives you some approximation to the character with that code in the active code page, otherwise some approximation to the character with that code from the OEM set; and the form of the approximation (Extended ASCII code or Unicode codepoint) depends on the mode of the window. So the upshot is that a default translation is performed automatically by Windows in processing keyboard messages, depending on the active code page and the presence or absence of leading 0s; and the code presented in WM_CHAR will depend on the mode of the window. The app is of course still free to interpret the WM_CHAR or the other kbd messages if it doesn't want the default translation.
For control codes 1..31, the "OEM translation" of code with or without 0 is the control code itself, as is the "Unicode translation" of code with 0 (which Notepad mostly ignores except for \n, \r, \t, \b). But the "Unicode translation" of code without 0 is the Unicode codepoint of the "equivalent" graphical character (smiley face, eighth note, etc.). (At least, that's what Notepad seems to do.)
On my system, values out of range 0..255 (with or without leading 0s) appear to get masked with 0xFF, so, e.g., Alt+254, Alt+510 give the same result.
-- Elphion (talk) 00:45, 28 November 2011 (UTC)Reply
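
The leading-zero rule and the masking described above can be summarised in a few lines. This is only a model of the behaviour reported in this thread, not of the actual Windows implementation: the function name `alt_code` is made up, the OEM and ANSI code pages are assumed to be 437 and Windows-1252, and the graphical reading of codes 1..31 is not modelled.

```python
def alt_code(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Model of Alt+digits: pick the code page, mask the value, decode one byte."""
    value = int(digits) & 0xFF                        # out-of-range values wrap
    codec = ansi if digits.startswith("0") else oem   # leading 0 selects ANSI
    # Some ANSI positions are unassigned, so substitute rather than raise.
    return bytes([value]).decode(codec, errors="replace")

for keys in ("237", "0237", "254", "510"):
    print(f"Alt+{keys} -> {alt_code(keys)}")
```

With these assumptions, Alt+254 and Alt+510 give the same character, as observed above.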
Exactly. Keep in mind that in practice ‘some approximation’ will turn out to mean ‘exact match’. See this summary:
Ansi to Ansi Window: exact match
Ansi/OEM to Unicode Window: exact match, since when Unicode was drawn up the existing code pages were integrated in it. However:
- Some Chinese glyphs were added to their codepages before they were/are added to Unicode. These will not be entered through Alt-codes however, you will use an IME (I don't even know if any multiple-byte characters can be entered with Alt-codes at all). These glyphs will map to PUA codepoints. Fonts which support both Unicode and that Chinese code page will have the same glyphs at those PUA locations as at the corresponding code page locations.
- Some code pages (notably 437) are actually two-in-one. A graphical code page and a partly semantic code page. Since nowadays semantic information tends to be stored at a different conceptual level in the file format (in HTML for example one would use <table> &c. instead of the tab character) the character data is considered to be just text so nowadays the spotlight will mostly fall on the graphical code page.
- Some code pages contain characters which are invalid/unmapped.
OEM to Ansi window: this can be lossy, for example if the OEM code page is Greek, but the Ansi code page is Hebrew. But since Ansi windows generally belong to legacy applications, this generally won't be an issue.
Notepad is a wrapper around the edit control. Nowadays it's a Unicode application, and it is capable of storing Unicode files, which means it is now possible for a text file to contain both versions of 437 character 13: ♪ and CR. (Back in the day, this was impossible, since there'd be no way to distinguish them. You'd either interpret all 13s like CR, like Edit, or you'd draw all of them as ♪, like the graphics card.)
- When you enter a character in code page 437, the graphical representation will be used, as noted above. On a lot of computers 437 is the OEM code page.
- Other code pages may not have graphical glyphs in the 1...31 range. The edit control interprets some control codes specially and ignores/swallows the rest (to the best of my knowledge).
@masked: my mistake, I fixed it.

Why 437?

Why is it called 437? — Preceding unsigned comment added by 82.139.87.39 (talk) 08:32, 1 October 2011 (UTC)Reply

IBM picked the number. See here: [2]. The CP numbers are dispensed neither consecutively nor monotonically (CP00425 in the year 2000, CP00437 in 1984, CP00500 in 1986). I guess we'll never know the etymology of "437" unless we ask the person who was in charge of CP numbers in 1984. 85.178.182.94 (talk) 23:43, 7 December 2011 (UTC)Reply
Any idea how we could go about finding the ‘culprit’? ;-) — Preceding unsigned comment added by 82.139.87.39 (talk) 23:47, 27 January 2012 (UTC)Reply
It is tantalising to think that since Code page 37, aka CP 037, was one of the most important EBCDIC code pages, perhaps there might have been some kind of natural progression from Code Page 037 to 437. However, there are problems:
  • It is not entirely clear when CP037 was created. It was modified in 1986, but was it created right when EBCDIC was created in 1963/1964? And if not then, when?
  • It seems there is no evidence of "intermediary fossil" code pages 137, 237, and 337, or at least it appears to not be easy to find such evidence.
  • Instead of, or perhaps in addition to, the hypothesis that CP437 is the fourth successor to CP037, it's also conceivable each digit has or had some meaning to IBM, and that CP037 and CP437 shared two characteristics. I have not seen actual evidence of that though.
  • Even if it were proved that Code Page 437 really is a numerical successor to 037, in some way that's only kicking the can down the road, because then the question arises, yes, but what did 037 stand for? The way CP037 doesn't really solve CP437's origin question is somewhat comparable to the way the extraterrestrial creator hypothesis doesn't really answer the origin of (intelligent) life question, it only transfers it (as interesting as that might be). Similar things could be said about panspermia and "Are we really Martians?" theories. These things might all be true and interesting, but elephants all the way down doesn't actually tell us what's at the bottom.
I'm concerned that many of the old IBMers who might have been best placed to answer these questions are in the process of dying out. Also, due to increasing political partisanship, the odds of those in the know still being not only alive but also willing to assist Wikipedia are shrinking. ReadOnlyAccount (talk) 06:44, 15 October 2023 (UTC)Reply

Alt codes

Alt+XXX combination uses the current OEM codepage, not 437. Alt-237 will produce a plethora of characters not present in this codeset: э, Ί, Ý, etc. which are not φ and not from this codepage. 178.49.152.66 (talk) 11:27, 26 November 2014 (UTC)Reply

Yes, Alt+XXX does convert to other characters in other code pages (if you use modern applications that use Unicode), but on the original IBM PC this was the only code page (the hardware code page). The command prompt in Windows still (by default) uses the characters/glyphs of this code page for Alt codes, using a raster font [1]. Mot256 (talk) 12:30, 26 November 2014 (UTC)

References

You misunderstand what is written there. The raster font used for DOS application windows has the character set of the system default OEM code page, not 437. Pressing Alt+XXX in a non-Unicode application in Windows still results in OEM characters that can be represented in the ANSI code page, and those are not from code page 437, unless it is a US system which has 437 set manually instead of the default 850 or 853 or whatever code page was set by the manufacturer.

In brief: claims that Alt+XXX in Windows operating systems 'is used to enter characters from codepage 437' are not true. Alt+XXX in DOS enters characters without a definite meaning. Since the introduction of the EGA adapter, with its downloadable-font capability, those are interpreted in whichever DOS code page is selected. There is an article about that combination, and it should not be duplicated in the articles for every particular codepage. 178.49.152.66 (talk) 13:33, 26 November 2014 (UTC)

The "default OEM code page" was 437 on virtually every computer made. Stop trying to change this. The main reason 437 is relevant today is due to Alt codes, so this paragraph MUST remain. Alt+237 certainly does produce φ on my machine. I think you are confusing it with Alt+0237. Spitzak (talk) 20:22, 26 November 2014 (UTC)
Okay, more to the point. The average doofus typing on a computer knows this fact: "I type Alt+129 and I will get a ü". This is despite the fact that Windows is using Unicode or CP1252, which uses a different number for this character. This same number works if the code page is 850 or 853 or a whole lot of others. Why? It is because of code page 437. Sure, after the original IBM PC, there were foreign versions manufactured that defaulted to a different set, but those sets were designed to be compatible with 437! Get it? CP 437 is the reason behind these numbers that computer users today still use. Stop deleting this. Spitzak (talk) 20:30, 26 November 2014 (UTC)
Codepage 437 is in no way unique, as there were international codepages since day 0. Moreover, CP 437 was used by the tiny minority of US computers, as all the other European languages defaulted to 850. To state 'the people were using CP437 codes' when most have never seen that codepage is not correct. You are free to modify that paragraph by removing the claims about Alt+XXX combinations producing "CP437 characters" in any version of Windows or DOS.
The statement about 'most' computers using 437 as the OEM codepage is also absurd, as in Windows it is used so only in two locales: US English and Swahili.
I hope your reverts were in good faith and you realize that your viewpoint is not supported by facts and is coming from your blindness to anything outside the USA. 178.49.152.66 (talk) 12:26, 27 November 2014 (UTC)Reply
I would tend to believe that back then US probably had more computers than the rest of the world combined. But that does not change the fact that 437 was not universal. I specifically get the sign Ý, pressing Alt+237 in Windows 7. In general my results match precisely with Code page 850 (although I'm in Denmark, and CP 865 would have made more sense - but I actually seem to remember that the normal DOS CP was 850 back in the late DOS days, but that in the early DOS days before we started talking about codepages (when did the CHCP command arrive?) it was definitely 437. I made several programs which relied on some of the characters which were removed in 850, specifically all the characters which combine single-line and double-line drawing, i.e. ╢). Note: Changing Windows locale, language and keyboard layout to EN_US does not seem to change my results - maybe a reboot is needed, but language and keyboard are usually changed on the fly, so I'm sceptical. Tøpholm (talk) 13:11, 1 June 2015 (UTC)Reply
Come on, guys. It is not a coincidence that all those alternative character sets use the same number for the ü character or for the simpler line-drawing graphics. The reason is that 437 was done first and was on the original PC. Yes, of course some other code pages were developed and there were computers sold with them, but, unless you want to believe in an astronomical coincidence, those sets had about 60% of the same characters in the top 128 as CP 437 because they copied it! Therefore 437 is the reason behind these numbers. They do not match ISO-8859-1 or any other standard, they match 437 (or they match 850 or 865 or whatever, all of which match 437 in the majority of locations). Therefore it is perfectly reasonable to say that the Alt code assignments are a legacy of 437. That does not mean there were not some codes that are different. It means a lot, but not all, of the numbers were decided by the design of 437. Spitzak (talk) 03:23, 2 June 2015 (UTC)
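
The overlap between these code pages can be counted rather than estimated. A rough sketch, assuming Python's built-in `cp437`, `cp850` and `cp865` codecs are faithful to the DOS code pages of the same numbers; `shared_positions` is a hypothetical helper and no particular percentage is claimed here.

```python
def shared_positions(a: str, b: str) -> list[int]:
    """Codes 128..255 that decode to the same character in both code pages."""
    return [
        code
        for code in range(128, 256)
        if bytes([code]).decode(a, errors="replace")
        == bytes([code]).decode(b, errors="replace")
    ]

for other in ("cp850", "cp865"):
    same = shared_positions("cp437", other)
    print(f"cp437 vs {other}: {len(same)} of 128 upper positions agree")
```

Under these codecs, position 129 (ü) is shared with 850, while position 237 is not, which is consistent with the Ý reported for Alt+237 on a CP 850 system above.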

158 Pts?

I've not seen N° 158 as Pts, only Pt 86.148.227.195 (talk) 11:44, 30 April 2015 (UTC)Reply

I have seen both. It depends on the font you're using. --Matthiaspaul (talk) 11:59, 6 June 2016 (UTC)Reply
To my knowledge, all the original MDA/CGA/EGA/VGA DOS fonts used to only render the character as Pt, but many subsequent (Unicode-compatible) fonts use the Pts form, which is why your browser might be showing you the s as part of the actual character here: ₧
ReadOnlyAccount (talk) 01:46, 18 November 2023 (UTC)Reply

Alt codes and leading zero

User:TiredTendencies has claimed that in Windows Alt codes, the leading zero is for typing the codes from CP 437, while no leading zero is for getting newer codes (i.e. CP1252). My impression is that this is incorrect and the opposite is true (leading zero for the new codes, and no leading zero for CP 437), and that this was done on purpose because users had memorized the CP 437 codes; but this is based on the text in Alt codes. Can anybody confirm what the actual situation is? Spitzak (talk) 17:44, 6 September 2016 (UTC)

In Windows 7, Alt(0+code) gives me my current codepage (ISO Latin 1), while Alt(code) gives me CP 437. -- Elphion (talk) 21:24, 6 September 2016 (UTC)Reply
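The behaviour reported above can be modelled in a few lines of Python. This is a sketch, not a description of how Windows implements it: the function name `alt_code` is hypothetical, and Windows-1252 is assumed as the "current codepage" (the actual one depends on the system locale). Note also that Python's `cp437` codec maps bytes 1-31 to control characters, not to the smiley and suit glyphs.

```python
def alt_code(digits: str, oem_codec: str = "cp437", windows_codec: str = "cp1252") -> str:
    """Return the character for Alt+<digits> under the rule reported above:
    a leading zero selects the Windows code page, no leading zero selects the OEM one."""
    value = int(digits)
    if not 0 < value <= 255:
        raise ValueError("this sketch only models codes 1-255")
    codec = windows_codec if digits.startswith("0") else oem_codec
    return bytes([value]).decode(codec)

for keys in ("133", "0133", "224", "0224"):
    print(f"Alt+{keys} -> {alt_code(keys)!r}")
```

A few byte values are unassigned in Windows-1252, so the sketch will raise a decode error for those when a leading zero is used.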

File Formats

The article as it currently stands says: "Many file formats developed at the time of the IBM PC are based on code page 437 as well." I'm pretty familiar with old DOS file formats and I have no idea what this refers to. Unless somebody can point out one file format based on the code page and explain why, I'm inclined to take it out. Jethro 82 (talk) 01:34, 9 September 2019 (UTC)Reply

I think "file formats" is probably the wrong way to say what is meant, namely, that software often assumed that the characters stored in the files were from this codepage. Software also typically assumed that this was the active codepage when presenting material on the screen. The control characters and the box-drawing characters were especially heavily used to format the screen, and characters hard-coded in the software files were correspondingly assumed to display as 437 characters. -- Elphion (talk) 06:45, 9 September 2019 (UTC)Reply

I'm headed out at the moment, but I think that sentence needs to be heavily edited then. I certainly agree that what you say is true. Could you clarify your 9x16 edit - do you mean that nearly all graphical modes for VGA had a width of 640 or 320 pixels but the text mode had one of 720? Given the memory map of the 640 and 16 colour modes that's not the weirdest thing I've ever heard about VGA, but it's still pretty strange. Jethro 82 (talk) 17:45, 9 September 2019 (UTC)Reply

I may not be remembering this right, but I think it was the monochrome adapter on the original PC that had a horizontal resolution of 720, with the 8-bit wide glyphs padded with one extra column of bits that was either blank or a duplicate of the rightmost one, as described in the article. I remember clearly the "Hercules" 3rd-party graphics adapter for the original PC using 720 pixels wide, likely so it would display on the Monochrome monitor. I *think* the VGA used 640 pixels in all modes but there is a good chance I am not remembering that correctly. Spitzak (talk) 19:41, 9 September 2019 (UTC)Reply

The standard 16-color text mode for VGA had 9x16 characters and resolution 720x400. You can see this in the image showing the character set in this article, and it is stated explicitly at Video Graphics Array. The 9th column was typically blank, to serve as an inter-character space, but several of the line-drawing characters use the 9th column so that the lines would connect. I remember this well, because the Logitech text editor I used heavily in EGA displayed annoyingly (because of the extra character box column) when I switched to VGA; I wrote an assembly routine that automatically hooked the intercept vector to change the display to EGA mode whenever I launched the editor. (And of course eventually intercepts went away too, and I had to find a new editor.) -- Elphion (talk) 01:20, 13 September 2019 (UTC)Reply
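The 720x400 figure follows directly from the character cell size. A quick arithmetic sketch, assuming the usual 80-column by 25-row text screen (the 80x25 grid is my assumption; the comment above only gives the cell size and the resolution):

```python
# Assumed text grid: 80 columns by 25 rows.
COLUMNS, ROWS = 80, 25

def text_mode_resolution(cell_width: int, cell_height: int) -> tuple[int, int]:
    """Pixel resolution of a text screen built from fixed-size character cells."""
    return COLUMNS * cell_width, ROWS * cell_height

print(text_mode_resolution(9, 16))  # 9x16 cells, as in the VGA text mode described above
print(text_mode_resolution(8, 16))  # the same grid with 8-pixel-wide cells, for comparison
```

With 8-pixel-wide cells the same grid is 640 pixels across, which is where the 640 versus 720 confusion earlier in this thread comes from.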

Inaccuracy in article

Under the Internationalization section, I see that it says:

Code page 437 has a series of international characters, mainly values 128 to 175 (80<sub>hex</sub> to AF<sub>hex</sub>). However, as it was primarily intended for the [[English language]], it covers only [[English language|English]], [[German language|German]], [[Swedish language|Swedish]] and the pre-1999 [[Turkmen language|Turkmen]] [[Turkmen alphabet|Latin alphabet]] in full, and so lacks several characters important to many Western languages (mostly capital letters):

I actually changed this from the original already (stating that the reason why it doesn't cover many Western languages is because it was primarily intended for English - which may not even be accurate.)

The thing is, I am not sure how we can briefly yet accurately illustrate the fact that in reality, the scripts of some other Western languages (Basque being an example that I found), as well as some non-Western languages (Malay being an example that I found) are covered in full by Code page 437 (or just happen to be). It is not just the 4 languages mentioned (and honestly, Basque is probably as notable, if not more notable than Turkmen, from my limited point of view, being Asian and living in Hong Kong for my whole life and not speaking or learning any of the mentioned languages in this article except English)

Also, Dutch should be mentioned in the bullet point list as it is a semi-major Western language (#10 most speakers according to [3]).

I have added a note in hidden text (removed in the pre tag text above) as well.

Also, another interesting thing we should consider adding to the article (of course, only if we can find sources or actual evidence) is that (unlike most other cases where the encoding is based on the alphabet of a language or languages) it seems like the pre-1999 Turkmen Latin alphabet actually was built around code page 437, which is why it has unusual letters, including currency signs! I don't have any good source except where I found it (on a Wikipedia page: List of Latin script alphabets - see note number 60 about Turkmen, and that statement is unsourced). Hkbusfan (talk) 11:11, 26 April 2020 (UTC)Reply

I agree it is silly to list obscure languages where the set of letters they use happens to be in 437. This was certainly not anything the designers of 437 were thinking of. Mentioning that things like French are missing letters is probably useful, as I suspect the designers at least thought their set would be helpful for some European languages. Spitzak (talk) 18:30, 26 April 2020 (UTC)Reply
Just to let you know, sorry that this is a very long paragraph and I don't know how to split it (if even possible) into smaller paragraphs and add bullet points (which were intended in the part where I listed the changes) but still have indents (when I tried, it added pre format (or what looked like pre format) to everything after the first paragraph). Anyway, what you said is very true. Thanks for the reply. The international characters were added because the designers thought that they were helpful: to quote one of the designers, Dr. David Bradley: "Since we were using 8-bit characters we had 128 new spots to fill. We put serious characters there — three columns of foreign characters, based on our Datamaster experience." (This is in one of the sources of this page.) Unfortunately, the letters were not really that helpful in practice (I'm guessing/assuming). I guess that they could often use lowercase to substitute uppercase (which would work fully in some of the languages listed: Spanish, Catalan, and Italian), but that would probably look sloppy (well, I don't know any of those languages but can imagine how it would be sloppy). However, I don't think that Malay, Basque or Dutch (the first 2 of which are supported fully by code page 437) are exactly "obscure" (were you talking about Turkmen or these 3 languages?). Maybe Malay should not be mentioned as it is an Asian language (and this character set was probably meant for use in Western languages, at least originally). At least Dutch should be mentioned in the section under languages with missing important letters, as it is a "major" Western language (though I read it rarely uses diacritics). And Turkmen may not really be notable enough to get a mention as a fully-supported language (though maybe that was mentioned in the article due to its possible link to the code page - as in the fact that it uses letters that are not expected for a language, including £ and ¢!).
Do you think that we should remove the explicit mention of Turkmen and reword that sentence making it say that the mentioned languages are not all of the supported languages (or maybe just say "among Western languages" or something similar)? Here is what I plan to change some time today or in the next few days: Reword it so it says something like "Only a few Western languages are supported, including...", add Basque and remove Turkmen (not adding Malay, because it is not Western OR European), maybe change "Western" to "Western European" in the sentence (if I do this, my rewording will use the term "Western European" too), and finally add Dutch to the "Western" languages for which code page 437 lacks important letters. After doing that, I may add Turkmen back (maybe not in the same sentence, and maybe not in the next few days) if I can find a detailed reliable source (not including that note I saw on the List of Latin-script alphabets page which doesn't link to an external source) stating that the Turkmen alphabet was intentionally designed with code page 437 in mind. Do you think the aforementioned changes are fine? And sorry, there were 2 typos (that I am aware of) in my text. First, I spelt "scripts" with 2 c's. Second, I forgot the hyphen in List of Latin-script alphabets making it a red link. By the way, I was also the one who added the capital letter mention to that sentence. Hkbusfan (talk) 09:33, 29 April 2020 (UTC)Reply


For French, the characters included were actually very carefully chosen: A wide range of lower-case diacritic letters, and the two upper-case diacritic letters which could be considered most essential (C-cedilla and E-acute). It's not as if they didn't bother with French... AnonMoos (talk) 14:16, 30 April 2020 (UTC)Reply
This is why the paragraph should be fixed. The designers obviously *intended* to support French, but a few symbols used by French are missing. They almost certainly did not intend to support Basque/Malay/Turkmen, despite the fact that the resulting set just happens to cover those languages (partially because at least the last one was *designed* to fit in this character set!). Spitzak (talk) 16:05, 30 April 2020 (UTC)Reply
I agree with these points. They did actually try supporting French (and other languages), but not fully. Hkbusfan (talk) 00:38, 3 May 2020 (UTC)Reply
If one looks at https://en.wiki.x.io/wiki/ISO/IEC_646 and compares with https://en.wiki.x.io/wiki/Code_page_437#Internationalization one sees that all small French characters from IEC 646 are supported. That œ is not supported in CP437 could be because it was not included in the 7-bit fonts for French. Mikael4u (talk) 13:32, 27 July 2023 (UTC)Reply
Finnish is also fully supported by CP437. This is perhaps important since Linux is from Finland, and they had no problem with the code page in the ROM on the early i386 PCs (CP437). The standard Finnish keyboard layout is also identical to the Swedish one in terms of what is printed on the keys. Finnish is a Uralic language. Mikael4u (talk) 12:19, 27 July 2023 (UTC)Reply
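Coverage claims like the ones in this thread are easy to test mechanically. The sketch below uses Python's `cp437` codec and some hand-picked sample letters per language; the letter lists are my own illustrative picks, not authoritative alphabets, and the function name `missing_from_cp437` is made up.

```python
def missing_from_cp437(letters: str) -> str:
    """Return the characters from `letters` that cannot be encoded in CP437."""
    missing = []
    for ch in letters:
        try:
            ch.encode("cp437")
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)

# Illustrative samples of the non-ASCII letters each language uses (assumption, not exhaustive).
SAMPLES = {
    "German": "äöüÄÖÜß",
    "Swedish/Finnish": "åäöÅÄÖ",
    "Spanish": "áéíóúüñÁÉÍÓÚÜÑ¿¡",
    "French": "çéàèùâêîôûëïüÇÉÀÈŒœ",
}

for language, letters in SAMPLES.items():
    gaps = missing_from_cp437(letters)
    print(f"{language}: {'fully covered' if not gaps else 'missing ' + gaps}")
```

For the samples above, the gaps that show up are capital letters plus the œ/Œ ligature, which is consistent with the "mostly capital letters" wording quoted from the article.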

This page used to be useful

I regularly work on a software project that uses 437 code points. I used to be able to easily use this page for reference, since it had the characters and their byte values visually together, without involving extra work and opportunities for human error. Now, not only have the updates to the table made it obnoxious to work with, I can't even use a useful version back in the page history, since templates it depended on have been deleted. Thanks, I hate it. —chaos5023 (talk) 07:09, 23 January 2022 (UTC)Reply

Make the decimal alt codes more accessible!

@Gonnym, Spitzak, and John Maynard Friedman — Unlike the present version of code page 437, this version by Gonnym displays the decimal Alt code in each box. I hope that I speak for many who regularly use the Alt codes; this is far more convenient than calling up a tooltip, especially for those of us who lack precise control of a computer mouse due to a history of e.g., stroke.

The same point applies to Code page 1252.

Peter Brown (talk) 23:54, 13 February 2022 (UTC)Reply

Please restore the text on alt code that says "often this is the only method a user knows to enter a character", as these comments are proof of that fact. Spitzak (talk) 00:01, 14 February 2022 (UTC)Reply
Please restore Gonnym's table, which you deleted here.
Your proposed text is untrue. Most marginally computer-literate anglophone folks enter lower-case characters in the Latin alphabet by depressing a key labelled, by the manufacturer, with the corresponding upper-case character. They often have no idea what the Alt key can be used for.
Peter Brown (talk) 00:53, 14 February 2022 (UTC)Reply
WTF? I didn't say *all* characters, I said *a* character. For instance the ohm sign, or a box drawing corner. It is obvious that people do not know any other method of entering these characters, because otherwise there would not be the above request to make looking up the number easier. Spitzak (talk) 20:05, 14 February 2022 (UTC)Reply
Okay, replacing "a character" by "some characters" yields a statement that is at least plausible. Is it true? I think that unlikely.
To take your example, suppose someone wants to enter the ohm sign. A simple way is to call up the Wikipedia Ohm article, copy the Ω symbol from the infobox, and paste it into the target document. Suppose, though, that a user somehow (divine revelation?) determines that the decimal code for the ohm sign is 937. Is this information useful? Will entering Alt+0937 produce a Ω?
It depends. If, for example, the user is trying to compose a message using Gmail, this procedure will produce, not Ω, but the copyright symbol ©, which has a decimal code point of 169. Why? Because, in Gmail as in most Windows applications, alt codes over 255 are interpreted modulo 256 and mod(937,256) = 169. Only in a very restricted range of applications, which does include Microsoft Word, will Alt+0937 produce Ω. If there indeed are users with a need to produce the ohm sign in a more typical application but whose only method of producing it is by means of an alt code, they are in an impossible situation.
I expect that it is a rare user for whom alt codes are the only method known to enter some characters that she actually has an interest in entering. In the case of the ohm sign, the Wikipedia technique is fast and reliable and will work in a broad range of applications.
Peter Brown (talk) 05:00, 15 February 2022 (UTC)Reply
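The modulo arithmetic in the comment above can be reproduced as follows. This is a sketch of the described behaviour only: the function name is hypothetical, and Windows-1252 is assumed as the code page that the reduced value is looked up in.

```python
def alt_zero_code_truncated(code: int, codec: str = "cp1252") -> str:
    """Model an application that reduces Alt+0<code> modulo 256 before lookup."""
    return bytes([code % 256]).decode(codec)

print(937 % 256)                             # the reduced value from the comment above
print(repr(alt_zero_code_truncated(937)))    # what the truncating application would insert
print(repr(chr(937)))                        # the character at Unicode code point 937
```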
I hate to be bitchy here but I'm tempted to believe that a more accurate version of the disputed statement would be "often this is the only method people of a certain age know [how] to enter a non-keyboard character". (FWIW, on my Chromebook with UK-extended, Ω is AltGr+⇧ Shift+q but I just got lucky. You want an en-dash? We don't got no en-dash but we have an All-American em-dash on AltGr+,, surely you can use that? So I write {{ndash}} or use Peter's technique and copy/paste it.) Anyway, coming back to the main debate, I'm afraid I have to agree with Peter: practically no-one uses Alt codes outside of very specialised contexts. --John Maynard Friedman (talk) 00:14, 17 February 2022 (UTC)Reply
If "no one uses Alt codes" then no one would be complaining about the lack of the codes on this page, because no one would ever look for them. The fact that this talk section exists is proof that people use Alt codes. In fact it proves they are more likely to know how to use them than cut & paste, since all the characters are provided for cut & paste right on the same Wikipedia page they are complaining is lacking this information! Spitzak (talk) 00:23, 17 February 2022 (UTC)Reply
I think you are mistaking "people who write Wikipedia articles about once-upon-a-time-in-computing" with the more typical visitor who has no interest in such things. And if they found something in an article that says "how to type this character" and followed it to Alt code, the craziness of OEM code pages and Windows code pages alchemy are enough to make any sane person look for a more sensible technique. --John Maynard Friedman (talk) 00:42, 17 February 2022 (UTC)Reply
Calm down, everybody. My son has married a Vietnamese woman named "Hà". She prefers that her name be rendered including the grave accent. "How do I do that in my correspondence?" my wife asked. As she always has Num Lock on, I told her to produce the à by holding down the Alt key while typing "1 3 3" on the keypad and then releasing Alt. Score one for domestic and intergenerational harmony. As it happened, I had anticipated the question and prepared my answer, but it would have been easier if the chart in Code Page 437 included alt codes.  — Peter Brown (talk) 03:06, 17 February 2022 (UTC)Reply
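Until the article table shows decimal codes again, a plain-text reference of the kind requested in this section can be generated locally. A minimal sketch, assuming Python's `cp437` codec matches the characters people expect from unprefixed Alt codes; the function name `alt_code_table` is made up, and only codes 128-255 are listed because the codec maps 0-31 to control characters instead of the graphic glyphs.

```python
def alt_code_table(first: int = 128, last: int = 255, codec: str = "cp437") -> dict[int, str]:
    """Map each decimal code in [first, last] to its character in the given code page."""
    return {code: bytes([code]).decode(codec) for code in range(first, last + 1)}

table = alt_code_table()

# Print eight entries per line, decimal code next to its character.
for start in range(128, 256, 8):
    print("   ".join(f"{code:3d} {table[code]}" for code in range(start, start + 8)))
```

Looking up `table[133]` gives the à from the example above.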

Usage in Dwarf Fortress

Maybe it would be interesting to mention that this code page is the current base of Dwarf Fortress 's tileset. AdalwinAmillion (talk) 09:49, 20 March 2022 (UTC)Reply

I'm pretty certain this is the "base" of virtually every character-based game designed for the IBM PC. Spitzak (talk) 16:37, 20 March 2022 (UTC)Reply
The game was not designed for the IBM PC. AdalwinAmillion (talk) 21:15, 20 March 2022 (UTC)Reply
But many were and they used 437's semigraphics to great effect. Dwarf Fortress was designed to mimic that particular aesthetic for which using 437 was a good fit. You could try to pass it as a cultural impact (the infamous ‘in popular culture’ section; weird thing to have in an article about a code page, but for 437 it's imaginable), but I don't think sufficient significance could be established. Are there any secondary and tertiary sources describing how its use of 437 rather than any other character set was an important part of its success? If all they do is mention that this is what it uses, then it's tangentially related trivia at best. Trivia sections are discouraged. – MwGamera (talk) 11:58, 21 March 2022 (UTC)Reply

Probably lower case theta

Theta (0xE9=233) should probably be lower case theta θ, and not as now upper case theta Θ. The reason is that these symbols are for mathematics and physics and lower case theta is much more used in e.g. polar coordinates. In upper case theta the bar (-) is separated from the O, but it's not in the early computer fonts, since it is a lower case theta. Nowadays it is, but it is because this misunderstanding has spread. Mikael4u (talk) 00:01, 27 July 2023 (UTC)Reply

Sounds right, I added a note to the table copying your text. Spitzak (talk) 04:13, 27 July 2023 (UTC)Reply
You say that IBM claims 0xE9 is upper case theta, but it seems to be Microsoft that does this, see "0xe9 0x0398 #GREEK CAPITAL LETTER THETA" in https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT . IBM designed CP437 and the purpose was probably that 0xE9 should be lower case theta as that is much more useful in science. Mikael4u (talk) 13:45, 27 July 2023 (UTC)Reply
Yea I have no idea where that unicode code point comes from, I just copied the other footnotes, all of which say "IBM says...". Spitzak (talk) 15:55, 27 July 2023 (UTC)Reply
See the IBM references linked in the article. This might be a multiple-use character like many others, but it is IBM that claims its GCGID is GT620000 "Theta Capital" and the image in CP00437.pdf shows an upper case theta with separated stroke inside just like you describe. – MwGamera (talk) 12:44, 3 August 2023 (UTC)Reply
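For what it is worth, the capital-theta mapping discussed above is also what Python's built-in `cp437` codec uses, so software relying on it will round-trip 0xE9 as Θ and not θ. A small check; that the codec follows the vendor mapping file linked above is my assumption, the code only shows what the codec does:

```python
import unicodedata

# Decode byte 0xE9 (decimal 233) with Python's cp437 codec and name the result.
ch = bytes([0xE9]).decode("cp437")
print(f"0xE9 -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# The lower-case theta has no slot of its own in this codec's table,
# so encoding it falls back to the replacement byte.
print("θ".encode("cp437", errors="replace"))
```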

Misleading: 9×16 and "9×14 pixels-per-character font stored in the ROM"

The first paragraph in the Display adapters section has issues and contradicts information elsewhere in the article. The problems are the claims about the 9px width of some of the fonts. While it is correct that in certain modes these fonts' characters are rendered in a 9x14 and 9x16 matrix, respectively, none of these fonts have any 9px wide characters actually stored in the ROM. What's stored is one byte per character line – 8 bits and pixels. That a ninth column is displayed onscreen is a function of logic built into the relevant graphics cards (mostly MDA and VGA). That extra column is mostly empty, except for characters C0–DF as explained here. While it is just arguable the current article phrasing is not strictly incorrect (on grounds of the "9×14 pixels-per-character font stored in the ROM" being a font for the 9×14 character matrix, but not an actual 9x14px font), that defence is really stretching it, and at the very least, the present phrasing is highly misleading. —ReadOnlyAccount (talk) 02:19, 18 November 2023 (UTC)Reply
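The distinction drawn above (8 bits stored per glyph row, a ninth column added at display time) can be illustrated with a short sketch. It assumes, as described in this thread, that the ninth column repeats the eighth for characters C0-DF and is blank otherwise; the function name `expand_row` and the `#`/`.` rendering are just for illustration, and real adapters do this in hardware.

```python
def expand_row(row_byte: int, char_code: int) -> str:
    """Render one stored 8-bit glyph row as the 9 pixels shown on screen."""
    # Most significant bit is the leftmost pixel.
    bits = [(row_byte >> (7 - i)) & 1 for i in range(8)]
    # Ninth column: copy of the eighth for codes C0-DF, otherwise blank.
    ninth = bits[7] if 0xC0 <= char_code <= 0xDF else 0
    return "".join("#" if bit else "." for bit in bits + [ninth])

print(expand_row(0xFF, 0xC4))  # a full row of a line-drawing character: the line continues
print(expand_row(0xFF, 0x41))  # the same stored byte for a letter: ninth column stays blank
```

Either way, the ROM holds one byte per row; the ninth pixel never exists in the stored font data.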

csPC8CodePage437RTFMBBQROFLMAO

It very slightly bothers me that the article includes the "csPC8CodePage437" moniker (which is in some source, but I have no idea how common that concatenated and camelCased name ever was – sus), but then leaves that pseudorandom-seeming string unexplained. One really has to dig through the cited source to find explanations (in different places) that make it clear that cs is for character set and PC8 for PC 8-bit. (There is an entry further down the page that annotates "PC8-Danish-Norwegian" as being the "PC Danish Norwegian 8-bit PC set".) —ReadOnlyAccount (talk) 19:28, 3 December 2023 (UTC)Reply

" ⁿ " (U+207F, 0xFC in table) redirects to phonetics article

Given that the placement indicates a primarily mathematical usage, this redirect without a note seems a bit awkward. It seems the character used to redirect to Unicode subscripts and superscripts up until recently, which seems more appropriate here. Nethy00 (talk) 23:07, 14 January 2024 (UTC)Reply
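The "placement" argument can be seen by listing the character's neighbours in the code page. A quick sketch using Python's `cp437` codec (assumed here to match the article's table):

```python
import unicodedata

# Show the characters around 0xFC together with their Unicode names.
for code in range(0xFB, 0xFE):
    ch = bytes([code]).decode("cp437")
    print(f"0x{code:02X} -> U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

The superscript n sits between a square-root sign and a superscript two, which fits the mathematical reading.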