
The SemWare Editor

Code Pages

  Index


Intro

Querying and setting the Windows code page

Tested system locales and their code page

Code pages side by side




  Intro


The following is a simplification, and is limited in its viewpoint by only looking at languages of Western European origin.

People use characters to read and write. Computers use numbers. Therefore people assigned numbers to characters.

Because computers were extremely expensive, the initial trend was to use only 1-byte (8-bit) numbers per character, enforcing a maximum of 256 characters per character set.

For languages that started as Western European, 256 characters were enough for ordinary reading and writing. Other languages are outside the scope of this document.

The first 128 characters of each character set were standardized under the name "ASCII", and are the same for all such languages.
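
The shared ASCII half can be checked with a few lines of Python. This is only a sketch: it assumes Python's built-in codecs "cp437", "cp850" and "cp1252" as stand-ins for three common 1-byte character sets, and those codecs map bytes 0 to 31 to control codes rather than to the graphic symbols some tables show for them.

```python
# Three 1-byte character sets, by their Python codec names (an assumption of this sketch).
CODECS = ("cp437", "cp850", "cp1252")

# Bytes 0..127 should decode to the same ASCII text in every set.
low_half = bytes(range(128))
ascii_text = low_half.decode("ascii")
same_low_half = all(low_half.decode(codec) == ascii_text for codec in CODECS)


def differing_bytes(codec_a, codec_b):
    """List the bytes 128..255 that the two codecs decode to different characters."""
    return [b for b in range(128, 256)
            if bytes([b]).decode(codec_a) != bytes([b]).decode(codec_b)]


print("lower half identical:", same_low_half)
print("upper-half bytes that differ between cp437 and cp850:",
      [f"{b:X}" for b in differing_bytes("cp437", "cp850")])
```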

Initially each country created its own character set, in which the last 128 characters differed per country and language. Nowadays, in practice, the huge number of 1-byte character sets has been reduced to 3 mainly used ones.

A set of characters with numbers assigned to them is also referred to as a code page, and the set is identified by a code page number.
The mainly used 1-byte code pages are 437, 850 and 1252; they are compared side by side at the end of this page.

TSE natively only supports 1-byte character encodings. For multi-byte character encodings it shows multiple "garbage" characters per source character.
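
The "garbage" effect is easy to reproduce outside TSE. The following Python sketch only illustrates the principle, it is not how TSE works internally: it encodes two characters as UTF-8 and then reads the resulting bytes as if they were code page 1252.

```python
def read_utf8_as_cp1252(text):
    """Encode text as UTF-8, then misread those bytes as code page 1252."""
    return text.encode("utf-8").decode("cp1252")


for ch in ("é", "€"):
    garbled = read_utf8_as_cp1252(ch)
    print(f"{ch!r} is stored as {len(ch.encode('utf-8'))} bytes "
          f"and shows up as {garbled!r}")

# The damage is reversible as long as no byte was lost or replaced:
restored = read_utf8_as_cp1252("é").encode("cp1252").decode("utf-8")
```

One source character becomes two or three 1-byte characters, which matches what a 1-byte editor shows for a UTF-8 file.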

The Console variant of TSE uses the code page of its environment for showing edited text. Internally it expects code page 437 when displaying its borders and help screens, which causes a few wrong characters to be shown there in other code pages.

The GUI and Linux variants of TSE use code page 1252, whatever the externally set code page is.

GUI TSE can use extensions to handle Unicode files.

Windows itself used to internally default to code page 1252 (ANSI), and nowadays internally defaults to UTF-16LE. This is why Windows supports "A" and "W" versions of its APIs and data types.

The Windows system locale dialog has a "Unicode UTF-8" checkbox, which makes command prompts use code page 65001 (aka UTF-8).

Windows Office documents internally use UTF-8.

The web mostly uses UTF-8.

"UTF-…" are names for Unicode character encodings. Each Unicode character encoding refers to a different implementation of (read: bytes used for) the same Unicode character set. To refer to a character independent of its encoding, it is ubiquitously referred to by its Unicode administrative number, called a "code point".

Unicode character encodings can also be referred to by a code page number:

Code page number   Unicode name   Bytes per character (*)
65001              UTF-8          1 to 4
1200               UTF-16LE       2 or 4
1201               UTF-16BE       2 or 4
12000              UTF-32LE       4
12001              UTF-32BE       4

(*) depending on the character.
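
The byte counts can be verified with Python's standard codecs. A small sketch; the four sample characters are my own choice, picked so that each UTF-8 length from 1 to 4 occurs.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
SAMPLES = ("A", "é", "€", "😀")  # illustrative characters, not from the table


def byte_counts(ch):
    """Map each Unicode encoding to the number of bytes it needs for ch."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch in SAMPLES:
    # ord() gives the code point, which is the same in every encoding.
    print(f"U+{ord(ch):04X}", byte_counts(ch))
```

The code point printed in front of each line stays the same; only the bytes used to store it differ per encoding.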

A code page setting is independent of the "Windows display language" setting. For example, you can totally garble a command prompt program's output by trying to make it show a French file that has accented characters while using a US English code page.
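
The French example can be simulated in Python. This sketch assumes the file was saved in code page 1252 and that the prompt uses code page 437; the sample text is made up.

```python
french = "café déjà"

# The file on disk: one byte per character, in code page 1252.
stored = french.encode("cp1252")

# What a code page 437 prompt shows for those same bytes.
shown = stored.decode("cp437")

# A proper conversion goes through the characters, not the bytes.
converted = stored.decode("cp1252").encode("cp437")

print(shown)                      # accented letters turn into Greek letters
print(converted.decode("cp437"))  # readable again
```

The conversion only works for characters that exist in both code pages; Python raises UnicodeEncodeError for the others.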




  Querying and setting the Windows code page


In Windows you can change the code page at 2 levels:

Locally:

You can query a command prompt's code page with the command "chcp" without parameters.

You can change a command prompt's code page with the command "chcp <code page number>". For example:
      chcp 437

You cannot change a running TSE instance's own code page from within that instance.
You can query it with the TSE macro command "Query(CodePage)".

When you start a .bat or .cmd file or a command from TSE, you have the option to begin that file or command with a "chcp <code page number>" command to set the desired code page for the output.

Unfortunately, lots of prompt commands that I typically use ignore the local code page and use the global code page instead, while some of them do adhere to the local code page.
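
When a script needs to know which code pages are in effect, they can also be read programmatically. The sketch below assumes Python with ctypes and the Win32 functions GetConsoleCP, GetConsoleOutputCP, GetACP and GetOEMCP. Mapping them onto this page's terms is my own assumption: the console values are what "chcp" reports and changes (the local code page), while the ANSI and OEM values follow the system locale (the global code page).

```python
import sys


def windows_code_pages():
    """Return the active Windows code pages, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here so the sketch also loads on other systems
    kernel32 = ctypes.windll.kernel32
    return {
        "console_input": kernel32.GetConsoleCP(),         # 0 if no console is attached
        "console_output": kernel32.GetConsoleOutputCP(),  # 0 if no console is attached
        "ansi": kernel32.GetACP(),
        "oem": kernel32.GetOEMCP(),
    }


print(windows_code_pages())
```

A program that ignores the local code page would typically behave as if only the "ansi" or "oem" value existed.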

Globally:

You cannot globally change the code page directly.

You have to change the "system locale", which has a "Language (Country)" format, and then Windows secretly sets its global code page based on that.

In Windows 10 you can set the system locale with Settings -> Time & Language -> Language -> Administrative language settings -> Change system locale.

Caveat:
If you check "Beta: Use Unicode UTF-8 for worldwide language support", then this overrules the system locale's code page, and sets the code page to 65001 (Unicode's UTF-8).
This is an interesting option, but be aware of its impact on all command prompts and spawned command files.

Because many system locales share the same code page, often nothing happens when you change the system locale. When the new system locale does change the code page, you will be asked to restart Windows to activate the change.




  Tested system locales and their code page


System Locale               Code Page
Danish (Denmark)            850
Dutch (Belgium)             850
Dutch (Netherlands)         850
English (Australia)         850
English (Canada)            850
English (New Zealand)       850
English (United Kingdom)    850
English (United States)     437
French (Belgium)            850
French (Canada)             850
French (France)             850
German (Austria)            850
German (Germany)            850
German (Liechtenstein)      850
German (Luxembourg)         850
German (Switzerland)        850
Norwegian (Bokmål)          850
Norwegian (Nynorsk)         850
Portuguese (Brazil)         850
Portuguese (Portugal)       850
Spanish (Chile)             850
Spanish (Latin America)     850
Spanish (Mexico)            850
Spanish (Spain)             850
Spanish (United States)     850



  Code pages side by side


Click a code page's header to view additional info on Wikipedia.

A "code point" is Unicode's administrative number for a character.
It is great for uniquely identifying a character across code pages and Unicode encodings.
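
A row of such a comparison can be computed with Python's built-in codecs. This is a sketch with a hypothetical helper, table_row; note that Python maps bytes 0 to 31 and 127 to control codes in all three code pages, so for those bytes its answers differ from the graphic symbols listed for code pages 437 and 850.

```python
CODE_PAGES = ("cp437", "cp850", "cp1252")


def table_row(byte_value):
    """Return (decimal, hex, [(character, code point) per code page]) for one byte."""
    cells = []
    for codec in CODE_PAGES:
        try:
            ch = bytes([byte_value]).decode(codec)
            cells.append((ch, f"{ord(ch):04X}"))
        except UnicodeDecodeError:
            # The byte has no character assigned in this code page.
            cells.append(("UNUSED", None))
    return (byte_value, f"{byte_value:X}", cells)


print(table_row(0x9B))
print(table_row(0x81))
```

Looking up the code point, rather than the byte, is what tells you that the "é" at byte 82 in code page 437 and the "é" at byte E9 in code page 1252 are the same character.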

 

Each row gives the byte number in decimal and hexadecimal, followed per code page by the character and its code point.

Dec  Hex  Code Page 437  Code Page 850  Code Page 1252
0  0  NUL 0000  NUL 0000  NUL 0000
1  1  ☺ 263A  ☺ 263A  SOH 0001
2  2  ☻ 263B  ☻ 263B  STX 0002
3  3  ♥ 2665  ♥ 2665  ETX 0003
4  4  ♦ 2666  ♦ 2666  EOT 0004
5  5  ♣ 2663  ♣ 2663  ENQ 0005
6  6  ♠ 2660  ♠ 2660  ACK 0006
7  7  • 2022  • 2022  BEL 0007
8  8  ◘ 25D8  ◘ 25D8  BS 0008
9  9  ○ 25CB  ○ 25CB  HT 0009
10  A  ◙ 25D9  ◙ 25D9  LF 000A
11  B  ♂ 2642  ♂ 2642  VT 000B
12  C  ♀ 2640  ♀ 2640  FF 000C
13  D  ♪ 266A  ♪ 266A  CR 000D
14  E  ♫ 266B  ♫ 266B  SO 000E
15  F  ☼ 263C  ☼ 263C  SI 000F
16  10  ► 25BA  ► 25BA  DLE 0010
17  11  ◄ 25C4  ◄ 25C4  DC1 0011
18  12  ↕ 2195  ↕ 2195  DC2 0012
19  13  ‼ 203C  ‼ 203C  DC3 0013
20  14  ¶ 00B6  ¶ 00B6  DC4 0014
21  15  § 00A7  § 00A7  NAK 0015
22  16  ▬ 25AC  ▬ 25AC  SYN 0016
23  17  ↨ 21A8  ↨ 21A8  ETB 0017
24  18  ↑ 2191  ↑ 2191  CAN 0018
25  19  ↓ 2193  ↓ 2193  EM 0019
26  1A  → 2192  → 2192  SUB 001A
27  1B  ← 2190  ← 2190  ESC 001B
28  1C  ∟ 221F  ∟ 221F  FS 001C
29  1D  ↔ 2194  ↔ 2194  GS 001D
30  1E  ▲ 25B2  ▲ 25B2  RS 001E
31  1F  ▼ 25BC  ▼ 25BC  US 001F
32  20  SP 0020  SP 0020  SP 0020
33  21  ! 0021  ! 0021  ! 0021
34  22  " 0022  " 0022  " 0022
35  23  # 0023  # 0023  # 0023
36  24  $ 0024  $ 0024  $ 0024
37  25  % 0025  % 0025  % 0025
38  26  & 0026  & 0026  & 0026
39  27  ' 0027  ' 0027  ' 0027
40  28  ( 0028  ( 0028  ( 0028
41  29  ) 0029  ) 0029  ) 0029
42  2A  * 002A  * 002A  * 002A
43  2B  + 002B  + 002B  + 002B
44  2C  , 002C  , 002C  , 002C
45  2D  - 002D  - 002D  - 002D
46  2E  . 002E  . 002E  . 002E
47  2F  / 002F  / 002F  / 002F
48  30  0 0030  0 0030  0 0030
49  31  1 0031  1 0031  1 0031
50  32  2 0032  2 0032  2 0032
51  33  3 0033  3 0033  3 0033
52  34  4 0034  4 0034  4 0034
53  35  5 0035  5 0035  5 0035
54  36  6 0036  6 0036  6 0036
55  37  7 0037  7 0037  7 0037
56  38  8 0038  8 0038  8 0038
57  39  9 0039  9 0039  9 0039
58  3A  : 003A  : 003A  : 003A
59  3B  ; 003B  ; 003B  ; 003B
60  3C  < 003C  < 003C  < 003C
61  3D  = 003D  = 003D  = 003D
62  3E  > 003E  > 003E  > 003E
63  3F  ? 003F  ? 003F  ? 003F
64  40  @ 0040  @ 0040  @ 0040
65  41  A 0041  A 0041  A 0041
66  42  B 0042  B 0042  B 0042
67  43  C 0043  C 0043  C 0043
68  44  D 0044  D 0044  D 0044
69  45  E 0045  E 0045  E 0045
70  46  F 0046  F 0046  F 0046
71  47  G 0047  G 0047  G 0047
72  48  H 0048  H 0048  H 0048
73  49  I 0049  I 0049  I 0049
74  4A  J 004A  J 004A  J 004A
75  4B  K 004B  K 004B  K 004B
76  4C  L 004C  L 004C  L 004C
77  4D  M 004D  M 004D  M 004D
78  4E  N 004E  N 004E  N 004E
79  4F  O 004F  O 004F  O 004F
80  50  P 0050  P 0050  P 0050
81  51  Q 0051  Q 0051  Q 0051
82  52  R 0052  R 0052  R 0052
83  53  S 0053  S 0053  S 0053
84  54  T 0054  T 0054  T 0054
85  55  U 0055  U 0055  U 0055
86  56  V 0056  V 0056  V 0056
87  57  W 0057  W 0057  W 0057
88  58  X 0058  X 0058  X 0058
89  59  Y 0059  Y 0059  Y 0059
90  5A  Z 005A  Z 005A  Z 005A
91  5B  [ 005B  [ 005B  [ 005B
92  5C  \ 005C  \ 005C  \ 005C
93  5D  ] 005D  ] 005D  ] 005D
94  5E  ^ 005E  ^ 005E  ^ 005E
95  5F  _ 005F  _ 005F  _ 005F
96  60  ` 0060  ` 0060  ` 0060
97  61  a 0061  a 0061  a 0061
98  62  b 0062  b 0062  b 0062
99  63  c 0063  c 0063  c 0063
100  64  d 0064  d 0064  d 0064
101  65  e 0065  e 0065  e 0065
102  66  f 0066  f 0066  f 0066
103  67  g 0067  g 0067  g 0067
104  68  h 0068  h 0068  h 0068
105  69  i 0069  i 0069  i 0069
106  6A  j 006A  j 006A  j 006A
107  6B  k 006B  k 006B  k 006B
108  6C  l 006C  l 006C  l 006C
109  6D  m 006D  m 006D  m 006D
110  6E  n 006E  n 006E  n 006E
111  6F  o 006F  o 006F  o 006F
112  70  p 0070  p 0070  p 0070
113  71  q 0071  q 0071  q 0071
114  72  r 0072  r 0072  r 0072
115  73  s 0073  s 0073  s 0073
116  74  t 0074  t 0074  t 0074
117  75  u 0075  u 0075  u 0075
118  76  v 0076  v 0076  v 0076
119  77  w 0077  w 0077  w 0077
120  78  x 0078  x 0078  x 0078
121  79  y 0079  y 0079  y 0079
122  7A  z 007A  z 007A  z 007A
123  7B  { 007B  { 007B  { 007B
124  7C  | 007C  | 007C  | 007C
125  7D  } 007D  } 007D  } 007D
126  7E  ~ 007E  ~ 007E  ~ 007E
127  7F  DEL 007F  ⌂ 2302  DEL 007F
128  80  Ç 00C7  Ç 00C7  € 20AC
129  81  ü 00FC  ü 00FC  UNUSED
130  82  é 00E9  é 00E9  ‚ 201A
131  83  â 00E2  â 00E2  ƒ 0192
132  84  ä 00E4  ä 00E4  „ 201E
133  85  à 00E0  à 00E0  … 2026
134  86  å 00E5  å 00E5  † 2020
135  87  ç 00E7  ç 00E7  ‡ 2021
136  88  ê 00EA  ê 00EA  ˆ 02C6
137  89  ë 00EB  ë 00EB  ‰ 2030
138  8A  è 00E8  è 00E8  Š 0160
139  8B  ï 00EF  ï 00EF  ‹ 2039
140  8C  î 00EE  î 00EE  Œ 0152
141  8D  ì 00EC  ì 00EC  UNUSED
142  8E  Ä 00C4  Ä 00C4  Ž 017D
143  8F  Å 00C5  Å 00C5  UNUSED
144  90  É 00C9  É 00C9  UNUSED
145  91  æ 00E6  æ 00E6  ‘ 2018
146  92  Æ 00C6  Æ 00C6  ’ 2019
147  93  ô 00F4  ô 00F4  “ 201C
148  94  ö 00F6  ö 00F6  ” 201D
149  95  ò 00F2  ò 00F2  • 2022
150  96  û 00FB  û 00FB  – 2013
151  97  ù 00F9  ù 00F9  — 2014
152  98  ÿ 00FF  ÿ 00FF  ˜ 02DC
153  99  Ö 00D6  Ö 00D6  ™ 2122
154  9A  Ü 00DC  Ü 00DC  š 0161
155  9B  ¢ 00A2  ø 00F8  › 203A
156  9C  £ 00A3  £ 00A3  œ 0153
157  9D  ¥ 00A5  Ø 00D8  UNUSED
158  9E  ₧ 20A7  × 00D7  ž 017E
159  9F  ƒ 0192  ƒ 0192  Ÿ 0178
160  A0  á 00E1  á 00E1  NBSP 00A0
161  A1  í 00ED  í 00ED  ¡ 00A1
162  A2  ó 00F3  ó 00F3  ¢ 00A2
163  A3  ú 00FA  ú 00FA  £ 00A3
164  A4  ñ 00F1  ñ 00F1  ¤ 00A4
165  A5  Ñ 00D1  Ñ 00D1  ¥ 00A5
166  A6  ª 00AA  ª 00AA  ¦ 00A6
167  A7  º 00BA  º 00BA  § 00A7
168  A8  ¿ 00BF  ¿ 00BF  ¨ 00A8
169  A9  ⌐ 2310  ® 00AE  © 00A9
170  AA  ¬ 00AC  ¬ 00AC  ª 00AA
171  AB  ½ 00BD  ½ 00BD  « 00AB
172  AC  ¼ 00BC  ¼ 00BC  ¬ 00AC
173  AD  ¡ 00A1  ¡ 00A1  SHY 00AD
174  AE  « 00AB  « 00AB  ® 00AE
175  AF  » 00BB  » 00BB  ¯ 00AF
176  B0  ░ 2591  ░ 2591  ° 00B0
177  B1  ▒ 2592  ▒ 2592  ± 00B1
178  B2  ▓ 2593  ▓ 2593  ² 00B2
179  B3  │ 2502  │ 2502  ³ 00B3
180  B4  ┤ 2524  ┤ 2524  ´ 00B4
181  B5  ╡ 2561  Á 00C1  µ 00B5
182  B6  ╢ 2562  Â 00C2  ¶ 00B6
183  B7  ╖ 2556  À 00C0  · 00B7
184  B8  ╕ 2555  © 00A9  ¸ 00B8
185  B9  ╣ 2563  ╣ 2563  ¹ 00B9
186  BA  ║ 2551  ║ 2551  º 00BA
187  BB  ╗ 2557  ╗ 2557  » 00BB
188  BC  ╝ 255D  ╝ 255D  ¼ 00BC
189  BD  ╜ 255C  ¢ 00A2  ½ 00BD
190  BE  ╛ 255B  ¥ 00A5  ¾ 00BE
191  BF  ┐ 2510  ┐ 2510  ¿ 00BF
192  C0  └ 2514  └ 2514  À 00C0
193  C1  ┴ 2534  ┴ 2534  Á 00C1
194  C2  ┬ 252C  ┬ 252C  Â 00C2
195  C3  ├ 251C  ├ 251C  Ã 00C3
196  C4  ─ 2500  ─ 2500  Ä 00C4
197  C5  ┼ 253C  ┼ 253C  Å 00C5
198  C6  ╞ 255E  ã 00E3  Æ 00C6
199  C7  ╟ 255F  Ã 00C3  Ç 00C7
200  C8  ╚ 255A  ╚ 255A  È 00C8
201  C9  ╔ 2554  ╔ 2554  É 00C9
202  CA  ╩ 2569  ╩ 2569  Ê 00CA
203  CB  ╦ 2566  ╦ 2566  Ë 00CB
204  CC  ╠ 2560  ╠ 2560  Ì 00CC
205  CD  ═ 2550  ═ 2550  Í 00CD
206  CE  ╬ 256C  ╬ 256C  Î 00CE
207  CF  ╧ 2567  ¤ 00A4  Ï 00CF
208  D0  ╨ 2568  ð 00F0  Ð 00D0
209  D1  ╤ 2564  Ð 00D0  Ñ 00D1
210  D2  ╥ 2565  Ê 00CA  Ò 00D2
211  D3  ╙ 2559  Ë 00CB  Ó 00D3
212  D4  ╘ 2558  È 00C8  Ô 00D4
213  D5  ╒ 2552  ı 0131  Õ 00D5
214  D6  ╓ 2553  Í 00CD  Ö 00D6
215  D7  ╫ 256B  Î 00CE  × 00D7
216  D8  ╪ 256A  Ï 00CF  Ø 00D8
217  D9  ┘ 2518  ┘ 2518  Ù 00D9
218  DA  ┌ 250C  ┌ 250C  Ú 00DA
219  DB  █ 2588  █ 2588  Û 00DB
220  DC  ▄ 2584  ▄ 2584  Ü 00DC
221  DD  ▌ 258C  ¦ 00A6  Ý 00DD
222  DE  ▐ 2590  Ì 00CC  Þ 00DE
223  DF  ▀ 2580  ▀ 2580  ß 00DF
224  E0  α 03B1  Ó 00D3  à 00E0
225  E1  ß 00DF  ß 00DF  á 00E1
226  E2  Γ 0393  Ô 00D4  â 00E2
227  E3  π 03C0  Ò 00D2  ã 00E3
228  E4  Σ 03A3  õ 00F5  ä 00E4
229  E5  σ 03C3  Õ 00D5  å 00E5
230  E6  µ 00B5  µ 00B5  æ 00E6
231  E7  τ 03C4  þ 00FE  ç 00E7
232  E8  Φ 03A6  Þ 00DE  è 00E8
233  E9  Θ 0398  Ú 00DA  é 00E9
234  EA  Ω 03A9  Û 00DB  ê 00EA
235  EB  δ 03B4  Ù 00D9  ë 00EB
236  EC  ∞ 221E  ý 00FD  ì 00EC
237  ED  φ 03C6  Ý 00DD  í 00ED
238  EE  ε 03B5  ¯ 00AF  î 00EE
239  EF  ∩ 2229  ´ 00B4  ï 00EF
240  F0  ≡ 2261  SHY 00AD  ð 00F0
241  F1  ± 00B1  ± 00B1  ñ 00F1
242  F2  ≥ 2265  ‗ 2017  ò 00F2
243  F3  ≤ 2264  ¾ 00BE  ó 00F3
244  F4  ⌠ 2320  ¶ 00B6  ô 00F4
245  F5  ⌡ 2321  § 00A7  õ 00F5
246  F6  ÷ 00F7  ÷ 00F7  ö 00F6
247  F7  ≈ 2248  ¸ 00B8  ÷ 00F7
248  F8  ° 00B0  ° 00B0  ø 00F8
249  F9  ∙ 2219  ¨ 00A8  ù 00F9
250  FA  · 00B7  · 00B7  ú 00FA
251  FB  √ 221A  ¹ 00B9  û 00FB
252  FC  ⁿ 207F  ³ 00B3  ü 00FC
253  FD  ² 00B2  ² 00B2  ý 00FD
254  FE  ■ 25A0  ■ 25A0  þ 00FE
255  FF  NBSP 00A0  NBSP 00A0  ÿ 00FF

 

NBSP is the "no break space" character.
SHY is the "soft hyphen" character.
"₧" is one character, signifying the "pesetas" currency.


These webpages are created and maintained with The SemWare Editor Professional