Talk:Base64/Archives/2021


Split

The page currently is about two completely different things:

  1. the numeral system in base 64
  2. the base64 encoding
I'm not sure I'd call them completely different; after all, a binary file is really just a very big number if you think about it. The page describes a numeral system and then goes on to describe its uses. I see nothing wrong with this structure. Plugwash 15:47, 30 November 2005 (UTC)
I rather agree, particularly since the numeral system isn't particularly notable except for its use in the encoding. —Ilmari Karonen (talk) 15:52, 30 November 2005 (UTC)

Sorry folks, I made a mistake here: in spite of what I remembered, the numeral system used by the Babylonians was base 60, not 64 (we also divide time into sixtieths for this reason). Obviously, there is not much to say about the numeral system, except that it is the basis of the base64 encoding. I will remove the split tag. Paolo Liberatore (Talk) 17:00, 30 November 2005 (UTC)

MIME Line breaks are <CR><LF>

From the article:

As newlines are inserted in the encoded data every 76 characters, the actual length of the encoded data is approximately 135.1% of the original.

To the best of my knowledge, MIME defines a line break as the character pair <CR><LF> (in that order). Therefore, every 57 bytes from the source are expanded to 76 Base64 characters + <CR> + <LF>, or 78 characters. This gives an expansion of approximately 136.8%. — Thiadmer Riemersma (thiadmer at compuphase dot com) — Preceding unsigned comment added by 81.204.135.157 (talkcontribs) 14:23, 8 August 2006 (UTC)

Googling and reading the article newline seem to verify this, so I modified the article accordingly. –Mysid(t) 18:21, 8 August 2006 (UTC)
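
For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes only the figures quoted in this thread (57 source bytes per 76-character line); the function name is made up for illustration.

```python
# Sketch: expansion ratio of MIME Base64 with 76-character lines.
BYTES_PER_LINE = 57   # 57 input bytes encode to 76 Base64 characters
CHARS_PER_LINE = 76


def expansion(line_break_len: int) -> float:
    """Encoded size divided by original size, counting the line break."""
    return (CHARS_PER_LINE + line_break_len) / BYTES_PER_LINE


lf_only = expansion(1)   # one-character newline
crlf = expansion(2)      # <CR><LF> pair, as discussed above
print(f"LF: {lf_only:.1%}  CRLF: {crlf:.1%}")
```

The one-character case reproduces the 135.1% figure quoted from the article, and the two-character case gives the 136.8% figure proposed in the comment.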

Base64 in freeware applications

I recently updated this article to cover the wide range of uses of Base64, including in freeware applications like Mozilla and Thunderbird.

The simplest of examples:

 C:\Documents and Settings\<UNAME>\Application Data\Thunderbird\Profiles\<PRNUMBER>.default
 type signons.txt
 mailbox://henrique@venus
 \=username=\
 ~
 *\=password=\
 TW9ua2V5
 <UNAME> stands for your username in a Windows XP distribution, for example.

The password can be easily decoded, and is: Monkey.

This neither detracts from nor diminishes the great software provided by Mozilla (a subjective opinion, I know, which is why I posted it on this discussion page). Most users will not notice these security flaws, nor be bothered that their personal data is exposed to Trojans on their desktops, which can decode these passwords quite easily and deliver them worldwide.

Of course, both Mozilla and Thunderbird offer an option for symmetric cyphers (increasingly more difficult to decode) on all Managed Passwords.

I know this is not the right place for software considerations, but I found it outstandingly interesting that even Mail User Agents (MUAs) use the basic concepts of mail encoding (which is what Base64 is mainly used for!) to obscure plain-text passwords.— Preceding unsigned comment added by Henrique Moreira (talkcontribs) 00:25, 19 December 2005 (UTC)

Unless the user is asked to enter a key then the ONLY purpose encrypting the key serves is to prevent someone accidentally remembering a password they shouldn't when poking around in a config file. If someone has access to the encrypted password then they almost certainly have access to the key as well! Plugwash 01:26, 19 December 2005 (UTC)
It's my expectation that Mozilla consider the password as stored in clear text, and use Base64 not for 'encryption' but rather in case the password contains special characters (such as the space). Plugwash is right, of course, unless there's a key the password could easily be decrypted by reverse engineering the executable that reads it. 59.167.212.218 23:25, 4 August 2007 (UTC) (aka Calrion)
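
The claim earlier in this thread that the stored value decodes trivially can be checked with a short sketch. It uses Python's standard `base64` module and the sample string `TW9ua2V5` quoted above; no key is involved, which is the point being made.

```python
import base64

# The value shown in the signons.txt excerpt quoted in this thread.
stored = "TW9ua2V5"

# Base64 is an encoding, not encryption: decoding needs no secret.
decoded = base64.b64decode(stored).decode("ascii")
print(decoded)
```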

mIRC trojans

mIRC trojans often use Base 64, as mIRC has built-in functions for it: $encode(text,m) and $decode(text,m). The trojans are spread over /amsg (message to all channels) or private messages and rely on the naive trust of users. They try to make users run commands encoded in Base 64 by claiming, for example, that doing so will get them the latest Matrix movie, or operator (administrator) status in a certain channel. Some of them come in the form '//write somename $decode(Base 64 encoded script,m) || .load -rs somename' and install a script that keeps spreading this code, sometimes along with a backdoor. Other trojans hide the whole code by making use of brackets, '//[ $decode(Base 64 encoded commands,m) ]', and can run any commands. Then there are the ones that use $findfile to execute commands while appearing to be a harmless /echo: '//echo $($decode(Base 64 encoded $findfile mostly executing /amsg $cb,m),2)', where $cb is the clipboard content, which mostly is the command, and $(...,2) evaluates the decoded $findfile.

Perhaps someone could add a note on this in the article. I have never written in a WP article and feel a bit lost.— Preceding unsigned comment added by 62.181.79.121 (talkcontribs) 23:01, 23 January 2006 (UTC)

UUU becomes VVVV

maybe mention

$ echo -n UUU|base64-encode  # NB: in some OSs the command is now (2019) just "base64".
VVVV

and say why, just for the fun of it.— Preceding unsigned comment added by 210.201.31.246 (talkcontribs) 21:47, 16 February 2006 (UTC)

-Black Walnut (talk) 08:37, 3 August 2019 (UTC) removed superfluous trailing ";echo" from above commandline and noted elimination of "-encode".
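
As for the "why": the byte for "U" is 01010101, so three of them regroup into four identical 6-bit values of 010101 (decimal 21), and index 21 in the Base64 alphabet is "V". A small sketch that shows the regrouping, using only the standard library:

```python
import base64

data = b"UUU"

# Write the three bytes as one 24-bit string, then cut it into 6-bit groups.
bits = "".join(f"{byte:08b}" for byte in data)
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
indices = [int(group, 2) for group in groups]

encoded = base64.b64encode(data).decode("ascii")
print(groups, indices, encoded)
```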

UTF, really?

"This data encoding scheme is used to encode the UTF-16" Is it really doing this? I doubt. It's encoding Unicode codepoints, just like utf-8, utf-16, ucs-2 do.— Preceding unsigned comment added by 84.92.248.233 (talkcontribs) 18:13, 17 June 2006 (UTC)

The RFC for UTF-7 seems to actually date back to the days before supplementary characters, so it's no help. Using UTF-16 surrogates would be the only sane way to support those planes in UTF-7 without massive changes, but I do not know if current implementations do so. Plugwash 18:24, 17 June 2006 (UTC)
UTF-7 is generally deprecated these days. Rootless 13:44, 18 June 2006 (UTC)

Modified Base64 for URL

The section URL Applications contains a little paragraph about "modified Base64 for URL". However, according to the referenced page http://tools.ietf.org/html/rfc3548#page-6, it is wrong.

rfc3548 seems to think that URL and file name encodings use '-' and '_' instead of '+' and '/'. Not '*' and '-'.

And unless I am missing something, they should also be used with the padding '=', but as far as I know '=' is reserved in URLs... which would indicate that the current wiki text is more correct.— Preceding unsigned comment added by Nakerlund (talkcontribs) 04:30, 14 August 2006 (UTC)

In the Wiki text * is the character to use, but that is not right because * will get percent encoded by url encoders. Therefore I believe the correct base64 is as the RFC states —Preceding unsigned comment added by 207.58.192.150 (talk) 20:19, 17 January 2008 (UTC)
They thought about that too (section 5 paragraph 3)
The pad character "=" is typically percent-encoded when used in an URI [9], but if the data length is known implicitly, this can be avoided by skipping the padding; see section 3.2.
85.149.120.16 (talk) 23:11, 18 May 2008 (UTC)
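
To make the difference between the two alphabets concrete, here is a sketch using Python's standard `base64` module. The two input bytes are chosen only because they happen to produce both '+' and '/' in the standard alphabet; the re-padding step is one common way to handle skipped padding when the length is known implicitly, not something the RFC mandates in this form.

```python
import base64

raw = bytes([0xFB, 0xFF])  # illustrative input that hits indices 62 and 63

standard = base64.b64encode(raw).decode("ascii")          # uses '+' and '/'
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # uses '-' and '_'

# Padding can be dropped for use in a URL and restored from the length.
unpadded = url_safe.rstrip("=")
restored = base64.urlsafe_b64decode(unpadded + "=" * (-len(unpadded) % 4))

print(standard, url_safe, unpadded, restored == raw)
```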

Example

I felt the example wasn't quite as intuitive as it could be, so I created the table version. (Sometime reader, new to editing.) aes — Preceding unsigned comment added by 83.227.141.13 (talkcontribs) 02:54, 19 August 2006 (UTC)

The "Man" --> "TWFu" table is a great addition. Thanks. — Omegatron 22:03, 27 December 2007 (UTC)

Left to right

Am I reading this wrong?

"Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from left to right"

Looks like they're converted into individual numbers from right to left, not left to right. For example:

0 0 0 1 0 1 = 5

and

0 1 0 0 1 1 = 19

Ajkochanowicz (talk) 14:10, 7 January 2013 (UTC)

The groups are processed from left to right: 010011010110000101101110 → 010011 010110 000101 101110 → 19 22 5 46 (decimal). Anomie 03:51, 8 January 2013 (UTC)
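
The same left-to-right split can be reproduced in a few lines. This is only a sketch of the grouping step described in the reply above, using the 24-bit string for "Man" that it quotes:

```python
# The 24 bits of "Man", as quoted in the reply above.
bits = "010011010110000101101110"

# Cut from the left into 6-bit groups; each group is read as an ordinary
# binary number (most significant bit first).
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
values = [int(group, 2) for group in groups]

print(groups, values)
```

Within each group the bits still carry their usual weights (so 000101 is 5), which is probably the source of the "right to left" impression; the left-to-right wording refers to the order in which the groups are taken.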

Material added by Ultimater

I have reverted the addition of the following material by user:Ultimater. I think there may be some merit in it, but I also think some more attention should be paid to style and formatting, before it is added to the article proper. E.g., use of whole-word capitalization, rhetorical questions, sentences starting with "Notice...", "Remember..." and the like should be avoided or limited.--Niels Ø 11:31, 23 August 2006 (UTC)

Added before heading "An example"

Also notice that the length of each of the outputs are multiples of 4. Not only MUST every base64-encoded string consist of an even number of characters, the number of total characters MUST be evenly divisible by 4. The reason is because base64 is used to represent an exact binary sequence of data in groups 8 bits.--Niels Ø 11:31, 23 August 2006 (UTC)

Added before heading "UTF-7"

Remember ;   The text doesn't need to be exactly 3 characters in length. Notice the usage of the padding character.

Text content M a  
ASCII 77 97  
Bit pattern 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 0 0            
Index 19 22 4  
Base64-Encoded T W E =

Notice that the equals character (the padding character) is appended to the generated base64-encoded string and ONLY when there is an empty slot in the Text content. The padding character will NEVER appear in the middle or beginning of a base64-encoded string. The padding character can be totally OMITTED from your base64-encoded string and it will not harm the string's contents. The reason is because the number of un-used bits can be recalculated. However it's always a good idea to include the padding character in your strings.

It's possible to have two padding characters but NEVER three:

Text content M    
ASCII 77    
Bit pattern 0 1 0 0 1 1 0 1 0 0 0 0                        
Index 19 16    
Base64-Encoded T Q = =

Why won't you encounter 3 padding characters? Because the string is read 3 characters at a time and 3 padding characters would translate as 000000 000000 000000 000000 which is "AAAA" and can be totally ignored -- however feel free to add as many extra A's or padding characters to the end of your base64-encoded string as you wish.

Let's have a second look at our previous example base64-encoded string again:

----+---10----+---20----+---30----+---40----+---60----+---70----+---80----86

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Tell me, if you were to decode that string back into its original ASCII form, how many characters would it consist of? How long would it take you to turn that into a sequence of 0's and 1's and to count the number of bits and divide it by 4 then calculate the remainder so you know the number of unused bits? Who needs to count it!? Just count the number of padding characters at the end of the string (in this case one) and you will know the number of un-used bits (one padding character per every two un-used bits). Hence in this case, the length of the original string was 1 character short (the padding character is a blank slot) of being a multiple of three.--Niels Ø 11:31, 23 August 2006 (UTC)

How are the last two rows in the diagram derived?

If you understand how the top three rows are derived it is fairly straightforward. If you look at the binary digits above the index column, the first 6 are 010011 which is 16+2+1 giving the decimal 19. Likewise the next 6 are 010000, which is decimal 16. These are the values in the index column.

If you take the set of characters from the article:

"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

Use the 19 to index into here (starting with A as zero, B as 1, etc.) and you get T, the character in the last row. Similarly 16 indexes to Q. Does the article need more step by step details? -- Q Chris 07:04, 19 April 2007 (UTC)
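
The steps described above (bits, then index, then alphabet lookup, then padding) can be sketched as follows. The function name `encode` is just an illustration, and the alphabet string is the one quoted from the article; this is a teaching sketch, not a replacement for a library encoder.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode(data: bytes) -> str:
    """Hand-rolled Base64 encoding following the table in the article."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)          # zero-fill the last 6-bit group
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    return "".join(chars) + "=" * (-len(chars) % 4)  # pad to a multiple of 4


print(encode(b"M"), encode(b"Ma"), encode(b"Man"))
```

For the single byte "M" the two groups are 010011 and 010000, giving indices 19 and 16 and therefore "T" and "Q", followed by two padding characters.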

I don't understand the need for padding

I don't understand the need for padding. You can always tell the number of bytes of decoded chars from the encoded.

  • 1 byte clear-> 2 bytes encoded
  • 2 bytes clear -> 3 bytes encoded
  • 3 bytes clear -> 4 bytes encoded

Why did they think padding was needed? -- Chris Q 13:25, 13 December 2006 (UTC)

Good question. AFAICT the specs are silent on the matter too. The only reasons I can think of would be either poor understanding on the part of the creators or possibly an intention to allow encoded data to be concatenated without decoding. Plugwash 13:48, 13 December 2006 (UTC)
It is a decoding optimisation. For decoding the input is always a multiple of 4, when you take padding into account. This means you can "read" the input as an int32_t in C. It also allows you to do a minor consistency check using the length of the input. -- James Antill 18:10, 14 December 2006 (UTC)
After having implemented a decoder I can confirm the optimization. It allows you to remove EOF/length checking from the decoding loop, giving a very small performance improvement. As far as a minor consistency check goes I am a little sceptical, as it would only work 2/3 of the time and most MIME usages allow non-mime characters to be appended. -- Chris Q 12:12, 22 January 2007 (UTC)
I don't buy that it improves performance, since you could process the last chunk in a separate piece of code, outside the inner loop. However, I think it does allow the code to be simpler. I found a nice simple example of an implementation in VB at Base64Dec01. I spent a fair amount of time pondering this question myself before I found this discussion there. Perhaps it should be mentioned in the article? --LCarl 23:01, 9 February 2007 (UTC)
What if you want to read/interpret a base64-encoded file backwards? Without padding there is only one way to read the content, that is, from the start of the file. (r.saiprasad at gmail dot com)— Preceding unsigned comment added by 64.136.4.254 (talkcontribs) 14:58, 23 February 2007 (UTC)
True but given the intentional use of base64 there is little need to read it randomly or backwards. Plugwash 13:18, 6 March 2007 (UTC)
I think it's so that you can distinguish between a final null byte in the data (which should be retained) and a null byte added to make up a triad to be encoded. A base64 sequence "AA==" should be distinct from "AAAA". --wintermute (talk) 21:29, 14 December 2007 (UTC)
"AA" and "AAA" are already distinct from "AAAA" —Preceding unsigned comment added by 66.235.5.33 (talk) 17:19, 12 January 2008 (UTC)
I believe you are on the wrong track. If memory serves me, the first uses of base64 were encoding/decoding binaries in usenet, all properly licensed of course. The most common way to post such binaries was in parts, meaning that the parts would typically be concatenated back together, manually in the early years, with "---cut here---" instructions to get it back into proper form. While it would be possible to break parts on multiples of three bytes, nobody ever did. The most common breaking points for binary parts were pure powers of 2, such as 1024, 8192, etc. This did not work when decoding the concatenated parts, because there was no reliable way of restarting the encoding every time a new part started. The padding thus just made it work magically. But there are a couple of points that should be made based upon this original and only strong use case:
1. A decoder that only accepts padding at the end of the entire block of data, or even worse, that requires padding at the end of the entire block, is stupid because after a concatenation operation, which is the only use case for padding that I find significant, padding may occur in many places internal to the string.
2. If you are not going to do the full job allowing padding anywhere, do us all a favor and eliminate the padding and eliminate the padding requirement from your decoder. If it is there, do the right thing.
3. Especially if you are encoding something to serve as an identifier, if you allow padding, it should be permitted to occur internally, thus your identifier does not necessarily encode uniquely, because you could legally insert padding after each byte in the identifier, or after every other byte, etc. If you say "that's stupid, who would put padding in the middle, just because that is the original use case", I say "well it is just as stupid to put it on the end if you are never going to concatenate values". It just makes the identifier longer, where one of the points of base64 is to make it as short as reasonably practical. I believe reasonable standards allow for no padding. The Go language certainly provides the option. 174.52.91.80 (talk)
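
On the original question of whether the decoded length can always be recovered without padding: a sketch of a decoder that accepts unpadded input by restoring the padding from the length alone. `decode_unpadded` is a hypothetical helper name; it leans on Python's standard `base64.b64decode` for the actual decoding.

```python
import base64


def decode_unpadded(text: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(text) % 4 == 1:
        # One leftover character carries only 6 bits, less than one byte.
        raise ValueError("impossible length for Base64 data")
    # 2 leftover characters -> 1 byte, 3 leftover characters -> 2 bytes.
    return base64.b64decode(text + "=" * (-len(text) % 4))


print(decode_unpadded("TQ"), decode_unpadded("TWE"), decode_unpadded("TWFu"))
```

This only shows that padding is redundant for a single, complete string; it says nothing about the concatenated-parts case raised later in the thread.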

Quadrosexagesimal

What's up with this particular expression? It's the equivalent of hexadecimal, but it appears to be all wrong. Could somebody break down this particular expression into the Latin word roots it is made up from and fix the spelling please? --89.212.75.6 16:01, 24 February 2007 (UTC)

Quadrosexagesimal is not the same as hexadecimal; it is base 64 as opposed to base 16. I am no linguist but I guess that quadro refers to the 4 and sexagesimal to the 60, as in hexadecimal the hex refers to six and the decimal to 10. Note the term Quadrosexagesimal applies to the definition of the mathematical base 64, not the Base64 encoding. IMHO there really should be two articles; the current one does not separate the two completely different terms very clearly, though separating them would require a lot of work -- Q Chris 15:00, 19 April 2007 (UTC)
It's not Latin. It's a neologism created in English by analogy of words like hexadecimal. My question is whether this is actually an English word (it's not in the OED), or an invention for Wikipedia. kwami 04:42, 27 September 2007 (UTC)
Actually, it should be quattuorsexagesimal, though that's still faux Latin. I'm deleting the word until s.o. can come up with a non-Wikipedia based source. kwami 04:49, 27 September 2007 (UTC)

Justification for URL variant

The justification given for the URL variant of Base64 is unsourced and it doesn't really sound right:

"Using a URL-encoder on standard Base64, however, is inconvenient as it will translate the '+' and '/' characters into special '%XX' hexadecimal sequences ('+' = '%2B' and '/' = '%2F'). When this is later used with database storage or across heterogeneous systems, they will themselves choke on the '%' character generated by URL-encoders (because the '%' character is also used in ANSI SQL as a wildcard)."

It is my understanding that the problem isn't with URL-escaping the data, it's with the interpretation of UNescaped base64 data in URLs. Specifically, the / is used as a path separator, and + is typically converted to a space. So you can't have a URL like http://example.com/base64-encoded-text-here because it will be mangled. Both / and + are perfectly legal in a URL and there's no reason that a developer or toolset should automatically think to escape it. In fact, it would be great if they did because their application would then unescape it and they'd end up with valid Base64 data again.

Further, the justification involving "database storage or across heterogeneous systems" choking on '%' sounds a bit odd to me. Applications should be unescaping their URLs before using the data to begin with (bug 1), and back-ends should never be blindly using user-provided data directly in their queries (bug 2). Any application that fits into this category has more serious problems to worry about, so it makes no sense to me why this should be used as justification for a URL-friendly version of Base64. —Fastolfe00 04:38, 25 October 2007 (UTC)

We don't seem to have a source for either version of the explanation. I'd put a {fact} tag on it, but it seems that the whole article needs sourcing, not just that one point. On the other hand, if you know a source that backs you up, please put your explanation in the article. In my opinion, it makes more sense. -- trlkly 07:19, 15 April 2008 (UTC)

Requested move

The article should be moved to Base 64. Other bases have a space such as Base 24. I know the encoding method is called base64, not base 64, but the former is based on the latter, not vice versa. - TAKASUGI Shinji (talk) 06:51, 21 November 2007 (UTC)

Oppose I would oppose this move on the grounds that I reckon that most people finding this article will be looking for the base64 encoding mechanism rather than the mathematical base 64. Also the article is almost exclusively about base64 encoding, not the mathematical base. Why not create an article about the mathematical base 64 at Base 64 rather than having the redirect. Each article would naturally reference the other. -- Q Chris (talk) 08:38, 21 November 2007 (UTC)
Then please separate Base 32 and Base32 too. - TAKASUGI Shinji (talk) 11:30, 21 November 2007 (UTC)
Yes, I would go along with that -- Q Chris (talk) 14:12, 21 November 2007 (UTC)
I've noticed this page has been renamed. Now, if we divide it into Base 64 and Base64, I think it's better to move this article again to Base64 and create the new article Base 64. What do you think? - TAKASUGI Shinji (talk) 01:01, 22 November 2007 (UTC)
I have moved it back! There is no justification for moving a page that is almost entirely about base64 to base 64. You are quite correct, it makes sense for this article to be called base64 and a new article about the mathematical base 64 should be written. —Preceding unsigned comment added by Q Chris (talkcontribs) 08:06, 22 November 2007 (UTC)
I have separated Base64 and Base 64. - TAKASUGI Shinji (talk) 10:30, 22 November 2007 (UTC)
Good stuff, thanks -- Q Chris (talk) 12:28, 22 November 2007 (UTC)

Is this true?

It is the largest power-of-two base that can be represented using single printable ASCII characters.

I'd have thought that base 128 would hold that honour, and that base 64 is used because it's the largest power-of-2 base that can be represented using characters that are pretty much certain to have the same value on different systems. Thoughts? --wintermute (talk) 21:20, 14 December 2007 (UTC)

There are 95 printable ASCII (not Extended ASCII) characters, and since BASE95 isn't a power of 2, I guess it's true. —EncMstr 21:37, 14 December 2007 (UTC)
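
The counting argument in the reply above can be spelled out in a short sketch. It assumes "printable" means the ASCII range from space (0x20) through tilde (0x7E), which is how the figure of 95 is usually obtained.

```python
# Printable ASCII: space (0x20) through tilde (0x7E), inclusive.
printable = [chr(code) for code in range(0x20, 0x7F)]
count = len(printable)

# Largest power of two that does not exceed the number of available symbols.
largest_power_of_two = 1 << (count.bit_length() - 1)

print(count, largest_power_of_two)
```

Base 128 would need 128 distinct single-character symbols, more than the 95 available, so 64 is the largest power of two that fits.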

JavaScript Code Sample Usage

May I use JavaScript base64 implementation given in the article in GNU GPL software legally? Does anyone have an idea, where this code comes from? Thanks. Dadudadu (talk) 12:58, 20 March 2008 (UTC)

Although I hate to be unhelpful, I can't answer either question. The first would be giving legal advice, which Wikipedia can't do, and the second I don't know the answer. What I can tell you is that there are implementations of base64 that are released under the GPL available, so you might want to consider using one of those. -- trlkly 07:10, 15 April 2008 (UTC)

Citations Missing Tag

This article needs citations. It's not necessarily inaccurate, but it has only one citation, and that is for a relatively insignificant point. So I have added the {{citations missing}} tag, as I feel it is the most accurate representation of what this article needs, i.e. both citations and footnotes. -- trlkly 07:31, 15 April 2008 (UTC)

Apparent buffer overflow bug in C code

I'm looking at the C code, and it looks like it can write one byte beyond the specified end of the output buffer:

      result[resultIndex++] = base64chars[n0];
      if(resultIndex > maxResultLength) return;
      result[resultIndex++] = base64chars[n1];
      if(resultIndex > maxResultLength) return;
      // one more instance omitted for brevity

The problem is that it writes to the output buffer before it checks for overflow. For example, suppose maxResultLength == 0. (Yes, this is a dumb value, but the same argument will apply regardless of the value.) It will write the value to result[0], increment resultIndex to 1, then notice it is too big and return. I am pretty sure it should be:

      if(resultIndex >= maxResultLength) return;
      result[resultIndex++] = base64chars[n0];
      if(resultIndex >= maxResultLength) return;
      result[resultIndex++] = base64chars[n1];
      // and the third instance of writing to result[] should be fixed likewise

If I am missing something and this is not really a bug, will someone please let me know? Otherwise I will fix the code, test it, and edit this article accordingly. CosineKitty (talk) 16:54, 13 May 2008 (UTC)
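
The off-by-one described above can be modelled without compiling the C code. The following is a Python model of the two check orderings only, not the article's code: `count_writes` is a made-up name, and incrementing `index` stands in for `result[resultIndex++] = ...`.

```python
def count_writes(max_len: int, attempts: int, check_first: bool) -> int:
    """Return how many slots get written when `attempts` writes are tried."""
    index = 0
    for _ in range(attempts):
        if check_first and index >= max_len:
            break              # proposed order: check, then write
        index += 1             # models result[resultIndex++] = ...
        if not check_first and index > max_len:
            break              # original order: write, then check
    return index


print(count_writes(0, 3, check_first=False), count_writes(0, 3, check_first=True))
print(count_writes(4, 10, check_first=False), count_writes(4, 10, check_first=True))
```

In this model the write-then-check ordering always touches one slot more than `max_len`, which matches the reasoning in the comment above.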

Proposal for merging

There is no relationship between "Base 64" and "Base64" -- they refer to different concepts and the names, as given, are appropriate to each. Please don't merge them and close this discussion. Thanks! —Preceding unsigned comment added by 69.3.26.236 (talk) 05:23, 24 June 2008 (UTC)

In reality, the only interesting thing one can say about this base is its use in Base64. In my opinion, the article Base 64 should be merged into this article in accordance with Wikipedia:Notability (numbers). QQ (talk) 11:20, 23 May 2008 (UTC)

Merge and redirect I don't care which spelling is used, but one should be a redirect to the other. —EncMstr (talk) 07:29, 28 May 2008 (UTC)
Merge and redirect as per EncMstr, also for Base 32 into Base32 (I just edited Base32, and needed Base64 in UTF-1). --217.184.142.41 (talk) 11:30, 3 June 2008 (UTC)
Don't merge - The previous decision to split was voted on and decided. Merging these would be as sensible as merging Georgia (country) and Georgia (U.S. state) on the grounds that someone who knew no geography might look up the wrong one. The concept of the mathematical base 64 is not the same as the encoding mechanism. -- Q Chris (talk) 07:31, 24 June 2008 (UTC)
Don't merge - Base 64 shouldn't be called Base64. If you think Base 64 is too short, you should try to delete it and remove it from the base list. - TAKASUGI Shinji (talk) 07:52, 24 June 2008 (UTC)
Merge and redirect The articles should be merged and Base 64 should be redirected to Base64 on the basis that Base 64 encoding is the only practical use of the Base 64 number system. The number system is only relevant with respect to base64 encoding; I mean, the only thing worth putting on a page dedicated to the base 64 number system would be its use in base64 encoding, therefore it should not have its own page. --Herecomesgibson (talk) 22:07, 18 July 2019 (UTC)

Merge two base64 encoded strings

no padding - just concatenate

The following is my own selection (g and CRLF):

one padding (=): substitute with "g" (index 32), which magically gives a space (decimal 32, hex 20)

two padding (==): substitute with "0K" (zero K; indices 52 and 10), which magically gives CRLF

Example:

[Man] => [TWFu] TWFuTWFu -> ManMan

[Ma] => [TWE=] TWEgTWFu -> Ma Man

[M] => [TQ==]

TQ0KTWFu -> M(CRLF)Man

KrisK —Preceding unsigned comment added by 69.116.75.194 (talk) 06:46, 19 December 2008 (UTC)
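
The substitution above can be checked mechanically. This is a sketch of the scheme as described in the comment, with a hypothetical helper `join_encoded`; it assumes the first string came from an encoder that zero-fills the unused bits, as standard encoders do.

```python
import base64


def join_encoded(first: str, second: str) -> str:
    """Join two padded Base64 strings using the g / 0K substitution."""
    if first.endswith("=="):
        first = first[:-2] + "0K"   # the inserted byte pair decodes to CR LF
    elif first.endswith("="):
        first = first[:-1] + "g"    # the inserted byte decodes to a space
    return first + second


for left in ("TWFu", "TWE=", "TQ=="):
    merged = join_encoded(left, "TWFu")
    print(merged, base64.b64decode(merged))
```

Note that the merged data is not the plain concatenation of the two originals: a space or a CR LF pair is inserted at the join.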

Response to Proposal for merging

This does not seem a good idea. The two subjects, Base 64 and Base64 encoding, are very different. Merging the two would only result in replacing two clear and concise articles with a single unnecessarily long and confusing article. —Preceding unsigned comment added by 194.221.133.226 (talk) 11:27, 26 June 2008 (UTC)
How about changing this article's name to "Base 64 Content Transfer Encoding"? That way there's no doubt. 67.140.218.222 (talk) 20:22, 6 May 2009 (UTC)

This article should reference RFC 1521

Because that is (afaik) the specification for the Base64 encoding we all know and love.

http://www.ietf.org/rfc/rfc1521.txt

Specifically, this article could benefit from inclusion of the table in section 5.2. Base64 Content-Transfer-Encoding, of said RFC that lists all allowed characters. —Preceding unsigned comment added by 84.233.191.62 (talk) 13:16, 26 August 2009 (UTC)

Index Table Formatting

I noticed the Index Table didn't look right, since the last row (containing just the 'pad' character) didn't have the right number of columns and wasn't accounted for in the "colspan" counts. I fixed these issues and it looks better for me. I hope I didn't break it for anyone else -Ilikeimac (talk) 20:22, 18 May 2010 (UTC)

Huh?

Isn't Base64 used in WordPress Malware Attacks? not kidding — ECat200 (talk) 12:01, 31 December 2010 (UTC)

As binary attachments to e-mail are a very common way of spreading viruses, and base64 is used for binary attachments to e-mail, base64 can be associated with computer viruses. But that is guilt by association. We could equally well forbid computers since they can get viruses. base64 is only a method to encode binary data, and a very useful one, so we should not criticize it just for that. --192.138.116.231 (talk) 14:53, 29 April 2011 (UTC)

"encode UTF-16"?

In the UTF-7 section, it says "This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP" — shouldn't it say "to encode UTF-16 "…? I'm not a 100 percent sure, so I didn't change it. --85.177.87.226 (talk) 21:13, 3 October 2011 (UTC)

I am not sure what change you are proposing. UTF-7 is not often used now anyway. -- Q Chris (talk)

Carnal pleasure

The padding example, "Input ends with: any carnal pleasure" is absolutely hilarious, but is it really appropriate for Wikipedia's audience? Should a cleaner example string be selected perhaps?

Buzzert (talk) 07:38, 24 September 2012 (UTC)

You don't like Thomas Hobbes' Leviathan? Anomie 16:08, 24 September 2012 (UTC)

I agree. Some cultures and some people will be uncomfortable with a sexual theme and there's nothing useful about it in the middle of a bits discussion - it's an unnecessary distraction. — Preceding unsigned comment added by Alwhaley (talkcontribs) 15:07, 18 April 2013 (UTC)

I agree, too. It just makes no sense. The purpose of an example is to not be too distracting and to help illustrate something, i.e. in this case the encoding algorithm. Immediately after you look at the silly example you start thinking about sex, because humans are made to be distracted by sex and sexual connotations. Imagine if exam papers, books and all sorts of things had naked men and women on them, for no reason. Imagine studying Computer Vision or something similar, and working with images of nude women, or photographs of sexual intercourse. Why? At that point, the example stops being just an example, and becomes too distracting. I'm curious if the inclusion of this absurd example was an attempt at trolling, or humor. It is kind of funny, but Wikipedia isn't supposed to have crude jokes. I understand that it's taken from a book by Thomas Hobbes, a respected philosopher, however it's taken out of context and to someone (i.e. most people) unfamiliar with this philosopher the message of the quote would be simply "sexual intercourse", which is distracting. We could do something similar with other philosophers, e.g. Nietzsche or the ancient Greeks. We could pull out a quote that appears simply stupid or obscene when taken out of context. There's a reason why people wear clothes nowadays, and conduct their sexual activities in privacy (usually)---it's because sex is distracting and should be done in private, due to all of its distractions and other special qualities. At this point in time and in society we, I think, as a whole, have decided that sex should be done in private (in most of the cases) and be completely separate from specializations of role, such as a job or an academic environment, or a Wikipedia article. Sex is sex, algorithms are algorithms. Picking a sexual quote is like choosing porn images as examples in a Computer Vision course.
It very obviously communicates something, there's this emphasis on sex which is not direct but it's obviously there and, unless suffering from socially-impairing disorders, it is impossible to drop the subject of sex in an unrelated discussion, in a casual manner. I'm not calling the contributor autistic, I just want to describe why this is an obviously bad mistake, and that unless you are not genetically provided with the ability to discern between sexual situations and academic situations there could be a risk to simply mention sex in an irrelevant way, and do it innocently, not as an attempt to troll. We are here to learn about base64 encoding, not to think about sex, porn, carnal pleasure and other obscenities. Choosing an example should be done by picking a random, neutral quote that best fits the algorithm that is being described. You really have to force yourself to be rude in order to choose a libidinous quote. 188.26.59.246 (talk) 23:35, 23 July 2019 (UTC)

False: padding required for concatenation

> One case where padding characters are required is when multiple Base64 encoded files are concatenated.

That is not true. Concatenate two strings whose length = 0 mod 3 and you will get no padding either way: you will need to somehow delimit (prefix with length, custom separation character, ...). Whichever method you choose, it has to be compatible with base64 strings without padding, thus it can also be applied to non-mod-3 non-padded cases.

The explanation given in the next sentence:

> The 2011 DEF-CON Capture the Flag (CTF) qualifiers[5][dead link] contained a puzzle with a file of concatenated Base64 encoded files.

That is backwards: they (apparently) chose files with a size not divisible by three. Then, yes, bare concatenation works, but not in all cases.

Padding is never strictly necessary.

--46.190.82.146 (talk) 13:12, 26 November 2012 (UTC)
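The point above can be checked with a small Python sketch. It uses the standard `base64` module; the length-prefix scheme in `pack` and `unpack` is only a hypothetical delimiting method made up for illustration, not anything taken from the article or the CTF puzzle.

```python
import base64


def b64_nopad(data: bytes) -> str:
    """Base64-encode and strip any trailing '=' padding."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


# Two inputs whose lengths are multiples of 3: neither encoding carries
# padding, so the bare concatenation looks like one single encoded file.
joined = base64.b64encode(b"abc").decode() + base64.b64encode(b"def").decode()
same_as_single = joined == base64.b64encode(b"abcdef").decode()


def pack(chunks):
    """Hypothetical delimiting scheme: '<length>:<unpadded base64>' per chunk."""
    return "".join(f"{len(s)}:{s}" for s in map(b64_nopad, chunks))


def unpack(text):
    """Inverse of pack(); restores padding only to satisfy the decoder."""
    out = []
    i = 0
    while i < len(text):
        colon = text.index(":", i)
        n = int(text[i:colon])
        s = text[colon + 1:colon + 1 + n]
        out.append(base64.b64decode(s + "=" * (-len(s) % 4)))
        i = colon + 1 + n
    return out
```

Here `same_as_single` is true, so the boundary between the two inputs is lost even with a padding-aware encoder, while `unpack(pack(...))` recovers the chunks without any padding being stored.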

Big Endian

If this is really a WikiProject Computing page and supposed to teach about computer science, then it should mention that embedded in the description of converting 8-bit to 6-bit numbers is the assumption that the 8-bit numbers are treated as Big Endian. Wikipedia has a nice page on that for reference. — Preceding unsigned comment added by Alwhaley (talk · contribs) 15:48, 18 April 2013 (UTC)

Endianness of bytes is only of concern to data transmission engineers. As far as computer operations are concerned, bytes are atomic. Therefore base64 makes no assumption as to endianness, apart from the notational convention that a left shift increases the power of two and a right shift decreases it. -- Q Chris (talk) 16:29, 18 April 2013 (UTC)

Endianness matters to the algorithm for extracting 6-bit chunks from 8-bit bytes. The first 6 bits extracted are the "first" 6 bits in the byte; note that we are not treating the byte as atomic now. The first 6 bits are on the left in Big Endian format integers. Note that this isn't the format of a byte but of a number. In Little Endian format, the first 6 bits are not only on the right, they come in the reverse order. The Base64 algorithm as defined assumes Big Endian integer format. Data transmission is entirely to the point, since we have to serially process the first 6 bits, then the next 6, which come partly from the same byte (2 bits) and partly from the next byte (4 bits). However, this isn't for "data transmission engineers" but for anyone who wants to know how computers work, which goes to my point about this being part of the WikiProject Computing pages. The previous comment makes it clear that explanations of these bits of computer science are important. — Preceding unsigned comment added by Alwhaley (talk · contribs) 06:52, 6 May 2013 (UTC)

The algorithm is described in terms of concatenating binary strings, which uses the universal convention that numbers are written with the most significant digits first. You could equally write a definition using mathematical operations that is not bit-order dependent. BTW the order of bits in a byte is abstracted to most significant first in every computer; have you ever come across a computer where shift left in either assembler (lsh) or a high-level language (<<) divides by two? -- Q Chris (talk) 07:40, 7 May 2013 (UTC)
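As a rough illustration of the "mathematical operations" formulation mentioned above, here is a Python sketch that encodes one 3-byte group using only multiplication, division and remainder. The names `ALPHABET` and `encode_group` are made up for this example; the only convention it relies on is that the first byte is the most significant part of the 24-bit group.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(b0: int, b1: int, b2: int) -> str:
    """Encode one 3-byte group with arithmetic only (no shifts or masks)."""
    # Treat the three bytes as one 24-bit number, first byte most significant.
    n = b0 * 65536 + b1 * 256 + b2
    # Read off the four base-64 digits, most significant first.
    return "".join(ALPHABET[(n // 64 ** k) % 64] for k in (3, 2, 1, 0))


group = encode_group(*b"Man")
matches_library = group == base64.b64encode(b"Man").decode("ascii")
```

Nothing here depends on how a particular machine stores bits within a byte or bytes within a word; the result agrees with the library encoder for this input.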

TOC limit

I removed the {{condense}} template requesting the article to be simplified by combining sections, which had been inserted two years ago. The {{TOC limit}} template near the top of the article takes care of the problem nicely, because the table of contents can be collapsed and hidden if it becomes intrusive, yet the section anchors remain, making it easy to find content. At least, that's the way it works now for desktop computer users; the way mobile device users see it may be different, but that may change in the future, depending on how the Wikipedia mobile interface is implemented. — QuicksilverT @ 15:17, 5 February 2015 (UTC)

Unencoding

@Skintigh Thanks for your edit (diff). I actually checked it carefully using my editor to count the characters and to make a new string with the required ASCII text, and I cannot see how I stuffed it up—I guess I was checking the wrong version. Anyway, it's good now. Johnuniq (talk) 02:39, 20 May 2015 (UTC)

Error regarding '=' characters in decoding, and bug in decoding algorithm

This is wrong: "A single '=' indicates that the four characters will decode to only two bytes, while '==' indicates that the four characters will decode to only a single byte". When encoding, the most significant 4 bits of the second byte can be set while still resulting in two padding characters, so conversely, when decoding, a single '=' may still require a second byte, depending on whether the least significant 4 bits of the preceding character were set.

The Java code erroneously discards the most significant 4 bits of the last character if there are two padding characters. It also discards the last 2 bits if there is a single padding character.

The discussion on decoding is comprehensively infected with this error and needs to be completely rewritten, including the examples and the discussion of the no-padding case.
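For comparison, the following Python sketch shows what one existing implementation (the standard `base64` module) does with the final group. It is not the Java code discussed above, and the helper names are made up for this example; it only reports the behaviour of this one library for canonically encoded input.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def final_group_length(encoded: str) -> int:
    """Number of bytes the last 4-character group decodes to."""
    return len(base64.b64decode(encoded[-4:]))


def unused_bits_of_last_char(encoded: str) -> int:
    """Value of the bits in the last data character that carry no input data."""
    pad = len(encoded) - len(encoded.rstrip("="))
    last = ALPHABET.index(encoded.rstrip("=")[-1])
    # '==' leaves 4 unused low bits, '=' leaves 2, no padding leaves none.
    return last & {0: 0b0, 1: 0b11, 2: 0b1111}[pad]


# Encodings of 1, 2 and 3 input bytes.
examples = {data: base64.b64encode(data).decode("ascii") for data in (b"M", b"Ma", b"Man")}
```

With this encoder, one input byte yields two characters plus `==`, two input bytes yield three characters plus `=`, and the leftover bits of the last data character are zero.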

Wrong formula for size of output data for Java encode code

The Java code computes: StringBuilder out = new StringBuilder((in.length * 4) / 3);

But the correct formula should be (((in.length+2)/3)*4), i.e. 4 times the ceiling of (in.length/3), using integer division

in-len   original formula   expected
0        0                  0
1        1                  4
2        2                  4
3        4                  4
4        5                  8
5        6                  8
6        8                  8

Zyxxel (talk) 19:18, 9 March 2016 (UTC)
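The table above can be reproduced with a short Python sketch that compares both formulas against the length actually produced by the standard `base64` module (padded output, no line breaks). The function names are made up for this example.

```python
import base64


def original_estimate(n: int) -> int:
    """The formula quoted from the article's code: (n * 4) / 3, integer division."""
    return (n * 4) // 3


def padded_length(n: int) -> int:
    """The proposed formula: 4 * ceil(n / 3), written with integer division."""
    return ((n + 2) // 3) * 4


# One row per input length: (in-len, original formula, expected, actual length).
rows = [
    (n, original_estimate(n), padded_length(n), len(base64.b64encode(bytes(n))))
    for n in range(7)
]

for row in rows:
    print("%6d %8d %8d %8d" % row)
```

For every input length in the table the proposed formula matches the actual padded output length, while the original formula is short whenever the length is not a multiple of 3.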

The C code does not compile

I found that the C code was weird, so I tried to compile it: it does not compile under either gcc or clang. 163.5.121.48 (talk) 19:14, 15 September 2017 (UTC)

Removed "Coding Paradigms" section

I've removed the "Coding Paradigms" section. For an article such as this one, pseudocode seems like it would be the most appropriate way to demonstrate the concepts. This article does not need an in-depth discussion of the trade-offs of various C and JavaScript implementations. Those discussions should happen in language-specific forums.—C45207 | Talk 23:17, 17 November 2017 (UTC)

I've removed this section again, as it had been restored by 216.172.42.58 without substantive improvement. For example code, we should have something more like "An implementation of a base64 encoder in pseudocode is" followed by pseudocode. The external links section would be a good place to link to multiple different implementations in various languages. For reference, the removed text begins "As an example of a small naive program stub to improve time performance (but not space), consider these quasi C code base64 functions...". —C45207 | Talk 02:42, 30 November 2017 (UTC)

The name of the Article should be changed to avoid confusion

I initially clicked the article thinking it would be only about the number system and was confused when I found out otherwise. I would like to suggest the title be changed to something along the lines of "Base64 Encoding" to avoid confusion --Herecomesgibson (talk) 22:18, 18 July 2019 (UTC)

Seldom-used Base16?

“RFC 3548 […] attempts to unify […] the seldom-used Base32 and Base16 encodings.” Base16 is hexadecimal. By what standard is hex “seldom used”? Doug Ewell (talk) 20:40, 11 October 2019 (UTC)

Isn't RFC 4648 more relevant? Regarding your last question, hexadecimal is probably much more often mentioned and used than Base64. Google statistics would probably "confirm" this. BernardoSulzbach (talk) 00:25, 14 October 2019 (UTC)
I think the grammar is ambiguous: "seldom used" refers to Base32 only, not to both Base32 and Base16. I will try to make it clearer -- Q Chris (talk) 09:08, 14 October 2019 (UTC)

Semi-protected edit request on 30 October 2021

I want to become an administrator De123abcedx (talk) 04:58, 30 October 2021 (UTC)

Please answer the questions on your talk page first. ClaudineChionh (talk · contribs) 05:26, 30 October 2021 (UTC)

Base 64 alphabet

In case there are future disagreements about the alphabet used for encoding, I am recording what Base64#Variants summary table says:

The above links define two alphabets:

  • base64 A–Za–z0–9+/ (standard "base64")
  • base64url A–Za–z0–9-_ (a variant on the standard)

These alphabets are defined with A representing 0, B representing 1, ... 0 (zero) representing 52 etc.

Some recent edits were reverted as they erroneously changed the order. Similar edits occurred at Base62 where the situation is much less clear because there is no standard encoding. Johnuniq (talk) 06:18, 8 November 2021 (UTC)
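For anyone wanting to double-check the ordering, here is a small Python sketch that builds both alphabets as described above and compares them with what the standard `base64` module emits for the 64 six-bit values 0 to 63 in order. The names `STANDARD`, `URL_SAFE` and `all_sextets` are made up for this example.

```python
import base64
import string

# Index in the string = value represented: A is 0, B is 1, ..., '0' is 52.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
URL_SAFE = STANDARD[:62] + "-_"


def all_sextets() -> bytes:
    """Pack the 6-bit values 0..63, in order, into 48 bytes (64 * 6 = 384 bits)."""
    n = 0
    for value in range(64):
        n = (n << 6) | value
    return n.to_bytes(48, "big")


standard_out = base64.b64encode(all_sextets()).decode("ascii")
url_safe_out = base64.urlsafe_b64encode(all_sextets()).decode("ascii")
```

Encoding those 48 bytes gives back each alphabet in order, which is a quick way to confirm that the digits come after the letters rather than before them.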