This is the talk page for discussing improvements to the Byte article. This is not a forum for general discussion of the article's subject.
This level-4 vital article is rated B-class on Wikipedia's content assessment scale.
Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 14 January 2020 and 15 May 2020. Further details are available on the course page. Student editor(s): Nakanob.
Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 16:28, 16 January 2022 (UTC)
Decibel is not an SI unit
In the section titled "Unit Symbol" there is an entire paragraph explaining that the symbol 'B' is the SI unit of the bel. This is not true.
Although often used with SI prefixes (e.g. the decibel, dB), the bel is not an SI unit, nor can it ever be one: it is a dimensionless ratio. Because it is dimensionless, it is often necessary to indicate how it was calculated by adding an appropriate suffix (e.g. dBi, dBm) in order to make meaningful comparisons.
http://physics.nist.gov/cuu/Units/outside.html
JNBBoytjie (talk) 10:38, 13 April 2016 (UTC)
- Thank you. Done. Kbrose (talk) 14:17, 13 April 2016 (UTC)
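The point about suffixes can be illustrated with a minimal sketch. It assumes the usual power-ratio definition (ten times the base-10 logarithm of a power ratio) and takes dBm to mean a 1 milliwatt reference; the function names are hypothetical and chosen only for this illustration.

```python
import math


def power_ratio_db(power, reference):
    """Level of `power` relative to `reference`, in decibels.

    Both arguments must use the same unit, so the result is a pure
    (dimensionless) ratio expressed on a logarithmic scale.
    """
    return 10 * math.log10(power / reference)


def dbm(power_watts):
    """Level relative to a fixed 1 mW reference (the 'm' suffix in dBm)."""
    return power_ratio_db(power_watts, 1e-3)


# The same 1 W signal gives different numbers depending on the reference,
# which is why the suffix is needed for meaningful comparisons.
print(power_ratio_db(1.0, 1.0))  # relative to 1 W
print(dbm(1.0))                  # relative to 1 mW
```

Without the suffix, a bare "30 dB" would not say what the 1 W signal was compared against.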
C 'char' type
AFAICS 'unsigned char' is right. Based on the C Standard, (signed) char need only hold values between -127 and 127 inclusive, in other words only 255 distinct values. If you want a guarantee of 256 distinct values you need unsigned char. Ewx (talk) 08:04, 25 August 2016 (UTC)
- It is not necessary to specify unsigned char. The C standard already mandates that a value stored in a char is guaranteed to be non-negative. A signed char is an integer type and has to be declared, not an unsigned char. Kbrose (talk) 11:50, 25 August 2016 (UTC)
- I have two problems with this concept....:
- What happened to -128? (0x80)
- What version of the C standard? The original compiler I was using in the 80s (Turbo-C 1.0, 1.5 and 2.0) had by default 'char' being signed. But you could configure the compiler to make it unsigned by default. That was obviously before the current C standard... Dhrm77 (talk) 13:51, 25 August 2016 (UTC)
- Formally speaking there is only one C standard, ISO/IEC 9899:2011; the rest have been withdrawn, as you can see on the ISO website. That doesn't stop people referring to older revisions (or drafts, given the excessive cost of the current version), or implementing them, though. In this case however, the question is irrelevant: all versions of the C standard permit char to be either a signed or an unsigned type. As for 'what happened to -128', the point is to permit a variety of representations of signed types; there's more to the world than x86 and two's complement. Ewx (talk) 08:05, 26 August 2016 (UTC)
- Which C standard? Is the specification for char the same in C89, C90, C99 and C11? If not, the article should reflect that. Shmuel (Seymour J.) Metz Username:Chatul (talk) 18:45, 25 August 2016 (UTC)
- No, char is not guaranteed to be unsigned. C99 and n1570 are completely explicit about this (6.2.5#15). Ewx (talk) 07:57, 26 August 2016 (UTC)
For the record, this is the text of 6.2.5#3:
- An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.
Kbrose (talk) 11:57, 26 August 2016 (UTC)
- You're reading the wrong bit. That just tells you that certain characters have non-negative representation in a char. It does not tell you that the type itself is unsigned. Once again, see 6.2.5#15 for text that is actually relevant here. Ewx (talk) 18:50, 26 August 2016 (UTC)
- I have to agree with Ewx, I think it just says that if you need to store non-negative values, you can, if you want to store something else, it depends on the implementation, which is a way of saying that they are not defining if a char is signed or unsigned. It even opens the door to implementing a range of -64 to +191 if you wanted to, instead of the classic -128 to +127 or 0 to 255. Dhrm77 (talk) 22:02, 26 August 2016 (UTC)
- -64 to 191 would be forbidden (SCHAR_MIN must be at most -127) but -127 to 127 is permitted (and realistic for 1s complement machines); so is -2147483648 to 2147483647 or 0 to 4294967295 (and realistic for word-addressed machines). Ewx (talk) 07:25, 27 August 2016 (UTC)
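The ranges discussed in this thread can be checked with a small sketch. It is written in Python rather than C so that the arithmetic does not depend on any particular compiler. It assumes the minimum limits cited above (SCHAR_MIN at most -127, SCHAR_MAX at least 127, UCHAR_MAX at least 255) and that plain char has the range of either signed char or unsigned char; the helper names are hypothetical.

```python
def signed_range(bits, representation):
    """(min, max) of a `bits`-wide signed integer in the given representation."""
    if bits < 2:
        raise ValueError("need at least a sign bit and one value bit")
    top = 2 ** (bits - 1) - 1
    if representation == "twos_complement":
        return (-top - 1, top)
    if representation in ("ones_complement", "sign_magnitude"):
        # These representations spend one bit pattern on a second zero.
        return (-top, top)
    raise ValueError("unknown representation: " + representation)


def distinct_values(lo, hi):
    """Number of distinct integers in the inclusive range lo..hi."""
    return hi - lo + 1


def char_range_allowed(lo, hi):
    """Check a proposed (CHAR_MIN, CHAR_MAX) pair against the minimum limits."""
    as_signed = lo <= -127 and hi >= 127
    as_unsigned = lo == 0 and hi >= 255
    return as_signed or as_unsigned


print(signed_range(8, "twos_complement"))   # (-128, 127)
print(signed_range(8, "ones_complement"))   # (-127, 127)
print(distinct_values(-127, 127))           # 255
print(char_range_allowed(-64, 191))         # False
```

Under these assumptions only unsigned char is guaranteed 256 distinct values, which matches the opening comment of the thread.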
Merge Octet (computing) into Byte
Octet (computing) should be merged into Byte as Octlet (computing) is another name for Byte and Wikipedia does not have two articles for two names of the same thing (instead both are mentioned in the WP:LEAD and the article's title is at the WP:COMMONNAME). -KAP03(Talk • Contributions • Email) 22:47, 26 March 2017 (UTC)
- Oppose. I see your point, but I'm afraid that merging may blur the differences between the terms even more (than how they are confused in the current articles). Byte, Octet, Octlet (and the not mentioned Octad) are related but not multiple names for the same thing in general. They need to be distinguished carefully and we should rather improve/sharpen the articles emphasizing their differences.
- While byte is today understood as referring to a group of 8 bits most often, without context it does not define a specific count of bits. Historically, a byte was defined as any group of bits from 1 to 6 (with 5 and 6 bit being the most commonly used forms even at that time). Later it was defined as the group of bits necessary to hold a character, that is 5 to 8 bits. With the advent of micro-computers in the late 1970s / early 1980s, this shifted towards meaning 8 bits by default. Therefore, byte is a platform-specific term.
- Octet, however, was specifically defined to avoid the ambiguity of the term byte, and always means 8 contiguous bits, regardless of context and platform. That's why octet (rather than byte) is the term used in formal definitions of, for example, network protocols, in the telecommunication industry, etc.
- Octad is a term similar to octet; however, it has fallen into disuse in recent decades and is not in common use any more. Like octet, it specifically means 8 bits, but it looks at them from the angle of how many bits are necessary to define 129 to 256 states in coding (at least this is what I draw from the usage of similar terms like tetrads and pseudo-tetrads). From that angle, it does not seem to matter whether those 8 bits holding the state are grouped together physically.
- Octlet (per IEEE 1754) means 8 octets or 64 bits, so it is clearly different from octets.
- --Matthiaspaul (talk) 12:31, 27 March 2017 (UTC)
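The "129 to 256 states" remark and the octlet size can be checked with a short sketch. The function name is hypothetical; the only assumption is that each state gets its own fixed-width binary code.

```python
def bits_needed(states):
    """Smallest number of bits that can give `states` distinct codes."""
    if states < 1:
        raise ValueError("need at least one state")
    # With n bits there are 2**n codes, so the largest code (states - 1)
    # must fit in n bits. A single state needs no bits at all.
    return (states - 1).bit_length()


OCTET_BITS = 8
OCTLET_BITS = 8 * OCTET_BITS  # an octlet (per the IEEE 1754 usage cited above) is 8 octets

print(bits_needed(128))  # 7
print(bits_needed(129))  # 8
print(bits_needed(256))  # 8
print(bits_needed(257))  # 9
print(OCTLET_BITS)       # 64
```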
- oppose - No, they are not two names for the same thing, but names for different things. As a side note, the VFL instructions of Stretch could specify any byte size from 1 to 8 and 12 was a common byte size for CDC users. Shmuel (Seymour J.) Metz Username:Chatul (talk) 19:22, 28 March 2017 (UTC)
- Oppose - There has been significant discussion of the distinction on the respective article talk pages. Bytes have not always been 8 bits. An octet is defined as 8 bits. Bytes are used in processors. Octets are used in communications. It is probably possible to cover both in a single article but that would not be a trivial merge and I'm not convinced that what we'd end up with would be an improvement over current coverage. If someone wants to create a sandbox version of a merged Byte article, I'd be happy to assess in more detail. ~Kvng (talk) 18:34, 9 April 2017 (UTC)
- Support - I think the articles should be merged. It's true that byte has meant other things in the past, but the modern definition (according to the International System of Quantities) is 8 bits. Dondervogel 2 (talk) 18:50, 9 April 2017 (UTC)
- Oppose - The terms don't always represent the same thing and are used for different purposes. Jko831 (talk) 19:37, 21 September 2017 (UTC)
Architectural support for byte sizes other than 8
Off the top of my head, these machines come to mind as supporting byte sizes other than 8
- CDC 3600 and 3800: 1 to 48 bits
- DEC 36-bit machines: 1-36 bits
- GE, Honeywell and Bull 36-bit machines: 6 or 9 bits
- RCA 601: 3, 4, 6, 8 or 24 bits
- UNIVAC and Unisys 36-bit machines: 6, 9, 12 or 18 bits
Shmuel (Seymour J.) Metz Username:Chatul (talk) 18:41, 29 March 2017 (UTC)
- Shmuel, do you actually mean bytes in this context? I recall the larger sizes to be called words. If you can, please provide some refs, it would be great if we could track this down to historic sources in order to improve the article.
- --Matthiaspaul (talk) 00:37, 30 March 2017 (UTC)
- Yes, I actually mean byte and both CDC and DEC used the word byte as part of the instruction names, e.g., Deposit Byte. Two easy citations from bitsaver are
- 3600 Computer System Reference Manual (PDF), CDC, October 1966, 60021300
- Book1 Programming with the PDP-10 Instruction Set (PDF), PDP-10 System Reference Manual, CDC, August 1969, 60021300
- Shmuel (Seymour J.) Metz Username:Chatul (talk) 20:40, 3 April 2017 (UTC)
- Another more recent example would be the Nintendo 64 with 9-bit bytes. 2003:71:CF10:FD00:A843:F00A:C1FE:7F1F (talk) 20:40, 23 September 2018 (UTC)
- I can't find any information supporting the claim that the N64 used 9 bit bytes. It used a 64 bit NEC VR4300 CPU using the MIPS architecture, an 8 bit per byte architecture. (I doubt Nintendo and NEC would make a unique 9 bit variant of the MIPS architecture)
- However, it did use a 9 bit wide memory bus using the Rambus protocol. https://en.wiki.x.io/wiki/RDRAM
- But I haven't found out whether the ninth bit was a simple parity bit (i.e., not data and therefore not part of a byte),
- or whether it was used for data, so that instead of sending 64 bits in 8 cycles, the bus could reduce that to 7.111 cycles. (The bus does do burst reads and writes, so this could provide more bandwidth. A bus can't have "0.111" of a cycle, but the 7 spare bits in the last cycle could be used for the next set of data, bringing overall throughput up slightly.)
- I have found two conflicting sources that state 500 MB/s (without even mentioning bits/byte) vs 562.5 MB/s (proclaiming 9 bits per byte for the whole architecture), both can't be right. But personally I suspect it to be a parity bit since that makes the most sense, since why make an oddly complex bus implementation. And the source claiming 9 bits/byte is also the source claiming 562.5 MB/s, which it wouldn't be if it were 9 bits/byte, and the source claims to take its data from Wikimedia which only has conflicting data, so obviously it isn't a good source.
- Regardless memory bus width isn't the same as byte size. The majority of modern computers have 64 or 32 bit memory buses, but are still most often 8 bit per byte architectures.
- In the years I spent studying computer architectures I have anecdotally concluded that a byte is likely most easily defined as either:
- A. The simple approach is that the byte is the number of bits one moves along in memory with each address increment. (I.e., if we add 1 to the address and move 14 bits in memory, then it is a 14 bit per byte architecture. But some architectures move 1 bit at a time, and that isn't really a group of bits.)
- B. The nuanced approach is that a byte is the smallest group of bits we can interact with in memory using the fewest general purpose instructions. (In a Load/Store architecture this would be the smallest number of bits the load/store instructions can provide as a group. For architectures not using registers, it is the width of our "narrowest" instruction. And it is more debatable for some CISC architectures with application/encoding/protocol specific instructions that at times access an arbitrary number of bits all over the place. But removing obvious hardware accelerators from what one considers "the architecture" could be a wise move here, since these are usually far from general purpose and don't really meaningfully define the architecture as a whole.)
- And parity bits (and other error correction schemes) should generally be ignored when talking about byte size, since firstly they describe data and aren't additional data in themselves, and secondly they are often hidden from the rest of the instruction set.
- But in the end, it is somewhat hard to make a definition for a byte, since architectures can be rather nuanced and approach the problem of data processing from so many odd angles. (some old computer systems even handled bits/bytes at different sizes for instruction vs data, only adding further nuance to the debate.) Nystemy (talk) 18:43, 24 July 2024 (UTC)
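The two conflicting bandwidth figures in this comment can be related by simple arithmetic. The sketch below assumes 500 million transfers per second on the 9-bit bus, a figure inferred here from the 500 MB/s number (with 8 data bits per transfer) and not taken from any cited source; the function name is hypothetical.

```python
# Assumption: inferred from "500 MB/s" with 8 data bits per transfer.
TRANSFERS_PER_SECOND = 500_000_000
BUS_WIDTH_BITS = 9


def megabytes_per_second(data_bits_per_transfer, bits_per_byte):
    """Throughput in millions of bytes per second for a given byte size."""
    bits_per_second = TRANSFERS_PER_SECOND * data_bits_per_transfer
    return bits_per_second / bits_per_byte / 1e6


# Ninth bit is parity: 8 data bits per transfer, 8-bit bytes.
print(megabytes_per_second(8, 8))   # 500.0
# All 9 bits carry data, but the result is counted in 8-bit bytes.
print(megabytes_per_second(9, 8))   # 562.5
# All 9 bits carry data and a byte really is 9 bits.
print(megabytes_per_second(9, 9))   # 500.0
# Transfers needed to move 64 bits over 9 data lines.
print(64 / BUS_WIDTH_BITS)
```

Under this assumption, 562.5 MB/s only appears when nine data bits per transfer are counted in 8-bit bytes, which is consistent with the observation that a source claiming both 9-bit bytes and 562.5 MB/s contradicts itself.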
- One article claims the N64's "9th bit is normally used for anti-aliasing and z-buffering and is normally "hidden"." RastaKins (talk) 19:14, 24 July 2024 (UTC)
- The size of a memory bus and the size of a byte are unrelated issues. All of these machines had byte sizes down to one bit, but none had a memory bus that small:
- Burroughs B1700 (1-15)
- CDC 3600 (1-48)
- DEC PDP-6 (1-36)
- IBM 7030 (1-8)
- In some cases the architecture only supports accessing bytes entirely contained within a word. -- Shmuel (Seymour J.) Metz Username:Chatul (talk) 19:31, 24 July 2024 (UTC)
- I would argue that article makes an odd claim. 9 bits isn't much for Z buffering, 512 levels to differentiate depth over is exceptionally crude. And yet again the question is why make such an exceptionally odd variant of the MIPS architecture?
- The rest of that wiki having the exact quote generally talks about the VR4300 CPU as having 8 bits/byte, and never even mentions 9 bits again. (It is like the author got confused over why the memory bus is 9 data bits wide and made something up on the spot. Meanwhile, parity bits are common as mud on buses having an extra bit or two, since error detection is kinda useful, even if it isn't correctable. Though, with only a parity bit it could be the parity bit itself being corrupted.)
- Then we have these four systems.
- B1700 couldn't find much information about it. (source?)
- The CDC 3600's documentation just states it works with 48 bit words and never even mentions something smaller. (Unlike the two systems below that make it rather clear in their documentation, almost as if it is a sales argument. (The CDC documentation does talk about serial interfaces, but that is irrelevant.))
- The PDP-6 seems to be variable byte size. Since its basic load/store instruction can access an arbitrary amount of bits from a word from any arbitrary position within said word. So it is flexible. 1-36 bit per byte check out.
- The IBM 7030 also uses bit level addressing of words and can likewise load/store an arbitrary number of bits at once.
- But in general, if one needs to append additional instructions beyond one's general load/store instructions to work with smaller groups of bits, then it isn't one's byte size. While at one's byte size one wouldn't need these additional instructions.
- The two latter systems are flexible in that regard. Most modern architectures, meanwhile, usually only allow one to specify how many bytes one wants, often only in binary weighted increments as well. Nystemy (talk) 11:05, 25 July 2024 (UTC)
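The kind of variable-size byte access described for the PDP-6 and IBM 7030 can be sketched as follows. This is only an illustration of the idea (a byte given by a bit position and a size within one word), not a model of either machine's actual instruction encoding; the function names, the 36-bit default and the left-to-right bit numbering are assumptions made for the example.

```python
def load_byte(word, offset, size, word_bits=36):
    """Return `size` bits of `word`, starting `offset` bits from the
    most significant end. The byte must lie entirely within the word."""
    if not 0 <= word < (1 << word_bits):
        raise ValueError("word does not fit in word_bits")
    if size < 1 or offset < 0 or offset + size > word_bits:
        raise ValueError("byte must lie entirely within the word")
    shift = word_bits - offset - size
    return (word >> shift) & ((1 << size) - 1)


def deposit_byte(word, offset, size, value, word_bits=36):
    """Return `word` with the selected byte replaced by the low `size` bits of `value`."""
    if not 0 <= word < (1 << word_bits):
        raise ValueError("word does not fit in word_bits")
    if size < 1 or offset < 0 or offset + size > word_bits:
        raise ValueError("byte must lie entirely within the word")
    shift = word_bits - offset - size
    mask = ((1 << size) - 1) << shift
    return (word & ~mask) | ((value << shift) & mask)


word = 0o123456701234            # twelve octal digits = 36 bits
print(oct(load_byte(word, 0, 6)))   # 0o12, the first 6-bit byte
print(oct(load_byte(word, 6, 6)))   # 0o34, the second 6-bit byte
print(load_byte(word, 0, 36) == word)  # True: a 36-bit byte is the whole word
```

Because a single call handles any size from 1 bit up to the full word, no extra masking instructions are needed for small groups, which is the sense in which such machines have a variable byte size.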
"B1700 couldn't find much information about it. (source?)" [1][2][3]
"The CDC 3600's documentation just states it works with 48 bit words and never even mentions something smaller." [4]
- I believe that these references will suffice. -- Shmuel (Seymour J.) Metz Username:Chatul (talk) 12:49, 25 July 2024 (UTC)
References
- ^ "Index of /pdf/burroughs/SmallSystems/B1000". Retrieved July 25, 2024.
- ^ "Index of /pdf/burroughs/SmallSystems/B1000/B1700". Retrieved July 25, 2024.
- ^ Burroughs B 1700 SYSTEMS REFERENCE MANUAL (PDF) (Preliminary ed.). Burroughs Corporation. 1972. 1057155. Retrieved July 25, 2024.
- ^ "Variable Data Field" (PDF). 3600 Computer System - Reference Manual (PDF). Control Data Corporation. October 11, 1966. pp. 3-42–3-45. 60021300K. Retrieved July 25, 2024.
Status of error checking bits
The IBM 7030, for which the term byte was coined, did not include error checking bits as part of a byte. Nor did the DEC PDP-6, the CDC 3600, or any of the other computers with the ability to access bytes of various sizes. The System/360 Principles of Operation contains the text "Within certain units of the system, a bit-correction capability is provided by either appending additional check bits to a group of bytes or by converting the check bits of a group of bytes into an arrangement which provides for error checking and correction (ECC). The group of bytes associated with a single ECC code is called an ECC block. The number of bytes in an ECC block, and the manner in which the conversion or appending is accomplished depend on the type of unit involved and may vary among models." Accordingly, I call for the reinstatement of the text "The byte size designates only the data coding and excludes any parity or other error checking bits." Shmuel (Seymour J.) Metz Username:Chatul (talk) 21:30, 21 June 2017 (UTC)
- I removed that statement because the lead is supposed to summarize the important points in the article body. The statement I removed, (a) is not covered in the body (b) may not be one of the more important points about the topic. I am not at all opposed to including this information in the article body and once that is stable, we can consider it for inclusion in the lead. ~Kvng (talk) 14:51, 24 June 2017 (UTC)
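A minimal sketch of why a check bit is not counted in the byte size: the stored unit below is 9 bits wide, but only 8 of them carry data, and the ninth is derived entirely from the other eight. The even-parity scheme, the bit layout and the function names are assumptions chosen for illustration and do not describe any particular machine.

```python
def even_parity_bit(data_byte):
    """Bit that makes the total number of 1 bits (data + parity) even."""
    if not 0 <= data_byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(data_byte).count("1") % 2


def store_with_parity(data_byte):
    """Return the 9-bit stored value: parity in bit 8, data in bits 0-7."""
    return (even_parity_bit(data_byte) << 8) | data_byte


def load_checked(stored):
    """Recover the 8-bit data byte, raising if the parity check fails."""
    data = stored & 0xFF
    parity = stored >> 8
    if parity != even_parity_bit(data):
        raise ValueError("parity error")
    return data


stored = store_with_parity(0b00000111)
print(stored.bit_length())   # 9 bits are stored
print(load_checked(stored))  # 7: the program only ever sees the 8 data bits
```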
"Octad"
Is the origin unclear?
The article currently states:
"The exact origin of the term is unclear, but it can be found in British, Dutch, and German sources of the 1960s and 1970s, and throughout the documentation of Philips mainframe computers."
Surely this is just the eighth member of the sequence which starts "monad", "dyad", "triad", ie a group of eight things (looking toward Greek). "Octet" and "octad" appear similar because the Latin and Greek cardinal number 8 both have the same form (octō, ὀκτώ). Compare e.g. "quintet" vs "pentad" for a group of 5.
Of course the correct term would be an ogdoad (from the genitive of the ordinal) but not everyone who wants to use precise, technical language also knows Greek.
If the question is about who first used the term in its computing sense, that may be unanswerable because it probably slipped in from an earlier technical or mathematical sense. –moogsi(blah) 23:46, 23 October 2018 (UTC)
… representing a binary number
oRLY?
Anybody having a programming experience—even amateur—knows that bytes more frequently do not represent numbers (serving as opcodes, parts of bitmaps or compressed data…) than do represent numbers explicitly. Even for such complicated number format as IEEE 754 it wouldn’t be helpful to think of every isolated byte as of a sensible numerical value. Objections against complete removal? If any, then change to “are capable of representing a binary number” maybe? Incnis Mrsi (talk) 10:06, 27 July 2019 (UTC)
- I'm fine with that. You can change it to “are capable of representing a binary number”. Vmelkon (talk) 02:10, 12 February 2021 (UTC)
- I completely agree for the purposes of this article, the following is just being pedantic; Opcodes are a small part of the actual instruction, most of which is just a bunch of numbers that can be directly read on most architectures and for most instructions. For architectures that either limit immediates to 8 bits or actually are 8 bits, the bytes in program code are very likely to have direct meaning as numbers. Sure, you might have to read them as nibbles and mentally stick an "r" in front of them if the thought of assigning numbers to other numbers that represent offsets in sram adds too much confusion on top of the fact that you're trying to read raw instructions, or if you're on ARM calculate the sign bit from a bitwise operation on 3 bits spread around the T4 encoding of the branch instruction because the people designing the instruction set were high on horse tranquilizers that day, but they're numbers with direct meaning as such. The opcode itself is just a number too, although you kinda have to squint in some cases, but if you tell an x86 to 233 it's going to 233, damnit. :D --A Shortfall Of Gravitas (talk) 03:47, 5 August 2021 (UTC)
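Both sides of this thread can be seen in a small sketch: every byte can be read as a number, but for a format such as IEEE 754 the individual byte values say little about the value being encoded. The example uses Python's standard struct module with an explicitly little-endian single-precision layout; the variable names are chosen only for illustration.

```python
import struct

# Encode 1.0 as a little-endian IEEE 754 single-precision float.
raw = struct.pack("<f", 1.0)

byte_values = list(raw)                     # each byte read as a number
as_uint32 = int.from_bytes(raw, "little")   # the four bytes read as one integer
as_float = struct.unpack("<f", raw)[0]      # the four bytes read as a float

print(byte_values)     # [0, 0, 128, 63]
print(hex(as_uint32))  # 0x3f800000
print(as_float)        # 1.0

# The opcode remark works the same way in reverse: 233 is just the
# number 0xE9, which x86 decodes as a near jump.
print(hex(233))        # 0xe9
```

None of 0, 0, 128 or 63 resembles 1.0 on its own, yet each is a perfectly good binary number, which is why "are capable of representing" is the safer wording.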
Unit Multiples
Is this really correct? "100 gigabytes is specified when the disk contains 100 billion bytes (93 gibibytes) of storage space." The whole section has no attribution, so I can't check it. But I had always gathered that the difference was available storage space on formatted vs un-formatted disk. - Tsuchan (talk) 12:42, 27 May 2020 (UTC)
- This should not need attribution anymore, is not an opinion, but straight forward application of prefixes. But I clarified the statement in a more verbose form. - Kbrose (talk) 15:08, 27 May 2020 (UTC)
- More on the topic: [ https://en.wiki.x.io/?diffonly=1&diff=prev&oldid=1105427872 ] (# My own explanation on the background; currently on this page). - MasterQuestionable (talk) 06:58, 22 September 2022 (UTC)
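The "straightforward application of prefixes" can be written out as a short calculation. It assumes only the standard definitions (giga = 10^9, gibi = 2^30); the names are chosen for illustration.

```python
GIGA = 10 ** 9   # SI prefix giga
GIBI = 2 ** 30   # IEC binary prefix gibi


def bytes_to_gibibytes(n_bytes):
    """Express a byte count in gibibytes (GiB)."""
    return n_bytes / GIBI


disk_bytes = 100 * GIGA                          # a "100 GB" disk: 100 billion bytes
print(round(bytes_to_gibibytes(disk_bytes), 2))  # 93.13
```

The gap of roughly 7% comes purely from the two prefix definitions and is separate from any space lost to formatting.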
Proposed merger
Is it absolutely necessary for kilobyte, kibibyte, megabyte, mebibyte, gigabyte, gibibyte, etc. to all have their own individual Wikipedia pages that all say the exact same thing? Can't we put the information on one page, and have all those terms redirect to that one page? — Preceding unsigned comment added by 73.70.13.107 (talk) 10:33, 14 October 2020 (UTC)
- Wikipedia has no problem with duplication and redundancy not least because it is not paper. As long as each article is properly sourced and notable, there's no reason to replace them all with one "mega sized" article. QuiteUnusual (talk) 11:52, 14 October 2020 (UTC)
- @QuiteUnusual: The merge suggestion has merit precisely because the articles don't sum up to a massive article. They're 95% (99%?) the same article—just at different levels of development because some get more attention than others.
- Clearly the hypothetical merge destination would not be Gigabyte, so I'm moving the discussion here to the talk page of Byte. I see that the IP editor making the original proposal tried to put this plan into motion already and selected Binary prefix for the redirect, but I think that may be confusing the thing with the name. Millilitre redirects to Litre § SI prefixes applied to the litre, not Metric prefix, and that strikes me as appropriate.
- Either way, this is a change worth discussing first, and care will need to be taken to preserve the best citations, but yes, I think consolidation is a good way to deal with the existing mess of nearly identical articles. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:03, 14 October 2020 (UTC)
- I just made an effort to gather some of the best content from the relevant articles and centralized it here. A lot of what was available is dated and esoteric (“In 2013, one expert estimated that the "amount of data generated worldwide" would reach 4 zettabytes by the end of the year”) and of questionable value.
- The biggest concern now might be how redirects and disambiguation are handled. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 18:20, 10 December 2020 (UTC)
- I made an earnest attempt to scrape as many good bits from the 16 redirected articles as possible. Some genuinely interesting facts and citations existed in only one or two of the articles, so I’m hopeful that the consolidation will make those nuggets more findable to readers. There was a lot of cruft to sift through—and I had to make judgment calls—so if anyone wants to go through the trimmings for anything I overlooked, that would be welcome. Cheers —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 02:34, 11 December 2020 (UTC)
- There has been no consensus for these actions, so I have reversed them. The articles have been more or less stable for years, with only occasional curiosity about a single article. There is no mess, and separate articles introduce no problems for WP policies or maintenance. It is good to have places for unit-specific content and that shows in some of the articles. kbrose (talk) 14:59, 11 December 2020 (UTC)
- @Kbrose: I’m not arguing that the existence of the 16 articles I redirected violates policy, but policy violations are not the only reason for advantageous merging. I’m citing two reasons:
- Fragmentation dilutes editor attention. Byte is still rated C-class despite having existed since 2001. The other 16 articles are Start-class dumping grounds for poorly curated trivia. None of these 16 articles is improving at a rate we should be proud of. Take Kilobyte: it’s been something like 22 months since the last substantive improvement (it was yours, and I thank you!); most edits are just fighting entropy. When I tagged a dubious statement there in June, I ended up waiting through three months of silence before removing the statement. This discussion has existed for two months and received no feedback. The stability you cited is not a compelling argument for keeping bad content.
- Purposeful redundancy is okay; gratuitous redundancy is not. I’ve noted above that Litre and its many prefixed variants are successfully consolidated in one article. The same is true for Newton, Decibel, and most units. Is there some good reason that this model isn’t appropriate for Byte? I will note that Gram exists along side Kilogram and Metric tonne but not an exhaustive set of articles, so there is precedent for a middle ground. I don’t see any justification for Kilobyte’s existence within the current version of the article, but it’s not impossible for me to imagine. I doubt Zebibyte will ever need a standalone article.
- I don't think there was a truly compelling reason to revert my edits en masse, and I think we should re-implement the redirects, but I’m open to considering the 16 affected articles on a case-by-case basis. Making Byte a better article, though, should be the highest priority. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 16:43, 11 December 2020 (UTC)
- I don't have any strong objections to a merge but clearly some more discussion needed. A single article is fairly standard in other cases, although in this case the prefixed units, especially kB and MB, are the ones in more common use which could muddy the issue of an article title. Perhaps formally tagging the articles for merger would generate a little more interest in this discussion. Some musings on a talk page for a few weeks, with no explicit support and some scepticism, is hardly good support for a fairly dramatic action. A formal merger proposal without clear opposition, on the other hand, could be interpreted as consensus. Lithopsian (talk) 15:12, 11 December 2020 (UTC)
- Just a pre-vote, but I would definitely support merging anything about say, petabyte. Units not in widespread usage, ones that most people wouldn't recognise, are hardly notable in themselves. Lithopsian (talk) 15:15, 11 December 2020 (UTC)
- The trend is usually to divide content to more specific topics. Even the small articles in this series have value, because they quickly point the user to a specific definition without having to sort through a lot of information. This has become more important since automated services and devices exist, increasingly, such as the Google assistants and apps, that pull up specific definitions from WP for key words and topics and read them to the user. This is also a good reason to not bunch parenthesized comments right after the key word or article title with lots of pronunciations and in this case unit symbols. Short, clear, specific sentences have more impact. kbrose (talk) 15:36, 11 December 2020 (UTC)
- There are lots of cases where [[foo]] is a redirect to [[bar#baz]]. — Preceding unsigned comment added by Chatul (talk • contribs)
- I’m also a big believer in making the first two sentence count for all they can because they’re what Google’s Knowledge Graph harvests and regurgitates. I wouldn’t go so far, though, as to agree that Wikipedia needs to adopt dictionary-like fragmentation just to accommodate Google. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 16:50, 11 December 2020 (UTC)
- The Litre, Newton, Decibel, are convincing examples, and, per consistency, I am leaning toward a merger. The pages were so similar that creating the 16 pages looked merely like a programming exercise in automated article writing. And fragmentation is truly a waste of editor time: I recently made 14 similar edits manually, only to see them all reverted at once. QuiteUnusual's reply has given no reason to keep separate articles: stating that there was no reason to merge is not a reason. Neither has kbrose, stating that it was the status quo and that dividing was trendy. So, I am for a merger. I would also add that giving the user a complete page is like teaching to fish, while a single article for each prefixed unit is like giving one a fish. Teaching is encyclopedic. --Robertiki (talk) 03:37, 12 December 2020 (UTC)
- In addition to Metre, we have separate articles for Kilometre, Millimetre, Micrometre, Nanometre, Picometre and Femtometre. While I can see a case for keeping Kilometre and (maybe) Millimetre, all others seem frivolous to me because so much of the information is duplicated (making them difficult to maintain), and we could better serve our readers by redirecting those to Metre. The same reasoning applies to multiples of byte. I suggest the following: Let's make a single Byte article that addresses the concerns raised against merging, and then review whether we still need Kilobyte, Megabyte, etc? Dondervogel 2 (talk) 09:46, 12 December 2020 (UTC)
- @Dondervogel 2: Do you have specific suggestions for changes to Byte you would want to see implemented before a merger? I think all concerns raised thus far are either about procedure or the relative benefits of keeping the diaspora of articles so as to better mesh with Google (and maybe Wikidata?). As I’ve said, I’m interested in making Byte better, so I’m interested in ideas for improvements. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:41, 12 December 2020 (UTC)
- Moved to #Addressing ambiguous definitions of megabyte and gigabyte
- My two cents: we should keep a particular multiple split when it has enough examples, history, etc that justifies a separate article. Example: "Apollo Guidance Computer computer had Kilobytes of RAM memory". If the multiple has only a couple of those examples, then that's a case for merge; if it has 4~5 (or more), plus a history with more than a paragraph, etc, then that's a case for split. In my opinion. Imagine having KiB, MiB, TiB each one with its own example, history, etc all in the same article; it'd be a mess. And I'd guess that there's a lot of encyclopedic history to be told in many (if not all) of the multiples (at least up to Terabyte), considering the rapid evolution and the impact of digital systems in human history.
- For maintainability issues, we can try the template {{excerpt}} in case it's not being used already (I didn't check). Feelthhis (talk) 16:17, 12 December 2020 (UTC)
- There is another way of reaching the intended purpose of this discussion. I would support developing one article into a full treatment of all units and their histories and relationships, but at the same time keeping minimal versions of each separately, consisting only of a concise definition (to be used in Google fact finds) and a link to the general description of the whole set. They would not be redirects, but bare definitions with a reference for background. −Woodstone (talk) 16:24, 12 December 2020 (UTC)
- A lone definition with…what? A warning to editors not to add more? Is there any precedent for this hybrid organization? —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 17:01, 12 December 2020 (UTC)
A week has passed since the merger and subsequent reversion, and several days have passed since the last contribution to this conversation. My takeaways are these:
- For all eight binary units and the five decimal units larger than Gigabyte, there seems to be clear support for redirecting.
- The primary argument for retention of all 17 articles is to establish a definition for each word. This justification is at odds with point 1 at WP:NOT#DICT, and there seems to be a general consensus among conversation participants that this is not the right approach.
- Dondervogel 2’s primary concern was that the expanded section of Byte be up to the task of clearly explaining the multiple systems. Dondervogel 2 and I tag-teamed to improve that section significantly, and I think it’s currently in much stronger shape than the corresponding section in any of the 17 articles has ever been.
- For Kilobyte, Megabyte, and Gigabyte, there seems to be a recognition that the articles are problematic but not a consensus on a solution.
- The strongest argument for not merging everything to Byte, raised by Lithopsian, is that in everyday life, kB and KB are more common than B.
- This could justify retaining two articles at Byte and at Kilobyte OR retaining a single consolidated article at Kilobyte instead of at Byte. I take the former option more seriously because Byte remains the best article title. Its definition is settled, and the word itself is a constituent part of all the unit names (not just ‘Kilobyte’ and ‘Megabyte’ but ‘Kibibyte’ and ‘Mebibyte’ too). The average reader is going to be able to best understand the relationship between units when the article’s starting point is the base unit.
- Feelthhis noted that {{excerpt}} might be able to help keep quality up in satellite articles.
Unless it is felt that a formal proposal is genuinely needed, I intend to re-implement the 13 supported redirects in the next day or so, to keep improving Byte, and to give further consideration to what form Kilobyte, Megabyte, and Gigabyte might take to best serve Wikipedia’s readers. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 13:50, 17 December 2020 (UTC)
- While not objecting to jameslucas's proposal, I wonder whether we need two more articles Decimal multiples of byte and Binary multiples of byte, that would each describe a subset of multiples as indicated by their titles. I also wonder whether a similar exercise is needed for Kilobit, Megabit, Kibibit etc. Dondervogel 2 (talk) 14:10, 17 December 2020 (UTC)
- I don't think creating articles with titles that are unlikely search terms helps Wikipedia users as much as it might satisfy us. It also makes it less obvious where kilobyte, megabyte, etc. should redirect. Lithopsian (talk) 15:26, 17 December 2020 (UTC)
- Again, not an objection so far as this proposal goes. I think we may be underestimating the purpose and value of redirects here. Whether the content for, say kilobyte, is in byte or a separate article called kB or whatever may cause sleepless nights for us editors but is of little consequence to the average reader. They can search, link, or otherwise open something about "kilobyte" and will get information about kilobyte, albeit in an article titled "byte". Bolding of common synonyms in the lead, or giving them sections or anchors, avoids confusion and everyone is happy. Separate articles are only really needed when there is sufficient distinct text to make that article useful and where it would otherwise overload or unbalance the parent article, or where multiple semi-distinct topics just make a single article unwieldy. I'm not convinced any of these cases apply here. The article is not overly huge and the child articles contain little distinct information, largely repeated across each one, and easily mergeable. Perhaps they could be expanded, but the distinct information in each one at this point would barely be worth a section in a parent article. Perhaps the question to ask is "if they were all in a single article today, would we want to do a WP:SPLIT?" Nothing to stop them being split again in the future if it becomes useful. Lithopsian (talk) 15:26, 17 December 2020 (UTC)
- JamesLucas, in your first attempt at merging I notice that, of all the content from Exabyte#Usage examples and size comparisons, you brought only the phrase "global monthly Internet traffic in 2004" to your merge. Is this how things will be handled? If that's going to be the tone of this process, then I suggest starting a deletion discussion prior to the removal, out of respect for the editors who put their time and effort into writing all the valuable content that is to be removed. Please take this into consideration before starting the process. Feelthhis (talk) 00:05, 18 December 2020 (UTC)
- @Feelthhis: It’s undoubtedly helpful to have a few real-world examples to illustrate for the reader the relative sizes of these units, which is why I created Byte § Practical examples and harvested the best examples I could find. Dondervogel 2 has already improved it, and I’ll keep trying to expand it. (And I’d be open to adding more tiers—10 kB, 100 kB, 10 MB, etc.—if you think it’d help readers.)
- With that said, I think it’s essential to observe that the vast majority of examples and comparisons present in the 16 articles are not about the units they supposedly illustrate. The average reader is not going to be familiar with “DARPA's ARGUS-IS surveillance system”, so the fact that it could—in 2014—“stream 1 exabyte of high-definition video per day” cannot possibly help most readers understand the size of an exabyte (and those it would help probably don’t need that help). Many of the examples, including the section Exabyte § Library of Congress, are arguably worse because they are dealing with amounts of data an order of magnitude or more different from the example they are supposedly illustrating.
- It’s a bit surprising to me that so much of this trivia was allowed to accumulate, and now that it’s thousands of bytes deep, I appreciate that its removal seems dramatic, but I don’t think most editors who spent some time reviewing it fact-by-fact with an eye towards its purpose within the article would deem the content appropriate or its removal controversial. I made a serious effort to gather the best examples, invited others to double-check my work, and in the course of engaging in this discussion have spent additional time with the material on the chopping block. I don’t think it’s defensible. I’m willing to dot the i’s on this process if it’s judged necessary, but I hope not to. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 01:29, 18 December 2020 (UTC)
- @JamesLucas: I notice that for Gigabyte you used the example "about half an hour of video". But... just "video"? In a topic about bits and bytes? With all due respect, all examples from Gigabyte are objectively better and were all going to be deleted (in fact, if it weren't for Kbrose rescuing them, they would all have been deleted). Do you mind if I invite users Kbrose and QuiteUnusual to this discussion? I hope not. The way I see it now, too much good content will be lost, and from now on my position is against the mass merging/deletion.
- The non-encyclopedic material (for instance outdated trivia) is best handled on a case-by-case basis instead of by a mass deletion across multiple articles, in my opinion. It's like cancer: you want to remove the bad (cancerous) stuff and keep all the good (healthy) stuff. This mass merging/deletion will remove the bad stuff and all the good stuff. Feelthhis (talk) 03:53, 18 December 2020 (UTC)
- The video example for gigabyte was added by me (replacing a self-reference to Wikipedia). The source says "2 GB per hour of video (varies greatly)" in the context of a 4.7 GB DVD. I agree it's weak and can be removed as far as I'm concerned. Dondervogel 2 (talk) 08:13, 18 December 2020 (UTC)
- No one should be excluded from this conversation. I’ve pinged both Kbrose and QuiteUnusual at least once each in the course of this, and I hope I’ve been clear that further curation of examples is being explicitly requested. The cancer analogy, though, I find inapt. If we agree that these “bodies” are now valued primarily as organ donors, it’s probably better for organ-hunting purposes to move them to the freezer intact than to carve them up and in doing so create obfuscating layers of history states. Unlike organic organs, these word organs are still viable after being declared “dead” for a while. (Granted, I’d probably weigh the pros and cons differently if I thought there were many good organs left to be found.) —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 12:39, 18 December 2020 (UTC)
I have redirected the articles for binary units, which contain none of the content being discussed yesterday. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:15, 19 December 2020 (UTC)
- I have redirected the articles for decimal units greater than Gigabyte after giving them one more comb through for not-yet-harvested informative elements and finding none. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 02:00, 21 December 2020 (UTC)
Addressing ambiguous definitions of megabyte and gigabyte
- I think some of the information from individual articles needs to be copied across before those individual articles can be replaced with re-directs. I imagine there are multiple examples (I've not carried out an audit or check of any kind) but one that springs to mind is the fact that there exist (or at least have existed) 3 different definitions of "megabyte". Another concern is a false impression in the text of symmetry between decimal and binary definitions of kilo, mega, ... yotta. (Only the table hints at the fact that only the first 3 have binary definitions, whereas all of them are decimal). Dondervogel 2 (talk) 15:33, 12 December 2020 (UTC)
- The 3½-inch floppy’s “1.44 MB” seems like a marketing simplification rather than a third definition; I think the way it is currently presented in Megabyte, as a definition of equal relevance, is misleading and not to be emulated. It could be retained as a footnote, but that little bit of history more properly belongs at Floppy disk (and, yes, it’s there). —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 17:01, 12 December 2020 (UTC)
- @JamesLucas: If the information about the history of "megabyte" is hidden in Floppy disk the reader would need to be told (in Byte) where to find it. But I actually believe the information should not be hidden there. Much better to have a section about the meaning of the word "megabyte" either in Megabyte itself or (if re-directed here) in Byte. The evolving meaning of related terms (kilobyte, gigabyte, terabyte ...) is also not apparent from a re-direct to Byte. Dondervogel 2 (talk) 09:35, 13 December 2020 (UTC)
- @Dondervogel 2: Maybe I’m missing part of the story? My understanding is that the ‘1.44 MB’ label was a one-off marketing anomaly rather than something that precipitated an evolution of meaning—a funny footnote remembered by few of us and having no relevance to the current definitions of ‘byte’, ‘kilobyte’, or ‘megabyte’. Correct me if I’m wrong! —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 14:45, 13 December 2020 (UTC)
- @JamesLucas: I don't think it can be dismissed as a one-off anomaly. See p17 of Hale & Stanney (2014).[1] I see it more as an ongoing symptom of a deeper malaise, namely the ambiguity of KB, MB, GB ... This ambiguity continues (and will continue) for as long as there are two different interpretations of each prefix, and the ambiguity increases with increasing order of the anomaly because while MB can be interpreted 3 different ways, for GB (either 1000 MB or 1024 MB) there are 4 different interpretations, and so on. Dondervogel 2 (talk) 16:54, 13 December 2020 (UTC)
- @Dondervogel 2: If the muddling is commonplace, that would be worth mentioning more prominently than I suggested. In the Google Books preview of the Handbook page 17 is a list of references, so I’m not sure I’m seeing what you’re suggesting I see. If it’s another example besides the 3½-inch floppy, that’d be great. —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 22:11, 13 December 2020 (UTC)
- I'm not arguing that particular use is widespread, only that the ambiguity is. The statement in question (paraphrasing a little) reads "[MB] may mean 1000x1000 bytes or it may mean 1024x1024 bytes, or even 1000x1024 bytes", and seems to be on p24 of the version your link points to (search for "1024" and you'll find it). Dondervogel 2 (talk) 22:43, 13 December 2020 (UTC)
- Here's another example. Rata (2009)[2] defines kilobyte as either 1000 bytes or 1024 bytes, megabyte as either one million bytes or 1024 kilobytes (with 3 possible interpretations) and gigabyte as either one billion bytes or 1024 megabytes (4 interpretations). It's the same problem, getting worse at each step increment in the exponent. Dondervogel 2 (talk) 23:09, 13 December 2020 (UTC)
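The counting in the comments above (3 readings of megabyte, 4 of gigabyte) can be checked with a short sketch. It assumes, as those comments describe, that each prefix step may independently be read as either 1000 or 1024; the function name `interpretations` is illustrative and is not taken from either cited source.

```python
def interpretations(steps: int) -> list[int]:
    """Distinct byte counts when each of `steps` prefix steps is read as 1000 or 1024."""
    # Multiplication commutes, so only the number of 1024 factors matters,
    # which gives steps + 1 distinct values.
    return sorted({1000 ** (steps - k) * 1024 ** k for k in range(steps + 1)})


for name, steps in [("kilobyte", 1), ("megabyte", 2), ("gigabyte", 3)]:
    values = interpretations(steps)
    print(f"{name}: {len(values)} readings -> {values}")
```

Under that assumption the number of readings grows by one with every prefix step, which matches the "getting worse at each step" observation.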
New material added on 28 Jan 2020
@KiridaSenpai: I just reverted (for the second time) the addition of a large amount of unreferenced material. Addition of new material is welcome if it is backed up by reliable sources. If you wish to reinstate this material, please read WP:BRD and then gain consensus for the change by discussing it here. Dondervogel 2 (talk) 10:16, 29 January 2021 (UTC)
Call for practical examples
I’m hoping that someone with more technical knowledge than I possess could help source and/or calculate better practical examples for the table. I’ve tried to find published, understandable-to-the-layperson examples, but they are surprisingly hard to come by. Trying to make an example for ‘terabyte’ a few weeks ago, I looked up file sizes of H.264-encoded 1080p video, and a number of independent sources said 30 hours should be very close to 1 TB. Then I went searching for a chunk of video 30 hours long (aiming for something that I had heard of despite having never seen), and I found Avatar: The Last Airbender. I see that Canucka has today calculated a substantially different file size for the same video data. I’m very glad that my amateur calculations are being scrutinized, and there are enough factors (encoding options, aspect ratio, compression of animation vs compression of live action, etc.) that I can believe that my best attempts may have been off by 200+%.
FWIW, I really like the introduction of the footnote, which allows the inclusion of a check-our-math explanation without burying the everybody-gets-it conclusion. This also helps us steer clear of WP:OR since the only original work is the basic math, which is fine as long as the inputs are verifiable. The newly revised table entry is too jargon-heavy, so it’d be good if the next example is something where we don’t feel compelled to mention the aspect ratio for instance. Cheers —jameslucas ▄▄▄ ▄ ▄▄▄ ▄▄▄ ▄ 20:36, 24 February 2021 (UTC)
- Video is particularly bad as an example for this as most of it is variable-rate based on content and encoders differ heavily in quality; there are also multiple ways of encoding the same data. As a quick recent example from my own re-encodings of terrible movies from UHD Bluray for streaming over my NAS and adding in Rifftrax commentary, with 10bpc H.265 one can encode with a "constant rate factor" mode that attempts to maintain a steady quality level based on an arbitrary number. The size of the output video varies massively based on content. The first of the spastic Michael Bay transformers movies is 143 minutes and weighed in at ~15GB re-encoded from ~70GB on the original disk. The last Avengers movie, around 50GB from disk, and 181 minutes long, which consists of a surprising number of fairly still scenes and talking heads (so very little motion), came in at 2.6GB at one step higher quality and I ended up re-encoding again at 4 steps higher quality because I couldn't quite believe the compression would have gone that well and it was closer to 5GB... once I watched marvel characters sitting around talking for most of 3 hours the compression made sense. It's also completely possible to re-encode movies with naive constant bit rate compression so that they're both larger than the originals and lose quality in some scenes (which required a burst of higher bit-rate than the encoder was set to for that particular scene). You won't really find a good "average" for video like this because there isn't any. 1TB for 30 hours sounds like it's roughly based on the encoding settings used for 1080p Bluray (including audio), which was often encoded at gigantic sizes purely to eat up space and make it difficult for people to deal with on home computers (or it was back when bluray was new, anyway), and may contain huge lossless audio files in multiple languages that people are accidentally factoring into their numbers for the actual video size.
- You'll not find many published examples because of this... and I'm talking about two movies at around the same resolution (aspect ratio differs), the same type-ish of source (Disney does cripple their UHD Bluray releases by not including DolbyVision which their movies are mastered in, but this has been accounted for in the sizes above by subtracting both the enhancement layer size and DV metadata sizes from the transformers movie original and re-encode.) Maybe a good place to look would be Sony's camera manuals? I don't shoot video but as I recall all the Sony a7 series manuals list approximated video time / SD card size / selected video resolution and bitrate and that's about as good as you'll get (and they still include the warning that the times vary heavily based on content being filmed). --A Shortfall Of Gravitas (talk) 04:43, 5 August 2021 (UTC)
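To make the spread in the figures quoted above concrete, here is a rough sketch of the arithmetic relating file size, running time and average bitrate. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and treats the quoted sizes as whole-file totals; the helper name `average_mbit_per_s` is made up for illustration.

```python
def average_mbit_per_s(size_bytes: float, minutes: float) -> float:
    """Average bitrate in Mbit/s for a file of `size_bytes` lasting `minutes`."""
    return size_bytes * 8 / (minutes * 60) / 1e6


GB = 10 ** 9   # assumption: decimal gigabyte
TB = 10 ** 12  # assumption: decimal terabyte

# Sizes and durations are the ones mentioned in this thread.
examples = {
    "30 hours in 1 TB": (1 * TB, 30 * 60),
    "143 min re-encoded to ~15 GB": (15 * GB, 143),
    "181 min re-encoded to 2.6 GB": (2.6 * GB, 181),
}
for label, (size, minutes) in examples.items():
    print(f"{label}: {average_mbit_per_s(size, minutes):.1f} Mbit/s")
```

Under these assumptions the three cases come out at roughly 74, 14 and 2 Mbit/s, which illustrates why no single "hours per terabyte" figure holds for video.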
"PiB" listed at Redirects for discussion
A discussion is taking place to address the redirect PiB. The discussion will occur at Wikipedia:Redirects for discussion/Log/2021 May 20#PiB until a consensus is reached, and readers of this page are welcome to contribute to the discussion.
User:1234qwer1234qwer4 (talk) 10:55, 20 May 2021 (UTC)
wrongly deleted
IMHO the way I wrote it, the byte as information, is definitely much clearer when in the context of binary "digits" or flags (and corresponding hexadecimal digits). As it is written now you have many many words surrounding the concept and not touching it. So my edit is definitely in place. Please user:Dondervogel 2 next time discuss before deleting work that your fellow wikipedian put time and effort into.
Here is the deleted section:
Hexadecimal and binary representation: Byte values can be easily represented with hexadecimal digits. Since 4 set bits correspond to the hexadecimal digit F, every four-bit byte value is easily written as a single hexadecimal digit value, and the hexadecimal value of each digit can easily be translated back into its four bit binary value. Thus an 8 bit byte can be read as two 4 bit bytes, each represented by a single hexadecimal digit. So for example with an 8-bit byte, hexadecimal FF is the maximum value with all bits set (corresponding to decimal 255) and hexadecimal 10 is easily translated as binary 0001_0000 (corresponding to decimal 16).
I also added some short captions so that one can trudge through all the wording:
Byte size: The size of the byte has historically been ...
The 8 bit standard: The modern de facto standard of eight bits...
The unit symbol B: The unit symbol for the byte was designated as...
Thanks in Advance, Moshe aka פשוט pashute ♫ (talk) 11:51, 24 June 2021 (UTC)
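The correspondence described in the deleted paragraph quoted above can be illustrated with a small sketch, assuming an 8-bit byte. The function name `nibbles` is hypothetical and appears nowhere in the article.

```python
def nibbles(value: int) -> tuple[int, int]:
    """Split an 8-bit value into its high and low 4-bit halves."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected an 8-bit value (0..255)")
    return value >> 4, value & 0x0F


for value in (0xFF, 0x10):
    high, low = nibbles(value)
    # Each 4-bit half maps to exactly one hexadecimal digit.
    print(f"decimal {value:3d} = binary {high:04b}_{low:04b} = hex {high:X}{low:X}")
```

This reproduces the two examples from the paragraph: FF is decimal 255 with all bits set, and 10 is binary 0001_0000, decimal 16.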
- Work is rarely ever deleted; all of it is still archived in the page history. Per the BOLD, revert, discuss cycle, it is actually good to first revert and then discuss. Anyway, I do think that a paragraph about hexadecimal representation would be good to have, since bytes are often represented that way. However, I think it should not be put in the lead, since the lead section is supposed to be merely a summary of the article, and this is a bit too specific to be a summary. Lastly, captions are normally not applied to individual paragraphs. That makes it look more like some sort of glossary, which would not really be appropriate here. (also pinging involved editor Dondervogel 2) ―Jochem van Hees (talk) 12:22, 24 June 2021 (UTC)
- If there is consensus for it, I would not object to the paragraph being reinserted further down the article, but please not in the lead. I do think it would require a little more context to explain the difference between the byte as a unit of storage, and a byte of information. Dondervogel 2 (talk) 12:52, 24 June 2021 (UTC)
- First, the text following "Hexadecimal and binary representation:" doesn't discuss binary representation, so the head is misleading. Second, neither the head nor the text mention octal representation, which is both important historically and still, alas, in use for octets. Third, as Dondervogel mentioned, it doesn't belong in the lede. --Shmuel (Seymour J.) Metz Username:Chatul (talk) 13:27, 24 June 2021 (UTC)
- My main point was that the wording in the lead just doesn't even explain what a byte is. Rather it starts "arguing with itself" on side details like what the standard representation is or what the symbol is. So what I will do is simply move all that out of the lead too and create a short and clear explanatory lead. My explanation through representation, which has EVERYTHING to do with the subject and NOTHING to do with computers, as OPPOSED to the way it is now in the lead, IMHO was better. But, since there are already three who seem to agree that I'm wrong, I will not argue but rather put up an alternative simple explanation that DOES give the definition perhaps with an example. פשוט pashute ♫ (talk) 21:26, 14 July 2021 (UTC)
- It is not "arguing with itself", it is explaining that there is no standard definition for a byte. Given that one of the functions of the lead is to define the subject, this seems very appropriate to me. ―Jochem van Hees (talk) 11:33, 16 July 2021 (UTC)
- I give up. But it seems that the article has been changed for the better now. (Not sure anymore what was there before and what has changed if at all).פשוט pashute ♫ (talk) 16:54, 13 December 2021 (UTC)
Initial description: not quite on point?
The first sentence of the lead is:
- "The byte is a unit of digital information that most commonly consists of eight bits."
This treats the byte as a unit of storage or information, and nothing else. However, use of the term normally relates to how the data is constructed, and in particular that it is a sequence of eight bits that are grouped together. (See Byte | Merriam-Webster.) For example, when we refer to a byte of computer memory, we usually specifically mean one of the 8-bit groupings of memory storage that are addressed together using a single byte address, not individual bits that might be scattered in arbitrary locations. I think it makes sense to put the emphasis on this meaning in this article, and to have the definition as a unit (measure of amount) of storage capacity as a derived meaning. —Quondum 16:52, 29 June 2021 (UTC)
[20220818] On recent editing that had references "broken"
@Tea2min, please see the relevant edit history [ https://en.wiki.x.io/wiki/Byte?action=history&offset=20220818072031&limit=4 ].
The reasons have been well explained I believe.
I believe what should be fixed are the templates, not the usage.
- MasterQuestionable (talk) 17:00, 18 August 2022 (UTC)
I would revert your edit (commit 1105047710) within hours, if there wasn't any objection with plausible reasoning:
For preparing my further edits (which is based on the old version).
- MasterQuestionable (talk) 20:54, 18 August 2022 (UTC)
Due to an implementation problem in MediaWiki, my further editing of the article is pending.
More details: https://www.mediawiki.org/wiki/Topic:X1hnma8u8r7amg4c
- MasterQuestionable (talk) 01:46, 19 August 2022 (UTC)
Line feeds are not allowed in refs: see Category:CS1 errors: invisible characters. So several of the 'quote=' contain line feeds which generate the error. I fixed these but someone has unfixed them again. - Oculi (talk) 11:38, 19 August 2022 (UTC)
- Is there a tool to replace them with <br />? If not, is the number small enough to do that manually? - Shmuel (Seymour J.) Metz Username:Chatul (talk) 12:56, 19 August 2022 (UTC)
- @User-duck: A recent edit changed the citations to use {{poem quote}}, which preserves new lines (LF for Unix). Why not use {{quote}} and insert <br /> where breaks are to be kept? - Shmuel (Seymour J.) Metz Username:Chatul (talk) 13:33, 19 August 2022 (UTC)
- @Chatul: {{poem}} and {{poem quote}} preserve indentation, seemed to be important to original contributor. Also, running all the lines together with <br /> reduces readability. Feel free to change, I have no real preference. I needed to change the greying with italics because <span>...</span> appears not to be compatible with {{poem quote}}. - User-duck (talk) 13:54, 19 August 2022 (UTC)
- Why even bother with these templates after all? Things would probably be better without them. - MasterQuestionable (talk) 05:18, 20 August 2022 (UTC)
- - MasterQuestionable (talk) 05:13, 20 August 2022 (UTC)
You seem to have missed the point entirely: the question is not whether the implementation complains about the line breaks, but that such line breaks are reasonable and such usage is valid.
I thought about a workaround: detaching the "quote" content from the "Cite" templates, or dropping all the "Cite" templates wholesale. (I don't really find these templates helpful.) - MasterQuestionable (talk) 06:26, 20 August 2022 (UTC)
Lengthy quotes in references
There's a problem with the lengthy quotes in the references. I don't know enough about our citation templates to fix this. - Tea2min (talk) 07:25, 18 August 2022 (UTC)
- What are you asking? The section title suggests that you're addressing a stylistic issue, but you don't need knowledge of the citation templates to fix that.
- If you're asking how to quote text in a citation, use the |quote= parameter and indicate any elided text with "{{nbsp}}... ". --Shmuel (Seymour J.) Metz Username:Chatul (talk) 12:26, 18 August 2022 (UTC)
- They meant that the content is too long and that they don't know how to fix it. - MasterQuestionable (talk) 17:11, 18 August 2022 (UTC)
- For example, the Buchholz 1977 and Behmer 2000 references contain a mixture of black and gray text, and weird character sequences like "<&>" and "<.>". Tea2min (talk) 17:16, 18 August 2022 (UTC)
- - MasterQuestionable (talk) 17:38, 18 August 2022 (UTC)
The contents mentioned originally existed in the article's source (as XML comments). I transformed them to use more appropriate formatting for better accessibility. The context hints (the "weird character sequences", as you called them) are intended to assist text parsing. Some of them must not be dropped, or else the content would become inaccessible in plain text.
- I just checked Buchholz 1977. The source text quoted did not contain gray text or "weird character sequences". Were they added during transcription into the article? User-duck (talk) 01:33, 21 August 2022 (UTC)
You can't cite a Wikipedia talk page in a Wikipedia article
https://en.wiki.x.io/?diffonly=1&diff=prev&oldid=1105317022
This alone may not suffice as the reason to decide whether certain content is qualified for inclusion or not.
I believe the inclusion criteria should be based entirely on factual validity, and nothing else.
- MasterQuestionable (talk) 05:39, 20 August 2022 (UTC)
- I tend to agree. Unfortunately "Wikipedia" does not. They discourage first party sources. They specifically mention Wikipedia articles (and wikipedia clones). I found this reference/source troubling and was actually glad to see the content removed because I do not know how to deal with the referencing. Also, I found the tidbit interesting but did not know if it really added to the article. - User-duck (talk) 17:00, 20 August 2022 (UTC)
-
- - MasterQuestionable (talk) 23:43, 20 August 2022 (UTC)
Selectively not including contents of sufficient factual validity: the practice itself would be against the project's Neutrality guideline. Probably this should be also forwarded to relevant guideline discussions.
[ Quote User-duck @ CE 2022-08-20 17:00:57 UTC: https://en.wiki.x.io/?diffonly=1&diff=prev&oldid=1105520727 I found this reference/source troubling and was actually glad to see the content removed because I do not know how to deal with the referencing. ]
<^> ? . There seems to be logic fault in the statement.
[ Quote (previous): Also, I found the <&>tidbit</&> interesting but did not know if it really added to the article. ]
<^> The statement is ambiguous and needs clarification.
Be Bold does not mean Be Reckless
These revisions [ https://en.wiki.x.io/wiki/Byte?action=history&offset=20220819165028&limit=5 ] apparently resulted in degraded readability compared to my last revision [ https://en.wiki.x.io/?oldid=1105186269 ].
I'd suggest verifying more carefully (and, in cases of uncertainty, discussing first) before committing a change.
- MasterQuestionable (talk) 06:12, 20 August 2022 (UTC)
- I disagree about the "degraded readability". The long quotes were not formatted consistently, they were incompatible with the |quote= citation parameter, the greyed text is barely readable, and the extraneous "<&>", "< >", etc. markups do not help. I was very careful to make sure the content and intent of the quotes were maintained. I would appreciate any "Reckless" mistakes being corrected (or at least noted). The only reason I noticed this article is the CS1 errors. If the CS1 error messages had not been ignored (this is reckless) and the original quotes had been done outside the citation templates, I would not have noticed them.
- PS: I was hoping someone would notice the "Bare URL" and clarify the "... About bits and bytes: prefixes for binary multiples - IEC ..." reference. It does not meet my understanding of the Wikipedia standards for references. I could attempt to clarify it or simple tag it.
- PPS: I have longtime, extensive knowledge about computers. - User-duck (talk) 16:41, 20 August 2022 (UTC)
-
- - MasterQuestionable (talk) 23:51, 20 August 2022 (UTC)
[ Quote User-duck @ CE 2022-08-20 16:41:37 UTC: https://en.wiki.x.io/?diffonly=1&diff=prev&oldid=1105516677 The long quotes ... were incompatible with the "quote" citation parameter, ]
<^> The rationale had been explained in the previous discussion.
[ Quote (previous): the greyed text is barely readable, ]
<^> This is intended. (they originally exist as XML comments; see previous discussion)
[ Quote (previous): and the extraneous "<&>", "< >" ''[ It's "<.>". ]'', etc. markups do not help. ]
<^> This had also been explained before.
[ Quote (previous): The long quotes were not formatted consistently, ]
<^> Besides the aforementioned, any more specific instance?
[ Quote (previous): The only reason I noticed this article is the CS1 errors. If the CS1 error messages had not been ignored (this is reckless) and the original quotes had been done outside the citation templates, I would not have noticed them. ]
<^> ...It gives a hunch that you didn't check the edit history (let alone relevant discussions) before carrying out the edit.
[ Quote (previous): I was very careful to make sure the content and intent of the quotes were maintained. I would appreciate any "Reckless" mistakes being corrected (or at least noted). ]
<^> Thanks for your effort anyway. Though at a quick glance your revision [ https://en.wiki.x.io/?oldid=1105316230#References ] does not look as good. (overall weird spacing caused by the template; and specifically reference #13, #18, #20, #22)
[ Quote (previous): PS: I was hoping someone would notice the "Bare URL" and clarify the "... About bits and bytes: prefixes for binary multiples - IEC ..." reference. It does not meet my understanding of the Wikipedia standards for references. I could attempt to clarify it or simple tag it. ]
<^> The content of URI is significant and should not be meddled. (else it would cause accessibility issues)
[ Quote (previous): I have longtime, extensive knowledge about computers. ]
<^> One with longtime, extensive knowledge on the subject missing so many details... The situation is concerning.
Using {{poem quote}} preserves indentation, which is desirable, but it also preserves soft line breaks, which leads to jagged output and is not desirable. Is there a quote template that preserves indentation, allows wrapping and allows explicit <br /> tags? --Shmuel (Seymour J.) Metz Username:Chatul (talk) 14:49, 21 August 2022 (UTC)
My own explanation on the background
May be of use: [
|*| On the "1,000 or 1,024" affairs (Byte counting) # Background
|*| https://github.com/exiftool/exiftool/issues/152#issue-1344954990 ]
Date format
A standard date format should be established for this article. The predominant format is yyyy-mm-dd, but this is not a preferred format. I saw one date using dmy. I would normally pick dmy or mdy; I have a slight preference for {{use dmy dates|cs1-dates=ly}}. But since the Talk page for this article is rather active, maybe a consensus could be obtained. User-duck (talk) 03:02, 21 August 2022 (UTC)
- In the references (where brevity is helpful), I would stick with yyyy-mm-dd. It's clear, concise, and avoids the tussle between US and non-US formats. Dondervogel 2 (talk) 09:32, 21 August 2022 (UTC)
- ISO 8601 derived time formats tend to be most accessible among all other options. - MasterQuestionable (talk) 16:47, 24 August 2022 (UTC)
GB album example
Hi, it looks like somebody had added a citation needed tag for a 122-minute album being too small to count as 1 GB.
Well, I did the math and it looks like 122 minutes is too long to count as 1 GB, if you think of it in terms of uncompressed CD-quality audio. CD quality is 16-bit stereo at 44100 samples a second, 1411200 bits/second, 84672000 bits/minute, or 10584000 bytes/minute.
So if you count how many minutes can fit in 1 GB at CD quality:
>>> (1000*1000*1000)/(16*2*44100*60/8)
94.48223733938019
>>> (1024*1024*1024)/(16*2*44100*60/8)
101.449529856387
The answer is 94 minutes 28.934240362811398 seconds for 1 GB, 101 minutes 26.97179138322 seconds for 1 GiB. --Kjoonlee 09:23, 22 November 2022 (UTC)
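For anyone who wants to re-run the arithmetic above, here is a minimal Python sketch. It assumes uncompressed 16-bit stereo audio at 44100 samples per second, as in the comment; the names BYTES_PER_MINUTE and minutes_and_seconds are illustrative only and do not come from the article or its sources.

```python
# Uncompressed CD-quality audio parameters, taken from the comment above.
BITS_PER_SAMPLE = 16
CHANNELS = 2
SAMPLES_PER_SECOND = 44100

# 16 bits * 2 channels * 44100 samples/s * 60 s, converted from bits to bytes.
BYTES_PER_MINUTE = BITS_PER_SAMPLE * CHANNELS * SAMPLES_PER_SECOND * 60 // 8


def minutes_and_seconds(capacity_bytes):
    """Return (whole minutes, leftover seconds) of audio fitting in capacity_bytes."""
    total_seconds = capacity_bytes * 60 / BYTES_PER_MINUTE
    minutes, seconds = divmod(total_seconds, 60)
    return int(minutes), seconds


# Compare the decimal gigabyte (GB) with the binary gibibyte (GiB).
for label, size in (("1 GB", 1000**3), ("1 GiB", 1024**3)):
    minutes, seconds = minutes_and_seconds(size)
    print(f"{label}: {minutes} min {seconds:.2f} s")
```

Either way, a 122-minute album at this quality needs more than 1 GB, which matches the conclusion above.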
TB video example
It lists that all 61 episodes in 4:3 1080p is a way to think of 1 TB. The citation says that it equals 0.2925 TB. That means you could fit the episodes three times. I don’t know how to figure out the math, but helpful examples could be: X hours/minutes of 1080p@60 fps, or X hours/minutes of 4K (2160p)@60 fps. Your Glutes (talk) 06:32, 1 September 2023 (UTC)
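A rough sketch of the math, in Python. The 0.2925 TB figure is the one quoted from the citation; the bitrate used at the end is a HYPOTHETICAL placeholder chosen only to show the arithmetic, not a measured value for 1080p or 4K video, and the function names are illustrative.

```python
TB = 1000**4  # decimal terabyte, in bytes

# Size of the complete series as quoted from the citation in the comment above.
SERIES_SIZE_TB = 0.2925


def copies_per_terabyte(size_tb):
    """How many items of the given size (in TB) fit in 1 TB."""
    return 1 / size_tb


def hours_per_terabyte(bitrate_mbit_per_s):
    """Hours of video that fit in 1 TB at a constant bitrate given in Mbit/s."""
    bytes_per_second = bitrate_mbit_per_s * 1_000_000 / 8
    return TB / bytes_per_second / 3600


print(f"copies of the series per TB: {copies_per_terabyte(SERIES_SIZE_TB):.2f}")

# HYPOTHETICAL bitrate (an assumption, not a sourced figure); substitute a
# sourced value for 1080p@60 or 2160p@60 before using this in the article.
EXAMPLE_BITRATE = 8  # Mbit/s
print(f"hours per TB at {EXAMPLE_BITRATE} Mbit/s: {hours_per_terabyte(EXAMPLE_BITRATE):.1f}")
```

With the quoted size, the series fits a little over three times, which agrees with the observation above. Any "X hours of video" example would still need a sourced bitrate.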
Should I write wiki pages for higher units of measurement and kibibytes?
I've noticed that English wiki has only 3 pages about units of measurement (byte, megabyte and gigabyte). What about others? Кокушев Сергей (talk) 05:22, 16 December 2023 (UTC)
- The short answer is no. There used to be many such articles, but they were considered repetitive. By consensus they were consolidated and harmonised at one location, the Byte article. Dondervogel 2 (talk) 07:34, 16 December 2023 (UTC)
Better example for PB?
2000 years of MP3 does not seem a great example (wouldn't it be better as 2 years for a TB?). Could there be a better example for a PB? Robertm25 (talk) 14:23, 27 March 2024 (UTC)
Nibble
Half-byte: multiples of the byte are discussed. Also appropriate would be a characterization of smaller pieces or parts of the eight-bit byte. Four bits are referred to as a “nibble”, such as C1 having four bits for “C” and four bits for “1”, a common representation of the EBCDIC letter “A”. 2600:1700:62E0:2BC0:FCBF:D426:2056:2384 (talk) 15:09, 5 November 2024 (UTC)
- ITYM nybble, also called digit or hex digit. -- Shmuel (Seymour J.) Metz Username:Chatul (talk) 18:33, 5 November 2024 (UTC)
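To illustrate the C1 example, here is a small Python sketch that splits a byte into its two four-bit halves. The helper name split_nibbles is made up for this illustration, and cp037 is just one EBCDIC code page that ships with Python; other EBCDIC variants are assumed to agree on the letter "A" but are not checked here.

```python
def split_nibbles(byte):
    """Return (high nibble, low nibble) of an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    # The high nibble is the upper four bits, the low nibble the lower four.
    return byte >> 4, byte & 0x0F


# Encode "A" with the cp037 EBCDIC code page and take its single byte.
ebcdic_a = "A".encode("cp037")[0]
high, low = split_nibbles(ebcdic_a)
print(f"byte 0x{ebcdic_a:02X} -> high nibble {high:X}, low nibble {low:X}")
```

Each nibble corresponds to exactly one hexadecimal digit, which is why the byte is written as the two digits C and 1.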