Talk:ZFS
This is the talk page for discussing improvements to the ZFS article. This is not a forum for general discussion of the article's subject.
This article is rated C-class on Wikipedia's content assessment scale.
Reports of fragmentation problems and encryption problems
((Note: This discussion arose on my talk page after I'd reverted the changes editor Robercik st had added. I think it's at a point where more 'eyeballs' are needed to come to consensus, so I'm copying it here unaltered for further discussion.))
You can read about this at https://github.com/openzfs/zfs/issues/3582 if you need more info. Robercik st (talk) 03:06, 13 December 2023 (UTC)
- The problem is that there are multiple issues at work here. First, there's the question of whether free-space fragmentation is the cause of the performance issues, or rather over-utilization of the disk space. Virtually all filesystems are prone to performance issues as available free space diminishes, and since we don't have meaningful metrics to rely on, we can't ascribe the problem to one over the other.
- Secondly, most of what you've written is narrative opinion. The link to the ZFS code discussions isn't a reliable source for the claims being made - not by wikipedia's standards. You'd need to find a secondary source that describes these issues.
- Lastly - I suspect you may not be a native English speaker, based on the many grammatical and spelling errors in the content presented. That's not a problem in itself; fluency in any language isn't easy. However, for the content to be appropriate to the English Wikipedia, it would have to have all the errors fixed before being posted into the public encyclopedia. That, however, depends on resolving the preceding issues with the content that I described. Numerous opinions on the 'net suggest that by far the larger issue is over-utilization; fragmentation being pointed to as the proximate cause of performance issues hasn't been determined as fact.
- If you can find a better source - a general technology news site would be a good start - then possibly the claims could be notable for the article. I looked around and couldn't find any discussion of the matter - only blog posts and forum commentary, which just aren't acceptable for making broad claims in the article. cheers. anastrophe, an editor he is. 06:27, 13 December 2023 (UTC)
- So in this docs article from Oracle you have:
- https://docs.oracle.com/en/operating-systems/solaris/oracle-solaris/11.4/manage-zfs/storage-pool-practices-performance.html#GUID-3568A4BD-BE69-4F17-8A2E-A7ED5C7EFA79
- If a large percentage (more than 50%) of the pool is made up of 8k chunks (DBfiles, iSCSI Luns, or many small files) and have constant rewrites, then the 90% rule should be followed strictly.
- If all of the data is small blocks that have constant rewrites, then you should monitor your pool closely once the capacity gets over 80%. The sign to watch for is increased disk IOPS to achieve the same level of client IOPS.
- So this really confirms what I wrote, whether the cause is space, fragmentation, or both. I think users should know about these things, because they put many terabytes of data on ZFS, so moving it is very costly, especially in production. And there is scarce information about this real problem in ZFS.
- I don't know of such requirements for ext4, for example, and in my experience Postgres workloads work very well on ext4 even when the filesystem is more than 90% full, so this kind of problem should surely be stated.
- Of course you can correct the grammar errors, but the idea stays right :) Robercik st (talk) 15:17, 13 December 2023 (UTC)
- That's an interesting and useful link. However - it's important to be aware of the WP rules about synthesis. The Oracle page is a 'good/best practices' guidance page: it doesn't state anywhere that these are, specifically, limitations, deficiencies, or shortcomings of ZFS - they're merely suggestions for maintaining good performance. So, the material would certainly be useful in the article, but it can't be presented specifically as a "limitation" unique to ZFS.
- If you want to craft a new segment, perhaps to go under the 'read/write efficiency' section? - I recommend posting it to the ZFS talk page, where I'll be happy to do 'wordsmithing' on it so it reads better for article space. cheers. anastrophe, an editor he is. 21:20, 13 December 2023 (UTC)
- But I didn't find such a recommendation for e.g. ext4, XFS, or other filesystems, so I find it a very specific limitation of ZFS, which can surprise users in a very bad way when terabytes of data are involved. It also confirms what is stated in https://github.com/openzfs/zfs/issues/3582, especially the comment with the paragraph from the author of ZFS, Matt Ahrens, on ZFS / "Block-pointer Rewrite project for ZFS Fragmentation". So there is clearly a problem specific to ZFS. Robercik st (talk) 16:40, 14 December 2023 (UTC)
- But again, you've just given the definition of why WP:SYNTHESIS isn't accepted: "Do not combine material from multiple sources to reach or imply a conclusion not explicitly stated by any source. Similarly, do not combine different parts of one source to reach or imply a conclusion not explicitly stated by the source."
- We can't draw the conclusion that it's a unique limitation of ZFS; a reliable source has to. Nowhere in the quote from Matt Ahrens does he mention ext4, XFS, ReiserFS, FAT32, or any other filesystem, nor does the Oracle document.
- If you can find a reliable source that says that it is a limitation unique to ZFS, then by all means, that would be appropriate to the article. Until then, we can't make claims that can't be verified directly from the sources. cheers. anastrophe, an editor he is. 18:29, 14 December 2023 (UTC)
- Ok, but can we list fragmentation as a limitation specific to ZFS, in that it is not fixed, so we can't defragment, as Matt Ahrens says? Robercik st (talk) 01:54, 15 December 2023 (UTC)
- Hey Robert, before we go further, I'd like your permission to copy this whole thread over to the ZFS talk page. I don't want there to be any sense that I'm "gatekeeping" the content - I'm not an expert, though I did run ZFS filesystems in production for many years. The fragmentation issue - from what I've read - is quite a complicated matter, since it's not fragmentation in the sense the majority of people think of it - as in, it's not file fragmentation, it's free-space fragmentation.
- But - with your permission, let's move this over there so that we can get more eyeballs on the matter and come to a collaborative consensus. Sound okay? cheers. anastrophe, an editor he is. 02:59, 15 December 2023 (UTC)
- Of course you can copy.
- Of course it is a free-space fragmentation issue, and I believe the Oracle recommendations exist because of that fragmentation problem, which is clearly a problem with ZFS itself. It can hide itself when we have mostly cold storage, but it hits hard in the cases described in the Oracle docs, so we surely need to expose that in the article if the creator of ZFS says we have a problem ;). It is also very suspicious that Oracle doesn't give a solution to reverse this lower performance, and if it is caused by fragmentation there is no solution other than rewriting the whole dataset :( which is a shame. Robercik st (talk) 16:46, 15 December 2023 (UTC)
- A shame for a filesystem that is advertised as the last word in filesystems :) Robercik st (talk) 16:48, 15 December 2023 (UTC)
((This is the end of the discussion that was on my talk page)) cheers. anastrophe, an editor he is. 18:48, 15 December 2023 (UTC)
- Ok so it lasted so long that I want to add something like this:
- Fragmentation of free space is very important issue because is not reversible like in eg. btrfs. Only solution is to use command zfs send | receive to recreate dataset without fragmentation therefore it could be problem with production systems with several TB datasets because of downtime and twice the space to be used. There won't be any solution to this issue [1]. It surfaces when there is not much left space on dataset and also when dataset is used with rewriting prosesses eg. databases. Algorithm for managing free space is not scaled well for very fragmented free space Zfs is therefore better suited for cold data storage.
- Encryption - not supported by project. There are numerous issues which are not fixed like it is when bug surfaces in code without encryption. USE ENCRYPTION ONLY WHEN YOU ARE PREPARED TO LOSE DATA https://github.com/openzfs/openzfs-docs/issues/494
- Robercik st (talk) 10:13, 11 August 2024 (UTC)
- Unfortunately, as before, this requires reliable secondary sources. github commentary/issues don't rise to that level. What you inserted in the article wasn't neutrally worded, and all-caps are specifically unacceptable via the Manual of Style. There were also grammar and punctuation errors. Sorry I didn't reply earlier, I didn't get notification of this talk page addition. cheers. anastrophe, an editor he is. 19:19, 14 August 2024 (UTC)
- Ok, so help me put it in neutral wording. As for a secondary source, you have a comment from the group in which Matt Ahrens works: "Several people who got burned by native encryption recently asked me why there were no warnings around it if it has known bad failure modes, and I didn’t really have a good answer."
- link to docs:
- https://docs.google.com/document/d/1w2jv2XVYFmBVvG1EGf-9A5HBVsjAYoLIFZAnWHhV-BM/edit Robercik st (talk) 09:03, 15 August 2024 (UTC)
- I'll take a shot at it a little later today. The issue here is that these "problems" are speculative and situational, and we can't state emphatically that something is an inherent flaw when all that's available are anecdotal reports. I may be able to finesse a solution. cheers. anastrophe, an editor he is. 16:59, 15 August 2024 (UTC)
- So I've reviewed all of the documents you've provided over the course of the discussion.
- Regarding the fragmentation performance issues - all reports are anecdotal, with reports concurring with the claims and others describing no issues at all at 95% capacity[1]. There's no hard evidence one way or another that ZFS's free-space fragmentation actually is the cause of the performance issues.
- Regarding the native encryption issues, this seems to be better established as a problem, but it is not without contrary POV's as well[2].
- We can't put speculative claims in the encyclopedia -- remember what wikipedia is not. These are matters for users to do their own due diligence on; we're not here to warn about issues that aren't confirmed -- but we can acknowledge that the reports exist. I'm crafting text to do so. cheers. anastrophe, an editor he is. 22:51, 15 August 2024 (UTC)
- Regarding fragmentation: the issue must be present if the creator of ZFS says that it is unsolvable. What's worse, it is not curable except by send | receive, so at least let users know so they are warned :).
- About encryption: of course you can find many people who haven't been burned, but read this comment: https://github.com/openzfs/openzfs-docs/issues/494#issuecomment-1946116133. The worst part is that no one from the project is willing to fix issues that are encryption-related ;(. This is the real problem: no support. Robercik st (talk) 10:45, 17 August 2024 (UTC)
- Re fragmentation, the overarching co-dependence is over-utilization, not free space fragmentation. In 2024, over-utilization is an administration matter, not a ZFS matter. Decades back I ran Solaris systems on UFS. Guess what? Performance went to hell if the disks were over-utilized. Fragmentation didn't make any difference. It was my job to adjust the system so that the performance wasn't terrible. Again I point you to what wikipedia is not. You're suggesting that the encyclopedia give advice or warnings to users based upon issues that may or may not affect them. We don't give advice. What I added lets readers know of problems that some users have experienced, without telling them how to do things, when to do things, or where to do things. The entry is sourced, so they can inform themselves further if they are bothered by it. Same with encryption. Since the failures are not reliably repeatable, we can only inform readers that some people have experienced problems. We can't give advice, or warnings, or discouragement based on that. Again, the statement is cited; the reader can explore further if they wish. cheers. anastrophe, an editor he is. 18:37, 17 August 2024 (UTC)
- Thank you for adding some remarks.
- I think the encryption reports are not anecdotal, because the issues are known to the project owners, so can you delete the word 'anecdotal'? I think they are not reliably repeatable because there is no interest in making them so; as I said before, there is no support for encryption from the project.
- With fragmentation, the real issue is that you can't reverse it, so you can't adjust the system to undo the performance problems :( Robercik st (talk) 11:13, 18 August 2024 (UTC)
- Encryption problems are reproducible: read rincebrain's comment: https://www.reddit.com/r/zfs/comments/10n8fsn/comment/j6b8k1m/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
- It is very clear to me. Robercik st (talk) 11:25, 18 August 2024 (UTC)
- Same here: https://github.com/openzfs/zfs/issues/11688#issuecomment-910764101 Robercik st (talk) 12:01, 18 August 2024 (UTC)
- Rincebrain's testing results are definitely not clear to me, nor do they approach reproducibility in any formal sense of the term. Achieving failure "over 50% of the time" as Rincebrain wrote doesn't constitute reliably repeatable, only that "failure happens at slightly greater than random chance in the results". Repeatability of a failure is critical to characterizing a failure. Yes, something is wrong with native encryption, in specific use-cases, sometimes. The massive amounts of speculations in the threads discussing the problem show that it fails in broadly different ways that can't be characterized with uniformity, and nobody can figure out what the problem is, because it's not reliably repeatable.
- I'll note again that as written, the article now acknowledges that some users have experienced problems with native encryption; we can't claim anything more than that. Wikipedia doesn't offer advice or present conclusions that haven't been verified by reliable secondary sources. cheers. anastrophe, an editor he is. 21:51, 18 August 2024 (UTC)
- What I can say for sure is that encryption is not maintained and not supported, and from ZFS I've learned that open source is good when many people use an open source project: issues get fixed. If something is rarely used, it is guaranteed to be unmaintained or broken, like ZFS encryption ;( Robercik st (talk) 14:38, 19 August 2024 (UTC)
- Agreed. Considering the volume of reports of problems, at worst they should do as suggested in the thread about modifying the documentation. They should either add a warning/caution, or remove the ability to add encryption to existing unencrypted data (obviously they can't remove the encryption algorithms as that would be disastrous for those already using it). It's a strange situation. cheers. anastrophe, an editor he is. 16:45, 19 August 2024 (UTC)
- 'Anecdotal' doesn't mean wrong or false/untrue. It means that there's a cohort of users who have reported "problems" in a specific use-case. End-users can only express/represent their personal experiences. Based on the reports, failure modes can't be characterized, they are broadly inconsistent, and users have also reported no failures in systems with approximately the same structure/load. Anecdotal only means that there is no formal characterization of the failure mode via repeatability in testing. cheers. anastrophe, an editor he is. 21:56, 18 August 2024 (UTC)
- Ok, so we leave it as it is. Thanks for the collaboration :) Robercik st (talk) 13:51, 19 August 2024 (UTC)
- Re fragmentation, again, there's no evidence to distinguish between too much free space fragmentation as being the problem, or over-utilization of space being the problem. You can adjust the system to address the performance problems: you increase available disk space so that the over-utilization no longer exists. Are there any reports of someone having high free-space fragmentation and over-utilization who have significantly increased their total space available and continue to have the exact same performance problems? If so, that would suggest that free-space fragmentation is the problem. Absent that evidence, it's a tossup between frag and utilization. cheers. anastrophe, an editor he is. 22:02, 18 August 2024 (UTC)
- And here: https://github.com/openzfs/zfs/issues/11679 you have corruption, and the issue is still open :( Robercik st (talk) 10:51, 17 August 2024 (UTC)
- Same as previous response. The article notes that users have reported problems. cheers. anastrophe, an editor he is. 18:40, 17 August 2024 (UTC)
- To this: "describing no issues at all at 95% capacity": he has only 10% fragmented space, so he could see no issue at all. The question is: what if 95% of the space is used and the free space is 95% fragmented? Robercik st (talk) 16:08, 17 August 2024 (UTC)
- But you show exactly what the problem is: "what if". Wikipedia doesn't present speculations. You say what if it's 95%. Do you have evidence that 95% frag/95% utilization is a threshold? Will they experience no performance problems at 94% frag/95% utilization? What about 75/99? 99/10? 80/50? The article now clearly states that users have reported performance issues related to fragmentation and utilization. Nothing more is required or appropriate. cheers. anastrophe, an editor he is. 18:44, 17 August 2024 (UTC)
- Also, on fragmentation, the Oracle docs say:
- If data is mostly added (write once, remove never), then it's very easy for ZFS to find new blocks. In this case, the percentage can be higher than normal; maybe up to 95%
- It clearly states that the problem is with searching fragmented free space, because when you only write there is no fragmentation of free space. Robercik st (talk) 16:17, 17 August 2024 (UTC)
- No, the Oracle documentation doesn't 'clearly state' that there is a "problem". It provides guidance for good performance, which is why it's titled "Storage Pool Practices for Performance", and the very first line states "In general, keep pool capacity below 90% for best performance". That advice can be applied to literally any filesystem. Wikipedia doesn't issue advice. cheers. anastrophe, an editor he is. 18:53, 17 August 2024 (UTC)