What is the use case for drives that large?
I ‘only’ have 12 TB drives and yet my ZFS pool already needs ~two weeks to scrub it all. With something like this it would literally not be done before the next scheduled scrub.
It’s like the Petronas Towers: every time they finish cleaning the windows, they have to start again.
Data centers???
Jesus, my pool takes a little over a day, but I’ve only got around 100 TB. How big is your pool?
The pool is about 20 usable TB.
Something is very wrong if it’s taking 2 weeks to scrub that.
Sounds like something is wrong with your setup. I have 20TB drives (x8, raid 6, 70+TB in use) … scrubbing takes less than 3 days.
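To put numbers on why two weeks looks off, here is a rough back-of-the-envelope sketch of the average read rate each scrub time implies. It assumes the whole quoted amount gets read during the scrub (about 20 TB in the first case, 70 TB in the second); the function name is made up for illustration, and a real scrub only reads allocated blocks, so treat the first figure as an upper bound.

```python
def implied_throughput_mb_s(data_tb: float, days: float) -> float:
    """Average read rate (MB/s) needed to scan data_tb terabytes in `days` days."""
    bytes_total = data_tb * 1e12   # decimal terabytes, as drive vendors count them
    seconds = days * 86_400
    return bytes_total / seconds / 1e6

# Figures quoted in the thread; assumes all of the data is read once per scrub.
for label, tb, days in [("~20 TB in 14 days", 20, 14), ("70 TB in 3 days", 70, 3)]:
    print(f"{label}: {implied_throughput_mb_s(tb, days):.1f} MB/s")
```

That works out to roughly 16.5 MB/s for the two-week scrub versus about 270 MB/s for the three-day one, which is why the slow pool looks like a configuration or hardware problem rather than a capacity problem.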
High capacity storage pools for enterprises.
Space is at a premium. Saving space should, or at least could, translate into better pricing and availability.
Not necessarily.
The trouble with spinning platters this big is that if a drive fails, it will take a long time to rebuild the array after shoving a new one in there. Sysadmins will be nervous about another failure taking out the whole array until that process is complete, and that can take days. There was some debate a while back about whether the industry even wanted spinning platters >20TB. Some are willing to give up density if it means less worry.
I guess Seagate decided to go ahead, anyway, but the industry may be reluctant to buy this.
I would assume that with arrays they will use a different way to calculate parity or have higher redundancy to compensate for the risk.
If there’s higher redundancy, then they are already giving up on density.
We’ve pretty much covered the likely ways to calculate parity.
There is an enterprise storage shelf (aka a bunch of drives that hooks up to a server) made by Dell which is 1.2 PB (yes petabytes). So there is a use, but it’s not for consumers.
That’s a use-case for a fuckton of total capacity, but not necessarily a fuckton of per-drive capacity. I think what the grandparent comment is really trying to say is that the capacity has so vastly outstripped mechanical-disk data transfer speed that it’s hard to actually make use of it all.
For example, let’s say you have these running in a RAID 5 array, and one of the drives fails and you have to swap it out. At 190MB/s max sustained transfer rate (figure for a 28TB Seagate Exos; I assume this new one is similar), you’re talking about over two days just to copy over the parity information and get the array out of degraded mode! At some point these big drives stop being suitable for that use-case just because the vulnerability window is so large that the risk of a second drive failure causing data loss is too great.
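The rebuild arithmetic can be sketched like this. It is a best-case estimate only: it assumes the replacement drive is written end to end at the quoted 190 MB/s with no other load on the array, and the 36 TB row is a hypothetical capacity added for comparison, not a figure from the thread.

```python
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Best-case hours to write a whole drive of capacity_tb at rate_mb_s."""
    return capacity_tb * 1e12 / (rate_mb_s * 1e6) / 3600

# 190 MB/s is the sustained rate quoted above for the 28 TB Exos.
# 36 TB is a hypothetical larger drive, assumed to have the same rate.
for tb in (12, 20, 28, 36):
    hours = rebuild_hours(tb, 190)
    print(f"{tb} TB: {hours:.1f} h ({hours / 24:.1f} days)")
```

At the maximum rate, 28 TB comes out near 41 hours. Real rebuilds rarely hold the peak rate while the array is still serving requests, and anything much larger than 28 TB passes the two-day mark even in this best case.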
That’s exactly what I wanted to say, yes :D.
I get it. But the moment we invoke RAID, or ZFS, we are outside what standard consumers will ever interact with, and therefore into business use cases. Remember, even simple homelab use cases involving docker are well past what the bulk of the world understands.
I would think most standard consumers are not using HDDs at all.
There was a time I asked this question about 500 megabytes.
I am not questioning the need for more storage, but the need for more storage without increased speeds.
I too, am old.
I’m older than that but didn’t want to self-report. The first hard disk I remember my father buying was 40 MB.
It’s to play Ark: Survival Evolved.
What’s scrubbing for?
A ZFS scrub validates all the data in a pool and corrects any errors.
I’m not familiar with running your own personal data center, so I have no idea. … But how often is this necessary? Does accessing your own data on your hard drive require a scrub? I just have a 2 TB drive in my home PC. Is a scrub the equivalent of a disk cleanup?
You usually scrub your pool about once a month, but there are no hard rules on that. The main problem with scrubbing is that it puts a heavy load on the pool, slowing it down.
Accessing the data does not need a scrub, it is only a routine maintenance task. A scrub is not like a disk cleanup. With a disk cleanup you remove unneeded files and caches, maybe de-fragment as well. A scrub on the other hand validates that the data you stored on the pool is still the same as before. This is primarily to protect from things like bit rot.
There are many ways a drive can degrade. Sectors can become unreadable, random bits can flip, a write can be interrupted by a power outage, etc. Normal file systems like NTFS or ext4 can only handle this in limited ways, mostly by discarding the corrupted data.
ZFS, on the other hand, is built around redundant storage. It spreads the data over multiple drives in a way that lets it recover from most corruption and even survive the complete failure of a disk. This comes at the cost of losing some capacity, however.
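As a rough illustration of the idea, here is a toy scrub over a two-way mirror. This is not how ZFS is implemented: the in-memory "disks", the separate checksum list, and the `scrub` helper are all assumptions made up for the sketch. It only shows the principle of reading every block, comparing it against a stored checksum, and repairing a bad copy from a good one.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Checksum stored separately from the data it protects."""
    return hashlib.sha256(block).hexdigest()

def scrub(mirrors: list[list[bytes]], checksums: list[str]) -> int:
    """Verify every block on every mirror copy and repair bad copies.

    Returns the number of block copies that were rewritten.
    """
    repaired = 0
    for i, expected in enumerate(checksums):
        # Find any copy of block i that still matches its checksum.
        good = next((m[i] for m in mirrors if checksum(m[i]) == expected), None)
        if good is None:
            raise IOError(f"block {i}: no valid copy left")
        for m in mirrors:
            if checksum(m[i]) != expected:
                m[i] = good  # overwrite the corrupted copy with the good one
                repaired += 1
    return repaired

data = [b"alpha", b"bravo", b"charlie"]
sums = [checksum(block) for block in data]
disk_a, disk_b = list(data), list(data)
disk_b[1] = b"brbvo"  # simulated bit rot on one copy
print(scrub([disk_a, disk_b], sums))  # number of repaired copies
```

A file system without checksums or a second copy has no way to tell which version of the block is the right one, which is the gap the comments above are describing.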
I worked on a terrain render of the entire planet. We were filling three 2 TB drives a day for a month. So this would have been handy.
What drives do you have exactly? I have 7x6TB WD Red Pro drives in raidz2 and I can do a scrub in less than 24 hours.
I have 2*12TB white-label WD drives (harvested from external drives, but datacenter drives according to the SN) and one 16 TB Toshiba white-label (purchased directly, also meant for datacenters) in a raidz1.
How full is your pool? Mine is about two-thirds full, which impacts scrubbing, I think. I also frequently access the pool, which delays scrubbing.
It’s like 90% full, scrubbing my pool is always super fast.
Two weeks to scrub the pool sounds like something is wrong tbh.