I'm interested in storage benchmarks for various configurations in order to figure out what works best for a virtualization environment. The virtualization environment will be Proxmox, as in my opinion it is currently the most manageable virtualization platform with plenty of features.
I want to look at the following configuration options, which may have an impact on performance:
- thin provisioning
- transparent compression
- multi disk technology (technology, raid level)
- SSD caching
Thin provisioning means presenting virtually unlimited space while backing it with physical storage only for the space that is actually used. So you can define multiple TB of disk capacity and only have a 250 GB SSD behind it. If that backend device fills up, you can add more storage when you need it. This is especially helpful in the age of SSDs, because they are still considerably more expensive than hard disks, so you do not want to spend thousands of $ on capacity you do not in fact need. Furthermore, there are big differences between SSD products. SSDs for desktop use may be quite cheap, but SSDs for servers, which are written to heavily, are much more expensive.
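The thin provisioning idea can be seen in miniature with a sparse file: the file claims a large size, but the filesystem only allocates space for what is actually written. This is a minimal sketch, assuming a POSIX system and a filesystem that supports sparse files. The file name `thin.img` and the 1 GiB size are made up for illustration, and this is not how lvm or zfs implement thin volumes internally.

```python
import os
import tempfile


def make_sparse_file(path, apparent_size):
    """Create a file that claims `apparent_size` bytes without writing any data."""
    with open(path, "wb") as f:
        f.truncate(apparent_size)  # extends the file with a hole


def sizes(path):
    """Return (apparent size, bytes actually allocated on disk)."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on POSIX systems
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "thin.img")   # hypothetical image name
    make_sparse_file(image, 1024**3)        # a "1 GiB" disk image
    apparent, allocated = sizes(image)
    print(f"empty image: apparent={apparent} B, allocated={allocated} B")

    # Writing 1 MiB makes the filesystem allocate space for just that part.
    with open(image, "r+b") as f:
        f.write(b"x" * 1024**2)
        f.flush()
        os.fsync(f.fileno())
    apparent, allocated = sizes(image)
    print(f"after write: apparent={apparent} B, allocated={allocated} B")
```

The apparent size stays at 1 GiB the whole time, while the allocated size should only grow with the data written, which is the same relationship a thin volume has to its backing device.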
For comparison, two price points:
- normal consumer SSD: a 500 GB M.2 SSD starts from 80 € (total lifetime write capacity: 300 TB = 600 full writes)
- datacenter SSD: a 375 GB Intel Optane SSD DC P4800X PCIe costs about 1200 € (total lifetime write capacity: 20.5 PB = roughly 55,000 full writes)
filesystem and lvm
Many filesystems and volume managers have interesting features that matter beyond raw performance, as well as drawbacks one would rather avoid:
- PRO: zfs and btrfs have checksums and self-healing against data corruption.
- PRO: zfs and lvm provide methods for thin provisioning.
- PRO: ext4 is easy to use, a simple fire-and-forget filesystem.
- PRO: btrfs has enormous flexibility.
- PRO: lvm has the flexibility to change configurations without downtime.
- CON: ext3 has quite long filesystem check times.
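The checksum and self-healing point from the list above can be sketched as follows. This is a toy model under my own assumptions: one block mirrored on two "devices" with a stored SHA-256 checksum. The class name `MirroredBlock` is hypothetical, and the real on-disk formats and checksum algorithms of zfs and btrfs are different and far more involved.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used to verify a block (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Toy model: one block stored on two devices plus its expected checksum."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        # Find the first copy whose content still matches the checksum.
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("all copies are corrupt")
        # Self-healing: overwrite every copy that fails verification.
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) != self.expected:
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"important VM data")
block.copies[0][0] ^= 0xFF  # simulate silent corruption on the first device
data = block.read()         # served from the intact copy, bad copy repaired
print(data, bytes(block.copies[0]) == data)
```

A filesystem without checksums would have returned the corrupted first copy without noticing, which is the difference the PRO entry refers to.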
Transparent compression is a layer which reduces the amount of data written to and read from the raw disk and thus may increase speed at the cost of CPU power.
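To make the trade-off concrete, here is a small sketch that measures how much smaller a buffer gets and how long compressing it takes. `zlib` from the Python standard library stands in for whatever algorithm the storage layer actually uses, and the sample data is made up, so the numbers say nothing about real disk performance.

```python
import time
import zlib


def compress_stats(data: bytes, level: int = 6):
    """Return (compressed size in bytes, wall-clock seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # "Transparent" means the reader gets the original bytes back unchanged.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


# Hypothetical sample: repetitive, log-like data, which compresses well.
sample = b"2024-01-01 12:00:00 INFO vm-101 disk flush ok\n" * 20000
size, seconds = compress_stats(sample)
print(f"raw={len(sample)} B, compressed={size} B, took {seconds:.4f} s")
```

Fewer bytes have to reach the disk, but the CPU time spent compressing is the price. Data that is already compressed or encrypted will shrink very little, so the gain depends heavily on the workload.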
multi disk technology (technology, raid level)
There are different multi disk technologies available: Linux Software RAID, LVM, btrfs raid and zfs raid. They combine the speed of multiple devices and add redundancy to be able to cope with device failures without data loss.
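The redundancy part can be illustrated with single XOR parity, the idea behind RAID 5 style layouts. This is only a toy sketch with made-up four-byte "disks". Real implementations stripe data, rotate parity across devices and handle many more failure cases.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks" (hypothetical content) and one parity "disk".
data_disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_disks)

# Simulate losing the second disk: XOR of the survivors and the parity
# gives back the missing block.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])
```

One extra device is enough to survive the loss of any single device, at the cost of computing parity on every write.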
SSD caching can accelerate slower HDDs either by keeping frequently used data on the fast SSD as a read cache, or by writing incoming data to the SSD first and syncing it to the slower hard disks in the background. This does not sacrifice data safety, because data written to the SSD is already persistent.
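Both cache roles described above can be sketched with two dictionaries, one standing in for the SSD and one for the HDD. The class `WriteBackCache` is hypothetical and deliberately simplified: there is no eviction, no size limit and no real I/O, so it only shows the order in which data moves.

```python
class WriteBackCache:
    """Toy write-back cache: a fast dict (the "SSD") in front of a slow dict (the "HDD")."""

    def __init__(self):
        self.ssd = {}       # block number -> data, fast tier
        self.hdd = {}       # block number -> data, slow tier
        self.dirty = set()  # blocks on the SSD that are not yet on the HDD

    def write(self, block, data):
        # The write is complete once it is on the SSD.
        self.ssd[block] = data
        self.dirty.add(block)

    def read(self, block):
        if block in self.ssd:    # cache hit, served from the fast tier
            return self.ssd[block]
        data = self.hdd[block]   # cache miss, fetched from the slow tier
        self.ssd[block] = data   # keep it on the SSD for the next read
        return data

    def flush(self):
        # Background sync: copy all dirty blocks to the slow tier.
        for block in sorted(self.dirty):
            self.hdd[block] = self.ssd[block]
        self.dirty.clear()


cache = WriteBackCache()
cache.write(7, b"new data")
on_hdd_before = 7 in cache.hdd   # False: only the SSD has the block so far
cache.flush()
on_hdd_after = 7 in cache.hdd    # True: the background sync has happened
print(on_hdd_before, on_hdd_after, cache.read(7))
```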
ceph - no option here
Ceph is a very interesting technology. I'm not considering it here, because the money needed to get it running with good performance is a lot higher than with plain disks and SSDs. You need at least 10 G networking, or better, which is a lot more costly than 1 G. You need all-SSD storage, which is more expensive too. A big plus with ceph is that you get redundant network storage, so you can immediately start virtual machines on other nodes if a compute node crashes. If money is no problem and maximum performance is not required, ceph would be an excellent choice. I have a 3-node cluster with ceph up and running here. It works like a charm. Administration is easy and performance is fine.
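A quick back-of-the-envelope calculation shows why the network speed matters for a network storage like ceph. This is an idealized sketch: it assumes the link is the only bottleneck and ignores protocol overhead, latency and replication traffic. The 100 GB figure is just an example.

```python
def transfer_seconds(size_gb: float, link_gbit: float) -> float:
    """Idealized time to move `size_gb` gigabytes over a `link_gbit` Gbit/s link."""
    return size_gb * 8 / link_gbit  # 8 bits per byte


for link in (1, 10):
    print(f"{link} Gbit/s: 100 GB in {transfer_seconds(100, link):.0f} s")
```

Even in this best case, a 1 G link tops out at about 125 MB/s, which is less than what a single SATA SSD can deliver locally.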
In the following threads, I'll describe my environment and the benchmarking scripts in more detail.