VembuHIVE – A custom built file system for data protection
Virtualization has opened many doors in terms of how we treat our production environments. We are now vMotioning or Live Migrating our workloads across a cluster of hosts – we are cloning workloads with ease and deploying new servers into our environments at a very rapid rate. We have seen many advantages and benefits from the portability and encapsulation that virtualization provides. For a while, though, our backups were treated the same as always – simply copies of our data sitting somewhere else – only being utilized in those situations when a restore was required. That said, over the past 5 years or so we have seen a shift in what we do with our backup data as well. Sure, it’s still primarily used for restores, both on a file and image level – but backup companies have begun to leverage that otherwise stale data in ways we could only have imagined. We see backups being used for analytics, compliance, and audit scans. We see backups now being used in a DevOps fashion – allowing us to spin up isolated, duplicate copies of our data for testing and development purposes. We have also seen the ‘restore’ process dwindling away, with the “instant” recovery feature taking its place, powering up VMs immediately from within the deduplicated and compressed backup files and drastically decreasing our organization’s RTO.
So with all of this action being performed on our backup files, a question of performance comes into play. No longer is it OK to simply store our backups on a USB drive formatted with a traditional file system such as FAT or NTFS. The type of data we are backing up, modern virtual disk images such as VHDX and VMDK, depends on something more from the file system it lives on – which is why Vembu, a data protection company out of India, has developed its own file system for storing backups, the VembuHIVE.
Backups in the HIVE
When we hear the word VembuHIVE we can’t help but turn our attention towards bees – and honestly, they make the perfect comparison as to how the proprietary file system from Vembu performs. A bee hive at its basics is the control center for bees – a place where they all work collectively to support themselves and each other – the hive is where the bees harvest their magic, organizing food, eggs, and honey. The VembuHIVE is the central point of storage for Vembu’s magic, storing the bits and controlling how files are written, read and pieced together. While VembuHIVE can’t produce honey (yet), it does produce data. And it’s because of the way that VembuHIVE writes and reads our source data that we are able to mount and extract our backups in multiple file formats such as ISO, IMG, VMDK and VHDX – in a near instant fashion.
In essence, VembuHIVE is like a virtualized file system overlaid on top of your existing file system, one that can use utilities which mimic other OS file systems – I know that’s a mouthful, but let’s explore that some more.
Version Control is key
In my opinion the key characteristic that makes VembuHIVE run is version control – where each and every file produced is accompanied by metadata describing what version, or point in time, the file is from. Probably the easiest comparison is to Git.
We all know of Git – the version control system that keeps track of changes to our code. Git solved a number of issues within the software development ecosystem. For instance, instead of copying complete projects before making changes we could simply branch in Git – which, conceptually, tracks the changes to our source code rather than keeping full copies of the project – allowing us to easily roll back or forward to any point in time within our code, reverting and redoing any changes that were made. Conceptually, this is done by storing what has changed along with metadata to explain those changes – which in the end gives us a very fast way to revert to different points and fork off new points, all the while utilizing our storage capacity in an efficient way.
VembuHIVE works in much the same way as Git, however instead of tracking source code we are tracking changed blocks within our backup files – allowing us to roll back and ahead within our backup file chain. Like most backup products, Vembu will create a full backup during the first run, and subsequently utilize CBT (Changed Block Tracking) within VMware to copy only changed blocks during incremental backups. That said, the way it handles and intelligently stores the metadata of those incremental backups allows Vembu to essentially present any incremental backup as what they call a virtual full backup. Basically, this is what allows Vembu BDR to expose our backups, be they full or incremental, in various file formats such as VMDK and VHDX. This is done without performing any conversion on the underlying backup content, and in the case of incremental backups there is no merging of changes into the previous full backup beforehand. It’s simply an instant export of our backups in whatever file format we choose. I mention that we can instantly export these files, but it should be noted that these point-in-time backups can be instantly booted and mounted as well – again, no merge, no wait time.
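To make the "virtual full backup" idea a bit more concrete, here is a minimal sketch in Python. To be clear, this is not Vembu's actual on-disk format or API. The `RestorePoint` class and the `read_block` and `virtual_full` helpers are names I made up for illustration. The assumption is simply that each backup run stores only the blocks it captured, plus a pointer to the run before it. Reading any restore point then means walking back through the chain until the block is found, with no merge step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RestorePoint:
    """One backup run (hypothetical model): holds only the blocks captured in that run."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)
    parent: Optional["RestorePoint"] = None  # previous run in the chain, None for the full


def read_block(point: RestorePoint, index: int) -> bytes:
    """Resolve a block by walking from this restore point back toward the full backup."""
    current: Optional[RestorePoint] = point
    while current is not None:
        if index in current.blocks:
            return current.blocks[index]
        current = current.parent
    raise KeyError(f"block {index} not found in chain ending at {point.name}")


def virtual_full(point: RestorePoint, block_count: int) -> List[bytes]:
    """Present any restore point as a complete image without merging anything on disk."""
    return [read_block(point, i) for i in range(block_count)]


# A full backup followed by two incrementals that each changed a single block.
full = RestorePoint("full", {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"})
inc1 = RestorePoint("inc1", {1: b"bbbb"}, parent=full)
inc2 = RestorePoint("inc2", {2: b"cccc"}, parent=inc1)

latest_image = virtual_full(inc2, 3)   # newest point in time
earlier_image = virtual_full(inc1, 3)  # roll back one restore point
```

Note that the stored runs are never rewritten: `inc2` still holds only its one changed block, yet it can be read as a complete image. A real implementation would presumably index this metadata rather than walk the chain on every read, but the principle is the same.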
VembuHIVE also contains most of the features you expect to see in a modern file system. Features such as deduplication, compression and encryption are all available within VembuHIVE. On top of all of this, VembuHIVE contains built-in error correction. Every data chunk within the VembuHIVE file system has its own parity file – meaning when data corruption occurs, VembuHIVE can reference the parity file in order to rebuild or repair the data in question. Error correction within VembuHIVE can be performed at many levels as well, protecting data at the disk image level, file level, chunk level or backup file level – I think we are pretty well covered here.
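For a rough feel of how parity-based repair works, here is a small Python sketch. I don't know which error-correcting scheme VembuHIVE actually uses, so this swaps in the simplest one I can think of: plain XOR parity, which can rebuild exactly one damaged piece. The function names, the idea of splitting a chunk into equal-length stripes, and the use of SHA-256 checksums to spot the damaged stripe are all my own assumptions for illustration.

```python
import hashlib
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(stripes: List[bytes]) -> bytes:
    """Parity is the XOR of all stripes (assumed to be the same length)."""
    return reduce(xor_bytes, stripes)


def checksums(stripes: List[bytes]) -> List[str]:
    """Per-stripe checksums, used later to find out which stripe went bad."""
    return [hashlib.sha256(s).hexdigest() for s in stripes]


def repair(stripes: List[bytes], parity: bytes, expected: List[str]) -> List[bytes]:
    """Rebuild a single corrupted stripe from the parity and the healthy stripes."""
    bad = [i for i, s in enumerate(stripes)
           if hashlib.sha256(s).hexdigest() != expected[i]]
    if not bad:
        return list(stripes)
    if len(bad) > 1:
        raise ValueError("single XOR parity can rebuild only one damaged stripe")
    survivors = [s for i, s in enumerate(stripes) if i != bad[0]]
    fixed = list(stripes)
    # XOR-ing the parity with every healthy stripe leaves the missing one.
    fixed[bad[0]] = reduce(xor_bytes, survivors, parity)
    return fixed


# Write time: compute parity and checksums alongside the data.
original = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(original)
sums = checksums(original)

# Later: one stripe is silently corrupted, then repaired from parity.
damaged = [b"AAAA", b"XXXX", b"CCCC"]
recovered = repair(damaged, parity, sums)
```

Production systems tend to use stronger codes that survive more than one failure, but the trade is the same: a little extra storage per chunk buys the ability to repair data rather than just detect that it is broken.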
Finally, we’ve mentioned a lot that we can instantly mount and export our backups on a VM-level basis, however the intelligence and metadata within the VembuHIVE file system go way beyond that. Aside from exporting as VMDKs or VHDXs, VembuHIVE understands how content is organized within the backup file itself – paving the way for instant restores on an application level – think Exchange and Active Directory objects here. Again, this can be done instantly, from any restore point, without performing any kind of merge process.
In the end, VembuHIVE is really the foundation of almost all the functionality that Vembu BDR provides. In my opinion Vembu has made the correct decision by architecting everything around VembuHIVE and by first developing a purpose-built, modern file system geared solely at data protection. A strong foundation always makes for a strong product, and Vembu has certainly embraced that with its implementation of VembuHIVE.