Storage Models for Private Clouds

Implementing a private cloud requires extensive virtualization of the business environment, which changes the way an organization uses computing resources such as storage. A private cloud may use local storage, in which the same server is both a processing node and a storage node. It may also use shared storage, where data is stored in a centralized location that’s accessible by all the processing nodes on the private cloud.

Local and shared storage models may both use block-level storage or file-level storage. Devices with block-level storage store raw data images, and modern implementations of block-level storage typically support clones and snapshots of the data. File-level storage provides all the features of a Portable Operating System Interface (POSIX) file system.

Local Storage

Local storage is best for cloud deployments with no more than two servers. Common types of local storage for a private cloud include directory, LVM and ZFS.

Directory

A directory model for cloud storage uses file-level storage, so it can store any content type, including virtual disk images. The storage may be a local directory or a shared directory mounted on a local path, and administrators can mount additional devices by adding entries to the /etc/fstab file and then defining a directory storage for each mount point. A predefined layout allows all file-level storage models to store different types of content in different sub-directories.
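
For example, to dedicate a second disk to VM storage, an administrator could add an entry like the following to /etc/fstab and then define a directory storage at that mount point (the device name and mount point here are only examples):

    # /etc/fstab: mount a dedicated data disk for VM storage
    /dev/sdb1  /mnt/vmstore  ext4  defaults  0  2

Running mount /mnt/vmstore (or rebooting) activates the entry.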

This approach assumes the underlying directory is POSIX compatible, which allows a private cloud to use any file system supported by Linux. However, you can’t create snapshots at the storage level with a directory model. A workaround for this limitation involves using the qcow2 format for virtual machine (VM) images, since this format supports snapshots. Qcow2 can also create clones with its base image feature.
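
As a sketch of that workaround, the qemu-img tool that ships with QEMU can create a qcow2 image, take internal snapshots, and build a linked clone from a base image (the file names here are illustrative):

    # Create a 32 GB qcow2 image for a VM disk
    qemu-img create -f qcow2 vm-100-disk-0.qcow2 32G
    # Take and list internal snapshots
    qemu-img snapshot -c before-upgrade vm-100-disk-0.qcow2
    qemu-img snapshot -l vm-100-disk-0.qcow2
    # Create a clone backed by the original (base) image
    qemu-img create -f qcow2 -b vm-100-disk-0.qcow2 -F qcow2 vm-101-disk-0.qcow2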

Using directories to store VM images is a straightforward process and easy to manage, but accessing the VM images as files is less efficient than block-level access to the underlying data.

LVM

Logical Volume Manager (LVM) is an implementation of logical volume management in the Linux kernel that’s commonly used to facilitate hard drive management. This block-level storage method is a light software layer that transparently stores data in multiple partitions by splitting the available disk space into logical volumes.

LVM allows snapshots and clones, although their creation is inefficient because creating a snapshot interferes with all writes to that volume group. LVM implements proper cluster-wide locking, although it doesn’t support shared storage.
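
A snapshot is created with lvcreate; the volume group and logical volume names below are examples:

    # Create a 1 GB copy-on-write snapshot of an existing logical volume
    lvcreate --size 1G --snapshot --name vm-100-snap /dev/vg0/vm-100-disk-0
    # Remove the snapshot once it's no longer needed to stop the write overhead
    lvremove /dev/vg0/vm-100-snap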

The most common use of LVM for a private cloud is to place it on top of a large iSCSI logical unit number (LUN).
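
Assuming the LUN shows up locally as /dev/sdc (an example device name), setting this up is a matter of initializing the device for LVM and building a volume group on it:

    # Initialize the iSCSI LUN as an LVM physical volume
    pvcreate /dev/sdc
    # Build a volume group on it, then carve out a VM disk
    vgcreate cloud-vg /dev/sdc
    lvcreate --size 32G --name vm-100-disk-0 cloud-vg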

ZFS

ZFS is a logical volume manager and file system developed by Sun Microsystems. Its most significant features for storing data on a private cloud include the following:

  • Integration of file system and volume management
  • Automatic integrity checking and repair
  • Protection against data corruption
  • Support for high storage capacity
  • Efficient data compression
  • Snapshots and clones

ZFS is probably the most advanced storage model currently available for making snapshots and clones. Its raw format is used for VM images, while the subvol format is used for container data.
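
Snapshots and clones are one-line operations in ZFS; the pool and dataset names below are examples:

    # Snapshot a volume backing a VM image, then clone it for a new VM
    zfs snapshot tank/vm-100-disk-0@base
    zfs clone tank/vm-100-disk-0@base tank/vm-101-disk-0
    # List existing snapshots
    zfs list -t snapshot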

ZFS datasets inherit their properties from the parent dataset, so the parent’s properties serve as a dataset’s default values. Common storage properties for ZFS datasets include content, nodes and disable. Storage properties specific to ZFS include pool, blocksize and sparse. The pool property specifies the ZFS pool or file system in which to allocate storage space, and the blocksize property specifies the block size for that volume. The sparse property indicates that the volume will use ZFS thin provisioning, which doesn’t require the space reservation to be the same size as the volume.
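
As a brief sketch, a thin-provisioned volume with an explicit block size can be created as follows (the pool name and sizes are examples), and property inheritance can be checked with zfs get:

    # -s makes the volume sparse (no up-front reservation), -b sets the block size
    zfs create -s -V 32G -b 8k tank/vm-102-disk-0
    # The source column shows which properties are inherited from the parent
    zfs get -o name,property,value,source compression tank/vm-102-disk-0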

Like LVM, ZFS can be used in conjunction with an iSCSI server to share storage among multiple processing nodes.

Shared Storage

Shared storage models for a private cloud include NFS, iSCSI, a dedicated Ceph SAN and using Ceph to share processing nodes.

NFS

Network File System (NFS) is a file-based network storage protocol, and the NFS storage model is based on the directory model, so it shares most of the directory model’s properties, including the file-naming conventions and directory layout. The primary advantage of NFS is that administrators can configure the NFS server’s properties directly, allowing the share to be mounted automatically without modifying /etc/fstab. The storage layer can also check whether the server is online and query it for its exported shares.
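
On the server side, shares are defined in /etc/exports, and a client can query them with showmount (the path, subnet and server address here are examples):

    # /etc/exports on the NFS server
    /srv/vmstore  192.168.1.0/24(rw,sync,no_subtree_check)

    # Apply the export table, then list exports from a processing node
    exportfs -ra
    showmount -e 192.168.1.10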

The biggest drawback to using an NFS server for storing VM images is that the images are stored as files, so VMs access the underlying data through file mechanisms rather than directly at the block level. On the other hand, an NFS server is easy to set up and maintain, and the same infrastructure can easily store other files as well. NFS doesn’t support snapshots at the storage level, although administrators can implement snapshots and cloning on an NFS server via qcow2.

iSCSI

iSCSI is an IP-based storage networking standard. It provides block-level access to physically separate storage devices by sending SCSI commands over a Transmission Control Protocol / Internet Protocol (TCP/IP) network. iSCSI enables a storage method independent of physical location and is a good option for private clouds that aren’t going to grow. Expanding an iSCSI storage network generally involves adding another iSCSI server, which may result in a cloud having many separate storage servers rather than a single storage cluster.

iSCSI is widely used, and nearly all storage vendors support it. Open-source solutions for the server (target) side, such as OpenMediaVault, are also readily available. On the client side, the initiator is provided by open-iscsi, a standard Debian package that isn’t installed by default. iSCSI supports the common storage properties content, disable and nodes in addition to the iSCSI-specific properties portal and target. The portal property specifies the server’s IP address or DNS name with an optional port, while the target property specifies the iSCSI target.

iSCSI doesn’t have its own management interface, so the best way of implementing it is usually to export a large LUN and set up LVM on top of the LUN. The LVM plugin can then manage the storage for that LUN.
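
With the open-iscsi initiator, attaching the LUN and layering LVM on it looks roughly like this (the portal address and target IQN are examples):

    # Discover and log in to the iSCSI target
    iscsiadm -m discovery -t sendtargets -p 192.168.1.20
    iscsiadm -m node --targetname iqn.2003-01.org.example:storage.lun1 \
        --portal 192.168.1.20 --login
    # The LUN now appears as a local block device (here /dev/sdd);
    # initialize it for LVM as with any local disk
    pvcreate /dev/sdd
    vgcreate iscsi-vg /dev/sdd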

Dedicated Ceph SAN

Ceph may be used to create a dedicated storage area network (SAN) for a private cloud. A SAN allows any storage device accessible by a server to appear as if it’s locally attached to the client. The network of storage devices in this model typically aren’t accessible through a local area network (LAN). A SAN only provides block-level operations, although file systems can be built on top of a SAN. This method of obtaining file-level access with a SAN is known as a shared disk file system.

SANs have been an attractive cloud storage solution for businesses of all sizes since their prices dropped significantly during the early 2000s. Sharing storage in this manner usually simplifies administration, since storage devices don’t need to be moved from one server to another in order to shift storage capacity.

Ceph can be thought of as software RAID across multiple servers. The data is distributed and replicated on drives across the SAN. An entry-level Ceph SAN would have three storage nodes with at least two drives in each node. A VM image is then distributed among the drives in all of the storage nodes, with one or more copies of each block kept on a different drive and node. This provides the combined speed of all the drives working together along with redundancy at the block, drive and node level. Such a SAN is easily expandable: add another drive to a node, or add another storage node, and it can grow as large as you like.
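
For illustration, creating a replicated pool and a VM disk image on it takes just a few commands (the pool name, placement-group count and sizes are examples):

    # Create a pool and keep three copies of every object
    ceph osd pool create vm-pool 128
    ceph osd pool set vm-pool size 3
    # Create an RBD image in the pool to back a VM disk
    rbd create vm-pool/vm-100-disk-0 --size 32G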

Ceph is fault-tolerant because it replicates data across commodity hardware. Furthermore, Ceph doesn’t require any specialized hardware support and is self-managing. The SAN is managed by monitor processes, which can run as VMs on the processing nodes within the private cloud.

Sharing Ceph on Processing Nodes

A hybrid way of implementing Ceph in a private cloud is to run Ceph directly on the processing nodes. With a setup like this, you could run your entire cloud on three physical servers and get the distributed processing of your cloud along with all of the features and reliability of Ceph. All of your data would be distributed and available on every node, so you could lose one physical server without an outage. This is a cost-effective way of building out a private cloud, and it offers flexibility in scaling out, since you can add processing-only or storage-only nodes to the configuration.
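
As a rough sketch, on a Proxmox VE cluster (one platform that supports this hyperconverged model) the setup can be bootstrapped with the pveceph tooling; the subnet and device names here are assumptions, and the exact command names vary between Proxmox releases:

    # Run on each of the three processing nodes
    pveceph install
    # Initialize Ceph with a dedicated storage network (example subnet)
    pveceph init --network 10.10.10.0/24
    # Create a monitor on each node so the cluster keeps quorum
    pveceph createmon
    # Turn each spare local disk into an OSD (example device)
    pveceph createosd /dev/sdb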
