

May 8, 2017
David Hiatt

The Internet of Things (IoT) continues to gain significant momentum, and IT teams need parallel file system storage that can cost-effectively handle huge volumes of data while delivering the small-file performance needed to capture real-time data streams. Placing solid-state drives (SSDs) directly into individual servers in a shared-nothing architecture quickly becomes a management nightmare at scale, since every server's SSDs must be provisioned, balanced, and protected separately. Worse, if a server fails, the data stored only on its local SSDs may be lost permanently.

The answer is to configure these servers into a cluster and aggregate their SSDs into a single scale-out storage system. This spreads data across the nodes to balance SSD utilization and protects against the failure of either a single SSD or an entire node. As storage nodes are added, capacity grows to handle IoT data, and each new node also brings additional network bandwidth and processing power, so performance scales with capacity instead of hitting bottlenecks.
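To make the idea of spreading data across nodes more concrete, here is a minimal Python sketch. It assumes simple two-way replication with copies placed by rendezvous (highest-random-weight) hashing. The post does not say which placement or protection scheme a real system uses (many use erasure coding instead), and the names `place` and `readable_after_failure` are hypothetical. Because each chunk's copies always land on distinct nodes, losing any single node, and therefore any single SSD, leaves every chunk readable.

```python
import hashlib

def _score(node: str, chunk_id: str) -> int:
    # Deterministic pseudo-random weight for a (node, chunk) pair.
    digest = hashlib.sha256(f"{node}:{chunk_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(chunk_id: str, nodes: list[str], copies: int = 2) -> list[str]:
    """Rendezvous hashing: pick the `copies` nodes with the highest score.
    The result never contains the same node twice."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    ranked = sorted(nodes, key=lambda n: _score(n, chunk_id), reverse=True)
    return ranked[:copies]

def readable_after_failure(placement: dict[str, list[str]], failed: str) -> bool:
    # A chunk survives if at least one copy lives on a node other than the failed one.
    return all(any(n != failed for n in locs) for locs in placement.values())

nodes = [f"node{i}" for i in range(6)]
placement = {f"chunk{c}": place(f"chunk{c}", nodes) for c in range(1000)}

# Count chunk copies per node to see how evenly SSD capacity is used.
load = {n: 0 for n in nodes}
for locs in placement.values():
    for n in locs:
        load[n] += 1

print(load)
print(all(readable_after_failure(placement, n) for n in nodes))
```

Rendezvous hashing is used here only because it is short and deterministic; any placement that keeps copies on separate nodes illustrates the same point.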

To ensure that data flows freely among the storage nodes, the network infrastructure must be able to handle the additional traffic. In the early days of Hadoop, when back-end storage was based on hard disk drives, Gigabit Ethernet performance was adequate. That is no longer the case. Many enterprises have already moved to 10GbE for their server connections and added multiple network interface cards (NICs) for redundancy and performance. This is a minimum design requirement for hyperconverged infrastructure (HCI) storage systems. To ensure adequate network performance in such environments, some enterprises design their networks to support port speeds as high as 100GbE.
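A rough back-of-the-envelope calculation shows why Gigabit Ethernet no longer keeps up with SSD-based nodes. The per-SSD throughput and SSD count below are assumptions chosen for illustration, not figures from the post, and the conversion ignores protocol overhead. In a scale-out system this matters even more, since data written to one node must also cross the network to reach the nodes that protect it.

```python
# Assumed figures for illustration only; they are not from the original post.
SSD_MB_PER_S = 500   # assumed sustained throughput of one SATA SSD
SSDS_PER_NODE = 4    # assumed number of SSDs per storage node

def link_mb_per_s(gbit: float, nics: int = 1) -> float:
    # Raw line rate in Gbit/s converted to MB/s (1 Gbit/s = 125 MB/s),
    # multiplied by the number of NICs; protocol overhead is ignored.
    return gbit * 1000 / 8 * nics

node_ssd_mb_per_s = SSD_MB_PER_S * SSDS_PER_NODE

for gbit, nics in [(1, 1), (10, 1), (10, 2), (25, 2), (100, 1)]:
    net = link_mb_per_s(gbit, nics)
    print(f"{nics} x {gbit}GbE: {net:.0f} MB/s, "
          f"{net / node_ssd_mb_per_s:.0%} of local SSD throughput")
```

Under these assumptions a single 1GbE link carries only a small fraction of what the node's SSDs can deliver, while dual 10GbE or faster links close the gap.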

The additional processors and memory that come with each new node allow applications such as real-time analytics and big data analysis to get the performance they need to keep up with the expectations of business owners and customers alike. In addition, because these storage nodes are industry-standard servers, applications can run directly on the nodes where the data resides, dramatically improving response times.
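Running applications where the data resides is usually implemented as locality-aware scheduling, the approach Hadoop popularized. The sketch below is a simplified, hypothetical scheduler (`Node` and `schedule` are illustrative names, not a real Hadoop or Weka API): it prefers a node that already holds the task's data block and otherwise falls back to the least-loaded node, which then has to read the block over the network.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    free_slots: int
    blocks: set = field(default_factory=set)

def schedule(task_block: str, nodes: list[Node]) -> tuple[str, bool]:
    """Pick a node for a task that reads `task_block`.
    Returns (node name, True if the read is local)."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        raise RuntimeError("no free slots in the cluster")
    # Prefer nodes that already store the block (no network transfer needed).
    local = [n for n in candidates if task_block in n.blocks]
    pool = local or candidates
    # Among eligible nodes, choose the one with the most free capacity.
    chosen = max(pool, key=lambda n: n.free_slots)
    chosen.free_slots -= 1
    return chosen.name, bool(local)

cluster = [
    Node("node0", free_slots=2, blocks={"b1", "b2"}),
    Node("node1", free_slots=2, blocks={"b3"}),
    Node("node2", free_slots=2),
]
for block in ["b1", "b3", "b9"]:
    print(block, schedule(block, cluster))
```

Here `b1` and `b3` run on the nodes that store them, while `b9`, which no node holds locally, goes to the least-loaded node as a remote read.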

Technology never stands still, and to overcome the challenges created by IoT initiatives, IT departments are finding that scale-out storage systems built on HCI architectures and the latest generation of SSDs are an effective answer.

Coupling these scale-out solutions with tiering functionality allows SSD-equipped standard servers to run Hadoop workloads very effectively, eliminating the special hardware otherwise required to support Hadoop. Organizations can then share their unified hardware across different applications more effectively, dramatically improving IT resource utilization.
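The post does not describe how the tiering policy works. As one common approach, the minimal sketch below assumes a least-recently-used (LRU) policy: hot objects stay on a small SSD tier, colder ones are demoted to a larger capacity tier (standing in for HDDs or object storage), and reading a demoted object promotes it back. `TieredStore` and its methods are hypothetical names used only for illustration.

```python
from collections import OrderedDict

class TieredStore:
    """Toy two-tier store: a bounded 'ssd' tier kept in LRU order and an
    unbounded 'capacity' tier. Reads of capacity-tier data promote it to SSD."""

    def __init__(self, ssd_capacity: int):
        self.ssd_capacity = ssd_capacity
        self.ssd: OrderedDict[str, bytes] = OrderedDict()
        self.capacity: dict[str, bytes] = {}

    def _make_room(self) -> None:
        # Demote least recently used objects until the SSD tier fits.
        while len(self.ssd) > self.ssd_capacity:
            key, value = self.ssd.popitem(last=False)
            self.capacity[key] = value

    def put(self, key: str, value: bytes) -> None:
        self.capacity.pop(key, None)  # drop any stale copy on the slow tier
        self.ssd[key] = value
        self.ssd.move_to_end(key)
        self._make_room()

    def get(self, key: str) -> bytes:
        if key in self.ssd:
            self.ssd.move_to_end(key)  # mark as recently used
            return self.ssd[key]
        value = self.capacity.pop(key)  # raises KeyError if in neither tier
        self.ssd[key] = value           # promote hot data back to SSD
        self._make_room()
        return value

    def tier_of(self, key: str) -> str:
        return "ssd" if key in self.ssd else "capacity"

store = TieredStore(ssd_capacity=2)
for k, v in [("a", b"1"), ("b", b"2"), ("c", b"3")]:
    store.put(k, v)
print(store.tier_of("a"))                    # capacity (demoted when "c" arrived)
store.get("a")                               # promotes "a", demotes "b"
print(store.tier_of("a"), store.tier_of("b"))  # ssd capacity
```

LRU is chosen here for brevity; production tiering policies may also weigh object size, access frequency, or explicit placement rules.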