Why Data Locality is Irrelevant

Barbara Murphy. December 19, 2017

Can a Shared File System Beat a Local File System?

Simply put, Yes!

I just finished convincing another customer that data locality is irrelevant in 2017, provided you have the right shared file system. He was convinced that running a data set on a local file system delivers the fastest possible performance. When networks were slow that was certainly true, but things have changed radically with the arrival of 10 gigabit networking and high-performance flash in the datacenter.

Exactly What Is Data Locality?

Data locality refers to keeping data close to the computation that uses it, so that expensive server infrastructure is not starved waiting on I/O. The term became popular when the approach was adopted by the developers of Hadoop in 2003. Local-copy architectures were developed when the standard storage medium was the hard disk drive and the network connection was 1 Gbit Ethernet. Fast forward 15 years: 10 Gbit Ethernet is the predominant network and an SSD costs one third the price of a 2003-era HDD. The trouble is that the myth of data locality persists even though the surrounding infrastructure has changed dramatically.
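For a sense of scale, here is a rough back-of-envelope sketch in Python comparing those network line rates against the hard drives the local-copy design was built around (the ~75 MB/s HDD figure is an illustrative assumption, not a measured spec):

    # Back-of-envelope: network line rate vs. the HDD throughput that
    # data-locality architectures were designed around (illustrative figures).

    GBIT_IN_BYTES_PER_SEC = 1e9 / 8      # 1 Gbit/s expressed in bytes/s
    HDD_2003_BYTES_PER_SEC = 75e6        # assumed ~75 MB/s sequential HDD

    links_gbit = {"1 GbE (2003 era)": 1, "10 GbE": 10, "100 GbE": 100}

    for name, gbit in links_gbit.items():
        link_bytes = gbit * GBIT_IN_BYTES_PER_SEC
        ratio = link_bytes / HDD_2003_BYTES_PER_SEC
        print(f"{name:>16}: {link_bytes / 1e9:5.2f} GB/s line rate, "
              f"roughly {ratio:5.1f}x one 2003-era HDD")

With 1 GbE, the network and a single drive were roughly matched, which is what made local copies attractive; at 10 GbE and beyond, the network outruns the media that design assumed.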

Modern 10 Gbit networks are 10 times faster than an SSD, and with 100 Gbit Ethernet they will soon be 100 times faster. Data locality is irrelevant if you can efficiently spread data across many nodes interconnected with high-speed Ethernet or InfiniBand.

The Data Locality Myth Has Spread to Modern Workloads

The local-copy paradigm has spread across a broad set of modern workloads, including EDA design, software builds, machine learning, genomics, financial analytics, and many other computationally intensive applications. GPUs are being adopted widely in high-performance and technical computing, and with a price tag of $150K for a single node, they can't afford to sit idle for any length of time. So data is copied from a NAS share onto local disk, the workload is processed, the results are saved back to the NAS share, and the data is deleted from the local drive; rinse and repeat.
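Reduced to its essence, that rinse-and-repeat workflow looks something like the Python sketch below; the NAS and scratch paths and the train.py command are hypothetical placeholders for whatever the job actually runs:

    import shutil
    import subprocess
    from pathlib import Path

    # Minimal sketch of the "local copy" workflow described above.
    # The paths and the command are hypothetical placeholders.
    NAS_SHARE = Path("/mnt/nas/project/dataset")
    SCRATCH = Path("/local/scratch/dataset")
    RESULTS = Path("/mnt/nas/project/results")

    # 1. Copy the data set from the NAS share onto local disk (Python 3.8+).
    shutil.copytree(NAS_SHARE, SCRATCH, dirs_exist_ok=True)

    # 2. Process the workload against the local copy.
    subprocess.run(["python", "train.py", "--data", str(SCRATCH),
                    "--out", str(SCRATCH / "out")], check=True)

    # 3. Save the results back to the NAS share.
    shutil.copytree(SCRATCH / "out", RESULTS, dirs_exist_ok=True)

    # 4. Delete the local copy -- rinse and repeat for the next job.
    shutil.rmtree(SCRATCH)

Every step except step 2 is pure data movement on an expensive, otherwise idle node, which is exactly the overhead a sufficiently fast shared file system removes.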

NFS = Not For Speed

The "local disk" workflow, although painful, has proven a better alternative than running jobs against a shared data set, because NFS, the de facto protocol for NAS systems, is a serial protocol that was never built for performance. The problems with NFS were outlined beautifully ten years ago in a paper authored by Robin Harris: "In short, gigabit Ethernet has already pushed the current NFS protocol to its practical limits. Now comes 10-gigabit Ethernet – how on earth can today's NFS scale with that? Short answer: It can't and it won't."

There Is a Better Way – Fast Parallel Access to Data

NFS provided simplicity but was never built for performance. Parallel file systems such as IBM GPFS (now Spectrum Scale) and Lustre were designed 15 years ago to solve the data-access problem, but they were designed to run on hard disk drives. They did a phenomenal job of parallelizing data access across hundreds or thousands of hard drives to achieve the best sequential throughput possible, because HDDs work best on highly sequential workloads. Throw a random workload at this design and performance tanks, and network latency becomes a killer.
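Conceptually, the parallel-file-system win comes from striping a file across many drives or nodes and reading the stripes concurrently, so aggregate bandwidth grows with the number of devices rather than being capped by a single drive. The sketch below is a generic illustration of that idea using a thread pool; the stripe layout, node paths, and read_stripe helper are hypothetical, not the GPFS, Lustre, or WekaIO implementation:

    from concurrent.futures import ThreadPoolExecutor

    # Generic illustration of striped, parallel reads. The "nodes" here are
    # hypothetical mount points standing in for networked storage targets.
    STRIPE_SIZE = 1 << 20                                   # 1 MiB stripes (assumed)
    NODES = [f"/mnt/node{i}/chunkstore" for i in range(8)]  # hypothetical paths

    def read_stripe(stripe_index: int) -> bytes:
        """Read one stripe from the node it lands on (round-robin placement)."""
        node = NODES[stripe_index % len(NODES)]
        with open(f"{node}/stripe_{stripe_index:08d}", "rb") as f:
            return f.read(STRIPE_SIZE)

    def read_file(num_stripes: int) -> bytes:
        # Issue all stripe reads concurrently; throughput is roughly the sum
        # of what the nodes can deliver, not what one local drive can.
        with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
            stripes = pool.map(read_stripe, range(num_stripes))
        return b"".join(stripes)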

The problem is that new workloads in machine learning and analytics have unpredictable access patterns and widely varied file sizes, which are ill-suited to HDDs. That is why WekaIO Matrix, the first new parallel file system built for NVMe, has proven a game changer. Every line of code has been written from scratch to leverage modern technologies. It is flash native and delivers NVMe-over-Fabrics performance directly to the application clients. By parallelizing data I/O across many networked NVMe drives, it can deliver higher performance to a single GPU client than local disk can:

  • Data management is greatly simplified as there is just one data set and it never has to be copied around.
  • Application completion times are faster than can be achieved on local disk.
  • The data sets can scale to hundreds of petabytes in a single namespace.

 
