I was taken aback this week when the following link was brought to my attention, along with a request to respond to the claims made by one of Big Blue’s research scientists. The article is a response to our claim that we kick Big Blue’s Scale-out NAS when it comes to file performance on SPEC2014.
My response is to set the record straight by addressing the claims made in the article. I also present a table of the system specifications so they can be compared side by side (see the table at the end of this blog post). WekaIO stands behind its assertions and is happy to validate the results with anyone, including Big Blue.
Incorrect Claim #1
“the claims from the WekaIO people are lets say ‘alternative facts’ at best”
No company can get away with a claim citing SPEC2014 that cannot be substantiated. As a member of SPEC, WekaIO has complied fully with all SPEC2014 requirements and the benchmarks referenced in our press release have met all SPEC committee audit and compliance requirements. “Trump” that one Big Blue!
Misleading Claim #2
“the Spectrum Scale results were done on 4 nodes with 2 flash storage devices attached”
Let’s start with the 4 nodes comment: nothing prevents Big Blue from running on 8 nodes, or 1000 nodes for that matter. Servers are cheap. However, the two flash storage “devices” used in the test are high-end Texas Memory Systems arrays that cost over $1 million. Given that, I think Big Blue could have afforded a few more servers to push the load. The more likely explanation is that the 4 servers saturated the back-end storage array, and it would have taken another $1 million of back-end flash arrays to get 2x the performance.
Incorrect Claim #3
“and they compared this to a WekaIO system with 14 times more memory (14TB vs. 1TB)”
The WekaIO software was tested on a hyperconverged system (where customer applications and storage run on shared servers). We tested in the public cloud to highlight how AWS can now offer great scalable file performance on demand. WekaIO did not use 14 times more memory than Big Blue. In fact, WekaIO used 40% less (refer to the table for the direct comparison). The SPEC2014 tests have not formalized a way to address hyperconverged systems and require you to report the total system memory, which works fine if you are making a dedicated external storage appliance. However, on the hyperconverged solution tested by WekaIO, we used only 5.6% of the memory available (13GBytes/server for a total of 780GBytes). This information is captured in the extensive submission details if you take the time to read them. The Spectrum Scale software is capable of running hyperconverged, so feel free to submit results and compete on equal terms.
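For anyone who wants to check the memory arithmetic, here is a minimal sketch in Python using only the figures quoted in this post. The variable names are mine, and I am assuming 1TB = 1,000GBytes for the 14TB total.

```python
# Memory figures quoted in this post; names are illustrative only.
WEKA_SERVERS = 60
WEKA_GB_PER_SERVER = 13            # memory WekaIO used on each server
WEKA_AVAILABLE_GB = 14 * 1000      # 14TB on the summary page (assumes 1TB = 1,000GB)
BIG_BLUE_COMPUTE_GB = 1000         # memory in the 4 compute nodes
BIG_BLUE_STORAGE_GB = 128          # memory in the external storage arrays


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


weka_used_gb = WEKA_SERVERS * WEKA_GB_PER_SERVER
big_blue_total_gb = BIG_BLUE_COMPUTE_GB + BIG_BLUE_STORAGE_GB

# Share of the reported 14TB that the WekaIO software actually used.
weka_share_pct = percent(weka_used_gb, WEKA_AVAILABLE_GB)

# How much more memory the Big Blue system under test had, relative to WekaIO.
big_blue_extra_pct = percent(big_blue_total_gb - weka_used_gb, weka_used_gb)

print(f"WekaIO memory used:      {weka_used_gb} GBytes")
print(f"Share of available 14TB: {weka_share_pct:.1f}%")
print(f"Big Blue total memory:   {big_blue_total_gb} GBytes")
print(f"Big Blue vs WekaIO:      {big_blue_extra_pct:.1f}% more")
```

Note that a relative figure like this depends on which system you take as the baseline, so state the baseline whenever you quote one.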
Misleading Claim #4
“….120 SSDs (vs 64 Flashcore Modules)”
Cry me a Big Blue river. You are comparing 60 x 2 x 320GB SATA SSDs that cost about $150 each vs. Flashcore modules that cost $37,400 each. The Flashcore modules tout ultra-low latency and very high IOPS running on Fibre Channel and 40Gbit Ethernet networking. So you threw over $1 million in hardware costs at the problem and this is the best you could do? Our tests ran on commodity white box servers in AWS with a shared, noisy 10Gbit network.
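To put the media cost side by side, here is a rough sketch using the per-unit prices quoted above. The SSD price is the approximate "about $150" figure, so treat the totals as ballpark numbers rather than a quote; the variable names are mine.

```python
# Per-unit prices and counts quoted in this post; names are illustrative only.
SATA_SSD_COUNT = 60 * 2          # 60 servers x 2 SSDs each
SATA_SSD_UNIT_USD = 150          # approximate price per SATA SSD
SATA_SSD_SIZE_GB = 320
FLASHCORE_COUNT = 64
FLASHCORE_UNIT_USD = 37_400      # price per Flashcore module

sata_total_usd = SATA_SSD_COUNT * SATA_SSD_UNIT_USD
flashcore_total_usd = FLASHCORE_COUNT * FLASHCORE_UNIT_USD
sata_raw_capacity_gb = SATA_SSD_COUNT * SATA_SSD_SIZE_GB

# How many times more the Flashcore media costs than the SATA media.
cost_ratio = flashcore_total_usd / sata_total_usd

print(f"SATA SSDs:         {SATA_SSD_COUNT} drives, ${sata_total_usd:,}")
print(f"SATA raw capacity: {sata_raw_capacity_gb:,} GB")
print(f"Flashcore modules: {FLASHCORE_COUNT} modules, ${flashcore_total_usd:,}")
print(f"Cost ratio:        {cost_ratio:.0f}x")
```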
Misleading Claim #5
“across 15 times more compute nodes (60 vs 4)”
The question that needs to be asked is why you would use only 4 servers if 8 servers could have shown a huge performance increase, given that servers are cheap (refer back to claim #2). I reiterate that the problem is likely with the back-end storage array. The problem with all external storage appliances is that at a certain load the back end becomes the bottleneck. If you look at how many compute cores were used in the test suite, WekaIO used fewer cores (120 cores on WekaIO vs. a total of 128 cores used by Big Blue). And Big Blue gave itself another break by offloading the RAID 5 calculation to a hardware RAID offload engine, whereas WekaIO handled the entire workload, including a 15+2 double parity calculation.
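The core counts can be tallied the same way. This sketch uses the per-node and per-array figures from the table at the end of this post; the variable names are mine, and the storage core count assumes two 8-core processors in each of the two flash arrays, as listed there.

```python
# Core counts from the table at the end of this post; names are illustrative only.
WEKA_SERVERS = 60
WEKA_CORES_PER_SERVER = 2        # cores used by WekaIO on each server
BIG_BLUE_NODES = 4
BIG_BLUE_CORES_PER_NODE = 24
FLASH_ARRAYS = 2
PROCESSORS_PER_ARRAY = 2         # 2x E5 V2 processors per array
CORES_PER_PROCESSOR = 8

weka_cores = WEKA_SERVERS * WEKA_CORES_PER_SERVER
big_blue_compute_cores = BIG_BLUE_NODES * BIG_BLUE_CORES_PER_NODE
big_blue_storage_cores = FLASH_ARRAYS * PROCESSORS_PER_ARRAY * CORES_PER_PROCESSOR
big_blue_cores = big_blue_compute_cores + big_blue_storage_cores

# Percentage fewer cores on WekaIO, taking the Big Blue total as the baseline.
fewer_cores_pct = 100.0 * (big_blue_cores - weka_cores) / big_blue_cores

print(f"WekaIO cores:   {weka_cores}")
print(f"Big Blue cores: {big_blue_cores} "
      f"({big_blue_compute_cores} compute + {big_blue_storage_cores} storage)")
print(f"WekaIO used {fewer_cores_pct:.2f}% fewer cores")
```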
Incorrect Claim #6
“the article claims 1000 builds, while actual submission only delivers 500”
No company can get away with a claim citing SPEC2014 that cannot be substantiated (repeating point #1). As a member of SPEC, WekaIO has complied fully with all SPEC2014 requirements and met all SPEC committee audit and compliance requests. WekaIO wanted to show the linear scalability of the system by demonstrating 60 nodes with 500 builds and 120 nodes with 1000 builds. We chose not to publish the 120-node, 1000-build results because of the misleading representation of memory usage on the front summary page. If you would like to see the results of the 1000-build test, refer to the graph below, which shows peak latency of less than 3 milliseconds at 1000 builds. This really highlights the power of our hyperconverged storage solution.
Misleading Claim #7
“so they need 14 times more memory and cores and 2 times flash to show twice as many builds at double the response time, I leave this to everybody who understands this facts to judge how great that result really is”
This should be rewritten to read “they need 40% less memory, 7% fewer cores, 70% less hardware cost to show twice as many builds.” Yes, I agree that these are really great results.
Correct but Very Expensive Claim #8
“Spectrum Scale scales almost linear if you double the nodes, network and storage accordingly so there is no reason to believe we couldn’t easily beat this, its just a matter of assemble the HW in the lab and run the test.”
I have no reason to doubt the scalability of Spectrum Scale; it is a very well-proven software suite. SPEC welcomes competition, so feel free to resubmit with proof of your claim above. The problem is that doubling the number of nodes, network and storage would cost at least $2.5 million in external array hardware. Compare that to purchasing 60 whitebox servers costing less than $350K, with WekaIO using less than 10% of the cluster’s hardware resources (<$35K). That is a tall order to ask of IT administrators who are struggling with tight budgets. Because we can run hyperconverged, over 90% of the compute resources are available to the applications, meaning that there is no reason to purchase storage resources separately from your compute cluster. Or you can choose to run on demand in the AWS cloud if your application is bursty. With budgets flat and application performance demands growing, our hyperconverged solution provides a great alternative to legacy file systems running on expensive storage arrays.
Table of System resources:
| Resource | WekaIO | Big Blue | Comments |
|---|---|---|---|
| Compute | 120 cores | 96 cores | Big Blue has 4 compute nodes, each with 24 cores; WekaIO used 2 cores per server (60×2) |
| Storage | No external storage array | 32 cores | 2x E5 V2 8-core processors per Flashsystem 900. The total core count for Big Blue was 128 (96+32) cores |
| Memory | 780GBytes (of 14TB) | 1,000GBytes | The WekaIO submission states that the memory used was 780GBytes of the available 14TB. Please scroll down the attached link to read the submission detail. |
| Storage Memory | No external storage array | 128GBytes | The total Big Blue solution under test had 44% more memory than WekaIO used (1,128 vs 780 GBytes) |
| RAID | Done by the same cores that ran WekaIO | Done by RAID on Chip on storage platform | The Spectrum Scale NSD nodes did not perform RAID processing, reducing the impact on SPEC testing. The storage array used hardware RAID offload engines. |
| Network | 10Gb Ethernet | Dual 40Gb Ethernet per storage appliance | WekaIO ran on a virtual network that was noisy and shared. |
| Intel Chip | 2686 | 2680 | Near-comparable processor types |
| SSD/Flash | 120 SATA SSDs | 64 Volumes on TMS Flash | Big Blue uses 47% fewer devices; however, its storage is ultra-low latency, high performance Texas Memory Systems (TMS) flash being compared to relatively old SATA drives |
| Cost of Storage System | <$350K to purchase 60 equivalent servers utilizing <10% of server resources | >$1M | The Big Blue Flashsystem 900 storage used in the test cost over $1M. If you were to purchase 60 servers of the type used in the WekaIO submission, it would cost 1/3 of what Big Blue cost, and 90% of the server resources would be available to run applications. |