
Dragging up ancient history – How NetApp fooled everyone (FAS3040 v. CX3-40)

Foreword:

Back in January 2009, NetApp published a Storage Performance Council SPC-1 comparison between the NetApp FAS3040 and the EMC CX3-40, claiming that NetApp had the superior array, with the FAS churning out 30,985.60 IOPS and the CX3 trailing with a woeful 24,997.49 IOPS.

NetApp shopped this score to anyone and everyone who would listen: the media, customers; heck, even EMC engineers copped an earful of it whenever NetApp had the chance to deliver it.

But, whilst many fell for NetApp's shenanigans, there were a few who questioned it, and some even fought back, realising NetApp had long-stroked the EMC array yet short-stroked the FAS; but most missed the fact that, hidden in plain sight, was NetApp's real deceit: they doctored the test.

Now many suspected that, but NetApp would often come prepared (at least they did with me), and would bring printed copies of the EXECUTIVE SUMMARY version of the document to any meetings where they were presenting these “facts”.

Now, when someone presents this kind of data to you in a meeting, you're normally respectful enough to give them the meeting without jumping for the laptop to check whether it's true, spending the entirety of the meeting with your face buried in the screen looking for any little piece of evidence that might be a saving grace.

They knew that; they knew most (if not all) would not check the facts then and there, and probably never would. And even if they did check, most people wouldn't spot the differences.

But a few of us decided that, given our experience with EMC Clariions, these numbers simply could not be right; we had to look at the report for ourselves. And sure enough, people appeared out of the woodwork crying foul; they'd spotted from a mile away that NetApp had long-stroked the EMC.

However, so far as I know, nobody really knew the lengths of deceit that NetApp went to.

The following has been a long time in the making and, lucky or unlucky as it may be, I've had a recent bout of being jet-lagged and getting up at 4am when my first meeting is at 10am, stuck in hotels far away from family and bored, with just that little chip on my shoulder about what NetApp did.

I had long since ignored it, but recently one of my regular reads, http://storagewithoutborders.com/ (a known NetApp devotee), decided to post a couple of NetApp responses to EMC's announcements.

Don’t get me wrong, there’s nothing wrong with being an evangelist for your company, but at the very least, respect your competition.

Now I don’t know about you, but I really hate it when someone pisses on somebody else’s parade!

I hate it even more when they use the competitor’s announcement to coat-tail their own announcement, I think it’s crass and offensive.

I think it's even worse when the commentary comes from someone who has NO experience of the product they're comparing, being only another mouthpiece for their company.

I posted some comments on his blog about this behaviour, but he seemed to think it was OK, acceptable even; I didn't.

When he responded, his comments were varied but polite (John is a really nice guy); he really does drink the NetApp Kool-Aid, though.

It got my goat: even four years later, John was commenting on his blog about how much better NetApp's FAS and (now) E-Series are than the EMC CX3-40; he just had to bring that old chestnut up.

http://storagewithoutborders.com/2011/11/03/breaking-records-revisited/

See below:

SWB> “1. You need to be careful about the way benchmarks are used and interpreted
2. You should present the top line number honestly without resorting to tricks like unrealistic configurations or aggregating performance numbers without a valid point of aggregation.”

Thank you John, I’ll take that into consideration.

So I just wanted to set the scene for what I’m about to point out:

SWB > “The benchmark at the time was with an equivalent array (the 3040) which is now also three generations old, the benchmark and what it was proving at the time remains valid in my opinion. I’d be interested to see a similar side by side submission for a VNX5300 with FAST Cache and a FAS3240 with Flashcache, but this time maybe we should wait until EMC submits their own configuration first.”

SWB > “I’m pretty happy with the way NetApp does their bench-marking, and it seems to me that many others abuse the process, which annoys me, so I write about it.”

I understand, John; we all want to believe the things closest to us are infallible. We don't want to believe our kid is the one who bit another kid; we believe our mother is the better cook, that the company and products we associate ourselves with are superior, and that our company's practices are unblemished. But this can cloud our judgement and cause us to look past the flaws.

Now, I want to make something perfectly clear: I'm not pro- or anti-NetApp or EMC in any regard; I have worked with both for many years.

I will be taking other vendors to task as and when I see fit; for now, however, your blog is one of my regular reads and I know your disdain for FUD. You just happened to be the first, and your post contradicted your professed dislike for FUD and for engineered benchmarks.

I want to go back to the FAS3040/Clariion CX3-40 example as a demonstration of how NetApp DOES NOT (capitals for emphasis, not shouting) play by the rules.

Executive Summary:

Way back in March 2009, NetApp published a comparison of the two products in an attempt to show NetApp's superiority as a performance array: the NetApp FAS3040 achieved an SPC-1 result of 30,985 IOPS, whilst the EMC CX3-40 managed a seemingly meagre 24,997 SPC-1 IOPS, leaving the EMC a horrific 5,988 IOPS behind the NetApp.

On the face of it, it would appear NetApp had a just and fair lead, but this is simply not true: NetApp engineered the EMC to be pig-slow, and whilst I wasn't there at the time and can only speculate about the intentions behind what I describe in this post, I cannot, in any sense of the word, believe it was not intentional.

When conducting a benchmark, it's important to ensure a like-for-like configuration; NetApp simply did not do this!

NetApp:

· used different hardware for the workload generators,

· used different methods of ASU presentation,

· short-stroked the NetApp and long-stroked the EMC,

· engineered higher-latency equipment, additional hardware and services into the EMC BoM, and

· falsified the displayed configuration.

My goal here is to show why I place no faith in NetApp’s or any other vendor’s competitive benchmark.

End Executive Summary.

Now for the Nuts and Bolts:

Now John, I know you have a passion for benchmarks, and to reiterate the first quote in this reply, I will add that you need to be careful not to introduce unfair differences in the equipment, tools, software and pricing that hand one side an undue competitive advantage.

I know you'll probably be crying foul by now and stating it was fair and just, but I can prove, without a doubt, that it was not; and for the life of me, I cannot believe how any of this was missed and how NetApp got away with it.

I must warn you, this level of detail is normally reserved for the kind of person who wears anoraks and strokes their beard.

I'll break down how NetApp did this into the following sections:

1. LUN to volume presentations

2. Workload Generator (WG) Hosts

3. HBAs

4. Array Configurations and BoM

5. RAID Group and LUN Configuration

6. Workload Differences

7. Other Differences and issues

So let’s look at these differences in detail (John, when benchmarking, the devil IS in the detail):

1. LUN to volume presentations:

When NetApp's Stephen Daniel configured the workload generator (WG) volumes, he striped 36 LUNs from the Clariion at the host level, and likewise for the NetApp:

NetApp FAS3040 (page 63: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf): the volumes as striped LUNs by the WG, and the LUNs presented to the WG.

EMC CX3-40 (page 64: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf): the volumes as striped LUNs by the WG, and the LUNs presented to the WG.

Note: because the procedure is not as well documented in the NetApp configuration, it makes for somewhat difficult reading.

He must have known full well that the EMC Clariion had the capability to stripe the volumes in-array, or to simply create one larger LUN, yet it was presented in an atrocious layout.

This might not seem like a big deal, but it makes a huge difference and creates many host and array performance issues; anyone with a strong knowledge of storage networking knows not to do this unless there is no other choice (and he had a choice):

The phenomenon of striping performance loss at the host is well observed here:

http://sqlblog.com/blogs/linchi_shea/archive/2007/03/12/should-i-use-a-windows-striped-volume.aspx

It would seem that NetApp created a greater depth of striping for the EMC array, using small stripes broken over the SPs for no possible technical reason (other than to push the workload as high as possible), thereby negating any effective use of the cache.

Now, I want to make it clear: there is NO technical reason to create so many LUNs within each RAID group, other than to create performance problems, and in my years of working with EMC Clariions, I have never seen one laid out in such a manner. A simpler presentation is sketched below.
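For illustration only: a single full-size LUN per RAID group could have been bound with the very same naviseccli syntax the disclosure uses. The capacity figure below is simply my sum of the three slices NetApp actually bound (296 + 296 + 65 GB); it is a sketch, not anything NetApp published:

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 0 -rg 0 -cap 657 -sp a

One LUN per RAID group (or a few larger LUNs striped in-array) keeps the queuing at the array where it belongs, rather than scattering 36 tiny host-striped LUNs across both SPs.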

2. Workload Generator (WG) hosts

At first glance, the WG hosts seem to be identical IBM x3650 servers; however, NetApp chose to give themselves a competitive advantage by using a workload generator that is considerably better spec'd, mostly around the bus.

For ease of viewing, I’ve circled the offending areas in red:

NetApp FAS3040:

The IBM x3650 used for the NetApp WG is PCIe based (page 16: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf). PCIe:

· is full-duplex: it CAN transmit and receive at the same time

· is a serial, point-to-point interface

· has a line speed of 2GB/s per 4 lanes (of 32 PCIe lanes)

· connects directly to the north bridge and CPU

· gives two PCIe HBAs 2GB/s each, for a total of 4GB/s (8 PCIe lanes, 4 lanes each of the 32)

EMC:

The IBM x3650 used for the EMC WG is PCI-X 133MHz based (page 15: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf). PCI-X:

· is half-duplex: it CANNOT transmit and receive at the same time

· is a parallel bus that relies on arbitration and scheduling, and shares the bus bandwidth

· has a line speed of 1GB/s

· is channelled through multiple chipsets and bridges before reaching the north bridge and CPU

· leaves two PCI-X HBAs with only a 1GB/s bus to share

As you can see, the two IBM x3650 servers are different, even though the EMC WG server had more memory and a faster CPU (I can't speak to the CPU architectures, as they're not listed).

The WG host bus given to the EMC WG was:

· Slower

· Parallel

· Bandwidth Limited

· Higher Latency

· And half-duplex

Anyone with a knowledge of networking will understand the implications of full versus half duplex; and serial vs. parallel is much the same jump in performance as SATA over PATA.

NetApp speak regularly on the benefits of PCIe over PCI-X:

http://partners.netapp.com/go/techontap/fas6070.html

And I quote:

“We have also changed the system interface on NVRAM to PCIe (PCI Express). This eliminates potential bottlenecks that the older PCI-X based slots might introduce.”

Howard: “PCIe was designed to overcome bandwidth limitation issues with earlier PCI and PCI-X expansion slots.”

Naresh: “100 MHz PCI-X slots are 0.8GB/s peak, and x8 PCIe slots are 4GB/s. PCI-X slots at 100 MHz could be shared between two slots, so a couple of fast HBAs could become limited by the PCI-X bandwidth.”

Tom: “In addition to increased bandwidth, PCIe provides improved RAS features. For example, instead of a shared bus, each link is point to point.”

It seems NetApp used the inferior Workload Generator (WG) for the EMC and the superior WG for NetApp.

Why did they not use the same host? I can only imagine it was to increase the total service time measured against the EMC, possibly doubling the response time!

3. HBAs

Again, at first glance it would seem NetApp used the same QLogic HBAs for both tests; but as highlighted above, the two hosts were different: one PCIe, the other PCI-X.

The same applies to the HBAs: NetApp used the faster, unrestricted HBAs for their own configuration and the slower, restricted HBAs for the EMC configuration:

NetApp FAS3040:

The HBA given to the NetApp is the QLogic QLE2462 (page 16: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf), which is:

· PCIe, with all the PCIe advantages listed in section 2

EMC CX3-40:

The HBA given to the EMC is the QLogic QLA2462 (page 15: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf), which is:

· PCI-X, with all the PCI-X limitations listed in section 2

· a 266MHz card, but limited to 133MHz by the host

It's important to note that 1GB/s is the total PCI-X bus peak speed, which is easily swamped by a single 2-port 4Gb/s HBA, let alone two of them (as per the BoM and config): 2GB/s of potential demand across both cards, with only 1GB/s of bus available.

PCIe, by contrast, has a maximum throughput of 8GB/s, meaning two PCIe x4 HBAs drawing 2GB/s would be using only a quarter of the available host bus bandwidth.
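To put rough numbers on that, here's a quick back-of-the-envelope sketch in Python; it's my own arithmetic using the nominal peak figures above, not anything measured in the disclosures:

fc_demand_gbs = 2 * 2 * (4 / 8)   # two dual-port 4Gb/s HBAs ~= 2.0 GB/s of potential FC traffic
pcix_bus_gbs = 1.0                # one shared, half-duplex PCI-X 133MHz bus in the EMC WG host
pcie_bus_gbs = 2 * 2.0            # two x4 PCIe slots at ~2GB/s each in the NetApp WG host
print(fc_demand_gbs, pcix_bus_gbs, pcie_bus_gbs)   # 2.0 1.0 4.0

In other words, the EMC workload generator host is bus-bound before the array ever sees an IO, while the NetApp host has twice the headroom it needs.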

To give half/full duplex and serial/parallel some context, imagine two office buildings, each with 10 floors:

· Building A has 1 elevator (half-duplex / parallel, like PCI-X)

o If a person on the ground floor wants to go to level 10, he has to wait until the lift arrives at the ground floor before he can travel.

o If another person on the ground floor wants to travel to level 5, he has to wait until the lift has completed its travel to the 10th floor and returned.

· Building B has 32 elevators (full-duplex / serial, like PCIe)

o If a person on the ground floor wants to go to level 10, an elevator is already at the ground floor, ready to go.

o If another person on the ground floor wants to travel to level 5, an elevator is already at the ground floor ready to go, or another 30 are ready or will be there shortly.

Clearly, again, NetApp engineered a superior WG host for themselves and an inferior one for EMC.

4. Array Configurations and BoM

Here are the BoMs for the NetApp and EMC arrays:

NetApp FAS3040: page 14 of http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf
EMC CX3-40: page 13 of http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

Now here are the interesting titbits, let’s take a look at what I’ve highlighted and why:

Firstly, costs:

– EMC: NetApp included the HBAs, switches and multi-pathing as part of the EMC array costs:

1 PP-WN-WG – PPATH WINDOWS WGR EA $1,440 0% $1,440 see attached third party quotation

2 QLA2462-E-SP – 2 PORT 4GB PCI-X EA $1,700 0% $3,400 see attached third party quotation

2 Brocade 16-Port 200e FC Full Fab Switch,-C,R5 EA $8,700 0% $17,400 Network Appliance, Inc.

2 BSWITCH-16PORT-R5 HW Support,Premium,4hr,y mths:36 EA $1,697 0% $3,393 Network Appliance, Inc.

2 BSWITCH-16PORT-R5 SW Subs,Premium,4hr,y mths:36 EA $0 0% $0 Network Appliance, Inc.”

– NetApp: NetApp listed the HBAs, switches and multi-pathing as separate add-on costs:

Host Attach Hardware and Software

SW-DSM-MPIO-WINDOWS 1 $0.00 0 $0.00 $0.00

X6518A-R6 Cable,Optical,LC/LC,5M,R6 4 $150.00 0 $150.00 $600.00

X1089A-R6 HBA,QLogic QLE2462,2-Port,4Gb,PCI-e,R6 2 $2,615.00 0 $2,615.00 $5,230.00

SW-DSM-MPIO-WIN Software,Data ONTAP DSM for Windows MPIO 1 $1,000.00 0 $1,000.00 $1,000.00”

Not a big deal in itself, as the Tested Storage Configuration price incorporates everything either way.

– EMC: NetApp included professional services in the EMC costs:

1 PS-BAS-PP1 – POWERPATH 1HOST QS EA $1,330 0% $1,330 see attached third party quotation

1 PS-BAS-PMBLK – POWERPATH 1HOST QS EA $1,970 0% $1,970 see attached third party quotation

My side note: who the hell needs professional services to install PowerPath? And for that matter, who needs a project management block for one host? (I hope they got their money's worth!)

– NetApp: no professional services costs were included.

No wonder the EMC came out more expensive: they put in services that nobody needs and bundled the HBAs, switching and multi-pathing into the array costs, but didn't do the same for the NetApp. Sneaky!

Secondly, Cabling:

– EMC: NetApp included 4 x 8 meter HSSDC2 cables for connection from the array to the first Disk Shelf of each bus with 1m cables from then on:

4 FC2-HSSDC-8M – 8M HSSDC2 to HSSDC2 bus cbl EA $600 0% $2,400 see attached third party quotation

(Added costs in using 8m cables? Yup.)

– NetApp: NetApp included 16 x 0.5 meter HSSDC2 cables for connection from the array to the first disk shelf of each bus and 0.5 meter from then on:

X6530-R6-C Cable,Patch,FC SFP to SFP,0.5M,-C,R6 16 $0.00 0 $0.00 $0.00

Now, this might not seem like a big deal, but 8m cables are reserved for very awkward scenarios, such as having to stretch across many racks to join shelves to the array; they are never used in latency-sensitive scenarios, and here's why:

Fibre and copper have similar latencies of 5ns per meter.

For an 8m cable, that translates to 80ns round-trip (the EMC config),

Whereas;

For a 0.5m cable, it's 5ns round-trip (2.5ns each way) (the NetApp config).

Extend that to a mirrored system with two buses and that's 160ns round-trip; then add every metre and enclosure after that (up to 0.005ms port-to-port).

Now I want to state again, EMC never use 8m cables except in extreme circumstances and never when low latency is needed!

It's clear NetApp engineered the EMC to have as slow a back-end bus as possible compared to the NetApp! The arithmetic, if you want to check it, is sketched below.
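Here it is as a tiny Python sketch, using the 5ns-per-metre rule of thumb quoted above:

ns_per_metre = 5
emc_first_hop_ns = 2 * 8 * ns_per_metre       # 8m HSSDC2 cable, round trip = 80ns
netapp_first_hop_ns = 2 * 0.5 * ns_per_metre  # 0.5m cable, round trip = 5ns
print(emc_first_hop_ns, netapp_first_hop_ns, 2 * emc_first_hop_ns)   # 80 5.0 160 (ns, mirrored over both buses)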

Thirdly, Bus Layout:

I make no representations about the correctness of the NetApp bus layout; it is a little over the top, and NetApp configured far more point-to-point connections to the disk shelves than were needed (which would help boost their performance), but it is a configuration I have seen more than once.

EMC: in the diagram on page 14 of http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf:

You can clearly see that NetApp show that they use only 11 DAE’s for the Configuration:

Left (bus 0):

1 x Vault Pack with 5 disks

5 x DAE with 75 disks

Right (bus 1):

5 x DAE with 74 disks

But when I look at the RAID Group configuration I see that they use all 12 DAE’s from the BoM, different from the stated configuration:

1 V-CX4014615K – VAULT PACK CX3-40 146GB 15K 4GB DRIVES QTY 5 EA $8,225 0% $8,225 see attached third party quotation

+

11 CX-4PDAE-FD – 4G DAE FIELD INSTALL EA $5,900 0% $64,900 see attached third party quotation

That makes 12 DAE’s – Who cares? You’ll see!

If we look at the configuration scripts used on Page 60: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

Create Raid Groups:

We see that the first raid group (RG0) Mirror Primary starts at 0_1_0 and the Mirror Secondary starts at 1_3_0

(x_x_x is Bus_Enclosure_Device/Disk):

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 0 0_1_0 1_3_0 0_1_1 1_3_1 0_1_2 1_3_2 0_1_3 1_3_3 0_1_4 1_3_4 0_1_5 1_3_5

And as we go further down, we see the last raid group (RG11) Mirror Primary starts at 0_4_12 and the Mirror Secondary starts at 1_6_12

Then extends into Bus 1 Enclosure 7

(x_x_x is bus_Enclosure_Device/Disk):

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 11 0_4_12 1_6_12 0_4_13 1_6_13 0_5_12 1_7_12 0_5_13 1_7_13 0_1_14 1_3_14 0_2_14 1_4_14

Now I first read that as a typo, why would 0_1_0 be mirrored to 1_3_0 and not 1_0_0 or 1_1_0?

This is what the layout of the Clariion that NetApp set up looked like:

The black borders represent where the configuration should have been, and the coloured cells represent where the RAID groups were actually configured, with colours matching each mirrored pair according to the full disclosure.

Hang on a minute: what happened to the three shelves before those on bus 1?

(Enclosure numbering on each bus starts at 0 and continues to 7.)

Well, with 12 DAE’s (15 slots per DAE) there are a total of 180 drive slots.

155 disks in total were purchased (150 + 5 in Vault pack)

5 of which are taken by Flare/Vault

There are 12 x 12-disk RAID 1/0 RAID groups, so 144 disks are used to present capacity.

There is no mention of how many hot spares were used (only a combined total of OE and spare capacity); best practice is generally one spare per 30 drives, and the quick tally below shows at most six drives were left over.
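Here's that tally, from the BoM quantities above (my own arithmetic):

slots = 12 * 15            # 12 DAEs at 15 slots each = 180 slots
disks = 150 + 5            # 150 drives plus the 5-disk vault pack
vault = 5                  # Flare/vault drives
in_raid_groups = 12 * 12   # twelve 12-disk RAID 1/0 groups
print(slots, disks, disks - vault - in_raid_groups)   # 180 155 6 -> at most six drives left for hot spares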

This is what it should have looked like (forgetting for a minute the fact that NetApp laid it out poorly), had enclosures not been skipped:

By placing the mirror partner further down the chain, you increase the latency to reach the paired disk, drastically increasing the service time.

There is no reason to do so other than to engineer slowness! No one in their right mind would do so!

NetApp engineered the EMC to have a slow back end! A sane pairing would have looked something like the command sketched below.
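For illustration only (the syntax is the disclosure's own; the pairing is my hypothetical, not anything NetApp ran), a sane mirrored layout keeps each primary and its secondary in the same enclosure position on the opposite bus:

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 0 0_1_0 1_1_0 0_1_1 1_1_1 0_1_2 1_1_2 0_1_3 1_1_3 0_1_4 1_1_4 0_1_5 1_1_5

Same command, same number of disks; the only difference is that the secondaries sit in 1_1_x rather than three enclosures further down the loop.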

5. RAID Group and LUN Configuration

When it came to the RAID group and LUN layout, things got even worse:

NetApp: NetApp short-stroked, creating one very large aggregate per controller (pages 62/63: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf).

The following shows how the aggregate on each controller of the NetApp FAS3040 was configured; to get the total capacity presented and unallocated/unused, multiply this configuration by two, once per controller. Please note: some of the numbers are estimated, as NetApp uses a mix of base-2 and base-10 when presenting figures and not all numbers were disclosed; I have, however, calculated them as accurately as I can, to within about 5%.

The configuration, identical on controller 1 and controller 2:

· create the aggregate with the following configuration: aggr0 settings, 4 rgs, rg sizes (1×18 + 3×17), 1 spare

· aggr0 options: nosnap=on

· set snap reserve = 0 on aggregate aggr0

· set snap sched to 0 0 0 on the aggregate aggr0

· spc1 data flexible volume (vol1):

§ create vol1 of size 8493820 MB

· set volume options on vol1:

o nosnap=on

o nosnapdir=off

· set snap reserve = 0 on vol1

· set snap sched to 0 0 0 on vol1

· set space reservation (guarantee) to “none”

Create zeroed luns with no space reservation on each NetApp controller with the following sizes and then map them to the windows igroup created earlier assigning each lun a unique lun id.

o 6 lun files for ASU1 of size 450100 MB each

o 6 lun files for ASU2 of size 450100 MB each

o 6 lun files for ASU3 of size 100022 MB each

Essentially it shows that, of the ~19TB available from the RAID groups (after RAID-DP capacity loss), vol1 had ~2.4TB unused/unallocated, and the aggregate aggr0 (the collective of disks) had a total of ~3.9TB unallocated/unused; almost ~35% of the per-controller disk capacity (after DP losses) was left as whitespace. The vol1 figure is easy to check, as below.
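A couple of lines of Python, using the MB sizes exactly as they appear in the disclosure (per controller):

vol1_mb = 8493820
lun_files_mb = 6 * 450100 + 6 * 450100 + 6 * 100022   # ASU1 + ASU2 + ASU3 lun files
print(lun_files_mb, vol1_mb - lun_files_mb)           # 6001332 MB presented, 2492488 MB (~2.4TB) of vol1 never allocated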
EMC: NetApp deliberately long-stroked the EMC disks (page 60: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf).

It also seems NetApp limited the performance of each EMC RAID group by using only 12 disks per group, giving basically the performance of 6 mirrored spindles. E.g.:

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 11 0_4_12 1_6_12 0_4_13 1_6_13 0_5_12 1_7_12 0_5_13 1_7_13 0_1_14 1_3_14 0_2_14 1_4_14

They then long-stroked each RAID group by breaking it up into 3 LUNs per RG, for a total of 36 LUNs. E.g.:

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 0 -rg 0 -cap 296 -sp a

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 1 -rg 0 -cap 296 -sp a

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 2 -rg 0 -cap 65 -sp a

Now, typically we (the storage industry) quote a 15k spindle as being capable of ~180 IOPS on average, but we know it's more like ~250 IOPS at the outer edge of the disk and ~100 IOPS at the inner edge, so the average is about 180 end to end.

NetApp engineered the EMC to utilise all but 23GB of each disk when presented to the host > volume > ASU, essentially using almost all of the capacity of the RAID 1/0 RAID groups; the only reason for this is to make sure the EMC worked almost the full length of the disks, i.e. to LONG STROKE it.

There is no conceivable reason for the EMC to have so many small RAID groups with little LUNs in them; why not just present one LUN from each RAID 1/0 RAID group, or heck, even stripe inside the array?

NetApp, meanwhile, created an aggregate many times larger yet only provisioned ~65% of the capacity of the aggregates and the RAID groups under them, meaning NetApp SHORT-STROKED their own array!

NetApp per-disk distribution: ~65.0% disk utilisation (short/mid stroke); free capacity 46.53 GB.

EMC per-disk distribution: ~82.3% disk utilisation (long stroke); free capacity 23.24 GB.

Note: to even out the diagram and aid simplicity, I have standardised the two at 133GB, which is a balanced breakdown of the two configurations; RAID group striping and other layout methods will lay data out slightly differently, but the result is the same. (NetApp puts usable capacity with no reserve at 133.2GB and EMC at 133.1GB for a 146/144GB drive.) Please also note: QD1 IOPS varies with disk manufacturer, density, platters, spindle diameter and so on.
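The per-disk utilisation figures fall straight out of the free-capacity numbers; this is my own arithmetic on the figures above (the tiny difference from the 82.3% quoted is just rounding):

netapp_usable, netapp_free = 133.2, 46.53
emc_usable, emc_free = 133.1, 23.24
print(round(100 * (1 - netapp_free / netapp_usable), 1))   # 65.1 -> ~65% of each NetApp disk in use
print(round(100 * (1 - emc_free / emc_usable), 1))         # 82.5 -> ~82% of each EMC disk in use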

Examining the SPC-1 specification reveals the following:

http://www.storageperformance.org/specs/SPC-1_v1.11.pdf

2.6.8 SPC-1 defines three ASUs:

– The Data Store (ASU-1) holds raw incoming data for the application system. As the application system processes the data it may temporarily remain in the data store, be transferred to the user store, or be deleted. The workload profile for the Data Store is defined in Clause 3.5.1. ASU-1 will hold 45.0% (+-0.5%) of the total ASU Capacity.

– The User Store (ASU-2) holds information processed by the application system and is stored in a self-consistent, secure, and organized state. The information is principally obtained from the data store, but may also consist of information created by the application or its users in the course of processing. Its workload profile for the User Store is defined in Clause 3.5.2. ASU-2 will hold 45.0% (+-0.5%) of the total ASU Capacity.

– The Log (ASU-3) contains files written by the application system for the purpose of protecting the integrity of data and information the application system maintains in the Data and User stores. The workload profile for the Log is sequential and is defined in Clause 3.5.3. ASU-3 will hold 10.0% (+-0.5%) of the total ASU Capacity.

So, that’s:

o 45.0% for ASU1

o 45.0% for ASU2

o 10.0% for ASU3

Spreading the benchmark over almost the entire length of the disk for the EMC CX3-40, yet just over half-way for the NetApp, of course gives NetApp the advantage and disadvantages the EMC in both IOPS and latency.
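For what it's worth, both submissions do honour the 45/45/10 split; the trick was in how much of each disk sat behind those ASUs, not the split itself. A quick check, using the NetApp lun-file sizes (MB) and the EMC LUN binds (GB) from the disclosures:

netapp = (6 * 450100, 6 * 450100, 6 * 100022)
emc = (12 * 296, 12 * 296, 12 * 65)
for asu1, asu2, asu3 in (netapp, emc):
    total = asu1 + asu2 + asu3
    print([round(100 * x / total, 1) for x in (asu1, asu2, asu3)])
# NetApp: [45.0, 45.0, 10.0]   EMC: [45.1, 45.1, 9.9]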

6. Workload differences:

Another interesting dynamic is that NetApp ran the workload generator differently for the FAS3040 and the CX3-40.

We already know by now that the test is completely invalid, but what would the result have been if NetApp had made the tests the same?

NetApp:

Page 70 – http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf :

APPENDIX D: SPC-1 WORKLOAD GENERATOR STORAGE COMMANDS AND PARAMETERS

The content of SPC-1 Workload Generator command and parameter file, used in this benchmark, is listed below.

javaparms=”-Xmx1024m -Xms512m -Xss256k”

sd=asu1_1,lun=\\.\E:

sd=asu2_1,lun=\\.\F:

sd=asu3_1,lun=\\.\G:

EMC:

APPENDIX D: SPC-1 WORKLOAD GENERATOR STORAGE COMMANDS AND PARAMETERS

The content of SPC-1 Workload Generator command and parameter file, used in this benchmark, is listed below.

javaparms=”-Xmx512m”

sd=asu1_1,lun=\\.\F:

sd=asu2_1,lun=\\.\H:

sd=asu3_1,lun=\\.\J:

Page 29 of the SPC-1 spec:

The storage for the SPC-1 workload consists of three Application Storage Units:

· ASU 1 – Data Store

· ASU 2 – User Store

· ASU 3 – Log/Sequential Write

7. Other differences and issues:

The following is a selection of other issues and differences I found in the full disclosure documents and elsewhere that I found interesting but did not feel needed to be addressed in as much detail as the rest. Maybe I will spend more time on them later.

NetApp:

On the NetApp system, we found we could improve performance by changing the memory management policy to reflect the fact that most SPC-1 data is not referenced repeatedly. This policy change can be implemented with the following priority settings with Data ONTAP® 7.3:

priority on

priority set enabled_components=cache

priority set volume <volume-name> cache=reuse

The net effect of these commands is to tell the memory system to reuse memory for newer items more aggressively than it would normally. (The enabled_components subcommand is new in Data ONTAP 7.3. If you are using Data ONTAP 7.2 you can skip that command.)

A couple of the things we tuned are still being refined, so they are enabled by the setflag command. In future versions of Data ONTAP either these flags will become options or they will disappear as the system becomes self-tuning for these features.

priv set diag
setflag wafl_downgrade_target 0
setflag wafl_optimize_write_once 0

The “downgrade_target” command changes the priority of a process within Data ONTAP that handles incoming SCSI requests. This process is used by both FC SAN and iSCSI. If your system is not also running NAS workloads, then this priority shift improves response time.

We’re explicitly calling out these settings because, based on our testing, we think they will yield performance benefits for online business application workloads. If you are interested, you can read more about them in a recent NetApp technical report.

System Flags:

o wafl_optimize_write_once: change default value of 1 to 0. This flag affects the initial layout of data within a newly created aggregate. The default data layout favors applications which do not overwrite their data.

o wafl_downgrade_target: change default value of 1 to 0. This flag changes the runtime priority of the process that handles the SCSI protocol for incoming Fibre-Channel requests. For storage systems that are not also servicing NAS requests this change to the process priority is recommended.

Let me read those again:

  •  “The default data layout favors applications which do not overwrite their data.”
  •  So that means this flag does not optimise the array layout for normal workloads with limited overwrites, like file shares, Exchange and SQL (without archiving, of course), but for something closer to HPC data sets. This seems to be compounded by the next flag:
  •  “For storage systems that are not also servicing NAS requests this change to the process priority is recommended.”
  •  Reword that slightly and it says: if you are going to use this as a SAN-only array (no file), then setting this flag will ensure you get decent SAN performance, as you won't be juggling CPU and memory with those functions.

· EMC doesn't have this issue, because they build a platform with each component (file, block) optimised for its function, rather than building a general-purpose array that needs to be tuned for specific tasks.

Interestingly, Stephen Daniel was also the author of the NetApp TR whitepaper “Configuring and Tuning NetApp Storage Systems for High-Performance Random-Access Workloads”:

http://media.netapp.com/documents/tr-3647.pdf

Where he wrote:

“4. Final Remarks This paper provides a number of tips and techniques for configuring NetApp systems for high performance. Most of these techniques are straightforward and well known. Using special flags to tune performance represents a benchmark-oriented compromise on our part. These flags can be used to deliver performance improvements to customers whose understanding of their workload ensures that they will use them appropriately during both the testing and deployment phases of NetApp FAS arrays. Future versions of Data ONTAP will be more self-tuning, so the flags will no longer be required.”

Does NetApp consider it normal to be asked to set a parameter that elicits the following response?

“Warning: These diagnostic commands are for use by NetWork Appliance personnel only”.

Clearly not.

Is this not in direct contravention of the Storage Performance Council's SPC-1 specification terms and conditions?

Page 13 – http://www.storageperformance.org/specs/SPC-1_v1.11.pdf :

0.2 General Guidelines

The purpose of SPC benchmarks is to provide objective, relevant, and verifiable data to purchasers of I/O subsystems. To that end, SPC specifications require that benchmark tests be implemented with system platforms and products that:

1. Are generally available to users.

2. A significant percentage of the users in the target market segment (server class systems) would implement.

3. Are relevant to the market segment that SPC-1 benchmark represents.

In addition, all SPC benchmark results are required to be sponsored by a distinctly identifiable entity, which is referred to as the Test Sponsor. The Test Sponsor is responsible for the submission of all required SPC benchmark results and materials. The Test Sponsor is responsible for the completeness, accuracy, and authenticity of those submitted results and materials as attested to in the required Letter of Good Faith (see Appendix D). A Test Sponsor is not required to be a SPC member and may be an individual, company, or organization.

The use of new systems, products, technologies (hardware or software) and pricing is encouraged so long as they meet the requirements above. Specifically prohibited are benchmark systems, products, pricing (hereafter referred to as “implementations”) whose primary purpose is performance optimization of SPC benchmark results without any corresponding applicability to real-world applications and environments. In other words, all “benchmark specials,” implementations that improve benchmark results but not general, realworld performance are prohibited.

EMC:

CX3-40 Storage System

The following changes must be made on the CX3-40 storage system:

o Disable write caching on all underlying LUNs used for ASU1 and ASU2. Do not change the default setting of read/write caching for the ASU3 LUNs.

o Set the read policy: low water mark is 30%, high water mark is 50%.

o Set the read caches to 1716 and the write cache to 1300 MB.

Why were the low and high water marks set so low?

Why not present the cache testing? I have no doubt it performed poorly, but that would be a consequence of the configuration rather than an honest result.

Closing notes:

Personally, I find what NetApp did here beyond reprehensible: disgusting and absurd.

Since March 2009, I have had no regard whatsoever for NetApp's claims of performance, nor for their indignation when questioned or challenged.

Whilst I very much like NetApp's products, I have absolutely no faith in NetApp as a company.

I have no trust in NetApp's claims about their own performance or functionality, and I will not accept them until I see them for myself, because NetApp have constantly tried to clamber their way upwards through deceit.

Additionally, due to NetApp, I have no belief that the Storage Performance Council has any merit.

How can the SPC have any credibility when they allow an array vendor to directly manipulate results? Why are there no standards for the testing hardware and software? Why is there no SPC scrutiny of sponsored competitive tests, and for that matter, why is there no scrutiny of sponsors' own tests and hardware?

To give some analogies here to what NetApp have done, it’s like:

o Golf club maker “A” claiming their iron is better than golf club maker “B”'s, but 'proving' it by hitting a new ball with theirs, off a tee on the fairway with a tailwind, and an old ball with “B”'s, from the rough into a headwind.

o Motorcycle maker “A” testing a similarly spec'd “B” bike by putting a 60kg/165cm rider on bike “A” and a 120kg/190cm rider on bike “B”.

You get the idea; not only did NetApp change the variables to suit themselves, but they also modified their own array to run well, whilst configuring the EMC array to perform poorly.

So, how would I have laid it out differently? – I’ll address that at another time.

Rest assured, it would have been very different – one of the many right ways.

Now, NetApp may claim this is a normal environment, yet every best practice guide advises against it, and I have never (and nor have any of my colleagues, over a collectively very long span of experience) seen a Clariion laid out and configured in such a manner.

If yours even closely resembles 10% of this atrocity, then it is time for a new reseller/integrator or to send your admin on a training course.

Other than that, NetApp should be completely and utterly ashamed of themselves!

That's it for now; I hope you made it to the end and didn't succumb to boredom-related mortality.

Aus Storage Guy

“Keeping the bastards honest”


What is FUD?

As my first real post, I want to introduce you to a term I'm sure you know well: FUD.

What is FUD?

To quote the great Wikipedia: http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt

Fear, uncertainty and doubt, frequently abbreviated as FUD, is a tactic used in sales, marketing, public relations,[1][2]politics and propaganda. FUD is generally a strategic attempt to influence public perception by disseminating negative and dubious/false information designed to undermine the credibility of their beliefs.[who?] An individual firm, for example, might use FUD to invite unfavorable opinions and speculation about a competitor’s product; to increase the general estimation of switching costs among current customers; or to maintain leverage over a current business partner who could potentially become a rival.

The term originated to describe disinformation tactics in the computer hardware industry and has since been used more broadly.[3] FUD is a manifestation of the appeal to fear.”

Fear – that everything will go horribly wrong if you use the other guy's product.

Uncertainty – that a competitor's product may not be capable of meeting your requirements.

Doubt – that you're possibly making the wrong choice if you go with the other vendor.

A competitive “Land mine” if you will.

FUD is often a misguided attempt by a vendor to unfairly sway you towards their products and away from their competition – Almost every vendor uses FUD to unfairly target their competition and amazingly, almost every vendor is the first to cry foul when a competitor uses FUD against them.

Once in a while, FUD will contain a sliver of truth, but the reality is that most FUD is developed from not understanding the competitor's product, twisting a minor infraction, or outright lying.

Most FUD should be taken with an entire salt mine, not just a grain.

Most vendors using FUD either don’t know what they’re talking about or they’re plain telling lies – either way; you will always be best served doing your own research.

I want to give you an example – I work as a multi-vendor storage integrator and on a daily basis work with many vendors and their products, not just talking the talk, but walking the walk; I help sell it (without FUD) and I put it in.

Now, every once in a while, a vendor will come and give a presentation about the virtues of their product; more often than not, they won't be able to help themselves and they'll proclaim that Vendors B, C, D and E's products are rubbish. This normally happens with the new guy who hasn't met me yet.

The trouble is, most of the time I know that product intimately, and I proceed to explain to them not to use FUD with me; they'll of course claim it's not FUD, to which I ask: “What is your personal experience with said product?” Almost every single time the response has been a very solemn: “None, but our internal competitive analysis program told us that.”

Now, I don’t intend to be mean to these people, but I don’t appreciate being lied to; I feel they have no respect for themselves, me or their competition; a major rule of business is always respect your competition.

I consider myself a fair kind of person, so I will typically sit down with the vendor's representative and show them that what they're espousing is incorrect, and demonstrate that what their company has told them has limited or no factual basis; most of them appreciate it and cease using the FUD their employer has given them.

Unfortunately, some do not cease and continue to use the exact same FUD, sometimes even blatantly in front of me.

For one particular vendor, this went horribly wrong one day:

It was a pitch to a new customer and it was going well: they liked the features, they liked the value, and it fit the customer's budget.

Then the vendor's account manager stood up and started to abuse the competitor, his pre-sales engineer proceeded to back him up, and the customer stood up with a raised hand and said:

“Stop there; we have 4 of <competitive vendor>’s products here, I managed it for 2 years before I became the IT manager, my guy’s here manage it now, and the three of us know it well – You are liars and I have no more time for you”

(I paraphrase as the exact words elude me)

That was the end of the meeting: deal lost, customer lost, and potentially many more like it. I felt ashamed by association but couldn't find the words; I knew the customer was right and wanted to agree with him, but I was so ashamed, angry and full of all sorts of emotions that I could only walk away in disgust.

I was working for another reseller at the time; this was my first dealing with this particular account manager, but not my first with this vendor. Before the meeting, I had briefed the AM and pre-sales engineer about the customer and warned them not to use FUD.

We often hear the phrase: “There are Lies, Damn Lies and then there's statistics.”

Guess what: among other things, the account manager used benchmark statistics to claim the competitor's product wouldn't do X IOPS. The trouble is, the customer was achieving many times more than that; they'd put it in their RFI, for goodness' sake. It read something like: “our current production SAN environment is operating at “Y” IOPS avg.; any offerings must be capable of meeting or exceeding this capability”.

Vendors will use performance benchmark comparisons as proof of their superiority; usually they will deliberately engineer the competition's product to be slower than their own and claim it was fair and accurate, that this is simply how limited the competitor's product is.

In my experience it is never even remotely accurate – It’s a despicable practice

I plan on showing how these vendors use this kind of tactic in my next post. It’s going to be a stinker!

That's all for now; but as a parting recommendation to any customers out there reading this:

  • If a vendor starts claiming that a competitor's product is inferior, do your own research: ask the competitor to take you to their lab, demonstrate and benchmark for yourself, and then buy whichever product suits your needs.
  • If a vendor shows you competitive statistics, ask whether they were the sponsor of the test; if so, completely disregard their stats and ask for a proof of concept from both sides to show performance, then choose whichever meets your needs. (I don't need to tell you this, I'm sure.)

Good luck out there, fight the good fight.

Aus Storage Guy.

Introducing Aus Storage Guy

The first post should always be an introduction, right?

So to quote the great Austin Powers: Allow myself to introduce….. myself.

I have been in the IT industry for over 18 years, still quite fresh by many standards, and I've long since realised that there is still so much to learn. I'm discovering new and interesting things every day.

My day job is as a storage integrator in pre-sales, storage architect, implementer and problem solver, working end-to-end to address my customers' often complex needs.

Over the years, I've been privileged to work in, on and around the IT storage industry, which as a niche is very complex, fraught and interesting; it's often a completely misunderstood and undervalued segment of IT, but the reality is that of all the things a company could lose from its IT infrastructure, data is irreplaceable. That's why working in data storage is a challenge worthy of both the rewards and the difficulties.

My experience in storage goes back to my early days: I entered the industry towards the end of the mainframe/midrange/mini domination, learning the ropes on big tin before moving over to the decentralised model adopted later in the '90s, worked feverishly during the late '90s on the year 2000 bug, and watched with amusement the bursting of the tech bubble. And I shook my head at the detractors who said the Y2K bug was overstated:

I was involved in a great number of simulations of financial systems prior to Y2K which showed disastrous consequences.
(Lots of money would have been lost if not corrected. Japanese power plants were hit, as were satellites; someone was charged US$91,250 for having a video that was 100 years overdue; the first child born in Denmark on New Year's Day was registered as being 100 years old at birth; a German man reported he was credited $6m on 30 December 1899; Telecom Italia sent out bills dated 1900.) Yes, they really happened.

My experience in mass data storage goes back to those heady days of mainframes and minis, configuring high-performance (for the time) storage for said mainframes and minis; then, with the great decentralisation of compute during the latter half of the '90s, I carried this experience over to those NT servers everyone had been adopting, when building storage arrays from scratch was the order of the day; and now I'm back in the world of centralisation with VMware and the like.
(It’s always funny seeing the complete cycle.)

However, I've been very fortunate to have been able to focus almost exclusively on data storage for the last decade, as it has presented many great opportunities to delve deeper into the very concepts I formed and broke in my mainframe days, and I discover new and exciting information every day.

My training and experience are, luckily enough, multi-vendor, which has given me a very broad perspective on many of the vendors, their arrays and their practices, and affords me great insight into the correct selection of an array for its purpose and, for that matter, for my customers' needs.

Which brings me to my storage experience and knowledge, which has been garnered and refined over the years (in no order of preference or strength):

Netapp

  • FAS
  • E-Series (previously LSI Engenio)

EMC

  • Symmetrix / DMX / VMAX
  • HADA / Clariion / Celerra / VNX
  • Centera
  • Avamar
  • Data Domain
  • and Networker

HP

  • VA / EVA
  • XP (HDS OEM)
  • 3par
  • Data Protector

HDS

  • Thunder / Lightning
  • AMS
  • USP / VSP

Dell

  • Compellent
  • Equalogic

IBM

  • N-Series (NetApp OEM)
  • DS (LSI Engenio OEM)
  • XIV

Brocade / McData

Cisco MDS

TMS RamSan

And a whole host of others – too many to list them all.

As a rule, I despise vendor FUD; because of my multi-vendor experience, it irritates me to no end when vendors use it, and one of my goals with this blog is to dispel vendor FUD once and for all.
It will be detailed, but I assure you, I will do my utmost to ensure that it’s accurate and founded.

My intention is not to cause angst, only to reveal the truth and anytime I’ve got it wrong, I will be glad to correct myself.

The other goal is to start a series on the virtues of enterprise storage, hoping to educate on its importance and differences and to aid my readers in making the most informed decisions possible.

I actively encourage constructive feedback and comments and if there’s something about data storage you want to understand, please feel free to ask.

That’s it for the moment, I hope to bring you more as time allows.

Aus Storage Guy.