archives

ausstorageguy

I am an IT Enterprise Storage Specialist based in Australia, with multi-vendor training and a knack for getting to the truth behind storage. I specialise in, and have previously worked for, EMC; however, I am also highly skilled and trained in:

- NetApp FAS
- NetApp E-Series (formerly LSI Engenio)
- Dell Compellent and EqualLogic
- Hitachi USP, VSP and AMS
- HP EVA, LeftHand, 3Par (and HDS OEMs)
- LSI Engenio OEMs (IBM DS and Sun StorageTek)
- TMS RamSan
- IBM XIV

As you can imagine, that's a lot of training... Thankfully, as a specialist in storage, I don't have to think about much else. I try (very hard) to leave any personal/professional attachment to any given product at the door and I have zero tolerance for FUD (Fear, Uncertainty, Doubt). So I beg all vendor commentators to leave the FUD, check the facts and let's just be real about storage. There may be some competitive analysis done on this blog, but I assure you, I will have checked, re-checked and checked again the information I present. However, should I get it wrong – which, above all else, is a much greater talent – I will correct it as quickly as possible.
ausstorageguy has written 9 posts for ausstorageguy

Why I believe you should stop worrying about SPC-1 benchmarks.

First up, I want to make it very clear that in no way am I questioning the quality of the products mentioned in this post. I have been involved in the implementation of most of them and I believe they are quality arrays.

###

FTC Disclosure: I am NOT employed by any vendor and receive no compensation from any vendor with exception of the following:

· EMC – Various USB Keys, 1 x Iomega HDD, 1 x decent rain jacket, 1 x baseball cap, several t-shirts and polos, a few business lunches and dinners (not in relation to this blog), 1 x bottle opener keyring, pens.

· NetApp – Various USB Keys, 1 x multi-tool and torch, 1 x baseball cap, several t-shirts and polos, a few business lunches and dinners (not in relation to this blog), 1 x bottle opener keyring (just like EMC's), pens, playing cards, clock.

· HDS – Various USB Keys, 1 x baseball cap, several t-shirts and polos, stress ball.

· Compellent and Dell Compellent – Various USB Keys, 1 x baseball cap, several t-shirts and polos.

· IBM – Various USB Keys, 1 x baseball cap, several t-shirts and polos, stress ball (ironic really).

· HP – Various USB Keys, 1 x baseball cap, several t-shirts and polos, brolly, stress ball.

· Most garnered as gifts or prizes at conferences.

I hold no shares in any IT vendor – although donations are graciously encouraged; however no influence shall be garnered from them.

###

Storage performance is one of the many considerations a business should weigh when deciding on an appropriate storage solution to meet its requirements, but are benchmarks such as the SPC-1 the best approach?

I have a few issues with the SPC-1 benchmark, but the biggest problem is the lack of disclosure: not so much in the "Full Disclosure" reports that accompany the results, but in the lack of information available about the benchmark process.

Steve Daniel at NetApp was quoted as saying: “We don’t see customers ask for SPC or SPEC numbers, one of the primary reasons we continue to publish SPEC SFS benchmarks is for historical reasons; we can’t just stop without raising questions.”

Personally, I think questions are good; so long as the answers are better.

In this instance, why continue with a benchmark which is near impossible to translate to a business case?

What is the relevance of the results to you and your business needs?

You see, there is little to no detail about the process: what its baseline is, or the metrics measured in the benchmark; which, as far as I'm concerned, limits the ability of businesses to discern the relevance of the benchmark to their environment.

I know these metrics, methods and details exist; however, in order to obtain them, you need to sign up as a member, the costs of which range between USD $500-6,000. And, as I understand it, once signed up you're under a Non-Disclosure Agreement and therefore cannot divulge the information garnered. (Please, someone correct me on this if I'm wrong.)

The “Full disclosure” reports available provide details about the Bill of Materials, the configuration and the results, but don’t define how they came to these results.

Put it this way: when I was many years younger and many more kilos lighter, I used to sprint an 11.02-second average.

11.02 seconds for what, exactly? Without that bit of detail, the 11.02 is irrelevant: was it 10 metres, 100 metres, 110m hurdles, swimming, or walking on my hands backwards?

For the record, it was the 100m on a track, but then what were the conditions? Was it wet, dry, windy? Did I have spikes on, and was it a grass or synthetic track, uphill, downhill or perfectly flat?

None of the conditions in the storage benchmarks are available to the average non-subscriber; so how does one determine whether the measurement and the results are relevant to them or not?

In order to understand how a benchmark may be relevant to your business, you need to know the conditions under which the results were achieved.

What is the relationship between an IOPS and latency result and the number of users and applications the storage array will support in a timely manner?

No two environments and conditions are the same.

Usain "Lightning" Bolt is an amazing man, the fastest man in the world, with an amazing 6 Olympic gold medals and a world record 9.58 seconds for the 100m sprint, which makes it look like I did my time push-starting a bus with a gammy leg. He is a real inspiration!

But could the "Lightning Bolt" – as formidable as he is – achieve the same result running through a packed shopping centre pushing a loaded trolley? He'd be quicker than you or I, I'm certain of that, but no; that's a different circumstance from a professional running track in an Olympic stadium.

That’s another issue I have with the results; there is very little reflection or comparison to a real-life environment.

Most of the results available are quite often 1-to-1 relationships between the workload generator (WG) and the storage array. Occasionally there are a few results using LPARs or vPars (partitioned servers) and/or multiple servers, but nothing in terms of server virtualisation; no VMware, no Hyper-V and no Xen clusters, or even groups of servers with workloads similar to yours.

The IOPS results from the SPC-1 benchmark may as well be flat-out, straight-line achievements, because there is no way for you to translate x number of IOPS into your environment – your VMware/Exchange/MS SQL/MySQL/Oracle/SAP/PeopleSoft and custom in-house developed environment.

Can you use the results to determine whether the array will suit Exchange 2003 or Exchange 2010, both with very different workloads?

Additionally, there are no results with remote, synchronous replication involved, none with asynchronous remote replication either for that matter, which would directly affect latency.

And where are the results employing tiering, compression and dedupe? Although there are a few results using thin provisioning (only 2-3, I think), there aren't many, even though almost all vendors support these features. I would imagine that these are the kinds of features most businesses do or would want to use in a real-life workload.

How can the SPC-1 benchmark reports represent anything akin to your environment when no two environments are alike? Even if you did implement the exact same configuration of storage, storage networking and server/s for your computing requirements and applications, the chances of achieving the same IOPS result would be next to nothing.

Credibility of the results.

How can the SPC have any credibility when they allow an array vendor to directly manipulate results? Why are there no standards for the testing hardware, OS and software? Why is there no scrutiny of sponsored competitive tests by the SPC and, for that matter, why is there no scrutiny of sponsors' own tests and hardware?

I wrote in my (rather short) piece "Dragging up ancient history – How NetApp fooled everyone (FAS3040 v. CX3-40)" about how I believe NetApp engineered the configurations of the FAS and the Clariion arrays to ensure that the NetApp FAS results looked better; various people came out of the woodwork trying to educate me on how to read a benchmark, or on how the SPC-1 benchmarks were audited. However, since the workload generators are different in each case, there is no standard against which the comparison can be made.

Different OSes and volume/file managers handle read and write workloads differently; some are more or less tolerant, some cache, some don't; some switches handle traffic differently to others; the same goes for HBAs, CPUs, memory and buses. So if the WG is different each time, then how can there ever be a credible comparison?

When writing my original post, I came across some very interesting remarks in the “Full disclosure” reports, including this:

priv set diag
setflag wafl_downgrade_target 0
setflag wafl_optimize_write_once 0

Which NetApp wrote in http://media.netapp.com/documents/tr-3647.pdf:

“4. Final Remarks

This paper provides a number of tips and techniques for configuring NetApp systems for high performance. Most of these techniques are straightforward and well known. Using special flags to tune performance represents a benchmark-oriented compromise on our part. These flags can be used to deliver performance improvements to customers whose understanding of their workload ensures that they will use them appropriately during both the testing and deployment phases of NetApp FAS arrays. Future versions of Data ONTAP will be more self-tuning, so the flags will no longer be required.”

And the Storage Performance Council wrote in their guidelines http://www.storageperformance.org/specs/SPC-1_v1.11.pdf:

0.2 General Guidelines

The purpose of SPC benchmarks is to provide objective, relevant, and verifiable data to purchasers of I/O subsystems. To that end, SPC specifications require that benchmark tests be implemented with system platforms and products that:
…
3. Are relevant to the market segment that SPC-1 benchmark represents.

In addition, all SPC benchmark results are required to be sponsored by a distinctly identifiable entity, which is referred to as the Test Sponsor. The Test Sponsor is responsible for the submission of all required SPC benchmark results and materials. The Test Sponsor is responsible for the completeness, accuracy, and authenticity of those submitted results and materials as attested to in the required Letter of Good Faith (see Appendix D). A Test Sponsor is not required to be a SPC member and may be an individual, company, or organization.

The use of new systems, products, technologies (hardware or software) and pricing is encouraged so long as they meet the requirements above. Specifically prohibited are benchmark systems, products, pricing (hereafter referred to as “implementations”) whose primary purpose is performance optimization of SPC benchmark results without any corresponding applicability to real-world applications and environments. In other words, all “benchmark specials,” implementations that improve benchmark results but not general, real-world performance, are prohibited.

I’m not intentionally picking on NetApp here, I just happened to have that example handy. Sorry NetApp folk.

How can the council maintain credibility when the system is cheated? The flags were noted in the "Full Disclosure" report, but the actual use case was not.

My issue here is not a question of the council's credibility – I can see the issue is not the auditor's fault – but the dependency on "Good Faith"; it seems common practice now for vendors to submit their configurations masked behind complex configurations or scripts, hiding the details from the common folk. Using scripts is standard practice for a field engineer installing an array; however, it makes it very difficult for the average punter to dig deeper to find the truth.

Here's another example: the HP 3Par configuration below, like the NetApp example above, obscures the little details of the benchmark configuration. This one was discovered by Nate Amsden of the blog techopsguys:

http://www.techopsguys.com/2011/10/19/linear-scalability/#comments

createcpg -t r1 -rs 120 -sdgs 120g -p -nd $nd cpgfc$nd
createvv -i $id cpgfc${nd} asu2.${j} 840g
createvlun -f asu2.${j} $((4*nd+i+121)) ${nd}${PORTS[$hba]}

Here you'll see (highlighted in red in the original) where HP have pinned the volumes to controller pairs and the LUs (LUNs) to ports on the controllers – this is not the typical configuration that HP will sell you.

In this configuration, HP are bypassing the interconnect mesh – of which HP's own description says:

“The interconnect is optimized to deliver low latency, high-bandwidth communication and data movement between Controller Nodes through dedicated, point-to-point links and a low overhead protocol which features rapid inter-node messaging and acknowledgement.”

If that’s the case – and I know it is – then why not use it?

Who uses only 55% of their configured capacity?

The SPC-1 benchmark specifies that the unused capacity be less than 45% of the Physical Storage Capacity:

Source: http://www.storageperformance.org/specs/SPC-1_v1.11.pdf
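To put that rule in concrete terms, here's a minimal sketch of the arithmetic (the 20TB figure is purely hypothetical, chosen only for illustration):

# Minimal sketch of the SPC-1 unused-capacity rule, using a hypothetical array size.
physical_capacity_tb = 20.0                 # hypothetical physical capacity of the tested array
max_unused_fraction = 0.45                  # unused storage must be less than 45% of physical capacity

max_unused_tb = physical_capacity_tb * max_unused_fraction    # up to 9.0 TB may legitimately sit idle
min_used_tb = physical_capacity_tb - max_unused_tb            # only 11.0 TB has to carry the workload

print(min_used_tb / physical_capacity_tb)                     # 0.55 - hence "only 55% of configured capacity"

In other words, a compliant submission can leave almost half the physical capacity idle and still quote the headline IOPS number.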

In the old days of short-stroking disks, this may have been relevant, but with modern storage arrays with flash/SSD caches and storage tiering, there should no longer be much of a need.

Now, maybe it's a case of my customer base and that of my colleagues as well, but in this economic climate most customers' storage arrays tend to have very high capacity utilisation; the lowest I've encountered recently was close to 70% utilisation, with the remaining capacity expected to be used in the next 3-6 months.

I’d imagine that most businesses would be the same.

The other side is that this is tested against the configured capacity, not the total potential capacity of the array available at the time of the test.

Most storage arrays' performance degrades as capacity utilisation increases, and it's not always linear: some might run in a fairly straight line, others might drop off at 90%, others not until 100% – so why not test it to the end?

Why not also benchmark the full capability of the array?

The trouble with disks is that the closer you get to the centre, the lower the IOPS; to use the entire radius of the disk end to end is full-stroking. SNIA define short-stroking as: "A technique known as short-stroking a disk limits the drive's capacity by using a subset of the available tracks, typically the drive's outer tracks."
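As a rough illustration of why a lightly filled disk flatters the IOPS numbers, here's a minimal sketch using a deliberately simplistic seek model (the track count is hypothetical, and no real drive behaves exactly like this):

# Simplistic model: the average seek distance between two uniformly random track
# positions across a span of N tracks is N/3, so short-stroking simply shrinks the span.
def avg_seek_tracks(total_tracks, used_fraction):
    return (total_tracks * used_fraction) / 3.0

total = 100_000                       # hypothetical number of tracks on the drive
print(avg_seek_tracks(total, 1.00))   # full stroke: ~33,333 tracks per average seek
print(avg_seek_tracks(total, 0.55))   # 55% used:    ~18,333 tracks per average seek
print(avg_seek_tracks(total, 0.20))   # 20% used:    ~6,667 tracks per average seek

Shorter seeks mean more IOPS per spindle, and this ignores the additional transfer-rate advantage of the outer tracks; either way, a barely filled array will always look better than a full one.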

Looking around the various submissions, I can see that there are many cases of under-utilisation of the storage arrays tested, and it makes me wonder – what would happen to the results if they used the lot and then tested it to full capacity?

Shouldn’t they be realistic configurations?

Many of the vendors submitting arrays for benchmarks state that the configurations are as bought by real customers; however, I have a problem with that.

Just because they may have been bought, it does not make such configurations a “Commonly bought configuration”.

Take a look at the configuration of this HP EVA P6500 for example:

Who exactly spends USD $130,000 on an HP EVA 6500 with only 8 x 200GB SSDs, delivering only 515GB of capacity and only achieving 20,003 IOPS? It's just not realistic.

This isn’t the only example; there are dozens of such examples just like the one above.

However, in reality, I just don’t see how these configurations are justifiable.

Hitachi VSP submitted 1st November 2011

So, what should a benchmark look like?

I would never proclaim to know all the answers; however, to begin with, it should be one which can be easily translated into real-life examples:

· MS SQL, MySQL, Oracle, etc.

· MS Exchange 2003/2010

· SAP, PeopleSoft

· VMware

· File Servers (or as a NAS)

All tested in 100-250, 251-500, 501-1,000, 1,001-4,000, 4,001-10,000 and 10,001-25,000 user configurations.

Real, translatable workloads, which a decision maker can use to determine whether not only the array vendors and products suit their needs, but the configurations as well.


How I added headphone monitoring to my Canon 5D Mark II

In the last few conferences I've attended, I was asked by a few of you how I got headphones to work with my Canon 5D Mark II when it seemed like there were no other components, so, as promised, here's a quick write-up.

Shown here with my 5D are:

CustomSLR Split Strap

Rode VideoMic Pro

Sennheiser HD280 Pro Monitoring Headphones

Fiio E6 Headphone Amplifier

I’m running Canon firmware 2.1.2 (current as of Oct 2012) and Magic Lantern v2.3.

Note: Should you follow my guide, all care, but absolutely no responsibility shall be taken. This is for educational purposes only.

 

 

Well, first up, I went and got myself a Fiio E6. It's a highly regarded, very portable headphone amplifier, measuring 41w x 40l x 9d mm (about 1.6″ square x 0.4″), weighing a paltry 16 grams (about 0.5oz) and costing only about $30/£20/€25.

This little amp is brilliant for both its size and cost; it was cheap enough for me to butcher it for my needs without the Aus Storage Wife having to worry.

But the first thing about it that didn't bode well was the clip: it's on the wrong angle, it's pretty easy to break (although Fiio do include a spare in the box) and it's not as secure as I'd like when attached to my camera:


The second thing was that the buttons and switches rattled, which, when recording with a fairly sensitive microphone, isn't what you want.

The last was that it didn't include a cable which would plug straight into the camera's AV jack.

So, here’s what I did to solve those problems:

The first task was to crack the little blighter open – a fairly easy task with the side of a razor blade, working the edges slowly; there's only a little "super glue"-like substance around the edges and it came apart without much force.

Don't get me wrong, I don't think it'd fall apart from normal use, but it didn't strain me. Cute, huh? I wonder what it wants to be when it grows up?



On the right showing the battery (golden gumstick), the PCB with mini-usb on the left and the 2 x 1/8″(3.5mm) input and output jacks.

 

Next up was to break out my trusty old soldering iron and very carefully remove the SMT 3.5mm INPUT jack (please note, I'm a check-twice kinda guy; I almost took off the OUTPUT jack, but checked twice).

You will note how close the soldering pads for the jacks are to the other components; I had to file down my smallest soldering tip to a very fine point to do it. Lucky I have spares.

Then it was time to chop up the original Canon AV cable, for which Canon were nice enough to make the cable colours Red (Right)/White (Left) audio and Yellow (Video), as seen in the picture on the right.

Note also that I left the video cable (unstripped) well alone; others I've seen have cut this short at the plug, but I may want it later, so it stays!

The Ground I twisted together and the red is shorter than the white to match up with the correct solder pads


 

The Canon 5d Mark II AV Cable is configured as such:

(note: it’s a 3.5mm TRRS plug)


Tip = Left audio (white)
Ring 1 = video (yellow)
Ring 2 = common shield
Sleeve = Right audio (red)

I even got out my multi-meter to check that, before hacking off the RCA connectors!

 

 

 

 

 

 

 

Headphone sockets are typically TRS:

Tip = Left (White)

Ring = Right (Red)

Sleeve = Ground (Black?)

From there, it was soldering time again; I brazed the new cable wires and then soldered them to the pads. You can see my working on paper in the background to make sure I didn't stuff it up. Always work to a plan!:


 

Now, you may be wondering why I didn't just put a normal headphone plug on the end of the cable and avoid all this. Well, cables get lost in quick pack-downs, get plugged in the wrong way when you're stressed, and a whole host of other reasons, so the simpler the end solution, the better the result.

Make it easy to use, even if it means a little bit of hard work to get there.

 

Check to make sure it still works after soldering. Not shown on the other end are some alligator clips from an old MP3 player connected to the modified cable to test, and cheap headphones on the other… it works:

Then hot glue the new cable to the PCB.


Then I put a dab of soft RTV silicone on the volume switches and the channel of the power switch to stop the rattling, glued it back together, added a cable tie to keep the 3 cables together and glued on some black button snaps:

Measured the centres of the button snaps for the strap mounting and marked the strap:



 

Used my old gas powered soldering iron to melt the holes for the new black button snaps (shown with snaps):

Note: I used my soldering iron because, although the button snap kit came with a hole punch, I find webbing frays over time; the iron melts the webbing, making it strong around the hole.



 

And we’re done!:

Shown in the middle with my little Sony IEM’s for those times when I don’t want to stand out like a

EMC VFCache versus IBM XIV Gen3 SSD Caching – Setting Tony Straight

Background:

Way back in February 2012, Tony Pearson over at the IBM Inside Storage Systems blog wrote a comparison of EMC VFCache (a localised PCIe SSD-based cache) vs. IBM XIV Gen3 SSD Read Cache (a remote SSD cache), and it was such a misinforming comparison that I felt it was important to try and clarify the difference between the two and make a more accurate comparison for the benefit of both the buying public and those at IBM who obviously need to do a bit more research.

Now, I’ve been MIA for a bit and extremely busy with some major life changing events, so I do apologise for both my absence and the haphazard writing here, but I hope it does help to understand the significant differences between the two solutions.

Setting Tony Straight

I don’t have a relative in the film business like Tony Pearson, but I do have a video shop down the road.

It's often the case that after the release of a rather serious movie, a sometimes funny / often terrible parody of said movie is released.

FTC Disclosure: I am NOT employed by any vendor and receive no compensation from any vendor with exception of the following:

  • EMC – Various USB Keys, 1 x Iomega HDD, 1 x decent rain jacket, 1 x baseball cap, several t-shirts and polos, a few business lunches and dinners (not in relation to this blog), 1 x bottle opener keyring, pens.
  • NetApp – Various USB Keys, 1 x multi-tool and torch, 1 x baseball cap, several t-shirts and polos, a few business lunches and dinners (not in relation to this blog), 1 x bottle opener keyring (just like EMC's), pens, playing cards, clock.
  • HDS – Various USB Keys, 1 x baseball cap, several t-shirts and polos, stress ball.
  • Compellent and Dell Compellent – Various USB Keys, 1 x baseball cap, several t-shirts and polos.
  • IBM – Various USB Keys, 1 x baseball cap, several t-shirts and polos, stress ball (ironic really).
  • HP – Various USB Keys, 1 x baseball cap, several t-shirts and polos, brolly, stress ball.
  • Most garnered as gifts or prizes at conferences.

Whilst I may sound like a mouthpiece for EMC by now, I'm my own man; just out to stop FUD, whoever writes it!

Some of these great examples are:

But the one movie which I want to reference most is Thank You for Smoking – a parody on the role of PR as a whole – because it provides a great example of how Tony uses a fallacy known as the "straw man argument".

It’s often easier to argue on what someone doesn’t believe than what they do believe. The straw man argument is characterized by a misrepresentation of an opponent’s viewpoint to make for easier and more eloquent criticism of that opinion.

In the following example from the movie “Thank You for Smoking,” notice how Nick characterizes Joey’s position as “anti-choice” which is absurd and meaningless in the context of their original debate:

Joey:
So, what happens when you’re wrong?
Nick:
Well, Joey, I’m never wrong.
Joey:
But you can’t always be right.
Nick:
Well, if it’s your job to be right, then you’re never wrong.
Joey:
But what if you are wrong?
Nick:
Okay, let’s say that you’re defending chocolate and I’m defending vanilla. Now, if I were to say to you, “Vanilla’s the best flavor ice cream”, you’d say …?
Joey:
“No, chocolate is.”
Nick:
Exactly. But you can’t win that argument. So, I’ll ask you: So you think chocolate is the end-all and be-all of ice cream, do you?
Joey:
It’s the best ice cream; I wouldn’t order any other.
Nick:
Oh. So it’s all chocolate for you, is it?
Joey:
Yes, chocolate is all I need.
Nick:
Well, I need more than chocolate. And for that matter, I need more than vanilla. I believe that we need freedom and choice when it comes to our ice cream, and that, Joey Naylor, that is the definition of liberty.
Joey:
But that’s not what we’re talking about.
Nick:
Ah, but that’s what I’m talking about.
Joey:
But … you didn’t prove that vanilla’s the best.
Nick:
I didn’t have to. I proved that you’re wrong, and if you’re wrong, I’m right.

Rather than go into detail of what a straw man argument is, I’ll let this guy do a better job of explaining it for you:

And it is with this in mind that I wonder whether Tony Pearson over at IBM was writing in the style of the great lampoon when he made a comparison between IBM XIV's latest addition of SSD drives as an array-based read cache and EMC's host-based VFCache.

You see, the two are only related by the fact that they are flash/SSD solutions – however, that's where the similarities stop; they solve completely different problems. One creates an extended read cache in the array – away from the application (XIV Gen3 SSD Cache) – and the other creates an extended read cache in the host, right next to the application (EMC VFCache).

Now, if you want to talk about copy-cats: well, Tony, adding flash as a cache to the XIV more than 2 years after EMC did it with the Clariion, that is a copy-cat – except the XIV is only a read cache, while EMC FAST Cache is both read and write!

I've mentioned my view on FUD before – I just don't like it – and Tony's post had nothing but FUD, in the form of a straw man argument, "sprinkled liberally" all over it!

But it's not just Tony at IBM… HP, NetApp and a whole host of others are popping up from the woodwork to proclaim that VFCache is some sort of solution to a non-existent problem that only exists with EMC. The reality is that all vendors have the same problem, but none at present have an alternative of their own to VFCache, and no array in existence can solve the problem that VFCache solves – latency external to the array – and they'll keep preaching this until, that is… they release their own.

A real comparison of the two architectures – EMC VFCache vs. IBM XIV Gen3 SSD Cache:

In reality, the EMC VFCache and XIV Gen3 SSD Cache have some very minor similarities, but the two solutions are completely different as are the intended purposes and architectures.

Placement
EMC VFCache: In host.
IBM XIV Gen3 SSD Caching: In array.

Physical makeup
EMC VFCache: PCIe SSD SLC card (SLC = Single Level Cell, 1 bit/cell).
IBM XIV Gen3 SSD Caching: SATA/SAS SSD MLC drive in a PCIe interposer slot (MLC = Multi-Level Cell, 2 bits/cell).

Intended purpose
EMC VFCache: For applications demanding ultra-low latency such as OLTP, reporting, analytics etc. Host and/or application-specific acceleration of data reads, localised at the host to bypass external latencies resulting from distance, media and switching.
IBM XIV Gen3 SSD Caching: Overall array acceleration, or selected system volumes, for regular read requests, to localise data within a higher-speed medium than that of traditional rotating disks and reduce the latency of read requests by bypassing array disks.

Architecture
EMC VFCache: In-host installed PCIe SSD card with host driver-level filter algorithms to intelligently detect regular read requests and keep the required data as close to the application as possible, rather than resort to the storage network and its latencies.
IBM XIV Gen3 SSD Caching: In-array installed SSD drive with intelligent algorithms to detect regular read requests and reduce the dependency on traditional rotational drives for reads.

Method
EMC VFCache: Intelligent algorithms to detect read requests and create a local copy of read data in the host.
IBM XIV Gen3 SSD Caching: Intelligent algorithms to detect read requests and create an array SSD cache to supplement the array's DRAM-based cache.

Benefit
EMC VFCache: Enables ultra-high-speed/low-latency read request delivery to greatly improve response times of time-sensitive applications. Data is closer to the application.
IBM XIV Gen3 SSD Caching: Enables high-speed read request delivery for the entirety of the data served by the array (or specific volumes).

You see, VFCache solves a problem that no storage array architecture can solve: Latency outside of the array.

Here are the causes of this latency in order:

  • Application
  • OS
  • File system
  • CPU
  • Memory
  • Bus
  • Block Driver
  • Host Bus Adapter (HBA)
  • HBA Media (the electronic to optics conversion)
  • Cable
  • Switch port media
  • Switch ASIC quad
  • Switch ASIC quad (another one if not in the same quad)
  • Switch port media
  • Cable
  • HBA Media (the electronic to optics conversion)
  • Array HBA
  • Array BUS
  • Array CPUs
  • Jibbly bits inside the array code (Programming is a dark art to me)
  • Array Memory
  • Array Internal HBA and media
    • Switching if applicable
  • Array backend Cables
  • Drive Tray switching (Or more CPU/Memory/BUS/Drive Controller if applicable)
  • Drives
  • And back again. (Did I miss anything? – FOBOTS, Routing, dirty media, OSI Layers………..)

From there on in, it's up to the array to deliver the requested information and send it back through the same external path and its corresponding latencies.

Now, let’s look at an example of what VFCache does:

Imagine this rather quick office worker who suddenly needs a yellow form to fill out. She rushes off down the hall, into the elevator (lift to the yanks, placard pet to the est cannuks it seems), down, then along the hall to the records department, gets the form from the records keeper and heads back again – and every time she needs the same form, she does this over and over again:

Now this time, she gets clever and invests in a filing cabinet to store the forms she uses most often locally – Big time saver that:

But say she now needs a green form as well and the yellow on a regular basis, well, she goes off and gets it like the first example, but this time, keeps a copy in the filing cabinet like the yellow form:

Now let’s replace this clever girl, forms, filing cabinet and records department with a data centre and it’d look like this:

Here is an example of how a read request is served in a typical storage environment:

Here’s what happens when VFCache is introduced and serves a read request already known:

And this time, when a new read request is made that is not in cache, but soon will be:
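For anyone who prefers code to pictures, here's a minimal sketch of the general read-through caching idea the diagrams describe – a toy LRU cache in Python, purely illustrative and in no way EMC's actual driver logic:

from collections import OrderedDict

# Toy host-side read cache, for illustration only (not EMC's driver).
class HostReadCache:
    def __init__(self, capacity_blocks, array):
        self.cache = OrderedDict()           # block address -> data, kept in LRU order
        self.capacity = capacity_blocks
        self.array = array                   # any object with a read(block) method (the slow, remote array)

    def read(self, block):
        if block in self.cache:              # cache hit: served locally, no trip across the SAN
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.array.read(block)        # cache miss: the full external path and all its latencies
        self.cache[block] = data             # keep a local copy so the next read is a hit
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used block
        return data

The point of the whole post in one line: a hit never leaves the host, which is something no array-side cache – XIV Gen3 included – can offer.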

So, now that we understand how EMC VFCache works, shall we take a look at another, "improved" bodged comparison table?

Servers supported
EMC VFCache: Selected x86-based models of Cisco UCS, Dell PowerEdge, HP ProLiant DL, and IBM xSeries and System x servers – pretty much most environments out there! Then the VNX supports quite a few more than XIV; from my recollection, even support for a 520-bytes-a-sector version for IBM System z running z/OS and iSeries running
IBM XIV Gen3 SSD Caching: All of these, plus any other blade or rack-optimized server currently supported by XIV Gen3, including Oracle SPARC, HP Itanium, IBM POWER systems, and even IBM System z mainframes running Linux.

Operating System support
EMC VFCache: Linux RHEL 5.6 and 5.7, VMware vSphere 4.1 and 5.0, and Windows 2008 x64 and R2 – yup, pretty much anything which needs acceleration!
IBM XIV Gen3 SSD Caching: All of these, plus all the other operating systems supported by XIV Gen3, including AIX, IBM i, Solaris, HP-UX, and Mac OS X.

Protocol support
EMC VFCache: FCP (with iSCSI, FCoE and more to come, I'm sure).
IBM XIV Gen3 SSD Caching: FCP and iSCSI.

Vendor-supplied driver required on the server
EMC VFCache: Yes, the VFCache driver must be installed to use this feature.
IBM XIV Gen3 SSD Caching: No, IBM XIV Gen3 uses native OS-based multi-pathing drivers, not quite as good as the multipath I/O drivers from EMC, HDS and Symantec.

Works with a variety of storage solutions from many vendors
EMC VFCache: Yes, VFCache is QUALIFIED with EMC storage at present, but will work with almost all FC storage.
IBM XIV Gen3 SSD Caching: No, you need an XIV Gen3 to use SSD cache.

External disk storage systems required
EMC VFCache: None; it appears the VFCache has no direct interaction with the back-end disk array, so in theory the benefits are the same whether you use this VFCache card in front of EMC storage or IBM storage.
IBM XIV Gen3 SSD Caching: XIV Gen3 is required, as the SSD slots are not available on older models of IBM XIV.

Ability to provide data read requests in less than 100 microseconds (<100μs) latency
EMC VFCache: Yes!!!
IBM XIV Gen3 SSD Caching: No, XIV Gen3 is subject to all the same old delivery issues listed earlier.

Able to provide read-cached data without the introduced latency of interconnection
EMC VFCache: Yes; application > OS > bus > VFCache.
IBM XIV Gen3 SSD Caching: No.

Able to accelerate VMware guests with ultra-low latency and eliminate read bottlenecks in storage networking
EMC VFCache: Yes!!!
IBM XIV Gen3 SSD Caching: No, XIV Gen3 is subject to all the same old delivery issues listed earlier, just like any other array.

Ability to support multiple arrays
EMC VFCache: Yes.
IBM XIV Gen3 SSD Caching: No, you stick the SSD in an XIV Gen3 and it's limited to that array.

Can use higher-speed array disks such as 10/15k or even SSD when not in cache
EMC VFCache: Yes, when not in cache, it can use high-speed array disks for consistent performance.
IBM XIV Gen3 SSD Caching: No, if it's not in cache, you're stuck with 7.2k RPM SAS (no faster than what's in your typical desktop).

Support for multiple servers
EMC VFCache: Yes, put 'em in as many servers as you want.
IBM XIV Gen3 SSD Caching: An advantage of the XIV Gen3 SSD caching approach is that the cache can be dynamically allocated to the busiest data from any server or servers. (No different to almost any other array that offers SSD caching.)

Support for active/active server clusters
EMC VFCache: Not yet… but the VNX is, just like it's designed to be – Tony, this is a localised cache.
IBM XIV Gen3 SSD Caching: Yes!

Sequential-access detection
EMC VFCache: Yes, back at the array where it's designed to be for sequential access; not the cache! And the VNX is not crippled by sequential access like the XIV, which is only able to use 7.2k drives, vs. the VNX which is able to use 15k and 10k drives as well as 7.2k.
IBM XIV Gen3 SSD Caching: Yes! XIV algorithms detect sequential access and avoid polluting the SSD with these blocks of data.

Number of SSDs supported
EMC VFCache: One, and that's all you should need, it's an in-host cache! Oh, and add to that, EMC FAST Cache can provide you up to 2TB of array-based cache – still need more than that? EMC VNX can support even more SSDs as real drives; heck, they've even got a full array of nothing but SSD, the EMC VNX 5500-F.
IBM XIV Gen3 SSD Caching: Only 6 to 15 (one per XIV module).

Pin data in SSD cache
EMC VFCache: Yes, using split-card mode, you can designate a portion of the 300GB to serve as direct-attached storage (DAS). All data written to the DAS portion will be kept in SSD. However, since only one card is supported per server and the data is unprotected, this should only be used for ephemeral data like logs and temp files.
IBM XIV Gen3 SSD Caching: No, there is no option to designate an XIV Gen3 volume to be SSD-only. Consider using a Fusion-IO PCIe card as a DAS alternative, or another IBM storage system for that requirement.

Personal note: Tony, I loved how you added the Fusion-IO bit to the end of your table AFTER my comment… gotta love all that research…

See what I did there? Anyone can devise one of these tables tilted to their preference. The truth is, there shouldn't be a comparison table… they're two completely different solutions.

Tony’s blog also has this little gem that I found funny:

Sequential-access detection
EMC VFCache: None identified. However, VFCache only caches blocks 64KB or smaller, so any sequential processing with larger blocks will bypass the VFCache.
IBM XIV Gen3 SSD Caching: Yes! XIV algorithms detect sequential access and avoid polluting the SSD with these blocks of data.

However, according to IBM Redbook REDP-4842-00, it appears the IBM XIV Gen3 SSD cache also bypasses the SSD cache for any read larger than 64KB… hmm… they have another similarity other than being SSD:

3.2.3 Random reads with SSD Caching enabled:

According to IDC, IBM continues to lose market share on the storage side. On a recent earnings call, IBM announced (again) that storage revenues had declined in an otherwise rudely robust marketplace where everyone else seems to be growing.

Tony is an extraordinarily smart guy – he's an IBM Master Inventor, for Pete's sake; why are IBM wasting such an intelligent resource on writing such nonsensical misrepresentation disguised as a factual piece?

Could you imagine what IBM's results would be if they put such a talented person to better use? Well, they wouldn't be stuck with an 11.4% and falling share of the data storage market vs. EMC's solid 29% (according to IDC, June 2012).

I guess the final movie I'm reminded of is The Simpsons Movie, with Comic Book Guy's sarcastic streak – it reminds me of Tony's rather peppery comments:

Quote:

  1. Did I need to do this? No, but the XIV team asked me nicely to write about this, pretty please, with sugar on top, so I did.
  2. This is FUD. No argument there. For those who can’t find the FUD sprinkled throughout my post, it is the list of factual disappointments in the VFCache announcement, including, but not limited to, (a) that it only works on select server models and operating systems, (b) that it only works with FCP protocol, (c) that customers are limited to only one card per server, and (d) that EMC does not recommend anything other than ephemeral data to be placed on the card in split-card DAS mode, to name a few. I agree that sometimes FUD is difficult to find for some readers, but in this post I consolidated the FUD into an easy to read table, in the first column, highlighted in bright yellow.

    Tony, I’d be more concerned about the factual disappointments in your own post.

     

Best video I could find, sorry.

Anyway, best regards, I hope it helped to clarify things.

Aus Storage Guy!

P.S. Anyone have any flashbacks to the '90s – Netscape, AOL et al. – with the animated GIFs? :)

Thanks, Hoosier – nice and concise. Great work; reblogging!

Hoosier Storage Guy

It sure would've been nice to see this sooner, but better late than never. Finally, we get to see the really good stuff that has been in the works for some time and takes a great product and makes it even better. This is a key update for any existing EMC VNX customers (though I recommend waiting 1-2 quarters before upgrading) and any new VNX customers.

The key updates include:

  • Support for mixed RAID types in a storage pool
  • A new Flash 1st auto-tiering policy
  • New RAID templates to support better efficiency – such as changing RAID6 protection scheme from 6+2 to 14+2.
  • In-family data-in-place upgrades – bringing back the capability that existed within Clariion to essentially do a head-swap and grow to the next model. 
  • Windows Branch Cache support for CIFS/SMB file shares
  • Load-balancing and re-balancing within a storage tier
  • VNX Snapshots now provides write-in-place pointer-based snapshots that in their…


5 Minute CIO – Introduction.

I’d like to introduce a series on my blog called “5 Minute CIO”.

5 Minute CIO is going to be an ongoing series aimed at the CIO and IT Manager, discussing data storage and its importance in any organisation, with a bit here and there for the other CxOs as well.

Data storage is possibly the most important part of an organisation – unlike networks or servers, data cannot be replaced, well, not very easily – and let's face it, without storage (I'm not going to say there's no data) there's no electronic data: no application data, no files, no databases and no email.

(I wouldn’t cry over that last one personally)

I frequently hear from CIOs and IT Managers that storage is a bit of an enigma – esoteric, even, and difficult to understand – and I certainly understand that; there just isn't the focus and information on storage that there is with networks, servers and applications – and it's grossly undervalued in many companies.

When an application is running slow, the finger is usually pointed at storage. (And that finger is often pointing in the right direction.)

When it runs out of space, there’s only one place to point that finger – yes, storage.

So, my challenge here is to help in the best way I can – education. Learning and understanding storage, as mentioned before, can be complex; it's often full of jargon flowing from the (often very intelligent) storage experts in a manner which is difficult to understand.

I wish to change that by writing this series on data storage for CIOs and IT Managers in quick, easy-to-read posts – let's face it, CxOs are busy people – which will help to explain and demystify storage and help you steer your organisation's data storage requirements efficiently.

They say knowledge is power and I plan to arm you with the information you need.

I'll be writing most of the posts with an increasing degree of complexity; that is to say, I'll start each post at a high level and progress into more detail as the post goes on, which means that you can garner as little or as much information as you need.

Start high – get a bit more technical – end in the nuts and bolts.

You're a CxO; that probably means you are very busy – so that's why I'll start high – but sometimes you need to know a bit more, and then there are those times when you need to be armed with the technical stuff, so it'll be there when you need it.

I also hope as part of this exercise, that these posts become a reference to anyone with aspirations in the data storage industry as well.

Overall, each "5 Minute CIO" post will be, as the name suggests, aimed at senior personnel and, coincidentally, should take no more than about 5 minutes to read.

Look out for the “5 Minute CIO” posts in the future and if there’s a subject you wish me to cover that I haven’t covered yet, please leave a comment and I’ll do my best.

Cheerio,

 

Aus Storage Guy.

An excellent post by Aussie Storage Blog on the differences between SATA and SAS.

Aussie Storage Blog

Here are two common statements I often hear from clients:

  1. I don’t just want SAS drives, I also want SATA drives.  SATA drives are cheaper than SAS drives.
  2. Nearline SAS drives are just SATA drives with some sort of converter on them.

So is this right?  Is this the actual situation?

First up, if your storage uses a SAS-based controller with a SAS backplane, then normally you can plug SAS drives into that enclosure, or you can plug SATA drives into that enclosure. This is great because when you plug SATA drives into a SAS backplane, you can actually send SCSI commands to the drive, plus you can send native SATA commands too (which is handy when you are writing software for RAID array drivers).

But (and this is a big but) what we do know is that equivalent (size and RPM) SAS drives perform better than SATA drives…


Dragging up ancient history – How NetApp fooled everyone (FAS3040 v. CX3-40)

Foreword:

Back in January 2009, NetApp published a Storage Performance Council SPC-1 comparison between the NetApp FAS3040 and the EMC CX3-40, claiming that NetApp had the superior array, with the FAS churning out 30,985.60 IOPS and the CX3 trailing with a woeful 24,997.49 IOPS.

NetApp shopped this score to anyone and everyone who would listen: the media, customers – heck, even EMC engineers copped an earful of it whenever NetApp had the chance to deliver it.

But whilst many fell for NetApp's shenanigans, there were a few who questioned it; some even fought back, realising NetApp long-stroked the EMC array yet short-stroked the FAS. But most missed the fact that, hidden in plain sight, was NetApp's deceit – they doctored the test.

Now many suspected that, but NetApp would often come prepared (at least they did with me), and would bring printed copies of the EXECUTIVE SUMMARY version of the document to any meetings where they were presenting these “facts”.

Now, when someone presents this kind of data to you in a meeting, you're normally respectful enough to give them the meeting without jumping for the laptop to check if it's true, spending the entirety of the meeting with your face buried in the screen looking for any little piece of evidence which would be a saving grace.

They knew that – they knew most (if not all) would not check the facts then and there, and probably never would. And even if they did check, most people wouldn't know the difference.

But a few of us decided that, given our experiences of EMC Clariions, these numbers simply could not be right; we had to look at the report for ourselves. And sure enough, people appeared from the woodwork crying foul – they'd spotted that NetApp had long-stroked the EMC from a mile away.

However, so far as I know, nobody really knew the lengths of deceit that NetApp went to.

The following has been a long time in the making and, lucky or unlucky as it may be, I've had a recent spout of being jet-lagged and getting up at 4am when my first meeting is at 10am, stuck in hotels far away from family, bored, and with just that little chip on my shoulder about what NetApp did.

I had long since ignored it, but recently one of my regular reads, http://storagewithoutborders.com/, a known NetApp devotee, decided to post a couple of NetApp responses to EMC's announcements.

Don’t get me wrong, there’s nothing wrong with being an evangelist for your company, but at the very least, respect your competition.

Now I don’t know about you, but I really hate it when someone pisses on somebody else’s parade!

I hate it even more when they use the competitor’s announcement to coat-tail their own announcement, I think it’s crass and offensive.

I think it's even worse when it's been commented on by someone who has NO experience of the product they're comparing – only being another minaret for their company.

I posted some comments on his blog about this behaviour, but he seemed to think it was OK – acceptable, even. I didn't.

When he responded, his comments were varied but polite (John is a really nice guy), but he really does drink the NetApp Kool-Aid.

It got my goat; John had commented in his blog (even 4 years later) about how much better NetApp's now E-Series and FAS were than the EMC CX3-40 – he just had to bring that old chestnut up.

http://storagewithoutborders.com/2011/11/03/breaking-records-revisited/

See below:

SWB> “1. You need to be careful about the way benchmarks are used and interpreted
2. You should present the top line number honestly without resorting to tricks like unrealistic configurations or aggregating performance numbers without a valid point of aggregation.”

Thank you John, I’ll take that into consideration.

So I just wanted to set the scene for what I’m about to point out:

SWB > “The benchmark at the time was with an equivalent array (the 3040) which is now also three generations old, the benchmark and what it was proving at the time remains valid in my opinion. I’d be interested to see a similar side by side submission for a VNX5300 with FAST Cache and a FAS3240 with Flashcache, but this time maybe we should wait until EMC submits their own configuration first.

SWB > “I’m pretty happy with the way NetApp does their bench-marking, and it seems to me that many others abuse the process, which annoys me, so I write about it.”

I understand, John, we all want to believe the things closest to us are infallible – we don't want to believe our kid is the one who bit another kid; our mother is the better cook; and the company and/or products we associate ourselves with are the superior products and our company's practices unblemished – but this can cloud our judgment and cause us to look past the flaws.

Now, I want to make something perfectly clear: I'm not pro/anti NetApp or EMC in any regard; I have worked with both for many years.

I will be taking other vendors to task as and when I see fit; for now, however, your blog is one of my regular reads and I know your disdain for FUD – you just happened to be the first, and your post contradicted your supposed dislike for FUD and engineering a benchmark.

I want to go back to the FAS 3040/Clariion CX3-40 example as a demonstration of how NetApp DOES NOT (highlight not shouty) play by the rules.

Executive Summary:

Way back in March 2009, NetApp published a comparison of the two products in an attempt to show NetApp's superiority as a performance array: the NetApp FAS3040 achieved an SPC-1 result of 30,985 IOPS, whilst the EMC CX3-40 managed a seemingly meagre 24,997 SPC-1 IOPS – the EMC left wanting, a horrific 5,988 IOPS behind the NetApp.

On the face of it, it would appear NetApp had a just and fair lead, but this is simply not true – NetApp engineered the EMC to be pig-slow, and whilst I wasn't there at the time and can only speculate on the intentions drawn in this post, I cannot, in any sense of the word, believe it was not intentional.

When conducting a benchmark, it's important to ensure a like-for-like configuration – NetApp simply did not do this!

NetApp used:

· Different Hardware for the Workload Generator,

· Different methods for the ASU presentation,

· Short-stroked the NetApp and long-stroked the EMC,

· Engineered higher-latency equipment and additional hardware and services into the EMC BoM; and

· Falsified the displayed configuration.

My goal here is to show why I place no faith in NetApp’s or any other vendor’s competitive benchmark.

End Executive Summary.

Now for the Nuts and Bolts:

Now, John, I know you have a passion for benchmarks, and to reiterate the first quote in this reply, I will add that you need to be careful to ensure you are not introducing unfair differences in the equipment, tools, software and pricing that give an undue competitive advantage.

I know you’ll probably be crying foul by now and stating it was fair and just, but I can prove that it was not – without a doubt – and for the life of me, I cannot believe how any of this was missed and that NetApp got away with it.

I must warn you, this level of detail is normally reserved for the kind of person who wears anoraks and strokes their beard.

I’ll give a breakdown of how NetApp did this by breaking it into sections and their differences:

1. LUN to volume presentations

2. Workload Generator (WG) Hosts

3. HBAs

4. Array Configurations and BoM

5. RAID Group and LUN Configuration

6. Workload Differences

7. Other Differences and issues

So let’s look at these differences in detail (John, when benchmarking, the devil IS in the detail):

1. LUN to volume presentations:

When NetApp's Steve Daniel configured the WG's (Workload Generator's) volumes, he striped 36 LUNs from the Clariion at the host level, and the NetApp:

NetApp FAS3040 – Page 63: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf
· The volumes as striped LUNs by the WG:
· The LUNs presented to the WG:

EMC CX3-40 – Page 64: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf
· The volumes as striped LUNs by the WG:
· The LUNs presented to the WG:

Note: because the procedure was not as well documented in the NetApp configuration, it makes reading somewhat difficult.

Despite the fact that he must have known full well that the EMC Clariion had the capability to stripe the volumes in-array or to just simply create 1 larger LUN, it was presented in an atrocious layout.

This wouldn’t seem like a big deal, but it’s a huge difference and creates many host and array performance issues – it’s certainly known to anyone with a strong knowledge of storage networking not to do this unless you have no other choice (which he did):

The phenomenon of striping performance loss at the host is well observed here:

http://sqlblog.com/blogs/linchi_shea/archive/2007/03/12/should-i-use-a-windows-striped-volume.aspx

It would seem that NetApp created a greater depth of striping for the EMC array and, for no possible technical reason (other than to make the workload as high as possible), utilised small stripes for the EMC broken over the SPs, thereby negating any possible use of the cache.

Now, I want to make it clear: there is NO technical reason to create so many LUNs from within each RAID group, other than to create performance problems, and in my years of working with EMC Clariion and the many Clariions I have worked with, I have never seen a Clariion laid out in such a manner.
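To make the difference concrete, here's a minimal sketch (illustrative stripe geometry, not the exact values from the disclosure report) of how a host-level stripe chops one logical address space across many small LUNs:

# Illustrative only: map a host logical block address onto (LUN, offset) for a host-level
# stripe set. With 36 small LUNs and a small stripe unit, even a modest host I/O is
# scattered across many LUNs and both storage processors.
def host_stripe_map(lba, lun_count=36, stripe_unit_blocks=128):
    stripe_index = lba // stripe_unit_blocks
    lun = stripe_index % lun_count                  # which LUN this chunk lands on
    offset = (stripe_index // lun_count) * stripe_unit_blocks + (lba % stripe_unit_blocks)
    return lun, offset

# A single 1MB host read (2,048 x 512-byte blocks) touches 16 different LUNs:
luns_touched = {host_stripe_map(lba)[0] for lba in range(0, 2048, 128)}
print(sorted(luns_touched))

Done in the array instead, the same striping is handled by the storage processors and their cache; done at the host, every one of those fragments becomes a separate request down the stack.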

2. Workload Generator (WG) hosts

At first glance, the WG hosts seem to be identical IBM x3650 servers; however, NetApp chose to give themselves a competitive advantage by using a workload generator which is considerably better spec'd, mostly around the bus.

For ease of viewing, I’ve circled the offending areas in red:

NetApp FAS3040:
The IBM x3650 used for the NetApp WG is PCIe based. Page 16: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf

PCIe:
· is full-duplex – it CAN transmit and receive at the same time
· is a serial interface and is point to point
· has a line speed of 2GB/s per 4 lanes (32 PCIe lanes)
· is direct to the north-bridge and CPU
· two PCIe HBAs have 2GB/s each for a total of 4GB/s (8 PCIe lanes, 4 lanes each of 32)

EMC:

The IBM x3650 used for the EMC WG is PCI-X 133MHz based. Page 15: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

PCI-X:
· is half-duplex – it CANNOT transmit and receive at the same time
· is a parallel bus and relies on arbitration and scheduling, and shares the bus bandwidth
· has a line speed of 1GB/s
· is channelled through multiple chipsets and bridges before interaction at the north-bridge and CPU
· two PCI-X HBAs still have only a 1GB/s bus to share

As you can see, the two IBM x3650 servers are different, even though the EMC WG server had more memory and a faster CPU (I can’t answer for the CPU architectures as it’s not listed).

The WG host bus given to the EMC WG was:

· Slower

· Parallel

· Bandwidth Limited

· Higher Latency

· And half-duplex

Anyone with knowledge of networking will understand the implications of full and half duplex, and serial vs. parallel is much the same jump in performance as SATA vs. PATA.

NetApp speak regularly on the benefits of PCIe over PCI-X:

http://partners.netapp.com/go/techontap/fas6070.html

And I quote:

“We have also changed the system interface on NVRAM to PCIe (PCI Express). This eliminates potential bottlenecks that the older PCI-X based slots might introduce.”

Howard: “PCIe was designed to overcome bandwidth limitation issues with earlier PCI and PCI-X expansion slots.”

Naresh: “100 MHz PCI-X slots are 0.8GB/s peak, and x8 PCIe slots are 4GB/s. PCI-X slots at 100 MHz could be shared between two slots, so a couple of fast HBAs could become limited by the PCI-X bandwidth.”

Tom: “In addition to increased bandwidth, PCIe provides improved RAS features. For example, instead of a shared bus, each link is point to point.”

It seems NetApp used the inferior Workload Generator (WG) for the EMC and the superior WG for NetApp.

Why did they not use the same host? I can only imagine to increase the Total Service Time when measuring the EMC, possibly doubling the response time!

3. HBAs

Again, at first glance it would seem NetApp used the same QLogic HBAs for both tests – but as highlighted before, the two hosts used were different: one PCIe, the other PCI-X.

The same applies to the HBAs: NetApp used the faster, unrestricted HBAs for their configuration and the slower, restricted HBAs for the EMC configuration:

NetApp FAS3040:
The HBA given to the NetApp is the QLE2462, which is:
· PCIe
· superiority highlighted before
Page 16: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf

PCIe:
· is full-duplex – it CAN transmit and receive at the same time
· is a serial interface and is point to point
· has a line speed of 2GB/s per 4 lanes (32 PCIe lanes)
· is direct to the north-bridge and CPU
· two PCIe HBAs have 2GB/s each for a total of 4GB/s (8 PCIe lanes, 4 lanes each of 32)

EMC CX3-40:

The HBA given to the EMC is the QLA2462, which is:
· PCI-X
· the HBA is 266MHz but limited to 133MHz because of the host
Page 15: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

PCI-X:
· is half-duplex – it CANNOT transmit and receive at the same time
· is a parallel bus and relies on arbitration and scheduling, and shares the bus bandwidth
· has a line speed of 1GB/s
· is channelled through multiple chipsets and bridges before interaction at the north-bridge and CPU
· two PCI-X HBAs still have only a 1GB/s bus to share

It’s important to note that 1GB/s is the PCI-X total bus peak speed, which is easily saturated by a 2-port 4Gb/s HBA, let alone 2 of them (as per the BoM and config) – totalling 2GB/s for both cards, yet only 1GB/s being available.

Whereas PCIe has a maximum throughput of 8GB/s, meaning 2 x PCIe x4 HBA’s would only be using 2GB/s – only a quarter of the available host bus bandwidth.
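To put rough numbers on that, here’s a quick back-of-the-envelope sketch using only the figures quoted above. My simplification: one 4Gb/s FC port is treated as ~0.5GB/s and encoding overheads are ignored.

# Back-of-the-envelope sketch of the host-bus arithmetic above.
# Assumption: a 4Gb/s FC port is treated as ~0.5GB/s; encoding overheads ignored.

fc_port_gbs = 4 / 8                   # one 4Gb/s FC port ~= 0.5 GB/s
hba_demand = 2 * 2 * fc_port_gbs      # 2 dual-port HBAs ~= 2 GB/s of potential traffic

pci_x_bus = 1.0                       # shared, half-duplex PCI-X bus: ~1 GB/s peak
pcie_bus = 8.0                        # PCIe maximum throughput quoted above: ~8 GB/s

print(f"HBA demand:         ~{hba_demand:.1f} GB/s")
print(f"PCI-X host (EMC):   ~{pci_x_bus:.1f} GB/s shared -> oversubscribed {hba_demand / pci_x_bus:.1f}x")
print(f"PCIe host (NetApp): ~{pcie_bus:.1f} GB/s total  -> only {hba_demand / pcie_bus:.0%} of the bus used")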

To give half/full duplex and serial/parallel some context, imagine two office buildings with the same number of floors (10):

· Building A has 1 elevator (half-duplex / parallel – PCI-X)

o If a person on the ground floor wants to go to level 10, he has to wait until the lift arrives at the ground floor before he can travel.

o If another person on the ground floor wants to travel to level 5, he has to wait until the lift has completed its travel to the 10th floor and returned.

· Building B has 32 elevators (full-duplex / serial – PCIe)

o If a person on the ground floor wants to go to level 10, the elevator is already at the ground floor ready to go.

o If another person on the ground floor wants to travel to level 5, the elevator is already at the ground floor ready to go, or there are another 30 ready or about to arrive.

Clearly again, NetApp has engineered a superior WG host for themselves and an inferior one for EMC.

4. Array Configurations and BoM

Here are the two BoM’s from NetApp and EMC arrays:

NetApp FAS3040:
Page 14: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf
EMC CX3-40:
Page 13: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

Now here are the interesting titbits, let’s take a look at what I’ve highlighted and why:

Firstly, costs:

– EMC: NetApp included the HBA’s and Switches and multi-pathing as part of the EMC array costs:

1 PP-WN-WG – PPATH WINDOWS WGR EA $1,440 0% $1,440 see attached third party quotation

2 QLA2462-E-SP – 2 PORT 4GB PCI-X EA $1,700 0% $3,400 see attached third party quotation

2 Brocade 16-Port 200e FC Full Fab Switch,-C,R5 EA $8,700 0% $17,400 Network Appliance, Inc.

2 BSWITCH-16PORT-R5 HW Support,Premium,4hr,y mths:36 EA $1,697 0% $3,393 Network Appliance, Inc.

2 BSWITCH-16PORT-R5 SW Subs,Premium,4hr,y mths:36 EA $0 0% $0 Network Appliance, Inc.”

– NetApp: NetApp added the HBA’s and Switches and multi-pathing as the add-on costs

Host Attach Hardware and Software

SW-DSM-MPIO-WINDOWS 1 $0.00 0 $0.00 $0.00

X6518A-R6 Cable,Optical,LC/LC,5M,R6 4 $150.00 0 $150.00 $600.00

X1089A-R6 HBA,QLogic QLE2462,2-Port,4Gb,PCI-e,R6 2 $2,615.00 0 $2,615.00 $5,230.00

SW-DSM-MPIO-WIN Software,Data ONTAP DSM for Windows MPIO 1 $1,000.00 0 $1,000.00 $1,000.00”

Not a big deal there, as the TSC is incorporating the entirety.

– EMC: NetApp included Professional Services in the EMC costs

1 PS-BAS-PP1 – POWERPATH 1HOST QS EA $1,330 0% $1,330 see attached third party quotation

1 PS-BAS-PMBLK – POWERPATH 1HOST QS EA $1,970 0% $1,970 see attached third party quotation

My side note: (Who the hell needs PS to install PowerPath? And for that matter, who needs a project management block for 1 host?) (I hope they got their money’s worth!)

– NetApp: There were no included Professional services costs

No wonder the EMC came out more expensive: they put in services which nobody needs and bundled the HBA’s, switching and multi-pathing into the array costs, but didn’t do the same for the NetApp!!!! – Sneaky!

Secondly, Cabling:

– EMC: NetApp included 4 x 8 meter HSSDC2 cables for connection from the array to the first Disk Shelf of each bus with 1m cables from then on:

4 FC2-HSSDC-8M – 8M HSSDC2 to HSSDC2 bus cbl EA $600 0% $2,400 see attached third party quotation

(Added costs in using 8m cables? Yup.)

– NetApp: NetApp included 16 x 0.5 meter HSSDC2 cables for connection from the array to the first disk shelf of each bus and 0.5 meter from then on:

X6530-R6-C Cable,Patch,FC SFP to SFP,0.5M,-C,R6 16 $0.00 0 $0.00 $0.00

Now, this might not seem like a big deal, but 8m cables are reserved for only very difficult scenarios, such as having to stretch across many racks to join shelves to the array; they are never used in latency-sensitive scenarios, and here’s why:

Fibre and copper have similar latencies of 5ns per meter.

For an 8m cable, that translates to 80ns round-trip (the EMC config),

Whereas;

For a 0.5m cable, it’s 5ns round-trip (2.5ns per 0.5 meter) (the NetApp config)

Extend that to a mirrored system with 2 buses and that’s 160ns round-trip; then add every meter and enclosure after that (up to 0.005ms port-to-port).
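If you want to play with the propagation numbers yourself, here’s a tiny sketch based only on the ~5ns-per-meter figure above; it ignores port and SerDes latency, which add on top.

# Tiny sketch of the cable-propagation arithmetic above.
# Assumption: ~5ns of propagation delay per meter for both fibre and copper.

NS_PER_METER = 5

def round_trip_ns(cable_meters, cables_per_path=1):
    # Out and back over each cable in the path.
    return 2 * cable_meters * NS_PER_METER * cables_per_path

print(f"8m cable (EMC config):               {round_trip_ns(8):.0f} ns round-trip")
print(f"0.5m cable (NetApp config):          {round_trip_ns(0.5):.0f} ns round-trip")
print(f"Mirrored system, 2 buses x 8m (EMC): {round_trip_ns(8, cables_per_path=2):.0f} ns round-trip")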

Now I want to state again, EMC never use 8m cables except in extreme circumstances and never when low latency is needed!

It’s clear NetApp engineered the EMC to have as slow a bus as possible compared to the NetApp!

Thirdly, Bus Layout:

I make no representation as to the correctness of the NetApp bus layout; although it is a little over the top, and NetApp configured it with far more point-to-point connections to the disk shelves than were needed (which would help boost their performance), it is a configuration I have seen more than once.

EMC: In the diagram Page 14: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf:

You can clearly see that NetApp show that they use only 11 DAE’s for the Configuration:

Left (bus 0):

1 x Vault Pack with 5 disks

5 x DAE with 75 disks

Right (bus 1):

5 x DAE with 74 disks

But when I look at the RAID Group configuration I see that they use all 12 DAE’s from the BoM, different from the stated configuration:

1 V-CX4014615K – VAULT PACK CX3-40 146GB 15K 4GB DRIVES QTY 5 EA $8,225 0% $8,225 see attached third party quotation

+

11 CX-4PDAE-FD – 4G DAE FIELD INSTALL EA $5,900 0% $64,900 see attached third party quotation

That makes 12 DAE’s – Who cares? You’ll see!

If we look at the configuration scripts used on Page 60: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

Create Raid Groups:

We see that the first raid group (RG0) Mirror Primary starts at 0_1_0 and the Mirror Secondary starts at 1_3_0

(x_x_x is Bus_Enclosure_Device/Disk):

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 0 0_1_0 1_3_0 0_1_1 1_3_1 0_1_2 1_3_2 0_1_3 1_3_3 0_1_4 1_3_4 0_1_5 1_3_5

And as we go further down, we see the last raid group (RG11) Mirror Primary starts at 0_4_12 and the Mirror Secondary starts at 1_6_12

Then extends into Bus 1 Enclosure 7

(x_x_x is bus_Enclosure_Device/Disk):

naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 11 0_4_12 1_6_12 0_4_13 1_6_13 0_5_12 1_7_12 0_5_13 1_7_13 0_1_14 1_3_14 0_2_14 1_4_14

Now, I first read that as a typo – why would 0_1_0 be mirrored to 1_3_0 and not 1_0_0 or 1_1_0?
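To make the pairing easier to see, here’s a minimal sketch (assuming the Bus_Enclosure_Disk notation and the alternating primary/secondary ordering shown in the createrg line above) that pairs each primary with its mirror and shows how far down the bus the secondary sits:

# Minimal sketch: pair the primaries and secondaries from the RG0 createrg
# disk list above (Bus_Enclosure_Disk notation, alternating primary/secondary).

def parse_disk(token):
    bus, enclosure, disk = (int(x) for x in token.split("_"))
    return bus, enclosure, disk

rg0 = "0_1_0 1_3_0 0_1_1 1_3_1 0_1_2 1_3_2 0_1_3 1_3_3 0_1_4 1_3_4 0_1_5 1_3_5".split()

for primary, secondary in zip(rg0[0::2], rg0[1::2]):
    p_bus, p_enc, _ = parse_disk(primary)
    s_bus, s_enc, _ = parse_disk(secondary)
    print(f"{primary} mirrors to {secondary} "
          f"(secondary sits {s_enc - p_enc} enclosures further down bus {s_bus})")

Every secondary in RG0 lands two enclosures further along bus 1 than the enclosure-matched position you would expect – exactly the jump questioned above.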

This is what the layout of the Clariion that NetApp set up looked like:

The black borders represent where the configuration should be, and the coloured cells represent where the RAIDGroups are configured, with colours matching the RAID pair according to the full disclosure.

Hang on a minute, what happened to the 3 shelves before that point on bus 1?

(Enclosure numbering on each bus starts at 0 and continues up to 7)

Well, with 12 DAE’s (15 slots per DAE) there are a total of 180 drive slots.

155 disks in total were purchased (150 + 5 in Vault pack)

5 of which are taken by Flare/Vault

There are 12 x 12-disk RAID 1/0 RAIDGroups, so 144 disks are used to present capacity

No mention of how many hot spares were used (only a total of OE and spare capacity), best practice is generally 1:30.
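Pulling those numbers together from the BoM and RAID group configuration quoted above (the hot-spare count is my inference from the leftover disks, since it isn’t disclosed):

# Disk accounting from the BoM and RAID group configuration quoted above.
# The hot-spare figure is inferred from the leftovers; it is not disclosed.

dae_slots        = 12 * 15        # 12 DAEs x 15 slots each
disks_purchased  = 150 + 5        # 150 drives + 5 in the Vault Pack
vault_disks      = 5              # taken by Flare/Vault
raid_group_disks = 12 * 12        # 12 x 12-disk RAID 1/0 RAIDGroups

leftover = disks_purchased - vault_disks - raid_group_disks
print(f"Drive slots available:               {dae_slots}")
print(f"Empty slots:                         {dae_slots - disks_purchased}")
print(f"Disks left over (presumably spares): {leftover}")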

This is what it should look like (forgetting for a minute the fact that NetApp laid it out poorly), if enclosures were not jumped:

Because by placing the mirror pair further down the chain, you increase the latency to get to the paired disk, increasing the service time drastically.

There is no reason to do so other than to engineer slowness! No one in their right mind would do so!

NetApp engineered the EMC to have a slow Backend!

5. RAID Group and LUN Configuration

When it came to the RAIDGroup and LUN Layout, this is where it got even worse:

NetApp:
NetApp Short-Stroked and created one very large aggregate:
Page 62/63: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf
The following is a diagram showing how the aggregate for each controller of the NetApp FAS3040 was configured; to arrive at the total capacity presented and unallocated/unused, multiply this config x2.
Please note: Some of the numbers are estimated – NetApp uses a mix of Base2 and Base10 when presenting, and not all numbers were disclosed. I did, however, calculate it as accurately as possible, to the best of my ability, to within <5%.
Representing the 2 Controllers:
Controller 1 Controller 2
Create aggregate with the following configuration:
· aggr0 settings: 4 rgs, rg sizes (1×18 + 3×17), 1 spare
· aggr0 options: nosnap=on
· set snap reserve = 0 on aggregate aggr0
· set snap sched to 0 0 0 on the aggregate aggr0

· spc1 data flexible volume (vol1):

§ create vol1 of size 8493820 MB

· set volume options on vol1:

o nosnap=on

o nosnapdir=off

· set snap reserve = 0 on vol1

· set snap sched to 0 0 0 on vol1

· set space reservation (guarantee) to “none”

Create zeroed luns with no space reservation on each NetApp controller with the following sizes and then map them to the windows igroup created earlier assigning each lun a unique lun id.

o 6 lun files for ASU1 of size 450100 MB each

o 6 lun files for ASU2 of size 450100 MB each

o 6 lun files for ASU3 of size 100022 MB each

Essentially it shows that of the ~19TB available from the raidgroups (after DP capacity loss), vol1 had ~2.4TB unused/unallocated, and that the aggregate aggr0 (or the collective of disks) had a total of ~3.9TB unallocated/unused – or almost ~35% of the disk capacity per controller (after DP losses) as whitespace.
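As a rough cross-check, you can get close to the vol1 whitespace figure using only the sizes quoted from the disclosure above (approximate, because NetApp mixes Base2 and Base10 in its reporting):

# Rough cross-check of per-controller vol1 whitespace, using only the sizes
# quoted from the full disclosure above (MB as disclosed; Base2/Base10 mixing
# makes the result approximate).

vol1_mb = 8_493_820                                       # vol1 created per controller
lun_files_mb = 6 * 450_100 + 6 * 450_100 + 6 * 100_022    # ASU1 + ASU2 + ASU3 lun files

unused_in_vol1_mb = vol1_mb - lun_files_mb
print(f"LUN files per controller:       ~{lun_files_mb / 1e6:.1f} TB")
print(f"vol1 whitespace per controller: ~{unused_in_vol1_mb / 1e6:.1f} TB")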
EMC:
NetApp deliberately Long-Stroked the EMC disks:
Page 60: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf
NetApp, it seems, also limited the performance of each EMC RAIDGroup by using only 12 disks per RAID group – basically offering the performance of 6 mirrored disks. E.g.:
naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 11 0_4_12 1_6_12 0_4_13 1_6_13 0_5_12 1_7_12 0_5_13 1_7_13 0_1_14 1_3_14 0_2_14 1_4_14
They then Long-Stroked each RAIDGroup by breaking each one into 3 LUNs per RG, for a total of 36 LUNs. E.g.:
naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 0 -rg 0 -cap 296 -sp a
naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 1 -rg 0 -cap 296 -sp a
naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 2 -rg 0 -cap 65 -sp a

Now, typically, we (the storage industry) quote a 15k spindle as being ~180 IOPS per disk on average, but we know that it’s more like ~250 IOPS at the outside and ~100 IOPS at the inside of the disk, so the average is about 180 end to end.

NetApp engineered the EMC to utilize all but 23GB of each disk when presented to the host > volume > ASU, essentially using almost all of the capacity of the RAID 1/0 RAIDGroups; the only reason is to make sure the EMC utilized almost the full length of the disks, or LONG STROKE.

There is no conceivable reason for the EMC to have so many small RAIDGroups with small LUNs in them; why not just have a LUN presented from each RAID 1/0 RAIDGroup, heck, even stripe inside the array?

NetApp created an aggregate many times larger yet only provisioned ~65% of the capacity of the Aggregates and RAID Groups under them, meaning that NetApp SHORT STROKED their array!

NetApp Per Disk Distribution:
· ~65.0% Disk Utilization (Short/Mid Stroke)
· Free Capacity: 46.53 GB

EMC Per Disk Distribution:
· ~82.3% Disk Utilization (High-Long Stroke)
· Free Capacity: 23.24 GB
Note: To even out the diagram and aid simplicity, I have standardized the two at 133GB, which is a balanced breakdown of the two configurations; RAIDGroup striping and other layout methods will lay out data slightly differently, but the result is the same. (NetApp place usable capacity with no reserve at 133.2GB and EMC at 133.1GB for a 146/144GB drive.)
Please also note: QD1 IOPS can vary across disk manufacturers, densities, platter counts, spindle diameters etc.
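To illustrate why the utilization difference matters, here’s a deliberately simplified sketch using the ~250/~100 IOPS figures above. My assumption is that per-disk random IOPS falls roughly linearly from the outer to the inner tracks, and it ignores the shorter seek distances that short-stroking also buys, so if anything it understates the gap.

# Deliberately simplified sketch of short vs. long stroking, using the
# ~250 IOPS (outer) / ~100 IOPS (inner) 15k figures quoted above.
# Assumption: IOPS falls roughly linearly from outer to inner tracks; the
# reduced seek distances of short-stroking are ignored.

OUTER_IOPS, INNER_IOPS = 250.0, 100.0

def avg_iops(utilized_fraction):
    # Average over the portion of the disk actually holding data,
    # assuming data is laid out from the outer edge inwards.
    innermost_used = OUTER_IOPS - (OUTER_IOPS - INNER_IOPS) * utilized_fraction
    return (OUTER_IOPS + innermost_used) / 2

print(f"NetApp (~65.0% utilized): ~{avg_iops(0.65):.0f} IOPS per disk")
print(f"EMC    (~82.3% utilized): ~{avg_iops(0.823):.0f} IOPS per disk")
print(f"Full stroke (100%):       ~{avg_iops(1.0):.0f} IOPS per disk")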

Examining the SPC-1 specification reveals the following:

http://www.storageperformance.org/specs/SPC-1_v1.11.pdf

2.6.8 SPC-1 defines three ASUs:

– The Data Store (ASU-1) holds raw incoming data for the application system. As the application system processes the data it may temporarily remain in the data store, be transferred to the user store, or be deleted. The workload profile for the Data Store is defined in Clause 3.5.1. ASU-1 will hold 45.0% (+-0.5%) of the total ASU Capacity.

– The User Store (ASU-2) holds information processed by the application system and is stored in a self-consistent, secure, and organized state. The information is principally obtained from the data store, but may also consist of information created by the application or its users in the course of processing. Its workload profile for the User Store is defined in Clause 3.5.2. ASU-2 will hold 45.0% (+-0.5%) of the total ASU Capacity.

– The Log (ASU-3) contains files written by the application system for the purpose of protecting the integrity of data and information the application system maintains in the Data and User stores. The workload profile for the Log is sequential and is defined in Clause 3.5.3. ASU-3 will hold 10.0% (+-0.5%) of the total ASU Capacity.

So, that’s:

o 45.0% for ASU1

o 45.0% for ASU2

o 10.0% for ASU3

Spreading the benchmark over almost the entire length of the disks for the EMC CX3-40, yet just over half-way for the NetApp, of course gives NetApp the advantage and disadvantages the EMC in both IOPS and latency.
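Incidentally, the NetApp lun-file sizes quoted earlier line up with that 45/45/10 split – a quick check, using the per-controller MB figures from the disclosure:

# Quick check that the NetApp lun-file sizes quoted earlier match the
# SPC-1 45% / 45% / 10% ASU capacity split (per-controller sizes in MB).

asu1 = 6 * 450_100
asu2 = 6 * 450_100
asu3 = 6 * 100_022
total = asu1 + asu2 + asu3

for name, size in (("ASU-1", asu1), ("ASU-2", asu2), ("ASU-3", asu3)):
    print(f"{name}: {size:>9} MB = {size / total:.1%} of total ASU capacity")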

6. Workload differences:

Another interesting dynamic was that NetApp ran different workloads for the FAS3040 vs. the CX3-40.

We already know by now that the test is completely invalid, but what would the result have been if NetApp had made the tests the same?

NetApp:

Page 70 – http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf :

APPENDIX D: SPC-1 WORKLOAD GENERATOR STORAGE COMMANDS

AND PARAMETERS

The content of SPC-1 Workload Generator command and parameter file, used in this

benchmark, is listed below.

javaparms=”-Xmx1024m -Xms512m -Xss256k”

sd=asu1_1,lun=\.E:

sd=asu2_1,lun=\.F:

sd=asu3_1,lun=\.G:

EMC:

APPENDIX D: SPC-1 WORKLOAD GENERATOR STORAGE COMMANDS

AND PARAMETERS

The content of SPC-1 Workload Generator command and parameter file, used in this

benchmark, is listed below.

javaparms=”-Xmx512m”

sd=asu1_1,lun=\.F:

sd=asu2_1,lun=\.H:

sd=asu3_1,lun=\.J:

Page 29 – SPC-1 Spec:

The storage for the SPC-1 workload consists of three Application Storage Units:

· ASU 1 – Data Store


· ASU 2 – User Store

· ASU 3 – Log/Sequential Write

7. Other differences and issues:

The following is a selection of other issues and differences from the full disclosure documents and other locations that I found most interesting, but did not feel needed to be addressed in as much detail as the rest. Maybe I will spend more time on them later.

NetApp:

On the NetApp system, we found we could improve performance by changing the memory management policy to reflect the fact that most SPC-1 data is not referenced repeatedly. This policy change can be implemented with the following priority settings with Data ONTAP® 7.3:

priority on

priority set enabled_components=cache

priority set volume <volume-name> cache=reuse

The net effect of these commands is to tell the memory system to reuse memory for newer items more aggressively than it would normally. (The enabled_components subcommand is new in Data ONTAP 7.3. If you are using Data ONTAP 7.2 you can skip that command.)

A couple of the things we tuned are still being refined, so they are enabled by the setflag command. In future versions of Data ONTAP either these flags will become options or they will disappear as the system becomes self-tuning for these features.

priv set diag
setflag wafl_downgrade_target 0
setflag wafl_optimize_write_once 0

The “downgrade_target” command changes the priority of a process within Data ONTAP that handles incoming SCSI requests. This process is used by both FC SAN and iSCSI. If your system is not also running NAS workloads, then this priority shift improves response time.

We’re explicitly calling out these settings because, based on our testing, we think they will yield performance benefits for online business application workloads. If you are interested, you can read more about them in a recent NetApp technical report.

System Flags:

o wafl_optimize_write_once: change default value of 1 to 0. This flag affects the initial layout of data within a newly created aggregate. The default data layout favors applications which do not overwrite their data.

o wafl_downgrade_target: change default value of 1 to 0. This flag changes the runtime priority of the process that handles the SCSI protocol for incoming Fibre-Channel requests. For storage systems that are not also servicing NAS requests this change to the process priority is recommended.

Let me read those again:

  •  The default data layout favors applications which do not overwrite their data.
  •  So that means this flag will not optimise the array layout for normal workloads with limited changes, like file shares, Exchange and SQL (without archiving of course), but for something closer to HPC data sets – and this seems to be compounded by the next flag:
  •  For storage systems that are not also servicing NAS requests this change to the process priority is recommended
  •  Now, with a bit of rewording, it says: if you are going to use this as a normal SAN-only array (no file), then setting this flag will ensure you get decent SAN performance, as you won’t be juggling CPU and memory with those functions.

· EMC don’t have this issue, because they build a platform with each component (file, block) being optimised for each function rather than building a General Purpose array that needs to be tuned to specific tasks.

Interestingly, Stephen Daniel was also the author of the NetApp TR whitepaper “Configuring and Tuning NetApp Storage Systems for High-Performance Random-Access Workloads”:

http://media.netapp.com/documents/tr-3647.pdf

Where he wrote:

“4. Final Remarks This paper provides a number of tips and techniques for configuring NetApp systems for high performance. Most of these techniques are straightforward and well known. Using special flags to tune performance represents a benchmark-oriented compromise on our part. These flags can be used to deliver performance improvements to customers whose understanding of their workload ensures that they will use them appropriately during both the testing and deployment phases of NetApp FAS arrays. Future versions of Data ONTAP will be more self-tuning, so the flags will no longer be required.

Does NetApp consider it normal to be asked to set a parameter that elicits the following response?

“Warning: These diagnostic commands are for use by Network Appliance personnel only”.

Clearly not.

Is this not in direct contravention of Storage Performance Council, SPC-1 specification terms and conditions?

Page 13 – http://www.storageperformance.org/specs/SPC-1_v1.11.pdf :

0.2 General Guidelines

The purpose of SPC benchmarks is to provide objective, relevant, and verifiable data to purchasers of I/O subsystems. To that end, SPC specifications require that benchmark tests be implemented with system platforms and products that:

1. Are generally available to users.

2. A significant percentage of the users in the target market segment (server class systems) would implement.

3. Are relevant to the market segment that SPC-1 benchmark represents.

In addition, all SPC benchmark results are required to be sponsored by a distinctly identifiable entity, which is referred to as the Test Sponsor. The Test Sponsor is responsible for the submission of all required SPC benchmark results and materials. The Test Sponsor is responsible for the completeness, accuracy, and authenticity of those submitted results and materials as attested to in the required Letter of Good Faith (see Appendix D). A Test Sponsor is not required to be a SPC member and may be an individual, company, or organization.

The use of new systems, products, technologies (hardware or software) and pricing is encouraged so long as they meet the requirements above. Specifically prohibited are benchmark systems, products, pricing (hereafter referred to as “implementations”) whose primary purpose is performance optimization of SPC benchmark results without any corresponding applicability to real-world applications and environments. In other words, all “benchmark specials,” implementations that improve benchmark results but not general, realworld performance are prohibited.

EMC:

CX3-40 Storage System

The following changes must be made on the CX3-40 storage system:

o Disable write caching on all underlying LUNs used for ASU1 and ASU2. Do not change the default setting of read/write caching for the ASU3 LUNs.

o Set the read policy: low water mark is 30%, high water mark is 50%.

o Set the read caches to 1716 and the write cache to 1300 MB.

Why were the low and high water marks set so low?

Why not present the cache testing? I have no doubt it performed poorly, but that would be a consequence of the configuration rather than a genuine result.

Closing notes:

Personally, I find what NetApp did here beyond reprehensible, disgusting and absurd.

Since March 2009, I have had no regard whatsoever for NetApp’s claims of performance, nor for their indignation when questioned or challenged.

Whilst I very much like NetApp’s products, I have absolutely no faith in NetApp as a company.

I have no trust in NetApp’s claims about their own performance or functionality, and will not accept them until I see it for myself; because NetApp have constantly tried to clamber their way upwards through deceit.

Additionally, due to NetApp, I have no belief that the Storage Performance Council has any merit.

How can the SPC have any credibility when they allow an array vendor to directly manipulate results? Why are there no standards for the testing hardware and software, why is there no scrutiny of sponsored competitive tests by the SPC and for that matter, why is there no scrutiny of sponsors own tests and hardware?

To give some analogies for what NetApp have done, it’s like:

o Golf club maker “A” claiming their iron is better than golf club maker “B”’s, but in ‘proving’ so, using theirs to hit a new ball from the fairway, on a tee, with a tailwind, and using “B”’s with an old ball, from the rough, in a headwind.

o Motorcycle maker “A” testing a similarly spec’d “B” bike, putting a 60kg/165cm rider on bike “A” and a 120kg/190cm rider on bike “B”.

You get the idea; not only did NetApp change the variables to suit themselves, but they also modified their own array to run well, whilst configuring the EMC array to perform poorly.

So, how would I have laid it out differently? – I’ll address that at another time.

Rest assured, it would have been very different – one of the many right ways.

Now, NetApp may claim that this is a normal environment, yet every best practice guide advises against it. I have never – and nor have any of my colleagues, with collectively very many years of experience – seen a Clariion laid out and configured in such a manner.

If yours even closely resembles 10% of this atrocity, then it is time for a new reseller/integrator or to send your admin on a training course.

Other than that, NetApp should be completely and utterly ashamed of themselves!

That’s it for now, I hope you made it to the end and didn’t succumb to boredom-related mortality.

Aus Storage Guy

“Keeping the bastards honest”


What is FUD?

As my first real post, I want to introduce you to a term I’m sure you well know – FUD.

What is FUD?

To quote the great Wikipedia: http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt

“Fear, uncertainty and doubt, frequently abbreviated as FUD, is a tactic used in sales, marketing, public relations, politics and propaganda. FUD is generally a strategic attempt to influence public perception by disseminating negative and dubious/false information designed to undermine the credibility of their beliefs. An individual firm, for example, might use FUD to invite unfavorable opinions and speculation about a competitor’s product; to increase the general estimation of switching costs among current customers; or to maintain leverage over a current business partner who could potentially become a rival.

The term originated to describe disinformation tactics in the computer hardware industry and has since been used more broadly. FUD is a manifestation of the appeal to fear.”

Fear – that everything will go horribly wrong if you use the other guy’s product.

Uncertainty – that a competitor’s product may not be capable of meeting your requirements.

Doubt – that you’re possibly making the wrong choice if you go with the other vendor.

A competitive “Land mine” if you will.

FUD is often a misguided attempt by a vendor to unfairly sway you towards their products and away from their competition – Almost every vendor uses FUD to unfairly target their competition and amazingly, almost every vendor is the first to cry foul when a competitor uses FUD against them.

Once in a while, FUD will have a sliver of truth, but the reality is: most FUD is developed from not understanding a competitor’s product, twisting a minor infraction, or outright lying.

Most FUD should be taken with an entire salt mine, not just a grain.

Most vendors using FUD either don’t know what they’re talking about or they’re plain telling lies – either way; you will always be best served doing your own research.

I want to give you an example – I work as a multi-vendor storage integrator and on a daily basis work with many vendors and their products, not just talking the talk, but walking the walk; I help sell it (without FUD) and I put it in.

Now, every once in a while, a vendor will come and give a presentation about the virtues of their product; more often than not, they won’t be able to help themselves and they’ll proclaim Vendor B,C,D and E’s products are rubbish. – This normally happens with the new guy who’s not met me yet.

The trouble is, most of the time I know that product intimately, and I proceed to explain to them not to use FUD with me; they’ll of course claim it’s not FUD, to which I ask: “What is your personal experience with said product?” – Almost every single time the response has been a very solemn: “None, but our internal competitive analysis program told us that.”

Now, I don’t intend to be mean to these people, but I don’t appreciate being lied to; I feel they have no respect for themselves, me or their competition; a major rule of business is always respect your competition.

I consider myself a fair kind of person, so I will typically sit down with the vendor’s representative and show them that what they’re espousing is incorrect, demonstrate that what their company has told them has limited or no factual basis; and most of them appreciate it and cease using the FUD their employer has given them.

Unfortunately, some do not cease and continue to use the exact same FUD, sometimes even blatantly in front of me.

For one particular vendor, this went horribly wrong one day:

It was a pitch to a new customer and it was going well: they liked the features, they liked the value, and it would fit the customer’s budget.

Then the vendor’s account manager stood up and started to abuse the competitor, his pre-sales engineer proceeded to back him up, and the customer stood up with a raised hand and said:

“Stop there; we have 4 of <competitive vendor>’s products here, I managed it for 2 years before I became the IT manager, my guys here manage it now, and the three of us know it well – You are liars and I have no more time for you”

(I paraphrase as the exact words elude me)

That was the end of the meeting, deal lost, customer lost and potentially many more like it, I felt ashamed by association, but couldn’t find the words – I knew the customer was right and wanted to agree with him – but I was so ashamed, angry, full of all sorts of emotions, that I could only walk away in disgust.

I was working for another reseller at that time; this was my first dealing with this particular account manager, but not the first with this vendor. Before the meeting, I briefed the AM and pre-sales engineer about the customer and warned them not to use FUD.

We often hear the phrase: “There are Lies, Damn Lies and then there’s statistics”.

Guess what: among other things, the account manager used benchmark statistics to claim the competitor’s product wouldn’t do X IOPS; the trouble is, the customer was achieving many times more than that – they put it in their RFI, for goodness sake. It read something like: “our current production SAN environment is operating at ‘Y’ IOPS avg.; any offerings must be capable of meeting or exceeding this capability”.

Vendors will use performance benchmark comparisons as proof of their superiority; usually they will deliberately engineer the competition’s product to be slower than their own and claim it was fair and accurate – That this is how limited the competitor’s product is.

In my experience it is never even remotely accurate – It’s a despicable practice

I plan on showing how these vendors use this kind of tactic in my next post. It’s going to be a stinker!

That’s all for now; but as a parting recommendation to any customers out there reading this:

  • If a vendor starts claiming that a competitor’s product is inferior, do your own research, ask the competitor to take you to their lab and demonstrate and benchmark for yourself – then buy that one if it suits your needs.
  • If a vendor shows you competitive statistics, ask the vendor if they were the sponsor of the test; if so, completely disregard their stats and ask for a proof of concept from both sides to show performance, then choose whichever meets your needs. (I don’t need to tell you this, I’m sure.)

Good luck out there, fight the good fight.

Aus Storage Guy.

Introducing Aus Storage Guy

The first post should always be an introduction, right?

So to quote the great Austin Powers: Allow myself to introduce….. myself.

I have been in the IT industry for over 18 years, still quite fresh by many standards, and I’ve long since realised that there is still so much to learn. I’m discovering new and interesting things every day.

My day job is as a storage integrator in pre-sales, storage architect, implementer and problem solver – working end-to-end to address my customers’ often complex needs.

Over the years, I’ve been privileged to have been working in, on and around the IT storage industry, which as a niche is very complex, fraught and interesting; it’s often a completely misunderstood and undervalued segment of IT, but the reality is that of all the things a company could lose from its IT infrastructure, data is irreplaceable; that’s why working in data storage is a challenge that is worthy of the rewards and the difficulties.

My experience in storage goes back to my early days: I entered the industry towards the end of the Mainframe/Mid/Mini computer domination, learning the ropes on big tin before moving over to the decentralised model adopted later in the ’90s, worked feverishly during the late ’90s on the year 2000 bug, and watched with amusement the bursting of the tech bubble…. And I shook my head at the detractors who said the Y2K bug was overstated –

I was involved in a great number of simulations of financial systems prior to Y2k which showed disastrous consequences.
(lots of money would have been lost if not corrected; Japanese power plants were hit, as were satellites; someone was charged US$91,250 for having a video that was 100 years overdue; the first child born in Denmark on New Year’s Day was registered as being 100 years old at birth; a German man reported he was credited $6m on the 30th of December 1899; Telecom Italia sent out bills dated 1900) – Yes, they really happened.

My experience in mass data storage goes back to those heady days of mainframes and minis, configuring high-performance (for the time) storage for said mainframes and minis; then later, with the great decentralisation of compute during the latter half of the ’90s, I carried this experience over to those NT servers everyone had been adopting – building storage arrays from scratch was the order of the day – and now back into the world of centralisation with VMware and the like.
(It’s always funny seeing the complete cycle.)

However, I’ve been very fortunate to have been able to focus almost exclusively on data storage for the last decade, as it’s presented many great opportunities to delve deeper into the very concepts I had formed and broken in my mainframe days, and every day I discover new and exciting information.

My training and experience is, luckily enough, multi-vendor – which has given me a very broad perspective on many of the vendors, their arrays and practices, and affords me great insight into the correct selection of an array and its needs and, for that matter, my customers’ needs.

Which brings me to my storage experience and knowledge, which has been garnered and refined over the years (in no order of preference or strength):

NetApp

  • FAS
  • E-Series (previously LSI Engenio)

EMC

  • Symmetrix / DMX / VMAX
  • HADA / Clariion / Celerra / VNX
  • Centera
  • Avamar
  • Data Domain
  • and Networker

HP

  • VA / EVA
  • XP (HDS OEM)
  • 3par
  • Data Protector

HDS

  • Thunder / Lightning
  • AMS
  • USP / VSP

Dell

  • Compellent
  • Equalogic

IBM

  • N-Series (NetApp OEM)
  • DS (LSI Engenio OEM)
  • XIV

Brocade / McData

Cisco MDS

TMS RamSan

And a whole host of others – too many to list them all.

As a rule, I despise vendor FUD; because of my multi-vendor experience, it irritates me to no end when vendors use it, and one of my goals with this blog is to dispel vendor FUD once and for all.
It will be detailed, but I assure you, I will do my utmost to ensure that it’s accurate and founded.

My intention is not to cause angst, only to reveal the truth and anytime I’ve got it wrong, I will be glad to correct myself.

The other goal is to start a series on the virtues of Enterprise Storage, hoping to educate on the importance and differences and to aid my readers in making the most informed decisions possible.

I actively encourage constructive feedback and comments and if there’s something about data storage you want to understand, please feel free to ask.

That’s it for the moment, I hope to bring you more as time allows.

Aus Storage Guy.