Soooo, What’s a “Converged Infrastructure” Exactly?

I have been looking at a lot of stuff relating to the notion of a converged infrastructure. The essential idea is this: The legacy platform, i.e. a single-image, physically booted server, typically attached to a SAN, is hopelessly outdated. This design is essentially 30 years old, and does not take into account many interesting technology changes, notably:

  • Virtualization
  • Deduplication
  • Flash storage, including both PCIe-based flash and SSDs

Thus, the converged infrastructure platform looks dramatically different from the legacy platform. Essentially, a converged infrastructure consists of a large cluster. A clustered I/O subsystem extends across all of the nodes in this cluster, and all I/O to the resulting file system is reflected across those nodes. This is implemented at either the block or the file (NFS) level, depending on the vendor involved. All of the storage hardware is direct attached. No SAN, in other words. Thus, a converged infrastructure is designed for data center consolidation, similar to the virtualization hypervisor market. Some of the converged infrastructure vendors run natively on a hypervisor, and some do not. More on this later.
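
To make the "reflected across the nodes" idea concrete, here is a minimal Python sketch of how a clustered I/O layer might place each write on more than one node's direct-attached disks. The node names, replication factor, and hash-based placement are my own illustrative assumptions, not any particular vendor's algorithm:

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]  # hosts with direct-attached disks
REPLICAS = 2  # each write is reflected onto this many nodes

def placement(block_id: str) -> list:
    """Deterministically pick REPLICAS nodes for a block, so any node can
    locate a copy without asking a central SAN controller."""
    h = int(hashlib.md5(block_id.encode()).hexdigest(), 16)
    start = h % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

def write_block(storage: dict, block_id: str, data: bytes) -> None:
    # The clustered I/O layer sends the same write to every replica node.
    for node in placement(block_id):
        storage.setdefault(node, {})[block_id] = data

cluster = {}
write_block(cluster, "vm42-disk0-block7", b"...")
print({node: list(blocks) for node, blocks in cluster.items()})
```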

The converged infrastructure market is crowded, that’s for sure. It would be difficult in a blog like this to cover all of them, so I will focus on three: Nutanix, Simplivity, and ScaleiO.

And, of course, my focus is on the Oracle space. As usual, I think in terms of running Oracle as efficiently and inexpensively as possible. Also, in terms of hypervisors, I will only address VMware. (I have no technical exposure to Hyper-V or KVM.)

Starting with Nutanix, the architecture is best described in the Nutanix Bible by Steven Poitras. I have slightly reworked one of his graphics to reflect my bias, again Oracle on VMware:

nutanix architecture

Essentially, all of Nutanix’s IP runs in a VM, referred to as the Controller VM (CVM). The hypervisor and the CVM boot off of a small partition. Then the CVM connects to the other nodes in the Nutanix cluster and assembles a clustered file system, which is published as an NFS export. The hypervisor then mounts this export as an NFS datastore. From there, all user-space VMs (including Oracle VMs) are booted off of .vmdk files on the NFS datastore. All block-level I/O (other than boot) is handled by the CVM via the virtualized SCSI controller.

Again, this is from the VMware perspective. The architectures for other hypervisors are different. But I digress.

Moving to Simplivity, my source for their technical material is the OmniCube Technical Deep Dive. (At 10 pages, this deep dive is not quite so deep as I would prefer, for sure.) The Simplivity architecture is very similar to Nutanix’s, in all respects except one: Simplivity adds a piece of hardware which they call the OmniCube Accelerator Card (OAC). Otherwise, the diagram for Simplivity looks just like the one for Nutanix:

simplivity architecture

Again, all of Simplivity’s IP runs in the OmniCube Virtual Controller (OVC) VM, other than the OAC itself, of course. The OAC is a custom-built PCIe card which, among other things, acts as the I/O controller. Like Nutanix’s CVM, the OVC exports an NFS mount, which ESXi mounts as an NFS datastore. From there, all user-space I/O, including Oracle’s, runs through the .vmdk layer within the hypervisor.

Now, looking at ScaleiO, the architecture is dramatically different from either Nutanix or Simplivity. First of all, Nutanix and Simplivity are both hardware platforms, complete with custom-built hardware. ScaleiO is a piece of software. It is designed to be layered on top of a legacy platform and provide a slick, easy path to a converged infrastructure. Specifically, ScaleiO does not require the use of a hypervisor, and thus can run in a physically booted context. This is one of ScaleiO’s main advantages over hypervisor-based converged platforms like Nutanix and Simplivity.

ScaleiO consists of two major components: the ScaleiO Data Client (SDC) and the ScaleiO Data Server (SDS). In a Linux context (again, the only OS I care about deeply), both the SDS and the SDC are implemented as kernel loadable modules, similar to device drivers. The SDS manages the local storage hardware, connects with the other nodes to form the ScaleiO cluster, maintains cache coherency, and so on. The SDC then connects to the SDS, which appears to it as a SCSI target. Thus, the SDS publishes storage objects to the SDC, which the local OS sees as LUNs. From there, the local OS simply performs normal I/O.
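
Here is a conceptual Python sketch of that SDC/SDS relationship. It is a toy model under my own assumptions (real ScaleiO stripes and mirrors volumes across SDS nodes; the class and method names are invented for illustration):

```python
class SDS:
    """ScaleiO Data Server: owns the direct-attached disks on one node."""
    def __init__(self, node, local_disks_gb):
        self.node = node
        self.capacity_gb = sum(local_disks_gb)

class SDC:
    """ScaleiO Data Client: sees pooled SDS capacity as ordinary SCSI LUNs."""
    def __init__(self, servers):
        self.servers = servers

    def create_volume(self, size_gb):
        pool = sum(s.capacity_gb for s in self.servers)
        if size_gb > pool:
            raise ValueError("volume exceeds pooled capacity")
        # Real ScaleiO stripes and mirrors the volume across SDS nodes;
        # here we only record which nodes back it.
        return {"size_gb": size_gb, "backed_by": [s.node for s in self.servers]}

cluster = [SDS(f"node{i}", [800, 800]) for i in (1, 2, 3)]
lun = SDC(cluster).create_volume(1000)
print(lun)  # the local OS would simply see this as another LUN
```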

The stack diagram for a physically-booted ScaleiO cluster node could not be simpler:

scaleio architecture

ScaleiO can also be run in a virtualized context. In this case, predictably, ScaleiO looks very similar to Nutanix or Simplivity, in that it has a controller VM as well, called the ScaleiO VM (SVM). This SVM runs both the SDS and the SDC. All I/O is channeled through the SVM. However, everything in ScaleiO is implemented in a block, rather than file, manner. Thus, the ESXi hypervisor sees a block storage pool, which it formats as a VMFS file system rather than using NFS. (The SVM provides an iSCSI target for this purpose.) Here is how ScaleiO looks in a virtualized configuration:

scaleio virt architecture

The other interesting thing about ScaleiO is that it allows you to run in either a converged or a diverged manner. Since the client (SDC) and server (SDS) are separate components, you can run them on separate hardware, effectively turning the ScaleiO server cluster into a storage array. See the following graphic for an example (thanks to Kevin Closson for this):

scaleiodivergedmodel

Of course, you can also run ScaleiO in a converged manner, in which case the platform looks very much like Nutanix or Simplivity (with the exceptions noted).

Now, looking at each of these architectures in the context of running Oracle, it appears that ScaleiO has the obvious edge. This is because:

  • Both Nutanix and Simplivity require you to virtualize in order to run on their platform. ScaleiO does not. Even the most ardent proponent of virtualizing Oracle (and I would certainly qualify on that score) would want to maintain the option to run on bare metal if necessary. Adopting a platform which absolutely requires the customer to virtualize all Oracle workloads is probably not going to work for many Oracle customers.
  • The use of a VMware NFS datastore as the primary container for Oracle storage is problematic. While I was with the EMC Global Solutions Organization, we tested NFS datastores for Oracle datafiles. We saw a huge performance impact, relative to either normal NFS (i.e. directly mounted on the guest OS, using the Oracle Direct NFS client) or a block-based VMFS file system using conventional SAN storage. There is no reason this would be any different in a converged context. Think about it. An I/O against an Oracle ASM diskgroup which is stored on a .vmdk file which is, in turn, on an NFS datastore traverses an enormously long code path. Compare that to ScaleiO, especially in a physically booted context, where the I/O code path is no longer than using a normal device driver! (See the sketch after this list.)
  • The diverged approach for ScaleiO is arguably custom-built for Oracle. I have made a pretty good living for the last (approximately) 17 years of my life by understanding one thing: Oracle is expensive, and therefore the Oracle-licensed CPU is the single most expensive piece of hardware in the entire configuration. Offloading any work from the Oracle-licensed CPU onto a non-Oracle-licensed CPU is typically a very profitable decision. (Arguably, this is why Oracle database customers adopted storage array technology so widely: by offloading utility operations like snaps, test/dev cloning, data warehouse staging, backup, and the like onto a storage array, the customer preserves the precious Oracle-licensed CPU to do what it does best: run queries.) Both Nutanix and Simplivity require the customer to run in a converged manner, thus using the Oracle-licensed CPU (on the ESXi host in this case) to run storage I/O operations. That’s wasteful of that precious Oracle-licensed CPU. Thus, it is entirely possible that a fully converged infrastructure may be a poor fit for Oracle, because of simple economics. By enabling a diverged configuration (i.e. looking more like a traditional SAN storage array), ScaleiO neatly optimizes the Oracle CPU.
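
As promised, here is a back-of-the-envelope sketch of the code path argument. The layer lists are my own simplification of the two stacks, not measured data, but they convey why the .vmdk-on-NFS path worries me:

```python
# Layer lists are illustrative simplifications, not measured data.
io_paths = {
    "ASM on .vmdk on NFS datastore (Nutanix/Simplivity style)": [
        "Oracle ASM", "guest SCSI driver", "hypervisor vSCSI layer",
        ".vmdk layer", "ESXi NFS client", "controller VM (CVM/OVC)",
        "clustered file system", "local device driver",
    ],
    "ASM on a ScaleiO LUN, physically booted": [
        "Oracle ASM", "SDC kernel module", "SDS on the owning node",
        "local device driver",
    ],
}
for name, layers in io_paths.items():
    print(f"{name}: {len(layers)} layers")
    print("   " + " -> ".join(layers))
```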

Of course, it remains to be seen how this all pans out. It’s early. For now, though, ScaleiO is looking good to me.

Posted in emc, oracle, storage, Uncategorized

Is ExaData an “Appliance”?

My boss, Sam Lucido, raised the following question on the Everything Oracle at EMC website:

Appliances such as microwave ovens, refrigerators, iPods, iPads and TVs are excellent examples of the ease-of-use approach. Bringing the inherently complex world of Oracle databases together with the ease-of-use approach of appliances is challenging. By definition if Oracle Exadata is an appliance then its use should be simple, require relatively little maintenance and like a refrigerator do its job which in this case is run databases at extreme performance levels. If Oracle Exadata isn’t an appliance then what is it?

I found this question quite compelling. Remember that I grew up in a truly appliance-oriented environment (NetApp); see my first blog posts for more information on my background there. For this reason, I think I understand pretty well what an appliance is.

At NetApp (and during my early days at NetApp, filers were true appliances by any measure), the appliance concept meant that the device was a toaster: One lever to push down, and one knob to turn. That’s it. Plug it in. It works. No step 2.

Cisco really originated the term appliance. The Cisco router replaced the previous form of router, which was typically a UNIX box running routed. As such, the Cisco router pretty much defined the concept of what it means to be a true appliance.

The folks at Cisco made the following argument: We don’t need all of the infrastructure of UNIX to do routing. A UNIX box has to do a lot of things. A router really only has to do one thing: Networking. We could make a dramatically simplified device which would be able to do routing really well, at a much lower cost than a UNIX box.

Based upon this concept, an appliance has the following characteristics:

  • Extremely simple interface. It should be vastly simpler than doing it the non-appliance way. For example, a Cisco router is vastly simpler than running routed on a UNIX box.
  • A single purpose. The device must be dedicated to doing one thing, but doing it extremely well. Like the way a Cisco router is much better at routing than a UNIX box running routed. Or the way a NetApp filer is much better at NFS file serving than a UNIX box running nfsd. You get the idea. By dramatically reducing the number of functions the device performs, you also dramatically reduce the amount of code that must run on the device. (The original NetApp ONTAP OS was a single-threaded, 16-bit OS with only a few hundred thousand lines of code.) This leads to the next feature of an appliance, which is:
  • Vastly reduced cost. The original NetApp filer was about a $5,000 device. An equivalent UNIX box used as an NFS file server ran around $50,000. Similar cost differences existed for Cisco routers vs. UNIX boxes as routers.
  • Transformative technology. An appliance, if it is truly an appliance, becomes the obvious and natural way to do things. Within a very short period of time after introducing the router, Cisco controlled the router market. They completely displaced the previous way of doing routing. The same thing occurred in file serving with NetApp.

By any reasonable measure, Oracle ExaData fails all of these tests:

  • It has as complex an interface as any Oracle database server (which is to say it runs the most complex and expensive piece of software ever written for general purpose use). Certainly not appliance-like.
  • An Oracle ExaData rack contains general purpose compute servers, which can be used to run basically anything you want. You can certainly load any Oracle application on it, and no one would claim that an Oracle database server is an appliance!
  • Oracle ExaData is manifestly more expensive than a normal, open-systems database server, and vastly more expensive (assuming intelligent management) than using VMware vSphere for virtualizing Oracle database servers.
  • Oracle ExaData is possibly addictive in the Big Blue sense, but it is certainly not a transformative technology in the same way that a Cisco router, iPad, iPhone, or such is.

In terms of an analogy that works, I like to use cars. The two companies in the car business that manufacture appliance cars are Honda and Toyota. The Honda Civic is an appliance car, as is the Toyota Camry. Either one of these cars provides all of the appliance advantages:

  • They have a radically simplified interface. Everything about these cars is designed to make them effortless to operate. Because they are so simple, they are also very reliable and efficient.
  • They are single-purpose vehicles. They get you from point A to point B. That’s it. Nothing fancy.
  • They are sold at a very reasonable cost, relative to non-appliance vehicles (such as BMW, or Mercedes for example).
  • Once you have driven a Honda Civic or Toyota Camry, assuming you are an appliance driver (and there are a lot of folks who are appliance drivers), these cars are completely addictive. You simply trade one in for the new model once the old one wears out (and they take a long, long time to wear out). I have known folks who have been driving these cars (in various model years) their entire lives.

Using the car analogy, ExaData is definitely not a Honda or a Toyota. It is not even a BMW or a Mercedes. It is a Ferrari. It is a tricked out, high performance machine. It is very fast, no question. It is *&^% expensive though. And it is very, very complex and demanding to drive.

Posted in appliance, emc, exadata, netapp, oracle

Oracle Licensing on VMware – no magic

Acknowledgements to Bart Sjerps for this content.

There seems to be a lot of confusion about licensing when customers consider running Oracle databases on VMware. Part of the confusion is created by Oracle on purpose (classic FUD), by suggesting that licensing is more expensive on VMware than on physical servers. The reality couldn’t be more different – I strongly believe that many customers can actually *save* on database licenses by going virtual. But to understand how to achieve this, you need to know a few things – I hope I can clear this up in a short explanation. I will keep the discussion to Oracle database licenses and ignore application/middleware licensing for now.

License models

Customers typically license their basic database via one of three options:

  1. License by CPU (core) – the more CPU cores, the more licenses are needed. There is a processor core factor that depends on the type of CPU and can be 0.25, 0.5, 0.75, or 1.0. (A worked sketch follows this list.)
  2. License by named user – the more named users, the more licenses are needed. The number of CPUs does not matter, nor does the total number of databases. Typically one license pack per 25 users.
  3. Enterprise License – the customer negotiates a contract for the whole company and afterwards can deploy as many databases on as many servers/CPUs as they want.
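
For the CPU (core) option, the math is simple enough to show. A minimal sketch; the Intel core factor of 0.5 comes from Oracle's processor core factor table (linked under Resources at the end of this post), and the server size in the example is hypothetical:

```python
def licenses_needed(cores, core_factor):
    """Oracle CPU licensing: physical cores x processor core factor."""
    return cores * core_factor

# Hypothetical 2-socket, 16-core Intel server (Intel core factor: 0.5):
print(licenses_needed(cores=16, core_factor=0.5))  # 8.0 EE licenses
```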

If a customer uses option 2 or 3, then it does not matter whether they run virtual or physical. But there are also no license savings possible without re-negotiating their contracts. I don’t want to go so far as to suggest that customers change their license models, so we will leave this as-is for now.

In my experience, most enterprise customers use either CPU licensing or enterprise contracts. Some have different licensing methods for different business units. Oracle can be very creative in customer-specific contracts, so expect to find a different situation for each individual customer.

But let’s assume CPU licensing for the sake of this discussion.

Maintenance & support

Users typically buy the CPU licenses up front, but then have to pay maintenance for the time they use the licenses. Yearly maintenance cost is about 25% of the license list price. I have no information on typical discounts, but I expect customers to get at least 50% off the price list (only on licenses, though, not on maintenance, AFAIK).

Database Edition and options

The plain database license comes in 3 versions (for servers):

  1. Standard Edition One – Maximum 2 processors, no options allowed. Only used for testing and very small deployments
  2. Standard Edition (SE) – Maximum 4 processors, no options allowed. Only used for smaller sizes and workloads (but stay tuned)
  3. Enterprise Edition (EE) – No limitations and on top of EE, you can have many licensed features. Most customers will use this, at least for production databases

On top of the basic Database license, most customers use a set of options, each requiring additional licenses per CPU. The most common options are:

  • Real Application Clusters (RAC) – allows many servers to run the same database (active-active clustering), providing scale-out performance and high availability.
  • Real Application Clusters One Node – the same, but the database can run actively on only one node at a time. For high availability only.
  • Active Data Guard – remote replication using log shipping. Note that standard Data Guard is free, but Active Data Guard allows the standby database to be opened for read-only purposes and offers some extra features.
  • Partitioning – allows tables to be split up into smaller chunks. Absolutely required when running large databases that cannot tolerate downtime. Eases administration work and offers some performance benefits.
  • Real Application Testing – allows workloads to be recorded and re-played on another database to do performance and functionality testing
  • Advanced Compression – allows database blocks to be compressed – requiring less storage and boosting performance (in most cases).
  • Diagnostics Pack / Tuning Pack – provide automated reports. Oracle AWR (Automatic Workload Repository, a performance reporting facility) is part of the Diagnostics Pack.

In my experience, nearly all customers have partitioning. Most customers have tuning/diagnostics pack. Some customers have RAC. Some customers have the other options. There are more options available but these are the most common.

Many customers have 3 or more options – sometimes the options cost more than the base database license. And especially if they use RAC, they will have most of the other options, too.

Running on a cluster

If a database runs on a cluster, then Oracle assumes the database can make use of any processor in the cluster. This is independent of what kind of cluster is used (MSCS, HP MCSG, VMware, Oracle RAC, etc.).

This is basically the foundation for all the FUD and confusion. For example, if you deploy a VMware farm (cluster) of 16 servers, and all of the virtual machines run all kinds of stuff (file/print, Exchange, apps, etc.), and only one tiny virtual machine in the corner, with only one virtual CPU, runs a small Oracle database, you would expect to pay for only one CPU core. But Oracle’s reasoning is that this tiny VM can be dynamically moved (vMotion) to any node in the cluster and run on any processor. Therefore, all CPUs have to be fully licensed by Oracle. So in this case, running the single database on a (small) physical server would be cheaper than running it on a VM in the farm.
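
To put numbers on that scenario, here is a small sketch using placeholder farm sizes; only the Intel core factor of 0.5 comes from Oracle's table:

```python
CORE_FACTOR_INTEL = 0.5
HOSTS, CORES_PER_HOST = 16, 8   # hypothetical farm

farm = HOSTS * CORES_PER_HOST * CORE_FACTOR_INTEL  # every core in the cluster
dedicated = 2 * CORE_FACTOR_INTEL                  # a small 2-core physical box

print(f"licenses if the tiny VM stays in the farm: {farm:.0f}")       # 64
print(f"licenses on a small dedicated server:      {dedicated:.0f}")  # 1
```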

Total cost of the stack

In a typical database server deployment, the cost of the database licensing is far greater than the cost of the hardware and OS licenses combined. I have no hard numbers, but I assume the average DB license cost (plus options) is 10 times the cost of the server + OS.

So a $5,000 server would typically require $50,000 in licenses. Then, because maintenance is 25% yearly, the total cost of licenses over a 3 to 5 year period is even higher – for a 5-year TCO, the total license cost might be $75,000 (an assumption – it could also be closer to $100,000 – and no, I didn’t make a mistake with an extra zero; Oracle *really* is this expensive).
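
Running those ballpark figures (and they are only ballpark figures) through the arithmetic:

```python
server_cost = 5_000
license_cost = 10 * server_cost           # ~$50,000 list, per the ratio above
annual_maintenance = 0.25 * license_cost  # 25% of list per year

for years in (3, 5):
    total = license_cost + years * annual_maintenance
    print(f"{years}-year license + maintenance: ${total:,.0f}")
# 3 years: $87,500; 5 years: $112,500. Discounts on the license portion
# bring this into the $75,000-$100,000 range quoted above.
```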

Utilization

It is very hard to size a typical Oracle database application. There are no good methods or calculations to figure out how much CPU power, disk I/O, and memory are needed to run a given app. So historically, project teams size their database servers for peak loads, and because they cannot predict how big the peak load is, they double the resources “just in case”. The end result is that most database servers are way oversized in terms of CPU and memory.

Most physically deployed database servers will average about 10-15% CPU load (or less). However, they will peak to higher loads at certain times, such as Monday morning when many users log in, or when month/quarter/year-end batch processing is started, etc.

The utilization numbers can also be influenced by other tasks running on the processors. Some common causes of “artificially high” CPU loads on database servers:

  • CPU is involved in storage mirroring (i.e. host-level mirroring, using Oracle ASM or a UNIX volume manager)
  • CPU is involved in file transfers over the IP network
  • Backup (non-serverless, using CPU, Network and I/O bandwidth)
  • Customers run the application server on the same machine – this alone can drive CPU load from 10% to 90% or more!
  • The same goes for middleware and enterprise service buses (think Oracle BEA, IBM WebSphere, SAP NetWeaver, etc.)
  • A bunch of monitoring/management agents burn CPU cycles (Tivoli, BMC, HP OpenView, CA, etc.). Each agent may consume only 1%, but add it up and you have another 5-10% of overhead.
  • Administrators generate database dumps/exports and run their own reports, scripts, and tools. They also run ad-hoc queries that should not be on production.
  • Poorly tuned database servers cause paging and other CPU overhead – hard to diagnose but driving up CPU and I/O significantly.
  • Database admin tasks (table reorganizations, (re)building indexes, converting tablespaces, …)
  • And so on…

All of these cause the processors, expensively licensed for database processing, to do other stuff.

So if a server is running at 15% utilization, then the utilization caused by the database workload itself might only be 10% and the rest caused by other stuff (whether needed or not).

Needless to say, Oracle likes customers to use their expensively licensed CPUs for other tasks, because it forces them to buy additional CPUs sooner and therefore drives Oracle’s license revenue.

Isn’t life great for an Oracle rep? 😉

Number of databases

Most customers run many databases. For the average enterprise customer that I visit, 100+ databases is a normal number. A big global company that I visited runs 3000+ Oracle databases worldwide (and this is only the scope of this specific project team). Imagine the cost of licensing all of these databases on individual servers…

Why so many? Well, customers do not like multiple applications sharing one database (and often this is not even supported). So if you run SAP ERP, Oracle JD Edwards, your own banking app, and a few others, they all require their own production database.

For each production database, you might find an acceptance environment, test system, development server, maybe a staging area to load data into the data warehouse, maybe a firefighting environment, a standby for D/R, a training system and so on. Customers will rarely share production environments on the same server (unless virtualized or at least with workload management segregation). Sometimes they share a few databases for non-prod on a server. So for, say, 100 databases, the average customer runs between 30 and 50 (physical) servers.

Power of big numbers

It does not require rocket science to understand that many of these databases do not require peak performance at the same time. A development system typically drives workload during daytime (when developers are coding new application features). A data warehouse runs queries during the day and loads in the evening. For a production system it depends on the business process. An acceptance system might sit idle for weeks and then suddenly peak for a few days preparing for a new version deployment into the live production system. And so on.

So what if you could share resources across databases – without influencing code levels, security, stability and so on?

If that were possible, you would not size for “peak load times two” anymore. You would size for what you expect and assume an average utilization of, say, 70% over the whole landscape. If one database needs extra horsepower, there is enough available in the landscape.

How much license cost would you save by bringing down the number of CPUs so that utilization goes up from 10% to 70%?

What would be the effect on power, cooling, floor space, hardware investments, time-to-market?

What would be the business advantage of not limiting production performance of a single server, by whatever was sized during initial deployment? Risk avoidance?

What would be the business advantage of solving future performance issues by just adding the latest and greatest Intel server to the cluster and vMotioning the troubled database over?

Wasn’t this exactly why we started server virtualization in the first place, about 8 years ago? And why EMC acquired VMware?

Wouldn’t you think the average Oracle sales rep is scared to death when his customer starts considering running their databases on a virtual (cloud) platform? Would it make sense for him to drive his customers mad with FUD around licensing, support issues, and whatever else he can think of to prevent his customers from going this way? Even threatening to drop all support if they continue in that direction?

If Oracle is scared of losing license revenue, wouldn’t you think there is a huge potential for savings for our customers here?

The journey to the private database cloud

So how should we deal with this?

A few starting points:

  • Oracle supports VMware. Period. Any other claim from Oracle reps can be taken with a grain of salt (to be more specific: it’s nonsense).
  • Oracle does NOT certify VMware. Then again, Oracle does not certify anything except their own hardware and software. But IMO, support is all you need, and the discussion around certification leads nowhere.
  • Oracle might ask the customer to recreate issues on a physical server if they suspect problems with the hypervisor. Isn’t it great that we can do this easily with Replication Manager? 😉
  • Oracle only supports Oracle RAC on VMware for one specific version (11.2.0.2). Running any other version of RAC on VMware is not recommended because of support issues. This is expected to change in the future.
  • Both EMC and VMware offer additional support guarantees for customers deploying Oracle on VMware. So where Oracle pulls back, EMC and VMware will fix any issue anyway.
  • Performance is no longer an issue. With vSphere 5, a single virtual machine can have 32 virtual processors, 1 TB of RAM, and drive 1 million IOPS. Only the most demanding workloads would not fit in this footprint. But with customers running hundreds of databases, maybe we should start with the 95%+ that DO fit and make significant savings there. By the time we’re done, VMware will have vSphere 6, and who knows what happens then.

How to get around the licensing issue

As I said, Oracle requires licenses for all servers in a cluster. So how do you limit the number of licenses? By deploying an Oracle-only VMware cluster. Run only Oracle databases here. No apps, no middleware, no file servers, and try to move off everything that does not relate to database processing. No host replication, no storage mirroring, etc.

Say you have a legacy environment with 10 servers, each with 16 cores, so you have 160 cores licensed with Oracle EE and a bunch of options. Average CPU load is 15%, but let’s assume 20% to be conservative.

I claim that a single VMware cluster of 3 servers, each with 32 cores, will easily do the job. Now we have 3 * 32 = 96 cores to be licensed. 96/160 = 0.6 = 60%, so we saved 40% on licensing right away. The average CPU load on the whole cluster will probably still be much less than 70%, so we can gradually add a bunch more databases until we average out at 70%.

If the old system was not running Intel x86 but SPARC, PA-RISC, or POWER CPUs, then the processor core factor was probably 1.0 or 0.75. Intel has 0.5. So the 96 Intel cores require only 48 full licenses; compared to the same core count at a 0.75 factor, that is another 33% savings.
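
Here is the whole calculation in one place; the utilization-based sizing line is the same reasoning as above, just made explicit:

```python
# Sizing by utilization: new_cores ~= old_cores * old_util / target_util.
old_cores, old_util, target_util = 160, 0.20, 0.70
print(f"cores needed at 70% load: {old_cores * old_util / target_util:.0f}")  # ~46

new_cores = 3 * 32          # the proposed 3-node, 32-core cluster
print(f"core ratio: {new_cores / old_cores:.0%}")  # 60%, i.e. 40% saved

# Processor core factor: 0.5 for Intel, 0.75 (or 1.0) for many RISC CPUs.
intel_licenses = new_cores * 0.5
risc_equiv = new_cores * 0.75
print(f"full licenses: {intel_licenses:.0f} vs {risc_equiv:.0f} at a 0.75 "
      f"factor - another {1 - intel_licenses / risc_equiv:.0%} off")
```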

The savings of 40% on licensing will easily justify an investment in a nice new EMC storage infrastructure with EFDs, FAST VP, and all the other goodies. Do you think the customer will push us hard over a $0.01-per-GB lower price from a competing HDS or NetApp box if we just saved them millions in Oracle licenses?

But the story does not end here.

Additional savings

Let’s assume the customer needed high availability and scale-out performance and was running Oracle RAC. RAC is the most expensive licensed option, and a two-node cluster means licensing it on at least two servers. But VMware allows for HA (High Availability clustering) as well. Using VMware HA instead of RAC, you would have to fail over and recover the database in case of an outage – if the customer cannot tolerate this, then they need to stick with RAC (but only for mission-critical databases!). Most customers, though, can live with 5 minutes of downtime in the case of a server CPU failure, and for them, replacing RAC with VMware HA can save another big bunch of dollars.

Let’s assume that with virtualization you justified the investment in a nice EMC infrastructure with Flash drives to replace the competitive gear. Now the Oracle cluster is no longer limited by storage I/O and can drive more workload out of the same 3 VMware servers in the cluster. But you can also replace host mirroring (where applicable). You can implement snapshot backups to get the I/O load away from the production servers. You have removed the middleware and apps from the database servers, reducing CPU utilization and allowing even more headroom for DB consolidation – all without buying extra licenses from Oracle.

You have a customer who wants even more?

What if they create TWO database clusters on VMware? One for production (running Oracle Enterprise Edition (EE) with all the options they need) and one for non-prod (running Oracle Standard Edition (SE) without options – good enough for test/dev and smaller, non-mission-critical workloads). I bet the number of non-prod databases will be much larger than prod. By removing the expensive options AND moving from Enterprise to Standard Edition, you save another ton of money on Oracle licensing, as SE is much cheaper than EE. But be aware – the devil is in the details, and using Standard Edition is not for the faint of heart (for example, you could no longer clone a partitioned database to an SE-enabled server, because of the missing license and functionality). Still, if the customer is keen on saving as much as possible, then this might be the final silver bullet…

Do they run a huge Enterprise Data warehouse? Carefully find out if they have troubles with it and see if you can position Greenplum – saving another bag of money and speeding up their BI queries. But be careful, in an Oracle-religious shop it might backfire on you…

Reality Check

I have already had this discussion with a few enterprise customers, and found that although the story is easy in theory, the reality is different. If a customer has already purchased the 160 CPU licenses from Oracle, then the Oracle rep will not happily give money back in return for the shelfware licenses. So in that case, the customer can only save on maintenance and support. But with enough licenses on the shelf, they would not have to purchase any more for the next 5 to 10 years. So talk cost avoidance instead of immediate savings. And again, if they are licensed by user or have a site license, then saving on licenses will be a tough discussion. Still, the savings on power/cooling/hardware/floor space would be significant enough to proceed anyway.

And don’t forget the other benefits of private cloud, which we all know how to position: they are no different for Oracle than for other business applications.

Final thought

For this to work, you need a customer that is willing to work with you and be open about how they negotiated with Oracle, and a team of DB engineers to work with you to make it happen. If internal politics cause significant roadblocks, then you will get nowhere.

It’s not an easy sell but the rewards can be massive. We’re only just starting to figure out how to convince customers and drive this approach. Feedback welcome and let me know if you need support.

Resources

The online Everything Oracle at EMC community has lots of information on this subject. See in particular this presentation which I co-presented (with Sam Lucido) at this year’s VMworld.

Landing page with Oracle’s price list: http://www.oracle.com/us/corporate/pricing/price-lists/index.html

Download “US Oracle Technology Commercial Price List” for the database license document. Read the fine print because it’s not always as simple as it seems.

Processor core factor information: http://www.oracle.com/us/corporate/contracts/processor-core-factor-table-070634.pdf

Posted in licensing, oracle, storage, virtualization

Darryl Smith’s Recent Oracle RAC Blog Post

EMC IT’s Darryl Smith recently published an outstanding blog post on our ECN community, the subject of which was: Is Oracle RAC Becoming More of a Corner Case? I thought I would add a few thoughts to Darryl’s post.

I have been saying for a while that most Oracle databases do not need the level of uptime and fault tolerance that RAC provides, so Darryl and I are certainly thinking along the same lines.

I think that the value propositions for vSphere and NFS are very similar. Given that I have spent the bulk of my career pushing NFS and NAS for Oracle database storage, the synergy is obvious, at least to me.

Both technologies are about having a “good enough” infrastructure for an Oracle database which, while certainly important to the business, is not the back-end for an online catalog, or an online securities trading app.

For databases like that, I would recommend neither vSphere nor NFS. But the vast, vast majority of databases running Oracle do not fall into this category.

In the case of NFS, for years I made the statement that I believe 90% of all Oracle databases running in datacenters all over the world could be run over an NFS mount with absolutely no change in performance, reliability, or user experience. (There would, on average, be a big reduction in cost and an improvement in manageability, though.)

vSphere is exactly the same. For anything other than the most barn-burning performance, with absolutely the highest standards of fault tolerance, vSphere works just fine. It provides a very high level of reliability – a few minutes a year of downtime – at a vast reduction in cost and complexity compared to RAC.

Because the value propositions are so similar, I believe that the combination of NFS and vSphere is going to become increasingly popular. We’ll certainly see how it turns out.

Posted in oracle, storage, virtualization

Oracle OpenWorld 2011 Technical Sessions: Part 1

This begins a series of blog posts on the subject of technical sessions that EMC presented at OOW2011. To begin with, we will examine the session that I co-presented with Kevin Jernigan entitled Clone Oracle Databases Online in Seconds with Oracle Clonedb and DirectNFS. The presentation for this session can be found here.

It may sound strange given EMC’s reputation and history, but EMC has a strong partnership with Oracle in the area of NAS. We began working with Oracle on the Direct NFS client (“dNFS”) back in 2007, when dNFS was introduced as a major new feature in Oracle Database 11g Release 1. At that time, EMC was a co-presenter (together with Oracle) at Oracle OpenWorld 2007 on this subject.

The goals of dNFS were simple, and well defined:

  • Improve performance (in terms of latency and throughput) of NFS-mounted Oracle database I/O
  • Reduce CPU cost of NFS-mounted Oracle database I/O
  • Improve network port scalability and failover of NFS-mounted Oracle database I/O
  • Simplify administration of NFS in terms of both network port scaling and mount point parameters
  • Make administration of NFS uniform across all platforms

dNFS succeeds widely at all of these goals. dNFS dramatically improves the latency of network I/O from Oracle by eliminating most context switches and making the code path to the disk much shorter. dNFS also provides better port scaling than kernel NFS, with much simpler administration. No fussy EtherChannel network switch configuration is required. All of the port scaling and port failover is provided within the Oracle environment.

dNFS also makes administration completely uniform across all platforms. Amusingly, Windows is included! NFS now works on Windows, at least with an Oracle database.
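
For reference, the per-host dNFS configuration lives in an oranfstab file (in $ORACLE_HOME/dbs or /etc). A minimal sketch, with the filer name, IP paths, and export/mount points as placeholders; multiple path lines are what give you port scaling and failover without switch-side configuration:

```
server: myfiler
path: 192.168.10.1
path: 192.168.10.2
export: /vol/oradata mount: /u02/oradata
```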

In September 2010, Oracle announced that dNFS would be improved in Oracle Database 11g Release 2. One major improvement is the addition of dNFS clonedb, a thinly provisioned, rapid database replication feature. This feature also works very well with storage-based replication.

The basis for dNFS clonedb is a database copy. This copy can be a backup, a storage-based snapshot or clone, or an operating system copy. (Of course, EMC likes to leverage our storage-based snaps and clones.)

Once a copy exists, it can be used to create a clonedb instance. The steps to do this are contained in My Oracle Support article 1210656.1. Effectively, this creates a read/write virtual database which takes up minimal space, is created almost instantly, and contains only the space required to store any changes to the database. Given that storage-based snapshots are also space- and time-efficient (that is, they take up very little space, and can be created very quickly), there is a great deal of synergy between these technologies.

EMC did some performance testing with dNFS clonedb. The network diagram for the testbed is here:

clonedb RAC network diagram

First, we established a performance baseline. We ran an OLTP workload against the production database with no additional operations. This produced the following:

RAC baseline performance

Notice the perfectly clean scaling in this performance chart. We then tested the performance of creating a snapshot to serve as the source for the clonedb database. This produced:

RAC snapshot performance

Note the slight response time hit when the snapshot was taken. However, transactional throughput was not affected. Basically, this is at most a minimal performance hit. Finally, we tested performance while creating clonedb database instances from the storage snapshot. This produced:

RAC clonedb performance

Note that there was no performance hit at all during clonedb creation. One additional test was performed: we measured the storage space occupied by the clonedb when it was created. A 10 TB database was used as the source. The total space occupied by the clonedb was only 7 MB.

See the link above for the full presentation of this technical session. Further blog posts in this series will contain summaries of other EMC technical sessions at OOW 2011.

Posted in oracle, storage

Oracle OpenWorld 2011 Keynotes

I attended Oracle OpenWorld 2011 as usual this year, at which I co-presented a technical session with Kevin Jernigan on the subject of Direct NFS clonedb. Since I had a full conference pass, I was able to attend many of the keynotes at this event.

I did not get to attend the Sunday Larry keynote, although I understand that it was pretty terrible. I did attend the Wednesday Larry keynote, and can tell you this: as a long-term Larry watcher (I have attended every OOW since 1998), it was hands down the worst Larry keynote I have ever seen. Perhaps because Larry had just lost his best friend (Steve Jobs died of cancer the day of this keynote), he was off his game. That is the only reasonable explanation I have for such a spectacular waste of such a precious and expensive opportunity.

Larry’s keynote consisted largely of a 20-minute-plus demo of Fusion Applications which, while interesting in itself, was excruciatingly dull, and definitely would have been much better if handled by someone other than Larry. What Larry is great at (and what was largely lacking in this keynote) is the flamboyant, extravagant, and provocative way he baits his competitors and makes outlandish claims about his own products’ performance. He did a bit of sniping at (Salesforce.com Chairman and CEO) Marc Benioff. Aside from that, nothing.

The other Oracle keynotes were not much better. This was an almost content-free OOW for me. The ExaData and ExaLogic stuff was more of the same from last year. The big product launch was of course Fusion Applications, but that is just software. There was no big splash on the hardware side.

The best keynotes by far were Joe Tucci’s and John Chambers’s. Both provided a lot of content, much of which, interestingly, was competitive with Oracle. Cisco is certainly competing with Sun in the server market, and EMC competes with Oracle in storage, virtualization, and data warehousing software. All of this was discussed extensively during these keynotes. Joe even said the “v” word (VMware) during his keynote, and Pat Gelsinger further elaborated on this subject during his follow-on to Joe’s talk. Pat also included a long demo of vCenter, assisted by EMC’s own Chad Sakac.

Chambers’s talk was very compelling to me. His vision for a new platform in the post-PC age was a serious challenge to both Microsoft and Apple in their ecosystems. I am a bit skeptical that Cisco can successfully compete with those vendors in their respective spaces, though. Microsoft owns the business productivity space, and Apple has a serious stake in the home and home office space. Cisco will need to penetrate both of these in order to fulfill Chambers’s vision. We’ll see.

I thought OOW was about Oracle. This year, not so much. This year, the interesting stuff at OOW was from other vendors. Score one for EMC and Cisco, and score zero for Oracle this year.

Posted in oracle, storage, technology

Announcing the Oracle Heretic Blog

Many of you have been following me for some time on the Oracle Storage Guy blog. I have been continuously receiving page hits and comments on blog posts I made years ago. I am grateful for all of the support you have given me.

As many of you have noticed, the Oracle Storage Guy blog has been quiet lately. I have been moving into a new role, reporting to Sam Lucido, and in that role I have taken on responsibilities relating to social networking. Specifically, I am now (together with Sam) responsible for administering the Everything Oracle at EMC website. As I have moved out of my former role within EMC’s Global Solutions Organization (now the Enterprise Solutions Group), I will be more involved with customers, speaking at trade shows, and the like. Blogging is becoming a more critical part of that work.

I am also active on Twitter, likewise as OracleHeretic.

This post announces the creation of the Oracle Heretic blog. The thrust of the new blog will be to discuss pressing issues in the Oracle technical space, especially as they relate to storage and virtualization. The tone of this new blog will be somewhat stronger than that of the old Oracle Storage Guy blog as well; I will be more assertive here than I have been in the past. Hopefully, this fresh new approach will be welcome. Please watch this space, and let me know your thoughts! For a while, I will be posting to both locations simultaneously. In the future, I will phase out the Oracle Storage Guy blog in favor of this one.

Posted in Uncategorized