Tuesday, March 9, 2010

Oracle 11g Data Warehouse Storage

I’m in the middle of deploying an Oracle Data Warehouse solution and we decided to go a new direction with the storage.  Normally, for a large Oracle cluster like this, we’d purchase a large HP or NetApp storage array and put a couple hundred disks in it.  As you can imagine, this costs a lot of money, usually several hundred thousand USD.  There are some limitations to this approach for data warehousing.  Let me explain:

Sequential read performance is our most important criteria for the data warehouse storage.  Even write performance is not very critical, because data will be loaded infrequently through an ETL process.  There will only be a few users at a time on the system, but those users want their queries to run very fast.  These are complex queries with JOINs that perform many full table scans.  Random I/O performance is not that important, since this is not your typical transactional database or OLTP application.

From past benchmarking of a fully loaded HP EVA 8100, I’ve determined that it can do about 75K random I/Os per second, however, it can also do about 1.6GB/sec. sequential reads.  The random performance is pretty good, but knowing we have a hard upper limit of 1.6GB/sec. is very limiting.  If we need more performance from the data warehouse, we have to purchase a second storage array and stripe across both, costing even more money.  We needed a more cost effective solution.

Enter the HP MSA 2000.  These are “budget” storage arrays that are not nearly as powerful as an EVA, don’t hold nearly as many disks, and don’t have nearly as much random I/O performance.  However, they do have pretty good sequential performance.  Here are their performance characteristics:

MSA2 G2 Performance MSA2300fc G2 MSA2000sa G2 MSA2000i G2
Protocol (host connect)

4 Gb
Fibre Channel

3 Gb
SAS

1GbE Ethernet

MSA2000 RAID 10 Performance Results      
Random Reads IOPs

22,874

21,861

13,658

Random Writes IOPs

15,008

14,491

14,399

Random Mix IOPs 60/40 read/write

17,878

17,700

13,493

Sequential Reads MB

1,238

1,050

274

Sequential Writes MBs

532

532

266

MSA2000 RAID 5 Performance Results      
Random Reads IOPs

22,044

21,000

12,335

Random Writes IOPs

2,714

2,714

2,714

Random Mix IOPs 60/40 read/write

5,325

4,926

5,325

Sequential Reads MBs

1,238

1,050

274

Sequential Writes MBs

725

617

266

MSA2000 RAID 6 Performance Results      
Random Reads IOPs

21,975

21,000

12,292

Random Writes IOPs

1,876

1,876

1,876

Random Mix IOPs 60/40 read/write

4,293

3,865

3,851

Sequential Reads MBs

1,238

1,050

274

Sequential Writes MBs

772

729

266

Looking at the table above, the MSA 2000 we’re interested in is the MSA 2300fc G2, which can do 1238 MB/sec. sequential reads.  At 1.24GB/second, this is getting very close to the EVA 8100’s 1.6GB/second.  Now, consider that this array with a couple shelves of disks costs about 20% of the price of an EVA 8100.

For our data warehouse solution, we decided to purchase 4 of these smaller MSA 2300fc G2 arrays, and use Oracle ASM to stripe across them, hopefully achieving aggregate sequential throughput of approximately ~5GB/second (4x 1.24GB/second).

This solution seems to fit a data warehouse workload better than a single large storage array, because as performance and storage capacity needs grow, more MSA 2000s can simply be added, increasing performance and storage capacity as the application performance requirements grow.

No comments:

Post a Comment