Oracle DB, balancing flash with bulk disks


 
# 1  
Old 03-28-2013
Oracle DB, balancing flash with bulk disks

I'm the sysadmin for a DB server that's currently serviced by 30x 15k RPM disks and is doing about 3000-3500 IOPS. A consultant has suggested we can replace that with 10x 10k RPM disks plus 700 GB of flash. He hasn't done any testing to determine the working set, growth patterns, etc., and I find the suggestion a little hard to accept. For what it's worth, he's not a DB storage specialist, rather a generalist.

I'm looking for any guidance from Oracle or EMC on the recommended ratio between the flash tier and the disk tier. Note that ASM and ZFS are not being used in this case; the array will do the tiering itself in 1 GB chunks.
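
For what it's worth, here's the kind of back-of-envelope arithmetic I'd want the consultant to show before signing off. The per-spindle and flash IOPS figures below are assumptions for illustration, not vendor numbers; the point is just to see what flash hit rate the smaller config needs to match the current array.

Code:
# Back-of-envelope check (all figures are assumptions, not measurements):
# what flash hit rate would the proposed 10x 10k + flash config need
# to deliver the ~3000-3500 IOPS the current 30x 15k array is serving?

DISK_10K_IOPS = 130          # assumed random IOPS per 10k spindle
FLASH_IOPS = 20000           # assumed aggregate IOPS of the flash tier

disk_tier_iops = 10 * DISK_10K_IOPS

def hybrid_iops(hit_rate):
    """Very rough model: misses are limited by the disk tier, hits by flash."""
    if hit_rate >= 1.0:
        return FLASH_IOPS
    miss_limited = disk_tier_iops / (1.0 - hit_rate)   # disk tier serves the misses
    hit_limited = FLASH_IOPS / hit_rate if hit_rate > 0 else float("inf")
    return min(miss_limited, hit_limited)

for hr in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"hit rate {hr:.0%}: ~{hybrid_iops(hr):,.0f} IOPS")

# With these assumed figures, the new box only matches the current 3000-3500
# IOPS once flash absorbs roughly 60-65% of the I/O -- which is exactly the
# working-set question nobody has measured.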
# 2  
Old 03-28-2013
In modern systems, 700 GB is still usually quite a bit bigger than RAM, and as a cache for the more frequently used blocks it would be fine. You might even add it to what you already have! With no head motion, SSD should be pretty solid until the flash wears out from erase cycles, and it not only speeds things up but saves a lot of wear and tear on the servos. It's not really about growth patterns, just caching.

The storage capacity issue is completely separate, but the IOPS of the hybrid is hard to estimate, since it depends on cache hit rate. Bulk bandwidth, as for a complete dump, will still depend on more than spindle count. I'm not seeing what makes the 10x 10k better than the 30x 15k, except power consumption.

Finally, moving data in and out of SSD will actually create more total I/O: dirty pages land there and later have to be written again to non-SSD, and clean pages get copied to SSD instead of simply being flushed. Controller counts, cabling, and even memory bandwidth can get stretched.
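
To put a rough number on that last point as it applies to the 1 GB tiering chunks mentioned above, here's a toy model (assumed block sizes only, not any vendor's actual promotion algorithm):

Code:
# Toy model of the extra back-end I/O that chunk-based tiering can generate:
# to promote a chunk the array has to read 1 GB off the disk tier and write
# 1 GB to flash, even if the host only wanted a few 8 KB blocks out of it.
# All figures are assumptions for illustration.

CHUNK_MB = 1024        # tiering granularity quoted for this array
HOST_IO_KB = 8         # typical Oracle block size

def amplification(host_ios_before_promotion):
    """Back-end bytes moved per byte of host I/O that triggered the promotion."""
    host_bytes = host_ios_before_promotion * HOST_IO_KB * 1024
    migrate_bytes = 2 * CHUNK_MB * 1024 * 1024   # read from disk + write to flash
    return migrate_bytes / host_bytes

for n in (10, 100, 1000):
    print(f"{n} host I/Os before promotion -> ~{amplification(n):,.0f}x back-end bytes")

# If only a thin slice of each 1 GB chunk is actually hot, a lot of back-end
# bandwidth goes into shuffling chunks rather than serving the database.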
# 3  
Old 03-29-2013
Quote:
Originally Posted by DGPickett
The storage capacity issue is completely separate, but the IOPS of the hybrid is hard to estimate, since it depends on cache hit rate. [...] Moving data in and out of SSD will actually create more total I/O [...]
Thanks for your thoughtful response. I wish I could add the flash to my existing storage system but it's an older SAN and the decision has been made to move to a new, smaller unit. At this point, I'm just trying to make sure that the right balance is struck.

I agree with your statement: 'The storage capacity issue is completely separate, but the IOPS of the hybrid is hard to estimate, since it depends on cache hit rate.' We haven't done any analysis to determine the size of the disk working set, so it's impossible to say whether 700 GB is enough. The thrashing you refer to when moving data in and out of the flash tier will stress the system, as will any significant ingestion of new data.

Instead of 10x 10k + 5 flash drives, I'm pushing for 20x 10k + 5 flash: 15 spindles for data (plus the 5 flash drives for tiering, 20 devices total on the data side) and 5 spindles for logging with no flash tier. I believe that would give a more balanced storage system, allow roughly 2x faster bulk operations, and leave headroom for all the overhead that comes with tiering.
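
To show where I get the rough 2x from, here's the spindle arithmetic with assumed per-drive figures (illustrative only; RAID overhead and controller limits are ignored):

Code:
# Rough spindle-count comparison of the two layouts.
# Per-drive rates are assumptions, not measurements.

MB_PER_10K_DRIVE = 120     # assumed sustained sequential MB/s per 10k spindle
IOPS_PER_10K_DRIVE = 130   # assumed random IOPS per 10k spindle

layouts = {
    "consultant: 10x 10k + flash": {"data": 10, "log": 0},
    "counter:    20x 10k + flash": {"data": 15, "log": 5},
}

for name, l in layouts.items():
    spindles = l["data"] + l["log"]
    print(f"{name}: {spindles} spindles, "
          f"~{spindles * MB_PER_10K_DRIVE} MB/s aggregate streaming, "
          f"~{spindles * IOPS_PER_10K_DRIVE} disk-tier IOPS, "
          f"logs on {l['log']} dedicated spindles")

# Twice the spindles gives roughly twice the aggregate disk-tier bandwidth,
# and keeping redo on its own spindles stops bulk loads from competing with
# log writes for the same heads.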

Any other thoughts?
# 4  
Old 04-09-2013
I was always fond of the Solaris client file system (CacheFS). I think something like that could make the thrashing a non-problem. You have a local SSD file system dedicated to being the local hierarchical cache of the remote NFS/SAN storage. When space is a problem, the least-used file in the cache can simply be deleted, because it is backed by the NFS store. You can even tune and manage the rate at which modified SSD/cache pages are written back to NFS, so frequently modified pages are not sent many times over. If we are talking RDBMS-level commit guarantees, then it needs to be a mirrored drive, since updates are not sent to NFS immediately.

It was originally designed for virtually diskless systems, where the local disk starts out almost empty: the first user to run vi gets it from NFS, and it is saved to local disk when done. Local disk size is never an issue, since it is all just cache for the NFS store, and backups are never an issue, since the data all really lives on NFS once the modifications are synced up. There is a resiliency in the design: if there is a burst of churn, it goes to the local drive, evicting old clean pages to make room for new dirty pages, and gets gradually written back to NFS. If a page is modified 20 times, it might be sent 20 times, or once, depending on the pressure.
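
To make the write-back and eviction behaviour concrete, here is a minimal sketch (plain Python, not CacheFS itself) of a cache that evicts least-recently-used pages, writes dirty pages back only on eviction or sync, and so collapses repeated modifications into a single write-back:

Code:
from collections import OrderedDict

class WriteBackCache:
    """Minimal sketch of a hierarchical write-back cache (not CacheFS itself):
    pages are evicted LRU-first, dirty pages are written back only on eviction
    or sync, so a page modified many times between syncs costs one write-back."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store          # dict standing in for the NFS/SAN tier
        self.pages = OrderedDict()            # key -> (data, dirty)

    def read(self, key):
        if key not in self.pages:
            self._make_room()
            self.pages[key] = (self.backing.get(key), False)   # fault in from NFS
        self.pages.move_to_end(key)           # mark most recently used
        return self.pages[key][0]

    def write(self, key, data):
        if key not in self.pages:
            self._make_room()
        self.pages[key] = (data, True)        # repeated writes just overwrite in cache
        self.pages.move_to_end(key)

    def _make_room(self):
        while len(self.pages) >= self.capacity:
            key, (data, dirty) = self.pages.popitem(last=False)   # LRU victim
            if dirty:
                self.backing[key] = data      # write back only when evicted

    def sync(self):
        """Flush dirty pages to the backing store, one write per page."""
        for key, (data, dirty) in self.pages.items():
            if dirty:
                self.backing[key] = data
                self.pages[key] = (data, False)

# A page written 20 times before sync() still costs a single back-end write.
nfs = {}
cache = WriteBackCache(capacity=4, backing_store=nfs)
for _ in range(20):
    cache.write("block-7", "new contents")
cache.sync()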

Theoretically, one might build a hierarchy of RAM to SSD (striped, mirrored, or RAIDed) to local hard drives (all striped together for maximum speed, RAIDed perhaps), and have the NFS layer on top of that.

Another possibility is a second, very local system acting as the "local drive" layer, connected by multiple back-to-back full-duplex gigabit Ethernet links or similar cheap but very fast, very local networking. That way the local I/O is not so stressed. Each machine still has its own RAM-to-disk-to-NFS hierarchy, but the capacity to absorb surges in churn is enhanced by the second host buffering the SAN. Such a box does not need much CPU (unless someone wants compression or software RAID) or much RAM, just I/O capacity.

The strategy should be built on a solid business case, using tested performance limits of the individual pieces and not neglecting the need for redundant storage/mirroring outside the commit path.

One nice thing about hierarchical storage is that once files are written back to NFS, the cache layer is all disposable, so you can flush it and then add more disk for performance or churn buffering without any data migration. That could be a very short outage. Once the data is flushed back, you could even swap out the whole cache system and restart empty. It might be sluggish for a few moments until the working set is local again, and then off you go.
# 5  
Old 04-09-2013
Not sure it will help, but maybe this would be of interest:
Oracle memory usage on Solaris box

You can also look at the dynamic intimate shared memory (DISM) related articles, as well as the various PDFs around the net about running a DB (Oracle, I suppose) in Solaris containers.
Running Oracle Database in Solaris 10 Containers - Best Practices
Doc ID: Note:317257.1 (old MetaLink note)

I retrieved it from my archive, so there may be a more up-to-date version of the note covering those topics directly available on My Oracle Support.

If your problem is an overload of physical I/O, you should first figure out whether that load is normal or not. If it is normal, then fine, go for a hardware upgrade; but if the load is due to bad tuning (which could be at different levels: OS, DB, ...), then a performance audit may be the way to go.
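
As a very first pass on that question, something like the sketch below can flag devices that look saturated. It assumes the column order of Solaris iostat -xn output (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device); check the header on your release before trusting the field indexes.

Code:
import subprocess

# Rough first pass at "is the physical I/O load saturating the disks?"
# Assumed column order for Solaris `iostat -xn`:
#   r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# Verify against the header printed on your release.

BUSY_PCT_LIMIT = 80      # %b above this suggests a saturated device
ASVC_MS_LIMIT = 20       # average service time (ms) worth a closer look

sample = subprocess.run(
    ["iostat", "-xn", "5", "2"], capture_output=True, text=True
).stdout

for line in sample.splitlines():
    fields = line.split()
    # Skip headers, blank lines and anything that doesn't look like a data row.
    if len(fields) != 11 or not fields[0].replace(".", "").isdigit():
        continue
    asvc_t, pct_b, device = float(fields[7]), float(fields[9]), fields[10]
    if pct_b > BUSY_PCT_LIMIT or asvc_t > ASVC_MS_LIMIT:
        print(f"{device}: %b={pct_b:.0f}, asvc_t={asvc_t:.1f} ms -- look closer")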
# 6  
Old 04-09-2013
Yes, you'd think they would find a way to move data from SSD/disk to NFS without going through main system RAM, so that pushing dirty pages out to NFS doesn't compete with processing or with moving pages between local disk and RAM. The one central bus for all of them can get overwhelmed. But that is hot-rod computing for you: it takes a 100-horsepower bottle to hold a 100-horsepower genie. It is rare that making one part faster doesn't leave you at the mercy of some other part: RAM banked but not interleaved, memory controllers, buses, disk controllers, cables and fibres, caches on controllers or disks, and of course latency destroying the value of bandwidth.

Since the Oracle RDBMS is at the center of the storm, they should be a rich resource for the right solutions.

Suppose a sort of serial churn over a huge data set, where a huge table is read and new pages are written to another table. What will the flow top out at, if we are moving
  1. the input pages from NFS to RAM,
  2. RAM to CPU,
  3. RAM to local disk, then discarded as old, and
  4. writing new pages CPU to RAM,
  5. RAM to local disk,
  6. local disk to RAM and
  7. RAM to NFS.
Each page passing through costs: 3 RAM writes, 4 RAM reads, one CPU read (assuming one CPU or CPU group caches it through all processing), one CPU write, 2 local disk writes, 1 local disk read, one network read and one network write. Which is the bottleneck? Which comes in second, third, and by how much?
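
Plugging in some assumed per-resource bandwidths (illustrative figures only, not measurements of any real box), a few lines of Python make the ranking easy to see:

Code:
# Per-page operation counts from the flow above.
OPS_PER_PAGE = {
    "RAM write":  3, "RAM read":  4,
    "CPU read":   1, "CPU write": 1,
    "disk write": 2, "disk read": 1,
    "net read":   1, "net write": 1,
}

# Assumed sustainable bandwidth per resource in MB/s -- purely illustrative;
# substitute numbers measured on your own hardware.
BANDWIDTH_MB_S = {
    "RAM write":  8000, "RAM read":  8000,
    "CPU read":  20000, "CPU write": 20000,
    "disk write":  500, "disk read":   500,
    "net read":    120, "net write":   120,   # one gigabit link each way
}

PAGE_MB = 8 / 1024   # an 8 KB page expressed in MB

# Pages per second each resource could sustain on its own.
limits = {
    op: BANDWIDTH_MB_S[op] / (count * PAGE_MB)
    for op, count in OPS_PER_PAGE.items()
}

for op, pages_per_s in sorted(limits.items(), key=lambda kv: kv[1]):
    print(f"{op:<10s} caps the flow at ~{pages_per_s:,.0f} pages/s")

# The smallest number is the bottleneck; with these assumptions the gigabit
# network links come first, the local disk writes second, RAM a distant third.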

# 7  
Old 04-09-2013
OK, I'm wondering where NFS came into the picture; as far as I can tell, the storage here is SAN-based. I'm also assuming you mean the Solaris cachefs. I'd recommend against using that: it no longer exists in Solaris 11, and of the many Sun engineers I've worked with, not one had much good to say about cachefs.

If the OP is looking at EMC SANs for 20+ disk drives, it's probably a lot more cost-effective to look at something like this:

Oracle Database Appliance

No SAN O&M, no FC issues.