Friday, February 06, 2009

 

Using Oracle Parallel Query in a RAC Environment

Oracle RAC and the Parallel Query Option can be a powerful combination, but if you don’t entirely understand how it works; it might not be doing exactly what you think it is. The parallel query option allows some long running operations such as table scans to be divided into smaller chunks and run by multiple server processes. Since operations such as table scans are very I/O intensive, this essentially allows other processes to be doing work on your behalf while some of them are simply waiting on the I/O to return. Let’s look at an example.

Let’s say that you have a table scan that is going to read 1,000,000 blocks. Furthermore, ignoring all disk caching, driver optimizations, etc. let’s assume that each physical I/O takes 2ms. So, in order to read this entire table into the SGA it will take 1,000,000 I/Os x 0.002 sec/I/O = 2,000 sec which is 33.3 minutes. So, it will take a half an hour (plus overhead) just to read the data into the SGA. Most of the time which is spent idle waiting on the I/O to return.

Now let’s turn on parallel query with a parallelism of 10. Oracle will internally divide this 1,000,000 block table into multiple pieces and pass those pieces off to the 10 parallel query processes. If you are using anything other than a single SATA drive for your database the I/O subsystem can easily handle the parallelism and will now return that same 1,000,000 blocks in 1/10th of the time, or 3.33 minutes.

Note: I prefer to have control over my parallelism, so I will use the parallel hint to invoke parallel query as such; SELECT /*+ PARALLEL(table_alias, degree) */ FROM table;

Taking it up a notch, now let’s consider that the operations are consuming all of the CPU in the system. I can now additionally add nodes to use in a RAC cluster for parallel query, thus taking advantage of parallelism and RAC. This can be done by extending the hint to include the number of nodes to use in a parallel query operation as such; /*+ PARALLEL(table_alias, degree, instances) */ FROM table; This will allow me to specify not only the number of query slaves to use, but the number of instances to use as well.

This is only part of the story. Now it is necessary to specify whether we are talking about 10g or 11g. Let’s start with Oracle 10g.

Parallel Query and Oracle 10g RAC

With Oracle 10g, the number of RAC nodes that will participate in a parallel query operation depends on a couple of parameters; instance_groups and parallel_instance_group. The instance_groups parameter is used to set up instance_groups and the parallel_instance_group parameter is used to specify which group you belong to.

First, set up the instance groups. In this example I have a four node RAC cluster:

INSTANCE_GROUPS

Assign the node a set of instance groups that it participates in

The following graphic illustrates the configuration.

Setting the parallel_instance_group parameter will define which group you belong to, thus defining where your parallel queries can run.

For example, if I set my parallel_instance_group parameter to Node1, my parallel queries will only run on node 1 as shown here:

Setting the parameter parallel_instance_group in the init.ora file will specify the default value for each user who logs into that instance. An alter session command can be used to modify that parameter as shown here: SQL> alter session set parallel_instance_group=all_nodes;

This will allow the parallel query to span all nodes in the RAC cluster.

Note: You don’t have to be on the node that parallel query is configured to run on. If you happen to log onto node 3 and your parallel_instance_group is Node1 non-parallel queries will run on node 3, but any parallel queries will run on node 1.

Parallel Query and Oracle 11g RAC

So, what about Oracle 11g? If you are running on Oracle 11g forget everything I just told you. One of the best features about Oracle 11g is that it now uses services to specify which nodes your parallel query runs on. So, depending on which service you connect to, the preferred nodes for that service are the nodes that parallel queries will use.

Conclusions

Depending on the operation that you are doing, you might be able to achieve huge performance gains by parallelizing across multiple nodes in a RAC cluster. Of course your mileage might vary, and if you aren’t careful you might increase the load on nodes to a point where you are affecting other users. I have a saying that “If you aren’t careful with parallelism, you might paralyze your system.”


This page is powered by Blogger. Isn't yours?