Difference between revisions of "Streams and Multi-GPU using CUVI"
(7 intermediate revisions by one other user not shown)
Line 9:
//You may notice that all the functions below take an optional parameter for stream
//If the user doesn't wish to use it, the program will execute on a single stream by default
//Creating a 3-channel Image container on GPU
CuviImage* gimg = new CuviImage(size,host_img->depth,3);
//Creating a single channel Image container for output on GPU
CuviImage* gout = new CuviImage(size,host_out->depth,1);
//Uploading RGB image to GPU
Line 27:
</syntaxhighlight>
|}
===Same example with Streams===
Line 74: | Line 73:
for(int i=0; i<streamCount; i++)
    gout[i]->download(host_out->imageData + i*(size.height * host_out->widthStep), host_out->widthStep, streams[i]);
//Don't forget to destroy streams and free memory
</syntaxhighlight>
|}
==Multi-GPU in CUVI==
Applications that use CUVI can also scale up in a multi-GPU environment from a single code base. Now that we know how to use streams in CUVI, multi-GPU execution is nothing more than dividing those streams across the available devices in the machine. With a few checks and some error handling, the same piece of code runs on a single-GPU machine and scales up in a multi-GPU environment.
{|
|style="font-size:130%;"|
<syntaxhighlight lang="cpp">
cuviGetDeviceCount(&DeviceCount);
if(DeviceCount>1){
    //Multi-GPU code here
}
</syntaxhighlight>
|}
A user can divide stream execution among GPUs by selecting the desired device before making any CUVI call. This is done by calling <code>cuviSetCurrentDevice(X);</code> where X is the device id, ranging from 0 to N-1 on a machine with N CUDA-capable GPUs. Every CUVI call that follows executes on device X until you set another device as current.
Revision as of 18:15, 4 May 2012
Using Streams with CUVI
The CUVI framework lets you use streams with minimal coding effort. Every CUVI function takes an optional parameter that specifies the stream on which it should run. The code below shows how a single CUVI function call can be split into streamed calls on the GPU. In most cases this improves performance, because copying image data to the GPU and processing it on the GPU happen concurrently.
CUVI example
In this example we use CUVI's RGB2Gray function from the Color Operations module on a full HD input image.
Same example with Streams
Streams can greatly improve the performance of your application by overlapping data processing with data copying, so that processing time is hidden behind transfer time. Instead of waiting for the complete image to be copied to the GPU before processing begins, streaming lets the GPU process the data as it arrives. Here's how you can use streaming in your application using CUVI:
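The streamed version works on horizontal stripes of the image, one stripe per stream, and addresses stripe i at `i*(size.height * host_out->widthStep)` bytes from the start of the host buffer. As a rough illustration of that arithmetic only, here is a minimal sketch in plain C++ that makes no CUVI calls. The `Stripe` type and `splitIntoStripes` function are hypothetical names introduced for this example, and the handling of leftover rows and of more streams than rows is an assumption rather than documented CUVI behavior.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type, not part of CUVI.
struct Stripe {
    std::size_t byteOffset; // offset of the stripe's first row from the image base pointer
    int rows;               // number of image rows in this stripe
};

// Split an image of imageHeight rows into one horizontal stripe per stream.
// widthStep is the number of bytes per image row, as in host_out->widthStep.
// Assumption: leftover rows are handed out one each to the first stripes.
std::vector<Stripe> splitIntoStripes(int imageHeight, std::size_t widthStep, int streamCount) {
    std::vector<Stripe> stripes;
    if (imageHeight <= 0 || streamCount <= 0) return stripes;
    if (streamCount > imageHeight) streamCount = imageHeight; // at least one row per stripe

    const int baseRows = imageHeight / streamCount;
    const int extraRows = imageHeight % streamCount;
    int firstRow = 0;
    for (int i = 0; i < streamCount; ++i) {
        const int rows = baseRows + (i < extraRows ? 1 : 0);
        stripes.push_back({static_cast<std::size_t>(firstRow) * widthStep, rows});
        firstRow += rows;
    }
    return stripes;
}
```

When the image height divides evenly by the stream count, each offset reduces to the `i * stripeHeight * widthStep` expression used in the download loop. Each stripe's offset and row count would then be passed to the upload, processing and download calls issued on that stripe's stream.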
Multi-GPU in CUVI
Applications that use CUVI can also scale up in a multi-GPU environment from a single code base. Now that we know how to use streams in CUVI, multi-GPU execution is nothing more than dividing those streams across the available devices in the machine. With a few checks and some error handling, the same piece of code runs on a single-GPU machine and scales up in a multi-GPU environment.
A user can divide stream execution among GPUs by selecting the desired device before making any CUVI call. This is done by calling cuviSetCurrentDevice(X);
where X is the device id, ranging from 0 to N-1 on a machine with N CUDA-capable GPUs. Every CUVI call that follows executes on device X until you set another device as current.
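One simple way to divide streams among devices is round-robin assignment. The sketch below is plain C++ with no CUVI calls; `assignStreamsToDevices` is a hypothetical helper introduced for this example, and round-robin is only one possible policy, not something CUVI prescribes.

```cpp
#include <vector>

// Hypothetical helper, not part of CUVI.
// Returns, for each stream index, the id of the device that should run it.
// Device ids fall in the range 0..deviceCount-1, matching the range described above.
std::vector<int> assignStreamsToDevices(int streamCount, int deviceCount) {
    std::vector<int> deviceForStream;
    if (streamCount <= 0 || deviceCount <= 0) return deviceForStream;

    deviceForStream.reserve(streamCount);
    for (int i = 0; i < streamCount; ++i) {
        // Round-robin: stream i goes to device i modulo the number of devices.
        deviceForStream.push_back(i % deviceCount);
    }
    return deviceForStream;
}
```

Before issuing the CUVI calls for stream i, the application would call `cuviSetCurrentDevice(deviceForStream[i])`. With a device count of 1, every stream maps to device 0, so the same loop also covers the single-GPU case.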