Difference between revisions of "Streams and Multi-GPU using CUVI"

From CUVI Wiki

Revision as of 17:46, 4 May 2012

Using Streams with CUVI

The CUVI framework provides a way to use streams with minimal coding effort. Each CUVI function call takes an optional parameter that specifies the stream on which it should run. The code below shows how a simple CUVI function call can be divided into streamed calls on the GPU. In most cases this results in better performance, because copying image data to the GPU and processing that data on the GPU happen simultaneously.
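To make the expected gain concrete, here is a rough back-of-the-envelope model of the overlap, written as a minimal C++ sketch. It is not part of CUVI: the function names `serialTime` and `streamedTime` are hypothetical, and the model assumes equal-sized chunks, perfect overlap between upload, processing and download, and no per-call overhead. Real speedups depend on the hardware and are usually smaller.

```cpp
#include <algorithm>
#include <cassert>

// Idealized timing model for illustration only (an assumption, not a CUVI API).
// upload, compute, download: time for the WHOLE image, in milliseconds.

// Without streams, each step waits for the previous one to finish.
double serialTime(double upload, double compute, double download) {
    return upload + compute + download;
}

// With streams, the image is split into `chunks` equal parts. The first chunk
// passes through all three stages; every further chunk adds only the time of
// the slowest stage, because the other two stages run concurrently with it.
double streamedTime(double upload, double compute, double download, int chunks) {
    assert(chunks > 0);
    const double n = static_cast<double>(chunks);
    const double firstChunk = (upload + compute + download) / n;
    const double slowestStage = std::max({upload, compute, download}) / n;
    return firstChunk + (chunks - 1) * slowestStage;
}
```

Under these assumptions, an image that needs 4 ms to upload, 2 ms to process and 2 ms to download takes 8 ms serially, while `streamedTime(4.0, 2.0, 2.0, 4)` gives 5 ms with four chunks. With a single chunk the two functions agree, as they should.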

CUVI example

In this example we use CUVI's RGB2Gray function from the Color Operations module on a full HD input image.

//You may notice that all the functions below take an optional parameter for stream
//If the user doesn't wish to use it, the program will execute on a single stream by default
 
//Size of the full image
CuviSize size = cuviSize(host_img->width, host_img->height);

//Creating a 3-channel Image container on GPU
CuviImage* gimg = new CuviImage(size,host_img->depth,3);
//Creating a single channel Image container for output on GPU
CuviImage* gout = new CuviImage(size,host_out->depth,1);

//Uploading RGB image to GPU
gimg->upload(host_img->imageData, host_img->widthStep);

//function call
cuvi::colorOperations::RGB2Gray(gimg,gout);

//Download resultant gray image back to host
gout->download(host_out->imageData, host_out->widthStep);


Same example with Streams

Streams can greatly improve the performance of your application by overlapping data processing with data copying, so that processing time is hidden behind copy time. Instead of waiting for the complete image to be copied to the GPU before processing starts, streaming lets the GPU process the data as it arrives. Here's how you can use streaming in your application using CUVI:

//Number of data chunks
int streamCount = 4;

//Size of each image chunk (this assumes the image height is divisible by streamCount)
CuviSize size = cuviSize(host_img->width, host_img->height/streamCount);

//Creating 3-channel Image containers on GPU for each chunk
CuviImage ** gimg = new CuviImage*[streamCount];
for(int i=0; i<streamCount; i++)
     gimg[i] = new CuviImage(size,host_img->depth,3);
 

//Creating single channel Image containers for output on GPU for each chunk
CuviImage ** gout = new CuviImage*[streamCount];
for(int i=0; i<streamCount; i++)
     gout[i] = new CuviImage(size,host_out->depth,1);
 
//Creating a stream against each chunk of data
CuviStream **streams = new CuviStream*[streamCount];
for(int i=0; i<streamCount; i++)
     cuviCreateStream(&streams[i]);


//Uploading image data to GPU in streams
for(int i=0; i<streamCount; i++)
     gimg[i]->upload(host_img->imageData + i*(size.height*host_img->widthStep), host_img->widthStep, streams[i]);
 

//Function call on each stream
for(int i=0; i<streamCount; i++)
     cuvi::colorOperations::RGB2Gray(gimg[i],gout[i],streams[i]);
 

//Downloading resultant data back to host image in streams
for(int i=0; i<streamCount; i++)
     gout[i]->download(host_out->imageData + i*(size.height * host_out->widthStep), host_out->widthStep, streams[i]);


//Don't forget to destroy streams and free memory
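
The streamed example above computes each chunk's host offset as i*(size.height*widthStep) and relies on the image height dividing evenly by streamCount. As a minimal sketch of that arithmetic, the plain C++ helper below derives the first row, row count and byte offset of every chunk, and spreads any leftover rows over the first chunks. It does not call CUVI: `RowChunk` and `splitRows` are hypothetical names introduced here, and whether per-chunk containers of different heights fit your pipeline is for you to check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: describes one horizontal band of the host image.
struct RowChunk {
    int firstRow;            // index of the first image row in this chunk
    int rows;                // number of rows in this chunk
    std::size_t byteOffset;  // offset of the chunk from the start of the image data
};

// Splits `height` rows into `chunkCount` bands. widthStep is the size of one
// host image row in bytes (including padding), as in host_img->widthStep.
std::vector<RowChunk> splitRows(int height, std::size_t widthStep, int chunkCount) {
    assert(height >= 0 && chunkCount > 0);
    std::vector<RowChunk> chunks;
    const int base = height / chunkCount;   // rows every chunk receives
    const int extra = height % chunkCount;  // leftover rows, one each for the first chunks
    int row = 0;
    for (int i = 0; i < chunkCount; ++i) {
        const int rows = base + (i < extra ? 1 : 0);
        chunks.push_back({row, rows, static_cast<std::size_t>(row) * widthStep});
        row += rows;
    }
    return chunks;
}
```

For a full HD, 3-channel, 8-bit image (1080 rows, 5760 bytes per row) and four streams, every chunk has 270 rows and chunk i starts at byte i*270*5760, which matches the offsets used in the upload loop above. For a height that does not divide evenly, such as 10 rows in 4 chunks, the chunks get 3, 3, 2 and 2 rows, so no rows are dropped.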