Difference between revisions of "Performance & Benchmark"
(25 intermediate revisions by the same user not shown) | |||
Line 4: | Line 4: | ||
|style="font-size:85%;"| | |style="font-size:85%;"| | ||
<tabs> | <tabs> | ||
<!-- Jetson Nano Starts --> | |||
<tab name="Jetson Nano"> | |||
{|class="wikitable" | |||
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.7.0 on Jetson Nano--> | |||
|- | |||
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.7.0 on Jetson Nano having '''128''' CUDA Cores. | |||
|- | |||
! Algorithm / Image Size | |||
! 720p | |||
! 1080p | |||
! 4k (3840x2160) | |||
! 8k (7680x4320) | |||
|- | |||
|- | |||
|[[Function:Add|add]] - 2 Images | |||
|2.99 | |||
|8.38 | |||
|15.63 | |||
|50.27 | |||
|- | |||
|[[Function:ChannelMix|channelMix]] | |||
|4.09 | |||
|7.42 | |||
|15.70 | |||
|53.35 | |||
|- | |||
|[[Function:Demosaic|demosaic]] | |||
|8.11 | |||
|11.77 | |||
|42.99 | |||
|172.40 | |||
|- | |||
|[[Function:DemosaicDFPD|demosaicDFPD]] | |||
|12.6 | |||
|23.86 | |||
|88.87 | |||
|357.94 | |||
|- | |||
|[[Function:GammaCorrect|gammaCorrect]] | |||
|3.12 | |||
|5.69 | |||
|14.13 | |||
|45.80 | |||
|- | |||
|[[Function:HistEq|histEq]] - Single Channel | |||
|5.29 | |||
|7.88 | |||
|20.53 | |||
|61.52 | |||
|- | |||
|[[Function:LUT|LUT]] | |||
|1.89 | |||
|2.77 | |||
|11.26 | |||
|25.42 | |||
|- | |||
|[[Function:blackGammaLUT|blackGammaLUT]] | |||
|4.08 | |||
|7.52 | |||
|18.36 | |||
|62.27 | |||
|- | |||
|[[Function:RGB2Gray|rgb2gray]] | |||
|1.95 | |||
|2.67 | |||
|10.64 | |||
|20.89 | |||
|- | |||
|[[Function:FocusStack|focusStack]] - Stacking 5 Images | |||
|252.52 | |||
|452.81 | |||
|1830.54 | |||
|7320.52 | |||
|- | |||
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits | |||
|3.74 | |||
|8.46 | |||
|15.71 | |||
|63.33 | |||
|- | |||
|[[Function:Crop|crop]] | |||
|1.67 | |||
|4.76 | |||
|9.04 | |||
|28.93 | |||
|- | |||
|[[Function:Resize|resize]] - Scale=2.0 | |||
|9.29 | |||
|18.43 | |||
|55.58 | |||
|222.41 | |||
|- | |||
|[[Function:Resize|resize]] - Scale=0.5 | |||
|2.33 | |||
|5.23 | |||
|7.01 | |||
|17.08 | |||
|- | |||
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f | |||
|3.43 | |||
|7.73 | |||
|26.00 | |||
|58.34 | |||
|- | |||
|[[Function:WarpPerspective|warpPerspective]] | |||
|5.09 | |||
|11.48 | |||
|19.86 | |||
|81.36 | |||
|- | |||
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window | |||
|18.44 | |||
|26.80 | |||
|89.58 | |||
|358.29 | |||
|- | |||
|[[Function:UnderwaterFilter|underwaterFilter]] | |||
|29.55 | |||
|50.74 | |||
|79.09 | |||
|332.95 | |||
|- | |||
|[[Function:haarFwd|haarFwd]] | |||
|10.27 | |||
|18.60 | |||
|40.45 | |||
|130.86 | |||
|} | |||
</tab> | |||
<!-- Jetson Nano Ends--> | |||
<!-- GTX 1080 Starts --> | |||
<tab name="GTX 1080"> | |||
{|class="wikitable" | |||
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1080--> | |||
|- | |||
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1080 having '''2,560''' CUDA Cores. | |||
|- | |||
! Algorithm / Image Size | |||
! 720p | |||
! 1080p | |||
! 4k (3840x2160) | |||
! 8k (7680x4320) | |||
|- | |||
|- | |||
|[[Function:Add|add]] - 2 Images | |||
|0.05 | |||
|0.10 | |||
|0.42 | |||
|1.69 | |||
|- | |||
|[[Function:ChannelMix|channelMix]] | |||
|0.04 | |||
|0.08 | |||
|0.34 | |||
|1.33 | |||
|- | |||
|[[Function:Demosaic|demosaic]] | |||
|0.12 | |||
|0.26 | |||
|1.01 | |||
|4.04 | |||
|- | |||
|[[Function:DemosaicDFPD|demosaicDFPD]] | |||
|0.31 | |||
|0.69 | |||
|2.77 | |||
|10.98 | |||
|- | |||
|[[Function:GammaCorrect|gammaCorrect]] | |||
|0.04 | |||
|0.10 | |||
|0.40 | |||
|1.61 | |||
|- | |||
|[[Function:HistEq|histEq]] - Single Channel | |||
|0.08 | |||
|0.18 | |||
|0.61 | |||
|2.18 | |||
|- | |||
|[[Function:LUT|LUT]] | |||
|0.05 | |||
|0.10 | |||
|0.35 | |||
|1.25 | |||
|- | |||
|[[Function:blackGammaLUT|blackGammaLUT]] | |||
|0.99 | |||
|0.21 | |||
|0.74 | |||
|2.73 | |||
|- | |||
|[[Function:RGB2Gray|rgb2gray]] | |||
|0.02 | |||
|0.05 | |||
|0.21 | |||
|0.83 | |||
|- | |||
|[[Function:FocusStack|focusStack]] - Stacking 5 Images | |||
|8.66 | |||
|14.44 | |||
|65.14 | |||
|270.59 | |||
|- | |||
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits | |||
|0.06 | |||
|0.14 | |||
|0.58 | |||
|2.30 | |||
|- | |||
|[[Function:Crop|crop]] | |||
|0.03 | |||
|0.07 | |||
|0.23 | |||
|0.93 | |||
|- | |||
|[[Function:Resize|resize]] - Scale=2.0 | |||
|0.19 | |||
|0.41 | |||
|1.70 | |||
|6.83 | |||
|- | |||
|[[Function:Resize|resize]] - Scale=0.5 | |||
|0.02 | |||
|0.04 | |||
|0.14 | |||
|0.58 | |||
|- | |||
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f | |||
|0.08 | |||
|0.16 | |||
|0.66 | |||
|2.69 | |||
|- | |||
|[[Function:WarpPerspective|warpPerspective]] | |||
|0.08 | |||
|0.22 | |||
|0.79 | |||
|3.21 | |||
|- | |||
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window | |||
|0.30 | |||
|0.66 | |||
|2.63 | |||
|9.18 | |||
|- | |||
|[[Function:UnderwaterFilter|underwaterFilter]] | |||
|0.45 | |||
|0.96 | |||
|3.39 | |||
|11.62 | |||
|- | |||
|[[Function:haarFwd|haarFwd]] | |||
|0.14 | |||
|0.34 | |||
|1.35 | |||
|5.10 | |||
|} | |||
</tab> | |||
<!-- GTX 1080 Ends--> | |||
<!-- Xavier NX Starts --> | <!-- Xavier NX Starts --> | ||
<tab name="Xavier NX"> | <tab name="Xavier NX"> | ||
{|class="wikitable" | {|class="wikitable" | ||
|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on Jetson Xavier NX | <!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on Jetson Xavier NX--> | ||
|- | |||
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on Jetson Xavier NX having '''384''' CUDA Cores. | |||
|- | |- | ||
! Algorithm / Image Size | ! Algorithm / Image Size | ||
Line 93: | Line 356: | ||
|7.57 | |7.57 | ||
|30.32 | |30.32 | ||
|- | |||
|[[Function:Resize|resize]] - Scale=0.5 | |||
|0.08 | |||
|0.33 | |||
|0.82 | |||
|2.89 | |||
|- | |- | ||
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f | |[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f | ||
Line 126: | Line 395: | ||
</tab> | </tab> | ||
<!-- Xavier NX Ends--> | <!-- Xavier NX Ends--> | ||
<!-- RTX 2060 Starts --> | <!-- GTX 1650 Starts --> | ||
<tab name="RTX 2060"> | <tab name="GTX 1650"> | ||
{|class="wikitable" | |||
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1650--> | |||
|- | |||
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1650 having '''896''' CUDA Cores. | |||
|- | |||
! Algorithm / Image Size | |||
! 720p | |||
! 1080p | |||
! 4k (3840x2160) | |||
! 8k (7680x4320) | |||
|- | |||
|- | |||
|[[Function:Add|add]] - 2 Images | |||
|0.08 | |||
|0.18 | |||
|0.72 | |||
|2.92 | |||
|- | |||
|[[Function:ChannelMix|channelMix]] | |||
|0.09 | |||
|0.21 | |||
|0.85 | |||
|3.41 | |||
|- | |||
|[[Function:Demosaic|demosaic]] | |||
|0.35 | |||
|0.78 | |||
|3.53 | |||
|13.1 | |||
|- | |||
|[[Function:DemosaicDFPD|demosaicDFPD]] | |||
|0.75 | |||
|1.69 | |||
|6.74 | |||
|27.1 | |||
|- | |||
|[[Function:GammaCorrect|gammaCorrect]] | |||
|0.18 | |||
|0.41 | |||
|1.60 | |||
|6.34 | |||
|- | |||
|[[Function:HistEq|histEq]] - Single Channel | |||
|0.15 | |||
|0.32 | |||
|1.21 | |||
|9.44 | |||
|- | |||
|[[Function:LUT|LUT]] | |||
|0.05 | |||
|0.11 | |||
|0.42 | |||
|1.74 | |||
|- | |||
|[[Function:blackGammaLUT|blackGammaLUT]] | |||
|0.09 | |||
|0.22 | |||
|0.90 | |||
|3.66 | |||
|- | |||
|[[Function:RGB2Gray|rgb2gray]] | |||
|0.06 | |||
|0.12 | |||
|0.49 | |||
|2.01 | |||
|- | |||
|[[Function:FocusStack|focusStack]] - Stacking 5 Images | |||
|46.10 | |||
|97.24 | |||
|257.62 | |||
|1180.50 | |||
|- | |||
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits | |||
|0.15 | |||
|0.35 | |||
|1.40 | |||
|5.63 | |||
|- | |||
|[[Function:Crop|crop]] | |||
|0.06 | |||
|0.18 | |||
|0.61 | |||
|2.49 | |||
|- | |||
|[[Function:Resize|resize]] - Scale=2.0 | |||
|0.36 | |||
|0.80 | |||
|3.22 | |||
|12.88 | |||
|- | |||
|[[Function:Resize|resize]] - Scale=0.5 | |||
|0.03 | |||
|0.06 | |||
|0.23 | |||
|0.93 | |||
|- | |||
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f | |||
|0.14 | |||
|0.33 | |||
|1.30 | |||
|5.16 | |||
|- | |||
|[[Function:WarpPerspective|warpPerspective]] | |||
|0.12 | |||
|0.29 | |||
|1.14 | |||
|4.68 | |||
|- | |||
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window | |||
|0.97 | |||
|2.17 | |||
|8.66 | |||
|34.64 | |||
|- | |||
|[[Function:UnderwaterFilter|underwaterFilter]] | |||
|0.66 | |||
|1.22 | |||
|4.59 | |||
|18.61 | |||
|- | |||
|[[Function:haarFwd|haarFwd]] | |||
|0.19 | |||
|0.43 | |||
|1.77 | |||
|6.84 | |||
|} | |||
</tab> | |||
<!-- GTX 1650 Ends--> | |||
<!-- RTX 2060 Mobile Starts --> | |||
<tab name="RTX 2060 (Mobile)"> | |||
{|class="wikitable" | {|class="wikitable" | ||
|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on RTX 2060 | <!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on RTX 2060 (Mobile)--> | ||
|- | |||
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on RTX 2060 Mobile having '''1,920''' CUDA Cores. | |||
|- | |- | ||
! Algorithm / Image Size | ! Algorithm / Image Size | ||
Line 215: | Line 616: | ||
|2.21 | |2.21 | ||
|8.70 | |8.70 | ||
|- | |||
|[[Function:Resize|resize]] - Scale=0.5 | |||
|0.02 | |||
|0.05 | |||
|0.16 | |||
|0.64 | |||
|- | |- | ||
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f | |[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f | ||
Line 247: | Line 654: | ||
|} | |} | ||
</tab> | </tab> | ||
<!-- RTX 2060 Ends--> | <!-- RTX 2060 Mobile Ends--> | ||
</tabs> | </tabs> | ||
|} | |} | ||
==Color Pipeline== | ==Color Pipeline== | ||
Let's take a typical color pipeline and | Let's take a typical color pipeline and measure its performance on an entry-level GPU. A color pipeline almost always starts with the raw image. Before converting to RGB, you might want to process the raw data, which may include applying LUTs (look-up tables), FPN (fixed-pattern noise) removal, and white balance correction. Next comes demosaicing (debayering), followed by several further enhancement functions and a color space conversion into the desired format. This pipeline can run in real time on 8k images, and at over 100 FPS on 2k images, on a decent entry-level GPU: | ||
[[File:color_pipeline.png|300px|thumb|left|Color pipeline where each box represents | [[File:color_pipeline.png|300px|thumb|left|Color pipeline where each box represents one or more functions.]] | ||
===Performance=== | ===Performance=== | ||
*Image size: 8k | *Image size: '''8k''' | ||
*Debayer method: DFPD | *Debayer method: DFPD | ||
*RAW Size: 59.9 MB | *RAW Size: 59.9 MB | ||
Line 262: | Line 669: | ||
*Sharpening: 7x7 | *Sharpening: 7x7 | ||
*GPU: GTX 1080 | *GPU: GTX 1080 | ||
*FPS: 26 FPS | *FPS: '''26 FPS''' |
Latest revision as of 15:04, 31 October 2022
Measured with NVIDIA's performance tools for Windows and Linux. Each timing figure is the execution time of the kernel/function in milliseconds (rounded) on a single GPU. The benchmarks were run on color images with 8 bits per channel unless noted otherwise. The list below is a small subset of the 100+ features in CUVI.
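To make the millisecond figures easier to compare across image sizes, a kernel time can be converted into pixel throughput. The following is a minimal sketch, not part of CUVI: the helper name `throughput_mpix_per_s` is hypothetical, and the sample values are the demosaicDFPD timings reported for the GTX 1080 at 4k and 8k.

```python
# Image sizes as stated in the benchmark table headers (width, height).
RESOLUTIONS = {
    "4k": (3840, 2160),
    "8k": (7680, 4320),
}


def throughput_mpix_per_s(size: str, kernel_ms: float) -> float:
    """Convert a kernel time in milliseconds into megapixels per second."""
    width, height = RESOLUTIONS[size]
    seconds = kernel_ms / 1000.0
    return (width * height) / seconds / 1e6


# demosaicDFPD kernel times (ms) on the GTX 1080, taken from the table.
dfpd_ms = {"4k": 2.77, "8k": 10.98}

mpix_4k = throughput_mpix_per_s("4k", dfpd_ms["4k"])
mpix_8k = throughput_mpix_per_s("8k", dfpd_ms["8k"])

# 8k has four times the pixels of 4k, so a kernel that scales linearly
# with pixel count should take about four times as long.
w4, h4 = RESOLUTIONS["4k"]
w8, h8 = RESOLUTIONS["8k"]
pixel_ratio = (w8 * h8) / (w4 * h4)
time_ratio = dfpd_ms["8k"] / dfpd_ms["4k"]

print(f"4k: {mpix_4k:.0f} Mpix/s, 8k: {mpix_8k:.0f} Mpix/s")
print(f"pixel ratio: {pixel_ratio:.2f}, time ratio: {time_ratio:.2f}")
```

A time ratio close to the pixel ratio suggests the kernel scales roughly linearly with image area. A noticeably smaller ratio usually means fixed overhead dominates at the smaller size.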
Color Pipeline
Let's take a typical color pipeline and measure its performance on an entry-level GPU. A color pipeline almost always starts with the raw image. Before converting to RGB, you might want to process the raw data, which may include applying LUTs (look-up tables), FPN (fixed-pattern noise) removal, and white balance correction. Next comes demosaicing (debayering), followed by several further enhancement functions and a color space conversion into the desired format. This pipeline can run in real time on 8k images, and at over 100 FPS on 2k images, on a decent entry-level GPU:
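As a rough illustration of how per-kernel timings add up across such a pipeline, the sketch below sums a few stage times. The choice of stages is an assumption made for illustration only, not the actual composition of the benchmarked pipeline. The timings are the GTX 1080 figures at 8k from the benchmark table, and the estimate ignores host transfers, encoding, and any stage not listed.

```python
# Hypothetical stage selection; kernel times (ms) are the GTX 1080 8k
# figures from the benchmark table. The real pipeline may use different
# or additional stages (for example a 7x7 sharpening filter and a codec).
stage_ms = {
    "LUT": 1.25,
    "demosaicDFPD": 10.98,
    "gammaCorrect": 1.61,
    "imageFilter (5x5)": 9.18,
}

# Kernels on a single GPU stream run one after another, so the total
# kernel time per frame is the sum of the individual stage times.
total_ms = sum(stage_ms.values())

# Upper bound on the frame rate if only these kernels were executed.
kernel_only_fps = 1000.0 / total_ms

for name, ms in stage_ms.items():
    print(f"{name:20s} {ms:6.2f} ms ({100.0 * ms / total_ms:4.1f}%)")
print(f"total: {total_ms:.2f} ms, kernel-only bound: {kernel_only_fps:.1f} FPS")
```

A breakdown like this mainly shows which stage dominates. With these numbers the DFPD debayer and the convolution filter account for most of the kernel time.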
Performance
- Image size: 8k
- Debayer method: DFPD
- RAW Size: 59.9 MB
- Codec: JPEG2000
- Sharpening: 7x7
- GPU: GTX 1080
- FPS: 26 FPS
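The figures above imply a per-frame time budget and a sustained raw input rate, which the short sketch below works out. It uses only the frame rate and RAW size listed above. The variable names are illustrative, and the units are kept as stated (MB per frame).

```python
# Values taken from the performance list above.
FPS = 26.0           # measured pipeline frame rate on the GTX 1080
RAW_FRAME_MB = 59.9  # size of one 8k RAW frame in MB

# Time available to process one frame end to end at this frame rate.
frame_budget_ms = 1000.0 / FPS

# Raw data that must be fed into the pipeline every second.
raw_rate_mb_s = RAW_FRAME_MB * FPS

print(f"frame budget: {frame_budget_ms:.2f} ms per frame")
print(f"raw input rate: {raw_rate_mb_s:.1f} MB/s")
```

At 26 FPS each frame has roughly 38.5 ms for the whole pipeline, and the raw input alone amounts to about 1557 MB/s, so storage and transfer bandwidth matter as much as kernel time at this resolution.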