Difference between revisions of "Performance & Benchmark"

From CUVI Wiki
 
If there is one thing CUVI gives you, it is a performance boost over competing libraries and solutions. With GPGPU as the underlying hardware, the Imaging and Vision modules benefit fully from their inherently parallel algorithms. In addition to cutting the cost of CPU-based clusters, CUVI gives up to '''15x''' speedup over Intel IPP.
Measured with NVIDIA's performance tools for Windows and Linux. Timing figures represent the time of a kernel/function in milliseconds (rounded) on a single GPU. The benchmarks are performed on color images with 8 bits per channel unless stated otherwise. The list below is a small subset of the [[CUVI_Features|100+ features]] in CUVI.
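
Because the tables on this page report kernel time in milliseconds, it can help to convert a figure into an equivalent throughput in frames per second. The following is a minimal sketch of that arithmetic (fps = 1000 / ms), using the demosaicDFPD row of the GTX 1080 table as sample input. The helper name <code>kernel_ms_to_fps</code> is made up for illustration and is not part of CUVI. The result counts kernel time only and ignores host/device transfers and any other per-frame overhead.

```python
def kernel_ms_to_fps(kernel_ms: float) -> float:
    """Convert a per-frame kernel time in milliseconds to frames per second."""
    if kernel_ms <= 0:
        raise ValueError("kernel time must be positive")
    return 1000.0 / kernel_ms


# Kernel times (ms) copied from the demosaicDFPD row of the GTX 1080 table.
demosaic_dfpd_ms = {"720p": 0.31, "1080p": 0.69, "4k": 2.77, "8k": 10.98}

# Equivalent kernel-only throughput for each image size.
demosaic_dfpd_fps = {
    size: kernel_ms_to_fps(ms) for size, ms in demosaic_dfpd_ms.items()
}

for size, fps in demosaic_dfpd_fps.items():
    print(f"{size}: {fps:.0f} fps")
```

Since the published times are rounded to two decimals, the derived fps values for the fastest kernels (for example 0.02 ms) carry a large relative error and should be read as rough indications only.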


[[File:CUVI_Speedup.jpg|center|border]]
{|
 
|style="font-size:85%;"|
Applications using CUVI are generally ten times faster than their CPU counterparts. The CUVI framework also makes it easy to scale an application across more than one GPU, making it as fast as you need.
<tabs>
 
<!-- Jetson Nano Starts -->
[[File:CUVI_Bench.jpg|center|border]]
<tab name="Jetson Nano">
 
{|class="wikitable"
==Benchmark==
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.7.0 on Jetson Nano-->
The following benchmark was performed on an NVIDIA GTX 1080 using the Nsight performance tool on Windows 10 (64-bit) with CUDA toolkit version 9.1. Timing figures are frames per second (fps), based only on the processing time on a single GPU. The benchmarks use 8-bit images unless stated otherwise. For 16-bit demosaicDFPD, the results on 1080p, 4k and 8k images are 1550 fps, 412 fps and 94 fps respectively.
|-
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.7.0 on Jetson Nano having '''128''' CUDA Cores.
|-
! Algorithm / Image Size
! 720p
! 1080p
! 4k (3840x2160)
! 8k (7680x4320)
|-
|-
|[[Function:Add|add]] - 2 Images
|2.99
|8.38
|15.63
|50.27
|-
|[[Function:ChannelMix|channelMix]]
|4.09
|7.42
|15.70
|53.35
|-
|[[Function:Demosaic|demosaic]]
|8.11
|11.77
|42.99
|172.40
|-
|[[Function:DemosaicDFPD|demosaicDFPD]]
|12.6
|23.86
|88.87
|357.94
|-
|[[Function:GammaCorrect|gammaCorrect]]
|3.12
|5.69
|14.13
|45.80
|-
|[[Function:HistEq|histEq]] - Single Channel
|5.29
|7.88
|20.53
|61.52
|-
|[[Function:LUT|LUT]]
|1.89
|2.77
|11.26
|25.42
|-
|[[Function:blackGammaLUT|blackGammaLUT]]
|4.08
|7.52
|18.36
|62.27
|-
|[[Function:RGB2Gray|rgb2gray]]
|1.95
|2.67
|10.64
|20.89
|-
|[[Function:FocusStack|focusStack]] - Stacking 5 Images
|252.52
|452.81
|1830.54
|7320.52
|-
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits
|3.74
|8.46
|15.71
|63.33
|-
|[[Function:Crop|crop]]
|1.67
|4.76
|9.04
|28.93
|-
|[[Function:Resize|resize]] - Scale=2.0
|9.29
|18.43
|55.58
|222.41
|-
|[[Function:Resize|resize]] - Scale=0.5
|2.33
|5.23
|7.01
|17.08
|-
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f
|3.43
|7.73
|26.00
|58.34
|-
|[[Function:WarpPerspective|warpPerspective]]
|5.09
|11.48
|19.86
|81.36
|-
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window
|18.44
|26.80
|89.58
|358.29
|-
|[[Function:UnderwaterFilter|underwaterFilter]]
|29.55
|50.74
|79.09
|332.95
|-
|[[Function:haarFwd|haarFwd]]
|10.27
|18.60
|40.45
|130.86
|}
</tab>
<!-- Jetson Nano Ends-->
<!-- GTX 1080 Starts -->
<tab name="GTX 1080">
{|class="wikitable"
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1080-->
|-
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1080 having '''2,560''' CUDA Cores.
|-
! Algorithm / Image Size
! 720p
! 1080p
! 4k (3840x2160)
! 8k (7680x4320)
|-
|-
|[[Function:Add|add]] - 2 Images
|0.05
|0.10
|0.42
|1.69
|-
|[[Function:ChannelMix|channelMix]]
|0.04
|0.08
|0.34
|1.33
|-
|[[Function:Demosaic|demosaic]]
|0.12
|0.26
|1.01
|4.04
|-
|[[Function:DemosaicDFPD|demosaicDFPD]]
|0.31
|0.69
|2.77
|10.98
|-
|[[Function:GammaCorrect|gammaCorrect]]
|0.04
|0.10
|0.40
|1.61
|-
|[[Function:HistEq|histEq]] - Single Channel
|0.08
|0.18
|0.61
|2.18
|-
|[[Function:LUT|LUT]]
|0.05
|0.10
|0.35
|1.25
|-
|[[Function:blackGammaLUT|blackGammaLUT]]
|0.99
|0.21
|0.74
|2.73
|-
|[[Function:RGB2Gray|rgb2gray]]
|0.02
|0.05
|0.21
|0.83
|-
|[[Function:FocusStack|focusStack]] - Stacking 5 Images
|8.66
|14.44
|65.14
|270.59
|-
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits
|0.06
|0.14
|0.58
|2.30
|-
|[[Function:Crop|crop]]
|0.03
|0.07
|0.23
|0.93
|-
|[[Function:Resize|resize]] - Scale=2.0
|0.19
|0.41
|1.70
|6.83
|-
|[[Function:Resize|resize]] - Scale=0.5
|0.02
|0.04
|0.14
|0.58
|-
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f
|0.08
|0.16
|0.66
|2.69
|-
|[[Function:WarpPerspective|warpPerspective]]
|0.08
|0.22
|0.79
|3.21
|-
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window
|0.30
|0.66
|2.63
|9.18
|-
|[[Function:UnderwaterFilter|underwaterFilter]]
|0.45
|0.96
|3.39
|11.62
|-
|[[Function:haarFwd|haarFwd]]
|0.14
|0.34
|1.35
|5.10
|}
</tab>
<!-- GTX 1080 Ends-->
<!-- Xavier NX Starts -->
<tab name="Xavier NX">
{|class="wikitable"
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on Jetson Xavier NX-->
|-
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on Jetson Xavier NX having '''384''' CUDA Cores.


|-
! Algorithm / Image Size
! 720p
! 1080p
! 4k (3840x2160)
! 8k (7680x4320)
|-
|-
|[[Function:Add|add]] - 2 Images
|0.29
|0.61
|2.04
|8.61
|-
|[[Function:ChannelMix|channelMix]]
|0.27
|0.61
|2.31
|9.02
|-
|[[Function:Demosaic|demosaic]]
|1.87
|2.3
|9.17
|36.74
|-
|[[Function:DemosaicDFPD|demosaicDFPD]]
|2.33
|4.96
|19.07
|77.75
|-
|[[Function:GammaCorrect|gammaCorrect]]
|0.22
|0.48
|1.89
|7.47
|-
|[[Function:HistEq|histEq]] - Single Channel
|0.68
|0.92
|3.24
|9.20
|-
|[[Function:LUT|LUT]]
|0.10
|0.30
|0.86
|3.28
|-
|[[Function:blackGammaLUT|blackGammaLUT]]
|0.36
|0.68
|1.86
|7.29
|-
|[[Function:RGB2Gray|rgb2gray]]
|0.14
|0.25
|0.96
|3.83
|-
|[[Function:FocusStack|focusStack]] - Stacking 5 Images
|142.56
|285.95
|1103.14
|4399.84
|-
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits
|0.38
|0.77
|3.12
|12.34
|-
|[[Function:Crop|crop]]
|0.13
|0.48
|2.05
|6.05
|-
|[[Function:Resize|resize]] - Scale=2.0
|0.85
|1.90
|7.57
|30.32
|-
|[[Function:Resize|resize]] - Scale=0.5
|0.08
|0.33
|0.82
|2.89
|-
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f
|0.23
|0.49
|1.90
|7.64
|-
|[[Function:WarpPerspective|warpPerspective]]
|0.24
|0.68
|2.26
|9.38
|-
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window
|2.97
|7.89
|23.76
|108.21
|-
|[[Function:UnderwaterFilter|underwaterFilter]]
|1.57
|3.49
|13.6
|47.39
|-
|[[Function:haarFwd|haarFwd]]
|1.07
|2.39
|6.47
|25.70
|}
</tab>
<!-- Xavier NX Ends-->
<!-- GTX 1650 Starts -->
<tab name="GTX 1650">
{|class="wikitable"
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1650-->
|-
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on GTX 1650 having '''896''' CUDA Cores.
|-
! Algorithm / Image Size
! 720p
! 1080p
! 4k (3840x2160)
! 8k (7680x4320)
|-
|-
|[[Function:Add|add]] - 2 Images
|0.08
|0.18
|0.72
|2.92
|-
|[[Function:ChannelMix|channelMix]]
|0.09
|0.21
|0.85
|3.41
|-
|[[Function:Demosaic|demosaic]]
|0.35
|0.78
|3.53
|13.1
|-
|[[Function:DemosaicDFPD|demosaicDFPD]]
|0.75
|1.69
|6.74
|27.1
|-
|[[Function:GammaCorrect|gammaCorrect]]
|0.18
|0.41
|1.60
|6.34
|-
|[[Function:HistEq|histEq]] - Single Channel
|0.15
|0.32
|1.21
|9.44
|-
|[[Function:LUT|LUT]]
|0.05
|0.11
|0.42
|1.74
|-
|[[Function:blackGammaLUT|blackGammaLUT]]
|0.09
|0.22
|0.90
|3.66
|-
|[[Function:RGB2Gray|rgb2gray]]
|0.06
|0.12
|0.49
|2.01
|-
|[[Function:FocusStack|focusStack]] - Stacking 5 Images
|46.10
|97.24
|257.62
|1180.50
|-
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits
|0.15
|0.35
|1.40
|5.63
|-
|[[Function:Crop|crop]]
|0.06
|0.18
|0.61
|2.49
|-
|[[Function:Resize|resize]] - Scale=2.0
|0.36
|0.80
|3.22
|12.88
|-
|[[Function:Resize|resize]] - Scale=0.5
|0.03
|0.06
|0.23
|0.93
|-
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f
|0.14
|0.33
|1.30
|5.16
|-
|[[Function:WarpPerspective|warpPerspective]]
|0.12
|0.29
|1.14
|4.68
|-
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window
|0.97
|2.17
|8.66
|34.64
|-
|[[Function:UnderwaterFilter|underwaterFilter]]
|0.66
|1.22
|4.59
|18.61
|-
|[[Function:haarFwd|haarFwd]]
|0.19
|0.43
|1.77
|6.84
|}
</tab>
<!-- GTX 1650 Ends-->
<!-- RTX 2060 Mobile Starts -->
<tab name="RTX 2060 (Mobile)">
{|class="wikitable"
<!--|+ style="caption-side:bottom; color:#e76700;"|''Kernel Time in milliseconds (ms) with CUVI v1.8.0 on RTX 2060 (Mobile)-->
|-
|colspan="5" |Kernel Time in milliseconds (ms) with CUVI v1.8.0 on RTX 2060 Mobile having '''1,920''' CUDA Cores.
|-
! Algorithm / Image Size
! 720p
! 1080p
! 4k (3840x2160)
! 8k (7680x4320)
|-
|[[Function:Add|add]] - 2 Images
|0.06
|0.14
|0.51
|2.01
|-
|[[Function:ChannelMix|channelMix]]
|0.07
|0.14
|0.55
|2.25
|-
|[[Function:Demosaic|demosaic]]
|0.24
|0.53
|2.10
|8.10
|-
|[[Function:DemosaicDFPD|demosaicDFPD]]
|0.52
|1.22
|4.53
|18.1
|-
|[[Function:GammaCorrect|gammaCorrect]]
|0.12
|0.28
|1.02
|4.30
|-
|[[Function:HistEq|histEq]] - Single Channel
|0.21
|0.24
|0.84
|3.10
|-
|[[Function:LUT|LUT]]
|0.03
|0.08
|0.29
|1.20
|-
|[[Function:blackGammaLUT|blackGammaLUT]]
|0.069
|0.16
|0.61
|2.50
|-
|[[Function:RGB2Gray|rgb2gray]]
|0.04
|0.09
|0.34
|1.43
|-
|[[Function:FocusStack|focusStack]] - Stacking 5 Images
|25.77
|55.86
|221.60
|605.53
|-
|[[Function:BitConversion|bitConversion]] - From 8 to 16 bits
|0.01
|0.24
|0.95
|3.81
|-
|[[Function:Crop|crop]]
|0.04
|0.12
|0.41
|1.70
|-
|[[Function:Resize|resize]] - Scale=2.0
|0.25
|0.55
|2.21
|8.70
|-
|[[Function:Resize|resize]] - Scale=0.5
|0.02
|0.05
|0.16
|0.64
|-
|[[Function:Rotate|rotate]] - Non Cropping, Angle = -3.76f
|0.04
|0.09
|0.36
|1.11
|-
|[[Function:WarpPerspective|warpPerspective]]
|0.08
|0.20
|0.77
|3.10
|-
|[[Function:ImageFilter|imageFilter]] - 5x5 floating point window
|0.65
|1.56
|5.81
|13.7
|-
|[[Function:UnderwaterFilter|underwaterFilter]]
|0.53
|1.10
|4.00
|15.2
|-
|[[Function:haarFwd|haarFwd]]
|0.14
|0.30
|1.21
|4.90
|}
{|class="wikitable"
|-
|colspan="4" |Frames per second (fps) from the earlier revision of this benchmark, which the accompanying text attributes to the GTX 1080.
|-
! Algorithm / Image Size
! 1080p Full HD
! 4k Ultra HD
! 8k Ultra HD
|-
| [[Function:AutoColor| Auto Color]]
| 7088.83 fps
| 1850.02 fps
| 461.36 fps
|-
| [[Function:DemosaicDFPD|Demosaic (DFPD)]]
| 1707.94 fps
| 412.72 fps
| 101.86 fps
|-
| [[Function:Demosaic|Demosaic (Linear)]]
| 4258.88 fps
| 1025.64 fps
| 234.66 fps
|-
| [[Function:Lowlight| Low Light Enhancement]]
| 2143.02 fps
| 525.16 fps
| 145.52 fps
|-
| [[Function:Resize|Resize (2x - Nearest Neighbor)]]
| 4169.51 fps
| 1048.44 fps
| 260.164 fps
|-
| [[Function:Resize|Resize (2x - Linear)]]
| 2494.80 fps
| 613.65 fps
| 151.53 fps
|-
| [[Function:Resize|Resize (2x - Cubic)]]
| 1778.42 fps
| 456.68 fps
| 108.44 fps
|-
| [[Function:Resize|Resize (0.5x - Nearest Neighbor)]]
| 47,265.68 fps
| 12,396.48 fps
| 3145.28 fps
|-
| [[Function:Resize|Resize (0.5x - Linear)]]
| 26,365.05 fps
| 6793.71 fps
| 1703.32 fps
|-
| [[Function:Resize|Resize (0.5x - Cubic)]]
| 11,232.92 fps
| 3143.94 fps
| 799.00 fps
|}
</tab>
<!-- RTX 2060 Mobile Ends-->
</tabs>
|}
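
One way to read the tables above is to check how kernel time scales with image size. The sketch below divides the pixel count of each image size by the reported kernel time for the "add - 2 Images" row of the GTX 1080 table, giving megapixels processed per millisecond. The 4k and 8k dimensions come from the table headers; 1280x720 for 720p and 1920x1080 for 1080p are assumed standard sizes that this page does not state. The helper name <code>megapixels_per_ms</code> is made up for illustration.

```python
# Pixel dimensions per column. 4k and 8k are stated in the table headers;
# 720p = 1280x720 and 1080p = 1920x1080 are assumptions (standard sizes).
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4k": (3840, 2160),
    "8k": (7680, 4320),
}


def megapixels_per_ms(size: str, kernel_ms: float) -> float:
    """Return megapixels processed per millisecond of kernel time."""
    width, height = RESOLUTIONS[size]
    return (width * height) / 1e6 / kernel_ms


# Kernel times (ms) from the "add - 2 Images" row of the GTX 1080 table.
add_ms = {"720p": 0.05, "1080p": 0.10, "4k": 0.42, "8k": 1.69}

throughput = {size: megapixels_per_ms(size, ms) for size, ms in add_ms.items()}

for size, mp_per_ms in throughput.items():
    print(f"{size}: {mp_per_ms:.1f} MP/ms")
```

For this row the values stay between roughly 18 and 21 MP/ms, which suggests the kernel time grows close to linearly with pixel count. The same check can be applied to any other row or device by swapping in its timings.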
==Color Pipeline==
Let's take a typical color pipeline and measure its performance on one of the entry-level GPUs. A color pipeline almost always starts with the raw image. Before converting to RGB, you might want to do some processing on the raw data, which may include applying LUTs (look-up tables), FPN (fixed-pattern noise) removal and white balance correction. Next comes demosaic/debayer, followed by several further enhancement functions and a color space conversion into the desired format. This pipeline can run in real time on a decent entry-level GPU on 8k images and at over 100 FPS on a 2k image:
[[File:color_pipeline.png|300px|thumb|left|Color pipeline where each box represents one or more functions.]]
===Performance===
*Image size: '''8k'''
*Debayer method: DFPD
*RAW Size: 59.9 MB
*Codec: JPEG2000
*Sharpening: 7x7
*GPU: GTX 1080
*FPS: '''26 FPS'''
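
The figures listed above imply a per-frame time budget and a raw data rate, which the following sketch works out. It also compares the 8k demosaicDFPD kernel time from the GTX 1080 table against that budget. The comparison is only indicative, because the table figure is for 8-bit images and the bit depth of the pipeline input is not stated here.

```python
FPS = 26             # reported pipeline rate on the GTX 1080
RAW_SIZE_MB = 59.9   # reported RAW frame size in megabytes

# Time available per frame at the reported rate.
frame_budget_ms = 1000.0 / FPS

# Raw input data consumed per second at the reported rate.
raw_throughput_mb_s = RAW_SIZE_MB * FPS

# 8k demosaicDFPD kernel time (ms) from the GTX 1080 table (8-bit images).
DFPD_8K_MS = 10.98
dfpd_share = DFPD_8K_MS / frame_budget_ms

print(f"Frame budget:   {frame_budget_ms:.2f} ms")
print(f"RAW throughput: {raw_throughput_mb_s:.1f} MB/s")
print(f"DFPD share:     {dfpd_share:.1%} of the frame budget")
```

The remainder of the budget has to cover every other stage in the pipeline, including the 7x7 sharpening and the JPEG2000 codec.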

Latest revision as of 15:04, 31 October 2022
