Elemental Technologies: Harnessing GPU Power for Video Processing - Prof. Jingke Li, Study notes of Computer Science

An overview of Elemental Technologies, a company specializing in video processing solutions. The company's background, story, and products are discussed, with a focus on its transition from building ASICs to using CUDA for software-based video processing. The benefits of this approach, such as cost reduction and high performance, are highlighted.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009


Harnessing Stream Processors: Massively Parallel Processing

Jesse Rosenzweig, CTO, [email protected]

April 21st, 2009

Elemental Technologies Incorporated Confidential

Agenda

• Company Background

• Story of a Startup

• The Elemental Video Engine

• Elemental Product Line

• CUDA introduction

• Conclusion


Company Background

  • Our Mission:
    • To create the fastest, highest quality video solutions by harnessing massively parallel, off-the-shelf hardware.
  • Founded in 2006
  • Team led display revolution at
  • Headquartered in beautiful Portland, Oregon
  • Profitable in first quarter of revenue (Q4 ‘08)
  • Raised $7.1M Series A in June 2008

Story of a Startup

  • Founded August 2006
    • Focus was to build ASIC
    • Standalone transcoder / encoder
    • Estimated cost $20M to revenue
    • Funding sources limited
  • Elemental 2.0: April 2007
    • NVIDIA G80 had been released
    • CUDA had been launched
    • Powerful parallel engine available
    • Switched to software model!

[Figure: ASIC block diagram with labels VCU, 3D Comb, IR, TS Demux, POD Controller, Decryption, Encryption, TS Remux]

Disruptive Innovation

  • Elemental’s video harnesses key GPU trends
    1. GPUs have become immensely powerful
    2. GPUs have become extremely programmable
    3. PCI-e bus allows fast CPU / GPU communication

Video Engine Pipeline

  • Harnesses both the CPU and GPU strengths
  • Achieves up to 10x performance of CPU-only
  • Efficient use of system resources is key


Elemental Video Engine

  • Currently used by a variety of applications:
    • Virtualization / Remote Video Distribution
    • United States Intelligence Community
    • Professional Video Editing


Elemental’s Product Line

  • All powered by Elemental core technology
  • Elemental Video Engine™ SDK (target: Developer)
    • Flexible and extensible
    • Supports a variety of codecs
  • Badaboom™ Media Converter (target: Consumer)
    • Video on mobile devices
    • 1 million+ downloads
  • Elemental Accelerator for CS4 (target: Professional)
    • Premiere Pro plug-in
    • Bundled w/ NVIDIA Quadro CX

Release timeline (axis: Q3 ‘08, Q4 ‘08, Q1 ‘09, Q2 ‘09, Q3 ‘):
  • Badaboom™ Media Converter available
  • RapiHD™ Accelerator for Adobe Premiere Pro CS4 available
  • RapiHD™ SDK available


CUDA Introduction

• What is CUDA?

  • Compute Unified Device Architecture
  • Parallel processing at a very low level
  • Extensions to C
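
To make "extensions to C" concrete, here is a minimal sketch of a complete CUDA program. It is not taken from the original slides: the kernel name `vec_add`, the helper `vec_add_mismatches`, and the vector size are illustrative choices, and error checking is omitted for brevity. The extensions on show are the `__global__` qualifier, the built-in `blockIdx` / `blockDim` / `threadIdx` variables, and the `<<<blocks, threads>>>` launch syntax.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// __global__ is a CUDA extension: the function runs on the GPU, once per thread.
__global__ void vec_add(const float* a, const float* b, float* c, int n)
{
    // Built-in variables identify this thread within the launch grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Adds two n-element vectors on the GPU and returns how many results are wrong.
// (Hypothetical helper for illustration; error checking omitted.)
int vec_add_mismatches(int n)
{
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);

    float *da = nullptr, *db = nullptr, *dc = nullptr;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                          // threads per block
    const int blocks  = (n + threads - 1) / threads;  // enough blocks to cover n
    vec_add<<<blocks, threads>>>(da, db, dc, n);      // CUDA launch syntax

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (hc[i] != 3.0f) ++bad;
    return bad;
}

int main()
{
    printf("mismatches: %d\n", vec_add_mismatches(1 << 20));
    return 0;
}
```

Everything outside the kernel and the launch line is ordinary C/C++ calling the CUDA runtime library, which is why the slide describes CUDA as a small set of extensions rather than a new language.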

GPU Hardware Introduction

  • Arrays of multiprocessors
  • Each multiprocessor has sets of processors
  • Each processor executes the same instruction on different data
  • Each processor has access to shared memory
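
The hardware layout described above can be inspected at run time. The following sketch is an addition to the slides: it uses the CUDA runtime's `cudaGetDeviceProperties` to print the number of multiprocessors, the warp size (the group of threads that executes the same instruction together), and the per-block shared memory. The helper name `describe_device` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints the hardware parameters named on the slide for one device.
// Returns 0 on success, 1 if no usable device is found.
int describe_device(int device)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || device >= count) {
        printf("No CUDA device %d found\n", device);
        return 1;
    }

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 1;

    printf("Device:                  %s\n", prop.name);
    printf("Multiprocessors:         %d\n", prop.multiProcessorCount);
    printf("Warp size (threads):     %d\n", prop.warpSize);
    printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    printf("Registers per block:     %d\n", prop.regsPerBlock);
    return 0;
}

int main()
{
    return describe_device(0);
}
```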

CUDA Introduction

  • Memory types
    • Global/Device → GPU’s DRAM. Slowest of all memory
    • Constant → Cached global memory for constant read-only data
    • Texture → 2D cache and hardware interpolation for global memory
    • Shared → Fast memory (as fast as registers) available to a CUDA block
    • Register → Set of general purpose registers available for the thread
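
As a rough illustration of how these memory types appear in source code, the sketch below applies a 3-tap smoothing filter. It is an assumed example, not from the slides: the names `smooth3`, `c_taps`, and `smooth3_mismatches` are made up, texture memory is not shown, and error checking is omitted. Filter taps sit in `__constant__` memory, each block stages its inputs in `__shared__` memory, per-thread scalars live in registers, and the input and output arrays are in global memory.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 128

// Constant memory: small, cached, read-only on the GPU; written by the host.
__constant__ float c_taps[3];

__device__ int clamp_index(int i, int n) { return min(max(i, 0), n - 1); }

// in/out point to global memory (GPU DRAM).
__global__ void smooth3(const float* in, float* out, int n)
{
    // Shared memory: one tile per block plus a one-element halo on each side.
    __shared__ float tile[BLOCK + 2];

    int g = blockIdx.x * blockDim.x + threadIdx.x;  // held in a register
    int l = threadIdx.x + 1;                        // position inside the tile

    tile[l] = in[clamp_index(g, n)];
    if (threadIdx.x == 0)         tile[0]     = in[clamp_index(g - 1, n)];
    if (threadIdx.x == BLOCK - 1) tile[l + 1] = in[clamp_index(g + 1, n)];
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    if (g < n)
        out[g] = c_taps[0] * tile[l - 1] + c_taps[1] * tile[l] + c_taps[2] * tile[l + 1];
}

// Runs the filter on n ramp values and compares against a CPU reference.
int smooth3_mismatches(int n)
{
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;
    const float h_taps[3] = {0.25f, 0.5f, 0.25f};
    cudaMemcpyToSymbol(c_taps, h_taps, sizeof(h_taps));

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    smooth3<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        float ref = 0.25f * h_in[std::max(i - 1, 0)] + 0.5f * h_in[i]
                  + 0.25f * h_in[std::min(i + 1, n - 1)];
        if (h_out[i] != ref) ++bad;
    }
    return bad;
}

int main()
{
    printf("mismatches: %d\n", smooth3_mismatches(1000));
    return 0;
}
```

The kernel must be launched with exactly `BLOCK` threads per block, since the shared tile is sized at compile time.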

CUDA Introduction

  • Typical data flow
    • CPU produces/captures data
    • Copy data to GPU DRAM
    • Kernel loads data from DRAM into shared memory
    • Threads execute, in parallel, on data in shared memory
    • Once threads are done (__syncthreads), move data back into GPU DRAM
    • Move results back to CPU
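
The data flow listed above can be sketched end to end as follows. This is an illustrative example rather than code from the presentation: the kernel `reverse_chunks` and helper `reverse_mismatches` are hypothetical names, the input length is assumed to be a multiple of `BLOCK`, and error checking is omitted. Each comment names the step of the list it corresponds to.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256

// Reverses every BLOCK-sized chunk of data in place, staging through shared memory.
__global__ void reverse_chunks(float* data)
{
    __shared__ float chunk[BLOCK];
    int base = blockIdx.x * BLOCK;
    int t = threadIdx.x;

    chunk[t] = data[base + t];              // kernel loads DRAM into shared memory
    __syncthreads();                        // wait until the whole chunk is loaded
    data[base + t] = chunk[BLOCK - 1 - t];  // work on shared data, write back to DRAM
}

// Runs the full CPU -> GPU -> CPU round trip; returns the number of wrong elements.
// numChunks * BLOCK elements are processed.
int reverse_mismatches(int numChunks)
{
    const int n = numChunks * BLOCK;
    const size_t bytes = n * sizeof(float);

    std::vector<float> h(n);                // CPU produces the data
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);   // copy to GPU DRAM

    reverse_chunks<<<numChunks, BLOCK>>>(d);

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);   // results back to CPU
    cudaFree(d);

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        int chunkStart = (i / BLOCK) * BLOCK;
        float expected = (float)(chunkStart + BLOCK - 1 - (i - chunkStart));
        if (h[i] != expected) ++bad;
    }
    return bad;
}

int main()
{
    printf("mismatches: %d\n", reverse_mismatches(4));
    return 0;
}
```

The barrier is what makes the in-place update safe: no thread reads a neighbor's slot of `chunk` until every thread in the block has written its own.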

CUDA Introduction

  • Occupancy
    • The ratio of the number of active warps per multiprocessor to the maximum number of active warps
    • Current NVIDIA GPUs have a maximum of 32 active warps per multiprocessor
  • Higher occupancy is not necessarily faster for any given algorithm, but is a measure of how much work can be done per clock.
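
The occupancy ratio defined above can be estimated programmatically. The sketch below is an addition and relies on `cudaOccupancyMaxActiveBlocksPerMultiprocessor`, a runtime call introduced in CUDA releases well after this 2009 talk; the `saxpy` kernel and the `saxpy_occupancy` helper are stand-ins chosen for illustration. The maximum warp count is read from the device rather than hard-coded, since it differs between GPU generations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel whose resource usage is being measured.
__global__ void saxpy(float a, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Returns (active warps per multiprocessor) / (maximum warps per multiprocessor)
// for saxpy at the given block size, or -1 if a query fails.
float saxpy_occupancy(int blockSize)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1.0f;

    int blocksPerSM = 0;  // resident blocks one multiprocessor can hold
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, saxpy,
                                                      blockSize, 0) != cudaSuccess)
        return -1.0f;

    int warpsPerBlock = (blockSize + prop.warpSize - 1) / prop.warpSize;
    int activeWarps   = blocksPerSM * warpsPerBlock;
    int maxWarps      = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    return (float)activeWarps / (float)maxWarps;
}

int main()
{
    for (int blockSize = 64; blockSize <= 512; blockSize *= 2)
        printf("block size %3d: occupancy %.2f\n", blockSize, saxpy_occupancy(blockSize));
    return 0;
}
```

A kernel that uses more registers or shared memory per thread will generally report a lower ratio, which is the link to the optimization advice on the next slide.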

CUDA Introduction

  • Optimize kernels by
    • Minimizing registers => simple algorithms
    • Minimizing shared memory usage => resourceful memory management
    • Maximizing warps per block => give the device enough work.
    • Good memory access
      → Coalesced global reads and writes
      → Reduce bank conflicts on shared memory.
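
A matrix transpose is a common way to show both memory-access points at once. The sketch below is not from the slides: the names `transpose_tiled` and `transpose_mismatches` are illustrative, and it assumes a GPU that allows 1024 threads per block and has 32 shared-memory banks. Reads and writes to global memory are arranged so that consecutive threads touch consecutive addresses (coalescing), and the shared tile is padded by one column so that column-wise reads fall into different banks.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 32

// out (width rows x height cols) = transpose of in (height rows x width cols).
__global__ void transpose_tiled(const float* in, float* out, int width, int height)
{
    // The +1 padding shifts each row by one bank, avoiding conflicts on column reads.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];    // coalesced read
    __syncthreads();

    // Swap the block coordinates so threadIdx.x still walks along a row of out.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}

// Transposes a test matrix on the GPU; returns the number of wrong elements.
int transpose_mismatches(int width, int height)
{
    const size_t count = (size_t)width * height;
    std::vector<float> h_in(count), h_out(count, -1.0f);
    for (size_t i = 0; i < count; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, count * sizeof(float));
    cudaMalloc(&d_out, count * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    transpose_tiled<<<grid, block>>>(d_in, d_out, width, height);

    cudaMemcpy(h_out.data(), d_out, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int bad = 0;
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            if (h_out[(size_t)c * height + r] != h_in[(size_t)r * width + c]) ++bad;
    return bad;
}

int main()
{
    printf("mismatches: %d\n", transpose_mismatches(100, 70));
    return 0;
}
```

A naive transpose that writes `out[x * height + y]` directly would have one of its two global accesses strided by a full row, which is the uncoalesced pattern the slide warns against.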

CUDA Introduction

  • Example: Matrix Multiply
  • Each thread block is responsible for computing one square sub-matrix Csub of C;
  • Each thread within the block is responsible for computing one element of Csub.

CUDA Introduction

  • GPU Side (part 2)
    • Load shared memory with data
    • Do matrix multiply in parallel
    • Write result to global memory
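
The slides containing the presenter's own kernel are not part of this extract, so the following is only a sketch of the scheme described: each thread block computes one square tile Csub of C, and each thread computes one element of it. It follows the well-known tiled pattern and assumes row-major storage and dimensions that are multiples of `TILE`; the names `matmul_tiled` and `matmul_mismatches` are hypothetical, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16

// C (hA x wB) = A (hA x wA) * B (wA x wB), all row-major.
// Assumes hA, wA and wB are multiples of TILE.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int wA, int wB)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // this thread's element of C
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < wA / TILE; ++t) {
        // Load shared memory with data: one element of each tile per thread.
        As[threadIdx.y][threadIdx.x] = A[row * wA + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * wB + col];
        __syncthreads();

        // Do the multiply in parallel on the shared tiles.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish with these tiles before overwriting them
    }
    C[row * wB + col] = acc;  // write result to global memory
}

// Multiplies small-integer test matrices and compares with a CPU reference.
int matmul_mismatches(int hA, int wA, int wB)
{
    std::vector<float> A((size_t)hA * wA), B((size_t)wA * wB), C((size_t)hA * wB, 0.0f);
    for (size_t i = 0; i < A.size(); ++i) A[i] = (float)((int)(i % 7) - 3);
    for (size_t i = 0; i < B.size(); ++i) B[i] = (float)((int)(i % 5) - 2);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid(wB / TILE, hA / TILE);
    matmul_tiled<<<grid, block>>>(dA, dB, dC, wA, wB);

    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    int bad = 0;
    for (int r = 0; r < hA; ++r)
        for (int c = 0; c < wB; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < wA; ++k) ref += A[r * wA + k] * B[k * wB + c];
            if (C[r * wB + c] != ref) ++bad;
        }
    return bad;
}

int main()
{
    // 80 x 48 times 48 x 128: the smaller case from the performance slide,
    // assuming its dimensions are written as [width, height].
    printf("mismatches: %d\n", matmul_mismatches(80, 48, 128));
    return 0;
}
```

Staging tiles in shared memory means each element of A and B is fetched from global memory once per tile rather than once per multiply, which is the main reason for the block decomposition.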

CUDA Introduction

  • Performance for A[48,80] * B[128,48] = C[128,80]
  • GPU 10ms (5.4x faster)
  • CPU 54ms
  • 491k multiplies and 491k adds.

CUDA Introduction

  • Performance for A[48,8000] * B[12800,48] = C[12800,8000]
  • GPU 663ms (14.2x faster)
  • CPU 9,483ms
  • ~5 billion multiplies and adds.
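
The operation counts in the two measurements above follow from the matrix shapes: every element of C needs one multiply and one add per element of the shared dimension (48). The worked arithmetic below is an addition to the slides. The throughput figures are derived from the rounded timings shown and count one multiply plus one add as two operations, so they are rough estimates only.

```latex
\begin{align*}
N_{\text{small}} &= 128 \times 80 \times 48 = 491{,}520 \\
N_{\text{large}} &= 12800 \times 8000 \times 48 = 4{,}915{,}200{,}000 \approx 4.9 \times 10^{9} \\
\text{speedup}_{\text{small}} &= 54\,\text{ms} / 10\,\text{ms} = 5.4 \\
\text{speedup}_{\text{large}} &= 9483\,\text{ms} / 663\,\text{ms} \approx 14.3 \\
\text{GPU rate}_{\text{small}} &= 2 N_{\text{small}} / 0.010\,\text{s} \approx 0.098\ \text{GFLOP/s} \\
\text{GPU rate}_{\text{large}} &= 2 N_{\text{large}} / 0.663\,\text{s} \approx 14.8\ \text{GFLOP/s}
\end{align*}
```

The large-case ratio computed from the rounded timings is about 14.3, close to the 14.2x quoted on the slide, which presumably used unrounded measurements. The much lower rate for the small case suggests that fixed costs such as data transfer and kernel launch dominate when the matrix is small.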

Compute competition

  • CUDA only for NVIDIA, but Mac, Linux and Windows supported
  • OpenCL (Apple) and DX11 (Microsoft) for all GPU and CPU platforms.

More information

  • CUDA: www.nvidia.com/CUDA
  • OpenCL: www.khronos.org/opencl/
  • DX11 Compute: DirectX March 2009 release
  • www.elementaltechnologies.com!
