


This document describes the implementation of a video-based rendering (VBR) system for generating novel views of dynamic scenes using depth images. It covers the concept of image-based rendering (IBR) and its extension to dynamic scenes, the use of depth maps for improving view quality, methods for constructing and rendering depth images, and the merging of multiple depth images.



Figure 1: (i) The top row shows the input views: the first three images are from three different viewpoints, and the next three are from the same three viewpoints at a different time instant. (ii) The middle row shows the depth maps for the corresponding input views in the top row. (iii) The bottom row shows the novel views: the first three images are rendered from random novel viewpoints at different time instants, and the next three are time-frozen versions of the third, rendered from other novel viewpoints.
Image Based Rendering (IBR) holds a lot of promise for navigating through a real-world scene without modeling it manually. IBR techniques include those with geometry and those without. Geometric information in the form of a depth map aligned with the image is attractive for IBR because several methods are available to capture it. It can improve the quality of generated views when only a limited number of views is available. When extended to dynamic scenes, IBR is called Video Based Rendering (VBR). I implemented a tool for synthetically generating depth images of dynamic scenes, and created a VBR system.
CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation—Display Algorithms
Keywords: Image based rendering, Dynamic Scenes
Conventional approaches to rendering a scene require a geometric description such as polygonal meshes, lights, and material properties. Rendering high-quality images from such descriptions requires very accurate models and material properties. Obtaining such accurate geometric models with accurate colors is a very difficult problem.
IBR can be explained using Figure 2, which depicts the problem of computing the novel image (marked with ’?’) when input views 1 to 4 are given from other viewpoints.
Figure 2: The problem of generating a novel image (labeled with ’?’) from the input views 1 to 4.
Image Based Rendering holds a lot of promise for navigating through a real-world scene without modeling it manually. Different representations have been proposed for IBR in the literature. Image Based Rendering techniques include those with geometry and those without. Geometric information in the form of a depth map aligned with the image holds a lot of promise for IBR due to the several methods available to capture it. It can improve the quality of generated views using a limited number of views [Mark et al. 1997; McMillan and Bishop 1995; Zitnick et al. 2004; Sashi Kumar Penta and P. J. Narayanan 2005].
Early IBR efforts produced new views of a scene given two or more images of it [Chen and Williams 1993; Seitz and Dyer 1996]. These methods needed point-to-point correspondences, which carried all the structural information about the scene. Many later techniques followed this philosophy of generating new views purely from images. These include methods that represent the scene as a collection of rays, which in the most general case form the plenoptic function [Adelson and Bergen 1991]. A new view is generated by picking an appropriate subset of such rays [Levoy and Hanrahan 1996; Gortler et al. 1996; Shum and He 1999]. These methods require a large number of input views, often running into thousands, to model a scene satisfactorily. This makes them practically unusable for anything other than static scenes. The representation is also bulky and needs sophisticated compression schemes.
The use of approximate geometry for view generation was a significant contribution of Lumigraph rendering [Gortler et al. 1996]. The availability of even approximate geometry can drastically reduce the number of views required. View-dependent texture mapping [Debevec et al. 1996] uses known geometry and selects the textures relevant to the view being generated in order to model architectural monuments. Unstructured Lumigraph rendering [Buehler et al. 2001] extends this idea to rendering from an unstructured collection of views and approximate models. The Virtualized Reality system captured dynamic scenes and modeled them for subsequent rendering using a studio with a few dozen cameras [Narayanan et al. 1998]. Many similar systems have been built in recent years for modeling, immersion, videoconferencing, etc. [Towles et al. 2002; Baker et al. 2002]. Recently, a layered representation with full geometry recovery for modeling and rendering dynamic scenes has been reported [Zitnick et al. 2004].
3 Depth Image Representation
The basic representation consists of an image and a depth map aligned with it, along with the camera calibration parameters. The depth map is a two-dimensional array of real or integer values, with location (i, j) storing the depth, or normal distance, to the point that projects to pixel (i, j) in the image. Figure 1 gives images and depth maps for synthetic scenes from different viewpoints and different time instants. Closer points are shown brighter in the depth map.
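The representation can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the class name `DepthImage`, the simplified pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the reading of "normal distance" as distance along the optical axis are all assumptions made for illustration, and the camera pose (extrinsics) is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


@dataclass
class DepthImage:
    """One Depth Image: colour image, aligned depth map, and calibration.

    Assumption: calibration is reduced to pinhole intrinsics; the camera
    pose (extrinsics) is omitted to keep the sketch short.
    """
    color: List[List[Color]]   # color[i][j] = (r, g, b)
    depth: List[List[float]]   # depth[i][j] = distance along the optical axis
    fx: float                  # focal length in pixels (x)
    fy: float                  # focal length in pixels (y)
    cx: float                  # principal point (column)
    cy: float                  # principal point (row)

    def backproject(self, i: int, j: int) -> Vec3:
        """Return the camera-space 3D point that projects to pixel (i, j).

        Assumption: i indexes rows (image y) and j indexes columns (image x).
        """
        z = self.depth[i][j]
        x = (j - self.cx) * z / self.fx
        y = (i - self.cy) * z / self.fy
        return (x, y, z)

    def point_cloud(self) -> List[Tuple[Vec3, Color]]:
        """Back-project every pixel, pairing each 3D point with its colour."""
        return [
            (self.backproject(i, j), self.color[i][j])
            for i in range(len(self.depth))
            for j in range(len(self.depth[0]))
        ]
```

Calling `point_cloud()` on each Depth Image yields the coloured 3D points that the rendering methods of Section 5 operate on, once they have been transformed into the coordinate frame of the novel view.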
4 Depth Image Construction
The Depth Image can be created using any suitable 3D structure recovery method, as described above. Multicamera stereo remains the most viable option, as cameras are inexpensive and non-intrusive. A calibrated, instrumented setup consisting of a dozen or so cameras can capture static or dynamic events as they happen. A depth map can be computed for each camera using the other cameras in its neighbourhood and a suitable stereo program. The camera image and calibration matrix complete one Depth Image. This is repeated for all cameras, resulting in the Depth Image representation of the scene.
The synthetic depth images shown in Figure 1 are generated using OpenGL. To generate the synthetic depth images of a scene, the scene is first rendered from different viewpoints; the color buffer and depth buffer are then read back from OpenGL and stored as images. This is repeated for every time instant and for all the given viewpoints while the scene is animated.
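Values read from the OpenGL depth buffer are non-linear in distance, so they usually need to be converted before they can serve as metric depth. The text does not say whether or how this conversion was done here; the sketch below shows the standard mapping, assuming a perspective projection set up with near and far planes `near` and `far` (as with `glFrustum` or `gluPerspective`) and the default depth range of [0, 1]. The function names are hypothetical.

```python
from typing import List


def linearize_depth(d: float, near: float, far: float) -> float:
    """Convert a depth-buffer value d in [0, 1] to eye-space distance.

    Assumes a standard OpenGL perspective projection and the default
    glDepthRange(0, 1): d = 0 maps to the near plane, d = 1 to the far plane.
    """
    z_ndc = 2.0 * d - 1.0  # window depth -> normalized device coordinate
    return (2.0 * near * far) / (far + near - z_ndc * (far - near))


def linearize_depth_buffer(buf: List[List[float]],
                           near: float, far: float) -> List[List[float]]:
    """Apply linearize_depth to every pixel of a depth buffer."""
    return [[linearize_depth(d, near, far) for d in row] for row in buf]
```

Because the mapping is strongly non-linear, most of the [0, 1] range is spent close to the near plane; choosing `near` as large as the scene allows keeps more precision for distant points.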
5 Rendering of dynamic depth-images
Depth images can be rendered using splatting or triangulation.
5.1 Splatting
The point cloud can be splatted, that is, rendered as point features. Splatting techniques broaden the individual 3D points to fill the space between points. The colour of the splatted point is obtained from the corresponding image pixel. Splatting has been used as a method for fast rendering, as point features are quick to render [Rusinkiewicz and Levoy 2000]. The disadvantage of splatting is that holes can show up where data is missing if we zoom in too far. The images of Figure 3 have been rendered using this approach. Holes due to the shift in viewpoint can be seen on the computer screen and on the people at the back.
Figure 3: Rendered views of the real scene (left) and the synthetic scene (right) using splatting.
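The idea can be sketched in software as follows. This is only an illustration, not the renderer used for Figure 3: it assumes the points are already expressed in the novel camera's coordinate frame, uses a simple pinhole projection, and draws each point as a square of half-width `radius` pixels, keeping the closest point per pixel. The function name and all parameters are hypothetical.

```python
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


def splat(points: List[Tuple[Vec3, Color]], width: int, height: int,
          fx: float, fy: float, cx: float, cy: float,
          radius: int = 1) -> List[List[Optional[Color]]]:
    """Render coloured camera-space points as square splats with a depth test.

    Pixels that no splat covers stay None; these are the holes.
    """
    image: List[List[Optional[Color]]] = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for (x, y, z), color in points:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # column of the splat centre
        v = int(round(fy * y / z + cy))  # row of the splat centre
        for row in range(max(0, v - radius), min(height, v + radius + 1)):
            for col in range(max(0, u - radius), min(width, u + radius + 1)):
                if z < zbuf[row][col]:  # keep the closest point
                    zbuf[row][col] = z
                    image[row][col] = color
    return image
```

A larger `radius` closes more of the gaps between neighbouring points at the cost of blurring detail, while regions for which no input view supplies data remain empty at any splat size.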
5.2 Triangulation
A simple triangulation can be imposed on the point cloud as follows: convert every 2 × 2 section of the depth map into two triangles by drawing one of the diagonals. Depth discontinuities are handled by breaking all edges with a large difference in the z-coordinate between their end points and removing the corresponding triangles from the model. Triangulation interpolates the interior points of the triangles, filling holes created by the lack of resolution. The interpolation can produce low-quality images if there is a considerable gap between the resolutions of the captured and rendered views, such as when zooming in. The images of Figure 4 have been rendered using this approach. Holes due to the shift in viewpoint can be seen on the computer screen and on the people at the back.
Figure 4: Rendered views of the real scene (left) and the synthetic scene (right) using triangulation.
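The triangulation rule can be sketched as below. The threshold `max_dz` that decides when a z difference counts as "large" is an assumed parameter, since no value is given in the text, and the choice of the top-left to bottom-right diagonal is arbitrary. Triangles are returned as triples of (row, column) pixel indices.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]
Triangle = Tuple[Pixel, Pixel, Pixel]


def triangulate(depth: List[List[float]], max_dz: float) -> List[Triangle]:
    """Split every 2 x 2 block of the depth map into two triangles.

    A triangle is dropped when any of its edges spans a depth difference
    larger than max_dz, which is treated as a depth discontinuity.
    """
    def connected(a: Pixel, b: Pixel) -> bool:
        return abs(depth[a[0]][a[1]] - depth[b[0]][b[1]]) <= max_dz

    triangles: List[Triangle] = []
    rows, cols = len(depth), len(depth[0])
    for i in range(rows - 1):
        for j in range(cols - 1):
            tl, tr = (i, j), (i, j + 1)
            bl, br = (i + 1, j), (i + 1, j + 1)
            # Both triangles share the diagonal from top-left to bottom-right.
            for a, b, c in ((tl, bl, br), (tl, br, tr)):
                if connected(a, b) and connected(b, c) and connected(c, a):
                    triangles.append((a, b, c))
    return triangles
```

The vertices of each surviving triangle can then be back-projected to 3D and textured with the colours of the corresponding pixels, so that the graphics hardware interpolates the interior.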
5.3 Rendering multiple Depth Images
The colour and the depth values of each pixel of the new view are available from each Depth Image. The first task is to fill the holes in one view using the others. Each pixel of the new view could contain colour and z values from multiple Depth Images. For example, point A and point B of Figure 5 map to the same pixel. The closest point is the correct point and should be chosen to provide colour. In general, when n Depth Images map to a pixel, they should be merged based on the closest z value in the new view. The conventional z-buffering algorithm can be used for this and can take advantage of hardware acceleration. When a portion of the scene is part of multiple Depth Images, the z-buffer values will be close, as for
MCMILLAN, L., AND BISHOP, G. 1995. Plenoptic Modeling: An Image-Based Rendering System. In SIGGRAPH.
NARAYANAN, P. J., RANDER, P. W., AND KANADE, T. 1998. Constructing Virtual Worlds Using Dense Stereo. In Proc of the International Conference on Computer Vision.
RUSINKIEWICZ, S., AND LEVOY, M. 2000. QSplat: A multiresolution point rendering system for large meshes. In SIGGRAPH.
SASHI KUMAR PENTA AND P. J. NARAYANAN. 2005. Compression of Multiple Depth-maps for IBR. In Proc. Pacific Conference on Computer Graphics and Applications.
SEITZ, S. M., AND DYER, C. R. 1996. View Morphing. In SIGGRAPH.
SHUM, H.-Y., AND HE, L.-W. 1999. Rendering with concentric mosaics. In SIGGRAPH.
TOWLES, H., CHEN, W.-C., YANG, R., KAM, S.-U., AND FUCHS, H. 2002. 3D Tele-Collaboration Over Internet2. In International Workshop on Immersive Telepresence (ITP2002).
ZITNICK, C. L., KANG, S. B., UYTTENDAELE, M., WINDER, S., AND SZELISKI, R. 2004. High-quality video view interpolation using a layered representation. In SIGGRAPH.