
























































































Material Type: Notes; Class: Virtual Environments; Subject: Computer Science; University: Georgia Institute of Technology-Main Campus; Term: Unknown 2002;

























































































Over the past decade, interest in interactive 3D graphics has increased dramatically. A decade ago interactive 3D graphics was largely restricted to the research laboratory, but now it is the topic of popular books and newspaper and magazine articles. There are several reasons for this increase in interest. First, display hardware for 3D graphics is now widely available in the PC marketplace. This was definitely not the case a decade or more ago, when any form of 3D graphics hardware was very expensive. Second, the use of 2D graphics in user interfaces is now well established, so the idea of interactive 3D graphics does not seem so strange. Every PC now has a graphical user interface, but until the mid 1980’s user interfaces were still based on command-line interactions. Third, the general population is now more sensitive to computer technology and the effect it can have on their lives. A significant segment of the population uses computers at work or has a computer or video game at home. Thus, advances in computer technology can be of direct use to them.

The popular press uses terms like Virtual Reality to describe interactive 3D computer systems, often in articles on futuristic applications that contain a considerable amount of hype about the potential of this technology. While these articles may be fun to read, they do not accurately represent the current state of the art in this technology. In this book, we present the technical details behind this technology and show how to use it in real applications.

A number of terms have been used for this new technology. Our favorite term is Highly Interactive Three Dimensional (HITD) user interfaces, since this covers a wide range of user interfaces that make use of interactive 3D graphics technology. A HITD user interface addresses a 3D application, uses 3D input and output devices, displays information in a 3D format and uses natural 3D interactions. 
It provides the user with a complete 3D environment for solving his or her problem. The term Virtual Reality (VR) is usually used to describe an interactive 3D computer system that simulates a virtual world or some aspect of the real world. The emphasis is on the creation of a 3D world that the user can explore and interact with. VR is a subset of HITD user interfaces, and most HITD user interface techniques are used to develop VR user interfaces. The term VR has been associated with a lot of hype, so some researchers in the field have used other terms, such as Virtual Worlds or Virtual Environments, to describe this type of user interface and avoid the hype associated with VR. While this may make them feel a little bit better, the question eventually arises: “aren’t you really doing VR?”
Why should we bother with interactive 3D graphics? The use of 2D graphics is already well established and a wide range of software exists to support the development of 2D applications. Three-dimensional graphics is much more complex and requires a new set of interaction techniques and software tools. What is it about interactive 3D graphics that will make all this extra work worthwhile? We can address these concerns by examining the following four issues:
1.2: A Brief History
The ideas behind interactive 3D graphics and the construction of virtual environments have been around since the early 1960’s. This section provides a brief overview of the history of the field with an emphasis on technical advances. Most of the early work in this field was based on custom-built hardware, so there were few people working in the field in the early years. Once commercial hardware became available in the late 1980’s, the number of researchers in the field grew rapidly.

The early work in this field was done by Ivan Sutherland when he was at Harvard University in the 1960’s [35]. At that time Sutherland developed the first head-mounted computer display. This display was based on two miniature CRT’s that were mounted on either side of the user’s head. A mirrored optics system was used to bring the images to the user’s eyes. All the graphics were wire-frame and they were produced by special-purpose display hardware. Two techniques were used for head tracking. One system was based on ultrasound and was somewhat noisy and inaccurate. The other technique was based on the use of a mechanical linkage between the user’s head and the ceiling of the laboratory. The two main problems with this display were the very high voltage that was required to drive the CRT’s and the very limited form of 3D graphics that was available.

A considerable amount of pioneering work has been done at the University of North Carolina at Chapel Hill (UNC), led by Fred Brooks. They have been active in the area of interactive 3D graphics since their work on interactive molecular modeling in the 1970’s. The UNC team also did extensive work on the use of force feedback in user interfaces. However, Brooks observed that in the early 1970’s computers were simply not fast enough for the enormous computational demands of interactive 3D graphics, so he set aside this research agenda until CPU speeds were sufficient [6]. 
Brooks took up the HITD agenda again in the mid 1980’s, creating the first interactive architectural visualization system in 1986 [4]. In 1986 the first ACM Siggraph Symposium on Interactive 3D Graphics was held at UNC. This symposium in many ways marked the start of current research in interactive 3D graphics.

One of the first affordable VR systems was developed by Mike McGreevy and Scott Fisher at NASA Ames Research Center [11]. Their main goal was to provide computer support for the space station and the display of planetary data. This system was based on the use of a head-mounted display and the VPL DataGlove. The head-mounted display was custom built using two LCD’s from portable televisions. The LCD’s were mounted on a motorcycle helmet (later a bicycle helmet was used) and a wide-angle optics system was placed between the displays and the user’s eyes. This produced an affordable and effective head-mounted display. VPL Ltd., an early manufacturer of VR systems, used this design as the basis for their first head-mounted display. By today’s standards, the graphics and interaction in this system were quite primitive, but in many ways it marks the start of current research efforts in VR.

VPL was probably the first commercial enterprise selling VR hardware and software. Indeed, VPL’s Jaron Lanier coined the term “Virtual Reality”, and played a significant part in bringing popular attention to VR. In 1987, VPL introduced the DataGlove, which was the first device for measuring hand position, orientation, and finger bend angles with a reasonable degree of accuracy and portability. They later produced a commercial version of the NASA head-mounted display. VPL also produced software packages to support the design of virtual environments. In the late 1980’s they were selling complete hardware and software systems, called “Reality Built for Two”, or RB2, that could be used to produce prototype VR applications. 
The availability of the VPL hardware made it possible for a number of research groups to start exploring the software techniques required to produce good 3D user interfaces. In 1990 Mark Green and Rob Jacob organized a workshop at Siggraph dealing with non-standard user interface technologies. The title of this workshop was “non-WIMP User Interfaces”. Its main purpose was to encourage user interface designers and researchers to think about user interface styles that are different from traditional 2D Windows, Icons, Menus and Pointing (WIMP) user interfaces. This includes 3D user interfaces and VR in addition to pen-based interaction, wearable computers, and some forms of multimedia. The researchers at this workshop identified 5 characteristics that separate non-WIMP user interfaces from more traditional 2D user interfaces, and these characteristics are typical of the user interfaces addressed in this book. The 5 characteristics are:
16th April 2002 1.2: A Brief History Page: 3
4 Chapter 1: Introduction
a DataGlove has 16 degrees of freedom (3 for position, 3 for orientation, and 10 for finger bend angles). Similarly, most simulation applications allow the user to control many parameters and in many cases several parameters can be controlled simultaneously.
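The 16 degrees of freedom mentioned above can be pictured as a single record that an application reads each frame. The following is only a sketch: the type name, the field names, the helper functions, and the choice of three angles for orientation are assumptions made for illustration, not the layout used by any real glove driver.

```c
#include <assert.h>

/* Hypothetical sizes for one glove sample: 3 + 3 + 10 = 16 values. */
enum {
    GLOVE_POSITION_DOF = 3,
    GLOVE_ORIENTATION_DOF = 3,
    GLOVE_FINGER_DOF = 10
};

/* Hypothetical record holding everything the glove reports at one instant. */
typedef struct {
    float position[GLOVE_POSITION_DOF];       /* x, y, z of the hand */
    float orientation[GLOVE_ORIENTATION_DOF]; /* three angles (assumed representation) */
    float finger_bend[GLOVE_FINGER_DOF];      /* ten finger bend angles */
} GloveSample;

/* Total number of values the user controls simultaneously. */
int glove_dof(void)
{
    return GLOVE_POSITION_DOF + GLOVE_ORIENTATION_DOF + GLOVE_FINGER_DOF;
}

/* Stores one finger bend angle, rejecting out-of-range indices. */
void glove_set_finger(GloveSample *sample, int index, float angle)
{
    assert(sample != 0);
    assert(index >= 0 && index < GLOVE_FINGER_DOF);
    sample->finger_bend[index] = angle;
}
```

Keeping all 16 values in one record reflects the point of the passage: the user changes many of them at once, so the application has to treat a sample as a whole and cannot handle one parameter at a time.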
What are the successful applications of VR and highly interactive 3D? Until the late 1990’s VR had great potential, but few real successes. The one exception to this is 3D video games such as Super Mario World, Gauntlet Legends, DOOM, Quake, Half-Life, Tekken, F355, and their successors^1. It is the popularity of these games that has created consumer demand for high-performance 3D graphics, allowing economies of scale to bring commodity 3D graphics to the PC platform.

In the Action genre of 3D games (Super Mario World, Gauntlet Legends), one moves around a complex 3D space in the pursuit of an overarching goal, and the player views a representation of his/her character from a view above the action. By contrast, in the First-Person Shooter genre (Doom, Quake, Half-Life), the player sees the action from the character’s point of view, and usually progresses by killing every enemy character in sight. The Adventure genre (Shenmue, Myst, The Longest Journey) advances the action by the solving of puzzles rather than by combat. In the Fighting genre (Tekken, Soul Calibur), one-on-one fisticuffs take place from a third-person view, with each player controlling a character through an increasingly arcane set of button hits. Button combinations result in a pre-programmed sequence of punches, kicks, throws, and so on. Finally, in the Racing genre (Metropolis Street Racer, Ridge Racer series), players race vehicles through a realistic 3D track, either from a cockpit view or from behind the vehicle.

The usual interface to these games is the keyboard and the mouse, but one could add a force-feedback joystick or other exotic input devices to spice up the action and, in some cases, make the game somewhat easier to play. The key to the technical success of these games is an appropriate balance of realism and responsiveness. Every animated frame must appear within 1/60th of a second, so game designers are careful to maintain graphical loads at a level appropriate to the hardware they are using. 
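The 1/60th of a second constraint translates directly into a per-frame budget. The sketch below shows the arithmetic; the function names are hypothetical, and the idea of summarizing hardware capability as a single polygons-per-second figure is a simplifying assumption used only to make the trade-off concrete.

```c
#include <assert.h>

/* Time available to produce one frame, in milliseconds, at a target rate. */
double frame_budget_ms(double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return 1000.0 / frames_per_second;
}

/* Largest polygon count per frame that hardware with the given throughput
 * (polygons per second, an assumed summary figure) can sustain at the
 * target frame rate. */
long max_polygons_per_frame(double polygons_per_second, double frames_per_second)
{
    assert(polygons_per_second >= 0.0);
    assert(frames_per_second > 0.0);
    return (long)(polygons_per_second / frames_per_second);
}
```

For example, `frame_budget_ms(60.0)` gives roughly 16.7 ms, and under the assumed model a card that draws 6,000,000 polygons per second leaves room for 100,000 polygons in each frame.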
There are a number of recreational applications that fit in the entertainment category but are outside the usual genres for 3D. For example, it is easy to simulate most racquet sports using simple VR systems. A large screen video projector and an instrumented racquet can be used as the main I/O devices. Why would we want to do this? First, the VR system can provide a virtual partner to play with. This can be used when a real partner is not available or when learning a new sport. Multiplayer keyboard and mouse-based games usually have a single-player mode for these occasions. Since the level of play of the virtual player can easily be controlled, this can provide a good learning mechanism. Second, the physics of the game can easily be controlled and thus challenging games can be produced for players at different skill levels. For beginning or older players the ball can be slowed down, so the game is not quite as difficult, while for expert players the ball can be sped up to make the game much more challenging.
(^1) This is not intended to be an exhaustive list!
Page: 4 1.3: Some Applications 16th April 2002
around them. Applications like Virtual Therapy, simulation and training, and entertainment all benefit from the VR style, because the user wants or needs to be fully involved. Typically, users of Scientific and Information Visualization or collaborative design applications need to deal rapidly with physical artifacts of the office environment, such as the telephone and paper documents, so the VR style is less successful there. Highly-Interactive 3D is still useful in this context, because the benefits of rapid 3D understanding cannot be delivered any other way.
The state of the art in the design and implementation of 3D user interfaces is still a little primitive in comparison to 2D user interfaces. This is partly due to the current state of the hardware devices used in these user interfaces. These devices are still much more expensive than the devices used in 2D user interfaces and are not quite as reliable. But the main reason is the length of time that researchers have been seriously studying 3D user interfaces. The mouse and the basic ideas behind windows were developed in the 1960’s and most of the techniques used in 2D user interfaces were developed in the 1970’s at places like the Xerox Palo Alto Research Center. Through the 1970’s and early 1980’s numerous research groups worked on prototypes of the software tools that are now used to generate 2D user interfaces. Thus, the techniques that are currently used in the development of 2D user interfaces were under development for about 15 years before this style of user interface became popular. In the early 1970’s graphical user interfaces had many of the same characteristics that 3D user interfaces now have: they used expensive equipment (mouse and bitmapped display), they were hard to program, and the interaction was not as smooth as the textual user interfaces that were then popular.

The same evolution process is currently occurring in 3D user interfaces. Researchers are exploring the design metaphors and techniques that are required for this style of user interface, designing better interaction techniques, developing software tools, and improving interfaces. The appropriate software architectures and types of software tools required for the development of 3D user interfaces are a very active research area with a considerable amount of room for innovation. The purpose of this book is to explain how to build Virtual Reality applications and other applications that share similar techniques.
A VR system is a collection of hardware and software elements that take input from the user and display output to the user in real time. How do we construct such a system? What do we need to know to make the user interact most effectively with the system?
Page: 6 1.4: Where Do We Go From Here 16th April 2002
1.5: What Do You Need To Know?
These and other questions will be dealt with in the following chapters. The organizational idea of the book is to start from the “bottom” with the basic building blocks of HITD systems, then build and combine these subsystems into more complex systems. This bottom-up approach allows the reader interested in building a system to progress through the book in order. We will start with a quick introduction to perceptual psychology, since the purpose of HITD applications is to convince the user that the application looks and sounds just like the real world. We then start from basic hardware technology, explaining the basic principles of head-mounted displays, DataGloves, and other input and output devices and systems. We next discuss the mathematical notations and software technologies that drive 3D computer graphics. The software code samples in this book assume OpenGL called from the C language. OpenGL has been around a while, and has settled into wide acceptance. C has been around longer, and although it is showing its age, most readers will understand it. The next chapter deals with organizational metaphors and models of system architecture. The goal in this chapter is to provide an organizational basis for the chapters that follow. Another goal is to outline the various HITD styles that have been developed. The next N chapters deal with system architecture issues. Next, we do high-level stuff....
HITD interfaces take advantage of the spatial and perceptual capabilities of people. HITD interfaces combine much more sensory input than non-3D interfaces, so it is important to understand what a person’s perceptual capabilities are so that a HITD interface will be successful. This chapter will outline what these capabilities are, and will describe the implications that these sensory and perceptual capabilities have for HITD interface design. There are traditionally 5 human senses: vision, hearing, touch, taste and smell. There are two other important senses that come into play in HITD interfaces: Proprioception, the sense of where one’s body is in space, and Balance, the sense of which way is up. These two senses will be defined and discussed later in this chapter. Unfortunately, taste and smell are difficult to stimulate through non-chemical means, and so we will not deal with these at all.
Vision is, of course, the sense that detects light. Visible light is electromagnetic energy with wavelengths between 400 nanometers (nm) and 700 nm, corresponding to the rainbow colors from violet through indigo, blue, green, yellow, and orange to red. The range of visible colors is called the visible spectrum. If you pass a beam of sunlight through a triangular glass prism, the beam will be split into a rainbow, which indicates that white light has a certain amount of light from every wavelength in the spectrum. The rainbow colors are spectrally pure because they are each a single narrow band of frequencies of light. Most colors are not spectrally pure because they are made up of a number of wavelengths. The color you perceive is typically the dominant wavelength, or the wavelength of the spectrum that has the most energy. Therefore, two colors may be made up of different spectral energy distributions but be perceived as the same color. This is true of white light as well. For example, light bulbs and sunlight are not the same color, but they are each perceived as white in isolation.
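The idea of a dominant wavelength can be illustrated with a small routine that scans a sampled spectral energy distribution and reports the wavelength carrying the most energy. This is a sketch of the definition given in the text only, not a model of human color perception; the `SpectralSample` type and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One sample of a spectral energy distribution (hypothetical type). */
typedef struct {
    double wavelength_nm; /* 400 to 700 for visible light */
    double energy;        /* arbitrary units */
} SpectralSample;

/* Returns the wavelength of the sample that carries the most energy.
 * If several samples tie, the first one encountered is returned. */
double dominant_wavelength_nm(const SpectralSample *samples, size_t count)
{
    size_t best = 0;
    size_t i;

    assert(samples != NULL);
    assert(count > 0);

    for (i = 1; i < count; ++i) {
        if (samples[i].energy > samples[best].energy) {
            best = i;
        }
    }
    return samples[best].wavelength_nm;
}
```

Two distributions with quite different shapes, say one peaking sharply at 580 nm and another spread broadly but still strongest at 580 nm, yield the same result, which mirrors the observation that different spectra can be perceived as the same color.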
Each eye has an array of light-sensitive cells at its back inner surface called the retina. There are two types of retina cells: rods and cones. Cone cells sense visible light in one of three bands of color, traditionally called Red, Green and Blue. For red and green, this is a little misleading because the peak sensitivity of the green cones is a greenish yellow, and for the red cones it is a reddish yellow. The sensitivity of the red and green cones is about 10 times higher than that of the blue cones, implying that variations in the brightness of blue light are harder to detect than the same variations in the red and green range. Television signals were designed to take advantage of this by allocating most color bandwidth to red and green information and only a small amount to blue. Cone cells are concentrated at the Fovea, which is the focal point of the image that your eye is looking at. The fovea is very densely packed with cones, with only a few rods. The fovea is therefore the area where the eye’s ability to see detail is highest. Rod cells sense visible light of any color, are more sensitive than cone cells, and are distributed over a wide area of the retina. Because of their higher sensitivity, night vision is mostly provided by rods. Cone cells do not get enough light under low-light conditions, which is why color is hard to perceive in low light.
10 Chapter 2: Perception
In contrast to the fovea, which contains mostly cone cells, the peripheral vision area surrounding the fovea contains mostly rod cells. The peripheral zone is also much less densely populated, and hence can resolve much less detail than the foveal area. The highest density is around the fovea, with progressively lower densities toward the periphery.

In terms of the temporal performance of the eye, almost everyone can detect a change in the image within 140 milliseconds (ms). This implies that to give the appearance of motion, a new image must be presented to the viewer at least every 140 ms, or at about 7 images per second. Some people will not find this convincing, however, so the usual benchmark is to supply a new image every 100 ms, or 10 images per second. The first movies were projected at 16 frames per second, which seemed acceptable to most people. However, people perceived flicker (hence the nickname “flicks” for movies) because the movie projector would turn off the image while the film was being advanced, then turn on the light to expose the next image. Motion picture film was standardized at 24 frames per second with the introduction of sound in order to reduce the flickering sensation, to reduce jerkiness of the motion, and to ensure consistent sound reproduction. What this says about vision is that there are different temporal scales at work, with the rods in the periphery operating much more quickly and therefore detecting flicker, while the cones operate more slowly and sense color.
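The image rates quoted above follow from a simple reciprocal relationship between the interval separating two images and the number of images per second. The sketch below makes the conversion explicit in both directions; the function names are hypothetical.

```c
#include <assert.h>

/* Images per second needed when a new image appears every interval_ms. */
double images_per_second(double interval_ms)
{
    assert(interval_ms > 0.0);
    return 1000.0 / interval_ms;
}

/* Milliseconds that each image stays on screen at a given frame rate. */
double display_interval_ms(double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return 1000.0 / frames_per_second;
}
```

With these, a 140 ms interval corresponds to about 7.1 images per second and a 100 ms interval to exactly 10, while the historical film rates of 16 and 24 frames per second correspond to 62.5 ms and about 41.7 ms per frame.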
At the lowest level, the eyes receive light and the rods and cones generate nerve impulses that communicate to the rest of the visual system. Somehow, out of this array of light quantities, the brain perceives a 3D world. It is not really known how this works, but we can enumerate ways in which features of the visual input are used by the visual system to infer 3 dimensions. Traditionally, these features of the visual input are called “depth cues”, as follows:

Occlusion: This is the optical property that nearer opaque objects hide objects that are more distant. This is the strongest depth cue [18], and it will override other conflicting cues. Hidden-surface removal at some point in the 3D graphics rendering pipeline provides this cue.

Perspective: This is the geometric property that objects look smaller at greater distances. The closer an object gets, the more of the field of view it takes up. Perspective projection in 3D graphics rendering delivers this cue.

Aerial Perspective: This is a property of optical media such as air that objects at greater distances become more indistinct, tending to lose contrast and converge towards the color of the optical medium. One common example is fog or smog, which makes distant objects look indistinct and monochromatic. This can be simulated in the rendering pipeline by blending object color with fog color using a ratio that increases the proportion of fog at greater distances. Another name for this operation is “depth cuing”, which blends object color with the background color.

Motion Parallax: This is the geometric property that objects that are more distant are seen to move smaller distances than closer objects moving at the same actual speed. An animated object rendered in perspective will deliver this cue.

Kinetic Depth Effect: This cue arises from the tendency of people to assume that moving objects are rigid unless there is ample evidence to the contrary. Minimal visual stimuli, such as a few moving dots, will typically be perceived as points on a moving solid object. This cue is a direct consequence of motion parallax, and can be viewed as the integration of the motion parallaxes of many points on an object.

Shadows: Given a known source of light, a shadow cast by one object upon another can show which object is farther away. A very strongly built-in visual assumption is that light comes from above (i.e. the sun). For example, object A casting a shadow on object B implies that B is underneath A. One very strong version of this cue is the appearance of indentations and outcrops on a surface. These will both cast a shadow in the direction of the light, and since the light is assumed to come from above, the shadow distinguishes indentation from outcrop. Shadows are not handled automatically by the 3D graphics system, and must be set up by the modeling system.

Relative size of familiar objects: This is a property of people’s experience with the visual world. People have an internal conception of object sizes, and use this internal conception to judge the size of unfamiliar objects. One typical example in technical literature is to place a coin or a ruler beside the unfamiliar object being described. An example of where this is not used is in advertisements for figurines which have the caption “actual size” on the page. Another example of how powerful this cue can be is in old science fiction movies, where tiny models such as cars and buildings are placed beside Godzilla to show how big he is! This cue is not provided directly by the rendering pipeline, but must instead be provided by the model being rendered. An important constraint is that all objects in the model should be at a consistent scale to ensure that familiarity is available at all times.
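Two of the cues listed above, perspective and aerial perspective, reduce to short formulas that can be sketched in plain C without any graphics library. The code below is illustrative only: the `Color` type and the function names are hypothetical, and the linear ramp between a start and an end distance is one assumed choice of fog ratio among several possible ones.

```c
#include <assert.h>

/* RGB color with components in the range 0 to 1 (hypothetical type). */
typedef struct {
    float r, g, b;
} Color;

/* Perspective: the size of an object as projected onto a plane at
 * plane_dist from the eye shrinks in proportion to 1 / distance. */
float projected_size(float size, float distance, float plane_dist)
{
    assert(distance > 0.0f);
    assert(plane_dist > 0.0f);
    return size * plane_dist / distance;
}

/* Aerial perspective: proportion of fog in the final color, rising
 * linearly from 0 at fog_start to 1 at fog_end (assumed ramp). */
float fog_fraction(float distance, float fog_start, float fog_end)
{
    float t;

    assert(fog_end > fog_start);
    t = (distance - fog_start) / (fog_end - fog_start);
    if (t < 0.0f) {
        t = 0.0f;
    }
    if (t > 1.0f) {
        t = 1.0f;
    }
    return t;
}

/* Blends the object color toward the fog color by the given fraction. */
Color apply_fog(Color object, Color fog, float fraction)
{
    Color out;

    out.r = object.r + (fog.r - object.r) * fraction;
    out.g = object.g + (fog.g - object.g) * fraction;
    out.b = object.b + (fog.b - object.b) * fraction;
    return out;
}
```

Doubling an object's distance halves its projected size, and an object halfway between `fog_start` and `fog_end` is drawn as an even mix of its own color and the fog color. The same `projected_size` relationship also underlies motion parallax, since a displacement of a fixed length covers less of the image when it happens farther away.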
Page: 10 2.1.2: 3D Vision 16th April 2002