Thursday, September 20, 2012

Good Design Examples

I would like to begin with the spoon. It is a device that affords scooping and stirring, and it never has to be explained or troubleshot. It is simple, cheap, and can be made from a wide variety of materials. It hasn't changed in years because there is nothing to change. The spoon is perfectly designed.







Next, consider the microwave. It provides audible feedback when input is entered and when food is ready. It uses an interlock forcing function to ensure the user's safety by not operating the microwave emitter while the door is ajar. Buttons are clearly labeled and have one-to-one control mapping. The microwave's only flaw is feature creep, which can be avoided by shopping around and has yet to interfere with good mappings.


The consumer fan also features good design elements. Physical constraints ensure the safety of the user and indicate that the fan's operation doesn't involve interaction with the blades. There is a single knob to control fan speed, which follows the standard layout, with the off position located next to the highest speed setting. The fan does nothing more than what is expected, and nothing less. Another humble, well-designed device.



Something I was immediately grateful for when I ordered my Kindle was its packaging. It was functional - the Kindle arrived in mint condition - but it was also aesthetically pleasing and easy to open. It is obvious how to open the package thanks to the tab placed next to the dotted line, which everyone knows symbolizes perforation. Once the tab is pulled, the lid opens easily to reveal the Kindle nested in an attractive box, along with the charging cord and the instructions - three simple steps placed right on the Kindle's screen, impossible to miss.

Lastly, I would like to present a packaging concept for Coke that I stumbled onto a few years ago. The bottle affords portability, drinking, stacking, and advertising. The most notable of these is stacking, which most bottles don't afford. Stacking Coke bottles would allow for easier shipping and handling, and the new shape makes the bottles, and their logos, stand out. By offsetting the lid, the new shape still affords the user the ability to drink easily.

Bad Design Examples

I have a pair of Tenqa Remxd bluetooth headphones that I really enjoy, especially considering the $40 price tag; however, the initial process of pairing them with my iPhone took about 45 minutes. These headphones are poorly designed primarily due to poor mapping and a lack of feedback. The headphones are paired by holding down the play button and waiting for the LED to flash - impossible to figure out without the instruction manual, and even then it doesn't always work. As it turns out, the play button has to be held down before the headphones are turned on and kept held down until the LED flashes. Additionally, the flash that indicates the headphones are discoverable alternates red and blue, which is easy to confuse with a blue-only flash that occasionally occurs for reasons I cannot determine.

The other day I discovered a new Monster energy drink called Ubermonster. The bottle for this drink is extremely poorly designed - it looks like it has a twist-off cap, but it doesn't. Also, most bottle openers are too small to accommodate this lid. I opened mine using a pair of pliers. It took about 5 minutes to open, and I sliced my finger open in the process. As a side note, this drink also appears to be alcoholic, based on the design of the bottle and advertising that features "advanced brewing technology". Googling Ubermonster indicates that many, many people are confused by this bottle's appearance and that my frustrations with the lid are not isolated.




These are the exit doors in the basement of the Psychology Building. I can never open them correctly, which I now know has to do with a lack of visibility. This example isn't exactly unique, but I found it relevant due to my personal experiences with these doors. Shortly after I took this picture, a girl confidently pressed on the door, smacked her head, and dropped her laptop down the stairs. She was fine, but her laptop didn't survive the fall. Another casualty of poor design.


This is a coworker's mouse that I was attempting to fix at work today. Other than being generally uncomfortable, my primary criticism of its design is that it doesn't afford turning it off without pulling the battery. I found this extremely unusual, and so did my coworker, whose computer did a variety of inconvenient things due to the involuntary mouse clicks that occurred while I was examining it. This is a significant design flaw: wireless mice depend on their batteries to function, and the best way to conserve battery life is to turn the device off when it isn't in use.




Lastly, I address the issues with the iPhone's design. Yes, despite being one of the millions of users who depend on the iPhone, I must admit its design is not perfect. The particular problem I have with the iPhone concerns mapping and affordance. Apple's insistence on minimizing the number of hardware buttons means that the home button is responsible for accessing the home screen, activating Siri, and - little do most users know - performing a hard reset. The hard reset is my primary concern. When a phone freezes and no longer responds to a soft reset, most users respond by pulling the battery, something the iPhone doesn't afford. Instead, iPhone users must perform a hard reset, a task most users aren't aware exists and that is mapped to controls nobody would think to try: holding the sleep button and the home button simultaneously for roughly 10 seconds.

Design of Everyday Things: Overall Summary


The Design of Everyday Things can be summarized trivially by concatenating my previous blog posts, so I am choosing to use this entry to discuss my thoughts on the book and what I’ve taken away from it.

This book made me think about design in a completely new way. I’ve always known that a product’s success or failure hinged heavily on design, but I’d never before considered designing a product or system analytically. Design makes an impact not in what you notice, but in what you don’t notice. It is a very subtle art that evidently eluded me prior to reading this book. I’ve learned that a product’s design should aim to be as intuitive as possible by utilizing constraints and natural mappings. Both of these forms of communicating function operate on a subconscious level, guiding the user without their becoming aware of it. Nobody explicitly thinks about the driver’s seat being located on the left or a floppy disk not fitting in an optical drive; these things are obvious to us because humans rely on previous experience and physical constraints to reason about the world. We are naturally hardwired to take these things into account, and therefore they don’t impede our thought processes or interrupt our flow.

This book also opened my eyes to the processes involved in learning a new system and the various pitfalls that we encounter across a wide variety of devices but attribute to human error. People depend on their perceptions to interpret events, which often leads to the misattribution of causality - blaming themselves, or the software, when the real issue is a hardware problem. The book analyzes decision-making using the Action Cycle, Stages of Evaluation, and Stages of Execution, all of which are combined into the Seven Stages of Action. These models help to identify two primary sources of error that result from poor design: the Gulfs of Execution and Evaluation. The Gulf of Execution is the gap between user intentions and the actions the system allows; the Gulf of Evaluation is the gap between the system’s state and the user’s ability to determine how well their intentions have been met. Errors occur more frequently when these gulfs are large, primarily due to a lack of visibility and feedback. Visibility is a design principle aimed at improving a user’s ability to identify available actions, and feedback is important because it informs the user of the state of the system and the effects of their actions.

I definitely feel that reading this book has made me a better programmer by informing me of the aspects of design outside the scope of software. Knowing how a user thinks about systems and formulates decisions is a valuable asset when designing for usability. Additional topics discussed at varying length that I found particularly interesting were forcing functions, the reversal of design principles to increase task difficulty, and the use of information in the world to remind and cue user behavior.

Design of Everyday Things: User Centered Design


Design should enable the user to figure out what to do and to figure out what is happening by:
  • Making it easy to determine what actions are possible by using constraints
  • Making the conceptual model, alternative actions, and results visible
  • Making it easy to evaluate the state of the system
  • Following natural mappings

Additionally, the principles of design are:
  1. Use knowledge in the world and in the head
    1. Knowledge in the world is useful if it is natural and easily available
    2. Knowledge in the head is more efficient and therefore, design should not impede experienced users
  2. Simplify
    1. Minimize the amount of planning needed to complete a task
    2. Understand the limits of human memory
    3. Use technology to enhance visibility
    4. Provide mental aids
  3. Make things visible
    1. The user should know what is possible and how to carry an action out
    2. Actions should match intentions
    3. System states should be readily apparent
  4. Use good mappings
    1. Relationships between intentions, actions, effects, and states should be intuitive and natural
    2. Take human factors into account
  5. Exploit constraints
    1. Use real and imagined limitations to guide users' actions
  6. Design for error
    1. Understand that error is inevitable
    2. Make actions reversible and irreversible actions difficult to initiate
  7. When all else fails, standardize
    1. When good mappings aren’t possible, use standardized mappings
    2. Users are trained to recognize standards
    3. Standardizations are an extension of cultural constraints

Principles and practices of good design can be manipulated to make tasks that should be difficult, difficult. Guns should not be readily accessible, some doors shouldn’t be opened, etc.

Design of Everyday Things: The Design Challenge


The design process involves testing, modification, and retesting, which necessitates that the item in question be relatively simple and the craftsman flexible. This process is known as hill-climbing. It is hampered by time constraints and the desire for individuality - everyone wants their product now, and they want it to be unique. These pressures prevent designs from benefiting from previous iterations.
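The hill-climbing metaphor is easy to make concrete. Below is a toy greedy search over a made-up one-dimensional "design space" - the scoring function and step rule are invented for illustration, not anything from the book:

```python
# Toy hill-climbing: repeatedly test the current design, keep any small
# change that scores better, and stop once no neighboring change improves
# things. This is the iterative test-and-modify loop described above, in
# its simplest algorithmic form.

def hill_climb(score, start, neighbors):
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current  # local maximum: no modification helps
        current = best

# A made-up design space where quality peaks at x = 7.
quality = lambda x: -(x - 7) ** 2
step = lambda x: [x - 1, x + 1]
print(hill_climb(quality, 0, step))  # climbs 0 -> 1 -> ... -> 7
```

Note how the loop can only reach the best design it can see from where it started - exactly why, as the chapter argues, time pressure and the refusal to iterate on previous versions leave designs stuck short of the peak.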

There are also other considerations that affect design. Designing for usability affords comfort at the expense of aesthetics, and designing for aesthetics negatively impacts comfort and efficiency. When cost considerations dominate, comfort, aesthetics, and durability all suffer. All three considerations must be carefully balanced to create a good design. This is a difficult prospect when you consider that designers are “professionals” who are often judged by their colleagues on aesthetics, clients are typically focused on cost, and the users’ desire for usability often goes unmet.

Wednesday, September 19, 2012

Design of Everyday Things: To Err is Human

Humans are prone to error, which can be broken into two categories. Slips are errors resulting from automatic behavior and mistakes result from conscious decisions. Slips are typically minor, easily identified errors, whereas mistakes are much more difficult to detect and can be major events.

Slips can be further broken down into capture errors, description errors, data-driven errors, associative activation errors, loss-of-activation errors, and mode errors. Capture errors occur when habitual actions override intended actions, such as finding yourself driving to work on Sunday instead of to church. Description errors occur when an intended action is similar to other available actions and a lack of specification results in a slip; this type of error is most frequent when right and wrong choices are in close proximity. Data-driven errors occur when extraneous data influences our actions - for example, typing a phone number instead of a credit card number shortly after calling someone. Associative activation errors occur when external data triggers an incorrect action - saying "come in" instead of "hello" when answering the phone. Loss-of-activation errors occur when we forget what we are doing. Lastly, mode errors occur when devices have contextual operations and we operate in one mode thinking we are in another. Design can be improved by taking slips into account - minimizing them and providing adequate feedback and correction when they inevitably occur. Slips can be minimized by differentiating choices and requiring confirmation. Another good design consideration is the elimination of irreversible actions.

Mistakes occur from choosing inappropriate goals - poor decision-making, misclassification, or lack of information. Humans make decisions based on expectations and prior experience rather than logical deduction. Previous chapters have already explored the problems associated with memory, so it is no big surprise that people make so many mistakes. Another theory of cognition is the neural net approach. This theory uses the structure of the brain to conclude that human cognition is based on activation and inhibitory signals that travel along neurons through the brain. Thoughts are represented by stable patterns of signal activity.

Tasks can be structured into models for analysis (data structures). Turn-based decision-making games such as tic-tac-toe can be modeled using a decision tree - the author refers to this type of structure as wide and deep. Simpler sets of data, such as a menu, can be represented using a list - this is a shallow structure. An example of a narrow structure is a cookbook recipe, where there are few alternatives, resulting in a decision tree that is narrow and deep.
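The wide/narrow/shallow/deep vocabulary can be illustrated with a toy tree type - the names and examples here are mine, not the author's:

```python
# Each node holds the choices available at one decision point. "Width" is
# the largest number of alternatives any single decision offers; "depth"
# is the longest chain of decisions that follow one another.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def width(node):
    """Maximum branching factor anywhere in the tree."""
    if not node.children:
        return 0
    return max([len(node.children)] + [width(c) for c in node.children])

def depth(node):
    """Length of the longest chain of decisions."""
    if not node.children:
        return 0
    return 1 + max(depth(c) for c in node.children)

# A menu: one decision with many alternatives -> wide and shallow.
menu = Node("menu", [Node(d) for d in ["soup", "salad", "steak", "pasta", "fish"]])

# A recipe: many steps, each with a single follow-up -> narrow and deep.
recipe = Node("boil water", [Node("add pasta", [Node("drain", [Node("serve")])])])

print(width(menu), depth(menu))      # wide, shallow
print(width(recipe), depth(recipe))  # narrow, deep
```

A tic-tac-toe game tree would score high on both measures at once - wide and deep - which is what makes it hard to plan through without structured analysis.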

Behavior is often thought of as conscious, but much of it is subconscious. Subconscious thought is based on pattern recognition and generalization. By analyzing trends the subconscious is able to guide behavior quickly and efficiently, but perhaps not as accurately as we might like. Conscious thought is slow, laborious, and relies on STM, which we know is very limited and subject to flaws.

Mistakes are very hard to identify, especially when they are a result of misinterpretation. Furthermore, understanding of an event before and after it occurs can differ drastically. Another factor to consider is the role of social pressure. The perception of pressure leads to misunderstanding, mistakes, and accidents.

The author discusses 4 things that designers can do to design for errors:

  1. Understand the causes of error and design to minimize those causes
  2. Make actions reversible whenever possible and make irreversible actions difficult to carry out
  3. Make errors easily discoverable and correctable
  4. Change the attitude toward errors. Think of actions as approximations of what is desired.

Forcing functions address errors by constraining users' actions. An interlock forces operations to take place in the proper sequence (a microwave cannot function with the door open). A lock-in keeps an operation active when appropriate (soft power functions as opposed to hard switches). Lockout devices prevent actions, such as safety rails that prevent accidental death. Forcing functions are very effective but almost universally hated by users. People don't like constraints, even when the constraints are in their best interests.
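The interlock idea can be sketched in a few lines of code - a hypothetical microwave model, not something from the book:

```python
# A minimal sketch of an interlock forcing function: the emitter can only
# run while the door is closed, so the unsafe sequence (cooking with the
# door open) simply cannot be expressed.

class Microwave:
    def __init__(self):
        self.door_open = True
        self.cooking = False

    def close_door(self):
        self.door_open = False

    def open_door(self):
        self.door_open = True
        self.cooking = False  # the interlock: opening the door kills the emitter

    def start(self):
        if self.door_open:
            return False  # interlock refuses the action entirely
        self.cooking = True
        return True

oven = Microwave()
print(oven.start())   # door open: the interlock blocks cooking
oven.close_door()
print(oven.start())   # door closed: cooking proceeds
oven.open_door()
print(oven.cooking)   # opening the door stops the emitter
```

The safety property lives in the structure of the object rather than in a warning label, which is exactly what makes forcing functions effective despite being unpopular.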

Monday, September 17, 2012

Design of Everyday Things: Knowing What to Do

This chapter begins with an experiment involving Legos, in which a 13-piece motorcycle is constructed with no prior information or guidance, based purely on the physical, semantic, and cultural constraints of the set. Physical constraints limit possibilities - a square peg cannot be placed in a round hole. No special training is required to understand physical constraints because they are governed by the world; however, their effectiveness is determined by the ease with which they can be perceived and interpreted. Semantic constraints rely on the meaning of the situation - in this case, there is only one sensible position for the driver. Cultural constraints rely on accepted conventions - signs should be visible, screws are tightened clockwise. Here, the "Police" sign on the motorcycle should be placed right-side-up and in a visible location, and the clear yellow brick is obviously a headlight, as is the convention. Logical constraints applied to the construction of the motorcycle include the imperative that all blocks be used with no gaps in the final product. Natural mappings work by providing logical constraints.
Next the author discusses several examples of poor design. He begins by re-addressing the door situation presented in chapter 1. Then, he moves on to discussing switches, which frequently lack a logical mapping and grouping. These discussions are conducted in the context of mapping and constraint principles discussed in previous chapters.
The principle of visibility states that relevant parts should be made visible, and the feedback principle states that actions should have an immediate and obvious effect. Visibility allows users to infer how an object is manipulated and what those manipulations can be expected to produce. Feedback allows users to learn through trial and error and reduces misattributions of causality and misconceptions. Feedback can be given visually or audibly. In fact, most sounds made by devices aren't made out of necessity; they are made to inform the user that an event has occurred (such as the shutter sound made by a digital camera).

The Design of Everyday Things: Knowledge in the Head and in the World

This chapter begins with the shocking realization that people's knowledge and behavior are not always equivalent - for example, a typist can type with speed and accuracy without being able to arrange the keys on a keyboard. The author attributes this phenomenon to the following: information in the world is combined with knowledge in the head to produce behavior; precision is not required - the correct choice need only be differentiated from the others; natural constraints provide limits; and cultural constraints provide guidance. Furthermore, people possess two kinds of knowledge: knowledge of (explicit memory, declarative knowledge) and knowledge how (implicit memory, procedural knowledge).
Memory is divided into short-term memory (STM) and long-term memory (LTM). In computing terms, STM is analogous to RAM and LTM to a hard disk. STM is limited to 5-9 segments and is very volatile; these segments need not be individual characters - a technique called chunking can be used (ex. a 10-digit phone number is remembered as 3 chunks). LTM contains information that takes longer to retrieve but isn't as easily forgotten. The author categorizes memory as: memory for arbitrary things (rote learning), memory for meaningful relationships, and memory through explanation. Memorizing arbitrary things is difficult because of a lack of cues - no context. Memorization based on relationships is significantly easier, giving constraints and structure to limitless possibilities. The best form of memorization involves understanding, which allows a person to reconstruct the knowledge being remembered using procedural memory. This is why mental models are so valuable.
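Chunking is simple enough to show in code - a toy sketch with an invented number:

```python
# A small illustration of chunking: the same ten digits are far easier to
# hold in STM as three chunks than as ten independent items.

def chunk(digits, sizes):
    """Split a digit string into chunks of the given sizes."""
    chunks, i = [], 0
    for size in sizes:
        chunks.append(digits[i:i + size])
        i += size
    return chunks

number = "5125550142"            # ten arbitrary items to remember...
print(chunk(number, [3, 3, 4]))  # ...but only three chunks
```

The digits themselves don't change; only the grouping does, which is why chunking costs nothing yet fits the 5-9 segment limit so much better.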
In addition to memory, knowledge also exists in the world. One such form of knowledge is reminders, which consist of a signal and a message. Another way the world can communicate is with natural mappings, such as the burner controls on a range. There are tradeoffs associated with how knowledge is stored. Memory requires learning and is not readily retrievable, but it is very efficient and doesn't rely on cues. Knowledge in the world doesn't require as much overhead, but it is dependent on the environment.

Design of Everyday Things: The Psychology of Everyday Actions

This chapter addresses the psychological considerations behind good design. People instinctively explain their surroundings, with or without adequate knowledge to do so. This results in misconceptions and misattributions of causality. The author illustrates these concepts using examples such as an A/C thermostat and a colleague's computer problems. Many people think of a thermostat as a valve or a timer, and believe that setting the temperature higher or lower than intended can speed up the heating/cooling process. This is a misconception - a false mental model. The colleague's computer troubles resulted from a misattribution of causality: he thought that a program was causing his terminal to fail, when the real culprit was a hardware problem. When problems such as these occur, people are apt to blame themselves and become frustrated.
The chapter goes on to discuss the stages of action: perception, interpretation, evaluation, goals, intention to act, sequence of actions, and execution of the sequence. These stages form an approximate model and a continual feedback loop with the world. The loop can be started at any point, and people don't always behave logically with well-formed goals. These stages serve to aid design by re-emphasizing the principles of good design: visibility, a good conceptual model, good mappings, and feedback.

Thursday, September 13, 2012

The Chinese Room


John Searle’s experiment addresses the question of whether a machine can be programmed to literally “understand” a concept (he called this type of programming strong AI) or if the best a machine can do is only simulate “understanding” (weak AI). The experiment involves a closed room in which Chinese characters are entered through a slot and a man uses a procedure (program) to generate a response to the input without knowing Chinese (the procedure is written in English, or memorized). Since the man doesn’t actually understand Chinese, but to the outside observer he does, Searle asserts that this scenario translates to computers as well and that the apparent “understanding” demonstrated by an AI is only a simulation at best.

Searle responds to several criticisms in his paper. The reply I found most compelling (probably due to my psychology minor and its emphasis on cognitive processes) was the brain simulator reply. This reply argues that Searle’s experiment should be redesigned by having the man manipulate valves that are mapped to the synapses in a Chinese person’s brain. This would result in the Chinese person receiving a reply in Chinese without the man (or the valves) understanding Chinese. I liked this reply because, upon initial examination, I agreed with it. However, upon reading Searle’s response, I understand its flaw. Attempting AI is pointless if you concede that understanding the brain is necessary to understand the mind, because AI essentially aims to translate the “software” of the mind to mechanical hardware rather than to the brain. As a psychologist I would argue that an understanding of the brain is in fact necessary to understand the mind - as evidenced by the many advances made in psychology by examining the structure of the brain - however, as a computer scientist I understand that conceding (read: admitting) this would nullify the concept of strong AI, as defined by Searle. The brain simulator reply depends on having an understanding of the brain and is therefore an odd and counterproductive argument to make in the context of AI.

I really enjoyed these readings. I typically find thought experiments irrelevant, and perhaps this one is as well, but I can appreciate the question Searle was trying to address - and it’s an important question. As for my opinion, I side with Searle: we have yet to understand the physical basis of the “mind,” and until we do, all we can hope for is a poor facsimile - a simulation - of its functions. At least for now, “understanding” is reserved for the realm of the living.

Tuesday, September 11, 2012

Paper Reading #6: ClayVision: The (Elastic) Image of the City





Introduction
ClayVision is a paper by Yuichiro Takeuchi and Ken Perlin. Takeuchi is an Associate Researcher at Sony Computer Science Laboratories Inc. and earned his PhD from The University of Tokyo. In March he obtained a Master’s in Design Studies from Harvard. Perlin is a Professor of Computer Science at the NYU Media Research Lab and Director of the Games for Learning Institute.

Summary
ClayVision takes a new approach to augmented reality assisted urban navigation by utilizing knowledge from non-computer science fields to break the current paradigm, which informs the user by pasting potentially irrelevant and frequently unwanted information on top of reality. ClayVision uses computer vision and image processing to create a dynamic real-time replica of the user’s perspective that can then be morphed and adjusted to direct the user and convey information.

ClayVision seeks to take AR from being gimmicky to being a “calm” technology. Current navigation applications involve information bubbles and overlays, not augmenting reality, but distorting reality. A user’s attention is very limited and navigating an urban environment is potentially dangerous. ClayVision addresses the issue of user safety and attention using Edward Tufte’s Data-Ink Ratio, which states that the effectiveness of visual communications can be analyzed using a ratio of ink used to convey information to total ink used in the graphic.

Central to ClayVision’s function is computer-vision based localization, which the authors recognize as an emerging field and an open problem. To address this, the authors created a database of pictures, taken with the iPad’s camera (the tablet used to prototype ClayVision), for a set of predetermined locations, which is used to calculate the device’s pose. The authors rationalize this sidestep by asserting that even if ClayVision only works in limited locations, it can provide insights into future applications and the design of the system.

Image processing of the video feed is done using a simplified procedure based on SIFT, which outputs a set of feature points and other data used to determine the relative position of the entire frame. This processing is done in real-time on an iPad 2. Output is used to compare the video feed to the database of pictures and the template pictures are transformed based on the iPad’s camera specifications to produce the correct pose. After localization, projection and modelview matrices are calculated to map 3D building models onto the feed. These models are then textured using information from the feed and transformed to communicate information to the user. Texturing is done correctly by altering the image background with template picture information in a way that doesn’t disrupt the video and allows for transformations that don’t cause excessive errors.

While there are a limitless number of transformations possible with ClayVision the authors chose a select few to discuss based on the information they wished to convey to the user. They found that emphasizing buildings was best done by changing their size, value, texture, color, orientation, shape, and/or position. Shape, orientation, and position were ruled out due to humans’ poor selective perception of shapes and the possible confusion of information brought on by changing orientation or position. Emphasis of buildings was implemented by increasing texture saturation and by changing the heights of buildings to emphasize/de-emphasize them. This could be done statically or dynamically, but the motion effects were found to be distracting and posed a potential safety issue. Building usage was expressed by altering facades (making a downtown cafe more distinguishable by giving it a French picturesque exterior). City regions are made distinguishable using post-rendering processes, such as applying a toon-like effect. Lastly, artificial structures can be erected to provide landmarks to the user.

The authors’ approach to this paper is based on discussion around their prototype and possibilities for extending ClayVision in the future, from a software and hardware standpoint.

Related Works
  1. Augmented Reality Navigation by Uchechukwuka Monu & Matt Yu
  2. An Image-Based System for Urban Navigation by Duncan Robertson & Roberto Cipolla
  3. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment by S. Feiner, B. MacIntyre, T. Höllerer & A. Webster
  4. A Wearable Computer System with Augmented Reality to Support Terrestrial Navigation by B. Thomas, V. Demczuk, W. Piekarski, D. Hepworth & B. Gunther
  5. Pervasive Information Acquisition for Mobile AR-Navigation Systems by Wolfgang Narzt et al.
  6. AR Navigation System for Neurosurgery by Yuichiro Akatsuka et al.
  7. Visually Augmented Navigation in an Unstructured Environment Using a Delayed State History by Ryan Eustice, Oscar Pizarro & Hanumant Singh
  8. A Vision Augmented Navigation System by Michael Bosse et. al.
  9. A Vision Augmented Navigation System for an Autonomous Helicopter by Michael Bosse
  10. A Survey of Augmented Reality by Ronald T. Azuma

Augmented reality as a means of navigation is not a new idea. In 1997, when the field of AR was relatively young, Azuma discussed the future of AR in his paper A Survey of Augmented Reality. In this paper he mentions the many potential applications for AR, including navigation.

In addition to being an established idea, AR navigation has also been implemented in a variety of different ways. Augmented Reality Navigation and An Image-Based System for Urban Navigation discuss AR navigation implemented on a mobile phone. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment and A Wearable Computer System with Augmented Reality to Support Terrestrial Navigation explore AR navigation on custom wearable hardware. Pervasive Information Acquisition for Mobile AR-Navigation Systems discusses an AR navigation system for cars in great detail. AR Navigation System for Neurosurgery takes AR navigation into the operating room by focusing on microscopic navigation, rather than macroscopic. A Vision Augmented Navigation System goes into detail about an AR navigation system and follows up with an application of this system in A Vision Augmented Navigation System for an Autonomous Helicopter.

All of these papers take on the task of using computer enhanced reality to guide users, but each of these applications is very similar or addresses a niche problem (surgery). ClayVision doesn’t claim to be a unique application, it claims to take a unique approach. The only paper I could find that attempts AR navigation in a novel way was Visually Augmented Navigation in an Unstructured Environment Using a Delayed State History, but even this paper fails to address the design and human factors concerns discussed in ClayVision.

Evaluation
Evaluation in this paper is non-existent. The only users mentioned in the paper are the authors. The prototype was not subjected to any standards or any type of measures. There are basic comparisons made between ClayVision and similar research, as in any paper, but that is the closest the authors come to evaluating ClayVision.

Discussion
I think the premise behind ClayVision is really interesting and a valid topic for research, but I was disappointed that the authors neglected to get user feedback or test their prototype against existing AR software. I’d be really interested to see them follow up with a more complete analysis in the near future.

Monday, September 10, 2012

Design of Everyday Things: Chapter 1


The first chapter of Design of Everyday Things was thought-provoking. I never realized how non-functional visual cues could have such an impact on a product’s usability. I’ve never read a book that analyzes the design process and its considerations so systematically. I’ve always thought of design as a vague field without much structure - and therefore not something subject to formal analysis. I’m glad that I was wrong; it looks as though this book has a lot to offer and I look forward to the coming chapters.

Thursday, September 6, 2012

Paper Reading #5: Playable Character: Extending Digital Games into the Real World


Introduction
Playable Character: Extending Digital Games into the Real World was written by Jason Linder & Wendy Ju. Linder and Ju conducted their research at the California College of the Arts. Linder currently works for Adobe’s Creative Technologies Lab and Ju is a researcher at the Stanford HCI Group. 

Summary
Playable Character discusses a series of prototype games developed to explore how real-world activity could be incorporated into digital game systems. These games led to the design of Forest, a game developed for the Friends of the Urban Forest (FUF). The prototype games were developed as probes using paper prototypes or simple Flash and Processing programs. Informal testing of these prototypes was conducted using friends and colleagues as players. Data collection was integrated into the games whenever possible and supplemented with player interviews.

In Simulation City, players were asked to imagine playing SimCity with the added constraint that any buildings added to their city had to be photographed in the real-world. The authors found that selections were heavily skewed toward interesting buildings and art installations and responses indicated that player “...interest can be maintained simply by providing a facility for personalized collections of real-world items.”

SphereQuest explored the connections that could be developed between players and their avatars by asking players to perform real-world tasks to enhance their avatars in-game. Players were required to document these activities and complete a survey for the purpose of data collection. The authors found that players who chose activities they could imagine their character doing (being stealthy as opposed to reading a book) connected with the game and enjoyed themselves.

The Other End was designed specifically for a known social setting to illustrate the importance of context in a game created to overlap with an existing social structure. The game consisted of checking in at a camera station, walking to the other end of a hall, and checking in at the other station. Scoring was based on frequency of participation (with punishment for lack of participation) and improvement, and the players with the most points, most trips, and best times were displayed on a leader-board. The game quickly created a competitive environment centered around achieving the fastest time, to the exclusion of the other leader-board categories. This competition served to advertise the game and facilitate social interaction.

Cubelord, the second social engagement game, involved accumulating territory (cubicles) via the game’s virtual currency. The player with the most cubes at any moment was crowned “Cubelord” and given a cape and scepter. Each cube had a price, price increase rate, and return price to encourage players to formulate strategies. Currency was earned by performing tasks involving the disclosure of personal information, singing, cleaning, providing homework assistance, etc. Game runners collected this information and credited players with game funds. The current state of the game was available online and a terminal was used for purchasing cubes. The authors observed that players prioritized tasks by convenience and players cleverly attempted to get credit for less than accurate responses, but no actual cheating took place.

The design probes explored “...how the physical world could be mapped into the game world (Simulation City), how the virtual-world could prompt real-world actions (SphereQuest), how people's social and physical motivations could be organized (The Other End), and how virtual motivations could motivate social disclosure (CubeLord).” These observations resulted in the formulation of five design patterns: personalized collection (Simulation City), narrative alignment (SphereQuest), gaming the game (The Other End), progressive disclosure (The Other End), and persistent convenience (CubeLord).

The authors developed Forest for FUF to explore how they could integrate what they learned from their probe experiments with specific real-world goals. The game structure was divided into three categories: real-world tasks, the virtual forest controlled by real and virtual choices, and status tracking. Forest is played by performing a set of tasks that activate virtual features of the game. These features take the form of badges and virtual currency (Leaves) that the player can use to customize their virtual forest (similar to CubeLord). Solo tasks were location-based (for convenience) and centered on individual trees that a player could “check in” to their game (similar to Simulation City). Cataloging trees involves collecting data that FUF can use in their tree care efforts and filling in missing data and trees (this is verified to prevent players from gaming the game). Team tasks are directly related to FUF participation categories: outreach, planting, and tree care. Leaves for team tasks are redeemed using QR codes prepared by FUF team leaders. The virtual forest functions similarly to FarmVille with the purpose of encouraging continued participation and social networking with other FUF members.

Related Works
  1. Towards Massively Multi-user Augmented Reality on Handheld Devices by D. Wagner, T. Pintaric, F. Ledermann & D. Schmalstieg
  2. ARQuake: The Outdoor Augmented Reality Gaming System by W. Piekarski & B. Thomas
  3. Touch-Space: Mixed Reality Game Space Based on Ubiquitous, Tangible, and Social Computing by A. D. Cheok, X. Yang, Z. Z. Ying, M. Billinghurst & H. Kato
  4. From Game Design Elements to Gamefulness: Defining “Gamification” by S. Deterding, D. Dixon, R. Khaled & L. Nacke
  5. Rethinking agency and immersion: video games as a means of consciousness-raising by Gonzalo Frasca
  6. Paper Prototyping: The Fast and Easy Way to Design and Refine User Interfaces by Carolyn Snyder
  7. Player-Centered Game Design: Experiences in Using Scenario Study to Inform Mobile Game Design by Laura Ermi & Frans Mäyrä
  8. The PowerHouse: A Persuasive Computer Game Designed to Raise Awareness of Domestic Energy Consumption by M. Bang, C. Torstensson & C. Katzeff
  9. A Video Game for Cyber Security Training and Awareness by B. D. Cone, C. E. Irvine, M. F. Thompson & T. D. Nguyen
  10. The Digital Game-Based Learning Revolution by Marc Prensky
This research touched on many different fields and ideas, none of which were particularly novel on their own. Gamification is the central idea behind Playable Character and has become a very popular topic of research in the last decade. From Game Design Elements to Gamefulness: Defining “Gamification” is a paper that analyzes the idea of using non-game contexts to motivate user activity and retention. Playable Character also incorporates augmented reality gaming into its prototypes and into the Forest mobile application.

Towards Massively Multi-user Augmented Reality on Handheld Devices and ARQuake: The Outdoor Augmented Reality Gaming System explore applications of AR gaming very similar to Forest. Towards Massively Multi-user Augmented Reality on Handheld Devices is similar to Forest in its mobility and its emphasis on multi-user experience. ARQuake’s emphasis on outdoor AR gaming closely parallels Forest’s outdoor focus on trees. Touch-Space: Mixed Reality Game Space Based on Ubiquitous, Tangible, and Social Computing leverages AR and social networking to create an application aimed at being convenient to the user, much like Forest’s social networking and emphasis on convenience as a means of keeping users interested.

Rethinking agency and immersion: video games as a means of consciousness-raising and The PowerHouse: A Persuasive Computer Game Designed to Raise Awareness of Domestic Energy Consumption discuss the use of gamification to increase player awareness of important topics. A Video Game for Cyber Security Training and Awareness and The Digital Game-Based Learning Revolution discuss gaming as a teaching tool. These are concepts very central to the purpose of Forest, which is to increase membership, participation, and fund-raising for FUF.

Playable Character’s contributions include the methodology used in the study as well as the applications. Paper Prototyping: The Fast and Easy Way to Design and Refine User Interfaces is a book that explores paper prototyping as a method to conduct informal studies. Much of the experimentation in Playable Character was conducted using paper prototyping and similar informal methods. Player-Centered Game Design: Experiences in Using Scenario Study to Inform Mobile Game Design is a study conducted in the same style as Playable Character. This study conducts small experiments in an informal way and applies the knowledge gained from this experimentation to the design of mobile games.

Evaluation
The only evaluation came in the form of informal feedback from test users and FUF members. Feedback during the experimental stages served to guide the development of Forest. Forest feedback was discussed in the paper but the application of this feedback was outside the scope of the research. Evaluation was purely qualitative, unsystematic, and informal.

Discussion
This is by far the most unusual CHI paper I have read. While the contributions made by Playable Character aren’t readily apparent, I do feel that there is something to take away from this study. The authors explored a variety of novel game scenarios through quick prototyping and applied their findings to a real-world application. This topic, while unusual, offers an interesting perspective that I found unique. Evaluation of the prototypes developed in Playable Character was entirely informal, qualitative, and unstructured (no mathematical or statistical measures were applied to the analysis of their qualitative data). Normally this would pose a problem, but due to the nature of this research, I feel that their evaluation was adequate.

Wednesday, September 5, 2012

Paper Reading #4: Not Doing But Thinking: The Role of Challenge in the Gaming Experience


Introduction
Not Doing But Thinking: The Role of Challenge in the Gaming Experience was written by Anna L. Cox, Paul Cairns, Pari Shah & Michael Carroll. Dr. Cox is a senior lecturer in Human-Computer Interaction at University College London and an Associate Chair for CHI 2013, CHI 2012, and CogSci 2012. Dr. Cairns is a senior lecturer in Human-Computer Interaction at The University of York interested in video games and modeling user interactions. Cox and Cairns coauthored Research Methods for Human-Computer Interaction, published by Cambridge University Press in 2008. Pari Shah graduated from University College London in 2008 with a degree in Psychology and Michael Carroll studied Computer Science at The University of York.

Summary
This paper investigates the role of challenge in a user’s experience of immersion through three studies. The concept of challenge is distilled into two modalities: pushing a gamer’s physical limits (twitch mechanics) and pushing a gamer’s cognitive limits (time constraints). The first experiment “manipulate[s] the number of interactions required to make progress in the game and thus the speed with which the gamer must interact with the game.” Their second and third experiments focus on making the gamer think faster by manipulating the level of time pressure under which the gamer must perform. They hypothesize that cognitive challenges have a greater effect on immersion and therefore expect higher levels of immersion to be reported in experiments two and three, in comparison to the first experiment.

The authors identify immersion as “a graded experience ranging from engagement, through engrossment to total immersion.” Total immersion is synonymous with being in a state of flow, during which all of a gamer’s mental faculties are focused on the task at hand (the game). Flow is achieved “as a result of an appropriate balance between the perceived level of challenge and the person’s skills.” This idea led the authors to consider the role of expertise, hypothesizing that immersion will decrease if the game is too challenging (resulting in a state of anxiety) or if the game isn’t challenging enough (resulting in a state of boredom).

Experiment one was conducted using a tower defense game and 40 players of varying levels of expertise. The game was calibrated to an appropriate rate of play, then reduced by approximately one third to create a low-effort condition (LE) and increased by approximately one third for the high-effort condition (HE). Players were divided into three expertise groups: insufficient expertise (IX), low expertise (LX), and high expertise (HX), and immersion was measured using the Immersive Experience Questionnaire (IEQ) modified with three additional questions. Players were given a mandatory in-game tutorial, allowed to play for 8 minutes or until they ran out of lives, and immediately evaluated using the IEQ. Players in the HE condition performed an average of 60% more actions than LE players, and players with more expertise consistently performed better, demonstrating the success of the experimental setup. The authors found that physical effort had no significant effect on player immersion: LX players were equally immersed in both conditions, while HX players experienced decreased immersion in the HE condition. The authors hypothesize that HX players viewed the challenge of the game as cognitive and therefore found the added physical demands of the HE condition unreasonable or frustrating. No interaction between expertise and level of challenge was observed (having more expertise didn’t decrease the number of actions performed). IX player data was discarded due to significant differences in performance, likely caused by those players not reading the tutorial.


Experiment two tested the hypothesis that participants playing under time pressure will experience significantly higher immersion and challenge than those playing without time pressure. Testing was conducted using 22 players playing Bejeweled in a timed or untimed mode for 15 minutes before completing the IEQ. The authors found that players playing the timed mode experienced a higher level of challenge as well as significantly more immersion. Effects of expertise were not measured in this experiment.


Experiment three addressed the hypothesis that expertise affects the level of cognitive challenge associated with a game, thereby affecting the level of immersion. The authors tested their hypothesis with 20 players, divided into expert and novice groups, playing Tetris at low difficulty (level 1) or high difficulty (level 6). Each player played for a total of 15 minutes - low-difficulty players were not allowed to progress past level 2, while high-difficulty players continued until the game ended and then reset - before completing the IEQ. Expert and novice players were equally immersed at high difficulty. At low difficulty, novice players experienced a slight increase in immersion and expert players experienced a significant drop. These results confirm that immersion is dependent on a balance between skill and challenge, but reveal that challenge has no effect on immersion when expertise isn’t taken into account.

Related Works
  1. Immersion, Engagement, and Presence: A Method for Analyzing 3-D Video Games by Alison McMahan
  2. Flow and Immersion in First-Person Shooters: Measuring the player’s gameplay experience by L. Nacke & C. A. Lindley
  3. Ludic Engagement and Immersion as a Generic Paradigm for Human-Computer Interaction Design by C. A. Lindley
  4. Revising Immersion: A Conceptual Model for the Analysis of Digital Game Involvement by Gordon Calleja
  5. Patterns in Game Design by S. Björk & J. Holopainen
  6. Video Games: Perspective, Point-of-View, and Immersion by L. N. Taylor
  7. Sex Differences in Video Game Play: A Communication-Based Explanation by K. Lucas & J. L. Sherry
  8. Video Game Designs by Girls and Boys: Variability and Consistency of Gender Differences by Y. B. Kafai
  9. Heuristic Evaluation for Games: Usability Principles for Video Game Design by D. Pinelle, N. Wong & T. Stach
  10. Explaining the Enjoyment of Playing Video Games: The Role of Competition by P. Vorderer, T. Hartmann & C. Klimmt
To begin, I will note that the study of game experience is well established. Patterns in Game Design is a book on the paradigms of game design and Heuristic Evaluation for Games: Usability Principles for Video Game Design is a paper exploring rule-of-thumb evaluations aimed at improving game design. Both of these works focus on the gaming experience from the designer's point of view, explicitly emphasizing the role of challenge, with the ultimate goal being the creation of an immersive experience.

Additionally, the role of challenge is well established and studied within the context of gaming. Explaining the Enjoyment of Playing Video Games: The Role of Competition takes a psychological look at what makes games fun on a universal level. The authors argue, successfully, that the unifying feature of fun games is competition, namely, the desire to win. This competition comes either in the form of opposing players or in the form of challenges imposed by the game itself. Further research into the effects of challenge in video games revealed important data that the authors of Not Doing But Thinking failed to take into account. Sex Differences in Video Game Play: A Communication-Based Explanation and Video Game Designs by Girls and Boys: Variability and Consistency of Gender Differences are two independent studies that discuss the variation between men and women in their responses to challenges within video games. Not Doing But Thinking makes claims about the effects of challenge on game immersion and experience without accounting for these well-documented differences in cognitive response to challenge - differences that are evident in the way men and women play games.

Next, I assert that the role of immersion in gaming is well studied. Immersion, Engagement, and Presence: A Method for Analyzing 3-D Video Games explores the effects of 3-D design on immersion, analyzing the relationship between immersion and artwork. Video Games: Perspective, Point-of-View, and Immersion is a similar study that focuses on the player’s perspective on a game world and their point-of-view within said world to discuss immersion. Revising Immersion: A Conceptual Model for the Analysis of Digital Game Involvement takes a systemic approach to exploring immersion, focusing on the various forms and levels of involvement that contribute to an immersive experience. Ludic Engagement and Immersion as a Generic Paradigm for Human-Computer Interaction Design identifies immersion as a critical goal for all human-computer interaction applications and explores its potential use in ludic systems. These studies all take a different approach to analyzing the effects and conditions of immersion and agree that immersion is critical to the gaming experience.

Having established that the study of game experience is nothing new and that the role of challenge and the importance of immersion are well studied topics, Not Doing But Thinking only remains novel in that it addresses immersion from a cognitive standpoint and uses quantitative measures in its analysis. This sets Not Doing But Thinking apart from the aforementioned studies, but not from Flow and Immersion in First-Person Shooters: Measuring the player’s gameplay experience. This paper explores immersion from sensory, imaginative, and challenge-based perspectives. Nacke & Lindley use a host of measurements ranging from psychophysiological indications of arousal to qualitative flow measurements to effectively analyze what factors contribute to an immersive experience. I found their study to be much more robust and conclusive, effectively rendering Not Doing But Thinking insignificant.

Evaluation
The authors use a combination of quantitative and qualitative measures to evaluate their experiments on both a systemic level and a component level. In experiment one, evaluation was based on players’ scores in the tower defense game, the number of actions they performed while playing, a quantitative measure of the players’ expertise, and the Immersive Experience Questionnaire. These measures were effectively used to validate the experimental design (component-based evaluation) and to conclude that physical effort had no significant effect on player immersion and that there was no interaction between expertise and level of challenge. Experiment two used players’ Bejeweled scores and IEQ data to conclude that time constraints increase the level of immersion experienced. Experiment three used a qualitative assessment of players’ skill levels, their Tetris scores, and IEQs to conclude that the level of challenge only affects immersion when skill is taken into account.

Discussion
While the authors’ evaluation methods were excellent, their attempt at novelty failed. There is at least one other study that measures immersion using quantitative data, and that study does so much more successfully. This study had a good premise but ultimately concluded very little. I enjoyed the paper until I realized that their experiments yielded little data. What little enthusiasm I clung to after reading this paper was quickly dashed once I discovered Flow and Immersion in First-Person Shooters.

Monday, September 3, 2012

Paper Reading #3: LightGuide: Projected Visualizations for Hand Movement Guidance


Introduction
LightGuide: Projected Visualizations for Hand Movement Guidance is a paper by Rajinder Sodhi, Hrvoje Benko, and Andrew D. Wilson. Sodhi is a grad student at the University of Illinois working on his PhD in computer science under the advisement of David Forsyth and Brian Bailey. Benko received his PhD from Columbia University and now works in the Natural Interaction Research group at Microsoft Research focusing on human-computer interaction. Wilson also works for Microsoft Research as a principal researcher, prior to which he obtained his PhD from MIT.

Summary
LightGuide is a proof-of-concept implementation for a system that takes a novel approach to gesture guidance using a projector and a depth camera (Kinect). The authors were motivated by the current how-to paradigm’s lack of valuable feedback in the form of physical interaction. Typically, when learning a gesture such as a yoga pose, an instructor provides feedback by correcting errors through physical touch. When an instructor isn’t present, users rely on videos, diagrams, and textual descriptions. With the rising dependence on do-it-yourself materials found online, this poses a challenge - one that the authors take on with LightGuide.


LightGuide uses a depth camera and a projector mounted to a fixed position on the ceiling to project gesture cues onto the hand of a user. The depth camera and projector are calibrated precisely to allow the user’s hand to be mapped to 3D world coordinates and accurately projected upon. The authors devised three types of cues: follow spot, 3D arrow, and 3D pathlet. The follow spot consists of a white circle and a black arrow centered on the user’s hand indicating z-axis movement as well as positive (blue) and negative (red) coloring indicating xy-axis movement. The 3D arrow is self-explanatory and the 3D pathlet consists of a small path segment with a red dot indicating the user’s current position along the path.
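The calibration the authors describe can be illustrated with a short sketch. This is not LightGuide's actual code - the intrinsic and extrinsic values below are hypothetical pinhole-camera parameters invented for illustration - but it shows the core mapping: back-project a depth pixel to a 3D point in world coordinates, then project that point into projector pixel coordinates so a cue can be drawn on the hand.

```python
import numpy as np

# Hypothetical pinhole intrinsics for the depth camera and projector
# (illustrative values, not LightGuide's real calibration).
K_depth = np.array([[575.0,   0.0, 320.0],
                    [  0.0, 575.0, 240.0],
                    [  0.0,   0.0,   1.0]])
K_proj = np.array([[1500.0,    0.0, 640.0],
                   [   0.0, 1500.0, 360.0],
                   [   0.0,    0.0,   1.0]])

# Rigid transform from depth-camera coordinates to projector coordinates,
# recovered during calibration (identity rotation here for simplicity).
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])  # projector mounted 10 cm away along x

def depth_pixel_to_world(u, v, depth_m):
    """Back-project depth pixel (u, v) with depth in meters to a 3D point."""
    ray = np.linalg.inv(K_depth) @ np.array([u, v, 1.0])
    return ray * depth_m

def world_to_projector_pixel(p_world):
    """Project a 3D point into projector pixel coordinates."""
    p_proj = R @ p_world + t
    uvw = K_proj @ p_proj
    return uvw[:2] / uvw[2]

# Example: the depth camera sees the user's hand at pixel (320, 240), 1.2 m away.
hand = depth_pixel_to_world(320, 240, 1.2)
u, v = world_to_projector_pixel(hand)  # where the projector should draw the cue
```

With both devices fixed to the ceiling and calibrated against each other, this round trip is what lets a cue land precisely on a moving hand.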

The authors divided the cues into two categories based on whether the cue moves at a steady rate that the user follows or whether the cue advances based on the user’s speed. This categorization, along with the control cases, resulted in the following six testing scenarios: follow spot, 3D follow arrow, 3D self-guided arrow, 3D pathlet (self-guided), video projected onto the user’s hand, and video played on a screen.

The results of the experimentation revealed that, while video cues result in much faster movement on the part of the user, the cues devised by the authors resulted in 85% more accurate performance. The most accurate cue was the follow spot, followed by the arrows, the pathlet, the projected video, and lastly, the video screen. 

Related Works
  1. CounterIntelligence: Augmented Reality Kitchen by Leonardo Bonanni, Chia-Hsun Lee & Ted Selker
  2. Development of Head-Mounted Projection Displays for Distributed, Collaborative, Augmented Reality Applications by Jannick P. Rolland, Frank Biocca, Felix Hamza-Lup, Yanggang Ha & Ricardo Martins
  3. The Studierstube Augmented Reality Project by Dieter Schmalstieg, Anton Fuhrmann, Gerd Hesina, Zsolt Szalavári, L. Miguel Encarnação, Michael Gervautz & Werner Purgathofer
  4. MirageTable: Freehand Interaction on a Projected Augmented Reality Tabletop by Hrvoje Benko, Ricardo Jota & Andrew D. Wilson
  5. Efficient Model-based 3D Tracking of Hand Articulations using Kinect by Iason Oikonomidis, Nikolaos Kyriazis & Antonis A. Argyros
  6. Human Detection Using Depth Information by Kinect by Lu Xia, Chia-Chih Chen & J. K. Aggarwal
  7. Image Guidance of Breast Cancer Surgery Using 3-D Ultrasound Images and Augmented Reality Visualization by Yoshinobu Sato et. al.
  8. A Head-Mounted Display System for Augmented Reality Image Guidance: Towards Clinical Evaluation for iMRI-guided Neurosurgery by F. Sauer et. al.
  9. Exploring MARS: developing indoor and outdoor user interfaces to a mobile augmented reality system by Tobias Höllerer et. al.
  10. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment by Steven Feiner et. al.
While the system as a whole is somewhat novel, the underlying ideas and individual components have been demonstrated in various other studies. Firstly, the idea of using projectors to augment reality has been in circulation for some time. CounterIntelligence: Augmented Reality Kitchen, written in 2005, explores the idea of using projectors and various sensors to augment a kitchen environment with the goal of improving user speed and safety. Development of Head-Mounted Projection Displays for Distributed, Collaborative, Augmented Reality Applications, a paper written in 2006, and The Studierstube Augmented Reality Project, a paper written in 2002, both discuss the idea of using head-mounted projectors to facilitate an AR experience. MirageTable: Freehand Interaction on a Projected Augmented Reality Tabletop (2012) is another study that capitalized on the proven success of projectors in AR systems, but did so much more effectively than LightGuide and with a more novel approach. LightGuide built on the success of CounterIntelligence by incorporating user-tracking and projecting onto non-static surfaces, and improved upon the ideas set forth in Development of Head-Mounted Projection Displays for Distributed, Collaborative, Augmented Reality Applications by enabling the system to work without forcing the user to wear any special equipment.

Using Microsoft’s Kinect as a depth camera to track users is another recycled idea. MirageTable, Efficient Model-based 3D Tracking of Hand Articulations using Kinect, and Human Detection Using Depth Information by Kinect are all studies that have used Kinect to track user movement within a system. While MirageTable’s use of Kinect was somewhat basic in comparison to LightGuide (MirageTable only tracks shutter glasses within a very limited space), Efficient Model-based 3D Tracking of Hand Articulations using Kinect and Human Detection Using Depth Information by Kinect both utilize Kinect in more novel ways than LightGuide does. Hand articulation tracking and human detection are significantly more advanced applications of Kinect’s capabilities, and LightGuide’s implementation would have greatly benefited from incorporating them to create a more robust application.

The application of AR to guidance systems has been explored to a great extent within the fields of CHI and medicine. Exploring MARS: developing indoor and outdoor user interfaces to a mobile augmented reality system is a paper written in 1999 that discusses the application of AR to allow users to guide each other through environments. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment, a paper written even earlier (1997), takes this idea further by imagining a 3D AR system capable of guiding a user through a complex urban environment such as a university campus. Image Guidance of Breast Cancer Surgery Using 3-D Ultrasound Images and Augmented Reality Visualization and A Head-Mounted Display System for Augmented Reality Image Guidance: Towards Clinical Evaluation for iMRI-guided Neurosurgery are studies that use head-mounted video displays to allow doctors to visualize tissue within a patient and provide guidance to assist in surgery. These works not only serve to demonstrate that AR guidance is anything but novel, but that a guidance system aimed at coordinating a user's movements with an exact pre-programmed path is trivial compared to other applications explored in previous studies.

Evaluation
The authors evaluated the success of LightGuide using a quantitative comparative evaluation and a qualitative discussion of user feedback on the pros and cons of their approaches. The depth camera tracked the movement of the user's hand through world coordinates, and this movement path was compared with the intended path used to formulate the cues. In the case of the video-based cues, scaling wasn’t easily interpreted and the results of the analysis were skewed. The authors corrected for this using an iterative closest point algorithm to analyze the shape of a user’s movement without taking scale into account. Even after correction, the follow spot still resulted in 85% more accurate movements than the video screen.
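The authors' correction uses an iterative closest point algorithm; a full ICP is more involved, but the core idea - scoring the shape of a performed path against the intended path after discarding translation and scale - can be sketched as follows. This is an illustrative simplification, not the paper's method: it assumes the two paths have equal length and point-to-point correspondence, which ICP does not require.

```python
import numpy as np

def normalize_path(path):
    """Remove translation and scale: center the path, divide by its RMS size."""
    p = np.asarray(path, dtype=float)
    p = p - p.mean(axis=0)                    # remove translation
    scale = np.sqrt((p ** 2).sum() / len(p))  # RMS distance from centroid
    return p / scale                          # remove scale

def path_error(performed, intended):
    """Mean point-to-point distance between two normalized equal-length paths."""
    a, b = normalize_path(performed), normalize_path(intended)
    return np.linalg.norm(a - b, axis=1).mean()

# A path traced at half size scores identically to the original after
# normalization - the property the authors needed for the video-based cues.
intended = [(0, 0), (1, 0), (2, 0), (2, 1)]
half_size = [(x / 2, y / 2) for x, y in intended]
```

Here `path_error(half_size, intended)` comes out as zero, whereas a raw point-to-point comparison would have penalized the smaller-scale movement.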

The qualitative study was based on user interviews following the completion of their trials. Users stated that they found the follow spot cue easily understandable and close to second nature, but felt that it didn’t provide enough information. The 3D pathlet was viewed favorably because of its feedforward - users liked knowing what was ahead. The majority of users stated that they preferred the 3D self-guided arrow over all other visualization cues. Users liked setting their own pace, and arrows serve as a familiar directional cue. Note that the 3D arrow cue was the only self-explanatory approach devised by the authors.

I found the evaluation performed in this study to be excellent. The authors addressed not only the objective accuracy of their system, but also took user experience into account - which is ultimately more important in a system aimed at guiding, and therefore teaching, a user.

Discussion
I enjoyed this article but found that it only scratched the surface of what’s possible with this type of technology. The authors limited their study to hand translation, ignoring other movements, such as rotation, as well as other body parts. While the authors succeeded in demonstrating the potential power of projector-based gesture cues, LightGuide ultimately raised more questions than it answered.

Sunday, September 2, 2012

Paper Reading #2: MirageTable: Freehand Interaction on a Projected Augmented Reality Tabletop

Introduction
MirageTable: Freehand Interaction on a Projected Augmented Reality Tabletop was coauthored by Hrvoje Benko, a researcher at Microsoft Research's Natural Interaction Research group and a Columbia University graduate; Ricardo Jota, a post-doctoral fellow at the University of Toronto working under Daniel Wigdor; and Andrew D. Wilson, a principal researcher at Microsoft Research and an MIT graduate.

Summary
MirageTable is an interactive system designed "to merge real and virtual worlds into a single spatially registered experience on top of a table." Using a depth camera (Kinect), a stereo projector, a stereo sync emitter, a curved screen, and shutter glasses, MirageTable is able to provide a seamless 3D AR experience that not only allows users to interact with virtual objects, but also allows them to scan real objects into the virtual world. This is all done without gloves, wearable trackers, or other cumbersome gear. To demonstrate the capabilities and limitations of their system, the authors devised "three application examples: virtual 3D model creation, interactive gaming with real and virtual objects, and a 3D teleconferencing experience that not only presents a 3D view of the remote person, but also a seamless 3D shared task space." MirageTable's ability to provide a correct 3D perspective view, to acquire and replay mesh data in real time, and to perform high-fidelity physical interactions with virtual objects based on both virtual and real geometry - such as the user's hand, a virtual block, or a real book - combine to provide a high-quality, unique experience.

MirageTable projects a correct 3D perspective using head tracking (done by the Kinect) and stereoscopic projective texturing. Projective texturing allows 3D virtual objects to be correctly placed in the scene alongside real objects by accounting and correcting for occlusions. This is done by rendering the scene from the perspective of each of the user’s eyes, taking into account real geometry as well as virtual content, and projecting these renderings onto captured real-world geometry. Then, a second rendering is performed for each eye from the perspective of the projector. The result is a 3D virtual image that appears correct from the eye of the user.
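The core of the per-eye rendering is ordinary pinhole projection done once for each eye position supplied by head tracking. As a toy sketch (my own simplification, not the authors' renderer; the 65 mm interpupillary distance and the function names are my assumptions):

```python
def project(point, eye, f=1.0):
    """Pinhole projection of a 3D point onto an image plane at distance f
    in front of an eye located at `eye`, looking down the +z axis."""
    x, y, z = (point[i] - eye[i] for i in range(3))
    return (f * x / z, f * y / z)

def stereo_pair(point, head, ipd=0.065):
    """Render the same point once per eye; the horizontal disparity between
    the two projections is what conveys depth to the viewer."""
    half = ipd / 2.0
    left = project(point, (head[0] - half, head[1], head[2]))
    right = project(point, (head[0] + half, head[1], head[2]))
    return left, right
```

Points nearer the viewer produce larger disparity between the two eye images, which is why head tracking matters: both eye positions, and therefore both renderings, must follow the user.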

Acquisition and replay of mesh data in real time were achieved using a depth camera as a continuous 3D digitizer and a custom vertex shader run on the GPU. The authors did not restrict digitization to anything specific, but instead captured anything that occupies physical space. While real-time replay of the mesh constructed by the depth camera was not challenging thanks to modern GPU technology, mesh acquisition was limited to the parts of a real-world object visible to the camera. In other words, MirageTable makes a great 3D mirror, but is limited in its ability to fully digitize a physical model without rotating the object, using mirrors, or expanding the system to multiple cameras.
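The digitization step boils down to back-projecting each depth pixel into a 3D point using the camera's intrinsics. A minimal sketch of that idea (my own, using a plain pinhole model with hypothetical parameter names; the real system builds a triangle mesh on the GPU rather than a point list):

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (2D list of distances in metres;
    0 means no reading) into camera-space 3D points using pinhole
    intrinsics: focal lengths (fx, fy) and principal point (cx, cy)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points
```

This also makes the self-occlusion limitation obvious: any surface the camera cannot see produces no depth readings, and therefore no geometry.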

MirageTable aimed to minimize the differences between real and virtual objects within the system by simulating realistic physical interactions that extend from the real world to the virtual space. This is where MirageTable falls short due to technological limitations. Depth cameras aren’t able to accurately infer grasping forces, real-time deformable geometry is too computationally complex, and the limitations of mesh acquisition extend to this capability as well. The authors did, however, approximate the physics of captured geometry using proxy particles and Nvidia’s PhysX game engine.

Related Works
The authors freely admit that no singular component of their system is particularly novel, but that the novelty of their experiment lies in the successful blending of components to create a high-quality system unlike any other ever implemented. Their contributions consist of their "system design and implementation, three prototype applications, and a user study on 3D perception and image quality in [their] system." The authors were primarily influenced by two projects conducted prior to their experiment: Office of the Future and LightSpace. In addition to these projects and the works cited in their paper, the following uncited related works help place MirageTable in the context of other research into augmented reality and related topics:
  1. A Survey of Augmented Reality by Ronald T. Azuma
  2. Recent Advances in Augmented Reality by Ronald Azuma, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, & Blair MacIntyre
  3. Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System by Hirokazu Kato & Mark Billinghurst
  4. Efficient Model-based 3D Tracking of Hand Articulations using Kinect by Iason Oikonomidis, Nikolaos Kyriazis & Antonis A. Argyros
  5. Human Detection Using Depth Information by Kinect by Lu Xia, Chia-Chih Chen & J. K. Aggarwal
  6. The Studierstube Augmented Reality Project by Dieter Schmalstieg, Anton Fuhrmann, Gerd Hesina, Zsolt Szalavári, L. Miguel Encarnação, Michael Gervautz & Werner Purgathofer
  7. Collaborative Augmented Reality by Mark Billinghurst & Hirokazu Kato
  8. CounterIntelligence: Augmented Reality Kitchen by Leonardo Bonanni, Chia-Hsun Lee & Ted Selker
  9. Perceptual Issues in Augmented Reality by David Drascic & Paul Milgram
  10. MIND-WARPING: Towards Creating a Compelling Collaborative Augmented Reality Game by Thad Starner, Bastian Leibe, Brad Singletary & Jarrell Pair 
A Survey of Augmented Reality is an older study conducted in 1997 that covers a wide variety of AR applications explored at the time of writing, but most relevant is the paper's discussion of the future of AR. The paper states that the two most pressing problems in the field of AR are registration and sensing, and that future approaches to addressing these issues will incorporate perceptual studies and real-time computing. MirageTable certainly fits this description and effectively meets the challenges of registration and sensing in a novel way.

Recent Advances in Augmented Reality is a follow-up study conducted in 2001 to update A Survey of Augmented Reality. This study greatly enhances the first by addressing the rapid technological advancements that occurred just prior to the 21st century. In doing so, the authors conclude that the problems facing the future of AR are technological limitations, user interface limitations, and social acceptance. The advancements from 1997 to 2001 did wonders to improve AR, and the authors recognize that future advancements will likely do the same - and they obviously have in the case of MirageTable.

Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System is a study conducted in 1999 that addresses the same issue as MirageTable's teleconferencing application in a similar way. MirageTable leverages many advancements made since 1999 to achieve a much more robust solution without the use of cumbersome headgear - note that the shutter glasses used in MirageTable's system are only necessary if the user desires a 3D experience. Efficient Model-based 3D Tracking of Hand Articulations using Kinect is a study that utilizes the capabilities of Microsoft's Kinect to conduct marker-less hand articulation tracking - a more complex application of Kinect's capabilities than MirageTable's shutter glasses tracking. Human Detection Using Depth Information by Kinect explores methods of conducting human tracking using Kinect hardware. Both of these studies expand on the technical and computational challenges associated with real-time depth tracking - a challenge central to MirageTable's functionality that isn't covered in detail in the original paper.

The Studierstube Augmented Reality Project was conducted in 2002 with the goal of creating a 3D user interface metaphor for augmented reality as powerful as 2D’s ubiquitous desktop metaphor. This project, much like MirageTable, uses projection and 3D elements to approach the task of providing a real world solution to collaborative AR computing systems and was influenced by the Office of the Future project at UNC. Collaborative Augmented Reality, another paper by Billinghurst & Kato, was also conducted in 2002 with the same basic goal as MirageTable: to blend reality and virtuality to allow users to see each other alongside virtual objects. Unlike related works previously discussed, these works run parallel to MirageTable and could be seen as an alternative approach to the same basic challenge. These papers provide a good context for what MirageTable might have been were it developed ten years ago and demonstrate the field of AR’s ongoing interest in developing collaborative 3D systems.

CounterIntelligence: Augmented Reality Kitchen is an AR system based on projection of information onto novel surfaces, much like MirageTable uses curved surfaces to project a seamless artificial surface extending from a real desk. While this paper's testing scenario and implementation are not directly useful in relation to MirageTable, it does serve to highlight the significant contribution that is MirageTable's user study, as well as other potential applications and extensions of MirageTable's projection capabilities. Another paper that emphasizes the importance of testing user perception in AR applications is Perceptual Issues in Augmented Reality, written in 1996, which explores the challenges of displaying AR graphics in relation to depth perception and stereoscopic imaging. Both of these issues are specifically addressed in MirageTable's user study of 3D perception and its effect on image quality and, subsequently, user preference.

Finally, we look at MIND-WARPING: Towards Creating a Compelling Collaborative Augmented Reality Game, another paper that addresses one of the challenges taken on by MirageTable - AR gaming. MIND-WARPING addresses many of the same systematic problems faced by MirageTable: user perception, novel input collection, and collaborative functionality. This paper tackles these issues well, but not as successfully as MirageTable's implementation. This serves to support the MirageTable authors' assertion that their overall system is novel due to its quality and its successful incorporation of tested ideas into a new implementation.

Evaluation
Now that MirageTable has been placed in context with previous related work, it is appropriate to address the methods used to evaluate its success. The authors developed three prototype applications to test the overall system as a whole and to assess MirageTable's real-world viability. In addition, a detailed user study of projective texturing quality and perception was conducted to assess the success of their novel approach to projectors and stereoscopic imaging.

Virtual 3D model creation was a viable application that allowed real-world objects to be scanned, copied, and manipulated in the virtual space. Testing done with architects demonstrated that MirageTable was able to construct complex virtual models despite the mesh acquisition and interaction limitations previously discussed. Their gaming application was similarly successful. 3D shape approximation was utilized to overcome some of the system's limitations and allow for passable gameplay, but not without flaws and only with approximately symmetrical objects. The teleconferencing application was the most successful of the three. The curvature of the screen created a seamless environment which, coupled with 3D imaging and a shared virtual workspace, provided a truly unique experience at or near consumer product quality. All testing done using the prototype applications was informal and only served to demonstrate the capabilities of their system as a whole; the only quantitative measures taken concerned the accuracy, quality, and perception of MirageTable's 3D projections.

Two experiments were performed to assess MirageTable's projection capabilities: an evaluation of image quality degradation and an evaluation of users' depth perception. First, the authors assessed how projected 3D images were impacted by a variety of irregular surfaces using root-mean-square (RMS) error measurements. They found that geometric distortions led to relatively small differences in projection, indicating the success of their texturing technique. Color played a much larger role than geometry in image distortion. The second experiment was conducted to assess users' perceptions of the results found in experiment one.
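An RMS error measurement of this kind reduces the difference between a reference image and the observed projection to a single number. As a simple illustration (my own sketch, operating on grayscale intensity grids; the paper does not specify its exact pipeline):

```python
import math

def rms_error(reference, observed):
    """Root-mean-square difference between two equal-sized grayscale
    images, given as 2D lists of intensities, yielding one distortion
    score: 0 means identical, larger means more degraded."""
    n = 0
    total = 0.0
    for ref_row, obs_row in zip(reference, observed):
        for r, o in zip(ref_row, obs_row):
            total += (r - o) ** 2
            n += 1
    return math.sqrt(total / n)
```

Comparing scores across different background surfaces is then straightforward: project the same test image onto each surface, capture it, and rank the surfaces by their RMS error against the reference.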

Users viewed virtual balls on a variety of backgrounds and textures and made judgment ratings of the depth of each ball using a tape attached to the table. Overall, six conditions and four depths were analyzed, with four repetitions per participant. The results showed that color played a negligible role in affecting users' perception and that the base case resulted in accurate depth assessments as well. The only two scenarios in which user perception was seriously impacted were the drop (surface at two different levels) and wave (surface riddled with peaks and valleys) conditions. While 3D perception is possible on geometrically distorted backgrounds, it can be less accurate depending on the geometry.

Discussion
I really enjoyed this paper. MirageTable is a very innovative system that combines the best of many other attempts at creating 3D AR. The individual components of the system lack novelty, but the system as a whole is quite unique. This comes across in their evaluation, which is almost entirely limited to informal systematic testing. Many of the individual components of MirageTable have already been devised and tested, and since the goal is a system viable for the real world, this approach is valid. The exception is their projective texturing technique, which is a novel approach to projector-based AR, and they do test this contribution in detail. MirageTable's only failings stem from technological limitations - I would be interested to see what a new implementation of MirageTable would look like in years to come.