Tuesday, September 11, 2012

Paper Reading #6: ClayVision: The (Elastic) Image of the City

Introduction
ClayVision is a paper by Yuichiro Takeuchi and Ken Perlin. Takeuchi is an Associate Researcher at Sony Computer Science Laboratories, Inc.; he earned his PhD from The University of Tokyo and, this past March, a Master's in Design Studies from Harvard. Perlin is a Professor of Computer Science at the NYU Media Research Lab and Director of the Games for Learning Institute.

Summary
ClayVision takes a new approach to augmented-reality-assisted urban navigation, drawing on knowledge from fields outside computer science to break the current paradigm, in which the user is informed by pasting potentially irrelevant and frequently unwanted information on top of reality. ClayVision instead uses computer vision and image processing to create a dynamic, real-time replica of the user's view of the city that can be morphed and adjusted to direct the user and convey information.

ClayVision seeks to take AR from being a gimmick to being a "calm" technology. Current navigation applications rely on information bubbles and overlays that do not so much augment reality as obscure it. A user's attention is limited, and navigating an urban environment is potentially dangerous. ClayVision addresses user safety and attention through Edward Tufte's data-ink ratio, which holds that the effectiveness of a visual communication can be judged by the proportion of ink that conveys information relative to the total ink used in the graphic.
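
Put as a formula (my paraphrase of Tufte, not notation from the paper):

\[
\text{data-ink ratio} = \frac{\text{ink used to present data}}{\text{total ink used to print the graphic}}
\]

Under this reading, ClayVision's morphing of the city arguably pushes the ratio toward one: the "ink" it spends is the scenery the user was already looking at.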

Central to ClayVision's function is computer-vision-based localization, which the authors recognize as an emerging field and an open problem. To sidestep it, they built a database of template pictures, captured with the iPad's camera (the tablet used to prototype ClayVision) at a set of predetermined locations, against which the device's pose is calculated. The authors rationalize this sidestep by asserting that even if ClayVision only works in a limited set of locations, it can still provide insights into the design of the system and of future applications.
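
As a rough illustration of the approach (my own sketch in Python; the fields and names are assumptions, not the authors' data structures), each entry in such a database might pair an on-site photograph with the location and camera pose it was captured from:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Template:
    """One pre-captured view in the localization database (hypothetical).

    ClayVision sidesteps general-purpose localization by matching the
    live camera feed against a handful of entries like this one.
    """
    image: np.ndarray        # photo taken on-site with the iPad's camera
    location_id: str         # which predetermined location this covers
    camera_pose: np.ndarray  # 4x4 pose of the camera when the photo was taken
    keypoints: list          # cv2.KeyPoint objects detected in the photo
    descriptors: np.ndarray  # feature descriptors, precomputed for fast matching
```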

Image processing of the video feed is done with a simplified SIFT-based procedure that outputs a set of feature points and other data used to determine the relative position of the entire frame; this runs in real time on an iPad 2. The output is used to compare the video feed against the database of pictures, and the matching template picture is transformed, based on the iPad camera's specifications, to produce the correct pose. After localization, projection and modelview matrices are calculated to map 3D building models onto the feed. These models are then textured using information from the feed and transformed to communicate information to the user. Texturing is handled by blending template-picture information into the image background in a way that doesn't disrupt the video and allows transformations without excessive visual error.
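
The paper's pipeline is a simplified SIFT variant tuned to run on the iPad, so the sketch below only reconstructs the general idea with off-the-shelf OpenCV (SIFT features, ratio-test matching, RANSAC homography); every name and threshold here is an assumption, not the authors' code. It reuses the hypothetical Template entries from above:

```python
import cv2
import numpy as np

def localize(frame_gray, templates, K):
    """Match a camera frame against the template database and estimate
    the frame's pose relative to the best-matching template.

    K is the 3x3 camera intrinsics matrix, derived from the iPad
    camera's specifications.
    """
    sift = cv2.SIFT_create()
    kp_frame, des_frame = sift.detectAndCompute(frame_gray, None)

    # Pick the template with the most distinctive matches (Lowe's ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    best_template, best_matches = None, []
    for t in templates:
        pairs = matcher.knnMatch(des_frame, t.descriptors, k=2)
        good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
        if len(good) > len(best_matches):
            best_template, best_matches = t, good

    if best_template is None or len(best_matches) < 10:
        return None  # too few matches: localization failed

    # Homography mapping template coordinates onto frame coordinates.
    src = np.float32([best_template.keypoints[m.trainIdx].pt for m in best_matches])
    dst = np.float32([kp_frame[m.queryIdx].pt for m in best_matches])
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Given the intrinsics K, the homography decomposes into candidate
    # rotations and translations relative to the template's known pose;
    # these are what feed the projection and modelview matrices used to
    # register the 3D building models onto the video feed.
    _, rotations, translations, _ = cv2.decomposeHomographyMat(H, K)
    return best_template, H, rotations, translations
```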

While a limitless number of transformations are possible with ClayVision, the authors chose a select few to discuss, based on the information they wished to convey to the user. They found that a building could in principle be emphasized by changing its size, value, texture, color, orientation, shape, and/or position. Shape, orientation, and position were ruled out, owing to humans' poor selective perception of shape and the confusion that moving or reorienting a building could cause. Emphasis was implemented by increasing texture saturation and by changing building heights to emphasize or de-emphasize them; this could be done statically or dynamically, but the motion effects were found to be distracting and posed a potential safety issue. Building usage was expressed by altering facades (making a downtown cafe more distinguishable by giving it a picturesque French exterior). City regions are made distinguishable using post-rendering processes, such as a toon-like effect. Lastly, artificial structures can be erected to provide landmarks for the user.
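
To make the two chosen emphasis channels concrete, here is a toy sketch (mine, not the paper's code) of boosting a building texture's saturation with OpenCV, paired with a model-matrix scale that stretches the building vertically; the gain and scale values are arbitrary assumptions:

```python
import cv2
import numpy as np

def emphasize_building(texture_bgr, saturation_gain=1.5, height_scale=1.3):
    """Two toy emphasis effects in the spirit of ClayVision.

    Returns a saturation-boosted copy of the building's texture, plus a
    4x4 scale matrix a renderer could fold into the building's modelview
    matrix to exaggerate its height. Values below 1.0 de-emphasize instead.
    """
    # Boost saturation in HSV space, clamping to the valid 8-bit range.
    hsv = cv2.cvtColor(texture_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * saturation_gain, 0, 255)
    boosted = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

    # Stretch along the vertical (y) axis only, so the building's
    # footprint and street position stay put.
    scale = np.diag([1.0, height_scale, 1.0, 1.0]).astype(np.float32)
    return boosted, scale
```

Applied statically (a one-time scale) this matches what the authors kept; animating height_scale over time would reproduce the motion effects they found distracting.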

The authors' approach in this paper is based on discussion around their prototype and on possibilities for extending ClayVision in the future, from both a software and a hardware standpoint.

Related Works
  1. Augmented Reality Navigation by Uchechukwuka Monu & Matt Yu
  2. An Image-Based System for Urban Navigation by Duncan Robertson & Roberto Cipolla
  3. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment by S. Feiner, B. MacIntyre, T. Höllerer & A. Webster
  4. A Wearable Computer System with Augmented Reality to Support Terrestrial Navigation by B. Thomas, V. Demczuk, W. Piekarski, D. Hepworth & B. Gunther
  5. Pervasive Information Acquisition for Mobile AR-Navigation Systems by Wolfgang Narzt et al.
  6. AR Navigation System for Neurosurgery by Yuichiro Akatsuka et al.
  7. Visually Augmented Navigation in an Unstructured Environment Using a Delayed State History by Ryan Eustice, Oscar Pizarro & Hanumant Singh
  8. A Vision Augmented Navigation System by Michael Bosse et al.
  9. A Vision Augmented Navigation System for an Autonomous Helicopter by Michael Bosse
  10. A Survey of Augmented Reality by Ronald T. Azuma

Augmented reality as a means of navigation is not a new idea. In 1997, when the field of AR was relatively young, Azuma discussed its future in A Survey of Augmented Reality, in which he catalogs many potential applications for AR, including navigation.

In addition to being an established idea, AR navigation has been implemented in a variety of ways. Augmented Reality Navigation and An Image-Based System for Urban Navigation discuss AR navigation on a mobile phone. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment and A Wearable Computer System with Augmented Reality to Support Terrestrial Navigation explore AR navigation on custom wearable hardware. Pervasive Information Acquisition for Mobile AR-Navigation Systems describes an in-car AR navigation system in great detail. AR Navigation System for Neurosurgery takes AR navigation into the operating room, focusing on microscopic rather than macroscopic navigation. A Vision Augmented Navigation System details an AR navigation system, which is then applied in A Vision Augmented Navigation System for an Autonomous Helicopter.

All of these papers take on the task of using computer-enhanced reality to guide users, but each application is either very similar to the others or addresses a niche problem (surgery). ClayVision doesn't claim to be a unique application; it claims to take a unique approach. The only paper I could find that attempts AR navigation in a novel way was Visually Augmented Navigation in an Unstructured Environment Using a Delayed State History, but even it fails to address the design and human-factors concerns discussed in ClayVision.

Evaluation
Evaluation in this paper is non-existent. The only users mentioned are the authors themselves. The prototype was not held to any standard or measured in any way. There are basic comparisons between ClayVision and similar research, as in any paper, but that is the closest the authors come to evaluating it.

Discussion
I think the premise behind ClayVision is really interesting and a valid topic for research, but I was disappointed that the authors neglected to gather user feedback or test their prototype against existing AR software. I'd be very interested to see them follow up with a more complete analysis in the near future.
