Monday, March 10, 2008

Application of Fitts Law to Eye Gaze Interaction Interfaces (Miniotas, 2000)

Fitts's law (often cited as Fitts' law) is a model of human movement which predicts the time required to rapidly move to a target area, as a function of the distance to the target and the size of the target. Paul M. Fitts (1912 – 1965) was a psychologist at Ohio State University (later at the University of Michigan). He developed a model of human movement, Fitts's law, based on rapid, aimed movement, which went on to become one of the most highly successful and well studied mathematical models of human motion. Fitts's law is used to model the act of pointing, both in the real world (e.g., with a hand or finger) and on computers (e.g., with a mouse) (Source: Wikipedia, 2008-03-11)
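For reference, the Shannon formulation of Fitts's law commonly used in HCI (MacKenzie's variant) predicts the movement time MT from the distance D to the target and the target width W:

MT = a + b · log2(D / W + 1)

where a and b are empirically fitted constants and the logarithmic term is the index of difficulty (ID) in bits. Fitting a and b separately for each input technique is what lets a study like the one below compare a mouse and an eye tracker within the same model.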


I became interested when I found the paper "Application of Fitts Law to Eye Gaze Interaction Interfaces" by Darius Miniotas (2000) at Siauliai University, Lithuania. The study includes only six participants, and the task consists of keeping a fixation within a 26 mm x 26 mm box continuously for 250 ms. Knowing the noise and jitter present in all eye trackers (the one I'm using is state of the art in 2008), the task might not be the best one for illustrating Fitts's law with eye trackers. Additionally, presenting a visual indicator of gaze position may be distracting because of the offsets often present in eye tracking algorithms (the indicator ends up somewhat off and moving around).

Abstract
An experiment is described comparing the performance of an eye tracker and a mouse in a simple pointing task. Subjects had to make rapid and accurate horizontal movements to targets that were vertical ribbons located at various distances from the cursor's starting position. The dwell-time protocol was used for the eye tracker to make selections. Movement times were shorter for the mouse than for the eye tracker. Fitts' Law model was shown to predict movement times using both interaction techniques equally well. The model is thus seen to be a potential contributor to design of modern multimodal human computer interfaces. (ACM Paper)

Inspiration: Professor Andrew Duchowski

Andrew Duchowski is one of the leading authorities on eye tracking and gaze interaction, working out of the College of Engineering and Science at Clemson University (South Carolina, U.S.). Andrew is involved in organizing the annual Eye Tracking Research and Applications (ETRA) symposium.

Research interests
  • Visual perception and human-computer interaction.
  • Computer graphics, eye tracking, virtual environments.
  • Computer vision and digital imaging.
  • Wavelet and multi-resolution analysis.
Andrew has published the book Eye Tracking Methodology: Theory and Practice, one of the very few titles within the field that is especially oriented towards practical research and technical development. It is well worth reading if you're aiming at developing gaze-driven applications (it deals with algorithms, experimental setup, etc.)

Eye Tracking Methodology: Theory and Practice, 2nd ed.
Duchowski, A. T. (2007), Springer-Verlag, London, UK.
ISBN: 978-1-84628-808-7


Resume, talks and publications

Inspiration: Professor Rob Jacob (Mr Midas Touch)


Rob Jacob, currently at Tufts University, took an early interest in gaze-based interaction. Jacob has a long list of honors, publications and presentations in the HCI field. Having coined the "Midas Touch" analogy and formulated many other fundamental aspects of gaze interaction, he clearly deserves an introduction.


From his homepage
"Robert Jacob is a Professor of Computer Science at Tufts University, where his research interests are new interaction media and techniques and user interface software. He is currently also a visiting professor at the Universite Paris-Sud, and he was a visiting professor at the MIT Media Laboratory, in the Tangible Media Group, and continues collaboration with that group. Before coming to Tufts, he was in the Human-Computer Interaction Lab at the Naval Research Laboratory. He received his Ph.D. from Johns Hopkins University, and he is a member of the editorial board of Human-Computer Interaction and the ACM Transactions on Computer-Human Interaction. He was Papers Co-Chair of the CHI 2001 conference, Co-Chair of UIST 2007, and Vice-President of ACM SIGCHI. He was elected to the ACM CHI Academy in 2007, an honorary group of people who have made extensive contributions to the study of HCI and have led the shaping of the field."

Research topics:
Human-Computer Interaction
New Interaction Techniques and Media
Tangible User Interfaces
Virtual Environments
User Interface Software
Information Visualization

Must-read:

R.J.K. Jacob, “The Use of Eye Movements in Human-Computer Interaction Techniques: What You Look At is What You Get,” ACM Transactions on Information Systems, vol. 9, no. 3, pp. 152-169 (April 1991) [link]

L.E. Sibert and R.J.K. Jacob, “Evaluation of Eye Gaze Interaction,” Proc. ACM CHI 2000 Human Factors in Computing Systems Conference, pp. 281-288, Addison-Wesley/ACM Press (2000). [link]

Inspiration: Dwell-Based Pointing in Applications (Muller-Tomfelde, 2007)

While researching the optimal default value for dwell-time activation I stumbled upon this paper by Christian Muller-Tomfelde at the CSIRO ICT Centre, Australia. It does not concern dwell time in the context of gaze-based interaction, but instead focuses on how we dwell while pointing towards objects to convey a reference to a communication partner. How long can the confirming feedback be withheld before the interaction becomes unnatural?

Abstract
"This paper describes exploratory studies and a formal experiment that investigate a particular temporal aspect of human pointing actions. Humans can express their intentions and refer to an external entity by pointing at distant objects with their fingers or a tool. The focus of this research is on the dwell time, the time span that people remain nearly motionless during pointing at objects. We address two questions: Is there a common or natural dwell time in human pointing actions? What implications does this have for Human Computer Interaction? Especially in virtual environments, feedback about the referred object is usually provided to the user to confirm actions such as object selection. A literature review and two studies led to a formal experiment in a hand-immersive virtual environment in search for an appropriate feedback delay time for dwell-based pointing actions. The results and implications for applications for Human Computer Interaction are discussed. "

I find the part about the visual feedback experiment interesting.

"We want to test whether a variation of the delay of an explicit visual feedback for a pointing action has an effect of the perception of the interaction process. First, feedback delay time above approximately 430 ms is experienced by users to happen late. Second, for a feedback delay time above approximately 430 ms users experience waiting for feedback to happen and third, feedback delay below 430 ms is considered by users to be natural as in real life conversations. "

Questions asked:
  • 1: Do you have the impression that the system feedback happened in a reasonable time according to your action? Answer: confirmation occurred too fast (1), too late (7).
  • 2: Did you have the feeling to wait for the feedback to happen? Answer: no, I didn't have to wait (1), yes, I waited (7).
  • 3: Did you have the impression that the time delay for the feedback was natural (i.e., as in a real-life communication situation)? Answer: time delay is not natural (1), quite natural (7).

"This allows us to recommend a feedback delay time for manual pointing actions of approximately 350 to 600 ms as a starting point for the development of interactive applications. We have shown that this feedback delay is experienced by users as natural and convenient and that the majority of observers of pointing actions gave feedback within a similar time span."

Friday, March 7, 2008

Technology: Consumer-grade EEG devices

Not exactly eye tracking, but interesting as a complementary modality, are the upcoming consumer-grade electroencephalography (EEG) devices, sometimes referred to as a "brain-mouse". The devices detect brain activity via electrodes placed on the scalp.

The company OCZ Technology, mainly known for its computer components such as memory and power supplies, has announced the "Neural Impulse Actuator" (NIA). While the technology itself is nothing new, the novelty lies in the accessibility of the device, priced somewhere around $200-250 when it is introduced next week.

Check out the quick mini-demo by the guys at AnandTech from the CeBIT exhibition in Hannover, 2008.
This technical presentation (in German) goes into a bit more detail.


From the press release:
"Recently entering mass production, the final edition of the Neural Impulse Actuator (NIA) will be on display for users to try out the new and electrifying way of playing PC games. The final version of the NIA uses a sleek, metal housing, a USB 2.0 interface, a streamlined headband with carbon interface sensors, and user-friendly software. The NIA is the first commercially available product of its kind, and gamers around the world now have access to this forward-thinking technology that’s had the industry buzzing since its inception."




These devices do have the potential to take hands-free computing to the next level. They could be a feasible candidate for solving the Midas touch problem by providing a device that enables the user to perform activations through subtle facial gestures, etc. I have yet to discover how sensitive the device is and what its latency is. Additionally, does it come with an API?

I've tried research-grade EEG devices as a means of interaction while at the University of California, San Diego, and pure thoughts of actions are hard to detect in a stable manner. It is well known in the neuroscience community that thinking of an action activates the same regions of the brain as actually performing it. We even have mirror neurons that are activated by observing other people performing goal-directed actions (picking up that banana). The neural activation from thought alone is subtle and hard to detect compared to actual movements, such as lifting one's arm. So, I do not expect it to be an actual Brain-Computer Interface (BCI) capable of detecting thoughts (i.e., thinking of kicking, etc.) but rather a detector of subtle motions of my forehead muscles (eye and eyebrow movements, facial expressions, etc.)

The firm Emotiv has its own version of a consumer-grade EEG, named the EPOC neuroheadset. It has been around a little longer and seems to have more developed software, but still mainly demonstration applications.


The Emotiv EPOC neuroheadset

Inspiration: All Eyes on the Monitor (Mollenbach et al, 2008)

Going further with the Zooming User Interface (ZUI) is the prototype described in "All Eyes on the Monitor: Gaze Based Interaction in Zoomable, Multi-Scaled Information-Space" (E. Mollenbach, T. Stefansson, J-P Hansen), developed at Loughborough University in the U.K. and the ITU INC, Denmark. It employs a gaze-based pan/zoom interaction style, which suits gaze interaction because it resolves the inaccuracy problem (target sizes increase as you zoom in on them). Additionally, the results indicate that for certain tasks gaze-based interaction is faster than traditional mouse operation.
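The pan/zoom principle itself is easy to express in code. The sketch below is my own simplification, not taken from the paper: the view continuously zooms in and drifts toward the point the user is looking at, so whatever is under the gaze grows until it is comfortably large to select.

```python
class GazePanZoomView:
    """Toy model of gaze-driven pan/zoom navigation: the scale increases over
    time while the view center drifts toward the world point under the gaze,
    so the looked-at region grows and moves toward the middle of the screen."""

    def __init__(self, zoom_per_second=1.5, pan_gain=2.0):
        self.cx, self.cy = 0.0, 0.0       # view center in world coordinates
        self.scale = 1.0                  # pixels per world unit
        self.zoom_per_second = zoom_per_second
        self.pan_gain = pan_gain

    def screen_to_world(self, sx, sy, screen_w, screen_h):
        return (self.cx + (sx - screen_w / 2) / self.scale,
                self.cy + (sy - screen_h / 2) / self.scale)

    def update(self, gaze_x, gaze_y, screen_w, screen_h, dt):
        # World point currently under the gaze, before zooming.
        wx, wy = self.screen_to_world(gaze_x, gaze_y, screen_w, screen_h)
        # Zoom in a little and drift the view center toward that point.
        self.scale *= self.zoom_per_second ** dt
        step = min(1.0, self.pan_gain * dt)
        self.cx += (wx - self.cx) * step
        self.cy += (wy - self.cy) * step
```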



Abstract
The experiment described in this paper, shows a test environment constructed with two information spaces; one large with 2000 nodes ordered in semi-structured groups in which participants performed search and browse tasks; the other was smaller and designed for precision zooming, where subjects performed target selection simulation tasks. For both tasks, modes of gaze- and mouse-controlled navigation were compared. The results of the browse and search tasks showed that the performances of the most efficient mouse and gaze implementations were indistinguishable. However, in the target selection simulation tasks the most efficient gaze control proved to be about 16% faster than the most efficient mouse-control. The results indicate that gaze-controlled pan/zoom navigation is a viable alternative to mouse control in inspection and target exploration of large, multi-scale environments. However, supplementing mouse control with gaze navigation also holds interesting potential for interface and interaction design. Download paper (pdf)

The paper was presented at the annual International Conference on Intelligent User Interfaces (IUI), held in Maspalomas, Gran Canaria, 13-16 January 2008.

Monday, March 3, 2008

Zooming and Expanding Interfaces / Custom components

The reviewed papers on zooming interaction styles inspired me to develop a set of zoom-based interface components. This interaction style suits gaze input because it helps overcome the inaccuracy and jitter of eye movements. My intention is that the interface components should be completely standalone, customizable and straightforward to use: ideally included in new projects by importing one file and writing one line of code.

The first component is a dwell-based menu button that, on fixation, will a) provide a dwell-time indicator by animating a small glow effect surrounding the button image and b) after 200 ms, expand an ellipse that houses the menu options. This produces a two-step dwell activation while making use of the display area in a much more dynamic way. The animation is put in place to keep the user's fixation on the button for the duration of the dwell time. The items in the menu are displayed when the ellipse has reached its full size.

This gives the user feedback in the parafoveal region while the glow of the button icon stops, indicating a completed dwell-time execution (a bit hard to describe in words, perhaps easier to understand from the images below). The parafoveal region of our visual field is located just outside the foveal region (where full-resolution vision takes place). The foveal area is about the size of a thumbnail at arm's length; items in the parafoveal region can still be seen, but with reduced resolution/sharpness. We do see them, but have to make a short saccade to bring them into full resolution. In other words, the menu items pop out at a distance that attracts a short saccade, which is easily discriminated by the eye tracker. (Just4fun: test your visual field)
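To put a number on "thumbnail at arm's length": the fovea covers roughly 1-2 degrees of visual angle, and converting visual angle to on-screen pixels only requires the viewing distance and the screen's physical size and resolution. A small helper, with illustrative assumptions about a typical desktop setup (60 cm viewing distance, a monitor about 345 mm wide at 1280 horizontal pixels):

```python
import math

def visual_angle_to_pixels(angle_deg, viewing_distance_mm=600.0,
                           screen_width_mm=345.0, screen_width_px=1280):
    """On-screen size (in pixels) subtended by a given visual angle.
    The default setup is an illustrative assumption, not a measured configuration."""
    size_mm = 2.0 * viewing_distance_mm * math.tan(math.radians(angle_deg) / 2.0)
    return size_mm * screen_width_px / screen_width_mm

print(round(visual_angle_to_pixels(2.0)))  # ~78 px for the foveal ~2 degrees on this setup
```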

Before the button has received focus


Upon fixation the button image displays an animated glow effect indicating the dwell process. The image above illustrates how the menu items pop out on the ellipse surface at the end of the dwell. Note that the ellipse grows in size over a 300 ms period; the exact timing is configurable by passing a parameter in the XAML design page. A sketch of the two-stage dwell logic follows below.
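The two-stage behaviour (glow while dwelling, then grow the ellipse and reveal the items) can be captured in a small state machine. This is a framework-agnostic sketch of the logic described above, not the actual WPF/XAML component; only the 200 ms and 300 ms timings come from the text.

```python
import time

# States of the expanding menu button.
IDLE, GLOWING, EXPANDING, MENU_OPEN = "idle", "glowing", "expanding", "menu_open"

class ExpandingMenuButton:
    """Two-stage dwell activation: an animated glow while the gaze rests on the
    button, then an ellipse that grows for a short period before the menu items
    are shown at its full size. Timings are configurable, mirroring the XAML
    parameter mentioned above."""

    def __init__(self, dwell_ms=200, grow_ms=300):
        self.dwell_ms = dwell_ms
        self.grow_ms = grow_ms
        self.state = IDLE
        self._entered_ms = None  # time at which the current state was entered

    def update(self, gaze_on_button, now_ms=None):
        now_ms = time.monotonic() * 1000 if now_ms is None else now_ms
        if not gaze_on_button and self.state in (IDLE, GLOWING):
            self.state, self._entered_ms = IDLE, None          # gaze left before activation
        elif self.state == IDLE and gaze_on_button:
            self.state, self._entered_ms = GLOWING, now_ms     # start the glow animation
        elif self.state == GLOWING and now_ms - self._entered_ms >= self.dwell_ms:
            self.state, self._entered_ms = EXPANDING, now_ms   # dwell complete: grow the ellipse
        elif self.state == EXPANDING and now_ms - self._entered_ms >= self.grow_ms:
            self.state = MENU_OPEN                             # ellipse at full size: show items
        return self.state
```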

The second prototype I have been working on is also inspired by the use of expanding surfaces. It is a gaze-driven photo gallery where thumbnail-sized image previews become enlarged upon glancing at them. The enlarged view displays an icon which can be fixated to make the photo appear in full size.

Displaying all the images in the user's "My Pictures" folder.


Second step, glancing at the photos. Dynamically resized. Optionally further enlarged.

Upon glancing at the thumbnails they become enlarged, which activates the icon at the bottom of each photo. This enables the user to make a second fixation on the icon to bring the photo into a large view. This view has two icons to navigate back and forth (previous/next photo). By fixating outside the photo, the view returns to the overview.
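The glance/fixate logic of the gallery can be summarized in the same spirit. Again this is a simplified sketch of the behaviour described above, not the actual component, and the data layout (a list of thumbnail records with a screen rectangle and a file path) is an assumption for illustration.

```python
class GazePhotoGallery:
    """Thumbnails enlarge while glanced at; a second fixation on the enlarged
    photo's icon opens it at full size; a fixation outside the photo returns
    to the thumbnail overview."""

    def __init__(self, thumbnails, enlarge_factor=2.0):
        self.thumbnails = thumbnails      # e.g. [{"rect": (x, y, w, h), "path": "img.jpg"}, ...]
        self.enlarge_factor = enlarge_factor
        self.fullscreen_path = None       # path of the photo currently shown at full size

    @staticmethod
    def _hit(rect, x, y):
        rx, ry, rw, rh = rect
        return rx <= x <= rx + rw and ry <= y <= ry + rh

    def on_gaze(self, x, y):
        """Enlarge whichever thumbnail the gaze lands on and shrink the rest."""
        for thumb in self.thumbnails:
            thumb["scale"] = self.enlarge_factor if self._hit(thumb["rect"], x, y) else 1.0

    def on_icon_fixation(self, thumb):
        """Fixation on the enlarged thumbnail's icon: open the photo full size."""
        self.fullscreen_path = thumb["path"]

    def on_fixation_outside_photo(self):
        """Fixating outside the full-size photo returns to the overview."""
        self.fullscreen_path = None
```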

Saturday, February 23, 2008

Talk: Sensing user attention (R. Vertegaal)

Stumbled upon a Google Tech Talk by Roel Vertegaal describing various projects at the Queen's University Human Media Lab, many of which use eye tracking technology. In general, the work applies knowledge from cognitive science on attention and communication to practical human-computer interaction applications. Overall a nice 40-minute talk. Enjoy.

Abstract
Over the past few years, our work has centered around the development of computing technologies that are sensitive to what is perhaps the most important contextual cue for interacting with humans that exists: the fabric of their attention. Our research group has studied how humans communicate attention to navigate complex scenarios, such as group decision making. In the process, we developed many different prototypes of user interfaces that sense the users' attention, so as to be respectful players that share this most important resource with others. One of the most immediate methods for sensing human attention is to detect what object the eyes look at. The eye contact sensors our company has developed for this purpose work at long range, with great head movement tolerance, and many eyes. They do not require any personal calibration or coordinate system to function. Today I will announce Xuuk's first product, EyeBox2, a viewing statistics sensor that works at up to 10 meters. EyeBox2 allows the deployment of algorithms similar to Google's PageRank in the real world, where anything can now be ranked according to the attention it receives. This allows us, for example, to track mass consumer interest in products or ambient product advertisements. I will also illustrate how EyeBox2 ties into our laboratory's research on interactive technologies, showing prototypes of attention sensitive telephones, attentive video blogging glasses, speech recognition appliances as well as the world's first attentive hearing aid.


Roel Vertegaal is the director of the Human Media Lab at Queen's University in Kingston, Canada. Roel is the founder of Xuuk, which offers the EyeBox2, a remote eye tracker that works at distances of up to 10 meters (currently $1500), and associated analysis software.

Inspiration: EyeWindows (Fono et al, 2005)

Continuing the zooming style of interaction that has become common within the field of gaze interaction is "EyeWindows: Evaluation of Eye-Controlled Zooming Windows for Focus Selection" (Fono & Vertegaal, 2005). Their paper describes two prototypes: a media browser with dynamic (elastic) allocation of screen real estate, and a second prototype that dynamically resizes desktop windows upon gaze fixation. Overall, great examples presented in a clear, well-structured paper, with an interesting evaluation of selection techniques.



Abstract
In this paper, we present an attentive windowing technique that uses eye tracking, rather than manual pointing, for focus window selection. We evaluated the performance of 4 focus selection techniques: eye tracking with key activation, eye tracking with automatic activation, mouse and hotkeys in a typing task with many open windows. We also evaluated a zooming windowing technique designed specifically for eye-based control, comparing its performance to that of a standard tiled windowing environment. Results indicated that eye tracking with automatic activation was, on average, about twice as fast as mouse and hotkeys. Eye tracking with key activation was about 72% faster than manual conditions, and preferred by most participants. We believe eye input performed well because it allows manual input to be provided in parallel to focus selection tasks. Results also suggested that zooming windows outperform static tiled windows by about 30%. Furthermore, this performance gain scaled with the number of windows used. We conclude that eye-controlled zooming windows with key activation provides an efficient and effective alternative to current focus window selection techniques. Download paper (pdf).

David Fono, Roel Vertegaal and Conner Dickie are researchers at the Human Media Lab at Queen's University in Kingston, Canada.

Friday, February 22, 2008

Inspiration: Fisheye Lens (Ashmore et al. 2005)

In the paper "Efficient Eye Pointing with a Fisheye Lens" (Ashmore et al., 2005), a fisheye magnification lens is slaved to the foveal region of the user's gaze. This is another use of the zooming style of interaction, but compared to ZoomNavigator (Skovsgaard, 2008) and EyePoint (Kumar & Winograd, 2007) it is a continuous effect that magnifies whatever the user's gaze lands upon. In other words, it is not meant as a solution for dealing with the low accuracy of eye trackers in typical desktop (Windows) interaction, which makes it suitable for visual inspection tasks such as quality control, medical X-ray examination, satellite imagery, etc. On the downside, the nature of the lens distorts the image, which breaks the original spatial relationships between items on the display (as demonstrated by the images below).

Abstract
"This paper evaluates refinements to existing eye pointing techniques involving a fisheye lens. We use a fisheye lens and a video-based eye tracker to locally magnify the display at the point of the user’s gaze. Our gaze-contingent fisheye facilitates eye pointing and selection of magnified (expanded) targets. Two novel interaction techniques are evaluated for managing the fisheye, both dependent on real-time analysis of the user’s eye movements. Unlike previous attempts at gaze-contingent fisheye control, our key innovation is to hide the fisheye during visual search, and morph the fisheye into view as soon as the user completes a saccadic eye movement and has begun fixating a target. This style of interaction allows the user to maintain an overview of the desktop during search while selectively zooming in on the foveal region of interest during selection. Comparison of these interaction styles with ones where the fisheye is continuously slaved to the user’s gaze (omnipresent) or is not used to affect target expansion (nonexistent) shows performance benefits in terms of speed and accuracy" Download paper (pdf)

The fisheye lens has been implemented commercially in the products of Idelix Software Inc., which has a set of demonstrations available.