Tuesday, August 10, 2010
Eye control for PTZ cameras in video surveillance
Monday, September 14, 2009
GaZIR: Gaze-based Zooming Interface for Image Retrieval (Kozma L., Klami A., Kaski S., 2009)
Abstract
"We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real-time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to any content-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is sufficient amount of information in the gaze patterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing-and-clicking."
- László Kozma, Arto Klami, and Samuel Kaski: GaZIR: Gaze-based Zooming Interface for Image Retrieval. To appear in Proceedings of the 11th International Conference on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI), Boston, MA, USA, November 2-6, 2009. (abstract, pdf)
Thursday, September 18, 2008
The Inspection of Very Large Images by Eye-gaze Control
"The researchers presented novel methods for navigating and inspecting extremely large images solely or primarily using eye gaze control. The need to inspect large images occurs in, for example, mapping, medicine, astronomy and surveillance, and this project considered the inspection of very large aerial images, held in Google Earth. Comparative search and navigation tasks suggest that, while gaze methods are effective for image navigation, they lag behind more conventional methods, so interaction designers might consider combining these techniques for greatest effect." (BCS Interaction)
Abstract
The increasing availability and accuracy of eye gaze detection equipment has encouraged its use for both investigation and control. In this paper we present novel methods for navigating and inspecting extremely large images solely or primarily using eye gaze control. We investigate the relative advantages and comparative properties of four related methods: Stare-to-Zoom (STZ), in which control of the image position and resolution level is determined solely by the user's gaze position on the screen; Head-to-Zoom (HTZ) and Dual-to-Zoom (DTZ), in which gaze control is augmented by head or mouse actions; and Mouse-to-Zoom (MTZ), using conventional mouse input as an experimental control.
The need to inspect large images occurs in many disciplines, such as mapping, medicine, astronomy and surveillance. Here we consider the inspection of very large aerial images, of which Google Earth is both an example and the one employed in our study. We perform comparative search and navigation tasks with each of the methods described, and record user opinions using the Swedish User-Viewer Presence Questionnaire. We conclude that, while gaze methods are effective for image navigation, they, as yet, lag behind more conventional methods and interaction designers may well consider combining these techniques for greatest effect.
Tuesday, April 15, 2008
Gaze Interaction Demo (Powerwall@Konstanz Uni.)
The demonstration can be viewed at better quality (10 MB)
Also make sure to check out the 360 deg. Globorama display demonstration. It uses a laser pointer rather than eye tracking for interaction, but it is nevertheless a really cool immersive experience, especially the Google Earth zoom-in to 360 deg. panoramas.
Wednesday, March 12, 2008
Eye Gaze Interaction with Expanding Targets (Miniotas, Špakov, MacKenzie, 2004)
Abstract
"Recent evidence on the performance benefits of expanding targets during manual pointing raises a provocative question: Can a similar effect be expected for eye gaze interaction? We present two experiments to examine the benefits of target expansion during an eye-controlled selection task. The second experiment also tested the efficiency of a “grab-and-hold algorithm” to counteract inherent eye jitter. Results confirm the benefits of target expansion both in pointing speed and accuracy. Additionally, the grab-and-hold algorithm affords a dramatic 57% reduction in error rates overall. The reduction is as much as 68% for targets subtending 0.35 degrees of visual angle. However, there is a cost which surfaces as a slight increase in movement time (10%). These findings indicate that target expansion coupled with additional measures to accommodate eye jitter has the potential to make eye gaze a more suitable input modality." (Paper available here)
Their "Grab-and-hold" algorithm that puts some more intelligent processing of the gaze data. "Upon appearance of the target, there is a settle-down period of 200 ms during which the gaze is expected to land in the target area and stay there. Then, the algorithm filters the gaze points until the first sample inside the expanded target area is logged. When this occurs, the target is highlighted and the selection timer triggered. The selection timer counts down a specified dwell time (DT) interval. "
While reading this paper I came to think about an important question concerning the filtering of gaze data. Collecting the samples the algorithm processes introduces a delay in the interaction. For example, if I were to sample 50 gaze positions and then average them to reduce jitter, it would result in a one-second delay on a system that captures 50 images per second (50 Hz). As seen in other papers, there is a speed-accuracy trade-off to make: what is more important, a lower error rate or a more responsive system?
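The trade-off is easy to quantify: waiting for a full window of N samples on a tracker running at f Hz adds roughly N/f seconds of lag (a sliding window cuts the effective lag to about half that, since the average is centered mid-window). A back-of-the-envelope illustration:

```python
def smoothing_latency_s(window_samples, sample_rate_hz):
    """Delay added by waiting for a full averaging window before output."""
    return window_samples / sample_rate_hz

# Averaging 50 samples on a 50 Hz tracker costs a full second:
print(smoothing_latency_s(50, 50))   # 1.0
# A 5-sample window still smooths some jitter but adds only 100 ms:
print(smoothing_latency_s(5, 50))    # 0.1
```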
Friday, March 7, 2008
Inspiration: All Eyes on the Monitor (Mollenbach et al, 2008)
The paper was presented at the annual International Conference on Intelligent User Interfaces (IUI), held in Maspalomas, Gran Canaria, 13-16 January 2008.
Monday, March 3, 2008
Zooming and Expanding Interfaces / Custom components
The first component is a dwell-based menu button that, on fixation, will a) provide a dwell-time indicator by animating a small glow effect surrounding the button image and b) after 200 ms expand an ellipse that houses the menu options. This produces a two-step dwell activation while making use of the display area in a much more dynamic way. The animation is put in place to keep the user's fixation on the button for the duration of the dwell time. The items in the menu are displayed when the ellipse has reached its full size.
This gives the user feedback in the parafoveal region, while the glow of the button icon stops to indicate that the full dwell time has elapsed. (A bit hard to describe in words; perhaps easier to understand from the images below.) The parafoveal region of our visual field lies just outside the foveal region, where full-resolution vision takes place. The foveal area is about the size of a thumbnail at arm's length; items in the parafoveal region can still be seen, but at reduced resolution/sharpness. We do see them, but have to make a short saccade to bring them into full resolution. In other words, the menu items pop out at a distance that attracts a short saccade, which is easily discriminated by the eye tracker. (Just4fun: test your visual field)
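A minimal sketch of the two-step dwell logic above (the 200 ms glow step is from the description; the expansion duration and state names are my assumptions):

```python
GLOW_MS = 200        # first step: glow animates while dwelling on the button
EXPAND_MS = 300      # assumed length of the ellipse expansion animation

def menu_state(fixation_ms):
    """Map time spent fixating the button to the widget's visual state."""
    if fixation_ms < GLOW_MS:
        return "glowing"        # dwell-time indicator animating
    if fixation_ms < GLOW_MS + EXPAND_MS:
        return "expanding"      # ellipse growing; items not yet shown
    return "menu-open"          # ellipse at full size, menu items visible
```

Looking away before the menu opens would simply reset the fixation clock, so accidental activations require sustaining gaze through both steps.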
The second prototype I have been working on is also inspired by the use of expanding surfaces. The purpose is a gaze-driven photo gallery where thumbnail-sized image previews become enlarged upon glancing at them. The enlarged view displays an icon which can be fixated to make the photo appear in full size.
Saturday, February 23, 2008
Inspiration: EyeWindows (Fono et al, 2005)
Abstract
In this paper, we present an attentive windowing technique that uses eye tracking, rather than manual pointing, for focus window selection. We evaluated the performance of 4 focus selection techniques: eye tracking with key activation, eye tracking with automatic activation, mouse and hotkeys in a typing task with many open windows. We also evaluated a zooming windowing technique designed specifically for eye-based control, comparing its performance to that of a standard tiled windowing environment. Results indicated that eye tracking with automatic activation was, on average, about twice as fast as mouse and hotkeys. Eye tracking with key activation was about 72% faster than manual conditions, and preferred by most participants. We believe eye input performed well because it allows manual input to be provided in parallel to focus selection tasks. Results also suggested that zooming windows outperform static tiled windows by about 30%. Furthermore, this performance gain scaled with the number of windows used. We conclude that eye-controlled zooming windows with key activation provides an efficient and effective alternative to current focus window selection techniques. Download paper (pdf).
David Fono, Roel Vertegaal and Conner Dickie are researchers at the Human Media Lab at Queen's University in Kingston, Canada.
Friday, February 22, 2008
Inspiration: Fisheye Lens (Ashmore et al. 2005)
Abstract
"This paper evaluates refinements to existing eye pointing techniques involving a fisheye lens. We use a fisheye lens and a video-based eye tracker to locally magnify the display at the point of the user’s gaze. Our gaze-contingent fisheye facilitates eye pointing and selection of magnified (expanded) targets. Two novel interaction techniques are evaluated for managing the fisheye, both dependent on real-time analysis of the user’s eye movements. Unlike previous attempts at gaze-contingent fisheye control, our key innovation is to hide the fisheye during visual search, and morph the fisheye into view as soon as the user completes a saccadic eye movement and has begun fixating a target. This style of interaction allows the user to maintain an overview of the desktop during search while selectively zooming in on the foveal region of interest during selection. Comparison of these interaction styles with ones where the fisheye is continuously slaved to the user’s gaze (omnipresent) or is not used to affect target expansion (nonexistent) shows performance benefits in terms of speed and accuracy" Download paper (pdf)
Wednesday, February 20, 2008
Inspiration: GUIDe Project (Kumar&Winograd, 2007)
Abstract
"The GUIDe (Gaze-enhanced User Interface Design) project in the HCI Group at Stanford University explores how gaze information can be effectively used as an augmented input in addition to keyboard and mouse. We present three practical applications of gaze as an augmented input for pointing and selection, application switching, and scrolling. Our gaze-based interaction techniques do not overload the visual channel and present a natural, universally-accessible and general purpose use of gaze information to facilitate interaction with everyday computing devices." Download paper (pdf)
Demonstration video
"The following video shows a quick 5 minute overview of our work on a practical solution for pointing and selection using gaze and keyboard. Please note, our objective is not to replace the mouse as you may have seen in several articles on the Web. Our objective is to provide an effective interaction technique that makes it possible for eye-gaze to be used as a viable alternative (like the trackpad, trackball, trackpoint or other pointing techniques) for everyday pointing and selection tasks, such as surfing the web, depending on the users' abilities, tasks and preferences."
The use of "focus points" is good design decisions as it provides the users with a fixation point which is much smaller than the actual target. This provides a clear and steady fixation which is easily discriminated by the eye tracker. The idea of displaying something that will "lure" the users fixation to remain still is something I intend to explore in my own project.
As mentioned, the GUIDe project has developed several applications besides EyePoint, such as EyeExpose (application switching), gaze-based password entry, and automatic text scrolling.
More information can be found in the GUIDe Publications
Inspiration: ZoomNavigator (Skovsgaard, 2008)
Abstract
The goal of this research is to estimate the maximum amount of noise of a pointing device that still makes interaction with a Windows interface possible. This work proposes zoom as an alternative activation method to the more well-known interaction methods (dwell and two-step-dwell activation). We present a magnifier called ZoomNavigator that uses the zoom principle to interact with an interface. Selection by zooming was tested with white noise in a range of 0 to 160 pixels in radius on an eye tracker and a standard mouse. The mouse was found to be more accurate than the eye tracker. The zoom principle applied allowed successful interaction with the smallest targets found in the Windows environment even with noise up to about 80 pixels in radius. The work suggests that the zoom interaction gives the user a possibility to make corrective movement during activation time eliminating the waiting time found in all types of dwell activations. Furthermore zooming can be a promising way to compensate for inaccuracies on low-resolution eye trackers or for instance if people have problems controlling the mouse due to hand tremors.
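The key property behind the noise tolerance reported above can be sketched in a couple of lines: each zoom step magnifies the region around the pointing estimate, so a fixed amount of screen-space noise corresponds to ever less distance in the underlying interface. (The zoom factor and step count below are illustrative, not values from the paper.)

```python
def effective_noise(noise_px, zoom_factor, steps):
    """Screen-space pointing noise mapped back to interface coordinates
    after `steps` successive zoom-ins of `zoom_factor` each."""
    return noise_px / (zoom_factor ** steps)

# 80 px of tracker noise shrinks to under 9 px of effective noise after
# two 3x zoom steps, small enough to hit a 12 px Windows target.
print(effective_noise(80, 3, 2))
```

This also shows why zooming lets the user make corrective movements during activation: each step both magnifies the target and attenuates the noise.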
The sequence of images below shows screenshots from ZoomNavigator: a zoom towards a Windows file called ZoomNavigator.exe.
Two-step zoom
The two-step zoom activation is demonstrated in the video below by IT University of Copenhagen (ITU) research director prof. John Paulin Hansen. Notice how the error rate is reduced by the zooming style of interaction, making it suitable for applications that need detailed discrimination. It may be slower, but error rates drop significantly.
"Dwell is the traditional way of making selections by gaze. In the video we compare dwell to magnification and zoom. While the hit-rate is 10 % with dwell on a 12 x 12 pixels target, it is 100 % for both magnification and zoom. Magnification is a two-step process though, while zoom only takes on selection. In the experiment, the initiation of a selection is done by pressing the spacebar. Normally, the gaze tracking system will do this automatically when the gaze remains within a limited area for more than approx. 100 ms"
For more information see the publications of the ITU.
Inspiration: StarGazer (Skovsgaard et al, 2008)
The ability to enter text into the system is crucial for communication; without hands or speech this is somewhat problematic. The StarGazer software aims to solve this by introducing a novel 3D approach to text entry. In December I had the opportunity to visit ITU and try StarGazer (among other things) myself; it is astonishingly easy to use. Within just a minute I was typing with my eyes. Rather than describing what it looks like, see the video below.
The associated paper is to be presented at the ETRA08 conference in March.
Tuesday, February 19, 2008
Inspiration: GazeSpace (Laqua et al. 2007)
The paper below was presented last year at a conference held by the British Computer Society's specialist group on Human-Computer Interaction. What caught my attention is the focus on providing custom content spaces (canvases), good feedback, and a dynamic dwell time, all of which I intend to incorporate into my own gaze GUI components. Additionally, the idea of expanding the content canvas upon a gaze fixation is really nice and something I will attempt in .Net/WPF (initial work displays a set of photos that become enlarged upon fixation).
GazeSpace Eye Gaze Controlled Content Spaces (Laqua et al. 2007)
Abstract
In this paper, we introduce GazeSpace, a novel system utilizing eye gaze to browse content spaces. While most existing eye gaze systems are designed for medical contexts, GazeSpace is aimed at able-bodied audiences. As this target group has much higher expectations for quality of interaction and general usability, GazeSpace integrates a contextual user interface, and rich continuous feedback to the user. To cope with real-world information tasks, GazeSpace incorporates novel algorithms using a more dynamic gaze-interest threshold instead of static dwell-times. We have conducted an experiment to evaluate user satisfaction and results show that GazeSpace is easy to use and a “fun experience”. Download paper (PDF)
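The abstract does not spell out the dynamic gaze-interest algorithm, but a plausible reading (entirely my assumption, with made-up constants) is an interest accumulator that grows while the user gazes at a content space and decays when the gaze leaves, so brief glances away do not reset progress the way a static dwell timer would:

```python
def update_interest(interest, on_target, dt_ms,
                    grow=1.0, decay=0.5, threshold=400.0):
    """Advance the interest accumulator by one frame of dt_ms milliseconds.
    Interest grows while gazing at the content, decays (more slowly) when
    looking away. Returns (new_interest, activated)."""
    if on_target:
        interest += grow * dt_ms
    else:
        interest = max(0.0, interest - decay * dt_ms)
    return interest, interest >= threshold
```

Because decay is slower than growth, a 100 ms glance away only costs 50 ms worth of accumulated interest, which is what makes the threshold feel "dynamic" compared to a hard dwell timeout.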
About the author
Sven Laqua is a PhD student and Teaching Fellow in the Human Centred Systems Group, part of the Dept. of Computer Science at University College London. Sven has a personal homepage, a university profile and a blog (rather empty at the moment).