Saturday, August 23, 2008

GaCIT in Tampere, day 4.

A follow-up to the hands-on session was held by Andrew Duchowski, this time investigating eye movements on moving stimuli (i.e. video clips). A classic experiment from the cognitive science domain was used as stimulus (the umbrella woman). It serves as a very nice example of how to use eye trackers in a practical experiment.



The task is to count the passes made by either the white or the black team. The experiment illustrates inattentional blindness, which causes certain objects in the movie to go unnoticed.
More information on the phenomenon can be found in the following papers:
  • Becklen, Robert and Cervone, Daniel (1983) Selective looking and the noticing of unexpected events. Memory and Cognition, 11, 601-608.
  • Simons, Daniel J. and Chabris, Christopher F. (1999). Gorillas in our midst: sustained inattentional blindness for dynamic events, Perception, 28, pp.1059-1074.
  • Rensink, Ronald A. (2000). When Good Observers Go Bad: Change Blindness, Inattentional Blindness, and Visual Experience, Psyche, 6(09), August 2000. Commentary on: A. Mack and I. Rock (1998) Inattentional Blindness. MIT Press.
Defining areas of interest (AOIs) on video often turns into a tedious keyframing process where the object has to be outlined in each frame. Automatic matchmoving/rotoscoping software does exist, but it often does not produce a perfect segmentation of the moving objects. Dixon et al. have published research in this area.
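One way to ease the keyframing burden is to define the AOI bounding box only at a handful of keyframes and interpolate linearly in between. A minimal sketch of that idea in Python; the frame numbers and box coordinates are invented for illustration and are not from any of the tools mentioned above:

```python
# Linear interpolation of a rectangular AOI between manually defined keyframes.
# Keyframes map a frame number to a bounding box (x, y, width, height).
keyframes = {0: (100, 200, 80, 120), 50: (180, 210, 80, 120), 100: (260, 190, 80, 120)}

def aoi_at(frame, keyframes):
    """Return the interpolated AOI box for an arbitrary frame number."""
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    # Find the surrounding keyframes and blend between them.
    for f0, f1 in zip(frames, frames[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            box0, box1 = keyframes[f0], keyframes[f1]
            return tuple(a + t * (b - a) for a, b in zip(box0, box1))

def gaze_in_aoi(gx, gy, box):
    """True if the gaze sample (gx, gy) falls inside the AOI box."""
    x, y, w, h = box
    return x <= gx <= x + w and y <= gy <= y + h

print(aoi_at(25, keyframes))  # box halfway between the first two keyframes
```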
The afternoon was devoted to participant presentations, which covered a rather wide range of topics: visual cognition, expert vs. novice gaze patterns, gaze interaction, HCI and usability research.

GaCIT in Tampere, day 3.

In the morning Howell Istance of De Montfort University, currently at the University of Tampere, gave a very interesting lecture on gaze interaction. It was divided into three parts: 1) games, 2) mobile devices and 3) stereoscopic displays.

Games
This is an area of gaze interaction with high potential, and since gaming has grown into a huge industry it may help make eye trackers accessible and affordable. Such development would also be beneficial for users with motor impairments. A couple of example implementations were then introduced. The first one was a first-person shooter running on an Xbox 360:
The experimental evaluation contained 10 repeated trials to look at learning (6 subjects). Three different configurations were used: 1) gamepad for both moving and aiming (no gaze), 2) gamepad for moving and gaze for aiming, and 3) gamepad for moving forward only, with gaze for aiming and steering the movement.
Results:
However, twice as many missed shots were fired in the gaze condition, which can be described as a "machine gun" approach. It is noteworthy that no filtering was applied to the gaze position.
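A simple way to tame that kind of jitter would be to smooth the raw gaze samples before using them as an aiming point, e.g. with a fixed-size moving average. This is only a sketch with an assumed sample format, not the setup used in the study:

```python
from collections import deque

class GazeSmoother:
    """Moving-average filter over the last n raw gaze samples."""
    def __init__(self, window=10):
        self.samples = deque(maxlen=window)

    def update(self, x, y):
        """Add a raw sample and return the smoothed aiming point."""
        self.samples.append((x, y))
        n = len(self.samples)
        return (sum(s[0] for s in self.samples) / n,
                sum(s[1] for s in self.samples) / n)

smoother = GazeSmoother(window=10)
for raw in [(400, 300), (412, 295), (398, 307), (405, 301)]:  # noisy samples
    print(smoother.update(*raw))
```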
Howell has conducted an analysis of common tasks in gaming; below is a representation of the number of actions in the Guild Wars game. The two bars indicate 1) novices and 2) experienced users.

Controlling all of these different actions requires switching between task modes. This is very challenging with only one input modality (gaze) and no method of "clicking".

There are several ways a gaze interface can be constructed, from the bottom up. First, the position of gaze can be used to emulate the mouse cursor (on a system level). Second, a transparent overlay can be placed on top of the application. Third, a specific gaze interface can be developed (which has been my own approach); this requires a modification of the original application, which is not always possible.
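As a rough illustration of the first approach, the gaze position reported by the tracker can simply be fed to the operating system's cursor. The snippet below is a sketch: get_gaze() is a placeholder for whatever API the tracker exposes, and the pyautogui library is used to move the pointer:

```python
import time
import pyautogui  # cross-platform cursor control

def get_gaze():
    """Placeholder: replace with the eye tracker's API call.
    Here it just returns the centre of the screen."""
    w, h = pyautogui.size()
    return w // 2, h // 2

pyautogui.FAILSAFE = False  # gaze can legitimately reach screen corners

while True:
    x, y = get_gaze()
    pyautogui.moveTo(x, y)  # emulate the mouse cursor at the point of gaze
    time.sleep(1 / 60)      # roughly match a 60 Hz tracker
```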

The Snap/Clutch interaction method, developed by Stephen Vickers who is working with Howell, operates on the system level to emulate the mouse. It allows specific gaze gestures to be interpreted and used to switch mode. For example, a quick glance off the left edge of the screen activates a left-mouse-button click mode; when an eye fixation is then detected in a specific region, a left mouse click is issued at that location.

When this is applied to games such as World of Warcraft (demo), specific regions of the screen can be used to issue movement actions in the corresponding direction. The image below illustrates these regions overlaid on the screen. When a fixation lands in the A region, a command to move in that direction is issued to the game itself.

Stephen Vickers' gaze-driven World of Warcraft interface.
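The mode-switching idea can be sketched as a small state machine: a glance at a screen edge selects a mode, and subsequent fixations are interpreted according to that mode, either as a click or as a movement command for the region being looked at. This is purely my own illustration of the principle, not Vickers' actual implementation:

```python
SCREEN_W, SCREEN_H = 1680, 1050
EDGE = 50  # pixels; glancing into this margin switches mode

def gesture(x, y):
    """Map a screen-edge glance to a mode name, or None for the main area."""
    if x < EDGE:
        return "left_click"
    if x > SCREEN_W - EDGE:
        return "move"
    return None

class SnapClutchLike:
    def __init__(self):
        self.mode = "idle"

    def on_fixation(self, x, y):
        new_mode = gesture(x, y)
        if new_mode:                      # glance at an edge: just switch mode
            self.mode = new_mode
            return f"mode -> {new_mode}"
        if self.mode == "left_click":
            return f"left click at ({x}, {y})"
        if self.mode == "move":
            # Divide the screen into coarse regions and issue movement commands.
            return "move forward" if y < SCREEN_H / 2 else "move backward"
        return "ignored"

ui = SnapClutchLike()
print(ui.on_fixation(10, 500))    # glance left -> left-click mode
print(ui.on_fixation(800, 400))   # fixation -> click issued at that point
```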

After lunch we had a hands-on session with the Snap/Clutch interaction method, where eight Tobii eye trackers were used for a multiplayer round of WoW! Very different from a traditional mouse/keyboard setup, and it takes some time to get used to.

  • Istance, H.O.,Bates, R., Hyrskykari, A. and Vickers, S. Snap Clutch, a Moded Approach to Solving the Midas Touch Problem. Proceedings of the 2008 symposium on Eye Tracking Research & Applications; ETRA 2008. Savannah, GA. 26th-28th March 2008. Download
  • Bates, R., Istance, H.O., and Vickers, S. Gaze Interaction with Virtual On-Line Communities: Levelling the Playing Field for Disabled Users. Proceedings of the 4th Cambridge Workshop on Universal Access and Assistive Technology; CWUAAT 2008. University of Cambridge, 13th-16th April 2008. Download


The second part of the lecture concerned gaze interaction for mobile phones. This allows for ubiquitous computing where the eye tracker is integrated with a wearable display. As a new field it is surrounded by certain issues (stability, processing power, variation in lighting, etc.), but all of these will be solved over time. The big question is what the "killer application" will be (entertainment?). A researcher from Nokia attended the lecture and introduced a prototype system. Luckily I had the chance to visit their research department the following day to get hands-on with their head-mounted display with an integrated eye tracker (more on this in another post).

The third part was about stereoscopic displays, which add a third dimension (depth) to the traditional X and Y axes. There are several projects around the world working towards making this an everyday reality. However, tracking the depth of a gaze fixation is limited: the vergence eye movements (seen in the distance between the two pupils) are hard to measure once the distance to the object exceeds about two meters.

Calculating convergence angles (with a half inter-pupillary distance of 3.3 cm):
d = 100 cm: tan θ = 3.3 / 100, θ ≈ 1.89 deg.
d = 200 cm: tan θ = 3.3 / 200, θ ≈ 0.95 deg.
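The same calculation in code, assuming 3.3 cm is half of a typical inter-pupillary distance (so the full vergence angle is roughly twice the value printed). It shows how quickly the angle flattens out with distance, which is why depth estimation from vergence becomes unreliable beyond a couple of meters:

```python
import math

HALF_IPD_CM = 3.3  # half the inter-pupillary distance (assumed)

def convergence_angle_deg(distance_cm):
    """Half-angle of convergence for a target at the given viewing distance."""
    return math.degrees(math.atan(HALF_IPD_CM / distance_cm))

for d in (100, 200, 300):
    print(f"d = {d} cm: theta = {convergence_angle_deg(d):.2f} deg")
# d = 100 cm: theta = 1.89 deg
# d = 200 cm: theta = 0.95 deg
# d = 300 cm: theta = 0.63 deg
```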


Related papers on stereoscopic eye tracking:
The afternoon was spent on a guided tour around Tampere, followed by a splendid dinner at a Viking-themed restaurant.

GaCIT in Tampere, day 2.

The second day of GaCIT in Tampere started off with a hands-on lab by Andrew Duchowski. This session followed up on the introduction from the day before. The software of choice was Tobii Studio, an integrated solution for displaying stimuli and visualizing eye movements (scanpaths, heat maps, etc.). Multiple types of stimuli can be used, including text, images, video and websites.

The "experiment" consisted of two ads shown below. The hypothesis to be investigated was that the direction of gaze would attract more attention towards the text compared to the picture where the baby is facing the camera.

After calibrating the user, the stimulus is observed for a specific amount of time. When the recording has completed, a replay of the eye movements can be visually overlaid on top of the stimuli. Furthermore, several recordings can be incorporated into one clip. Indeed, the results indicate support for the hypothesis. Simply put, faces attract attention and their direction of gaze guides it further.
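The kind of analysis behind such a conclusion can be approximated by summing fixation durations inside two AOIs, one over the face and one over the text block. Tobii Studio provides this kind of AOI statistic itself; the coordinates and fixation data below are invented purely for illustration:

```python
# Each fixation: (x, y, duration in ms). Each AOI: (x, y, width, height).
fixations = [(220, 180, 340), (630, 210, 260), (650, 250, 410), (240, 200, 180)]
aois = {"face": (150, 120, 200, 200), "text": (560, 150, 250, 300)}

def dwell_time(fixations, box):
    """Total fixation duration falling inside the AOI box."""
    x, y, w, h = box
    return sum(d for fx, fy, d in fixations if x <= fx <= x + w and y <= fy <= y + h)

for name, box in aois.items():
    print(name, dwell_time(fixations, box), "ms")
```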

After lunch Boris Velichkovsky gave a lecture on cognitive technologies. After a quick recap of the previous day's talk about the visual system, the NBIC report was introduced. It concerns the converging technologies of Nano-, Bio-, Information Technology and Cognitive Science.

Notable advances in these fields include the Z3 computer (Infotech, 1941), DNA (Bio, 1953), the Computed Tomography scan (Nano, 1972) and Short Term Memory (CogSci, 1968), all of which have dramatically improved human understanding and capabilities.

Another interesting topic concerned the superior visual recognition skills humans have. Research has demonstrated that we are able to recognize up to 2000 photos after two weeks with 90% accuracy. Obviously vision is our strongest sense; however, much of our computer interaction as a whole is driven by a one-way information flow. Taking the advances in bi-directional OLED microdisplays into account, the field of augmented reality has a bright future. These devices act as both camera and display at the same time. Add an eye tracker to the device and we have some really interesting opportunities.

Boris also discussed the research of Jaak Panksepp concerning the basic emotional systems in mammals (the emotional systems, their anatomical areas and the neurotransmitters that modulate them).


To sum up, the second day was diverse in topics but nonetheless interesting, and it demonstrates the breadth of knowledge and skills needed by today's researchers.

Tuesday, August 19, 2008

GaCIT in Tampere, day 1.

The first day of the summer school on gaze, communication and interaction technology (GaCIT, pronounced gaze-it) started off with a talk by Boris Velichkovsky. The topic was visual cognition, eye movements and attention. These are some of the notes I made.

Some basic findings about the visual system were introduced. In general, the visual system is divided into two pathways, the dorsal and the ventral stream. The dorsal pathway runs from the striate cortex (at the back of the brain) upwards towards the posterior parietal cortex. This pathway concerns the spatial arrangement of objects in our environment and is hence commonly termed the "where" pathway. The other pathway runs towards the temporal lobes (just above your ears) and concerns the shape and identification of specific objects; this is the "what" pathway. The ambient system responds early (0-250 ms), after which the focal system takes over.

These two systems are represented by the focal ("what") and ambient ("where") attention systems. The ambient system gives an overall crude but fast response even in low luminance; the focal attention system works the opposite way, with fine but slow spatial resolution. Additionally, the talk covered cognitive models of attention such as those of Posner and Broadbent (see attention on Wikipedia).

A great deal of the talk concerned the freezing effect (inhibition of saccades and a prolonged fixation), which can to some extent be predicted. The onset of a dangerous "event" can be seen before the actual response (the prolonged fixation): about 500 ms before the fixation the prediction can be made with 95% success. The inhibition comes in two waves. The first is driven by the affective response of the amygdala (after 80 ms), which acts on the superior colliculus to inhibit near saccades. A habituating effect on this affective response can be seen in that the second wave of inhibition (+170 ms) becomes less apparent, while the initial response remains unaffected.

While driving a car and talking on the phone, the lack of attention leads to eye movements with shorter fixation durations, which gives only an approximate spatial localization of objects. It is the combination of a) the duration of a fixation and b) the surrounding saccades that determines the quality of recognition. A short fixation followed by a long saccade leads to low recognition scores; a short fixation followed by a short saccade gives higher recognition scores; a long fixation followed by either a short or a long saccade leads to equally high recognition results.

Furthermore, a short saccade within the parafoveal region leads to a high level of neural activity (EEG) after 90 ms. This differs from long saccades, which give no noticeable change in cortical activity (compared to the baseline).

However, despite the classification into two major visual systems, the attentional system can be divided into 4-6 layers of organization; hence there is no single point of attention. These layers have developed during the evolution of the mind to support various cognitive demands.

For example, the role of emotional responses in social communication can be seen in the strong response to facial expressions. Studies have shown that males respond extremely fast to face-to-face images of other males displaying aggressive facial expressions. This low-level response happens much faster than our conscious awareness (as low as 80 ms, if I recall correctly). Additionally, the eyes are much faster than we can consciously comprehend; we are not aware of all the movements our eyes perform.

In the afternoon Andrew Duchowski from Clemson University gave a talk about eye tracking and eye movement analysis. Various historical apparatus and techniques were introduced (such as infrared corneal reflection), followed by a research methodology and guidelines for conducting studies. A practical example, a study conducted by Mr. Nalangula at Clemson, was described. It compared experts' and novices' viewing of circuit boards containing errors. The results indicate that showing the experts' scanpaths can improve the performance of novices (i.e. detecting more errors) compared to those who received no training. A few guidelines on how to use visualizations were shown (clusters, heat maps, etc.).

The day ended with a nice dinner and a traditional Finnish smoke sauna, followed by a swim in the lake. Thanks go to the University of Tampere UCIT group for my best sauna experience to date.




Tuesday, July 22, 2008

Eye gestures (Hemmert, 2007)

Fabian Hemmert at the Potsdam University of Applied Sciences published his MA thesis in 2007. He has put up a site with extensive information and demonstrations of his research on eye gestures such as winks, squints and blinks. See the videos or the thesis. Good work and a great approach!

One example:






"Looking with one eye is a simple action. Seeing the screen with only one eye might therefore be used to switch the view to an alternate perspective on the screen contents: a filter for quick toggling. In this example, closing one eye filters out information on screen to a subset of the original data, such as an overview over the browser page or only the five most recently edited files. It was to see how the users would accept the functionality at the cost of having to close one eye, a not totally natural action." (Source)

Monday, July 21, 2008

SMI Experiment Suite 360

The video demonstrates the easy workflow of Experiment Suite 360: the Experiment Builder, the iView X RED non-invasive eye tracker and the BeGaze analysis software. It provides a set of examples of what eye tracking can be used for. Furthermore, the remote-based system (iView X RED) is the same eye tracker that was used for developing the NeoVisus prototype (although the interface works on multiple systems).


Tuesday, July 15, 2008

Sébastien Hillaire at IRISA Rennes, France

Sébastien Hillaire is a Ph.D. student at IRISA Rennes in France, a member of the BUNRAKU team and France Telecom R&D. His work centers on using eye trackers to improve depth-of-field rendering of visual scenes in 3D environments. He has published two papers on the topic:

Automatic, Real-Time, Depth-of-Field Blur Effect for First-Person Navigation in Virtual Environment (2008)

"We studied the use of visual blur effects for first-person navigation in virtual environments. First, we introduce new techniques to improve real-time Depth-of-Field blur rendering: a novel blur computation based on the GPU, an auto-focus zone to automatically compute the user’s focal distance without an eye-tracking system, and a temporal filtering that simulates the accommodation phenomenon. Secondly, using an eye-tracking system, we analyzed users’ focus point during first-person navigation in order to set the parameters of our algorithm. Lastly, we report on an experiment conducted to study the influence of our blur effects on performance and subjective preference of first-person shooter gamers. Our results suggest that our blur effects could improve fun or realism of rendering, making them suitable for video gamers, depending however on their level of expertise."

Screenshot from the algorithm implemented in Quake 3 Arena.

  • Sébastien Hillaire, Anatole Lécuyer, Rémi Cozot, Géry Casiez
    Automatic, Real-Time, Depth-of-Field Blur Effect for First-Person Navigation in Virtual Environment. To appear in IEEE Computer Graphics and Applications (CG&A), 2008, pp. ??-??
    Source code (please refer to my IEEE VR 2008 publication)
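The core of such a depth-of-field effect is a per-pixel blur amount that grows with the difference between a pixel's depth and the current focal depth (taken from the auto-focus zone or the eye tracker). A much-simplified CPU-side sketch of that mapping, with made-up parameters, not the authors' GPU implementation:

```python
def blur_radius(pixel_depth, focal_depth, max_radius=8.0, sharp_range=1.0, falloff=5.0):
    """Blur kernel radius (in pixels) as a function of distance from the focal plane.

    Objects within sharp_range of the focal depth stay sharp; blur then grows
    linearly and is clamped at max_radius. All parameters are illustrative.
    """
    distance = abs(pixel_depth - focal_depth)
    if distance <= sharp_range:
        return 0.0
    return min(max_radius, max_radius * (distance - sharp_range) / falloff)

focal = 12.0  # metres, e.g. the depth under the user's focus point
for depth in (11.5, 14.0, 25.0):
    print(depth, "->", round(blur_radius(depth, focal), 2), "px")
```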

Using an Eye-Tracking System to Improve Depth-of-Field Blur Effects and Camera Motions in Virtual Environments (2008)

"
We describes the use of user’s focus point to improve some visual effects in virtual environments (VE). First, we describe how to retrieve user’s focus point in the 3D VE using an eye-tracking system. Then, we propose the adaptation of two rendering techniques which aim at improving users’ sensations during first-person navigation in VE using his/her focus point: (1) a camera motion which simulates eyes movement when walking, i.e., corresponding to vestibulo-ocular and vestibulocollic reflexes when the eyes compensate body and head movements in order to maintain gaze on a specific target, and (2) a Depth-of-Field (DoF) blur effect which simulates the fact that humans perceive sharp objects only within some range of distances around the focal distance.

Second, we describe the results of an experiment conducted to study users’ subjective preferences concerning these visual effects during first-person navigation in VE. It showed that participants globally preferred the use of these effects when they are dynamically adapted to the focus point in the VE. Taken together, our results suggest that the use of visual effects exploiting users’ focus point could be used in several VR applications involving first-person navigation such as the visit of architectural site, training simulations, video games, etc."
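The first of those two effects can be illustrated with a toy camera model: the camera position oscillates as the avatar walks, while the view direction is continually re-aimed at the focus point so gaze stays locked on the target. This is purely my own illustration of the idea, with invented parameters, not the paper's algorithm:

```python
import math

def walking_camera(t, focus_point, base_height=1.7, bob_amp=0.03, bob_freq=2.0):
    """Camera position with a walking 'bob', plus a view vector re-aimed at the focus point.

    t is time in seconds; focus_point is the 3D point under the user's gaze.
    Amplitude/frequency values are made up for illustration.
    """
    cam = (0.0, base_height + bob_amp * math.sin(2 * math.pi * bob_freq * t), 0.0)
    # Re-aim at the focus point every frame, compensating for the bob
    # (a crude stand-in for the vestibulo-ocular reflex described above).
    view = tuple(f - c for f, c in zip(focus_point, cam))
    norm = math.sqrt(sum(v * v for v in view))
    return cam, tuple(v / norm for v in view)

print(walking_camera(0.25, focus_point=(0.0, 1.5, 5.0)))
```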



Sébastien Hillaire, Anatole Lécuyer, Rémi Cozot, Géry Casiez
Using an Eye-Tracking System to Improve Depth-of-Field Blur Effects and Camera Motions in Virtual Environments. Proceedings of IEEE Virtual Reality (VR) Reno, Nevada, USA, 2008, pp. 47-51. Download paper as PDF.

QuakeIII DoF&Cam sources (depth-of-field, auto-focus zone and camera motion algorithms are under GPL with APP protection)

Passive eye tracking while playing Civilization IV

While the SMI iView X RED eye tracker used in this video is not driving the interaction, it showcases how eye tracking can be used for usability evaluations in interaction design (Civilization does steal my attention on occasion; Sid Meier is just a brilliant game designer).

Thursday, July 10, 2008

Eye Gaze Interactive Air Traffic Controllers workstation (P. Esser & T.J.J. Bos, 2007)

P. Esser and T.J.J. Bos at Maastricht University have developed a prototype for reducing the repetitive strain injuries Air Traffic Controllers sustain while operating their systems. The research was conducted at the National Aerospace Laboratory in the Netherlands. The results indicate a clear advantage compared to the traditional roller/trackball, especially over large distances. This is expected, since Fitts's law does not apply to eye movements in the same manner as to physical limb/hand movements: eye movements over longer distances do take more time than short ones, but nothing like the difference between moving your arm one inch and one meter. Certainly there are more applications that could benefit from gaze-assisted interaction; medical imaging in the field of radiology is one (modalities such as CT and MRI produce very high resolution images, up to 4096x4096 pixels).
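The Fitts's law point can be made concrete: limb pointing time grows with the index of difficulty log2(D/W + 1), whereas a saccade to a target is roughly constant apart from its latency, almost regardless of amplitude. A back-of-the-envelope comparison with illustrative constants (not values from the thesis):

```python
import math

def fitts_time_ms(distance_mm, width_mm, a=100.0, b=150.0):
    """Shannon formulation of Fitts's law; a and b are illustrative constants."""
    return a + b * math.log2(distance_mm / width_mm + 1)

def saccade_time_ms(distance_mm):
    """Saccades are fast and only weakly amplitude-dependent (rough figures)."""
    return 200 + 0.05 * distance_mm  # ~200 ms latency plus a small movement cost

for d in (50, 300, 900):  # target distance in mm, target width 20 mm
    print(f"D={d} mm  hand: {fitts_time_ms(d, 20):.0f} ms   gaze: {saccade_time_ms(d):.0f} ms")
```

The gap widens with distance, which matches the observation above that gaze selection pays off most on large or multiple displays.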


Summary of the thesis "Eye Gaze Interactive ATC workstation"
"Ongoing research is devoted to finding ways to improve performance and reduce workload of Air Traffic Controllers (ATCos) because their task is critical to the safe and efficient flow of air traffic. A new intuitive input method, known as eye gaze interaction, was expected to reduce the work- and task load imposed on the controllers by facilitating the interaction between the human and the ATC workstation. In turn, this may improve performance because the freed mental resources can be devoted to more critical aspects of the job, such as strategic planning. The objective of this Master thesis research was to explore how human computer interaction (HCI) in the ATC task can be improved using eye gaze input techniques and whether this will reduce workload for ATCos.


In conclusion, the results of eye gaze interaction are very promising for selection of aircraft on a radar screen. For entering instructions it was less advantageous. This is explained by the fact that in the first task the interaction is more intuitive while the latter is more a conscious selection task. For application in work environments with large displays or multiple displays eye gaze interaction is considered very promising. "



Download paper as pdf

Wednesday, July 9, 2008

GazeTalk 5

The GazeTalk system is one of the most comprehensive open solutions for gaze interaction today. It has been developed with disabled users in mind and supports a wide range of everyday tasks, dramatically increasing the quality of life for people with ALS or similar conditions. The following information is quoted from the COGAIN website.

Information about Gazetalk 5 eye communication system

GazeTalk is a predictive text entry system that has a restricted on-screen keyboard with ambiguous layout for severely disabled people. The main reason for using such a keyboard layout is that it enables the use of an eye tracker with a low spatial resolution (e.g., a web-camera based eye tracker).

The goal of the GazeTalk project is to develop an eye-tracking based AAC system that supports several languages, facilitates fast text entry, and is both sufficiently feature-complete to be deployed as the primary AAC tool for users, yet sufficiently flexible and technically advanced to be used for research purposes. The system is designed for several target languages, initially Danish, English, Italian, German and Japanese.

Main features

  • type-to-talk
  • writing
  • email
  • web browser
  • multimedia player
  • PDF reader
  • letter and word prediction, and word completion
  • speech output
  • can be operated by gaze, headtracking, mouse, joystick, or any other pointing device
  • supports step-scanning (new!)
  • supports users with low precision in their movements, or trackers with low accuracy
  • allows the user to use Dasher inside GazeTalk and to transfer the text written in Dasher back to GazeTalk

GazeTalk 5.0 has been designed and developed by the Eye Gaze Interaction Group at the IT University of Copenhagen and the IT-Lab at the Royal School of Library and Information Science, Copenhagen.
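The restricted, ambiguous layout described above works because the system predicts which letters are most likely next, so only a few need dedicated (large) buttons at any time. A toy illustration of that idea using simple bigram counts; GazeTalk's real language model is of course far more sophisticated:

```python
from collections import Counter, defaultdict

corpus = "the theory of the eye tracker and the text entry test"  # toy training text

# Count which letter tends to follow which in the training text.
bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    if a.isalpha() and b.isalpha():
        bigrams[a][b] += 1

def likely_next_letters(prefix, n=6):
    """Suggest n letters for the large on-screen buttons; the rest go on an 'others' key."""
    if not prefix or prefix[-1] not in bigrams:
        return [l for l, _ in Counter(corpus.replace(" ", "")).most_common(n)]
    return [l for l, _ in bigrams[prefix[-1]].most_common(n)]

print(likely_next_letters("th"))  # letters most likely to follow "th" in the toy corpus
```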


Screenshots: GazeTalk v5, and GazeTalk v5 linked with Dasher.

Read more about GazeTalk or view the GazeTalk manual (PDF).

Short manual on data recording in GazeTalk (PDF).

GazeTalk Videos

Download GazeTalk.