Saturday, February 23, 2008

Talk: Sensing user attention (R. Vertegaal)

I stumbled upon a Google Tech Talk by Roel Vertegaal describing various projects at the Queen's University Human Media Lab, many of which use eye tracking technology. In general, the lab applies knowledge from cognitive science on attention and communication to practical Human-Computer Interaction applications. Overall a nice 40-minute talk. Enjoy.

Over the past few years, our work has centered around the development of computing technologies that are sensitive to what is perhaps the most important contextual cue for interacting with humans that exists: the fabric of their attention. Our research group has studied how humans communicate attention to navigate complex scenarios, such as group decision making. In the process, we developed many different prototypes of user interfaces that sense the users' attention, so as to be respectful players that share this most important resource with others. One of the most immediate methods for sensing human attention is to detect what object the eyes look at. The eye contact sensors our company has developed for this purpose work at long range, with great head movement tolerance, and many eyes. They do not require any personal calibration or coordinate system to function. Today I will announce Xuuk's first product, EyeBox2, a viewing statistics sensor that works at up to 10 meters. EyeBox2 allows the deployment of algorithms similar to Google's PageRank in the real world, where anything can now be ranked according to the attention it receives. This allows us, for example, to track mass consumer interest in products or ambient product advertisements. I will also illustrate how EyeBox2 ties into our laboratory's research on interactive technologies, showing prototypes of attention sensitive telephones, attentive video blogging glasses, speech recognition appliances as well as the world's first attentive hearing aid.

Roel Vertegaal is the director of the Human Media Lab at Queen's University in Kingston, Canada. Roel is the founder of Xuuk, which offers the EyeBox2, a remote eye tracker that works at distances of up to 10 meters (currently $1500), and associated analysis software.

Inspiration: EyeWindows (Fono et al, 2005)

Continuing with the zooming style of interaction that has become common within the field of gaze interaction is "EyeWindows: Evaluation of Eye-Controlled Zooming Windows for Focus Selection" (Fono&Vertegaal, 2005). The paper describes two prototypes. The first is a media browser with dynamic (elastic) allocation of screen real estate. The second dynamically resizes desktop windows upon gaze fixation. Overall, great examples presented in a clear, well-structured paper, with an interesting evaluation of selection techniques.

In this paper, we present an attentive windowing technique that uses eye tracking, rather than manual pointing, for focus window selection. We evaluated the performance of 4 focus selection techniques: eye tracking with key activation, eye tracking with automatic activation, mouse and hotkeys in a typing task with many open windows. We also evaluated a zooming windowing technique designed specifically for eye-based control, comparing its performance to that of a standard tiled windowing environment. Results indicated that eye tracking with automatic activation was, on average, about twice as fast as mouse and hotkeys. Eye tracking with key activation was about 72% faster than manual conditions, and preferred by most participants. We believe eye input performed well because it allows manual input to be provided in parallel to focus selection tasks. Results also suggested that zooming windows outperform static tiled windows by about 30%. Furthermore, this performance gain scaled with the number of windows used. We conclude that eye-controlled zooming windows with key activation provides an efficient and effective alternative to current focus window selection techniques. Download paper (pdf).

David Fono, Roel Vertegaal and Conner Dickie are researchers at the Human Media Lab at Queen's University in Kingston, Canada.

Friday, February 22, 2008

Inspiration: Fisheye Lens (Ashmore et al. 2005)

In the paper "Efficient Eye Pointing with a Fisheye Lens" (Ashmore et al., 2005), a fisheye magnification lens is slaved to the foveal region of the user's gaze. This is another use of the zooming style of interaction, but compared to the ZoomNavigator (Skovsgaard, 2008) and EyePoint (Kumar&Winograd, 2007), this is a continuous effect that magnifies whatever the user's gaze lands upon. In other words, it is not meant to be a solution for dealing with the low accuracy of eye trackers in typical desktop (windows) interaction. That makes it suitable for visual inspection tasks such as quality control, medical X-ray examination, satellite imagery etc. On the downside, the nature of the lens distorts the image, which breaks the original spatial relationship between items on the display (as demonstrated by the images below).

"This paper evaluates refinements to existing eye pointing techniques involving a fisheye lens. We use a fisheye lens and a video-based eye tracker to locally magnify the display at the point of the user’s gaze. Our gaze-contingent fisheye facilitates eye pointing and selection of magnified (expanded) targets. Two novel interaction techniques are evaluated for managing the fisheye, both dependent on real-time analysis of the user’s eye movements. Unlike previous attempts at gaze-contingent fisheye control, our key innovation is to hide the fisheye during visual search, and morph the fisheye into view as soon as the user completes a saccadic eye movement and has begun fixating a target. This style of interaction allows the user to maintain an overview of the desktop during search while selectively zooming in on the foveal region of interest during selection. Comparison of these interaction styles with ones where the fisheye is continuously slaved to the user’s gaze (omnipresent) or is not used to affect target expansion (nonexistent) shows performance benefits in terms of speed and accuracy" Download paper (pdf)

The fisheye lens has been implemented commercially in the products of Idelix Software Inc., which has a set of demonstrations available.

Wednesday, February 20, 2008

Inspiration: GUIDe Project (Kumar&Winograd, 2007)

In the previous post I introduced the ZoomNavigator (Skovsgaard, 2008), which is similar to the EyePoint system (Kumar&Winograd, 2007) developed within the GUIDe project (Gaze-enhanced User Interface Design), an initiative by the HCI Group at Stanford University. This system relies on both an eye tracker and a keyboard, which excludes users with disabilities who cannot operate a keyboard (see video below). The aim of GUIDe is to make human-computer interaction as a whole "smarter" (as in more intuitive, faster and less cumbersome). This differs from the COGAIN initiative, which mainly aims at giving people with disabilities a higher quality of life.

"The GUIDe (Gaze-enhanced User Interface Design) project in the HCI Group at Stanford University explores how gaze information can be effectively used as an augmented input in addition to keyboard and mouse. We present three practical applications of gaze as an augmented input for pointing and selection, application switching, and scrolling. Our gaze-based interaction techniques do not overload the visual channel and present a natural, universally-accessible and general purpose use of gaze information to facilitate interaction with everyday computing devices." Download paper (pdf)

Demonstration video
"The following video shows a quick 5 minute overview of our work on a practical solution for pointing and selection using gaze and keyboard. Please note, our objective is not to replace the mouse as you may have seen in several articles on the Web. Our objective is to provide an effective interaction technique that makes it possible for eye-gaze to be used as a viable alternative (like the trackpad, trackball, trackpoint or other pointing techniques) for everyday pointing and selection tasks, such as surfing the web, depending on the users' abilities, tasks and preferences."

The use of "focus points" is a good design decision, as it provides users with a fixation point which is much smaller than the actual target. This gives a clear and steady fixation which is easily discriminated by the eye tracker. The idea of displaying something that will "lure" the user's fixation to remain still is something I intend to explore in my own project.

As mentioned, the GUIDe project has developed several applications besides EyePoint, such as EyeExpose (application switching), gaze-based password entry and automatic text scrolling.
More information can be found in the GUIDe Publications.

Make sure to get a copy of Manu Kumar's Ph.D. thesis "Gaze-enhanced User Interface Design", which is a pleasure to read. Additionally, Manu has founded GazeWorks, a company which aims at making the technology accessible to the general public at a lower cost.

Inspiration: ZoomNavigator (Skovsgaard, 2008)

Following up on the StarGazer text entry interface presented in my previous post, another approach to zooming interfaces is employed in the ZoomNavigator (Skovsgaard, 2008). It addresses the well-known issue of using gaze as input on traditional desktop systems, namely inaccuracy and jitter. An interesting solution which relies on dwell-time execution, in contrast to the EyePoint system (Kumar&Winograd, 2007), which is described in the next post.

The goal of this research is to estimate the maximum amount of noise of a pointing device that still makes interaction with a Windows interface possible. This work proposes zoom as an alternative activation method to the more well-known interaction methods (dwell and two-step-dwell activation). We present a magnifier called ZoomNavigator that uses the zoom principle to interact with an interface. Selection by zooming was tested with white noise in a range of 0 to 160 pixels in radius on an eye tracker and a standard mouse. The mouse was found to be more accurate than the eye tracker. The zoom principle applied allowed successful interaction with the smallest targets found in the Windows environment even with noise up to about 80 pixels in radius. The work suggests that the zoom interaction gives the user a possibility to make corrective movement during activation time eliminating the waiting time found in all types of dwell activations. Furthermore zooming can be a promising way to compensate for inaccuracies on low-resolution eye trackers or for instance if people have problems controlling the mouse due to hand tremors.

The sequence of images are screenshots from ZoomNavigator showing a zoom towards a Windows file called ZoomNavigator.exe.

The principles of ZoomNavigator are shown in the figure above. Zooming is used to focus on the attended object and eventually make a selection (unambiguous action). ZoomNavigator allows actions similar to those provided by a conventional mouse (Skovsgaard, 2008). The system is described in a conference paper titled "Estimating acceptable noise-levels on gaze and mouse selection by zooming". Download paper (pdf)

Two-step zoom
The two-step zoom activation is demonstrated in the video below by IT University of Copenhagen (ITU) research director Prof. John Paulin Hansen. Notice how the error rate is reduced by the zooming style of interaction, making it suitable for applications that need detailed discrimination. It might be slower, but error rates drop significantly.

"Dwell is the traditional way of making selections by gaze. In the video we compare dwell to magnification and zoom. While the hit-rate is 10 % with dwell on a 12 x 12 pixels target, it is 100 % for both magnification and zoom. Magnification is a two-step process though, while zoom only takes on selection. In the experiment, the initiation of a selection is done by pressing the spacebar. Normally, the gaze tracking system will do this automatically when the gaze remains within a limited area for more than approx. 100 ms"

For more information see the publications of the ITU.

Inspiration: StarGazer (Skovsgaard et al, 2008)

A major area of research for the COGAIN network is to enable communication for the disabled. The Innovative Communications group at IT University of Copenhagen works continuously on making gaze-based interaction technology more accessible, especially in the field of assistive technology.

The ability to enter text into the system is crucial for communication; without hands or speech this is somewhat problematic. The StarGazer software aims at solving this by introducing a novel 3D approach to text entry. In December I had the opportunity to visit ITU and try StarGazer (among other things) myself, and it is astonishingly easy to use. Within just a minute I was typing with my eyes. Rather than describing what it looks like, see the video below.
The associated paper is to be presented at the ETRA08 conference in March.

This introduces an important solution to the problem of eye tracker inaccuracy, namely zooming interfaces. Fixating on a specific region of the screen displays an enlarged version of this area, where objects can be more easily discriminated and selected.

The eyes are incredibly fast but, from the perspective of eye trackers, not really precise. This is due to the physiological properties of our visual system, specifically the foveal region of the eye. This retinal area produces the sharp, detailed region of our visual field, which in practice covers about the size of a thumbnail at arm's length. To bring another area into focus a saccade takes place, which moves the pupil and thus our gaze; this is what the eye tracker registers. Hence the accuracy of most eye trackers is in the range of 0.5-1 degree of visual angle (in theory, that is).
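To get a feel for what 0.5-1 degree means on a screen, the visual angle can be converted into pixels. The following is only a rough sketch: the class and method names are made up for illustration, and the 60 cm viewing distance and 96 DPI display used in the note below are my own assumptions, not figures from any particular tracker.

```csharp
using System;

// Hypothetical helper: converts a visual angle into an on-screen size.
public static class VisualAngle
{
    // Size (in cm) subtended by a visual angle at a given viewing distance.
    public static double AngleToCm(double degrees, double distanceCm)
    {
        double radians = degrees * Math.PI / 180.0;
        return 2.0 * distanceCm * Math.Tan(radians / 2.0);
    }

    // The same size expressed in pixels for a display with the given DPI.
    public static double AngleToPixels(double degrees, double distanceCm, double dpi)
    {
        const double CmPerInch = 2.54;
        return AngleToCm(degrees, distanceCm) / CmPerInch * dpi;
    }
}
```

Under those assumptions, VisualAngle.AngleToPixels(1.0, 60.0, 96.0) comes out at just under 40 pixels and half a degree at roughly 20 pixels, which is larger than many small desktop targets.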

A feasible solution to deal with this limitation in accuracy is to use the display space dynamically and zoom into the areas of interest upon glancing. The zooming interaction style solves some of the issues with inaccuracy and jitter of the eye trackers but in addition it has to be carefully balanced so that it still provides a quick and responsive interface.

However, to me the novelty of StarGazer is the notion of traveling through a 3D space; the sensation of movement really catches one's attention and streamlines the interaction. Since text entry is linear, character by character, flying through space by navigating to character after character is a suitable interaction style. Since the interaction is nowhere near the speed of two-handed keyboard entry, linguistic probability algorithms such as those found in cellphones would be very beneficial (i.e. type two or three letters and the most likely words are displayed in a list). Overall, I find the spatial arrangement of gaze interfaces to be a somewhat unexplored area. Our eyes are made to navigate a three-dimensional world, while traditional desktop interfaces mainly contain a flat 2D view. This is something I intend to investigate further.

Inspiration: COGAIN

Many of the developments seen in the field of gaze interaction stem from the assistive technology field, where users who are unable to use regular computer interfaces are provided with tools to empower their everyday life in a wide range of activities such as communication, entertainment, home control etc. For example, they can use the eyes to type words and sentences, which are then translated into synthetic speech by software, thus enabling communication beyond blinking. A major improvement in the quality of life.

"COGAIN (Communication by Gaze Interaction) integrates cutting-edge expertise on interface technologies for the benefit of users with disabilities. COGAIN belongs to the eInclusion strategic objective of IST. COGAIN focuses on improving the quality of life for those whose life is impaired by motor-control disorders, such as ALS or CP. COGAIN assistive technologies will empower the target group to communicate by using the capabilities they have and by offering compensation for capabilities that are deteriorating. The users will be able to use applications that help them to be in control of the environment, or achieve a completely new level of convenience and speed in gaze-based communication. Using the technology developed in the network, text can be created quickly by eye typing, and it can be rendered with the user's own voice. In addition to this, the network will provide entertainment applications for making the life of the users more enjoyable and more equal. COGAIN believes that assistive technologies serve best by providing applications that are both empowering and fun to use."

A short introduction by Dr Richard Bates, a research fellow at the School of Computing Sciences at the De Montfort University in Leicester, can be downloaded either as presentation slides or paper.

The COGAIN network is a rich source of information on gaze interaction. A set of software tools developed within the network has been made publicly available for download. Make sure to check out the video demonstrations of various gaze interaction tools.

Participating organizations within the COGAIN network.

Tuesday, February 19, 2008

Inspiration: GazeSpace (Laqua et al. 2007)

In parallel with working on the prototypes I continuously search for and review papers and theses on gaze interaction methods / techniques, hardware and software development etc. I will post references to some of these on this blog. A great deal of research and theory on interaction / cognition lies behind the field of gaze interaction.

The paper below was presented last year at a conference held by the British Computer Society specialist group on Human Computer Interaction. What caught my attention is the focus on providing custom content spaces (canvas), good feedback and a dynamic dwell-time, something I intend to incorporate into my own gaze GUI components. Additionally, the idea of expanding the content canvas upon a gaze fixation is really nice and something I will attempt to do in .Net/WPF (initial work displays a set of photos that become enlarged upon fixation).

GazeSpace Eye Gaze Controlled Content Spaces (Laqua et al. 2007)

In this paper, we introduce GazeSpace, a novel system utilizing eye gaze to browse content spaces. While most existing eye gaze systems are designed for medical contexts, GazeSpace is aimed at able-bodied audiences. As this target group has much higher expectations for quality of interaction and general usability, GazeSpace integrates a contextual user interface, and rich continuous feedback to the user. To cope with real-world information tasks, GazeSpace incorporates novel algorithms using a more dynamic gaze-interest threshold instead of static dwell-times. We have conducted an experiment to evaluate user satisfaction and results show that GazeSpace is easy to use and a “fun experience”. Download paper (PDF)

About the author
Sven Laqua is a PhD Student & Teaching Fellow at the Human Centred Systems Group, a part of the Dept. of Computer Science at University College London. Sven has a personal homepage, university profile and a blog (rather empty at the moment).

Monday, February 11, 2008

GazeMemory v0.1a on its way

The extra time spent on developing the Custom Controls for Windows Presentation Foundation (WPF) paid off. What previously took days to develop can now be built within hours. Today I put together a gaze version of the classic game Memory, controlled by dwell-time (prolonged fixation). The "table" contains 36 cards, i.e. 18 unique symbols, each appearing twice. Fixating on a card starts a smooth animation that makes the globe on the front of the card light up, and after fixating long enough (500 ms) the card shows its symbol (flags in the first version). After the first card is selected, another is fixated and the two are compared. If they show the same symbol they are removed from the table. If not, they are turned back over again. The interface provides several feedback mechanisms. Upon a glance, the border around the button begins to shine; when the fixation lasts long enough, the dwell-time function is activated, illustrated by a white glow that smoothly fades in around the globe.
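The pair-matching rules described above can be sketched independently of the WPF controls. This is only an illustration of the game logic, not the actual GazeMemory code: the MemoryTable class and its members are hypothetical names, and Select stands in for whatever is called when a dwell completes on a card.

```csharp
using System;

// Hypothetical model of the Memory table: each symbol appears on exactly two cards.
public class MemoryTable
{
    private readonly int[] symbols;   // symbol id per card position
    private readonly bool[] removed;  // true once the card's pair has been matched
    private int firstPick = -1;       // index of the first card of this turn, or -1

    public MemoryTable(int pairs, Random rng)
    {
        symbols = new int[pairs * 2];
        removed = new bool[pairs * 2];
        for (int i = 0; i < symbols.Length; i++) symbols[i] = i / 2;

        // Fisher-Yates shuffle so the pairs end up at random positions.
        for (int i = symbols.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = symbols[i]; symbols[i] = symbols[j]; symbols[j] = tmp;
        }
    }

    public int CardCount { get { return symbols.Length; } }
    public int SymbolAt(int index) { return symbols[index]; }
    public bool IsRemoved(int index) { return removed[index]; }

    // Called when a dwell completes on a card.
    // Returns true only if this pick completed a matching pair.
    public bool Select(int index)
    {
        if (removed[index] || index == firstPick) return false; // ignore invalid picks
        if (firstPick < 0) { firstPick = index; return false; } // first card of the turn

        bool match = symbols[firstPick] == symbols[index];
        if (match) { removed[firstPick] = true; removed[index] = true; }
        firstPick = -1; // matched or not, the next selection starts a new turn
        return match;
    }
}
```

A table like the one in the post would be created with new MemoryTable(18, new Random()), giving 36 cards. Keeping the rules in a plain class like this makes them easy to test without an eye tracker attached.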

The Custom Control, which will be named GazeButton, is to be further developed to support more features such as a configurable dwell-time and feedback such as animations, colors etc. The time spent will be returned tenfold later on. I plan to release these components as open source as soon as they reach better and more stable performance (i.e. production quality with documentation).

Lessons learned so far involve Dependency properties, which are very important if you'd like to develop custom controls in WPF, as well as animation control and triggers. I am also getting more into DataBinding, which looks very promising so far.

Recommended guidelines for WPF custom controls
Three ways to build an image button
Karl on WPF

Screenshot of the second prototype of my GazeMemory game

Thursday, February 7, 2008

Midas touch, Dwell time & WPF Custom controls

How do you make a distinction between users glancing at objects and fixations with the intention to start a function? This is the well-known problem usually referred to as the "Midas touch problem". Interfaces that rely on gaze as the only means of input must support this style of interaction and be capable of distinguishing between glances, when the user is just looking around, and fixations made with the intention to start a function.

There are some solutions to this. A frequently occurring one is the concept of "dwell-time", where functions are activated simply by a prolonged fixation on an icon/button/image, usually in the range of 400-500 ms or so. This is a common solution when developing gaze interfaces for assistive technology for users suffering from amyotrophic lateral sclerosis (ALS) or other "paralyzing" conditions where no modality other than gaze input can be used. It does come with some issues: the prolonged fixation makes the interaction slower, since the user has to sit through the dwell-time, but mainly it adds stress to the interaction, since everywhere you look seems to activate a function.

As part of my intention to develop a set of components for the interface a dwell-based interaction style should be supported. It may not be the best method but I do wish to have it in the drawer just to have the opportunity to evaluate it and perform experiments with it.

The solution I've started working on goes something like this: upon a fixation on the object an event is fired. The event launches a new thread which determines whether the fixation stays within the area long enough to be considered a dwell (the gaze data is noisy). Halfway through, it checks whether the area has received enough fixations to continue; otherwise it aborts the thread. At the end it checks whether fixations have resided within the area for more than 70% of the time, and in that case it activates the function.
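The decision logic itself can be sketched without any threading. The following is a simplified, synchronous illustration rather than the implementation from my project: the DwellDetector class is a hypothetical name, gaze samples are fed in by the caller one at a time, and I assume the halfway check uses the same 70% threshold as the final check, which the description above does not specify.

```csharp
using System;

// Sketch of the dwell decision only: no threads, the caller feeds in gaze samples.
public class DwellDetector
{
    public enum State { Idle, Dwelling, Aborted, Activated }

    private readonly double dwellMs;        // total dwell window, e.g. 500 ms
    private readonly double requiredRatio;  // share of samples that must hit the target, e.g. 0.7
    private double startMs = -1;            // time of the sample that started the dwell, or -1
    private int inside, total;              // samples inside the target / all samples so far
    private bool halfwayChecked;

    public DwellDetector(double dwellMs, double requiredRatio)
    {
        this.dwellMs = dwellMs;
        this.requiredRatio = requiredRatio;
    }

    // Feed one gaze sample: its timestamp and whether it landed inside the target area.
    public State AddSample(double timeMs, bool insideTarget)
    {
        if (startMs < 0)
        {
            if (!insideTarget) return State.Idle; // nothing to do until the target is hit
            startMs = timeMs; inside = 0; total = 0; halfwayChecked = false;
        }

        total++;
        if (insideTarget) inside++;
        double elapsed = timeMs - startMs;
        double ratio = (double)inside / total;

        // Halfway check (assumed threshold): give up early if too few samples hit the target.
        if (!halfwayChecked && elapsed >= dwellMs / 2)
        {
            halfwayChecked = true;
            if (ratio <= requiredRatio) { startMs = -1; return State.Aborted; }
        }

        // Final check: activate only if more than the required share stayed inside.
        if (elapsed >= dwellMs)
        {
            startMs = -1;
            return ratio > requiredRatio ? State.Activated : State.Aborted;
        }
        return State.Dwelling;
    }
}
```

With a 50 Hz tracker one would call AddSample every 20 ms, for example on new DwellDetector(500, 0.7). Counting samples rather than demanding an unbroken fixation is what lets the occasional noisy sample outside the target pass without resetting the dwell.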

Working with threads can be somewhat tricky and some time for tweaking remains. In addition, getting the interaction to feel right and providing suitable feedback is important. I'm investigating means of a) providing feedback on which object is selected, b) indicating that the dwell process has started and what state it is in, and c) using animations to help the fixation remain in the center of the object.

Coding> Windows Presentation Foundation and Custom Controls

Other progress has been made in learning Windows Presentation Foundation (WPF) user interface development. The Microsoft Expression Blend is a tool that enables a graphical design of components. By creating generic objects such as gaze-buttons the overall process will benefit in the longer run. Instead of having all of the objects defined in a single file it is possible to break it into separate projects and later just include the component DLL file and use one line of code to include it in the design.

Additional styling and modification of the object's size can then be performed as if it were a default button. Furthermore, by creating dependency properties in the C# code behind each control/component, specific functionality can be easily accessed from the main XAML design layout. It does take a bit longer to develop but will be worth it tenfold later on. More on this further on.

Microsoft Expression Blend. Good companion for WPF and XAML development.

Windows Presentation Foundation (WPF) has proven to be more flexible and useful than it seemed at first glance. You can really do complex things swiftly with it. The following links have proven to be great resources for learning more about WPF.

Lee Brimelow, experienced developer who took part in the new Yahoo Messenger client. Excellent screencasts that take you through the development process.

Kirupa Chinnathambi, introduction to WPF, Blend and a nice blog.

Microsoft MIX07, 72 hours of talks about the latest tools and tricks.

Wednesday, February 6, 2008

Better feedback!

Since having a pointer representing the gaze position on the screen becomes distracting, some other form of feedback is needed. As mentioned before, having a pointer will cause your eyes to more or less automatically fixate on the moving object. Remember, the eyes are never still, and on top of that the eye tracker creates additional jitter.

What we need is a subtle way of showing the user that the eye tracker has mapped the gaze coordinates to the same location as he/she is looking at. It's time to try to make it look a bit nicer than the previous example, where the whole background color of the button would change on a gaze fixation.

The reason for choosing to work with Windows Presentation Foundation is that it provides rich functionality for building modern interfaces. For example, you can define a trigger for an event, such as GazeEnter (i.e. MouseEnter) on a button, and then apply a built-in graphical effect to the object. These effects are rendered in real time, such as a glowing shadow around the object or a Gaussian filter that gives the object an out-of-focus effect. Very useful for this project. Let's give it a try.

This is the "normal" button. Notice the out-of-focus effect on the globe in the center.

Upon receiving a glance, the "Image.IsMouseOver" event is triggered. This starts the built-in rendering function BitmapEffect OuterGlowBitmapEffect, which generates a nice red glowing border around the button.

The XAML design code (screenshot) for the button. Click to enlarge.
Notice the GlowColor and GlowSize attributes to manipulate the rendering of the effects.
To apply this to the button we define the attribute Style="{StaticResource GlowButton}" inside the button tag. Further, the globe in the center can be brought back into focus and highlighted with a green glow surrounding it inside the button canvas.

The globe is defined as an image object. Upon focus the triggers will set the gaussian blur effect to zero, which means in focus. The glow effect produces the green circle surrounding the globe.

Putting it all together in a nice looking interface, using the Glass Window style, it looks promising and is a real improvement over yesterday's boring interface. Providing a small surrounding glow or giving the image focus upon fixation is much better than changing the whole button color. The examples here are perhaps somewhat less subtle than they should be, just to demonstrate the effect.

Screenshots of the second prototype with new UI effects and events.

The "normal" screen with no gaze input.

OnGazeEnter. Generate a glowing border around the button

The other example with a green glow and the globe in full focus.

Tuesday, February 5, 2008

Enabling feedback (OnGazeOver)

By taking away the mouse pointer we also take away the feedback on where the actual "pointer" is. We know where we are looking, but this might not be where the eye tracker has measured the gaze vector to be. There is a clear need for feedback to be sure which object is fixated, i.e. that the right object is chosen.

In the previous post we gained control over the mouse pointer and now use it in secret (hidden) to point with the gaze X and Y coordinates. This gives us access to a whole range of triggers and functions such as MouseEnter or IsMouseOver (replace "mouse" with "gaze" and you get the idea).

For the next small test it's time to introduce Windows Presentation Foundation (WPF) and XAML layout templates. In my opinion the best to come out of Microsoft in a while.

It enables you to create applications (fast) that look and feel like 2008. No more battleship gray control panels and boring layouts. The first advantage I see is the separation of design into XML files (not all that different from HTML) and code in traditional .cs files, plus a lot of support for things you really want to do (animations, 3D, Internet, media etc.). If you develop Windows applications and have not got around to testing WPF yet, you should certainly give it a spin.

For example, the button we use for providing the user with feedback on gaze position can be defined like this:

Screenshot, code will render a button in the browser.

The built-in trigger supports the IsMouseOver event that the UI component provides, and there are many types of behavior supported. All the styles and behaviors can be defined in groups and templates, which enables a very powerful structure that's easy to maintain. Additionally, it is rather easy to define your own events that should be fired on e.g. onGazeEnter.

While exploring I've placed buttons like this in 4x4, 5x5, 8x6 and 9x9 grids to test how easily they can be discriminated. The 48-button version (8x6) seemed to have a reasonable distance between the objects and a large enough button area for stable selection. Real experiments with a whole range of users are needed to turn this into design guidelines (further down the line).

Initial version. Providing feedback on gaze position.

Redirecting the Gaze X/Y to replace Mouse X/Y

To utilize the wide range of functionality that has been developed for mouse based interaction in our gaze interaction software we need to replace the mouse X/Y coordinates with the gaze X/Y.

This requires one to dig rather deep into the Windows system and DLL files. The function Move takes two integers: you guessed right, the X and Y position of the mouse pointer.
This method is to be called every time the tracker provides us with new gaze data, so that the pointer is moved to the new position.

So first we modify the EyeTrackerServer class:
// The decoded string from tracker UDP stream
datareceived = System.Text.Encoding.ASCII.GetString(received);

// Create an instance of the object that redirects the gaze to the mouse
RedirectGazeToMouse gazeMouse = new RedirectGazeToMouse();

if (datareceived.Length > 0)
{
    // Extract the X & Y coordinates from the UDP stream from the tracker

    // Move the mouse pointer according to the gaze X & Y coordinates
    // (Move expects X first, then Y)
    gazeMouse.Move(gazeData.GazePositionX, gazeData.GazePositionY);
}

It is very distracting to actually see the mouse pointer moving on the screen, updated 50 times per second, because you would fixate on it and it would move slightly (remember, eye trackers are not that precise), so you would end up "chasing" it around. Let's hide the mouse pointer.
In the main application window (typically Window1.xaml.cs or similar) this is done by placing a single line in the constructor:

public Window1()
{
    // Hide the mouse cursor since it is replaced by gaze coordinates
    Cursor = Cursors.None;

    // Initialize and start the Eye Tracker
    myServer = new EyeTrackerServer();
}

And here is the RedirectGazeToMouse class. (many thanks to

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;
using System.Windows;

namespace GazeHero
{
    public class RedirectGazeToMouse
    {
        [DllImport("user32.dll", EntryPoint = "SendInput", SetLastError = true)]
        static extern uint SendInput(uint nInputs, INPUT[] pInputs, int cbSize);

        [DllImport("user32.dll", EntryPoint = "GetMessageExtraInfo", SetLastError = true)]
        static extern IntPtr GetMessageExtraInfo();

        private enum InputType
        {
            INPUT_MOUSE = 0,
            INPUT_KEYBOARD = 1,
            INPUT_HARDWARE = 2,
        }

        [Flags]
        private enum MOUSEEVENTF
        {
            MOVE = 0x0001,        // mouse move
            LEFTDOWN = 0x0002,    // left button down
            LEFTUP = 0x0004,      // left button up
            RIGHTDOWN = 0x0008,   // right button down
            RIGHTUP = 0x0010,     // right button up
            MIDDLEDOWN = 0x0020,  // middle button down
            MIDDLEUP = 0x0040,    // middle button up
            XDOWN = 0x0080,       // x button down
            XUP = 0x0100,         // x button up
            WHEEL = 0x0800,       // wheel button rolled
            VIRTUALDESK = 0x4000, // map to entire virtual desktop
            ABSOLUTE = 0x8000,    // absolute move
        }

        [Flags]
        private enum KEYEVENTF
        {
            EXTENDEDKEY = 0x0001,
            KEYUP = 0x0002,
            UNICODE = 0x0004,
            SCANCODE = 0x0008,
        }

        [StructLayout(LayoutKind.Sequential)]
        private struct MOUSEINPUT
        {
            public int dx;
            public int dy;
            public int mouseData;
            public int dwFlags;
            public int time;
            public IntPtr dwExtraInfo;
        }

        [StructLayout(LayoutKind.Sequential)]
        private struct KEYBDINPUT
        {
            public short wVk;
            public short wScan;
            public int dwFlags;
            public int time;
            public IntPtr dwExtraInfo;
        }

        [StructLayout(LayoutKind.Sequential)]
        private struct HARDWAREINPUT
        {
            public int uMsg;
            public short wParamL;
            public short wParamH;
        }

        // In the Win32 INPUT struct the payloads share the same memory (a C union),
        // so they overlap at offset 0 here. Without this the size passed to
        // SendInput would not match what Windows expects.
        [StructLayout(LayoutKind.Explicit)]
        private struct INPUTUNION
        {
            [FieldOffset(0)] public MOUSEINPUT mi;
            [FieldOffset(0)] public KEYBDINPUT ki;
        }

        [StructLayout(LayoutKind.Sequential)]
        private struct INPUT
        {
            public int type;
            public INPUTUNION u;
        }

        // Moves the cursor to a specific point on the screen.
        // x: X coordinate of the position in pixels
        // y: Y coordinate of the position in pixels
        // Returns 0 if there was an error, otherwise 1.
        public uint Move(int x, int y)
        {
            double ScreenWidth = System.Windows.SystemParameters.PrimaryScreenWidth;
            double ScreenHeight = System.Windows.SystemParameters.PrimaryScreenHeight;

            // Absolute moves use a 0..65535 range across the primary screen
            INPUT input_move = new INPUT();
            input_move.type = (int)InputType.INPUT_MOUSE;
            input_move.u.mi.dx = (int)Math.Round(x * (65535 / ScreenWidth), 0);
            input_move.u.mi.dy = (int)Math.Round(y * (65535 / ScreenHeight), 0);

            input_move.u.mi.mouseData = 0;
            input_move.u.mi.dwFlags = (int)(MOUSEEVENTF.MOVE | MOUSEEVENTF.ABSOLUTE);
            INPUT[] input = { input_move };

            return SendInput(1, input, Marshal.SizeOf(input_move));
        }

        // Simulates a simple mouse click at the current cursor position.
        // Returns 2 if all went well. Anything below indicates an error.
        public static uint Click()
        {
            INPUT input_down = new INPUT();
            input_down.type = (int)InputType.INPUT_MOUSE;
            input_down.u.mi.dx = 0;
            input_down.u.mi.dy = 0;
            input_down.u.mi.mouseData = 0;
            input_down.u.mi.dwFlags = (int)MOUSEEVENTF.LEFTDOWN;
            INPUT input_up = input_down;
            input_up.u.mi.dwFlags = (int)MOUSEEVENTF.LEFTUP;
            INPUT[] input = { input_down, input_up };

            return SendInput(2, input, Marshal.SizeOf(input_down));
        }
    }
}
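The scaling step inside Move can also be looked at on its own. The sketch below is not part of the original class: GazeScaling and ToAbsolute are hypothetical names, and the clamping is an added assumption (a noisy gaze sample may land slightly outside the screen). It uses the same 65535 / screen-size factor as Move to map a pixel coordinate into the 0..65535 range used for absolute mouse moves.

```csharp
using System;

public static class GazeScaling
{
    // Maps a pixel coordinate to the 0..65535 absolute range.
    // The pixel value is first clamped to [0, screenSize - 1] (an assumption,
    // so that off-screen gaze samples do not produce out-of-range values).
    public static int ToAbsolute(int pixel, double screenSize)
    {
        if (screenSize <= 1)
            throw new ArgumentOutOfRangeException("screenSize");

        double clamped = Math.Max(0, Math.Min(pixel, screenSize - 1));
        return (int)Math.Round(clamped * (65535 / screenSize), 0);
    }
}
```

For a 1280 pixel wide screen, GazeScaling.ToAbsolute(0, 1280) gives 0 and any X beyond the right edge gives the same value as pixel 1279.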



Mouse vs. Gaze

Gaze interaction differs from mouse interaction in a few main respects. First, it is not mechanical. This means that the gaze X/Y position is, more or less, always moving, while a mouse pointer stays exactly where it is if left alone. Second, there is no clicking to activate functions. An initial idea might be to blink, but that is not a lasting solution since we blink reflexively all the time. Third, we use our eyes to investigate the environment. Performing gestures such as looking left-right-left is not natural. In other words, performing motor tasks with our eyes does not feel right. The eyes keep track of the state of the world and the interface should provide just this.

It's time to investigate how to optimally provide the user with a good sense of feedback. In most areas of software interfaces the phenomenon of roll-over highlighting has been very successful. For example, when you move the mouse over buttons they change background color or show a small animation. This type of feedback is crucial for building good interfaces. It shows the user exactly what function can be executed and gives a good clue on what to expect. My initial idea is to throw a grid of gray buttons on the screen, let's say 6 x 6, and change their background color when the gaze enters.
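Independent of any UI toolkit, "the gaze enters a button" can be thought of as a lookup from a gaze point to a grid cell. The following is only a sketch under assumptions: GazeGrid and CellAt are hypothetical names, and the buttons are assumed to tile the whole screen with no gaps or margins.

```csharp
using System;

public static class GazeGrid
{
    // Returns the index (row * columns + column) of the grid cell that
    // contains the gaze point, or -1 if the point is outside the screen.
    public static int CellAt(int gazeX, int gazeY,
                             int screenWidth, int screenHeight,
                             int columns, int rows)
    {
        if (gazeX < 0 || gazeY < 0 || gazeX >= screenWidth || gazeY >= screenHeight)
            return -1;

        int column = gazeX * columns / screenWidth; // integer division floors
        int row = gazeY * rows / screenHeight;
        return row * columns + column;
    }
}
```

With a 6 x 6 grid on a 1200 x 600 screen, each cell is 200 x 100 pixels, so a gaze point at (250, 50) falls in cell 1 (first row, second column). A button would be highlighted whenever the returned index changes.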

How to solve this?
Even if we are developing a new type of interface it doesn't necessarily mean that we should turn our back on the world as we know it. Most programming languages today support the mouse and keyboard as the main input devices. This provides us with a range of functionality that could be useful, even for developing gaze interfaces.

For example, most UI components such as buttons or images have methods and properties that are bound to mouse events. These are usually named OnMouseOver, MouseEnter or similar. These events can be used to execute specific parts of the code, for example to change the color of a button or detect when a user clicks an object.

Somehow we need to take control over the mouse and replace the X and Y coordinates it receives from the input device with the gaze X/Y coordinates we have extracted from the eye tracker. After a day of searching, trying and curing headaches this turns out to be possible.

Follow me to the next post and I'll show how it's done =)

Day 3:1 - Quick fix for noisy data

Yesterday I was able to plot the recorded gaze data and visually illustrate it by drawing small rectangles where my gaze was positioned. As mentioned, the gaze data was rather noisy. Upon fixating on a specific area the points would still spray out within an area about the size of a soda cap. I decided to have a better look at the calibration process. This is done in the IView program supplied by SMI. By running the process in a full-size window I was able to get a better calibration, and increasing the number of calibration points improved the result further.

Still, the gaze data is rather noisy, which is a well-known problem within the field. It has previously been solved by applying an algorithm to smooth the X and Y position. My initial solution is to compare the received data with the last reading. If it is within 20 pixels of the previous position (checked separately for X and Y) it is considered to be the same spot as the previous fixation.

    if (isSmoothGazeDataOn)
    {
        // If new gaze X point is outside plus/minus the smooth-radius, set new gaze pos.
        if (newGazeX > this.gazePositionX + this.smoothRadius ||
            newGazeX < this.gazePositionX - this.smoothRadius)
        {
            this.gazePositionX = newGazeX;
        }

        // If new gaze Y point is outside plus/minus the smooth-radius, set new gaze pos.
        if (newGazeY > this.gazePositionY + this.smoothRadius ||
            newGazeY < this.gazePositionY - this.smoothRadius)
        {
            this.gazePositionY = newGazeY;
        }
    }
    else // Gaze position is equal to pure data. No filtering.
    {
        this.gazePositionX = newGazeX;
        this.gazePositionY = newGazeY;
    }

It is a very simple function for stabilizing the gaze data (somewhat). A solution with a buffer and a function that averages over several readings might be better, but for now this does the trick (the gaze plotter became more stable upon fixating on one specific area).
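The buffered alternative mentioned above could look roughly like this. It is a sketch, not code from the project: GazeAverager, its window size and its property names are all made up for illustration. It keeps the last N readings and reports their mean, which trades a little lag for a steadier position.

```csharp
using System;
using System.Collections.Generic;

public class GazeAverager
{
    private readonly Queue<int> xs = new Queue<int>();
    private readonly Queue<int> ys = new Queue<int>();
    private readonly int windowSize;
    private long sumX;
    private long sumY;

    public GazeAverager(int windowSize)
    {
        if (windowSize < 1)
            throw new ArgumentOutOfRangeException("windowSize");
        this.windowSize = windowSize;
    }

    // Mean of the readings currently in the buffer
    public int SmoothedX { get; private set; }
    public int SmoothedY { get; private set; }

    // Adds a raw reading and drops the oldest one once the buffer is full
    public void Add(int x, int y)
    {
        xs.Enqueue(x);
        ys.Enqueue(y);
        sumX += x;
        sumY += y;

        if (xs.Count > windowSize)
        {
            sumX -= xs.Dequeue();
            sumY -= ys.Dequeue();
        }

        SmoothedX = (int)Math.Round((double)sumX / xs.Count);
        SmoothedY = (int)Math.Round((double)sumY / ys.Count);
    }
}
```

At 50 samples per second a window of, say, 5 readings covers a tenth of a second. A larger window is steadier but makes the position trail behind real eye movements, so the size would have to be tuned by trying it out.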

Day 2 - Plotting the gaze

After successfully hooking up a socket to listen to the UDP stream from the SMI IViewX RED eye tracker the next objective was to draw or plot the gaze on the screen so that I could visually see where I was gazing (!) Or, where the eye tracker would suggest that my gaze was directed.

The UDP stream provided me with the X and Y coordinates of my gaze. Now I needed a Windows program that would enable me to start the client, receive the data and plot it graphically. In order to do this I created a delegate for an event handler. This means that whenever the program receives a new gaze position it fires an event. The main program in turn registers a listener for this event, which calls a function that draws a small box on the screen based on the X and Y coordinates collected.


public delegate void GazeChangedEventHandler(object source, EventArgs e);
public GazeChangedEventHandler onGazeChange;
In addition to this I decided to create an object "GazeData" that would carry a function to extract the X and Y position from the datastream and set this as two integers. These were to be named GazePositionX and GazePositionY.

So, in the main loop of the program where the raw data string was previously just printed to the console I instead passed it on to a function.


    datareceived = System.Text.Encoding.ASCII.GetString(received);

    if (datareceived.Length > 0)
    {
        extractGazePosition(datareceived);
    }

And then the function itself. After parsing and setting the X and Y, the function fires the event "onGazeChange":

    public void extractGazePosition(string data)
    {
        // Let the GazeData object parse the raw string into X and Y
        gazeData.extractTrackerData(data);

        if (onGazeChange != null)
            onGazeChange(this, EventArgs.Empty);
    }
The GazeData object contains the function for extracting the X and Y and property set/getters
    public void extractTrackerData(string dataStr)
    {
        char[] seperator = { ' ' };
        string[] tmp = dataStr.Split(seperator, 10);
        this.TimeStamp = Convert.ToInt32(tmp[1]);
        this.gazePositionX = Convert.ToInt32(tmp[2]);
        this.gazePositionY = Convert.ToInt32(tmp[4]);
    }
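Convert.ToInt32 throws if a datagram is truncated or malformed. A more defensive variant of the same parsing might look like the sketch below. GazeParser and TryParse are hypothetical names, and the sample string in the usage note is invented; only the token positions (1 for the timestamp, 2 for X, 4 for Y) are taken from the function above.

```csharp
using System;

public static class GazeParser
{
    // Parses a space-separated tracker line. Returns false instead of
    // throwing when tokens are missing or are not integers.
    public static bool TryParse(string dataStr, out int timeStamp, out int x, out int y)
    {
        timeStamp = 0;
        x = 0;
        y = 0;

        if (string.IsNullOrEmpty(dataStr))
            return false;

        // Strip padding such as trailing zero bytes from a fixed-size buffer
        string cleaned = dataStr.Trim('\0', ' ', '\r', '\n');
        string[] tmp = cleaned.Split(new char[] { ' ' }, 10);
        if (tmp.Length < 5)
            return false;

        return int.TryParse(tmp[1], out timeStamp)
            && int.TryParse(tmp[2], out x)
            && int.TryParse(tmp[4], out y);
    }
}
```

For an invented line such as "TAG 12345 400 0 300", TryParse returns true with timeStamp 12345, x 400 and y 300, while a line with fewer than five tokens simply returns false.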
The main windows application would create a listener for the OnGazeChange event:
 myServer.onGazeChange +=
new EyeTrackerServer.GazeChangedEventHandler(onGazeChange);
And when the server would receive a new gaze reading it would signal to the eventHandler to fire this function that draws a rectangle on the screen

 public void onGazeChange(object source, EventArgs e)

    public void PlotGazePosition(int x, int y)
    {
        Graphics G = this.CreateGraphics();
        Pen myPen = new Pen(Color.Red);
        Random rnd = new Random((int)DateTime.Now.Ticks);

        // Little bit of random colors just for fun...
        myPen.Color = Color.FromArgb(
            (int)rnd.Next(0, 255),
            (int)rnd.Next(0, 255),
            (int)rnd.Next(0, 200));

        // The shape of the rectangles are slightly random too, just to make it artistic..
        G.DrawRectangle(myPen, x, y, (int)rnd.Next(5, 25), (int)rnd.Next(5, 25));
    }
Happily gazing at my own gaze pattern and trying to draw with it on the screen, it was just about time to wrap up day two. So far, so good. However, I did notice that the gaze data was full of jitter. Whenever I fixated on one specific point the plotter would jump around in an area about the size of a.. Coca-Cola cap. Was this caused by the tracker, or is it normal eye behavior? In general, our eyes are never still. When we fixate on objects, small eye movements called microsaccades take place. Supposedly (there is a debate) this is how we are able to keep items in focus. If the eye were completely fixated the image would slowly fade away (reminds me of some plasma TV screens that do the same so that the image does not "burn" into the panel).

Day One - Getting the gaze position

The SMI IViewX RED eye tracker streams the gaze coordinates on a UDP stream configured on port 4444. The following code is how I managed to hook the C# client up to access the stream and simply print it to the console window. This is the initial version that concluded the first day. I got the gaze coordinates from the tracker into my C# code. Good enough.

First output from the UDP Eye Tracker client/server program

using System;
using System.Collections.Generic;
using System.Text;
using System.Net;
using System.Net.Sockets;
using System.Threading;

namespace GazeStar
{
    class EyeTrackerServer
    {
        private const int udpPort = 4444;
        public Thread UDPThread;

        public EyeTrackerServer()
        {
            try
            {
                // Listen on a separate thread so the caller is not blocked
                UDPThread = new Thread(new ThreadStart(StartListen));
                UDPThread.Start();
                Console.WriteLine("Started thread...");
            }
            catch (Exception e)
            {
                Console.WriteLine("An UDP error occurred.." + e.ToString());
            }
        }

        public void StartListen()
        {
            IPHostEntry localHost;

            try
            {
                Socket soUdp = new Socket(
                    AddressFamily.InterNetwork,
                    SocketType.Dgram, ProtocolType.Udp);

                try
                {
                    Byte[] localIp = { 127, 0, 0, 1 };
                    IPAddress localHostIp = new IPAddress(localIp);
                    localHost = Dns.GetHostEntry(localHostIp);
                }
                catch (Exception e)
                {
                    Console.WriteLine("Localhost not found!" + e.ToString());
                    return;
                }

                // Bind the socket to the port the tracker streams to
                IPEndPoint localIpEndPoint = new IPEndPoint(
                    localHost.AddressList[0], udpPort);
                soUdp.Bind(localIpEndPoint);

                String datareceived = "";

                while (true)
                {
                    Byte[] received = new byte[256];
                    IPEndPoint tmpIpEndPoint = new IPEndPoint(localHost.AddressList[0],
                        udpPort);

                    EndPoint remoteEp = (tmpIpEndPoint);
                    int bytesReceived = soUdp.ReceiveFrom(received, ref remoteEp);

                    // Decode only the bytes that were actually received
                    datareceived = System.Text.Encoding.ASCII.GetString(received, 0, bytesReceived);
                    Console.WriteLine("Sample client is connected via UDP!");

                    if (datareceived.Length > 0)
                    {
                        Console.WriteLine(datareceived);
                    }
                }
            }
            catch (SocketException se)
            {
                Console.WriteLine("A socket error has occurred: " + se.ToString());
            }
        }

        static void Main(string[] args)
        {
            EyeTrackerServer eyesrv = new EyeTrackerServer();
        }
    }
}
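For comparison, the same "read a datagram and turn it into text" step can be written more compactly with the UdpClient class from System.Net.Sockets. This is a sketch rather than the code used in the project: UdpGazeReader and ReceiveOne are hypothetical names, and it assumes the tracker keeps sending plain ASCII datagrams to the port the client is bound to.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class UdpGazeReader
{
    // Blocks until one datagram arrives on the given client and
    // returns its payload decoded as ASCII text.
    public static string ReceiveOne(UdpClient client)
    {
        // Receive fills in the address of whoever sent the datagram
        IPEndPoint sender = new IPEndPoint(IPAddress.Any, 0);
        byte[] payload = client.Receive(ref sender);
        return Encoding.ASCII.GetString(payload, 0, payload.Length);
    }
}
```

A listener for the tracker stream would then create the client once with new UdpClient(4444) and call UdpGazeReader.ReceiveOne(client) in a loop, printing each returned string to the console.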


Day One - The Eye Tracker

The chosen platform for developing the software is Microsoft Visual Studio 2008 using .NET 3.5 and the C# (C-Sharp) programming language. Not that I'm very experienced with it (just one course), but it is similar to Sun Microsystems' Java language. Besides, the development environment is really nice and there's a large amount of online resources available. Since the box is already running XP there is absolutely no reason to mess with it (personally I run a Mac Pro with OS X, but that's another story).

The SMI IView RED eye tracker comes with the IView software where you can calibrate the system against points on the screen as well as other configuration aspects. After turning the tracker on and launching the calibration process I could see that the tracker is working.

Screenshot of the SMI IView program.

The calibration dots to the left are usually shown in full screen. To the left you can see how the eye tracker measures the reflection of the IR lights and combines this with the location of the pupil to detect and measure eye movements. This is usually referred to as corneal reflection. The infrared light shone in my face is outside the spectrum that I can perceive. More information on eye tracking.

Clearly the computer somehow receives the data since it's drawing circles on the screen indicating where my gaze is directed. How do I get hold of this data?

Upon an external inspection I find one firewire cable going from the tracker to the computer and two cables from the image processing box to the tracker. Seems that I must read from the firewire port. Time to Google that.

Turns out that there is a Universal Eye Tracking Driver which has been developed by Oleg Spakov at the University of Tampere, Finland. It should be a good solution, since I could then easily move the application to any other supported system, including those from the Swedish firm Tobii Technology. After downloading and installing the driver (which comes with source code, great!) I compiled the test application in Visual Studio to try it out. When trying to choose which tracker and what port I was using, it turned out that there was no support for firewire. Seems like the previous version of IView was using USB. After some correspondence with Oleg and some tries to work around the issue it was time to stop banging my head against the wall and RTFM like one should.

Never was much for manuals in the first place. Especially when they are 400 pages thick filled with tables of ASCII codes and references to other codes or pages. Suppose it is very much to the point, if you are a German engineer that is.. Ok. Found it. The tracker data can be sent via ethernet if IView is configured to do so. Said and done, configured IView to stream data by the UDP protocol on port 4444.

Had decided not to leave until I had the data. How do I open a datagram socket in C#? A quick search on Google should solve it. Found a piece of code that seemed to work, using a thread to open a socket to the designated port and then just read whatever data came along. If I could print the data to the console window it.. would just be an awesome end to day one.

See next post for the C# solution..

Day One - Introduction

Today I met up with Kenneth Holmqvist, who is the laboratory director of the HumLab at Lund University. Kenneth, who has long experience in the field, held a course last semester in Eye Tracking Methodology, in which I participated as part of my Master's in Cognitive Science at Lund University.

The HumLab, or Humanities Laboratory, is located in the new language and literature center, which was built just a few years ago. The facilities certainly are top-notch: modern Scandinavian design, high-quality materials and a high technical standard (wireless internet access, access control, perfect climate and air).

The laboratory matches this standard by providing Lund University with advanced technical solutions and expertise. A range of studies takes place here. A perfect home for someone into cognitive sciences including psychology, linguistics and why not Human-Computer Interaction.

My background lies in software development, which I previously studied at the Department of Informatics, where I completed a BA in Software Design/Construction. My interest in Cognitive Science and Human-Computer Interaction developed during an EAP exchange to the University of California, San Diego in 2006-2007. The blend of Cognition and Neuroscience, an understanding of the nuts and bolts that enable our perception and behavior, combined with novel interface technology and interaction methods, is an extremely interesting field. Many thanks to the cog.sci. faculty at UCSD for inspiring classes (Hollan, Boyle, Sereno, Chiba).

Kenneth has practical experience with a range of eye trackers. They come in many shapes (head mounted, high speed, remote), all of which are present in the HumLab. He demonstrated a brand new SMI IView RED remote system connected to a powerful Windows XP machine. This is the setup that I will develop a Gaze Interaction Interface on.

Day one was far from over, let's get started in another post..