Using affordable eye tracking methods to study reading: the role of sampling rate

Talk presented at the 22nd European Conference on Eye Movements, Maynooth University

Bernhard Angele¹, Zeynep Gunes Ozkan², Marina Serrano-Carot¹, and Jon Andoni Duñabeitia¹

1. Universidad Nebrija; 2. University of Valencia

Introduction

Presentation available at: https://bangele.quarto.pub/ecem-2024-sampling-rate. Scan QR code for link.

Eye movements are a window into the reading process

Recording eye movements has led to a wealth of findings about the cognitive processes involved in reading
For example, we know that skilled readers pre-process upcoming words extensively (e.g. McConkie & Rayner, 1975) and that readers can extract orthographic, phonological, and semantic information from upcoming words (e.g. Schotter et al., 2012)

Eye movements are a window into the reading process

Models of eye-movement control in reading have been developed that provide theoretical accounts for many of these findings (e.g. Bicknell & Levy, 2010; Engbert et al., 2002; Reichle et al., 1998; Snell et al., 2018)
However, the eye-movement literature is dominated by studies from a small number of countries, investigating reading in a limited number of languages

Illustrating the eye-movement gap

Data from Scopus: Searching for “eye” and “track(er/ing)” or “movement(s)” and “reading” in the title, abstract, and keywords
Publications from 1974 to 2024
Counting country affiliations by author (publications can have multiple authors and author affiliations)
Excluding affiliations with missing country names
Details: Angele & Duñabeitia (2024)
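The counting rule above can be sketched in a few lines. This is only an illustration of the rule as described (one count per author affiliation, missing countries excluded); the record layout, field names, and example entries are hypothetical and are not the Scopus export format or the authors' actual script.

```python
from collections import Counter

# Hypothetical records: one entry per publication, listing the country of
# each author affiliation (None marks a missing country name).
publications = [
    {"title": "Paper A", "affiliation_countries": ["Spain", "Spain", "United Kingdom"]},
    {"title": "Paper B", "affiliation_countries": ["China", None]},
    {"title": "Paper C", "affiliation_countries": ["United Kingdom"]},
]


def count_affiliations(pubs):
    """Count country affiliations by author, skipping missing countries."""
    counts = Counter()
    for pub in pubs:
        for country in pub["affiliation_countries"]:
            if country:  # exclude affiliations with missing country names
                counts[country] += 1
    return counts


country_counts = count_affiliations(publications)
for country, n in country_counts.most_common():
    print(country, n)
```

Because affiliations are counted per author, a single publication can contribute several counts, to one country or to several.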

Publications 1974-2024

Publications 1974-2024 (Cartogram)

Evolution over time: 1974 – 2000

Cartogram: 1974 – 2000

Evolution over time: 2001 – 2010

Cartogram: 2001 – 2010

Evolution over time: 2011 – 2020

Cartogram: 2011 – 2020

Evolution over time: 2021 – 2024

Cartogram: 2021 – 2024

Summary: The eye-movement gap

For the last 50 years, the literature on eye movements in reading has been dominated by a handful of countries
This is still true for the most recent period (2021 – 2024), with one major change: China has become a major center of reading research
There are also a few more European countries with a significant number of publications
For the vast majority of other countries, the situation now is the same as it was in the 1970s, 80s, and 90s

What are we missing?

General issue of WEIRD research (Henrich et al., 2010): Western participants may not be representative of all readers or even the majority of readers
English, German, French, Spanish, Italian, etc. are similar languages in many respects and are all written in the same script
Studying just one more language (Chinese) has forced us to think about issues such as word segmentation, processing of character components, the relationship between semantic and orthographic processing, and many more.

What are we missing?

We can also appreciate many similarities between reading in English (and other alphabetic languages) and Chinese
What other aspects of reading do we take for granted?
How can we identify universals if we only study a handful of languages?

Why is there so little eye-movement research except in a few countries?

Eye-trackers are expensive
Modern infra-red based eye-trackers are very complex devices
Most expensive component: High-speed, high-resolution cameras
Sampling rate is a key bottleneck
Can we study reading at lower sampling rates?

Our approach

We take a practical approach: What is the lowest sampling rate that still allows us to find evidence of cognitive processing?
We need a benchmark effect – a phenomenon that is well-studied and whose existence (and effect size) is clear
The word frequency effect on fixation duration is ideal for this

Method

32 participants read 400 sentences in Spanish
Eye movements were recorded with an SR Research Eyelink Portable Duo
Four sampling rates: 250 Hz, 500 Hz, 1000 Hz, and 2000 Hz (100 sentences each)
Frequency manipulation: each sentence has a target word that was manipulated to be either
- high frequency (mean frequency 47/million)
- low frequency (mean frequency 2/million)
The context up to the target word was identical for both versions of the sentence.
- Context after the target word was allowed to vary.

Results: Example trial 1000 Hz

At 1000 Hz, this trial has 6393 samples.

Results: Example trial 1000 Hz

Crop out end of trial

At 1000 Hz, this trial has 6393 samples.

Example trial 1000 Hz

With Eyelink-detected fixations

At 1000 Hz, this trial has 6393 samples.

Example trial 500 Hz

With Eyelink fixations

At 500 Hz, this trial has 2657 samples.

Example trial 250 Hz

With Eyelink fixations

At 250 Hz, this trial has 1029 samples.

Results: Example trial 2000 Hz

With Eyelink fixations

At 2000 Hz, this trial has 11416 samples.

Results: Means

250 and 500 Hz
	First fixation duration (FFD, ms)		Gaze duration (GD, ms)
	Mean	SD	Mean	SD
250 Hz
high frequency	242	85	332	169
low frequency	245	85	373	209
Effect	3		40
500 Hz
high frequency	238	84	329	182
low frequency	246	92	366	229
Effect	8		36

1000 and 2000 Hz
	First fixation duration (FFD, ms)		Gaze duration (GD, ms)
	Mean	SD	Mean	SD
1000 Hz
high frequency	234	83	319	171
low frequency	244	85	365	222
Effect	10		45
2000 Hz
high frequency	230	80	313	165
low frequency	236	85	344	197
Effect	7		31
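For readers less familiar with the two measures in these tables, the sketch below computes them for one trial using their standard definitions: first fixation duration is the duration of the first first-pass fixation on the target word, and gaze duration is the sum of all first-pass fixations on the target before the eyes leave it. The input format (a list of word index and duration pairs) and the example values are hypothetical, and this is not the analysis code used for the talk.

```python
def first_pass_measures(fixations, target_word):
    """Return (first fixation duration, gaze duration) in ms for target_word.

    fixations: list of (word_index, duration_ms) in chronological order.
    Returns (None, None) if the target was skipped in first pass, i.e. the
    eyes landed on a later word before ever fixating the target.
    """
    first_pass = []
    for word, duration in fixations:
        if first_pass:
            if word == target_word:
                first_pass.append(duration)  # refixation within first pass
            else:
                break  # eyes left the target: first pass is over
        elif word == target_word:
            first_pass.append(duration)  # first fixation on the target
        elif word > target_word:
            return None, None  # target skipped during first pass
    if not first_pass:
        return None, None
    return first_pass[0], sum(first_pass)


# Hypothetical trial: word 3 is the target. It receives two first-pass
# fixations and one later refixation that does not count towards GD.
trial = [(1, 210), (2, 190), (3, 240), (3, 130), (4, 220), (3, 180), (5, 200)]
ffd, gd = first_pass_measures(trial, target_word=3)
print(ffd, gd)  # 240 370
```

The later refixation of 180 ms would count towards total viewing time, but not towards gaze duration.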

Evidence for frequency effect in FFD on the target word

Evidence for frequency effect in GD on the target word

Simulating even lower sampling rates

We can drop samples to simulate a lower sampling rate
For example, if we drop 15 out of every 16 samples from a 2000 Hz trial, we get a simulated 125 Hz trial
Recall that this example trial at 2000 Hz has 11416 samples:

Removing samples

If we drop 15 out of every 16 samples, we get the equivalent of a 125 Hz trial (714 out of 11416 samples):

Removing samples

Detecting fixations in simulated 125 Hz data:
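As a rough illustration of what detecting fixations in low-rate data involves, here is a deliberately simple velocity-threshold detector. It is neither the Eyelink parser nor the Engbert & Kliegl algorithm; the threshold of 30 deg/s, the 50 ms minimum duration, the input format, and the synthetic samples are all assumptions chosen for the example.

```python
import math


def detect_fixations(samples, rate_hz, velocity_threshold=30.0, min_duration_ms=50.0):
    """Simple velocity-threshold fixation detection.

    samples: list of (x, y) gaze positions in degrees of visual angle,
             evenly spaced in time at rate_hz.
    Returns a list of (start_ms, end_ms) intervals, where end_ms is the time
    of the first sample after the fixation. The threshold (deg/s) and the
    minimum duration are illustrative defaults only.
    """
    sample_ms = 1000.0 / rate_hz  # time between two samples
    # Sample-to-sample velocity; the first sample has no predecessor and is
    # treated as slow.
    slow = [True]
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) * rate_hz  # deg/s
        slow.append(velocity < velocity_threshold)

    fixations = []
    start = None
    for i, is_slow in enumerate(slow + [False]):  # sentinel closes the last run
        if is_slow and start is None:
            start = i
        elif not is_slow and start is not None:
            if (i - start) * sample_ms >= min_duration_ms:
                fixations.append((start * sample_ms, i * sample_ms))
            start = None
    return fixations


# Synthetic 125 Hz data: a fixation, a 5 degree saccade, another fixation.
samples_125 = [(0.0, 0.0)] * 25 + [(1.5, 0.0), (3.5, 0.0)] + [(5.0, 0.0)] * 30
fixations_125 = detect_fixations(samples_125, rate_hz=125)
print(fixations_125)
```

At 125 Hz every fixation boundary can only fall on a multiple of 8 ms, which is one way in which a lower sampling rate adds noise to individual fixation durations.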

Removing even more samples

We can remove 39 out of every 40 samples to get the equivalent of a 50 Hz trial (286 out of 11416 samples):

Removing even more samples

We can remove 63 out of every 64 samples to get the equivalent of a 31.25 Hz trial (179 out of 11416 samples):
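The sample counts quoted for the simulated rates follow directly from keeping one sample out of every 16, 40, or 64. A minimal sketch, assuming that the first sample of each group is the one that is kept (the slides do not say which one is) and using a list of integers as a stand-in for the real gaze samples:

```python
def drop_samples(samples, keep_every):
    """Keep the first of every `keep_every` samples and drop the rest."""
    return samples[::keep_every]


original_rate_hz = 2000
trial = list(range(11416))  # stand-in for the 11416 samples of the example trial

simulated_counts = {}
for keep_every in (16, 40, 64):
    simulated = drop_samples(trial, keep_every)
    simulated_rate_hz = original_rate_hz / keep_every
    simulated_counts[simulated_rate_hz] = len(simulated)
    print(f"{simulated_rate_hz:g} Hz: {len(simulated)} samples")
```

This reproduces the 714, 286, and 179 samples reported for the simulated 125 Hz, 50 Hz, and 31.25 Hz versions of the trial.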

Simulations: 100 trials per subject

Simulated low sampling rates
	First fixation duration (FFD, ms)		Gaze duration (GD, ms)
	Mean	SD	Mean	SD
31.25 Hz
high frequency	214	101	269	143
low frequency	222	107	307	189
Effect	8		39
50 Hz
high frequency	221	88	291	151
low frequency	228	92	333	200
Effect	7		42
125 Hz
high frequency	233	81	314	158
low frequency	237	83	356	213
Effect	4		42

Simulated low sampling rates (FFD)

100 trials per subject

Simulated low sampling rates (GD)

100 trials per subject

Simulations: 400 trials per subject

Simulated low sampling rates
	First fixation duration (FFD, ms)		Gaze duration (GD, ms)
	Mean	SD	Mean	SD
31.25 Hz
high frequency	214	99	274	154
low frequency	225	108	313	193
Effect	12		39
50 Hz
high frequency	223	89	299	162
low frequency	233	96	337	205
Effect	9		38
125 Hz
high frequency	234	84	322	170
low frequency	241	88	363	219
Effect	7		41

Simulated low sampling rates (FFD)

400 trials per subject

Simulated low sampling rates (GD)

400 trials per subject

Discussion

The frequency effect is very robust, especially in gaze duration (GD) and total viewing time (TVT)
- Apparently, not even 30 Hz is too low to detect word frequency effects!
The effect is more sensitive to sampling rate in FFD
- Aggregate measures such as GD and TVT seem to be less affected by random noise
This is encouraging for researchers who cannot afford a 1000 Hz eye-tracker!
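For orientation, the temporal resolution implied by each recorded or simulated sampling rate can be computed directly. The short sketch below only converts rates to sample intervals; relating the interval to the size of an effect is an informal aid to intuition, not part of the reported analysis.

```python
def sample_interval_ms(rate_hz):
    """Time between two consecutive samples, in milliseconds."""
    return 1000.0 / rate_hz


# Recorded rates (2000 to 250 Hz) and simulated rates (125 to 31.25 Hz).
rates_hz = [2000, 1000, 500, 250, 125, 50, 31.25]
for rate in rates_hz:
    print(f"{rate:>7} Hz -> one sample every {sample_interval_ms(rate):.1f} ms")
```

At 31.25 Hz a new sample arrives only every 32 ms, which is several times larger than an FFD effect of under 10 ms and close to the size of the GD effect.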

Discussion

Simulating low sampling rates from data collected using a very accurate eye-tracker is not the same as actually using an affordable eye tracker.
But we have shown that sampling rate is not a hard bottleneck for studying reading.
If you are not sure whether your eye-tracker is good enough to study reading, consider running a pilot study that looks for the frequency effect in GD
- Or wait for us to do it!

Recommendations

You can compensate for low sampling rates by increasing sample size
In our case, 1,600 observations per condition were enough to detect the frequency effect in GD (about 30 to 45 ms) at all sampling rates (similar to Brysbaert & Stevens, 2018)
To consistently detect the effect in FFD (<10 ms) at lower sampling rates, we needed 6,400 observations per condition
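The two observation counts follow from the design described in the Method section. The arithmetic below assumes that trials are split evenly between the two frequency conditions and that no trials are lost (for example through skipped targets or track loss), so the numbers are upper bounds.

```python
participants = 32
conditions = 2  # high vs. low frequency target word


def observations_per_condition(trials_per_participant):
    """Upper bound on observations per condition, assuming an even split."""
    return participants * trials_per_participant // conditions


for trials in (100, 400):
    print(f"{trials} trials per subject: "
          f"{observations_per_condition(trials)} observations per condition")
```

With 100 trials per subject this gives 1,600 observations per condition, and with 400 trials per subject it gives 6,400.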

Recommendations

In summary: If you want to use affordable eye-trackers to study reading:
- target effects that are large enough
- choose aggregate measures (GD, TVT, go-past time, etc.)
- plan to use a large sample
- consider running a pilot study with a known effect (word frequency) and do a power analysis based on the results

Thank you

Presentation available at: https://bangele.quarto.pub/ecem-2024-sampling-rate or scan QR code

Example trial 1000 Hz

With fixations according to the Engbert & Kliegl (2003) algorithm

Example trial 1000 Hz

With Eyelink and Engbert & Kliegl (2003) fixations plotted on top of each other

Why investigate this?

Most eye tracking labs have eye trackers that can track at 1000 Hz or even 2000 Hz.
- Why think about lower sampling rates?
Eye trackers that can track at 1000 Hz are very expensive.
- Sampling rate is a “hard” bottleneck
  - Cameras that can record 1000 frames per second are quite rare
  - We can perhaps improve low-quality images using techniques such as machine learning, but we cannot create extra samples
The cost of high sampling rate eye trackers limits the study of reading (and other processes)
- Outside of the lab
- With more diverse populations
- With new paradigms (e.g. multi-person)

Removing samples

Comparison between fixations detected in the simulated 125 Hz data and fixations detected in the original 2000 Hz data.

Simulating low sampling rates

Is dropping samples an appropriate way to simulate low sampling rates?
- Maybe averaging over samples instead of dropping them is more appropriate
Let’s compare!
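The difference between the two strategies is easiest to see on a toy signal. This is a minimal sketch with made-up horizontal gaze positions, not the code used to produce the simulations: dropping keeps one real sample per block, whereas averaging replaces each block with its mean.

```python
def downsample_drop(samples, factor):
    """Keep the first sample of every block of `factor` samples."""
    return samples[::factor]


def downsample_average(samples, factor):
    """Replace every block of `factor` samples with its mean.

    The last block may be shorter if len(samples) is not a multiple of factor.
    """
    averaged = []
    for start in range(0, len(samples), factor):
        block = samples[start:start + factor]
        averaged.append(sum(block) / len(block))
    return averaged


# Hypothetical horizontal gaze positions (pixels) with a saccade in the
# middle block.
x = [100, 101, 99, 100, 100, 140, 180, 220, 221, 219, 220, 220]
dropped = downsample_drop(x, 4)
averaged = downsample_average(x, 4)
print(dropped)   # [100, 100, 221]
print(averaged)  # [100.0, 160.0, 220.0]
```

In this toy example, dropping hides the saccade until the next retained sample, while averaging produces an intermediate position (160.0) that the eye never held for a whole sample period. Either behaviour could in principle shift detected fixation boundaries, which is why the two approaches are compared.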

Simulating low sampling rates: dropping samples

Simulating low sampling rates: averaging samples

Average algorithm: FFD

100 trials/subject

Average algorithm: GD

100 trials/subject

Average algorithm: FFD

400 trials/subject

Average algorithm: GD

400 trials/subject

References

Angele, B., & Duñabeitia, J. A. (2024). Closing the eye-tracking gap in reading research. Frontiers in Psychology, 15. https://doi.org/10.3389/fpsyg.2024.1425219

Bicknell, K., & Levy, R. (2010). A rational model of eye movement control in reading. 1168–1178. http://dl.acm.org/citation.cfm?id=1858800

Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1). https://doi.org/10.5334/joc.10

Engbert, R., Longtin, A., & Kliegl, R. (2002). A dynamical model of saccade generation in reading based on spatially distributed lexical processing. Vision Research, 42(5), 621–636. https://doi.org/10.1016/S0042-6989(01)00301-7

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature, 466(7302), 29–29. https://doi.org/10.1038/466029a

McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17, 578–587.

Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105(1), 125–157. https://doi.org/10.1037/0033-295X.105.1.125

Schotter, E. R., Angele, B., & Rayner, K. (2012). Parafoveal processing in reading. Attention, Perception, & Psychophysics, 74(1), 5–35. https://doi.org/10.3758/s13414-011-0219-2

Snell, J., Van Leipsig, S., Grainger, J., & Meeter, M. (2018). OB1-reader: A model of word recognition and eye movements in text reading. Psychological Review, 125(6), 969–984. https://doi.org/10.1037/rev0000119