To download a poster, right-click the link and select "Save Target As ..." or "Save Link As ..."
May, K.A. & Georgeson, M.A. (in press). An optimal estimator for edge contrast explains perceived contrast of sine wave gratings.
Perception. (Poster presented at AVA Christmas Meeting 2016)
Download poster (1.48 MB)
May, K.A. & Zhaoping, L. (in press). Adaptation of face gender, expression, and head direction from random-noise adaptation images: A surprising prediction of Li and Atick's efficient binocular coding theory.
Perception, 45(S1), ECVP Abstract Supplement. (Talk presented at ECVP 2016)
We present a novel face adaptation paradigm that follows from Li and Atick's (1994) theory of efficient binocular encoding. In this theory, the inputs to the two eyes are combined using separately adaptable binocular summation and differencing channels. We designed a dichoptic test stimulus for which the summation channel sees one face image, and the differencing channel sees a different face image. The pairs of faces seen by the two channels were male/female, happy/sad, or turned slightly to the left/right. The face perceived by the observer depended on the relative sensitivities of the two channels. We manipulated channel sensitivity by selective adaptation with images that were low-pass filtered white noise. In correlated adaptation, each eye received the same adaptation image, which selectively adapted the summation channel; in anticorrelated adaptation, the adaptor contrast was reversed between the eyes, selectively adapting the differencing channel. After adaptation, we presented the dichoptic test stimulus, and found that the observer perceived the summation channel's face image most often after anticorrelated adaptation, and perceived the differencing channel's face image most often after correlated adaptation. For male/female and left/right judgements, perception was generally biased towards the summation channel; for happy/sad judgements, there was little or no bias.
May, K.A. & Zhaoping, L. (2016). Face gender adaptation from random noise adaptors: A surprising prediction of Li and Atick's efficient binocular coding theory.
Journal of Vision, 16(12):1326. (Talk presented at VSS 2016)
We present a novel face adaptation paradigm that follows from Li and Atick's (1994) theory of efficient binocular encoding. In this theory, the inputs to the two eyes are combined using separately adaptable binocular summation and differencing channels. We showed previously that, if a binocular test stimulus is designed so that the summation and differencing channels see opposite directions of motion or tilt, then the perceived direction of motion or tilt can be manipulated by selectively adapting one of the binocular channels, even if the adaptor contains no motion or orientation signal. Here we extend this to face gender adaptation. Our test stimuli were made from male and female facial composite images. In what follows, uppercase letters M and F are the male and female composite images, and lowercase letters are scalar multipliers that control image contrast. On half the trials, the summation and differencing channels receive male image aM and female image bF, respectively, made by inputs (aM+bF)/2 to one eye and (aM–bF)/2 to the other eye. On the other half of the trials, the summation and differencing channels receive female image aF and male image bM, respectively. The probability of perceiving the gender corresponding to the summation channel is influenced not only by the ratio a/b, but also by the contrast sensitivities of the summation and differencing channels. We manipulated channel sensitivities using adaptation stimuli that were low-pass filtered noise. In correlated adaptation, each eye received the same adaptation image, which selectively adapted the summation channel; in anticorrelated adaptation, the adaptor contrast was reversed between the eyes, selectively adapting the differencing channel. Despite being random noise, the adaptors influenced perceived gender: The probability of perceiving the gender corresponding to the summation channel increased after anticorrelated adaptation and decreased after correlated adaptation. 
The results support Li and Atick's theory.
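The stimulus construction described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the summation and differencing channels simply compute left + right and left − right (the scaling is an assumption of this example), and using tiny placeholder lists in place of the real face composites:

```python
# Sketch of the dichoptic face stimulus: one eye gets (aM + bF)/2, the other (aM - bF)/2.
# M and F below are hypothetical placeholder "images", not real face composites.

def dichoptic_pair(M, F, a, b):
    """Return the two eyes' images, (aM + bF)/2 and (aM - bF)/2."""
    left = [(a * m + b * f) / 2 for m, f in zip(M, F)]
    right = [(a * m - b * f) / 2 for m, f in zip(M, F)]
    return left, right

def binocular_channels(left, right):
    """Summation and differencing channel inputs (unscaled sum and difference; an assumption)."""
    s_plus = [l + r for l, r in zip(left, right)]
    s_minus = [l - r for l, r in zip(left, right)]
    return s_plus, s_minus

M = [0.2, -0.5, 0.1, 0.4]    # placeholder "male" contrast values
F = [-0.3, 0.6, 0.0, -0.1]   # placeholder "female" contrast values
a, b = 1.0, 0.5              # illustrative contrast multipliers

left, right = dichoptic_pair(M, F, a, b)
s_plus, s_minus = binocular_channels(left, right)
# s_plus reproduces aM and s_minus reproduces bF, up to floating-point rounding.
```

Swapping the roles of M and F gives the other half of the trials, in which the summation channel receives aF and the differencing channel receives bM.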
May, K.A. & Zhaoping, L. (in press). Tilt aftereffect generated by isotropic adaptation stimuli: A counterintuitive prediction of efficient coding theory.
i-Perception (Talk presented at SVG 2015)
Li and Atick (1994) presented a theory of efficient binocular encoding in which the two eyes' signals are combined using separately adaptable binocular summation and differencing channels. We designed a dichoptic test stimulus for which the summation channel sees a grating tilted in one direction (clockwise or anticlockwise of horizontal), and the differencing channel sees a grating tilted in the opposite direction. The observer's perceived direction of tilt (summation or difference direction) should depend on the relative sensitivities of the two channels. We manipulated channel sensitivity using adaptation. In correlated adaptation, each eye received the same image, which selectively adapted the summation channel; in anticorrelated adaptation, each eye received the photographic negative of the other eye's image, which selectively adapted the differencing channel. These adaptation stimuli had equal energy at all orientations. Despite being isotropic, the adaptors influenced perceived tilt: The test stimulus usually appeared tilted in the difference direction after correlated adaptation, and usually appeared tilted in the summation direction after anticorrelated adaptation. This counterintuitive finding of a tilt aftereffect from isotropic adaptors is exactly analogous to May, Zhaoping and Hibbard's (2012) finding of a motion aftereffect from static adaptors. These two results strongly support Li and Atick's theory.
May, K.A. & Zhaoping, L. (2016). Perceived direction of tilt determined by adaptation to unoriented or untilted binocular stimuli: Surprising predictions of efficient coding theory.
Perception, 45, 694. (Talk presented at AVA Christmas Meeting 2015)
Li and Atick (1994, Network: Computation in Neural Systems, 5, 157–174) presented a theory of efficient binocular integration in which the two eyes' signals are combined using separately adaptable binocular summation and differencing channels. We designed a dichoptic test stimulus for which the summation channel sees a grating tilted in one direction (clockwise or anticlockwise of vertical or horizontal), and the differencing channel sees a grating tilted in the opposite direction. The observer's perceived direction of tilt (summation or difference direction) should depend on the relative sensitivities of the two channels. We manipulated channel sensitivity using adaptation. In correlated adaptation, each eye received the same adaptation image, which selectively adapted the summation channel; in anticorrelated adaptation, the adaptor contrast was reversed between the eyes, selectively adapting the differencing channel. In Experiment 1, the adaptation stimuli were unoriented noise; in Experiment 2, the adaptation stimuli were oriented but untilted noise. Despite being unoriented or untilted, the adaptors influenced perceived tilt: The test stimulus usually appeared tilted in the difference direction after correlated adaptation, and tilted in the summation direction after anticorrelated adaptation. We found similar effects when the noise pattern was added to the test stimulus instead of preceding it; this may be caused by very fast adaptation. Our counterintuitive finding of a tilt aftereffect from unoriented or untilted adaptors is formally equivalent to our previous finding of a motion aftereffect from static adaptors (May, Zhaoping, & Hibbard, 2012, Current Biology, 22, 28–32). These results strongly support Li and Atick's theory.
May, K.A. & Zhaoping, L. (2015). Tilt aftereffect generated by isotropic adaptation stimuli: A counterintuitive prediction of Li and Atick's efficient binocular coding theory.
Perception, 44(S1), ECVP Abstract Supplement, 184. (Talk presented at ECVP 2015)
Li and Atick (1994) presented a theory of efficient binocular encoding in which the two eyes' signals are combined using separately adaptable binocular summation and differencing channels. We designed a dichoptic test stimulus for which the summation channel sees a grating tilted in one direction (clockwise or anticlockwise of horizontal), and the differencing channel sees a grating tilted in the opposite direction. The observer's perceived direction of tilt (summation or difference direction) should depend on the relative sensitivities of the two channels. We manipulated channel sensitivity using adaptation. In correlated adaptation, each eye received the same image, which selectively adapted the summation channel; in anticorrelated adaptation, each eye received the photographic negative of the other eye's image, which selectively adapted the differencing channel. These adaptation stimuli had equal energy at all orientations. Despite being isotropic, the adaptors influenced perceived tilt: The test stimulus usually appeared tilted in the difference direction after correlated adaptation, and usually appeared tilted in the summation direction after anticorrelated adaptation. This counterintuitive finding of a tilt aftereffect from isotropic adaptors is analogous to May, Zhaoping and Hibbard's (2012) finding of a motion aftereffect from static adaptors. These two results strongly support Li and Atick's theory.
May, K.A. & Zhaoping, L. (2015). Tilt aftereffect from untilted adaptors and motion aftereffect from static adaptors: Counterintuitive predictions of Li and Atick's efficient binocular coding theory.
(Poster presented at Second ViiHM Workshop, July 2015, Bath, UK)
Download poster (3.36 MB)
Li and Atick (1994) presented a theory of efficient binocular encoding which proposes separately adaptable binocular summation and differencing channels. We designed dichoptic stimuli for which the summation and differencing channels see gratings tilted in opposite directions. These stimuli are analogous to those in an earlier study, for which the summation and differencing channels see gratings moving in opposite directions (May, Zhaoping & Hibbard, 2012). The theory predicts that, for these stimuli, perceived direction of tilt or motion depends on the relative sensitivities of the two binocular channels. We manipulated channel sensitivity using adaptation. In correlated adaptation, each eye received the same image, selectively adapting the summation channel; in anticorrelated adaptation, each eye received the photographic negative of the other eye's image, selectively adapting the differencing channel. Observers generally perceived tilt or motion corresponding to the unadapted channel's signal. These aftereffects should not depend on adapting orientation- or motion-selective mechanisms; to confirm this, we generated tilt aftereffects from isotropic adaptors, and motion aftereffects from static adaptors. These counterintuitive findings strongly support Li and Atick's theory, and contrast with all other reported aftereffects, which require the adaptor to have a strong signal in the perceptual dimension in which the aftereffect occurs.
May, K.A. (2014). Understanding visual coding: Insights from psychophysics and computational theory. (Keynote talk given at the 2014 Applied Vision Association Christmas meeting on receipt of the David Marr Medal)
My research seeks to answer two basic questions: What are the mechanisms of human/biological vision, and why are they like that? One approach to answering these questions, pioneered by David Marr, is to identify the information-processing problems that the visual system is trying to solve. We can then derive algorithms for solving these problems, and suggest physiological implementations. These algorithms and proposed implementations amount to models of vision that can be tested empirically. If we can show that the visual system uses a near-optimal algorithm, then we have an answer to the question of why it is like that. I illustrate this process with examples from two areas of research: binocular integration and edge processing. In the first case, a theory of optimal binocular integration makes the counterintuitive prediction that it should be possible to generate motion aftereffects with static adaptors, a prediction that we confirmed psychophysically. In the second case, a theoretically optimal edge detection algorithm predicts human performance on a blur detection task with remarkable accuracy on a trial-by-trial basis. This approach differs from traditional ideal-observer theory, which assumes that the observer performs the psychophysical task optimally (apart from biological constraints, such as internal noise and sampling inefficiency): We instead assume that the observer performs an artificial laboratory task by recruiting mechanisms that are optimized for a related real-world task. The performance of such an observer is often far from optimal on laboratory tasks.
May, K.A. (2014). Edge detection and contour integration in human and machine vision.
(Poster presented at First ViiHM Workshop, September 2014, Stratford-Upon-Avon, UK)
Download poster (1.53 MB)
In this poster I review my work on how human vision detects edges and integrates them across the visual field. With Mark Georgeson and William McIlhagga, I ran experiments on perception of edge blur and contrast. This work led to scale-space models of edge detection in human vision similar to those developed for machine vision by Tony Lindeberg and others. These models represent the image using a multidimensional representation that includes dimensions of orientation and scale (i.e. blur) as well as the 2 dimensions of the image. With Robert Hess, I used a similar representation to develop a model of contour integration that groups separated edge elements into coherent contours. Currently, with Li Zhaoping, I am using two-tone images to study contour completion in natural scenes. We threshold natural greyscale images into two grey levels, which causes contours to break. We use a Bayesian decoding model to understand how the visual system completes these contours to recognize the image.
May, K.A. & Solomon, J.A. (2014). Formal relationships between neuronal response properties and psychophysical performance.
i-Perception, 5(5):56. (Talk presented at SVG 2014)
One of the major goals of sensory neuroscience is to understand the relationships between physiology and behaviour. To this end, we derived equations that predict what an observer's psychophysical performance should be from the properties of the neurons carrying the sensory code. Our model neurons were characterized by their tuning function to the stimulus, and the random process that generates spikes. We used a generalized Poisson spiking process that can generate any Fano factor (ratio of variance to mean) ≥ 1. The tuning function was either a sigmoid (Naka-Rushton) function or a Gaussian function. We predicted psychophysical performance by calculating Fisher information, which is approximately equal to the maximum achievable decoding precision. For a population of neurons with identically shaped tuning functions, distributed with a constant density across a log stimulus axis, the Fisher information is given by a remarkably simple expression. In this case, its value is independent of the stimulus value, and this gives rise to Weber's Law. We also allowed certain neuronal parameter values to increase with stimulus value, which gave a near-miss to Weber's law, as often found in contrast discrimination. Our work has two major benefits. Firstly, we can quickly work out the performance of physiologically plausible population coding models by evaluating simple equations, rather than using slow and laborious Monte Carlo simulations. Secondly, the equations themselves give deep insights into the relationships between physiology and psychophysical performance.
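The link between constant Fisher information and Weber's law can be illustrated numerically. The sketch below is not the authors' derivation: it assumes plain Poisson spiking (Fano factor of 1), Gaussian tuning curves on a log stimulus axis, and illustrative parameter values chosen for this example only.

```python
import math

R_MAX = 20.0     # peak mean spike count (illustrative)
SIGMA = 0.5      # tuning width in log-stimulus units (illustrative)
SPACING = 0.25   # constant spacing of tuning-curve centres on the log axis
CENTRES = [i * SPACING for i in range(-40, 41)]  # centres from -10 to 10

def tuning(x, mu):
    """Mean spike count of a neuron centred at mu, for log stimulus value x."""
    return R_MAX * math.exp(-((x - mu) ** 2) / (2 * SIGMA ** 2))

def fisher_information(x):
    """Population Fisher information about x for independent Poisson neurons."""
    total = 0.0
    for mu in CENTRES:
        f = tuning(x, mu)
        slope = -(x - mu) / SIGMA ** 2 * f   # derivative of the tuning curve
        if f > 0.0:
            total += slope ** 2 / f          # Poisson: variance equals mean
    return total

xs = [-2.0, -1.3, 0.0, 0.7, 2.0]             # log stimulus values well inside the covered range
info = [fisher_information(x) for x in xs]
# Approximate discrimination threshold in log units; constant threshold in log units
# means a threshold proportional to the stimulus value, i.e. Weber's law.
thresholds = [1.0 / math.sqrt(i) for i in info]
```

Away from the edges of the covered range, `info` is the same at every stimulus value, so the log-unit threshold does not depend on the stimulus.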
May, K.A. & Solomon, J.A. (2013). Reconciling multiplicative physiological noise and additive psychophysical noise.
Perception, 42, ECVP Abstract Supplement, 158. (Talk presented at ECVP 2013 symposium, Visual Noise: New Insights)
In many psychophysical models of contrast discrimination, the contrast signal undergoes nonlinear transduction, and corruption with additive (i.e., stimulus-invariant) Gaussian noise. But physiological noise is often found to be multiplicative (variance proportional to response). A simple Bayesian decoding model of spiking neurons accommodates both findings, showing Poisson-based multiplicative noise at the physiological level, but additive Gaussian noise at the psychophysical level. If the model neurons' contrast-response functions are evenly spaced along the log-contrast axis, the decoded log-contrast has a stimulus-invariant, approximately Gaussian, distribution. At the psychophysical level, this model is equivalent to a log transducer with stimulus-invariant Gaussian noise. A slight manipulation of the neurons' pattern of spacing along the contrast axis makes the model behave much like a Legge-Foley transducer with stimulus-invariant noise. But is the noise on the model's internal signal really stimulus-invariant? It depends on the (arbitrary) choice of units in which we express the model's decoded contrast. We suggest that the transducer in some psychophysical models is just a transform of the stimulus contrast that allows us to express the internal signal in units such that the noise is stimulus-invariant. In this case, the argument that the noise is stimulus-invariant at the psychophysical level is circular.
May, K.A. & Solomon, J.A. (2012). Weibull β for contrast detection is the Naka-Rushton exponent.
Perception, 41, 1523. (Talk presented at AVA Christmas meeting 2012)
One of the goals of neuroscience is to understand the relationship between physiology and behaviour. Here we report a previously unknown connection between the neural contrast-response function and the psychometric function for 2-alternative forced choice (2AFC) contrast detection. The contrast-response function describes mean neural spike rate, r, as a function of stimulus contrast, c. It is usually fitted with a Naka-Rushton function, given by $r = r_{\max} c^q/(c_{50}^q + c^q) + r_0$. The psychometric function for 2AFC contrast detection describes the proportion correct, P, as a function of target contrast, c. It is often fitted with a Weibull function, given by $P = (1-\lambda) - (0.5-\lambda)\exp[-(c/\alpha)^\beta]$. We show that optimal decoding of a population of neurons with Naka-Rushton contrast-response functions, and spike distributions generated by a family of Poisson-based random processes, results in a psychometric function that closely approximates a Weibull function with β equal to the Naka-Rushton exponent, q. We proved this result analytically under restrictive assumptions, and used Monte Carlo simulations to show that it still holds to a close approximation when these assumptions are relaxed. Our finding provides a remarkably straightforward interpretation of Weibull β for 2AFC detection, and explains the close match between β and q obtained from fits to empirical data (both are about 3 on average). Unlike previous researchers who have used the Weibull function for mathematical convenience without strong theoretical justification, we argue that the Weibull function is theoretically the most appropriate mathematical form for the psychometric function for 2AFC contrast detection.
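For readers who want to experiment with the two functions named above, here is a minimal sketch of the Naka-Rushton contrast-response function and the 2AFC Weibull psychometric function. The function names and the parameter values in the usage lines are illustrative choices for this example, not fitted values from the study.

```python
import math

def naka_rushton(c, r_max, c50, q, r0=0.0):
    """Mean spike rate at contrast c: r_max * c^q / (c50^q + c^q) + r0."""
    cq = c ** q
    return r_max * cq / (c50 ** q + cq) + r0

def weibull_2afc(c, alpha, beta, lapse=0.0):
    """Proportion correct in 2AFC: (1 - lapse) - (0.5 - lapse) * exp(-(c/alpha)^beta)."""
    return (1.0 - lapse) - (0.5 - lapse) * math.exp(-((c / alpha) ** beta))

# Illustrative usage with an exponent of 3 in both functions.
half_response = naka_rushton(0.1, r_max=30.0, c50=0.1, q=3.0)   # response at c = c50
chance = weibull_2afc(0.0, alpha=0.05, beta=3.0)                # performance at zero contrast
at_alpha = weibull_2afc(0.05, alpha=0.05, beta=3.0)             # performance at c = alpha
```

At c = c50 the Naka-Rushton response (above baseline) is half of r_max, and at zero contrast the Weibull function returns chance performance of 0.5.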
May, K.A., Zhaoping, L. & Hibbard, P.B. (2012). Motion adaptation from static binocular images: A surprising prediction of efficient coding theory.
(Poster presented at Sensory Coding & Natural Environment 2012, IST Austria)
Download poster (12.8 MB)
Efficient coding theory has shown that many different properties of sensory neural circuits reflect optimal coding strategies, given the statistical structure of the sensory input to which an organism is exposed in its natural environment [1-7]. However, to date, this approach has only been used to explain previously known findings (except Ref. 8, which was a precursor to the current study). Here we used Li and Atick’s theory of efficient stereo coding [8-10] to derive a novel and surprising prediction, which we have confirmed psychophysically [11]. In natural vision, the two eyes’ signals are often highly correlated. In Li and Atick’s theory, this redundant, inefficient representation is transformed into uncorrelated binocular summation and difference signals, and gain control is applied to the summation and differencing channels to optimize their sensitivities; the optimal gain on each channel depends on the interocular correlation, as specified by the theory. In natural vision, the interocular correlation will change from moment to moment, so the gains on the binocular summation and differencing channels need to be dynamically updated to maintain optimal sensitivity. These channels should therefore be separately adaptable, whereby a channel’s sensitivity is reduced following overexposure to adaptation stimuli that selectively stimulate that channel. This predicts a remarkable effect of binocular adaptation on perceived direction of a dichoptic motion stimulus [12]. For this stimulus, the summation and difference signals move in opposite directions, so perceived motion direction (upward or downward) should depend on which of the two binocular channels is more sensitive after adaptation, even if the adaptation stimuli are completely static. We confirmed this prediction: a single static dichoptic adaptation stimulus presented for less than 1 s can control perceived direction of a subsequently presented dichoptic motion stimulus. This is not predicted by any current model of motion perception, and suggests that the visual cortex quickly adapts to the prevailing binocular image statistics to maximize information coding efficiency.
1. Laughlin,S. (1981). Zeitschrift für Naturforschung C, 36, 910-912.
2. Tadmor,Y. & Tolhurst,D.J. (2000). Vision Res., 40, 3145-3157.
3. Srinivasan,M.V., Laughlin,S.B. & Dubs,A. (1982). Proceedings of the Royal Society of London B, 216, 427-459.
4. Atick,J.J. & Redlich,A.N. (1990). Neural Comput., 2, 308-320.
5. Atick,J.J. & Redlich,A.N. (1992). Neural Comput., 4, 196-210.
6. Atick,J.J. (1992). Network, 3, 213-251.
7. Atick,J.J., Li,Z. & Redlich,A.N. (1993). Vision Res., 33, 123-129.
8. Chen,D. & Li,Z. (1998). In Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective. Wong,K.-Y.M., King,I. & Yeung,D.-Y. (eds.), pp. 225-235 (Springer-Verlag, New York).
9. Li,Z. & Atick,J.J. (1994). Network, 5, 157-174.
10. Li,Z. (1995). In The Neurobiology of Computation: The Proceedings of the Third Annual Computation and Neural Systems Conference. Bower,J. (ed.), pp. 397-402 (Kluwer, Boston, MA).
11. May,K.A., Zhaoping,L. & Hibbard,P.B. (2012). Current Biology, 22, 28-32.
12. Shadlen,M. & Carney,T. (1986). Science, 232, 95-97.
May, K.A. & Zhaoping, L. (2011). Contrast-response functions, Fisher information, and contrast decoding performance.
Journal of Vision, 11(11):1172. (Poster presented at VSS 2011)
Download poster (1.67 MB)
Using maximum a posteriori decoding of stimulus contrast, Clatworthy, Chirimuuta, Lauritzen and Tolhurst (2003, Vision Research, 43, 1983–2001) discovered many characteristics of the relationship between the contrast-response function [described by the Naka-Rushton function, $r = r_{\max} c^q/(c_{50}^q + c^q)$] and contrast identification accuracy. Their decoding method is optimal, but laborious to implement, and gives little insight into why these characteristics arise, or how general they are. If the spike count is not too low, the Fisher information provides a good analytical approximation of optimal decoding accuracy. We show how to calculate the Fisher information for Clatworthy et al.'s doubly stochastic Poisson process, and derive equations that explain many of their observations regarding single neurons, e.g. that accuracy peaks "slightly below the neuron's $c_{50}$" (Clatworthy et al., p. 1989) – we show that it peaks at the contrast for which mean response is $r_{\max}/3$. Fisher information provides a closer estimate of optimal decoding accuracy for neural populations than for single neurons because of the higher total spike count. We show that, for a population of N ≥ 8 Poisson-spiking neurons with q = 2 and $c_{50}$ values evenly distributed across the log contrast range, $\log_{10} c \in [-3, 0.1]$, Fisher information very closely approximates optimal contrast decoding accuracy when $N \times r_{\max}$ is greater than about 100 spikes; in these conditions, decoding accuracy is very close to being proportional to $N \times r_{\max}$. We also investigate the effect of supersaturation, whereby the contrast-response function peaks and then declines with increasing contrast. Contrary to the proposal that supersaturating neurons provide a suboptimal contrast code [Peirce, 2007, Journal of Vision, 7(6):13, 1–10], we show that supersaturation improves contrast decoding accuracy for neural populations, while also reducing metabolic costs.
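The single-neuron result that accuracy peaks where the mean response is one third of the maximum can be checked numerically in a simplified setting. The sketch below assumes a plain Poisson neuron (not the doubly stochastic process analysed in the abstract) and measures Fisher information with respect to log contrast; both are assumptions of this example, as are the parameter values.

```python
import math

R_MAX, C50, Q = 30.0, 0.1, 2.0   # illustrative Naka-Rushton parameters

def saturation(c):
    """Fraction of the maximum response reached at contrast c."""
    return c ** Q / (C50 ** Q + c ** Q)

def fisher_log_contrast(c):
    """Fisher information about ln(c) for a Poisson neuron with a Naka-Rushton mean."""
    u = saturation(c)
    slope = R_MAX * Q * u * (1.0 - u)   # derivative of the mean response with respect to ln(c)
    return slope ** 2 / (R_MAX * u)     # Poisson: variance equals mean

# Scan log10 contrast from -3 to 0 in steps of 0.001 and find the most informative contrast.
log_c = [-3.0 + 0.001 * i for i in range(3001)]
best_c = max((10 ** lc for lc in log_c), key=fisher_log_contrast)
ratio = saturation(best_c)              # mean response at the peak, as a fraction of R_MAX
```

Under these assumptions the information is proportional to u(1 − u)², where u is the fraction of the maximum response, which is maximized at u = 1/3; the peak therefore falls below c50, where u = 1/2.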
Zeiner, K.M., Spitschan, M., May, K.A., Zhaoping, L. & Harris, J.M. (2011). Eye movements and reaction times for detecting monocular regions in binocularly viewed scenes.
Journal of Vision, 11(11):326. (Poster presented at VSS 2011)
Our binocular view of the world is scattered with monocular regions that only one eye can view. These occur at each depth edge. In previous research, we found that monocular target items are detected faster than binocular targets in a stimulus filled with binocular distractors. Here we explore whether monocular targets also direct eye movements whilst observers perform a visual search task.
Participants performed a classic search task to detect a target C amidst 254 distractor O's, in one of 3 conditions: 1) monocular target, all distractors binocular, 2) one monocular distractor, all other distractors and target binocular, 3) target and distractors binocular. Stimuli were presented using a modified Wheatstone Stereoscope. Reaction times for target detection were measured and eye movements were recorded using an infrared eye tracker. Stimulus onset was contingent upon central fixation. The target was always located towards the left or right side of the stimulus. We measured whether the first saccade was towards the half of the stimulus that contained the target.
On average, reaction time followed the pattern previously observed: if the target was monocular, reaction times were shortest, while if there was one monocular distractor, and the target was binocular, reaction times were longest. In condition 2 we found that the slower reaction times were mirrored by lower rates of correct eye movements. In condition 1 we observed that individuals giving faster reaction times showed a higher number of correct saccades, while those with slower reaction times tended to show fewer correct saccades.
Our results suggest that moving the eyes rapidly towards a monocular region may aid its fast detection. This could help in the identification of object edges, and in the perception of depth from binocular disparity.
May, K.A. (2011). Bayesian decoding of neural population responses explains many characteristics of contrast detection and discrimination.
Perception, 40, ECVP Abstract Supplement, 51. (Talk presented at ECVP 2011)
Contrast thresholds for detecting a signal added to a pedestal generate a "dipper function", which first falls as the pedestal contrast increases from zero, and then rises to give a "near-miss" to Weber's law. The psychometric functions are well-fitted by Weibull functions: the slope parameter, beta, is about 3 for zero pedestal (detection), and falls to around 1.3 with increasing pedestal. All of this can be explained by Bayesian decoding of a population of neurons with Naka–Rushton contrast-response functions [$r = r_{\max} c^q/(c_{50}^q + c^q)$], and a rectangular distribution of semi-saturation contrasts, $c_{50}$, along the log contrast axis. I derive equations that accurately predict the model's performance and give insights into why it behaves this way. For Poisson-spiking neurons, the model's detection psychometric function is a Weibull function with beta equal to the Naka–Rushton exponent, q, which physiologically often takes a value of about 3; for high pedestals, beta is always about 1.3, regardless of the model parameters. As contrast increases from zero towards the bottom of the $c_{50}$ range, the threshold dips; within the $c_{50}$ range, Weber's law holds if $r_{\max}$ is constant across the neural population; a shallower/decreasing slope occurs if $r_{\max}$ is scaled to give a fixed response to 100% contrast.
Dumoulin, S.O., Hess, R.F., Harvey, B.M. & May, K.A. (2010). Measuring contour integration mechanisms using fMRI.
Society for Neuroscience Abstracts, 531.9.
Introduction: A crucial role of our visual system is to detect and segregate objects. Primary visual cortex (V1) extracts local, oriented edges from the visual scene. Object representations could be constructed from these local edges using mechanisms such as contour integration, which integrates information across local edges with similar properties (Field, Hayes, Hess, Vision Research, 1993). Here we measure contour integration properties in visual cortex using fMRI and a new data-analysis method that reconstructs population receptive field (pRF) properties.
Methods: We measured fMRI responses to moving bar apertures that revealed contours. The pRF was modeled by a circularly symmetric Gaussian receptive field in visual space. Convolution of the model pRF with the stimulus sequence predicts the fMRI time-series; the pRF parameters (x,y,σ) are estimated for each voxel by minimizing the sum of squared errors between the predicted and observed fMRI time-series (Dumoulin & Wandell, Neuroimage, 2008). We measured the pRF size as a function of the underlying contour orientation relative to the measurement direction. We also compared relatively straight and curved contours.
Results: PRF sizes vary as a function of the underlying contour orientation relative to the pRF measurement direction. For relatively straight contours, larger pRF sizes are seen in V1 when measuring in the direction of the contours as opposed to other directions. In V2 and V3 a similar result was obtained but for curved contours.
Conclusion: Our results indicate that contour integration mechanisms contribute to the overall pRF size and can be experimentally manipulated. Our results suggest that relatively straight contours are processed in V1 and curved contours in V2/V3.
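The pRF estimation step described in the Methods can be sketched as a small grid search. This is a simplified illustration, not the analysis pipeline used in the study: it assumes a coarse hypothetical visual-field grid, one-pixel-wide bar apertures, noise-free simulated data, a single least-squares amplitude, and it omits the haemodynamic response entirely. All function names are invented for this example.

```python
import math

# Hypothetical coarse visual-field grid (arbitrary units); not the study's sampling.
COORDS = [float(v) for v in range(-5, 6)]

def gaussian_prf(x0, y0, sigma):
    """Circularly symmetric Gaussian pRF sampled on the grid (rows = y, columns = x)."""
    return [[math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in COORDS] for y in COORDS]

def bar_apertures():
    """Binary masks for a one-pixel-wide bar stepping across columns, then across rows."""
    n = len(COORDS)
    frames = []
    for k in range(n):   # vertical bar at column k
        frames.append([[1.0 if col == k else 0.0 for col in range(n)] for _ in range(n)])
    for k in range(n):   # horizontal bar at row k
        frames.append([[1.0 if row == k else 0.0 for _ in range(n)] for row in range(n)])
    return frames

def predict(prf, frames):
    """Predicted response per frame: overlap between the pRF and the stimulus aperture."""
    return [sum(p * a for prow, arow in zip(prf, frame) for p, a in zip(prow, arow))
            for frame in frames]

def sse_with_best_gain(pred, observed):
    """Sum of squared errors after fitting one amplitude by least squares."""
    gain = sum(p * o for p, o in zip(pred, observed)) / sum(p * p for p in pred)
    return sum((o - gain * p) ** 2 for p, o in zip(pred, observed))

def fit_prf(observed, frames, centres, sigmas):
    """Grid search over (x0, y0, sigma) minimizing the sum of squared errors."""
    best = None
    for x0 in centres:
        for y0 in centres:
            for sigma in sigmas:
                pred = predict(gaussian_prf(x0, y0, sigma), frames)
                err = sse_with_best_gain(pred, observed)
                if best is None or err < best[0]:
                    best = (err, (x0, y0, sigma))
    return best[1]

frames = bar_apertures()
true_params = (2.0, -1.0, 1.5)   # simulated "voxel" pRF: x, y, sigma
observed = [3.0 * r for r in predict(gaussian_prf(*true_params), frames)]  # noise-free data
estimate = fit_prf(observed, frames,
                   centres=[-2.0, -1.0, 0.0, 1.0, 2.0],
                   sigmas=[0.5, 1.0, 1.5, 2.0])
```

With noise-free data and the true parameters on the search grid, the search recovers them; with real data, a finer search or a continuous optimizer and a haemodynamic model would be needed.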
May, K.A., Zhaoping, L. & Hibbard, P.B. (2010). Binocular integration in human vision adapts to maximize information coding efficiency.
Perception, 39, ECVP Abstract Supplement, 77. (Poster presented at ECVP 2010)
Download poster (16.9 MB)
The two eyes typically receive correlated inputs, from which one can derive two decorrelated channels: binocular summation (S+) and binocular difference (S–). The channel gains (g+, g–) should adapt to optimize the tradeoff between information transmission and energy usage, giving an inverted-U function of signal strength: strong signals are suppressed to conserve energy with little information loss, and weak signals are suppressed to avoid wasting energy transmitting noise (Li and Atick, 1994 Network 5 157-174). The relative strengths of the S+ and S– signals depend on the interocular correlation. We adapted observers to positive correlations (both eyes saw identical natural images, giving stronger S+ than S–), zero correlations (each eye saw a completely different natural image, giving equal-strength S+ and S–) or negative correlations (each eye saw the photonegative of the other eye’s image, giving weaker S+ than S–). We assessed the gain ratio g+/g– from cyclopean motion direction judgments for a dichoptic display in which the S+ signal contained motion in the opposite direction to the S– and monocular signals. For high adaptation contrast, g+/g– was lower after adapting to positive than zero or negative interocular correlations; the opposite occurred for low adaptation contrast. The data are explained by an inverted-U gain function.
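The relationship between interocular correlation and the strengths of the two channels can be written out explicitly. The following short derivation assumes that the left and right eye signals, $L$ and $R$, have equal variance $\sigma^2$ and correlation $\rho$, and uses a $1/\sqrt{2}$ normalization for the channels; the symbols $L$, $R$, $\sigma$ and $\rho$ and the normalization are choices made for this illustration.

```latex
\begin{align}
S_\pm &= \frac{L \pm R}{\sqrt{2}}, \qquad \operatorname{Var}(L) = \operatorname{Var}(R) = \sigma^2, \quad \operatorname{Corr}(L, R) = \rho \\
\operatorname{Cov}(S_+, S_-) &= \tfrac{1}{2}\left[\operatorname{Var}(L) - \operatorname{Var}(R)\right] = 0 \\
\operatorname{Var}(S_\pm) &= \tfrac{1}{2}\left[\operatorname{Var}(L) + \operatorname{Var}(R) \pm 2\operatorname{Cov}(L, R)\right] = \sigma^2 (1 \pm \rho)
\end{align}
```

So the two channels are decorrelated whenever the eyes' signals have equal variance, and their relative strengths follow the sign of $\rho$: $S_+$ is stronger for positive correlations, the two are equal at zero correlation, and $S_-$ is stronger for negative correlations, matching the three adaptation conditions described above.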
May, K.A., Zhaoping, L. & Hibbard, P.B. (2010). Binocular integration in human vision adapts quickly to maximize coding efficiency.
Perception, 39, 1149. (Talk presented at the AVA AGM 2010)
The two eyes typically receive correlated inputs, from which one can derive the two decorrelated input channels: binocular summation, S+, and binocular difference, S−. S+ has greater power than S− in natural scenes, and the opposite occurs when the inputs to the two eyes are anticorrelated. To represent the input most efficiently, ie to maximize the information transmitted for a given energy budget, the visual system gives a higher gain to the weaker of the two decorrelated channels when the signal-to-noise ratio (SNR) is high [eg at low spatial frequencies (SFs), due to the 1/f spectrum], and gives a lower gain to the weaker channel when the SNR is low (eg at high SFs) to minimize energy wasted in transmitting noise (Li and Atick, 1994 Network 5 157 – 174). The gains are predicted to adapt to the interocular correlation. We assessed the relative gains to S+ and S− channels from observers' motion direction judgements using a cyclopean motion stimulus [Shadlen and Carney, 1986 Science 232 95 – 97; Hayashi et al., 2007 Journal of Vision 7(8):7 1 – 10] in which the S+ signal had motion in the opposite direction to both S− and the monocular signals. As predicted, at low SFs, the ratio of S+ to S− gain was lower after adapting observers to positive ocular correlations (when both eyes saw identical natural images) than after adapting to anticorrelated ocular inputs (when one eye saw the photonegative of the other eye's input). The opposite occurred for high SFs. Adaptation occurred within a few seconds.
Zhaoping, L. & May, K.A. (2010). Human monochromatic light discrimination explained by optimal signal decoding.
Perception, 39, 1148–1149. (Talk presented at the AVA AGM 2010)
Why does the minimum wavelength difference for humans to discriminate two monochromatic inputs (which could differ in input intensity) depend on the wavelength in a particular way, dipping near wavelengths 490 and 590 nm but rising steeply beyond 630 nm (Pokorny and Smith, 1970 Journal of the Optical Society of America 60 562 – 569)? We propose a computational explanation by maximum-likelihood decoding of the light's colour from the cone absorptions. The wavelength tuning curves of the three cone types reflect their average absorptions for any monochromatic input. However, owing to Poisson noise in the cones, the actual absorptions will deviate stochastically from the respective averages. The brain could decode the best estimates of the input wavelength and intensity responsible, and the noise-induced uncertainty about these estimates. Computationally [Dayan and Abbott, 2001 Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (Cambridge, MA: MIT Press)], these best estimates and their uncertainties correspond to the peak location and the spread, in the input wavelength and intensity, of the conditional probability of the absorptions for the input. Experimentally, peak and spread should correspond to the perceived monochromatic input and the input discrimination threshold. We apply the computational decoding scheme to a wavelength discrimination procedure in which subjects adjust the input wavelength and intensity of a comparison input field to match a standard monochromatic input field, and find good agreement between the computationally predicted and experimentally observed wavelength discrimination thresholds as a function of the wavelength. Our findings suggest that retinal and cortical processes for colour decoding are optimal.
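The decoding idea can be sketched in a few lines of Python. This is only a toy version under strong assumptions: the cone tuning curves are replaced by hypothetical Gaussians with rough peak wavelengths and a common width, the estimate is found by a coarse grid search, and all names (`mean_absorptions`, `decode`, and so on) are invented for the example. It shows only the peak of the likelihood, not its spread.

```python
import math

# Hypothetical cone tuning curves: Gaussians with approximate peak wavelengths.
# Real cone fundamentals have a different shape; these are placeholders.
CONE_PEAKS_NM = (440.0, 530.0, 560.0)  # S, M, L cones (rough values)
CONE_WIDTH_NM = 40.0


def mean_absorptions(wavelength_nm, intensity):
    """Average absorptions of the three cone types for a monochromatic input."""
    return [
        intensity * math.exp(-0.5 * ((wavelength_nm - peak) / CONE_WIDTH_NM) ** 2)
        for peak in CONE_PEAKS_NM
    ]


def log_likelihood(absorptions, wavelength_nm, intensity):
    """Poisson log-likelihood, omitting the term that does not depend on the input."""
    means = mean_absorptions(wavelength_nm, intensity)
    return sum(n * math.log(m) - m for n, m in zip(absorptions, means))


def decode(absorptions, wavelengths, intensities):
    """Grid-search maximum-likelihood estimate of (wavelength, intensity)."""
    candidates = ((w, i) for w in wavelengths for i in intensities)
    return max(candidates, key=lambda p: log_likelihood(absorptions, p[0], p[1]))


wavelength_grid = range(400, 701, 5)
intensity_grid = range(100, 2001, 100)
observed = mean_absorptions(550, 1000)  # noise-free absorptions, for illustration
print(decode(observed, wavelength_grid, intensity_grid))
```

With noisy absorptions the estimate would scatter around the true input, and the width of the likelihood around its peak would play the role of the discrimination threshold described in the abstract.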
May, K.A., Zhaoping, L. & Hibbard, P.B. (2010). Effects of image statistics on stereo coding in human vision.
Journal of Vision, 10(7):359. (Poster presented at VSS 2010)
Download poster (5.3 MB)
Note: the experiments described in the poster use a different methodology from those in the abstract, although they address the same issue.
Biological visual systems continuously optimize themselves to the prevailing image statistics, which gives rise to the phenomenon of adaptation. For example, post-adaptation color appearance can be explained by efficient coding which appropriately combines the input cone channels into various chromatic and achromatic channels with suitable gains that depend on the input statistics [Atick, J.J., Li, Z. & Redlich, A.N. (1993). Vision Research, 33, 123-129]. In this study we focus on the ocular channels corresponding to the two eyes. We investigated how image statistics influence the way human vision combines information from the two eyes. Efficient coding in ocular space [Li, Z. & Atick, J.J. (1994) Network, 5, 157-174] predicts that the binocularity of neurons should depend on the interocular correlations in the visual environment: As the interocular correlations increase in magnitude, the neurons should become more binocular. In natural viewing conditions, interocular correlations are higher for horizontal than vertical image components, because vertical binocular disparities are generally smaller than horizontal disparities. Thus, adaptation to natural stereo image pairs should lead to a greater level of binocularity for horizontally-tuned neurons than vertically-tuned neurons, whereas adaptation to pairs of identical natural images should not. We used interocular transfer of the tilt illusion as an index of binocularity of neurons with different characteristics. Subjects adapted either to natural stereo pairs or pairs of identical natural images. As predicted, interocular transfer was higher for near-horizontal than near-vertical stimuli after adaptation to natural stereo pairs, but not after adaptation to pairs of identical natural images.
May, K.A. & Hess, R.F. (2010). Implementing curve detectors for contour integration.
Perception, 39(2), 270. (Poster presented at the AVA Christmas meeting 2009)
Download poster (15.7 MB)
We recently presented a model of contour integration in which the image receives two stages of oriented filtering, separated by a nonlinearity [May and Hess, 2008 Journal of Vision 8(13):4 1–23]. If the 1st and 2nd stage filters have the same orientation, the model detects 'snakes', in which the elements are parallel to the contour path; if the 1st and 2nd stage filters are orthogonal, the model detects 'ladders', in which the elements are perpendicular to the path. The model correctly predicts that detection of ladders is largely unaffected by contour smoothness, but fails to predict that jagged snakes are harder to detect than smooth snakes. The advantage for smooth snakes suggests the existence of a third stage which detects fragments of snake contour with constant sign of curvature. It has been argued that contours are analysed with mechanisms that multiply the outputs of subunits along the contour [Gheorghiu and Kingdom, 2009 Journal of Vision 9(2):23, 1–17]. We implemented multiplicative curve detectors by multiplying spatially shifted outputs from different orientation channels in our model, giving curve detector responses for different orientations and curvatures. For each orientation, we summed responses across detector curvature, to give a 3-D response space, with dimensions representing orientation and the 2-D retinal image. Responses were then thresholded to form 3-D zero-bounded regions within the response space, tracing out the contours. The model, which is a hybrid between association field and filter-overlap models, successfully accounts for the improvement in snake detection performance with increasing contour smoothness.
May, K.A. & Hess, R.F. (2009). Implementing curve detectors for contour integration.
Journal of Vision, 9(8):906, 906a. (Poster presented at VSS 2009)
Download poster (15.7 MB)
We recently presented a model of contour integration in which grouping occurs due to the overlap of filter responses to the different elements [May, K.A. & Hess, R.F. (2008). Journal of Vision, 8(13):4, 1–23]. The image receives two stages of oriented filtering, separated by a nonlinearity. Then the filter output is thresholded to create a set of zero-bounded regions (ZBRs) within each orientation channel. Finally, spatially overlapping ZBRs in neighbouring orientation channels are linked to form 3D ZBRs within the space formed by the two dimensions of the image along with a third dimension representing orientation. If the 1st and 2nd stage filters have the same orientation, the model detects snakes, in which the elements are parallel to the contour path; if the 1st and 2nd stage filters are orthogonal, the model detects ladders, in which the elements are perpendicular to the path. The model detects both straight and curved contours, and correctly predicts that detection of ladders is largely unaffected by contour smoothness, but fails to explain the finding that jagged snakes are harder to detect than smooth snakes that follow an arc of a circle. The advantage for smooth snakes, and several other findings, suggest that the primitive features detected by snake-integration mechanisms are fragments of contour with constant sign of curvature. A detector for any shape of contour can be created by summing spatially shifted outputs from different orientation channels: this is equivalent to filtering with a receptive field that matches the desired shape, and would be simple to implement physiologically. We extended our earlier model by combining filter outputs in this way to create detectors for smooth contour fragments with a range of different curvatures. This approach makes the model more robust to noise, and explains the advantage for smoothly curved snakes.
May, K.A. & McIlhagga, W.H. (2009). Probing edge blur perception with reverse correlation.
Perception, 38(4), 621. (Talk presented at the AVA AGM 2009)
We investigated blur perception in human vision using reverse correlation. On each trial, subjects saw two edges: the target was a blurred edge with a Gaussian integral profile, and the nontarget was a sharp step-edge. 1-D noise was added to each edge. Subjects had to identify the target. We found the mean difference between target and nontarget noise profiles for the correct trials and for the incorrect trials. The difference between these two mean noise-difference profiles is the classification image (CI), which can be interpreted as the receptive field that was used to perform the task. Consistent with the N3+ model [a multi-scale model of edge perception (Georgeson et al, 2007 Journal of Vision 7(13):7, 1 – 21)], our CIs approximated a Gaussian third derivative. The model filters the image with Gaussian first- and second-derivative operators, with an intervening half-wave rectifier; the scale, $\sigma$, of each channel is determined by the scales of its two derivative operators. Each channel's output is multiplied by $\sigma^{\alpha}$. Peaks across space and scale indicate the position and scale (ie blur) of each edge element. For a Gaussian edge with scale $\sigma_e$, the peak occurs in the channel with scale $\sigma = \sigma_e \sqrt{1/(3/\alpha - 1)}$. In the original N3+ model, $\alpha = 1.5$, giving a peak in the channel matched in scale to the edge. Our CIs were wider than predicted by this model, suggesting a higher value of $\alpha$. The N3+ model with the best-fitting $\alpha$-value predicted responses on a trial-by-trial basis, and gave simulated CIs that fitted remarkably well to the psychophysical ones.
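The scale relation quoted above can be checked numerically. The short Python sketch below evaluates the peak channel scale for a given edge scale and exponent; the function name `peak_channel_scale` and the example values are assumptions made for illustration, and the restriction to exponents between 0 and 3 simply reflects where the square root in the formula is defined.

```python
import math


def peak_channel_scale(edge_scale, alpha):
    """Scale of the channel giving the peak response to a Gaussian edge.

    Evaluates sigma = sigma_e * sqrt(1 / (3 / alpha - 1)), which is only
    defined for 0 < alpha < 3.
    """
    if not 0 < alpha < 3:
        raise ValueError("alpha must lie strictly between 0 and 3")
    return edge_scale * math.sqrt(1.0 / (3.0 / alpha - 1.0))


print(peak_channel_scale(10.0, 1.5))  # alpha = 1.5: channel matched to the edge
print(peak_channel_scale(10.0, 2.0))  # larger alpha: peak moves to a coarser channel
```

With an exponent of 1.5 the peak channel has the same scale as the edge, and a larger exponent moves the peak to a coarser channel, in line with the wider classification images reported in the abstract.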
May, K.A. & Hess, R.F. (2008). Testing filter-overlap models of contour integration.
Journal of Vision, 8(6):72, 72a. (Talk presented at VSS 2008)
Most models of contour integration belong to one of two broad classes: those with explicit connections that link different regions of space (association field models, e.g. Field, Hayes & Hess, 1993, Vision Research, 33, 173-193), and those which depend on spatial overlap in the filter responses to adjacent elements (filter-overlap models). In some filter-overlap models, processing occurs separately within each orientation channel. These models do not adequately account for human foveal contour detection performance because (1) their performance decreases too rapidly with increasing curvature (Hess & Dakin, 1997, Nature, 390, 602-604), and (2) their performance decreases as the contour becomes smoother (Lovell, 2005, Journal of Vision, 5(8), 469a), while human observers generally show the opposite effect (Pettet, 1999, Vision Research, 39, 551-557; Lovell, 2005). The filter-overlap model's ability to detect smooth or highly curved contours can be improved by allowing it to link spatially-overlapping filter responses from adjacent orientation channels. We set up two types of orientation-linking filter-overlap model. One used 1st-order filters to detect snakes (i.e. contours composed of Gabor elements parallel to the path of the contour); the other used 2nd-order filters to detect ladders (in which the elements are perpendicular to the path). Both models were good at detecting smooth, highly curved contours, but showed little effect of contour smoothness or curvature. In contrast, human performance on snakes increased substantially with increasing smoothness and, for the most jagged contours, decreased substantially with increasing curvature. Human performance on ladders showed little effect of smoothness (unlike separate-channels filter-overlap models), but was strongly disrupted by an increase in curvature (unlike orientation-linking filter-overlap models). Thus, neither type of filter-overlap model could account for the pattern of results for snakes or ladders. 
We conclude that, despite their successful detection performance, filter-overlap models are not realistic models of contour integration in human vision.
May, K.A. & Hess, R.F. (2007). Contour integration and crowding: a similar type of mechanism?
Perception, 36(9), 1399. (Talk presented at the AVA AGM 2007)
We studied integration of contours consisting of Gabor elements positioned along a smooth path, embedded amongst distractors. Contour elements were aligned with the path (‘snakes’) or orthogonal to it (‘ladders’). Straight snakes and ladders were easily detected in the fovea but, at an eccentricity of 6°, only snakes were detectable. We propose that the failure to detect peripheral ladders is an example of crowding, the phenomenon observed when identification of peripherally located letters is disrupted by flanking letters. Pelli et al (2004 Journal of Vision 4 1136 – 1169) outlined a model in which simple feature detectors are followed by integration fields, which mediate tasks requiring the outputs of several detectors (eg letter identification). They proposed that crowding occurs because integration fields are larger in the periphery, causing inappropriate feature integration. We argue that the ‘association field’, which has been proposed to mediate contour integration (Field et al, 1993 Vision Research 33 173 – 193), is a type of integration field. Our data are explained by a model in which weak ladder integration competes with strong snake integration. In the fovea, small association fields allow both types of contour to be integrated with little interference. In the periphery, association fields are larger, and a ladder element is likely to be closely aligned with a distractor within the field; the ladder element will then form a snake with the distractor element, disrupting the ladder integration. In contrast, even with large fields, snake elements are usually most strongly linked to their neighbours along the contour.
May, K.A. & Hess, R.F. (2007). Ladder contours are undetectable in the periphery.
Journal of Vision, 7(9):113, 113a. (Talk presented at VSS 2007)
In many studies of contour integration, the task is to detect a contour consisting of spatially separated Gabor elements positioned along a smooth path (e.g., Field, Hayes, & Hess, 1993, Vision Research, 33, 173-193). The elements can be aligned with the path ("snakes") or perpendicular to it ("ladders"). With foveal viewing, ladders are generally harder to detect than snakes but, as long as they are fairly straight, ladders can still be detected quite easily. We found a striking deficit in detection of ladders in the periphery. Completely straight ladders were undetectable at an eccentricity of 6 degrees of visual angle, whereas performance on straight snakes at this eccentricity was at or close to 100%. This suggests that ladder detection is disproportionately impaired in the periphery, but an alternative explanation is that there is a general impairment of ladder detection that only shows up in the periphery, where performance falls away from ceiling. To address this issue, we brought performance away from ceiling in the fovea by jittering the orientations of the elements. For two subjects, foveal performance was matched for snakes and ladders with the same orientation jitter levels. In both cases, detection of ladders fell to chance at an eccentricity of 4 deg, whereas detection of snakes remained significantly above chance up to and including the largest eccentricity that we tested (8 deg). The failure to detect ladders at such small eccentricities may partly explain the relative difficulty in detecting ladders that has been reported in previous studies: in all of these studies, the position of the contour has been randomized to some extent. The difference in the effect of eccentric viewing on snakes and ladders means that any positional randomization would have caused a greater disruption to detection of ladders.
May, K.A. & Hess, R.F. (2006). Snakes are as fast as ladders: evidence against the hypothesis that contrast facilitation mediates contour detection.
Journal of Vision, 6(6), 337a. (Poster presented at VSS 2006)
Download poster as PowerPoint file (4.1 MB) or A3 size PDF (4.4 MB)
It is easy to detect a "snake" consisting of spatially separated, collinear elements, embedded in a field of randomly oriented elements (Field, Hayes & Hess, 1993, Vision Research, 33, 173-193). Performance is poor when elements are oriented 45 degrees to the contour, but improves when elements are orthogonal to the contour ("ladders") (Ledgeway, Hess & Geisler, 2005, Vision Research, 45, 2511-2522). Contour detection has been related to the phenomenon of contrast facilitation, whereby the contrast threshold for detection of an element is reduced when it is flanked by other elements: many models assume that contours are detected through the modulation of neuronal activity by the facilitatory signals that underlie contrast facilitation. If this were the case, one would expect contour detection to show similar temporal properties to contrast facilitation. Cass & Spehar (2005, Vision Research, 45, 3060-3073) used a psychophysical procedure to estimate the speed of propagation of contrast facilitation signals; their results suggest that the facilitatory signals from collinear flankers propagate much more slowly than those from non-collinear flankers. We investigated the effect of temporally modulating the orientation of contour elements from collinear to diagonal, or from orthogonal to diagonal. If contour detection and contrast facilitation are mediated by the same mechanisms, then the integration of snake contours should be much slower, and should be disrupted at much lower temporal frequencies, than the integration of ladder contours. We found identical temporal properties for both contour types, suggesting that contour integration is mediated by different mechanisms from contrast facilitation.
May, K.A. & Zhaoping, L. (2005). Both cognitive factors and local inhibition mediate the effect of a surrounding frame in visual search for oriented bars.
Journal of Vision, 5(8), 959a. (Poster presented at VSS 2005)
Download poster (819 KB)
It is easier to search for tilted line elements amongst vertical distractors than vice-versa (Treisman & Gormican, 1988, Psychological Review, 95, 15-48). When a vertical or tilted square frame surrounds the elements, there is an advantage for targets tilted relative to the frame. Treisman suggested two explanations: (1) the frame defines the orientation against which tilt is defined, and targets parallel to the frame lack a "tilt" feature, making them harder to find; (2) targets tilted relative to the frame have a unique orientation, making them more salient than targets parallel to the frame, which receive competition from it. Li (2002, Trends in Cognitive Sciences, 6, 9-16) proposed a saliency mechanism that explains these results using iso-orientation inhibition between nearby V1 cells: cells responding to an element parallel to the frame receive more inhibition than those responding to an element with a unique orientation. We ran several experiments to test this model. In each stimulus, either the target or distractors were parallel to the left and right sides of the frame, and no element was parallel to the frame's top and bottom. In experiment 1 the left and right sides of the frame were constructed from elements oriented parallel to the frame's top and bottom; in experiment 2, the left and right sides were removed altogether. Both modifications caused the target to be uniquely oriented whether or not it was tilted relative to the frame and, in both cases, the frame effect was still present (but reduced in experiment 2). These results are not explained by the V1 model, and suggest a role for more cognitive factors. However, other results supported the V1 model, which predicts that inhibition decreases with increasing distance between receptive fields. We found that enlarging the frame, so that it was further from the elements, reduced its effect. 
In addition, a single line through the stimulus had the same effect as a frame only when the target was close to it.
Guyader, N., May, K.A. & Zhaoping, L. (2005). Top-down interference in visual search.
Journal of Vision, 5(8), 951a. (Poster presented at VSS 2005)
In our visual search experiment, each item had two bars: one was tilted 45 degrees to the left from vertical for distractors and 45 degrees to the right for the target; the other was a horizontal or vertical bar centered at the same location. Each item was a rotated version of every other item. As the target had a uniquely oriented bar, it was typically the most salient item, both by the Feature Integration Theory (Treisman & Gelade, Cognitive Psychology 12:97-136, 1980), and the theory of the bottom up saliency map in V1 (Li, Trends in Cognitive Sciences, 6:9-16, 2002). The subjects were informed of this unique orientation, and were instructed to quickly report by button press whether the target was in the left or right half of the stimulus display. Reaction times (RTs) were measured and subjects' eye positions were tracked. We also measured the "reaction time of the eye" (RTE), defined as the first time that the eye position was close enough to the target. Typically, RT > RTE. Subjects reported that the target often "vanished" after they had initially detected it. Eyes were often seen to saccade to the target, then move away or loiter around for a long time, before moving back to the target, followed by the subject's button press. A control condition was designed by changing the uniquely oriented bar in the target to tilt 20 degrees to the right from vertical, so the target was no longer a rotated version of distractors. The gap between RT and RTE was significantly shorter in this control than in the original condition, even though their RTs were comparable. The same result was found for other control conditions with comparable RTs. In the original condition, it is as if the eyes, driven by V1 through superior colliculus, locate the target by the bottom up saliency process of unique orientation pop out, while the top-down process of object recognition, presumably rotation invariant, interferes by signalling that all items are identical objects.
Zhaoping, L. & May, K. (2004). Irrelevance of feature maps for bottom up visual saliency in segmentation and search tasks.
Society for Neuroscience Abstracts, 20.1.
Traditional models of selection using saliency maps assume that visual inputs are processed by separate feature maps whose outputs are subsequently added to form a master saliency map. A recent hypothesis (Li, TICS 6:9-16, 2002) that V1 implements a saliency map requires no separate feature maps. Rather, saliency at a visual location corresponds to the activity of the most active V1 cell responding to inputs there, regardless of its feature tuning. We test the models using texture segmentation and visual search tasks. Texture borders in Fig. A and B pop out due to higher saliency of the bars at the borders. Traditional models predict easier texture segmentation in pattern C (created by superposing A and B) than in A and B, while the V1 model does not. Traditional models predict no interference of the component pattern D in segmenting pattern E which is created by superposing A and D, while the V1 model predicts interference. Using reaction time as a measure of the task difficulty, the V1 model's predictions were confirmed. Analogous results were found in search tasks for orientation singletons in stimuli of target and distractors made of single or composite bars. The V1 model was also confirmed using stimuli made of color-orientation feature composites.
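The difference between the two accounts can be illustrated with a toy calculation in Python. The response values below are hypothetical and in arbitrary units, and the sketch assumes that superposing two patterns leaves each bar's response unchanged, which is a simplification made for this example rather than a property of either model. The function names are likewise invented.

```python
def sum_rule_saliency(responses):
    """Traditional account: feature-map outputs at a location are added."""
    return sum(responses)


def max_rule_saliency(responses):
    """V1 account: saliency is the response of the most active cell at a location."""
    return max(responses)


def border_highlight(rule, border_responses, background_responses):
    """Saliency difference between a border location and a background location."""
    return rule(border_responses) - rule(background_responses)


# Hypothetical responses to each bar at a border and a background location.
BORDER, BACKGROUND = 1.0, 0.6
patterns = {
    "single pattern": ([BORDER], [BACKGROUND]),
    "two patterns superposed": ([BORDER, BORDER], [BACKGROUND, BACKGROUND]),
}

for name, (border, background) in patterns.items():
    print(
        name,
        border_highlight(sum_rule_saliency, border, background),
        border_highlight(max_rule_saliency, border, background),
    )
```

Under the summation rule the border stands out more in the superposed pattern than in either component, whereas under the maximum rule it does not, which mirrors the contrasting predictions for pattern C described in the abstract.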
May, K.A. & Zhaoping, L. (2004). Investigating salience mechanisms by using the effects of surrounding frame on the tilted-vertical asymmetry in visual search.
Perception, 33, Supplement, 12. (Talk presented at ECVP 2004)
We measured the stimulus duration required to detect target lines that differed in orientation from distractor lines. Tilted targets amongst vertical distractors required shorter durations than vice-versa. Surrounding the stimulus with a square frame tilted by the same amount as the tilted lines reduced or reversed this asymmetry; a vertical frame had no effect. Treisman and Gormican (1988 Psychological Review 95 15 - 48) found similar results using reaction times. Li (2002 Trends in Cognitive Sciences 6 9 - 16) proposed that V1 mechanisms determine salience in visual search. According to this proposal, the advantage for tilted targets could arise from weaker iso-orientation suppression of obliquely tuned V1 cells, since fewer cells encode oblique orientations. The frame effect can be explained by proposing that the sides of the frame inhibit responses to lines parallel to the frame. This predicts no effect of a frame constructed from elements with orientation perpendicular to the side of the frame. This prediction was supported by some subjects, but not others. When alternate frame elements were black and white (on a grey background), so that a large V1 receptive field aligned with the side of the frame would show no response, the frame effect disappeared for some subjects.
May, K.A. & Georgeson, M.A. (2004). Perceiving edge contrast.
Perception, 33(6), 757. (Talk presented at the AVA Christmas meeting 2003)
We have shown previously that a template model for edge perception successfully predicts perceived blur for a variety of edge profiles (Georgeson, 2001 Journal of Vision 1 438a; Barbieri-Hesse and Georgeson, 2002 Perception 31 Supplement, 54). This study concerns the perceived contrast of edges. Our model spatially differentiates the luminance profile, half-wave rectifies this first derivative, and then differentiates again to create the edge's 'signature'. The spatial scale of the signature is evaluated by filtering it with a set of Gaussian derivative operators. This process finds the correlation between the signature and each operator kernel at each position. These kernels therefore act as templates, and the position and scale of the best-fitting template indicate the position and blur of the edge. Our previous finding, that reducing edge contrast reduces perceived blur, can be explained by replacing the half-wave rectifier with a smooth, biased rectifier function (May and Georgeson, 2003 Perception 32 388; May and Georgeson, 2003 Perception 32 Supplement, 46). With the half-wave rectifier, the peak template response $R$ to a Gaussian edge with contrast $C$ and scale $\sigma$ is given by $R = C\pi^{-1/4}\sigma^{-3/2}$. Hence, edge contrast can be estimated from response magnitude and blur: $C = R\pi^{1/4}\sigma^{3/2}$. Use of this equation with the modified rectifier predicts that perceived contrast will decrease with increasing blur, particularly at low contrasts. Contrast-matching experiments supported this prediction. In addition, the model correctly predicts the perceived contrast of Gaussian edges modified either by spatial truncation or by the addition of a ramp.
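The relation between peak response, contrast and blur stated above is a simple power law and its inverse. The Python sketch below evaluates both directions for the half-wave rectifier case only; the function names and the example values (contrast 0.4, scale 8) are assumptions for illustration, and the smooth, biased rectifier used in the study is not modelled here.

```python
import math


def peak_template_response(contrast, scale):
    """Peak response to a Gaussian edge: R = C * pi**(-1/4) * sigma**(-3/2)."""
    return contrast * math.pi ** -0.25 * scale ** -1.5


def estimate_contrast(response, scale):
    """Inverse relation: C = R * pi**(1/4) * sigma**(3/2)."""
    return response * math.pi ** 0.25 * scale ** 1.5


response = peak_template_response(0.4, 8.0)
print(estimate_contrast(response, 8.0))  # recovers the contrast, up to rounding
```

With the plain half-wave rectifier the round trip returns the original contrast at every blur, so a dependence of perceived contrast on blur has to come from the modified rectifier, as the abstract argues.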
May, K.A. & Georgeson, M.A. (2003). Perceiving edge blur: Gaussian-derivative filtering and a rectifying nonlinearity.
Perception, 32, Supplement, 46. (Talk presented at ECVP 2003)
A template model for edge perception successfully predicts perceived blur for a wide variety of edge profiles (Georgeson, 2001 Journal of Vision 1 438a). The model differentiates the luminance profile, half-wave rectifies this first derivative, and then differentiates again to create the 'signature' of the edge. The spatial scale of the signature is evaluated by filtering with a set of Gaussian derivative operators whose response measures the correlation between the signature and the operator kernel. These kernels thus act as templates for the edge signature, and the position and scale of the best-fitting template indicate the position and blur of the edge. The rectifier accounts for a range of effects on perceived blur (Barbieri-Hesse and Georgeson, 2002 Perception 31 Supplement, 54). It also predicts that a blurred edge will look sharper when a luminance gradient of opposite sign is added to it. Experiment 1 used blur-matching to reveal a perceived sharpening that was close to the predicted amount. The model just described predicts that perceived blur will be independent of contrast, but experiment 2 showed that blurred edges appeared sharper at lower contrasts. This effect can be explained by subtracting a threshold value from the gradient profile before rectifying. At low contrasts, more of the gradient profile falls below threshold and its effective spatial scale shrinks in size, leading to perceived sharpening. As well as explaining the effect of contrast on blur, the threshold improves the model's account of the added-ramp effect (experiment 1).
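One way to see why a threshold produces sharpening at low contrast is to compute how much of the gradient profile survives it. The Python sketch below does this for a Gaussian gradient profile. It is only a proxy: it measures the width of the supra-threshold region and does not implement the template matching that the model uses to estimate scale. The function name, the normalisation of the gradient profile, and the threshold value are assumptions for this example.

```python
import math


def suprathreshold_half_width(contrast, blur, threshold):
    """Half-width of the part of a Gaussian gradient profile exceeding threshold.

    The gradient of a Gaussian-integral edge is taken to be
    g(x) = contrast / (blur * sqrt(2 * pi)) * exp(-x**2 / (2 * blur**2)).
    """
    peak = contrast / (blur * math.sqrt(2.0 * math.pi))
    if peak <= threshold:
        return 0.0  # the whole profile falls below threshold
    # Solve g(x) = threshold for x.
    return blur * math.sqrt(2.0 * math.log(peak / threshold))


for contrast in (0.4, 0.2, 0.1, 0.05):
    print(contrast, suprathreshold_half_width(contrast, blur=8.0, threshold=0.001))
```

For a fixed blur and threshold, the surviving region narrows as contrast falls, which is the qualitative effect the abstract attributes to the threshold.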
May, K.A. & Georgeson, M.A. (2003). Perceiving edge blur: linear filtering and a rectifying nonlinearity.
Perception, 32(3), 388. (Talk presented at the AVA Christmas meeting 2002)
We studied the visual mechanisms that encode edge blur in images. Our previous work suggested that the visual system spatially differentiates the luminance profile twice to create the 'signature' of the edge, and then evaluates the spatial scale of this signature profile by applying Gaussian derivative templates of different sizes. The scale of the best-fitting template indicates the blur of the edge. In blur-matching experiments, a staircase procedure was used to adjust the blur of a comparison edge (40% contrast, 0.3 s duration) until it appeared to match the blur of test edges at different contrasts (5% – 40%) and blurs (6 – 32 min of arc). Results showed that lower-contrast edges looked progressively sharper. We also added a linear luminance gradient to blurred test edges. When the added gradient was of opposite polarity to the edge gradient, it made the edge look progressively sharper. Both effects can be explained quantitatively by the action of a half-wave rectifying nonlinearity that sits between the first and second (linear) differentiating stages. This rectifier was introduced to account for a range of other effects on perceived blur (Barbieri-Hesse and Georgeson, 2002 Perception 31 Supplement, 54), but it readily predicts the influence of the negative ramp. The effect of contrast arises because the rectifier has a threshold: it not only suppresses negative values but also small positive values. At low contrasts, more of the gradient profile falls below threshold and its effective spatial scale shrinks in size, leading to perceived sharpening.
Georgeson, M.A., May, K.A. & Barbieri-Hesse, G.S. (2003). Perceiving edge blur: the Gaussian-derivative template model.
Journal of Vision, 3(9), 360a.
We studied the visual encoding of edge blur in images. Our previous work (VSS 2001) suggested a model in which the visual system spatially differentiates the luminance profile twice to create the 'signature' of the edge, and then evaluates the spatial scale of this signature profile by applying Gaussian derivative templates of different sizes. The scale of the best-fitting template estimates the blur of the edge. Here we refine the model in the light of further blur-matching experiments. A staircase procedure adjusted the blur of a Gaussian comparison edge until it appeared to match the blur of test edges with different spatial profiles, lengths, contrasts and blurs. We also added a linear luminance gradient to blurred test edges. When the added gradient was of opposite polarity to the edge gradient, it made the edge look progressively sharper. Lower contrast edges also looked sharper. Both effects can be explained quantitatively by the action of a half-wave rectifying nonlinearity that sits between the first and second differentiating stages. This rectifier also accounts for a range of other effects on perceived blur. It segments the image into discrete regions of common gradient polarity around each edge. The effect of contrast arises because the rectifier has a threshold: it not only suppresses negative values but also small positive values. At low contrasts, more of the gradient profile falls below threshold and its effective width shrinks, leading to perceived sharpening. The refined template model has few free parameters, but is a remarkably accurate predictor of perceived edge blur and offers some insight into the role of multi-scale filtering by V1 neurons.
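The processing stages described in this abstract (differentiate, subtract a threshold and half-wave rectify, differentiate again, then find the best-fitting Gaussian derivative template) can be illustrated with a minimal Python sketch. This is a simplified illustration, not the authors' implementation. The function names, the unit-norm first-derivative-of-Gaussian templates, the sampling grid, and the threshold value are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_edge(x, blur, contrast):
    """Luminance profile of a Gaussian-blurred edge (running integral of a Gaussian)."""
    dx = x[1] - x[0]
    gauss = np.exp(-x**2 / (2.0 * blur**2)) / (blur * np.sqrt(2.0 * np.pi))
    return contrast * np.cumsum(gauss) * dx


def edge_signature(luminance, dx, threshold=0.0):
    """Differentiate, subtract threshold, half-wave rectify, differentiate again."""
    gradient = np.gradient(luminance, dx)
    rectified = np.maximum(gradient - threshold, 0.0)
    return np.gradient(rectified, dx)


def template(x, scale):
    """Unit-norm first derivative of a Gaussian (assumed template shape)."""
    kernel = -x * np.exp(-x**2 / (2.0 * scale**2))
    return kernel / np.linalg.norm(kernel)


def estimate_edge(luminance, x, scales, threshold=0.0):
    """Return (position, blur) of the template that best correlates with the signature."""
    dx = x[1] - x[0]
    signature = edge_signature(luminance, dx, threshold)
    best_response, best_position, best_scale = -np.inf, None, None
    for scale in scales:
        # Correlation of the signature with the template at every position.
        response = np.correlate(signature, template(x, scale), mode="same")
        i = int(np.argmax(response))
        if response[i] > best_response:
            best_response, best_position, best_scale = response[i], x[i], scale
    return best_position, best_scale


# Hypothetical stimulus: positions in min of arc, true blur of 10 min of arc.
x = np.linspace(-60.0, 60.0, 481)
scales = np.arange(2.0, 20.25, 0.25)
THRESHOLD = 0.0005  # assumed gradient threshold, arbitrary luminance units per min of arc

# No threshold: the estimated blur should be close to the true blur.
pos_none, blur_none = estimate_edge(gaussian_edge(x, 10.0, 0.40), x, scales)

# With a threshold: compare a high-contrast and a low-contrast edge of equal blur.
_, blur_high = estimate_edge(gaussian_edge(x, 10.0, 0.40), x, scales, THRESHOLD)
_, blur_low = estimate_edge(gaussian_edge(x, 10.0, 0.05), x, scales, THRESHOLD)

print("no threshold:", blur_none)
print("40% contrast:", blur_high)
print(" 5% contrast:", blur_low)
```

Because the templates are normalized to unit length, the best response for an unthresholded Gaussian edge occurs at the template whose scale matches the edge blur. With the threshold in place, the low-contrast edge loses more of its gradient profile and is matched by a smaller template, which mirrors the perceived sharpening described above.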
May, K.A. & Perrett, D.I. (1993). Facial Attractiveness.
(Talk presented at the BPS International Conference on Face Processing, University of Wales College of Cardiff, September 1993)
Langlois & Roggman (Psychological Science, 1990, 114–121) put forward the hypothesis that "attractive faces are only average". They supported their claim by showing that composite faces made by blending photographs of different faces are rated as being more attractive than the average rating of their component faces. We claim that, while composites are attractive, attractiveness is not merely a measure of averageness. We support this claim by showing that the attractiveness of the component faces has a direct effect on the attractiveness of the composite; specifically, if attractive faces are blended together the resulting composite is more attractive than if unattractive faces are used. This phenomenon is not predicted by Langlois & Roggman's model.
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.