Published conference abstracts
To download a poster, right-click the link and select "Save Target As ..." or "Save Link As ..."
See copyright notice
May, K.A., Zhaoping, L. & Hibbard, P.B. (2010). Binocular integration in human vision adapts to maximize information coding efficiency.
Perception, 39, ECVP Abstract Supplement, 77. (Poster presented at ECVP 2010)
Download poster (16.9 MB)
The two eyes typically receive correlated inputs, from which one can derive two decorrelated channels: binocular summation (S+) and binocular difference (S–). The channel gains (g+, g–) should adapt to optimize the tradeoff between information transmission and energy usage, giving an inverted-U function of signal strength: strong signals are suppressed to conserve energy with little information loss, and weak signals are suppressed to avoid wasting energy transmitting noise (Li and Atick, 1994 Network 5 157-174). The relative strengths of the S+ and S– signals depend on the interocular correlation. We adapted observers to positive correlations (both eyes saw identical natural images, giving stronger S+ than S–), zero correlations (each eye saw a completely different natural image, giving equal-strength S+ and S–) or negative correlations (each eye saw the photonegative of the other eye’s image, giving weaker S+ than S–). We assessed the gain ratio g+/g– from cyclopean motion direction judgments for a dichoptic display in which the S+ signal contained motion in the opposite direction to the S– and monocular signals. For high adaptation contrast, g+/g– was lower after adapting to positive than zero or negative interocular correlations; the opposite occurred for low adaptation contrast. The data are explained by an inverted-U gain function.
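The relationship between interocular correlation and the strengths of the two channels can be illustrated with a minimal sketch. It assumes the common convention S+ = (L + R)/√2 and S– = (L – R)/√2, which the abstract does not state explicitly, and it uses synthetic unit-variance Gaussian signals in place of natural images. The function names are hypothetical.

```python
import numpy as np


def binocular_channels(left, right):
    """Return the summation (S+) and difference (S-) signals.

    The 1/sqrt(2) normalisation is an assumed convention; it keeps the
    total signal power unchanged by the transform.
    """
    s_plus = (left + right) / np.sqrt(2)
    s_minus = (left - right) / np.sqrt(2)
    return s_plus, s_minus


def correlated_pair(rho, n, rng):
    """Synthetic left/right signals with unit variance and correlation rho."""
    common = rng.standard_normal(n)
    private = rng.standard_normal(n)
    left = common
    right = rho * common + np.sqrt(1 - rho**2) * private
    return left, right


rng = np.random.default_rng(0)
for rho in (0.8, 0.0, -0.8):  # positive, zero and negative correlation
    left, right = correlated_pair(rho, 200_000, rng)
    s_plus, s_minus = binocular_channels(left, right)
    # Expected variances are 1 + rho for S+ and 1 - rho for S-,
    # and the two channels are uncorrelated with each other.
    print(rho, s_plus.var(), s_minus.var(), np.corrcoef(s_plus, s_minus)[0, 1])
```

Under these assumptions, positive correlation gives a stronger S+ than S–, zero correlation gives equal strengths, and negative correlation gives a weaker S+ than S–, matching the three adaptation conditions described above.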
May, K.A., Zhaoping, L. & Hibbard, P.B. (2010). Binocular integration in human vision adapts quickly to maximize coding efficiency.
Perception, 39, 1149. (Talk presented at the AVA AGM 2010)
The two eyes typically receive correlated inputs, from which one can derive the two decorrelated input channels: binocular summation, S+, and binocular difference, S−. S+ has greater power than S− in natural scenes, and the opposite occurs when the inputs to the two eyes are anticorrelated. To represent the input most efficiently, ie to maximize the information transmitted for a given energy budget, the visual system gives a higher gain to the weaker of the two decorrelated channels when the signal-to-noise ratio (SNR) is high [eg at low spatial frequencies (SFs), due to the 1/f spectrum], and gives a lower gain to the weaker channel when the SNR is low (eg at high SFs) to minimize energy wasted in transmitting noise (Li and Atick, 1994 Network 5 157 – 174). The gains are predicted to adapt to the interocular correlation. We assessed the relative gains to S+ and S− channels from observers' motion direction judgements using a cyclopean motion stimulus [Shadlen and Carney, 1986 Science 232 95 – 97; Hayashi et al., 2007 Journal of Vision 7(8):7 1 – 10] in which the S+ signal had motion in the opposite direction to both S− and the monocular signals. As predicted, at low SFs, the ratio of S+ to S− gain was lower after adapting observers to positive ocular correlations (when both eyes saw identical natural images) than after adapting to anticorrelated ocular inputs (when one eye saw the photonegative of the other eye's input). The opposite occurred for high SFs. Adaptation occurred within a few seconds.
Zhaoping, L. & May, K.A. (2010). Human monochromatic light discrimination explained by optimal signal decoding.
Perception, 39, 1148–1149. (Talk presented at the AVA AGM 2010)
Why does the minimum wavelength difference for humans to discriminate two monochromatic inputs (which could differ in input intensity) depend on the wavelength in a particular way, dipping near wavelengths 490 and 590 nm but rising steeply beyond 630 nm (Pokorny and Smith, 1970 Journal of the Optical Society of America 60 562 – 569)? We propose a computational explanation by maximum-likelihood decoding of the light's colour from the cone absorptions. The wavelength tuning curves of the three cone types reflect their average absorptions for any monochromatic input. However, owing to Poisson noise in the cones, the actual absorptions will deviate stochastically from the respective averages. The brain could decode the best estimates of the input wavelength and intensity responsible, and the noise-induced uncertainty about these estimates. Computationally [Dayan and Abbott, 2001 Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (Cambridge, MA: MIT Press)], these best estimates and their uncertainties correspond to the peak location and the spread, in the input wavelength and intensity, of the conditional probability of the absorptions for the input. Experimentally, peak and spread should correspond to the perceived monochromatic input and the input discrimination threshold. We apply the computational decoding scheme to a wavelength discrimination procedure when subjects adjust the input wavelength and intensity of a comparison input field to match a standard monochromatic input field, and find a good agreement between the computationally predicted and experimentally observed wavelength discrimination thresholds as a function of the wavelength. Our findings suggest that retinal and cortical processes for colour decoding are optimal.
May, K.A., Zhaoping, L. & Hibbard, P.B. (2010). Effects of image statistics on stereo coding in human vision.
Journal of Vision, 10(7):359. (Poster presented at VSS 2010)
Download poster (5.3 MB)
Note: the experiments described in the poster use a different methodology from those in the abstract, although they address the same issue.
Biological visual systems continuously optimize themselves to the prevailing image statistics, which gives rise to the phenomenon of adaptation. For example, post-adaptation color appearance can be explained by efficient coding which appropriately combines the input cone channels into various chromatic and achromatic channels with suitable gains that depend on the input statistics [Atick, J.J., Li, Z. & Redlich, A.N. (1993). Vision Research, 33, 123-129]. In this study we focus on the ocular channels corresponding to the two eyes. We investigated how image statistics influence the way human vision combines information from the two eyes. Efficient coding in ocular space [Li, Z. & Atick, J.J. (1994) Network, 5, 157-174] predicts that the binocularity of neurons should depend on the interocular correlations in the visual environment: As the interocular correlations increase in magnitude, the neurons should become more binocular. In natural viewing conditions, interocular correlations are higher for horizontal than vertical image components, because vertical binocular disparities are generally smaller than horizontal disparities. Thus, adaptation to natural stereo image pairs should lead to a greater level of binocularity for horizontally-tuned neurons than vertically-tuned neurons, whereas adaptation to pairs of identical natural images should not. We used interocular transfer of the tilt illusion as an index of binocularity of neurons with different characteristics. Subjects adapted either to natural stereo pairs or pairs of identical natural images. As predicted, interocular transfer was higher for near-horizontal than near-vertical stimuli after adaptation to natural stereo pairs, but not after adaptation to pairs of identical natural images.
May, K.A. & Hess, R.F. (2010). Implementing curve detectors for contour integration.
Perception, 39(2), 270. (Poster presented at the AVA Christmas meeting 2009)
Download poster (15.7 MB)
We recently presented a model of contour integration in which the image receives two stages of oriented filtering, separated by a nonlinearity [May and Hess, 2008 Journal of Vision 8(13):4 1–23]. If the 1st and 2nd stage filters have the same orientation, the model detects 'snakes', in which the elements are parallel to the contour path; if the 1st and 2nd stage filters are orthogonal, the model detects 'ladders', in which the elements are perpendicular to the path. The model correctly predicts that detection of ladders is largely unaffected by contour smoothness, but fails to predict that jagged snakes are harder to detect than smooth snakes. The advantage for smooth snakes suggests the existence of a third stage which detects fragments of snake contour with constant sign of curvature. It has been argued that contours are analysed with mechanisms that multiply the outputs of subunits along the contour [Gheorghiu and Kingdom, 2009 Journal of Vision 9(2):23, 1–17]. We implemented multiplicative curve detectors by multiplying spatially shifted outputs from different orientation channels in our model, giving curve detector responses for different orientations and curvatures. For each orientation, we summed responses across detector curvature, to give a 3-D response space, with dimensions representing orientation and the 2-D retinal image. Responses were then thresholded to form 3-D zero-bounded regions within the response space, tracing out the contours. The model, which is a hybrid between association field and filter-overlap models, successfully accounts for the improvement in snake detection performance with increasing contour smoothness.
May, K.A. & Hess, R.F. (2009). Implementing curve detectors for contour integration.
Journal of Vision, 9(8):906, 906a. (Poster presented at VSS 2009)
Download poster (15.7 MB)
We recently presented a model of contour integration in which grouping occurs due to the overlap of filter responses to the different elements [May, K.A. & Hess, R.F. (2008). Journal of Vision, 8(13):4, 1–23]. The image receives two stages of oriented filtering, separated by a nonlinearity. Then the filter output is thresholded to create a set of zero-bounded regions (ZBRs) within each orientation channel. Finally, spatially overlapping ZBRs in neighbouring orientation channels are linked to form 3D ZBRs within the space formed by the two dimensions of the image along with a third dimension representing orientation. If the 1st and 2nd stage filters have the same orientation, the model detects snakes, in which the elements are parallel to the contour path; if the 1st and 2nd stage filters are orthogonal, the model detects ladders, in which the elements are perpendicular to the path. The model detects both straight and curved contours, and correctly predicts that detection of ladders is largely unaffected by contour smoothness, but fails to explain the finding that jagged snakes are harder to detect than smooth snakes that follow an arc of a circle. The advantage for smooth snakes, and several other findings, suggest that the primitive features detected by snake-integration mechanisms are fragments of contour with constant sign of curvature. A detector for any shape of contour can be created by summing spatially shifted outputs from different orientation channels: this is equivalent to filtering with a receptive field that matches the desired shape, and would be simple to implement physiologically. We extended our earlier model by combining filter outputs in this way to create detectors for smooth contour fragments with a range of different curvatures. This approach makes the model more robust to noise, and explains the advantage for smoothly curved snakes.
May, K.A. & McIlhagga, W.H. (2009). Probing edge blur perception with reverse correlation.
Perception, 38(4), 621. (Talk presented at the AVA AGM 2009)
We investigated blur perception in human vision using reverse correlation. On each trial, subjects saw two edges: the target was a blurred edge with a Gaussian integral profile, and the nontarget was a sharp step-edge. 1-D noise was added to each edge. Subjects had to identify the target. We found the mean difference between target and nontarget noise profiles for the correct trials and for the incorrect trials. The difference between these two mean noise-difference profiles is the classification image (CI), which can be interpreted as the receptive field that was used to perform the task. Consistent with the N3+ model [a multi-scale model of edge perception (Georgeson et al, 2007 Journal of Vision 7(13):7, 1 – 21)], our CIs approximated a Gaussian third derivative. The model filters the image with Gaussian first- and second-derivative operators, with an intervening half-wave rectifier; the scale, $\sigma$, of each channel is determined by the scales of its two derivative operators. Each channel's output is multiplied by $\sigma^{\alpha}$. Peaks across space and scale indicate the position and scale (ie blur) of each edge element. For a Gaussian edge with scale $\sigma_e$, the peak occurs in the channel with scale $\sigma = \sigma_e \sqrt{1/(3/\alpha - 1)}$. In the original N3+ model, $\alpha = 1.5$, giving a peak in the channel matched in scale to the edge. Our CIs were wider than predicted by this model, suggesting a higher value of $\alpha$. The N3+ model with the best-fitting $\alpha$-value predicted responses on a trial-by-trial basis, and gave simulated CIs that fitted remarkably well to the psychophysical ones.
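The peak-scale formula quoted in this abstract can be evaluated directly. The short sketch below only restates that formula; the function name and the restriction 0 < α < 3 (needed for the square root to be real and finite) are assumptions made for illustration.

```python
import math


def peak_channel_scale(sigma_edge, alpha):
    """Scale of the channel with the peak response to a Gaussian edge.

    Implements sigma = sigma_edge * sqrt(1 / (3 / alpha - 1)) as quoted
    in the abstract. Valid only for 0 < alpha < 3.
    """
    if not 0 < alpha < 3:
        raise ValueError("alpha must lie strictly between 0 and 3")
    return sigma_edge * math.sqrt(1.0 / (3.0 / alpha - 1.0))


# With alpha = 1.5 the peak channel matches the edge scale; larger
# alpha moves the peak to a coarser (wider) channel.
for alpha in (1.0, 1.5, 2.0):
    print(alpha, peak_channel_scale(1.0, alpha))
```

A larger α therefore shifts the peak to a wider channel, which is the direction implied by CIs that are wider than the original model predicts.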
May, K.A. & Hess, R.F. (2008). Testing filter-overlap models of contour integration.
Journal of Vision, 8(6):72, 72a. (Talk presented at VSS 2008)
Most models of contour integration belong to one of two broad classes: those with explicit connections that link different regions of space (association field models, e.g. Field, Hayes & Hess, 1993, Vision Research, 33, 173-193), and those which depend on spatial overlap in the filter responses to adjacent elements (filter-overlap models). In some filter-overlap models, processing occurs separately within each orientation channel. These models do not adequately account for human foveal contour detection performance because (1) their performance decreases too rapidly with increasing curvature (Hess & Dakin, 1997, Nature, 390, 602-604), and (2) their performance decreases as the contour becomes smoother (Lovell, 2005, Journal of Vision, 5(8), 469a), while human observers generally show the opposite effect (Pettet, 1999, Vision Research, 39, 551-557; Lovell, 2005). The filter-overlap model's ability to detect smooth or highly curved contours can be improved by allowing it to link spatially-overlapping filter responses from adjacent orientation channels. We set up two types of orientation-linking filter-overlap model. One used 1st-order filters to detect snakes (i.e. contours composed of Gabor elements parallel to the path of the contour); the other used 2nd-order filters to detect ladders (in which the elements are perpendicular to the path). Both models were good at detecting smooth, highly curved contours, but showed little effect of contour smoothness or curvature. In contrast, human performance on snakes increased substantially with increasing smoothness and, for the most jagged contours, decreased substantially with increasing curvature. Human performance on ladders showed little effect of smoothness (unlike separate-channels filter-overlap models), but was strongly disrupted by an increase in curvature (unlike orientation-linking filter-overlap models). Thus, neither type of filter-overlap model could account for the pattern of results for snakes or ladders. 
We conclude that, despite their successful detection performance, filter-overlap models are not realistic models of contour integration in human vision.
May, K.A. & Hess, R.F. (2007). Contour integration and crowding: a similar type of mechanism?
Perception, 36(9), 1399. (Talk presented at the AVA AGM 2007)
We studied integration of contours consisting of Gabor elements positioned along a smooth path, embedded amongst distractors. Contour elements were aligned with the path (‘snakes’) or orthogonal to it (‘ladders’). Straight snakes and ladders were easily detected in the fovea but, at an eccentricity of 6°, only snakes were detectable. We propose that the failure to detect peripheral ladders is an example of crowding, the phenomenon observed when identification of peripherally located letters is disrupted by flanking letters. Pelli et al (2004 Journal of Vision 4 1136 – 1169) outlined a model in which simple feature detectors are followed by integration fields, which mediate tasks requiring the outputs of several detectors (eg letter identification). They proposed that crowding occurs because integration fields are larger in the periphery, causing inappropriate feature integration. We argue that the ‘association field’, which has been proposed to mediate contour integration (Field et al, 1993 Vision Research 33 173 – 193), is a type of integration field. Our data are explained by a model in which weak ladder integration competes with strong snake integration. In the fovea, small association fields allow both types of contour to be integrated with little interference. In the periphery, association fields are larger, and a ladder element is likely to be closely aligned with a distractor within the field; the ladder element will then form a snake with the distractor element, disrupting the ladder integration. In contrast, even with large fields, snake elements are usually most strongly linked to their neighbours along the contour.
May, K.A. & Hess, R.F. (2007). Ladder contours are undetectable in the periphery.
Journal of Vision, 7(9):113, 113a. (Talk presented at VSS 2007)
In many studies of contour integration, the task is to detect a contour consisting of spatially separated Gabor elements positioned along a smooth path (e.g., Field, Hayes, & Hess, 1993, Vision Research, 33, 173-193). The elements can be aligned with the path ("snakes") or perpendicular to it ("ladders"). With foveal viewing, ladders are generally harder to detect than snakes but, as long as they are fairly straight, ladders can still be detected quite easily. We found a striking deficit in detection of ladders in the periphery. Completely straight ladders were undetectable at an eccentricity of 6 degrees of visual angle, whereas performance on straight snakes at this eccentricity was at or close to 100%. This suggests that ladder detection is disproportionately impaired in the periphery, but an alternative explanation is that there is a general impairment of ladder detection that only shows up in the periphery, where performance falls away from ceiling. To address this issue, we brought performance away from ceiling in the fovea by jittering the orientations of the elements. For two subjects, foveal performance was matched for snakes and ladders with the same orientation jitter levels. In both cases, detection of ladders fell to chance at an eccentricity of 4 deg, whereas detection of snakes remained significantly above chance up to and including the largest eccentricity that we tested (8 deg). The failure to detect ladders at such small eccentricities may partly explain the relative difficulty in detecting ladders that has been reported in previous studies: in all of these studies, the position of the contour has been randomized to some extent. The difference in the effect of eccentric viewing on snakes and ladders means that any positional randomization would have caused a greater disruption to detection of ladders.
May, K.A. & Hess, R.F. (2006). Snakes are as fast as ladders: evidence against the hypothesis that contrast facilitation mediates contour detection.
Journal of Vision, 6(6), 337a. (Poster presented at VSS 2006)
Download poster as PowerPoint file (4.1 MB) or A3-size PDF (4.4 MB)
It is easy to detect a "snake" consisting of spatially separated, collinear elements, embedded in a field of randomly oriented elements (Field, Hayes & Hess, 1993, Vision Research, 33, 173-193). Performance is poor when elements are oriented 45 degrees to the contour, but improves when elements are orthogonal to the contour ("ladders") (Ledgeway, Hess & Geisler, 2005, Vision Research, 45, 2511-2522). Contour detection has been related to the phenomenon of contrast facilitation, whereby the contrast threshold for detection of an element is reduced when it is flanked by other elements: many models assume that contours are detected through the modulation of neuronal activity by the facilitatory signals that underlie contrast facilitation. If this were the case, one would expect contour detection to show similar temporal properties to contrast facilitation. Cass & Spehar (2005, Vision Research, 45, 3060-3073) used a psychophysical procedure to estimate the speed of propagation of contrast facilitation signals; their results suggest that the facilitatory signals from collinear flankers propagate much more slowly than those from non-collinear flankers. We investigated the effect of temporally modulating the orientation of contour elements from collinear to diagonal, or from orthogonal to diagonal. If contour detection and contrast facilitation are mediated by the same mechanisms, then the integration of snake contours should be much slower, and should be disrupted at much lower temporal frequencies, than the integration of ladder contours. We found identical temporal properties for both contour types, suggesting that contour integration is mediated by different mechanisms from contrast facilitation.
May, K.A. & Zhaoping, L. (2005). Both cognitive factors and local inhibition mediate the effect of a surrounding frame in visual search for oriented bars.
Journal of Vision, 5(8), 959a. (Poster presented at VSS 2005).
Download poster (819 KB)
It is easier to search for tilted line elements amongst vertical distractors than vice-versa (Treisman & Gormican, 1988, Psychological Review, 95, 15-48). When a vertical or tilted square frame surrounds the elements, there is an advantage for targets tilted relative to the frame. Treisman suggested two explanations: (1) the frame defines the orientation against which tilt is defined, and targets parallel to the frame lack a "tilt" feature, making them harder to find; (2) targets tilted relative to the frame have a unique orientation, making them more salient than targets parallel to the frame, which receive competition from it. Li (2002, Trends in Cognitive Sciences, 6, 9-16) proposed a saliency mechanism that explains these results using iso-orientation inhibition between nearby V1 cells: cells responding to an element parallel to the frame receive more inhibition than those responding to an element with a unique orientation. We ran several experiments to test this model. In each stimulus, either the target or distractors were parallel to the left and right sides of the frame, and no element was parallel to the frame's top and bottom. In experiment 1 the left and right sides of the frame were constructed from elements oriented parallel to the frame's top and bottom; in experiment 2, the left and right sides were removed altogether. Both modifications caused the target to be uniquely oriented whether or not it was tilted relative to the frame and, in both cases, the frame effect was still present (but reduced in experiment 2). These results are not explained by the V1 model, and suggest a role for more cognitive factors. However, other results supported the V1 model, which predicts that inhibition decreases with increasing distance between receptive fields. We found that enlarging the frame, so that it was further from the elements, reduced its effect. 
In addition, a single line through the stimulus has the same effect as a frame only when the target is close to it.
Guyader, N., May, K.A. & Zhaoping, L. (2005). Top-down interference in visual search.
Journal of Vision, 5(8), 951a. (Poster presented at VSS 2005)
In our visual search experiment, each item had two bars: one was tilted 45 degrees to the left from vertical for distractors and 45 degrees to the right for the target; the other is a horizontal or vertical bar centered at the same location. Each target or distractor is a rotated version of all other items. As the target had a uniquely oriented bar, it was typically the most salient item, both by the Feature Integration Theory (Treisman & Gelade, Cognitive Psychology 12:97-136, 1980), and the theory of the bottom up saliency map in V1 (Li, Trends in Cognitive Sciences, 6:9-16, 2002). The subjects were informed of this unique orientation, and were instructed to quickly report by button press whether the target was in the left or right half of the stimulus display. Reaction times (RTs) were measured and subjects' eye positions were tracked. We also measured the "reaction time of the eye" (RTE) defined as the first time that the eye position is close enough to the target. Typically, RT > RTE. Subjects reported that the target often "vanished" after they had initially detected it. Eyes were often seen to saccade to the target, then moved away or loitered around for a long time, before moving back to the target and the subject's button press. A control condition was designed by changing the uniquely oriented bar in the target to tilt 20 degrees to the right from vertical, so the target was no longer a rotated version of distractors. The gap between RT and RTE was significantly shorter in this control than that in the original condition, even though their RTs were comparable. The same result was found for other control conditions with comparable RTs. In the original condition, it is as if the eyes, driven by V1 through superior colliculus, locate the target by the bottom up saliency process of unique orientation pop out, while the top-down process of object recognition, presumably rotation invariant, intervenes with the fact that all items are identical objects.
Zhaoping, L. & May, K. (2004). Irrelevance of feature maps for bottom up visual saliency in segmentation and search tasks.
Program No. 20.1. 2004 Abstract Viewer/Itinerary Planner. Washington, DC: Society for Neuroscience, 2004.
Traditional models of selection using saliency maps assume that visual inputs are processed by separate feature maps whose outputs are subsequently added to form a master saliency map. A recent hypothesis (Li, TICS 6:9-16, 2002) that V1 implements a saliency map requires no separate feature maps. Rather, saliency at a visual location corresponds to the activity of the most active V1 cell responding to inputs there, regardless of its feature tuning. We test the models using texture segmentation and visual search tasks. Texture borders in Fig. A and B pop out due to higher saliency of the bars at the borders. Traditional models predict easier texture segmentation in pattern C (created by superposing A and B) than in A and B, while the V1 model does not. Traditional models predict no interference of the component pattern D in segmenting pattern E which is created by superposing A and D, while the V1 model predicts interference. Using reaction time as a measure of the task difficulty, the V1 model's predictions were confirmed. Analogous results were found in search tasks for orientation singletons in stimuli of target and distractors made of single or composite bars. The V1 model was also confirmed using stimuli made of color-orientation feature composites.
[Figure: stimulus patterns A to E referred to in the abstract; image not reproduced here]
May, K.A. & Zhaoping, L. (2004). Investigating salience mechanisms by using the effects of surrounding frame on the tilted-vertical asymmetry in visual search.
Perception, 33, Supplement, 12. (Talk presented at ECVP 2004)
We measured the stimulus duration required to detect target lines that differed in orientation from distractor lines. Tilted targets amongst vertical distractors required shorter durations than vice-versa. Surrounding the stimulus with a square frame tilted by the same amount as the tilted lines reduced or reversed this asymmetry; a vertical frame had no effect. Treisman and Gormican (1988 Psychological Review 95 15 - 48) found similar results using reaction times. Li (2002 Trends in Cognitive Sciences 6 9 - 16) proposed that V1 mechanisms determine salience in visual search. According to this proposal, the advantage for tilted targets could arise from weaker iso-orientation suppression of obliquely tuned V1 cells, since fewer cells encode oblique orientations. The frame effect can be explained by proposing that the sides of the frame inhibit responses to lines parallel to the frame. This predicts no effect of a frame constructed from elements with orientation perpendicular to the side of the frame. This prediction was supported by some subjects, but not others. When alternate frame elements were black and white (on a grey background), so that a large V1 receptive field aligned with the side of the frame would show no response, the frame effect disappeared for some subjects.
May, K.A. & Georgeson, M.A. (2004). Perceiving edge contrast.
Perception, 33(6), 757. (Talk presented at the AVA Christmas meeting 2003)
We have shown previously that a template model for edge perception successfully predicts perceived blur for a variety of edge profiles (Georgeson, 2001 Journal of Vision 1 438a; Barbieri-Hesse and Georgeson, 2002 Perception 31 Supplement, 54). This study concerns the perceived contrast of edges. Our model spatially differentiates the luminance profile, half-wave rectifies this first derivative, and then differentiates again to create the edge's 'signature'. The spatial scale of the signature is evaluated by filtering it with a set of Gaussian derivative operators. This process finds the correlation between the signature and each operator kernel at each position. These kernels therefore act as templates, and the position and scale of the best-fitting template indicate the position and blur of the edge. Our previous finding, that reducing edge contrast reduces perceived blur, can be explained by replacing the half-wave rectifier with a smooth, biased rectifier function (May and Georgeson, 2003 Perception 32 388; May and Georgeson, 2003 Perception 32 Supplement, 46). With the half-wave rectifier, the peak template response $R$ to a Gaussian edge with contrast $C$ and scale $\sigma$ is given by: $R = C \pi^{-1/4} \sigma^{-3/2}$. Hence, edge contrast can be estimated from response magnitude and blur: $C = R \pi^{1/4} \sigma^{3/2}$. Use of this equation with the modified rectifier predicts that perceived contrast will decrease with increasing blur, particularly at low contrasts. Contrast-matching experiments supported this prediction. In addition, the model correctly predicts the perceived contrast of Gaussian edges modified either by spatial truncation or by the addition of a ramp.
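The two equations in this abstract are inverses of each other, which a short sketch can confirm numerically. The function names are hypothetical, the units of contrast and scale are left unspecified as in the abstract, and the code covers only the half-wave-rectifier case, not the modified rectifier.

```python
import math


def peak_template_response(contrast, sigma):
    """Peak response R = C * pi**(-1/4) * sigma**(-3/2) for a Gaussian edge."""
    return contrast * math.pi ** (-0.25) * sigma ** (-1.5)


def estimate_contrast(response, sigma):
    """Invert the response equation: C = R * pi**(1/4) * sigma**(3/2)."""
    return response * math.pi ** 0.25 * sigma ** 1.5


# Round trip: encoding a contrast and then decoding it with the same
# blur estimate recovers the original contrast.
contrast, sigma = 0.4, 8.0
response = peak_template_response(contrast, sigma)
print(response, estimate_contrast(response, sigma))
```

Because the response falls as σ^(-3/2), any mechanism that lowers the response without a matching change in the blur estimate would lead this decoding step to underestimate contrast.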
May, K.A. & Georgeson, M.A. (2003). Perceiving edge blur: Gaussian-derivative filtering and a rectifying nonlinearity.
Perception, 32, Supplement, 46. (Talk presented at ECVP 2003)
A template model for edge perception successfully predicts perceived blur for a wide variety of edge profiles (Georgeson, 2001 Journal of Vision 1 438a). The model differentiates the luminance profile, half-wave rectifies this first derivative, and then differentiates again to create the 'signature' of the edge. The spatial scale of the signature is evaluated by filtering with a set of Gaussian derivative operators whose response measures the correlation between the signature and the operator kernel. These kernels thus act as templates for the edge signature, and the position and scale of the best-fitting template indicate the position and blur of the edge. The rectifier accounts for a range of effects on perceived blur (Barbieri-Hesse and Georgeson, 2002 Perception 31 Supplement, 54). It also predicts that a blurred edge will look sharper when a luminance gradient of opposite sign is added to it. Experiment 1 used blur-matching to reveal a perceived sharpening that was close to the predicted amount. The model just described predicts that perceived blur will be independent of contrast, but experiment 2 showed that blurred edges appeared sharper at lower contrasts. This effect can be explained by subtracting a threshold value from the gradient profile before rectifying. At low contrasts, more of the gradient profile falls below threshold and its effective spatial scale shrinks in size, leading to perceived sharpening. As well as explaining the effect of contrast on blur, the threshold improves the model's account of the added-ramp effect (experiment 1).
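The threshold account of contrast-dependent sharpening can be illustrated with a minimal sketch. It subtracts a threshold from the gradient profile of a Gaussian edge before half-wave rectifying, as described above. The threshold value, the arbitrary spatial units, and the use of the profile's standard deviation as a stand-in for "effective spatial scale" are all assumptions; the model itself estimates scale by template matching, which is not reproduced here.

```python
import numpy as np


def gaussian_edge_gradient(x, contrast, sigma):
    """First derivative of a Gaussian-integral edge: a Gaussian of scale sigma."""
    return contrast * np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


def biased_rectifier(gradient, threshold):
    """Subtract a threshold, then half-wave rectify."""
    return np.maximum(gradient - threshold, 0.0)


def effective_width(x, profile):
    """Standard deviation of the profile treated as a distribution over x.

    This is an illustrative proxy for spatial scale, not the model's
    template-matching estimate.
    """
    weights = profile / profile.sum()
    mean = (weights * x).sum()
    return np.sqrt((weights * (x - mean) ** 2).sum())


x = np.linspace(-60.0, 60.0, 4801)  # position, arbitrary units
sigma = 10.0                        # edge blur, same units
threshold = 0.0005                  # hypothetical threshold value

for contrast in (0.4, 0.2, 0.1, 0.05):
    rectified = biased_rectifier(gaussian_edge_gradient(x, contrast, sigma), threshold)
    # The width shrinks as contrast falls, because more of the gradient
    # profile lies below the fixed threshold.
    print(contrast, effective_width(x, rectified))
```

With a threshold of zero the width equals the edge scale at every contrast, which corresponds to the contrast-independent prediction of the model without a threshold.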
May, K.A. & Georgeson, M.A. (2003). Perceiving edge blur: linear filtering and a rectifying nonlinearity.
Perception, 32(3), 388. (Talk presented at the AVA Christmas meeting 2002)
We studied the visual mechanisms that encode edge blur in images. Our previous work suggested that the visual system spatially differentiates the luminance profile twice to create the 'signature' of the edge, and then evaluates the spatial scale of this signature profile by applying Gaussian derivative templates of different sizes. The scale of the best-fitting template indicates the blur of the edge. In blur-matching experiments, a staircase procedure was used to adjust the blur of a comparison edge (40% contrast, 0.3 s duration) until it appeared to match the blur of test edges at different contrasts (5% – 40%) and blurs (6 – 32 min of arc). Results showed that lower-contrast edges looked progressively sharper. We also added a linear luminance gradient to blurred test edges. When the added gradient was of opposite polarity to the edge gradient, it made the edge look progressively sharper. Both effects can be explained quantitatively by the action of a half-wave rectifying nonlinearity that sits between the first and second (linear) differentiating stages. This rectifier was introduced to account for a range of other effects on perceived blur (Barbieri-Hesse and Georgeson, 2002 Perception 31 Supplement, 54), but it readily predicts the influence of the negative ramp. The effect of contrast arises because the rectifier has a threshold: it not only suppresses negative values but also small positive values. At low contrasts, more of the gradient profile falls below threshold and its effective spatial scale shrinks in size, leading to perceived sharpening.
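The reason the negative ramp and the threshold act in the same direction can be made explicit with a short worked calculation. It is a sketch that assumes a Gaussian gradient profile with peak A (proportional to contrast and inversely proportional to blur σ), an added ramp of slope −m with m > 0, and a threshold T; the symbols A, m, T and x_0 are introduced here for illustration and do not appear in the abstract.

```latex
% Gradient of a Gaussian-blurred edge L(x) after adding a ramp of slope -m
\frac{d}{dx}\bigl[L(x) - m x\bigr] = G(x) - m,
\qquad G(x) = A \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right)

% The rectifier with threshold T passes only the region where G(x) - m > T
G(x) - m > T
\iff
|x| < x_{0} = \sigma \sqrt{2 \ln\frac{A}{m + T}},
\qquad A > m + T
```

The half-width x_0 of the surviving gradient region shrinks when A falls (lower contrast) or when m rises (a steeper opposing ramp), so in this sketch both manipulations narrow the profile that feeds the second differentiation stage. The region is the same whether the threshold is subtracted from the gradient or merely gates it.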
Georgeson, M.A., May, K.A. & Barbieri-Hesse, G.S. (2003). Perceiving edge blur: the Gaussian-derivative template model.
Journal of Vision, 3(9), 360a.
We studied the visual encoding of edge blur in images. Our previous work (VSS 2001) suggested a model in which the visual system spatially differentiates the luminance profile twice to create the 'signature' of the edge, and then evaluates the spatial scale of this signature profile by applying Gaussian derivative templates of different sizes. The scale of the best-fitting template estimates the blur of the edge. Here we refine the model in the light of further blur-matching experiments. A staircase procedure adjusted the blur of a Gaussian comparison edge until it appeared to match the blur of test edges with different spatial profiles, lengths, contrasts and blurs. We also added a linear luminance gradient to blurred test edges. When the added gradient was of opposite polarity to the edge gradient, it made the edge look progressively sharper. Lower contrast edges also looked sharper. Both effects can be explained quantitatively by the action of a half-wave rectifying nonlinearity that sits between the first and second differentiating stages. This rectifier also accounts for a range of other effects on perceived blur. It segments the image into discrete regions of common gradient polarity around each edge. The effect of contrast arises because the rectifier has a threshold: it not only suppresses negative values but also small positive values. At low contrasts, more of the gradient profile falls below threshold and its effective width shrinks, leading to perceived sharpening. The refined template model has few free parameters, but is a remarkably accurate predictor of perceived edge blur and offers some insight into the role of multi-scale filtering by V1 neurons.
Copyright notice
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
|