Accurate motion perception is essential for visually guided movement; complex behaviors such as chasing prey, escaping predators, avoiding obstacles, or catching a thrown ball require that an organism be able to rapidly determine the position and velocity of a moving object, and to anticipate its trajectory through space. Psychological research on motion perception has established that motion can be processed by separate mechanisms, an idea that dates back to Wertheimer’s phenomenological distinction between the “phi” sensation elicited by faster motion and the “beta” sensation elicited by slower motion (Wertheimer, 1912, 2012; Steinman et al., 2000). The goal of this thesis is to examine how separate mechanisms contribute to a unified perception of motion.
It is generally believed that there are multiple classes of motion mechanisms. One of these, the first-order motion mechanism, seems to respond to local spatiotemporal correlations in luminance contrast, at relatively short timescales and in small regions of space. It is generally agreed that the first-order motion mechanism originates with the direction-selective response of cells in cortical area V1. Classic models of first-order motion detection use linear filters tuned to certain combinations of spatial and temporal frequencies, with the outputs of several such filters being combined nonlinearly (Adelson and Bergen, 1985; Watson and Ahumada, 1985). With some embellishments, the responses observed from cells in area V1 are largely compatible with this model (Movshon et al., 1978; Rust et al., 2005; Touryan et al., 2005; Chen et al., 2007), and models of higher-order processing can be built atop this basis (Graham, 2011). Moreover, the size and bandwidth of motion-sensing channels inferred from psychophysical measurements are similar to those observed in V1 neurons (Anderson and Burr, 1987, 1989; Banks et al., 1991; Anderson et al., 1991; Watson and Turano, 1995).
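To make the linear-filter scheme concrete, the sketch below implements a minimal one-dimensional motion-energy computation in the Adelson and Bergen style: a quadrature pair of space-time Gabor filters is correlated with a drifting grating, the two responses are squared and summed, and opposite directions are compared at an opponent stage. This is only an illustration of this class of computation; the filter sizes, frequencies, and units are arbitrary placeholder values, not parameters from any cited model.

```python
import numpy as np

def st_gabor(fx, ft, sigma_x=0.2, sigma_t=0.05, size=64):
    """Quadrature pair of space-time Gabor filters tuned to spatial
    frequency fx (cycles/deg) and temporal frequency ft (Hz)."""
    x = np.linspace(-0.5, 0.5, size)      # space, degrees
    t = np.linspace(-0.125, 0.125, size)  # time, seconds
    X, T = np.meshgrid(x, t)
    env = np.exp(-(X**2 / (2 * sigma_x**2) + T**2 / (2 * sigma_t**2)))
    phase = 2 * np.pi * (fx * X + ft * T)
    return env * np.cos(phase), env * np.sin(phase)

def motion_energy(stimulus, fx, ft):
    """Phase-invariant energy: even response squared plus odd response squared."""
    even, odd = st_gabor(fx, ft)
    return np.sum(stimulus * even)**2 + np.sum(stimulus * odd)**2

# A grating drifting rightward at ft/fx deg/s: L(x, t) = cos(2*pi*(fx*x - ft*t)).
size = 64
x = np.linspace(-0.5, 0.5, size)
t = np.linspace(-0.125, 0.125, size)
X, T = np.meshgrid(x, t)
stim = np.cos(2 * np.pi * (4 * X - 8 * T))

# Opponent stage: rightward-tuned minus leftward-tuned energy.
right = motion_energy(stim, 4, -8)  # space-time orientation matching rightward drift
left = motion_energy(stim, 4, 8)
print(f"opponent energy (right - left): {right - left:.1f}")  # large and positive
```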
However, these first-order mechanisms cannot by themselves fully explain human motion perception. For example, perceiving visual motion does not necessarily require that moving features differ in mean luminance from the background, nor that the stimulus contain motion energy in the Fourier domain. Many forms of higher-order motion stimuli have been constructed that would not be consistently detectable by first-order mechanisms, but these stimuli still elicit strong sensations of movement (Derrington and Badcock, 1985; Chubb and Sperling, 1988; Zanker, 1990). These stimuli have been used to provide evidence for and characterize motion-sensing systems separate from the first-order mechanisms. They have been constructed variously by modulations in contrast, texture, or other stimulus features, but generally involve the change in position, over time, of some feature in the image (Lu and Sperling, 1995).
One possible reason for having multiple motion systems is that first-order motion signals are not always a reliable indication of the veridical motion of an object. In a complicated visual world, motion can come from many sources, and accurate perception of the movement of objects requires disambiguating motion signals attributable to the object from irrelevant motions in the background or of other objects. Consider the task of trying to track the movement of a zebra among a background of waving grass. One challenge this task presents is that a motion energy sensor with a limited receptive field size will report the component of the zebra’s motion orthogonal to its stripes, rather than the veridical motion of the zebra, an instance of the so-called “aperture problem” (Hildreth and Ullman, 1982; Adelson and Movshon, 1982). Combining the component motion signals from V1 cells sensing different orientations might allow disambiguation of the true velocity. This extraction of pattern motion from component motion appears to be one of the roles of visual area MT (Movshon et al., 1985; Simoncelli and Heeger, 1998; Rust et al., 2006). However, this computation of pattern motion cannot completely explain motion perception either, as combining different motion signals allows the object’s motion to be mixed with irrelevant background motion. If motion information is pooled over larger areas, the motion of the background grass will create a subset of the pooled signals that are substantially incorrect; an MT cell analyzing the motion of the zebra will mix together signals from the zebra’s stripes with signals from the grassy background.
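The aperture problem itself has a compact formulation: a one-dimensional pattern such as a stripe constrains only the component of velocity along its normal, so each oriented sensor confines the true velocity to a line, and constraints from two different orientations intersect at the true velocity (the intersection-of-constraints construction). A minimal sketch, using made-up velocities and orientations:

```python
import numpy as np

# True object velocity (deg/s); an arbitrary illustrative value.
v = np.array([3.0, 1.0])

def normal_component(v, theta):
    """Speed seen through an aperture by stripes whose normal is at angle theta:
    only the projection of v onto the normal is recoverable locally."""
    n = np.array([np.cos(theta), np.sin(theta)])  # unit normal to the stripes
    return v @ n

# Each orientation constrains v to a line {u : u . n = v . n}; two
# differently oriented constraints recover v by solving N u = c.
thetas = [0.0, np.pi / 3]
N = np.array([[np.cos(th), np.sin(th)] for th in thetas])
c = np.array([normal_component(v, th) for th in thetas])
print(np.linalg.solve(N, c))  # -> [3. 1.], the true velocity
```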
A related problem is that computing the velocity of a pattern discards information about its location, because collecting pattern motion requires larger receptive fields. The computation in MT resolves pattern motion but appears to lose information about where the motion is occurring within the large MT receptive fields (Majaj et al., 2007). This is puzzling because the ostensible purpose of motion perception is often to track and anticipate the change in position of a physical object. Consider tracking an animal moving through obscuring tall grass, or watching waves pass over choppy water. The stalks of grass, or the foam and texture on the water, do not progressively change position; they only oscillate in place as the movement passes under them. A computation based on pattern motion would generally track the oscillation of the surface texture rather than the underlying movement. However, it is the underlying movement that is more relevant, and it often dominates the perception of motion.
Literature on visual motion processing has drawn many different demarcations between types of motion. Various papers have discussed first-order versus second-order, short-range versus long-range, local versus global, textural versus figural, and so on, based on particular demarcations drawn among stimulus properties or proposed mechanisms (Nishida, 2011). For the purposes of this thesis I will also have to pick a demarcation. The examples of animals in the grass and waves on the water draw a contrast between the first-order motion produced by an object, and the fact that the object changes position over time. The latter is what I will be referring to as position-defined motion. A moving object generally produces both first-order and position-defined motion. To track the zebra might require integrating first-order motion signals over space and time as the zebra changes position, while discarding adjacent signals that are inconsistent with its trajectory. However, most proposed first-order motion mechanisms do not detect changes in position. For first-order motion processing we are reasonably confident of the mechanisms involved, but mechanisms that detect changes in position are less well understood.
Note that selecting any demarcation involves some reinterpretation when reading the literature. There is an extensive literature on second-order motion, but not all of it is applicable to position-defined motion. A typical model for second-order motion processing functions analogously to a first-order motion detector but with a rectifying input nonlinearity. The position-defined stimuli I will use in this thesis might be detectable as second-order in this sense, but will be outside the temporal frequency range thought to apply to this type of mechanism (Lu and Sperling, 2001). Some papers on second-order motion use stimuli that are not position-defined, but others use second-order stimuli that also happen to involve the movement of salient features. I consider the latter results to be potentially informative of position-defined motion processing.
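For concreteness, the rectifying scheme referred to here is the familiar filter-rectify-filter idea: rectify the output of an early band-pass stage, then apply a standard motion-energy analysis to the rectified signal. The sketch below (with arbitrary parameters, and the first band-pass stage omitted for brevity) shows a contrast-modulated carrier that has no net first-order energy at the envelope’s drift frequency until after rectification; it is meant only as a schematic of this class of model, not any specific published implementation.

```python
import numpy as np

size = 128
x = np.linspace(-0.5, 0.5, size)   # space, degrees
t = np.linspace(-0.25, 0.25, size) # time, seconds
X, T = np.meshgrid(x, t)

# Second-order stimulus: a static high-frequency carrier whose CONTRAST
# envelope drifts rightward; the luminance signal has no Fourier energy
# at the envelope's own space-time frequency.
carrier = np.cos(2 * np.pi * 32 * X)
envelope = 0.5 * (1 + np.cos(2 * np.pi * (4 * X - 8 * T)))
stim = envelope * carrier

# Rectifying nonlinearity (full-wave); a fuller model would band-pass
# filter at the carrier frequency before this step.
rectified = np.abs(stim)

def energy(img, fx, ft, sx=0.2, st=0.1):
    """Quadrature motion energy at spatial freq fx, temporal freq ft."""
    env = np.exp(-(X**2 / (2 * sx**2) + T**2 / (2 * st**2)))
    ph = 2 * np.pi * (fx * X + ft * T)
    return np.sum(img * env * np.cos(ph))**2 + np.sum(img * env * np.sin(ph))**2

for img, label in [(stim, "raw"), (rectified, "rectified")]:
    r, l = energy(img, 4, -8), energy(img, 4, 8)
    print(f"{label}: opponent energy = {r - l:.1f}")
# raw: ~0 (no first-order signal at the envelope drift);
# rectified: large and positive (rightward, matching the envelope).
```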
There is evidence that perception of position-defined motion stimuli may have a separate neural substrate from first-order motion. For example, adding position-defined motion noise to a display does not appear to change the threshold of detection for first-order motion, and it is unclear whether adding first-order motion noise interferes with detecting position-defined motion (Edwards and Badcock, 1995; Nishida et al., 1997; Cassanello et al., 2011, but see Hedges et al., 2011). When differing first-order and position-defined components are present in a stimulus, the motion after-effect is always determined by the first-order component, whereas the appearance of the stimulus is often determined by the position-defined motion (Derrington and Badcock, 1985; Chubb and Sperling, 1989; Nishida and Sato, 1992). Neuropsychological evidence shows a double dissociation between first-order and position-defined motion processing deficits in a number of patients (Vaina and Cowey, 1996; Vaina and Soloviev, 2004), suggesting that different motion mechanisms may have anatomically distinct pathways. Another difference between first-order and position-defined stimuli that suggests different mechanisms is that the latter seem capable of tracking objects over distances larger than what can be achieved through individual local filters. “Long-range” apparent motion stimuli span a distance greater than the classical receptive field size in V1, eliciting sensations of motion without explicit direction-selective activity in V1.
Interestingly, the physiological substrate of position-defined motion processing is still unclear. Cortical area MT (or somewhere downstream) has been proposed as a locus of integration between motion and position information (Nishida and Johnston, 1999; McGraw et al., 2004; Mather and Pavan, 2009). While the receptive fields of cells in cortical areas MT and MST are large enough that they might be able to integrate information about objects that change position, recordings of these cells find their responses dominated by first-order motion and showing little to no selectivity to position-defined motion, even when the latter corresponds better to the experience of viewing these stimuli (Livingstone et al., 2001; Ilg and Churan, 2004; Hedges et al., 2011). So while signals present in MT have an influence on perceived position, MT does not itself appear to track perceived position. Despite the fact that these two motion systems clearly both contribute to determining the appearance of the moving world, the question of whether and how they interact to produce a single coherent percept of motion remains open (Nishida, 2011). In this thesis I examine the combination of these two types of motion using a display that contains first-order and position-defined components whose direction of motion can be independently manipulated.
Figure 1.1 provides examples of first-order and higher-order motion. The elements are Gabor-like stimuli that can be understood as a carrier grating windowed by a spatial envelope. The envelope moves independently of the carrier, so that the carrier provides first-order motion while the envelope produces position-defined motion. Figure 1.1A illustrates the difference between first-order and position-defined motion. On the left is a single element with carrier motion but no envelope motion. On the right, the element has envelope motion but no carrier motion (only an equivalent amount of flicker). The motion on the right is seen as a clear progressive change in position, while the motion on the left appears more like a flicker that has a direction to it. (The position of the element on the left does appear to shift slightly, an example of motion-induced position shift; De Valois and De Valois, 1991; Ramachandran and Anstis, 1990.)
In Figure 1.1B, elements have both carrier and envelope motion. On the left, the carrier and envelope components move in the same direction; on the right the carrier and envelope motions are in opposite directions. Full details of the construction of this display are given in Section 3. Figure 1.1C is the same but with five elements around each fixation point.
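Although the actual display parameters are given in Section 3, the basic construction is easy to sketch: a drifting carrier grating multiplied by a translating Gaussian envelope, with the two velocities set independently. The values below are illustrative placeholders, not those of the real display:

```python
import numpy as np

def element_frame(t, x, y, carrier_v=1.0, envelope_v=-1.0,
                  carrier_f=4.0, sigma=0.1):
    """One frame of a Gabor-like element whose carrier grating and Gaussian
    envelope drift at independent velocities (deg/s); signs set direction."""
    cx = envelope_v * t  # envelope center translates at the envelope velocity
    envelope = np.exp(-((x - cx)**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * carrier_f * (x - carrier_v * t))
    return envelope * carrier  # luminance contrast of the element

# Opposing carrier and envelope motions, as in the right element of Figure 1.1B:
xs = np.linspace(-0.5, 0.5, 256)
X, Y = np.meshgrid(xs, xs)
frames = [element_frame(t, X, Y, carrier_v=1.0, envelope_v=-1.0)
          for t in np.linspace(0, 0.25, 16)]
```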
When elements with combined envelope and carrier motions are viewed in isolation, as in Figure 1.1B, the apparent direction of motion follows the motion of the envelope and is not strongly affected by the direction of the carrier. The carrier motion does cause a change in the sense of “smoothness,” with conflicting motion having a more jittery appearance, but it does not strongly affect the apparent direction or even the apparent speed of the motion. However, in Figure 1.1C, when multiple elements are placed in proximity, but not overlapping, the apparent motion depends on whether the stimulus is viewed centrally or peripherally. When the five-element ring on the left is viewed centrally, the apparent direction of motion is consistent with the envelope. When the same ring is viewed in the periphery, the apparent direction of rotation matches that of the carrier. If an observer maintains attention on the leftward ring, which has carrier and envelope in conflict, while moving their eyes so as to shift the stimulus from central to peripheral vision, the apparent motion may appear to reverse in concert with the eye movement.
From this demonstration it appears that having more than one element in proximity affects how first-order and higher-order motion are combined. That the appearance changes with retinal eccentricity suggests that the range of spatial interaction scales with eccentricity. A plausible explanation is that the presence of flanking objects limits the ability to see movement of the envelope, thereby allowing the carrier motion to determine the percept; that is, crowding (Levi, 2008; Pelli, 2008) may be affecting how first-order and higher-order motion are combined.
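One well-known quantitative signature of crowding is that the critical spacing below which flankers interfere grows roughly in proportion to eccentricity, on the order of half the eccentricity by Bouma’s rule of thumb. A toy calculation under that assumption (the spacing value is arbitrary) shows why the same ring could escape crowding in central vision but not in the periphery:

```python
def crowded(spacing_deg, eccentricity_deg, bouma=0.5):
    """Bouma's rule of thumb: flankers within ~0.5 * eccentricity crowd the target."""
    return spacing_deg < bouma * eccentricity_deg

spacing = 2.0  # center-to-center element spacing in degrees (illustrative)
for ecc in (2.0, 10.0):
    print(f"eccentricity {ecc:4.1f} deg: crowded = {crowded(spacing, ecc)}")
# -> not crowded at 2 deg eccentricity, crowded at 10 deg
```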
In this thesis I examine how first-order and higher-order mechanisms interact in forming an overall perception of motion. In Experiment 1, I quantify how element spacing determines sensitivity to first-order and higher-order motion, and present a simple model to capture the results, wherein first-order and higher-order motion signals are processed separately and combined at a decision stage. In Experiment 2, I vary the number of elements independently of the spacing of targets and determine that first-order motion perception sums inputs over a large area, while higher-order motion perception is sensitive to the spacing between elements and flankers.