VISION

Mental phenomena, all conscious and unconscious mental phenomena, visual or auditory experiences, experiences of pain, tickles, itching, thoughts, certainly the entirety of our mental life, result from the processes that take place in our brains (Searle, 1995).

General structure of the visual pathway

Brain processes underlying the experience of seeing are conducted by neural networks, which form the so-called visual pathway. It is a functionally and anatomically complex biological structure which, to put it very simply, consists of three modules. The first module is the eyes, more precisely their optical systems and retinas. The second module consists of all the subcortical structures located between the eyes and the cerebral cortex, particularly in the occipital lobe. And the third module, the most complicated and least known, yet, as it seems, the most important for seeing, consists of interconnected cortical areas in various brain lobes (Fig. 4).

Figure 4. Scheme showing the structures forming the neural pathways of vision. Graphic design: P.A.

The division of the visual pathway into three parts results from the roughly described functions performed by the structures located in each of them. The first module, the eyes, is responsible for recording the light and the initial organization of the sensory data. In the second module, these data are ordered and categorized. Both these parts form the so-called early stage of the visual pathway. In the third module, the most complicated in terms of function and structure, the sensory data are analyzed, and then integrated and synthesized. The cooperating brain structures which form this block are referred to as the higher or late stages of sensory data processing. The final effect of the work of all these modules is the subjective experience of seeing.

Top-down and bottom-up processes

The direction of the flow of neural impulses in the visual pathway is depicted in Fig. 4 with arrows. The orange arrows, which indicate the order from the eyes through the subcortical structures into the cerebral cortex, represent the so-called bottom-up processes of sensory data processing. Following this direction of the sensory data flow, the lobes of the cerebral cortex receive data on the distribution of light reaching the eyes of the observer, within the scope in which the photoreceptors record them and the neurons connected to the photoreceptors forward them. This means that if one part of a visual scene is, for example, brighter than another, the receptors which are more intensively illuminated will react more strongly, in proportion to its brightness. Such information is sent upwards, towards the center, namely towards various parts of the cerebral cortex.

It could seem that, when we are talking about visual perception, the bottom-up direction of sensory data processing is the only possible way of getting to know the world with our vision. The eyes, like a camera, record the light, and the brain interprets its distribution, creating the content of the subjective experience of seeing. As a result, the observer knows what is in front of their eyes. Nothing could be further from the truth. Everyday experience provides thousands of examples that contradict the principle of mechanical video recording using eye-cameras. This does not mean, however, that there are no similarities between the eyes and automated means of image recording. But there is one fundamental difference: cameras do not think about the world they record. All they can do is indicate that there is, for example, a human face in the frame. They have no idea, however, to whom it belongs or what relationship we have with its owner.

Even though the scope of human binocular vision in the horizontal plane is approximately 180°, and in the vertical plane about 130°, we do not perceive all objects in this area with the same clarity. Moreover, we may not notice the presence of some objects at all if they are atypical or insignificant from the perspective of the task being carried out at the moment. We may, for example, overlook a gorilla strolling across the basketball court between the players passing the ball to each other, when we concentrate on counting the passes (Simons and Chabris, 1999). These properties of the visual system were intuitively used by the masters of cinematography, first and foremost Alfred Hitchcock.

We can also erroneously presume the presence of some objects in a visual scene only because we have frequently seen them in similar situations. We can, for example, be sure that we have seen a light switch on the wall next to the door, even though it was not there. To put it briefly, the brain actively processes the sensory data and assesses their usefulness in terms of the currently performed task. This process may result in ignoring quite large groups of sensory data or, on the contrary, in stimulating the muscles coordinating the eyeball movement to relocate vision from one part of a scene to another to obtain new data. All these processes are generally called top-down processes. In Fig. 4, their direction is marked with purple arrows pointing downwards.

The top-down processes manage sensory data processing, meaning that they filter the data based on the type of the currently performed task, the intention, the need, the attitude, the beliefs, knowledge, or expectations of the observer. Their results affect the movement of the eyeballs, directing the visual axes at those elements of the image that require more in-depth analysis. Comprehensive experimental studies regarding the role of attitude, understood as a generalized preparedness for a specific form of response, also in terms of visual perception, were carried out as early as the 1960s by Dimitri Uznadze's students as a part of the so-called Georgian school of psychology (Bżaława, 1970; Prangiszwili, 1969; Uznadze, 1966).

In general, the subjective visual experience is a result of sensory data processing via two kinds of processes: bottom-up processes, which organize the data and "push" them towards higher levels of the brain, and top-down processes, which filter and modify these data depending on the current needs, beliefs, or knowledge of the observer, and which also influence, from the top down, the framing of further elements of a visual scene.

Content analysis and visual scene framing systems 

In Fig. 4, many different brain structures which participate in sensory data processing in the early and late stages of the visual pathway are highlighted. These structures are connected with one another in a manner no more accidental than that of the transistors on a radio's circuit board. Particular brain structures (called nuclei, areas, lobules, sulci, gyri, etc.) are connected with one another through the axons of neurons, i.e. the cords through which nerve impulses travel from one cell body to another. There are structures in this biological system which not only transmit nerve impulses to other structures but also directly or indirectly receive feedback from them. Moreover, within different brain structures there are also complicated connections between the neurons that create those structures. The network of all these connections is really complicated. However, on the basis of the functions performed by the different networks of cooperating neuron clusters that take part in visual data processing, two main systems can be distinguished: the visual scene framing system and the visual scene content analysis system.

Every vision experience incorporates only a fragment of a larger whole. We see as if in frames. We cannot simultaneously see everything that is happening around our heads. Viewing something is therefore a sequence of frames: the views of things, limited by the field of vision. In order to see the elements of the scene that are out of view, it is necessary to change the location from which the scene is currently viewed or, while remaining in the same location, to change the position of the eyes or head. This function is performed by the neural visual scene framing system, which controls the movement of the eyeballs (as well as the movement of the head and the whole body), fixating the visual axes on the most interesting parts of the visual scene.

As one might imagine, the difference between viewing any visual scene and viewing a picture is that, in addition to the natural limitations of the field of vision, the picture also has its own boundaries. In the case of museum paintings, the boundaries are defined by frames that separate the work's painted surface from the wall. The frames of the image could just as well be the edges of a cinema screen, television screen, or computer screen, the outline of a photo in a newspaper, a theatre curtain or, even more conventional yet equally real, the boundaries within which a play in an urban space or a performance hall takes place. Viewing an image requires ignoring anything that lies beyond its boundaries, especially when the natural field of vision also covers that space. The color or texture of the wall on which the painting is hung are not a part of the image. Therefore, viewing an image requires, above all, respect for the spatial boundaries the image imposes. Viewing an image and seeing that image in a space such as a museum are qualitatively two completely different acts of vision.

The second important feature of vision is that the frame designated by the field of vision or by the frames of the image always has a certain meaning. It consists of the elements making up the scene: colors, background, spatial composition, movement. Understanding a visual scene requires that its features be analysed and confronted with existing knowledge and visual experience. This is implemented through the neural system of image content analysis. The basic function of this system is to lead to a subjective vision experience, although that experience does not necessarily need to be equivalent to an understanding of what is currently seen.

Many people who have the opportunity to see the over four-meter-long painting by Mark Rothko for the first time (Fig. 5) ask themselves what it is all about or, more radically, why it is even considered to be a work of art. Lack of knowledge, often not just visual, may constitute a serious limitation on the level and depth of image understanding. Nonetheless, regardless of whether one accurately understands what they see in an image, it is certain that the neural system which analyses the content of a visual scene always tries to attribute meaning to what is being seen.

Figure 5. Mark Rothko, Red on Maroon (1959). Tate Modern, London, Great Britain [182.9 x 457.2 cm]

While dividing the visual system into the framing system and the content analysis system, it is worth adding that both of them operate in the bottom-up and top-down modes of sensory data processing. On the one hand, almost every scene has elements that draw attention more than others and, by stimulating the visual scene framing system, they activate the system of analysis of its content in the bottom-up mode of processing. On the other hand, the same scene may be subjected to a specific analysis and may be framed depending on the task that is being performed at the time by the observer, which is carried out in the top-down mode of sensory data processing.

IMAGE CONTENT ANALYSIS SYSTEM

Visual scene features

A minimum condition for the subjective experience of seeing is the recording (observation) of the shape of a flat (two-dimensional) figure or a three-dimensional object in the space encompassed by the observer's field of vision. I am not determining at this point whether we initially see objects in two or three dimensions, because it is a dispute that is yet to be resolved (for example, Marr, 1982; Pizlo, 2008). Regardless of that, however, the shape may well be simple, for example a point or an outline of a geometric shape, or complex, for example a passing car.

In natural conditions we extremely rarely encounter situations in which we do not see any shape in our field of vision, namely things that reveal their distinctiveness by bordering with something else. Dense, total darkness or thick fog may lead us to believe that we cannot see anything. That "anything" simply means the absence of any shape. A shape is a basic definitional quality of each figure or object and their parts, as well as a feature of the background and space in which they exist, namely a definitional quality of every visual scene (Bagiński and Francuz, 2007; Francuz and Bagiński, 2007). The shapes of the things seen are the most important criterion in their categorisation (Francuz, 1990) and constitute the basis for knowledge of what the world looks like. They can determine the boundaries of both named and unnamed objects.

The experience of the absence of any shapes in one's field of vision is quite rare in natural conditions. Contemporary art, on the other hand, provides a number of model examples of images that present the recipient with exactly such an experience.

In 1951, Robert Rauschenberg exhibited a series of provocative paintings titled White Paintings, in which nothing whatsoever was painted (Fig. 6). Even the edges of the paintings were marked only by the shadows cast on the wall by their stretchers.

Figure 6. Robert Rauschenberg, White Paintings [three panel] (1951). San Francisco Museum of Modern Art, San Francisco, USA [182.9 x 274.3 cm]

White Paintings by Rauschenberg reveal two subtle boundaries: the first between seeing the scenery that a painting is a part of and looking at the painting, and the second between the bottom-up and the top-down processes of sensory data processing. On the one hand, the painting in Fig. 6 does not contain any meaning, yet the meaning of the visual scenery in which it is present is constituted by three rectangular canvases painted white. On the other hand, it is precisely because of this that the paintings provoke the minds of their recipients to fill them with some meaning. Similar conclusions in terms of music were drawn by Rauschenberg's friend John Cage, who in 1952 composed the famous piece 4′33″, during whose performance no musical instrument produces any sound. Shapes, like the sounds of musical pieces, are therefore categories which refer both to what is registered by the senses (eyes or ears) and to what is produced by the minds of observers or listeners.

The second property of every visual scene is color, namely a specific sensory quality of a figure or object delimited by its shape, or of the background. Color can be described using three dimensions: hue, namely the attribute that we usually understand as, for example, red or blue; lightness (or brightness, value), also known as luminance, characterising the brightness of a color in a continuum between black and white; and saturation (or chroma), namely something that we experience as color intensity. Sometimes gloss, which is a derivative of the type of surface or material covered with color, is also added to this list. The observed differences between the planes of a painting in terms of brightness (luminance) and hue are important cues regarding the shapes of figures or objects present in a visual scene. Apart from the listed sensory properties, colors are also attributed various symbolic values that can modify the meanings of the things we see (Gage, 2010; Popek, 2012; Zieliński, 2008).
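The three dimensions can be made concrete with Python's standard colorsys module, which converts an RGB triple into hue, saturation, and value; the sample colors below are arbitrary illustrations, not colors discussed in the text:

```python
import colorsys

# Hypothetical sample colors for illustration, as RGB triples on a 0..1 scale.
colors = {
    "pure red": (1.0, 0.0, 0.0),   # fully saturated, maximally bright
    "dark red": (0.5, 0.0, 0.0),   # same hue, lower value (lightness)
    "pale red": (1.0, 0.5, 0.5),   # same hue, lower saturation
}

for name, (r, g, b) in colors.items():
    # colorsys.rgb_to_hsv returns hue, saturation, value, each on a 0..1 scale.
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    print(f"{name}: hue={h:.2f} saturation={s:.2f} value={v:.2f}")
```

All three samples share the same hue (0.00, i.e. red); only the value and saturation axes differ, which is exactly the independence of the three dimensions described above.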

The third characteristic of a visual scene is its spatial organisation in two or more dimensions. If noticing at least one shape that suggests the presence of an object and separates it from the background is a constitutive feature of the scene, then the object must naturally be located in a given place in space. Referring to the notion of a place in a visual scene makes one aware of the fact that the scene is a representation of some point of view, and that it is limited by the range of the observer's field of vision or the frames of the painting.

Determinations of object location such as, for example, "on the right" or "on the left", "higher" or "lower", "closer" or "further", both from the observer and from one another, are always relative to the point from which the given composition is seen, and to its framework. This concerns whole visual scenes and paintings to the same extent. The importance of the observer's position in relation to the scene that they see is so paramount that one can even speak of their egocentric position in the world of the objects seen (Goodale and Milner, 2008). Such favouritism results from the fact that the observer sees not only the objects of a scene, but also the relations between them. Relations between objects in a visual scene and in a painting, observed on a plane perpendicular to the visual axis, are intuitively established in relation to the sides of the observer's body and to the natural frames outlined by their field of vision, as well as the frames of the painting. On the other hand, noticing relations between objects along lines parallel to the visual axis, that is, into the depth of the painting, is not quite so obvious and requires the application of special procedures of retinal data processing, as well as knowledge of depth indicators, in order to grasp them.

The fourth and last characteristic of a visual scene is its dynamics. It is a derivative of the velocity, variability, acceleration, and movement trajectory of both the objects within the visual scene and the observer. Movement of objects within the visual scene destabilises the spatial relations between them. In addition, an observer can change their location in relation to the given scene, thus changing the point from which it is viewed. This concerns not only the relocation of the observer in space (for example, during window shopping while taking a walk), but also the movement of their eyes, which causes the visual axis to shift from one fragment of the scene to another. To put it shortly, the movement of objects in the visual scene, as well as the movement of the observer while looking at it, constitutes a considerable complication in the analysis of the subjective experience of seeing. As stated earlier, the issue of movement within an image, namely within its frames, is not the subject of this book, but the movement of the observer, in particular of their eyeballs while looking at the image, is.

The four listed features of the visual scene (shape, color, spatial organisation, and dynamics) can be divided into two categories, even though these categories are not separable. The first comprises those properties of the scene that allow the observer to recognise the objects present in it and to notice something about their forms and colors. It is the category of objects. Perceptual analysis of the specimens that belong to this category usually depends neither on their position in relation to the observer, nor on whether they are still or in motion.

On the other hand, the spatial organisation and movement of objects in the visual scene are almost always related to something about which we can say that it has some shape and color. Noticing an object as being located, for example, on the right side of the visual scene results from its location in relation to the observer's body. These features, however, are clearly related not only to the representation of the observer's body, but also to their movement. In general, the spatial organisation and movement of objects in the visual scene create the category of relations.

All the listed categories of visual scene features are not separable, because there are visual experiences that are located at their interface. A very fast-moving object or observer may, through a complete blurring of the edges or even the colors of things in the visual scene, trigger the experience of seeing movement that is not the movement of an object of a certain shape. Due to the high speed at which the observer can move, as well as the ability to create images through electronic media, the number of such experiences is constantly increasing. So far, evolution has not developed efficient mechanisms for dealing with such situations. The best manifestation of this are the unexplained mechanisms of many optical illusions in the field of motion perception created using digital visualisation techniques (see, e.g., Michael Bach's website, Optical Illusions & Visual Phenomena), as well as the illusions experienced by, for example, jet pilots (Bednarek, 2011).

Vision as an act of creation

David Hubel (Nobel laureate in physiology or medicine in 1981) and Margaret Livingstone's article, published in 1988 in Science, contains an excellent introduction to the problems concerning the functioning of the neural system of analysing the content of the visual scene. Although a quarter of a century has passed since its release, and research results in the field of cognitive neuroscience have verified most of its hypotheses regarding the processing of visual data, it is still an up-to-date and reliable source of information on the structure and function of individual components of the visual pathway.

The basic finding regarding the function of the visual pathway in the formation of a subjective vision experience is that, starting with the retinal neurons found in both eyes of the observer and ending with various structures of their brain, the above-mentioned characteristics of the visual scene (such as shape, color, two- and three-dimensional spatial orientation, and motion) are analysed by four neural pathways (subsystems), partly independent of one another. This statement is fundamental to understanding how the experience of seeing the scene and the objects inside it is produced. It shows that the data on the distribution of light entering the eye at a given time, recorded by the photoreceptors, are subject to partially independent analyses, conducted by four specialised nervous subsystems, for the most part of the visual pathway. The purpose of their activity is to interpret these data in terms of the characteristics of the objects and/or their fragments which are currently in the observer's field of vision, on the basis of the observer's visual experience.

The experience of seeing a complete visual scene is not the result of a simple reflection of the image projected onto the eye's retina (as, for example, in a camera obscura), but takes place in two main phases: (1) decomposition, consisting in the analytical and relatively independent study of the listed features of the visual scene, after abstracting them from the retinal image; and (2) composition, i.e. integrating (synthesising) the results of the analyses conducted in the first phase, taking into account data previously recorded in visual memory.

The presence of both of these phases in each act of vision leads to the conclusion that the result of sensory integration always (to a greater or lesser extent) differs from the recorded source data. This means that the content of the visual experience is constantly produced by the visual system rather than, as it might seem, reproduced from retinal images. In this sense, vision is an act of creation, during which the image of reality recorded by the system of photoreceptors in the retina of the observer's eye is constructed.
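The two phases can be loosely sketched in code. The toy "image" and the two feature analyses below are invented stand-ins, not the brain's actual algorithms; the point is only the order of operations, independent analyses first, integration second:

```python
# A toy 2x4 "image": a dark left half and a bright right half (values 0..1).
image = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9],
]

def edge_map(img):
    # Crude stand-in for shape analysis: horizontal luminance differences.
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in img]

def mean_luminance(img):
    # Crude stand-in for a brightness channel: average over all pixels.
    cells = [v for row in img for v in row]
    return sum(cells) / len(cells)

# Phase 1, decomposition: relatively independent analyses of separate features.
edges = edge_map(image)
brightness = mean_luminance(image)

# Phase 2, composition: the partial results are integrated into one "percept".
percept = {"edge_position": edges[0].index(max(edges[0])), "brightness": brightness}
print(percept)
```

The integrated result (an edge location plus an overall brightness) is already a description of the scene, not a copy of the pixel array, which mirrors the claim that seeing produces rather than reproduces.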

Early analysis system of visual scene content 

The photoreceptors located in the retina inside the eye record the distribution of the entering light. This is the first stage of the visual scene content analysis procedure. The most important neural structures that are involved in the recording and arrangement of sensory data in the early stages of the visual pathway are shown in Fig. 7.

Figure 7. Neural structures involved in the analysis of visual scene content at the early stages of the visual pathway (side and bottom views). Graphic design: P.A.

The early system of visual scene content analysis essentially consists of two separate structures: the eye, in particular its optical system (i.e. the lens in its front part and the retina located on its back wall, inside the eye), and the lateral geniculate nucleus (LGN), located roughly halfway between the eyes and the cerebral cortex, in a structure called the thalamus.

In the retinas of the eyes there are, among others, the so-called ganglion cells. Their axons (projections), over which, like telephone wires, nerve signals are transmitted into the brain, form the optic nerve. In the area between the eyes and the LGN there is the optic chiasm, a place where the bundle of axons carrying nerve signals from each eye splits into two parts. Half of the axons from the left eye join half of the axons from the right eye (and similarly for the other halves of the ganglion cell axons exiting both eyes) and continue their pathway together to enter the right and left hemispheres of the brain. The section between the optic chiasm and the LGN is called the optic tract.
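The rule governing which half of each bundle crosses is standard neuroanatomy, though not spelled out above: fibres from the nasal (nose-side) half of each retina cross to the opposite hemisphere, while fibres from the temporal half stay on their own side. A minimal sketch of that routing:

```python
def destination_hemisphere(eye, retina_half):
    """Return which hemisphere receives fibres from a given half-retina.

    eye: "left" or "right"; retina_half: "nasal" or "temporal".
    """
    if retina_half == "temporal":
        return eye  # temporal fibres stay on the same side
    # Nasal fibres cross to the opposite side at the optic chiasm.
    return "right" if eye == "left" else "left"

# Each optic tract therefore carries signals from both eyes:
print(destination_hemisphere("left", "temporal"))  # left
print(destination_hemisphere("right", "nasal"))    # left
```

The net effect is that each hemisphere receives the whole of the opposite half of the visual field, which is why half of the axons from each eye meet at the chiasm.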

From the LGN, nerve impulses are transmitted to the so-called primary visual cortex, or striate cortex, in the occipital lobe of the brain, via the axons of a large group of cells whose bodies are found in the LGN. This bundle is called the optic radiation, and it broadly closes the first stage of sensory data transmission and processing in the visual pathway.

Eye – the camera metaphor

In many ways, the structure of the eye and that of the camera are similar. Before pointing out the main differences between them, it is worth having a closer look at this analogy (Fig. 8).

Figure 8. Schematic structure of the camera and the eye. Graphic design: P.A., based on Groves and Schlesinger (1979)

The visual pathway begins in the eye. Like the camera, the eye is made up of a hermetic and rigid body. In the eye, it is called the sclera. It protects the eyeball against mechanical damage and stabilises its shape.

The front of the eye contains an optical system, i.e. the biological equivalent of a camera lens. Its outermost element is the transparent cornea, which, like the sclera, protects the eye from mechanical damage. It also acts as a sort of protective filter and fixed-focal-length lens. Just behind the cornea is a pinhole for the lens, called the pupil, whose diameter is adjusted by means of the eye's aperture, i.e. the iris.

Incidentally, unlike the aperture in a camera, the iris is colorful: most often brown (in various shades), but it can also be grey, green, or blue (Fig. 8.1).

Figure 8.1. Iris colors, depending on the amount of pigment; prepared on the basis of Franssen, Coppens, and van den Berg (2008)

Behind the iris is placed one of the most extraordinary organs in our body: a varifocal lens. Ralf Dahm (2007) calls it a "biological crystal". The eye's lens has two properties distinguishing it from the optical system of a camera. Firstly, provided that it is fully functional, it has excellent transparency, transmitting almost 100% of the light into the interior of the eye. Secondly, it is varifocal, which enables the observer to see objects clearly at different distances. The mechanism of changing the focal length of the eye's lens is, however, nothing like the focal length change in a camera lens.

Maintaining visual acuity for objects at different distances from the optical system of the eye is ensured by the ability to change the shape of the eye's lens. The closer the object is to the observer's eye, the thicker the lens becomes. The further the object is from the observer's eye, the thinner it becomes (Fig. 9).

Figure 9. Accommodation of the eye's lens to various distances from the object on which the eyesight focuses. Graphic design: P.A.

The lens is suspended from the inner wall of the eyeball by fibres connected to the ciliary muscles. When the ciliary muscles relax, the fibres stretch the lens, making it thinner; when the muscles contract, the tension decreases and the lens swells in its central part. The effect of focusing light rays on the back wall of the eyeball is associated with the change in the angle of refraction of light rays in the lens according to its thickness. A thicker lens bends light rays at a larger angle than a thinner one. This phenomenon is called accommodation.
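The geometry behind accommodation can be sketched with the thin-lens equation, 1/f = 1/d_object + 1/d_image. The ~17 mm lens-to-retina distance used below is an assumed round figure for illustration; a real eye's optics (cornea plus lens) are considerably more complex:

```python
D_IMAGE_MM = 17.0  # assumed, simplified lens-to-retina distance

def required_focal_length_mm(d_object_mm):
    # Thin-lens equation with a fixed image distance: 1/f = 1/d_o + 1/d_i.
    return 1.0 / (1.0 / d_object_mm + 1.0 / D_IMAGE_MM)

# The closer the object, the shorter the required focal length,
# i.e. the thicker (more strongly refracting) the lens must become.
for d_mm in (10_000.0, 1_000.0, 250.0):  # 10 m, 1 m, 25 cm
    print(f"object at {d_mm:7.0f} mm -> f = {required_focal_length_mm(d_mm):.2f} mm")
```

Because the image distance (the depth of the eyeball) cannot change, the focal length must, which is exactly what the thickening and thinning of the lens accomplishes.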

On the wall opposite the eye's optical system there is a photosensitive matrix, i.e. the retina. It covers about 70% of the inner surface of the eyeball. The light reflected by objects in the visual scene, or emitted by them, illuminates the bottom of the eye and creates its retinal projection. This projection is characterised by the fact that it is spherical, smaller, and turned upside down in relation to the original (provided that we know what the original looks like). Nevertheless, such distortions are not a major problem for the brain.
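The scaling-down and inversion of the retinal projection follow from simple pinhole geometry; the ~17 mm eye depth below is again an assumed round value used only for this sketch:

```python
EYE_DEPTH_MM = 17.0  # assumed distance from the optical centre to the retina

def retinal_image_height_mm(object_height_mm, distance_mm):
    # Similar triangles: the image is scaled by eye depth / object distance;
    # the minus sign records that the projection is upside down.
    return -object_height_mm * EYE_DEPTH_MM / distance_mm

# A 1.8 m tall person viewed from 10 m projects to an inverted image ~3 mm tall:
print(retinal_image_height_mm(1800.0, 10_000.0))  # -3.06
```

The same person seen from twice as far projects an image half as tall, which is why retinal size alone cannot tell the brain how large an object really is.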

If the observer has a fully functional optical system, the whole image of the visual scene is projected onto the retina's surface with very high accuracy. This image is sharp and clear. Unfortunately, this does not mean that the retina reflects it with the same quality in every place on which it is projected. Due to the way in which the distribution of light reaching the retina is analysed, the retina can be compared to a heavily damaged cinema screen, which in many places is undulating and unclean, and even pitted in some places. To put it briefly, although the eye and the camera have many common structural features, their functioning is almost completely different (Duchowski, 2007). The sensors in the camera's matrix record each parameter of light with the same quality; the photoreceptors in the retina do not.

There are two types of photoreceptors, i.e. photosensitive cells, in the retina: cones and rods. The names of these receptors originate from their shapes: cones resemble tapers, and rods resemble cylinders (Fig. 10). The rods are incomparably more sensitive to the light entering the eye than the cones, which is why the cones work during the day and "fall asleep" at night, when the eye adapts to the dark. Conversely, the rods "sleep" during the day, when the eye is adapted to the light, and are active at night (Młodkowski, 1998).

Figure 10. A rod and a cone. Graphic design: P.A., based on Matthews (2000) and Nolte (2011)

The cones react differently to various electromagnetic wavelengths in the visible light range, i.e. approximately from 400 to 700 nanometers. There is a close relationship between electromagnetic wavelength and color vision (Fig. 11). The cones also react to the intensity of the light wave. The rods, in turn, do not differentiate colors, but are particularly sensitive to the brightness (intensity) of light. The image conveyed by the rods is achromatic. This means that when it gets dark and the rods take control over vision, we stop differentiating colors while still differentiating shades of grey. Obviously, this rule applies only to the colors of surfaces that reflect light, not those that emit it. At night, we see colorful neon lights, because the light they emit stimulates the cones. However, we cannot see the difference between the green and red paintwork of two adjacent cars in a dark street, because in low light they reflect more or less the same light and the rods react to them in a similar manner.
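The relationship between wavelength and perceived hue can be sketched as a rough lookup. A minimal sketch follows; the band boundaries are illustrative round numbers I have assumed for the example (they are not from the text, and perceptual hue boundaries are not this sharp):

```python
# Approximate mapping of visible wavelengths (in nanometers) to hue names.
# Band boundaries are illustrative round numbers, not exact perceptual limits.
def approximate_hue(wavelength_nm: float) -> str:
    bands = [
        (400, 450, "violet"),
        (450, 495, "blue"),
        (495, 570, "green"),
        (570, 590, "yellow"),
        (590, 620, "orange"),
        (620, 700, "red"),
    ]
    for low, high, name in bands:
        if low <= wavelength_nm < high:
            return name
    return "outside the visible range"

print(approximate_hue(532))  # a green laser pointer -> "green"
print(approximate_hue(650))  # a red laser pointer -> "red"
```

Wavelengths below 400 nm (ultraviolet) or above 700 nm (infrared) fall outside the range to which the cones respond, so the function reports them as invisible.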

Figure 11. Electromagnetic wavelengths in the range of visible light in the chromatic (colored) and achromatic versions, in the context of other electromagnetic wavelengths; m – metre, nm – nanometre, mm – millimetre, km – kilometre. Own graphic design.

Vision under very good lighting conditions is called photopic vision, and the cones primarily participate in it, whereas vision in low light conditions is called scotopic vision and is the result of rod activity.

Both types of photoreceptors work at twilight, early in the morning or on a moonlit night. The darker it is, the weaker the reaction of the cones becomes, while the rods awaken from their daytime slumber and react increasingly intensely. Conversely, the brighter it gets, the more intensely the cones react to the light, whereas the reaction of the rods decreases. In these conditions we are dealing with so-called mesopic vision (Fig. 12). This is a particularly dangerous time for drivers, because neither of the retinal systems is fully functional then. The brightness that determines the activation of particular vision systems is expressed in units known as candelas per square meter. Without going into details, one candela corresponds, more or less, to the light at twilight, right after sunset.

Seeing images is possible primarily thanks to the cones, which are responsible for seeing in good light conditions, and for this reason we will take a particularly careful look at them.

Figure 12. Three systems of vision: scotopic, mesopic and photopic, with marked activity of rods and cones with regard to various intensities of visual scene illumination; brightness is expressed in candelas per square meter [cd/m²]. Own graphic design based on Schubert (2006).

Distribution of cones within the retina 

There are approximately 4.6 million cones in the retina of an adult person (Curcio, Sloan, Kalina and Hendrickson, 1990). Their largest cluster is located at the spot where the retina is crossed by the visual axis; the other end of this axis crosses the point on which we focus our eyes. The visual axis is inclined at approximately 5° to the optical axis of the eye, which runs through the centers of all elements of the eye's optical system, namely the cornea, the pupil and the lens (Fig. 13).

Figure 13. The optical axis of the eye and the visual axis illustrated on the horizontal cross-section of the right eye (top view). Graphic design: P.A.

The intersection of the visual axis and the retina is a small elliptical area with chords of approx. 1.5 mm vertically and 2 mm horizontally, and a surface area of approx. 2.4 mm² (Niżankowska, 2000). This area is called the macula, and over half a million cones are packed into it, i.e. over 200,000/mm². For comparison, there are only 3–4 pixels per 1 mm² of an LCD screen with a resolution of 1920 x 1200. That is about 50 thousand times fewer than in the central part of the retina!
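The comparison can be checked with simple arithmetic. A minimal sketch, assuming a large 16:10 display of roughly 1018 x 636 mm (about 46 inches diagonally; the text does not state the screen size, so this is my assumption chosen to reproduce its 3–4 pixels/mm² figure):

```python
# Comparing cone density in the macula with pixel density of a 1920 x 1200 LCD.
# Assumed panel dimensions (not from the text): about 1018 mm x 636 mm.
panel_width_mm, panel_height_mm = 1018, 636
pixels_per_mm2 = (1920 / panel_width_mm) * (1200 / panel_height_mm)
cones_per_mm2 = 200_000  # macular cone density cited in the text

print(round(pixels_per_mm2, 1))               # between 3 and 4 pixels per mm^2
print(round(cones_per_mm2 / pixels_per_mm2))  # on the order of 50 thousand
```

A smaller monitor would of course pack its pixels more densely, but even then the macula would remain thousands of times denser than the screen.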

Inside the macula, there is an even smaller area of approximately 1 mm², known as the fovea. Inside it, in the foveola, the density of cones can reach up to 324,000/mm². In adults, there are on average about 199,000/mm² of them in this area (Curcio et al., 1990), and the further away from the fovea, the fewer of them there are (Fig. 14).

Figure 14. Average density of cones and rods on the retina surface. The pictures show the density of cones (greater) and rods (smaller) depending on the distance from the fovea. Own graphic design based on Curcio et al. (1990)

If it were possible to use cones from around the middle of the fovea to build a small 36 x 24 mm image matrix for a digital camera, its resolution would be not 12 or even 30 megapixels, but approx. 280 Mpix! The accuracy of the image recorded by cells located within the fovea is unimaginably high. In Fig. 15, one can see the retina's surface near the center of the fovea, with the visible cones shaped like blobs (Ahnelt, Kolb and Pflug, 1987). This is approximately what the surface of the entire retina-screen, onto which the image facing the eye is projected, looks like; different areas differ only slightly in the density of photoreceptors. What the brain learns about the world through the eyes is directly related to the activity of these small screen-points.
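The 280 Mpix figure follows directly from the numbers already given: peak foveolar density multiplied by the area of a full-frame sensor. A quick check:

```python
# Peak cone density from the foveola applied to a full-frame 36 x 24 mm sensor.
peak_cone_density = 324_000   # cones per mm^2 in the foveola (Curcio et al., 1990)
sensor_area_mm2 = 36 * 24     # full-frame sensor area: 864 mm^2
megapixels = peak_cone_density * sensor_area_mm2 / 1_000_000

print(round(megapixels))  # about 280 "megapixels"
```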

Figure 15. Fragment of the retina near the centre of the fovea, enlarged 900 times. Own graphic design based on Ahnelt, Kolb and Pflug (1987)

The surface of the fovea constitutes only 0.1% of the entire surface of the retina, and the surface of the macula — 0.3% (Młodkowski, 1998). There are no rods there, and the cones located in this area constitute 1/8 of all cones in the retina. The remaining 4–5 million cones are distributed over the 99.7% of the retinal surface surrounding the macula. This means that beyond the macula there are approx. 7 thousand of them per 1 mm² of the retina (Hofer, Carroll and Williams, 2009). This is still quite a lot, but over most of the retina there are almost thirty times fewer cones than in the fovea.

As is easy to guess, the direct consequence of the described distribution of photoreceptors is that, depending on which part of the retina is illuminated, the projected image is processed with a different spatial resolution. In other words, from sites with a higher density of photoreceptors the brain derives much more data with which to reconstruct the image of the visual scene than from those areas of the retina that are poorer in photoreceptors.

A few words about rods

As has already been highlighted, in addition to the cones, there are other photoreceptors in the retina of the human eye: the rods. Their number ranges from 78 to 107 million depending on the person (on average around 92 million). There are therefore about 20 times more rods than cones (Curcio et al., 1990). This means that the retina is much better "hardware-equipped" to see the world monochromatically and in the dark than in colors under full lighting. This is probably a remnant of our predatory ancestors, who did not really care about seeing the world in colors and definitely preferred hunting at night rather than during the day. Color vision, in turn, we inherited from our ape ancestors, who preferred to eat during the day, carefully examining the color of a banana or mango peel. This was crucial, at least as far as avoiding indigestion is concerned.

There are no rods in the fovea, and the first ones occur only in the region of the macula. The further away from the macula, the more rods there are. The largest number of rods is located about 20° from the fovea and is comparable to the number of cones in the macula, i.e. approx. 150,000/mm² (Fig. 14). Moving further towards the peripheral retina, the number of rods gradually decreases; at the edges of the retina there are half as many, i.e. approx. 75,000/mm².

Such a distribution of rods means that in poor lighting conditions we can see something relatively clearly not by looking at it directly, but "out of the corner of the eye", that is, by shifting the optical axis of the eye about 20 angular degrees away from the place we want to see. Only then will the image projected onto the retina be interpreted with the highest resolution possible in such conditions.

Retinal hole

Finally, I would like to make a few more points concerning one structural detail of the retina. Approximately 15° away from the fovea, in the nasal retina of each eye, there is a literal "hole" with a diameter of 1.5 mm and a surface of approx. 1.2 mm². This area, called the blind spot or the optic disc, contains no photoreceptors. It houses the optic nerve, which transmits signals concerning the state of photoreceptor stimulation to the brain, and the blood vessels necessary for oxygenating the cells inside the eye. In this area, the image projected onto the bottom of the eye hits emptiness. This is very easy to discover yourself. All you need to do is close your right eye and look at one of the crosses on the right-hand side of Fig. 16 with your left eye, and then slowly move towards and away from the page. At a certain distance, you will find that the circle located on the left-hand side becomes invisible, and the break in the line disappears. This occurs because the image of the circle, or of the break in the line, is projected onto the blind spot.

Figure 16. Blind spot test for the left eye retina. Own graphic design based on Hurvich (1981)

Specialized ganglion cells

The bottom-up analysis of visual scene content in the early stages of the visual pathway is possible thanks to the presence in the retina of not only photoreceptors but also different types of neurons, among which the aforementioned ganglion cells play a particularly important role. They are specialized in the processing of data concerning (1) the wavelength of visible light, which is the basis for color vision, (2) the contrast of light brightness, which enables us to see, among other things, the edges of things or of their parts, i.e. shapes in general, (3) the variability of lighting in time, which is the basis for seeing movement, and (4) spatial resolution, which lies at the root of visual acuity.

Among the many types of ganglion cells, it is possible to identify those that are particularly sensitive, for example, to the wavelength of visible light corresponding to the color green. This means, more or less, that if these ganglion cells send nerve impulses towards the cerebral cortex, like in Morse code, then the observer sees something green. When the cells responsible for movement detection are activated, the brain "learns" that something is changing before the eyes of the observer, although on the basis of this information alone it does not yet "know" whether something is moving in the visual scene, the observer is moving, or both. It will "find out" about this as well, however, by analysing data from other senses. Suffice it to say that seeing one property of a picture or another directly results from the condition of the neural transducers and transmitters of sensory data. If they are damaged, a certain property of the painting may go unnoticed, as if it were not there.

Upon taking a closer look at the anatomy of ganglion cells, it turns out that they essentially fall into three groups. The first group consists of midget ganglion cells, with small bodies and a relatively small number of branches, i.e. dendritic trees and an axon. The second group consists of parasol ganglion cells, with large bodies and a much greater number of branches (Fig. 17). The third group consists of bistratified ganglion cells, with tiny bodies and branches disproportionately large compared to their body size, though still considerably smaller than the branches of parasol or even midget ganglion cells (Dacey, 2000).

Figure 17. Bodies and branches of midget ganglion cells and parasol ganglion cells depending on the distance from the fovea. Own graphic design based on Wandell (1995)

Although midget ganglion cells are smaller than parasol ganglion cells, the size of both, and especially the number and spread of the dendritic trees which allow them to receive nerve impulses from other cells, depends on their distance from the fovea: the closer they are to the fovea, the smaller both kinds of cells are.

Apart from dendrites, each neuron also has an axon, an offshoot that transmits nerve impulses from the body of a given cell to another. It is the axons of all three types of ganglion cells that make up the fundamental frame of the optic nerve. It consists of approximately 1 million (from 770 thousand to 1.7 million) ganglion cell axons (Jonas, Schmidt, Müller-Bergh, Schlötzer-Schrehardt et al., 1992) and somewhat resembles an electrical cord made up of copper wires.

The three types of ganglion cells are not equally represented among the axons of the optic nerve. The majority, approximately 80%, are the axons of midget ganglion cells, while parasol and bistratified ganglion cells contribute approximately 10% each. The fibres of the small cells (i.e. midget and bistratified ganglion cells) therefore constitute approximately 90% of all axons forming the optic nerve. This means that, for some reason, the data transmitted by the smaller ganglion cells are more important for the brain than the data transmitted via the larger (parasol) ganglion cells.

Ganglion cell bodies are located in the retina of the eye. Their dendrites receive data from photoreceptors via intermediary cells, which will be discussed later. In any case, due to their smaller number of dendrites, midget ganglion cells connect with far fewer photoreceptors and other retinal cells than parasol ganglion cells, with their large dendritic trees, do. What is important, however, is that the small ganglion cells (midget and bistratified) connect mainly with the photoreceptors located in the central part of the retina. They are therefore much more sensitive to the spatial resolution of the retina's lighting than parasol ganglion cells. It is thanks to midget ganglion cells that we can differentiate the shapes of objects so accurately. The only issue is that their greatest cluster covers a relatively small area of the retina and, as a result, a small field of vision.

Another property of small ganglion cells is their sensitivity to light wavelengths. More than 90% of them specialise in this field, originating the processes of color vision and color differentiation. Almost all midget ganglion cells perfectly differentiate the electromagnetic wavelengths corresponding to green and red, but they perform much worse when it comes to the opposition of yellow and blue. This task is instead performed by bistratified ganglion cells, which play the essential role in processing the data underlying the differentiation of yellow and blue (Dacey, 2000).

As opposed to small ganglion cells, parasol ganglion cells do not differentiate light wavelengths, but they are much more sensitive than midget ganglion cells in detecting edges between planes of similar brightness. They are able to register a 1–2 percent difference in the brightness of juxtaposed surfaces, and differences of 10–15 percent are registered without any problem (Shapley, Kaplan and Soodak, 1981).

Midget ganglion cells require a much greater difference in the brightness of juxtaposed planes in order to register it. Moreover, parasol ganglion cells cover a much larger surface of the retina than midget ganglion cells. Both these properties of large ganglion cells perfectly complement the limitations of the small cells as regards detecting the edges of things in the observer's field of vision outside the fovea and the spatial relations between them.

There is another important difference between small and large cells. Large cells have much thicker axons than small cells, which is why they send nerve impulses about twice as fast as midget cells, i.e. at a speed of approximately 4 m/s. This property of parasol cells is of key importance for detecting changes in the lighting of the retina, which allows the observer to recognise movement. Visually detecting the movement of an object (or of the observer) amounts to registering the same or a similar arrangement of light and shadow moving across the retina in time. The pace and direction of the movement of the image on the retina is an indicator of the speed and direction of the movement itself.
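The practical effect of the speed difference is easy to estimate. A minimal sketch, assuming purely for illustration a conduction path of about 7 cm from the retina to the LGN (the path length is my assumption, not a figure from the text; only the 4 m/s speed and the 2:1 ratio come from it):

```python
# Illustrative latency difference between parasol and midget axons.
# Assumed conduction path (not from the text): about 7 cm from retina to LGN.
path_m = 0.07
parasol_speed_m_s = 4.0                   # from the text
midget_speed_m_s = parasol_speed_m_s / 2  # half the parasol speed, per the text

parasol_ms = path_m / parasol_speed_m_s * 1000
midget_ms = path_m / midget_speed_m_s * 1000
print(parasol_ms)  # 17.5 ms for a parasol-cell signal
print(midget_ms)   # 35.0 ms for a midget-cell signal
```

Even over such a short distance, the thicker parasol axons save well over ten milliseconds, which matters when the task is reacting to something moving.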

Finishing this functional characterisation of large and small neurons, it is worth noticing that, like the traits of almost every visual scene, these cells can also be grouped into two categories. Midget and bistratified ganglion cells are especially sensitive to color and to the spatial resolution of light, which constitutes the basis for perceiving the shapes of things in a visual scene and differentiating them from one another. It could be said that it is thanks to their activity that we have a chance to separate the objects which form a visual scene from ourselves and from the background. This is the most basic function of vision, which is why the axons of midget and bistratified ganglion cells are so numerously represented in the optic nerve.

Parasol ganglion cells, on the other hand, thanks to their high speed of signal transmission and much greater sensitivity to different shades of brightness in the visual scene than small cells, make it possible to see the movement and spatial organisation of the viewed scene, and also very effectively support the process of identifying the edges of things.

Thus, the anatomical structure and physiology of the ganglion cells determine their specific functions, and from these the foundations of the subjective experience of viewing an image begin to emerge. We see images the way we do because this is what our biological hardware is like, not because the images themselves are like that.

From the retina to the lateral geniculate nucleus

The first structure in the brain to receive information from the retinas of the eyes is the lateral geniculate nucleus (LGN), located in the thalamus. Already in the 1920s, Mieczysław Minkowski, a Swiss neurologist of Polish origin, discovered that the axons of small and large ganglion cells connect with the LGN in a surprisingly orderly manner (Valko, Mumenthaler and Bassetti, 2006).

The structure of the LGN resembles a bean; its cross-section reveals six distinctly separated layers of neurons, composed of cells of two different sizes (Fig. 18).

The darker layers of cells, numbered 1 and 2, receive nerve impulses through the axons of parasol ganglion cells. Since the cell bodies forming these layers of the lateral geniculate nucleus (LGN) are also relatively large, these layers are called the magnocellular layers, and their cells M-type (magno) or Y-type cells.

Figure 18. Lateral geniculate nucleus (LGN) cross-section with marked layers: 1 and 2 — magnocellular (red arrows indicating dark stripes), 3–6 — parvocellular (white arrows indicating dark stripes) and 7 — koniocellular (yellow arrow indicating a light strip between layers 2 and 3). Own graphic design based on Lera Boroditsky's lecture notes.

The layers marked with numbers 3–6 receive signals from the axons of small ganglion cells, and since they are composed of cells with small bodies, they are called parvocellular layers, P-type (parvo) layers or X-type layers. The anatomical and functional traits of the small and large ganglion cells located in the retina and of the cells forming the corresponding layers of the lateral geniculate nucleus are almost identical. Therefore, among researchers studying vision, there is a consensus that the parasol ganglion cells, together with the M-type cells in two layers of the LGN, form the so-called magnocellular pathway, while the midget ganglion cells, together with the P-type cells in four layers of the LGN, determine the so-called parvocellular pathway.

In Fig. 18, a seventh layer, located between the magnocellular and parvocellular layers (i.e. between the second and third), is also marked. It contains koniocellular cells, or K cells, which receive projections from bistratified ganglion cells. Owing to the existence of this layer in the LGN, a third visual pathway, the so-called koniocellular pathway, should be added to the two previous ones. Since the cells of the koniocellular pathway constitute only about 10% of all ganglion cells, and because they perform functions analogous to those performed by midget ganglion cells, the koniocellular pathway is usually treated as part of the parvocellular pathway.

Summarising the properties of the magnocellular and parvocellular pathways of vision, it is worth taking a look at the following table.

Characteristic                                                   Magnocellular pathway   Parvocellular pathway
Size of the ganglion cell body                                   large                   small
Size of the receptive field of ganglion cells                    large                   small
Speed of nerve impulse transmission                              fast                    slow
Number of axons in the optic nerve and tract                     small                   large
Color differentiation                                            no                      yes
Contrast sensitivity                                             small                   large
Spatial resolution                                               small                   large
Temporal resolution and movement sensitivity                     large                   small
Sensitivity to brightness differences between adjacent planes    large                   small
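The comparison above can also be kept at hand as a small lookup structure; the property names below are my paraphrases of the table rows, and the values are taken directly from it:

```python
# The magno/parvo comparison as a dictionary: property -> (magno, parvo).
PATHWAY_PROPERTIES = {
    "ganglion cell body size":           ("large", "small"),
    "receptive field size":              ("large", "small"),
    "impulse transmission speed":        ("fast",  "slow"),
    "share of optic nerve axons":        ("small", "large"),
    "color differentiation":             ("no",    "yes"),
    "contrast sensitivity":              ("small", "large"),
    "spatial resolution":                ("small", "large"),
    "temporal resolution and motion":    ("large", "small"),
    "brightness-difference sensitivity": ("large", "small"),
}

def compare(prop: str) -> str:
    """Return a one-line comparison of the two pathways for a given property."""
    magno, parvo = PATHWAY_PROPERTIES[prop]
    return f"magnocellular: {magno}, parvocellular: {parvo}"

print(compare("color differentiation"))
```

Reading the pairs column-wise recovers the division of labour described earlier: the parvocellular column lines up with color and shape, the magnocellular column with motion and coarse brightness contrast.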

Breakdown of the layers in LGN according to the optic chiasm 

LGN layers are divided not only into koniocellular, magnocellular and parvocellular, but also into right and left ones. Like most cerebral structures, the LGN is a paired structure, found on both the right and left side of the brain. In primates, including humans, the aforementioned optic chiasm (Fig. 19) lies between the retinas and the LGN. The optic chiasm is the area where the bundles of ganglion cell axons are divided into two parts. This is a very clever evolutionary invention, thanks to which the loss of one eye does not mean that any part of the brain is completely excluded from visual data processing. Beyond the optic chiasm, the LGN lying on a given side of the head receives data from the external (temporal) part of the retina of the eye on the same side and from the nasal part of the retina of the other eye.

As a result of this division of the optic nerve axons into two parts, the LGN layers marked in Fig. 18 with numbers 1, 4 and 6 receive signals from the eye on the opposite side of the head, and layers 2, 3 and 5 from the eye on the same side of the head as the LGN. The same rule applies to the LGN structures on both the right and the left side of the head.
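The layer-to-eye rule is compact enough to state as a small function; the mapping below encodes exactly the rule just described (layers 1, 4, 6 contralateral; layers 2, 3, 5 ipsilateral):

```python
# Which eye feeds a given LGN layer, per the rule in the text:
# layers 1, 4, 6 receive from the opposite-side (contralateral) eye,
# layers 2, 3, 5 from the same-side (ipsilateral) eye.
def lgn_layer_source(layer: int, lgn_side: str) -> str:
    if layer not in range(1, 7) or lgn_side not in ("left", "right"):
        raise ValueError("layer must be 1-6, lgn_side 'left' or 'right'")
    opposite = {"left": "right", "right": "left"}
    eye_side = opposite[lgn_side] if layer in (1, 4, 6) else lgn_side
    return f"{eye_side} eye"

print(lgn_layer_source(4, "left"))   # contralateral input -> right eye
print(lgn_layer_source(3, "left"))   # ipsilateral input -> left eye
```

As the text notes, the same function holds for the LGN on either side of the head, which is why the side is a parameter rather than a constant.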

Figure 19. Diagram of the connections between the right and left eye retinas and the right and left hemisphere LGN, through the ganglion cell axons forming the optic nerve (up to the optic chiasm) and the optic tract (between the chiasm and the LGN). Graphic design: P.A.

It is worth remembering that the LGN performs the function of organising the sensory data related to the various features of the visual scene. After the almost chaotic tangle of dendrites and axons of the numerous cells that make up the retina, from this point on it is much easier to tell which bundles carry the source data concerning colors, the edges of things, their movement and their spatial organisation. Moreover, this ordering makes it possible to determine not only from which side of the body the data come, but also from which part of the eye. Seeing is a major logistical undertaking for the brain, and good data organisation is therefore the basis for success, i.e. for creating an accurate representation of the visual scene.

Optic radiation 

The early pathway of visual data processing ends with the so-called optic radiation. Thanks to the optic radiation, data on the distribution of light in the visual scene are delivered to the visual cortex in the occipital lobe, or more precisely to the so-called calcarine sulcus, located on the inner side of the lobe. The radiation is in fact a band of the axons of the cells whose bodies form the individual layers of the LGN. It takes its name from the specific arrangement of these fibres, which sweep in a large curve among the tightly packed structures lying underneath the cerebral cortex (Fig. 20).

Figure 20. Optic radiation — the connection between the individual LGN layers and the primary visual cortex in the occipital lobe. On the left, a post mortem anatomical brain image; on the right, a picture made using a tractography technique. Own graphic design based on Wandell and Winawer (2011)

Phrenological point of view on the functions of the cerebral cortex in the process of seeing

As late as the end of the 19th century, phrenology was the dominant neuropsychological concept. According to the assumptions of its creator, the German physician Franz Josef Gall, there is a close link between the anatomical structure and location of the various parts of the cerebral cortex and the mental functions they carry out, that is, the mind. Even though phrenology is now regarded as pseudoscience, it was, in fact, an extraordinarily accurate intuition, one which underlies contemporary neuroscience (Fodor, 1983).

Due to the lack of appropriate diagnostic tools and of a methodology for neuropsychological study, phrenologists quite freely associated psychological functions with various parts of the cerebral cortex. With regard to vision, however, they had no doubt that it was conducted by the frontal, supra- and periorbital cortex. This whole part of the cortex was defined as perceptual, and its different parts were assigned the functions of perceiving: form (field 23), size (field 24), coloring (field 26), as well as weight (field 25). In the temporal areas near the orbital cavities they located functions related to the perception of numbers (field 28) and of order (field 29), and in the post-periorbital cortex — language (field 33). Imitation also found its location in the frontal structures (field 21) (Fig. 21).

Phrenologists did not even suspect that seeing is a process which involves the structures of the brain furthest from the frontal cortex, namely the occipital lobe. To the occipital areas they instead attributed such functions as fertility and parental love, including love for children in general (philoprogenitiveness, field 2), friendship and attachment (adhesiveness, field 4), love of one's home and homeland (inhabitativeness, field 3a), as well as the ability to concentrate attention, especially on intellectual tasks (concentrativeness, field 3).

Phrenologists were equally wrong about the temporal lobes, which — as we know today — also play an important role in seeing. They located in them the basis for combativeness (field 5), destructiveness (field 6), and secretiveness (field 7).

The results of studies in the field of neuroscience revealed a completely different location of the structures responsible for various mental functions, particularly for vision, than phrenology had proposed.

Figure 21. A page from Webster's Academic Dictionary of 1895 with an illustration and description of the cortical locations of mental functions.

Visual cortex in neuroscience

Sensory data processing in the higher regions of the visual pathway is performed by approx. 4–6 billion neurons in the occipital, parietal, temporal and even frontal lobes. Overall, processing the data recorded by the photoreceptors in the retinas of the eyes involves about 20% of the total surface area of the human cerebral cortex (Wandell, Dumoulin and Brewer, 2009). Since the occipital lobes play a particularly important role in the visual process, this area is also referred to as the visual cortex (Fig. 22).

Figure 22. Visual cortex and other cortical structures involved in vision. Graphic design: P.A. based on Logothetis (1999) and Zeki (2003)

Due to its anatomical structure, the visual cortex can be divided into two parts: striate cortex and extrastriate cortex. The striate cortex is located at the very end of the occipital lobe, in its medial part, in the area called the calcarine sulcus (Fig. 23A). According to the classification of brain areas proposed in 1907 by the German neurologist Korbinian Brodmann, this is area 17. Apart from the calcarine sulcus, it also covers part of the outer occipital lobe (Fig. 23B). Area 17 is also referred to as the primary visual cortex, or area V1 (from the first letter of the English word "vision", and to emphasise that this is the first cortical stage of the visual pathway).

In both hemispheres of the brain, there are approx. 300 million neurons in area V1, which is 40 times more than the number of neurons in the LGN (Wandell, 1995). Nerve impulses travel to V1 along the optic radiation, i.e. along the axons of the cells whose bodies are located in the LGN.

Figure 23. Calcarine sulcus with marked area 17, i.e. striate cortex (V1), seen from: A — the inner side of the left occipital lobe, and B — the outer side of the right occipital lobe. Own graphic design based on Horton and Hoyt (1991)

The remaining parts of the visual cortex are referred to as extrastriate cortex. It basically comprises Brodmann's areas 18 and 19 or, in accordance with the other notation, areas V2, V3, V3A, V4 and V5 (Fig. 22). Cortical structures in all lobes of the brain are also involved in visual data analysis: in the parietal lobe, e.g. V7 or the intraparietal sulcus (IPS); in the temporal lobe, e.g. the inferior temporal cortex (ITC) or the superior temporal sulcus (STS); and in the frontal lobe, e.g. the frontal eye field (FEF).

In order to understand what the brain does with the light stimulating the photoreceptors in the retinas of the eyes, it is necessary to analyse the structure and functions of all the cortical parts of the brain involved in vision.

Where do the stripes on the V1 cortical surface come from?

The striate cortex derives its name from the appearance of its surface, which somewhat resembles a zebra's coloring (Fig. 24). The darker stripes visible on its surface were revealed by staining with the cytochrome oxidase (COX) technique. The cells in these stripes receive signals from the eye on the opposite side of the head to the given part of area V1 (LeVay, Hubel and Wiesel, 1975; Sincich and Horton, 2002). As we remember, after the optic chiasm, the individual layers of cells in the right and left LGN receive signals from the eyes on both the same and the opposite side of the head. It is similar in area V1: nerve impulses from the eyes on both sides of the head travel to both its parts, in the right and the left hemisphere.

Figure 24. Picture of the surface of the striate cortex of the left occipital lobe; top view after flattening the calcarine sulcus and staining it with the cytochrome oxidase technique; the darker stripes indicate the position of the cells which receive signals from the eye on the opposite side of the head, i.e. on the right side. Own graphic design based on Sincich and Horton (2002)

In order to understand the origin of the stripes on the surface of the primary visual cortex, it is necessary to look inside it by cutting it crosswise (Fig. 25). At first glance, there are only three stripes on the cross-section of the visual cortex: two slightly lighter ones and a darker one in between. The darker one is called the stria of Gennari, after its discoverer, the Italian medical student Francesco Gennari. No other details of the internal structure of the primary visual cortex can be seen without a microscope.

Figure 25. Cross-section of the striate cortex through the calcarine sulcus: A — picture with the clearly visible bright band of striate cortex and the darker stria of Gennari, and B — schematic representation of picture A with the extent of V1 marked. Own graphic design based on Andrews, Halpern and Purves (1997)

However, if we look at it under the microscope, especially after prior staining of the cells located there, we will see something that resembles the orderly layered structure of the LGN. The cells in V1 are arranged in six horizontal layers, marked with Roman numerals from I to VI. The widest is layer IV, which is divided into four thinner sublayers: IVA, IVB, IVCα and IVCβ (Fig. 26).

The dark stria of Gennari, visible in Fig. 25, is part of layer IV, or more precisely the area where the connections between layers IVB and IVCα are located. Layer IV serves as the entrance gate to the visual cortex of the brain. It is through it, and especially through the IVCα and IVCβ sublayers, that nerve impulses from the LGN, and earlier from the retinas of both eyes, reach the primary visual cortex by means of the optic radiation. Only from layer IV are sensory data sent on to the cells in the other layers of V1.

Figure 26. Cross-section of the macaque striate cortex with cell layers stained using: A — the Nissl method and B — the Golgi method. The colored band at the level of layers IVB and IVCα indicates the location of the stria of Gennari. Own graphic design based on Lund (1973)

Islamic architecture of V1 cortex

Another curiosity related to the structure of the primary visual cortex is that the cells making up its individual layers are joined together into distinct columns. They are clearly visible in the image taken with the Golgi method in Fig. 26. The dendritic trees of the cells in layer I are like the capitals of these columns, the cells in layer VI their bases, and the cells in layers II to V their shafts.

Looking at the striped surface of the primary cortex in Fig. 24, we see only the capitals of the individual columns. However, if the columns of cells that receive nerve impulses from only one eye are stained, then all the capitals of these columns also take on this color, and on the cortical surface we see darker stripes (Fig. 27). They indicate where in the visual cortex the rows of cells that receive and process signals from the right or the left eye lie. They create colonnades of a kind, such as those we can admire in the Great Mosque of Cordoba. Right next to one colonnade runs another row of neuronal columns, receiving signals from the other eye, and so on.

Figure 27. Neural colonnades in the striate cortex, receiving signals from the right and left eye. Graphic design: P.A.

Orientation selectivity of cells

The discovery of the next property of the primary visual cortex earned two neurophysiologists, David Hubel and Torsten Wiesel, the Nobel Prize in 1981. While experimenting on the cat visual cortex, they found that neighboring columns of cells responding to input from one eye are specialized for the spatial orientation of the fragments of objects' edges recorded by that eye. This means that within V1, the cell columns are ordered in two planes.

One plane is defined by the cells reacting independently of one another to data incoming from the right and the left eye. The other plane runs across the first one and comprises cells which react to the various angles at which the edge fragment of a given object is seen, regardless of which eye recorded it. Describing this astonishing structure of the primary visual cortex, Hubel and Wiesel (1972) proposed a functional model of the cell column arrangement in V1, known as the ice-cube model (Fig. 28).

Figure 28. The functional ice-cube model illustrating the sensitivity of the neuron columns forming the primary visual cortex to data incoming from the right and left eye and to the angle of inclination of the edge recorded on a given fragment of the retina. Graphic design: P.A. based on Hubel (1988)

Of course, this model should not be treated literally, if only because, as we saw when looking at the V1 surface, the rows of neuron columns reacting to signals from one eye or the other are not arranged in straight lines. Nevertheless, the model perfectly illustrates the general principle of the functional ordering of the internal structure of the primary visual cortex.
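The functional ordering that the ice-cube model describes can be pictured as a small lookup structure: one axis alternating ocular dominance, the other sweeping preferred edge orientation. The sketch below is only an illustrative toy under assumed parameters (eight orientation columns, a winner-take-all readout), not Hubel and Wiesel's model itself.

```python
import numpy as np

# toy "ice cube": one axis alternates ocular dominance (L/R),
# the other sweeps preferred edge orientation from 0 to 180 degrees
orientations = np.arange(0, 180, 22.5)          # 8 orientation columns
eyes = ['L', 'R']
hypercolumn = [(eye, theta) for eye in eyes for theta in orientations]

def responding_columns(eye, edge_angle):
    """Pick the column that fires for an edge of a given angle seen by one
    eye: nearest preferred orientation wins (a winner-take-all assumption),
    with orientation treated as circular over 180 degrees."""
    def circ_dist(t):
        d = abs(edge_angle - t) % 180
        return min(d, 180 - d)
    best = min(orientations, key=circ_dist)
    return (eye, best)
```

For example, an edge tilted at 50° seen by the left eye activates the column tuned to 45°, while a nearly horizontal 170° edge is closest, on the circular orientation scale, to the 0° column.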

It seems that already at this initial stage of the visual pathway, the image projected onto the retina is spatially divided into thousands of small pieces. Each piece contains information about its location on the retina and about the orientation of the edges between adjacent surfaces differing in light intensity. In the primary visual cortex the content of all these pieces is analyzed and, depending on the retinal site and the orientation of the registered edges, a specific group of cell columns is activated. This is probably how information about the shapes of things is encoded. In the further stages of the visual pathway it is used to reconstruct the objects forming the entire visual scene.
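The decomposition just described can be sketched in a few lines of NumPy. This is a deliberately crude illustration, not a model of V1: the patch size, the gradient-based readout and the magnitude threshold are all assumptions made for the example.

```python
import numpy as np

def dominant_orientation(patch):
    """Estimate the dominant edge orientation (0-180 deg) in an image patch,
    crudely mimicking the readout of an orientation-selective column."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return None                       # uniform patch: no edge to encode
    mask = mag > 0.5 * mag.max()
    # an edge runs perpendicular to the local brightness gradient
    ang = (np.degrees(np.arctan2(gy[mask], gx[mask])) + 90.0) % 180.0
    return float(ang.mean())

# toy visual scene: a vertical luminance step (a single vertical edge)
img = np.zeros((32, 32))
img[:, 18:] = 1.0

# tile the "retinal image" into small pieces and encode each one separately
patches = [img[y:y + 8, x:x + 8]
           for y in range(0, 32, 8) for x in range(0, 32, 8)]
codes = [dominant_orientation(p) for p in patches]
# patches containing the edge report ~90 deg; uniform patches report nothing
```

Only the patches through which the edge passes produce a code (here, 90°, i.e. vertical); the uniform patches stay silent, which is the economy the text describes.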

At first glance it may seem completely irrational that, having obtained a complete image of the visual scene on the retina, the visual system first decomposes and then reassembles it. On closer examination, however, we must conclude that it would be difficult to invent a more economical system of visual data processing. One need only consider the unimaginable amount of data that stimulated cones and rods deliver to the brain every second our eyes are open. In other words, the complexity and variability of visual scenes over time require of the visual system a systematic approach based on a few clear principles. These come down to analyzing visual data in terms of the most important properties of both the visual scene and the device that registers it, i.e. the retina.

If the countless shapes of all things seen can be determined from the differences in brightness between the two-dimensional surfaces along which the edges of a shape are outlined, then it is incomparably easier to encode these shapes using a relatively small number of cells sensitive to edge orientations in the range from 0 to 180° than by means of an impossible-to-estimate number of cells and connections that would memorize each shape in all its appearances. Although, also for economic reasons, the brain does "seriously consider" this second option.

It transpires that even single cells can encode the shapes of complex things, but only things that are very well known and strongly fixed in brain structures; obviously not in V1, but at later stages of the cortical pathway of vision. These are the so-called grandmother cells, which react selectively to particular objects or people (Gross, 2002; Kreiman, Koch and Fried, 2000; Quiroga, Kraskov, Koch and Fried, 2009; Quiroga, Mukamel, Isham, Malach et al., 2008). At this point it is worth recalling that the possibility of complex shapes being encoded by single cells was discovered and described as early as the 1960s by the eminent Polish neurophysiologist Jerzy Konorski (1967). He called these cells gnostic neurons. According to his conception, they constituted the highest level of processing of data on the shapes of seen objects, registered at the earlier stages of the visual pathway by gnostic fields composed of many cells.

Returning to the equally grand discoveries of Hubel and Wiesel, it is worth adding that they identified not one but two types of column-forming cells in V1. Apart from the columns of cells sensitive to the angle of inclination of edges, the so-called simple cells, they also found cells which react to the direction of movement, the so-called complex cells. The movement of objects in the visual scene will not be analyzed in this monograph, so I will not pursue this thread here.

Retinotopic map in the striate cortex

The description of the extraordinary properties of the primary visual cortex presented in the previous chapter is far from complete. Knowing the internal structure of V1, let us return again to its surface. And here the next surprise awaits us.

The rows of cell columns that receive signals from the retinas of the right and left eye reflect the order of the photoreceptors within them. This is not a 1:1 representation, but an algorithm is known that predicts which neuronal columns in the primary visual cortex will respond to stimulation of a specific part of the retina. This complex-logarithm transformation was developed by Eric L. Schwartz (1980) on the basis of the anatomical locations of the connections between the ganglion cells collecting information from different parts of the retina and different parts of V1 in the right and left occipital lobes (Fig. 29).
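Schwartz's transformation has the form w = k·log(z + a), where z encodes a retinal point as a complex number (eccentricity times exp(i·polar angle)) and w is the corresponding cortical position. The sketch below uses illustrative constants (k and a are assumed values, not Schwartz's published fits) merely to show the mapping's key property: its strong over-representation of the fovea.

```python
import numpy as np

# Schwartz's (1980) complex-logarithm model of the retina-to-V1 mapping:
#     w = k * log(z + a)
# The constants below are illustrative placeholders, not fitted values.
K = 15.0   # overall cortical scale (assumed)
A = 3.0    # foveal offset in degrees (assumed)

def retina_to_cortex(ecc_deg, angle_rad):
    z = ecc_deg * np.exp(1j * angle_rad)   # retinal point as a complex number
    w = K * np.log(z + A)                  # cortical position
    return w.real, w.imag

# equal 5-degree steps in eccentricity along the horizontal meridian...
ecc = [0, 5, 10, 15, 20]
xs = [retina_to_cortex(e, 0.0)[0] for e in ecc]
steps = np.diff(xs)
# ...occupy less and less cortex: the fovea is strongly over-represented
```

Each successive 5° ring of eccentricity maps onto a shorter stretch of cortex than the previous one, which is exactly the distortion visible in Fig. 29.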

Figure 29. A model of the retinotopic organisation of neuronal columns in the primary visual cortex: A — differently colored parts of the retinal surface and B — V1 cortical surfaces with colors corresponding to the respective colored surfaces of the retina. Graphic design: P.A. based on Hubel (1988)

Figure 29 shows that approximately 30% of the neuronal columns in the primary visual cortex receive data from the macular area, which covers the field of vision up to 5° (yellow). Another third of the neuronal columns in V1 serves the part of the retina between 5° and 20° of the field of vision (red) and, finally, the remaining cortical neurons in area 17 (green, light and dark blue) cover the rest of the retina, i.e. between 20° and 90° of the field of vision.
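Under the same log-polar model, the share of cortex devoted to a ring of eccentricities [e1, e2] comes out proportional to log((e2 + a)/(e1 + a)). The parameter a = 3° below is an assumed illustrative value chosen so that the proportions land close to those quoted above; it is not a published fit.

```python
import math

# Cortical territory for the eccentricity ring [e1, e2] in the log-polar
# model is proportional to log((e2 + a) / (e1 + a)); a = 3 deg is assumed.
A = 3.0

def cortical_fraction(e1, e2, e_max=90.0):
    area = lambda lo, hi: math.log((hi + A) / (lo + A))
    return area(e1, e2) / area(0.0, e_max)

fractions = {band: cortical_fraction(*band)
             for band in [(0, 5), (5, 20), (20, 90)]}
# roughly 0.29, 0.31 and 0.41 of V1 for the three bands
```

With this (hand-picked) value of a, the central 5° claims close to a third of V1, the 5°–20° ring another third, and the entire remaining 20°–90° periphery the rest, matching the proportions read off Fig. 29.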

Since, as we remember, most of the data travel along the axons of the midget ganglion cells and bistratified ganglion cells, i.e. along the so-called parvocellular pathway, most of the cell columns in V1 receive data of this kind. If we also remember that midget ganglion cells mainly serve the areas of the retina located in the fovea, it is easy to understand that the number of cell columns involved in this type of data is disproportionately higher than the number of columns receiving data from other parts of the retina.

One of the first studies recording the reactions of cells in V1 to visual stimuli was conducted on macaques by Roger Tootell and his associates (Tootell, Silverman, Switkes and De Valois, 1982). They demonstrated how the pattern of a figure seen by a monkey is reflected in V1 (Fig. 30). It is worth noting that the clearest representation was obtained in layer IVC. This is understandable, as it is at this level that signals from the LGN reach V1.

Fig. 30. A — visual stimulus; B — primary visual cortex at the level of layer IVC, stained using the 2DG (2-deoxyglucose) method. Based on Tootell, Switkes, Silverman and Hamilton (1988)

The second example illustrates even more clearly how much we already know about the retinotopic organisation of V1, not only in macaques but also in humans, and about its activity during seeing. In the studies I would like to draw attention to, the responses of the human visual cortex were recorded using a 7T fMRI scanner (Polimeni, Fischl, Greve and Wald, 2010). This time, the studied person viewed two versions of the letter "M", positive and negative, on a computer screen (Fig. 31B). Its form had been suitably processed using the above-mentioned Schwartz algorithm, which determines the spatial relationships between an image projected onto the visual hemifield and its reflection on the cortical surface (Fig. 31A).

Figure 31. A — an image projected on the retinal surface and its expected reflection on the V1 cortical surface, and B — stimuli used in the studies by Jonathan R. Polimeni and his associates (2010). Own graphic design based on Polimeni, Fischl, Greve and Wald (2010)

By exposing this distorted visual stimulus, the intention was to check how accurately the neuronal columns sensitive to the different angles of the edges formed where contrasting surfaces meet would reproduce the form of the letter "M". The activity of neurons in V1 was measured while the respondents fixated on the central red spot, and it was found that at the level of layer IVC the image of the letter "M" was reproduced fairly well (Fig. 32C; blue shape).

Figure 32. Neuronal activity in different layers of the human visual cortex V1, from the deepest (A) to the most superficial (E), during the presentation of the letter "M" processed by the Schwartz algorithm

I believe no one needs convincing of the huge opportunities these discoveries create for restoring vision in people who have lost it as a result of illness or mechanical damage to the retinas. Instead of the data sent to V1 by the photoreceptors located in the retinas, a signal can be sent to it directly from a video camera. After processing the signal with the Schwartz algorithm and stimulating the cells of the primary visual cortex with electrodes, a visual experience can be created. There is a lot of evidence indicating that this will soon be possible.

Some more information about blobs and colors

While discussing the structure and functions of the primary visual cortex, we cannot omit one more element. Between the cell columns which separately receive information on the spatial orientation of edge fragments from the two eyes, there are small cylindrical clusters of neurons playing an important role in, among other things, color perception (Livingstone and Hubel, 1984). They can be found in basically all layers of the visual cortex except layer IV (Fig. 33).

Figure 33. Modified version of the model of the primary visual cortex proposed by David Hubel and Torsten N. Wiesel (1977), including the blob cell columns in V1. Graphic design: P.A. based on Livingstone and Hubel (1984)

In Polish translations of the English name "blobs", they are referred to as "kropelki" ["droplets"], "plamki" ["spots"], and even "kleksy" ["blotches"] (Matthews, 2000), while the original name of this structure, discovered by Jonathan C. Horton and David Hubel (1981), i.e. "patches", is used less frequently. These names derive from the fact that after sectioning V1 longitudinally and staining it using the cytochrome oxidase method, slightly irregularly distributed blobs can be observed on its surface, between the cell columns reacting to the orientation of the edges of seen objects (Fig. 34).

Figure 34. A photograph of the V1 cortical surface with stripes and blobs marked, indicating the location of the neuronal columns sensitive to colors. Own graphic design based on Blasdel (1992a)

Inputs and outputs of V1 cortex

Wrapping up this journey into the primary visual cortex, it is worth ordering our knowledge of its connections with other structures of the brain. First and foremost, most data from the LGN reach V1 via a wide tract known as the optic radiation. The axons of the cells forming the optic radiation connect with the cells of V1 at the level of layer IV. These axons form three visual pathways: magnocellular, parvocellular and koniocellular.

The magnocellular pathway connects with V1 at the level of layer IVCα, and it conveys data concerning the brightness of fragments of the visual scene. These are not encoded at as high a resolution as the data flowing through the parvocellular pathway, but they encompass much larger areas of the watched scene. They constitute a basis for the global, spatial organisation of the objects located in the framed scene. Moreover, the magnocellular pathway conveys signals which constitute a basis for seeing movement. Both the perception of the global spatial organisation of a visual scene and the perception of its variability are essential for our orientation in space. The data arriving in layer IVCα through the magnocellular pathway then reach the neighbouring layer IVB. From there they are sent in two directions: to V2 and to V5, also known as the middle temporal area (MT), which specialises in processing data concerning movement in the visual scene (Fig. 35).

The axon-rich parvocellular pathway connects with V1 at the level of layer IVCβ. It conveys two types of information: about the wavelength and about the intensity of the light stimulating various groups of cones in the retina, that is, about color and luminance. These data are characterised on the one hand by high spatial resolution, while on the other they are limited to a relatively small portion of the field of vision. Information about wavelength, particularly the opposition of red and green, is sent from layer IVCβ to the clusters of neurons forming the blobs, and from there to V2. In turn, information about the intensity of the lighting, which constitutes an important basis for seeing edges, flows from layer IVCβ along the columns sensitive to specific angles of edge fragments (especially in layers II and III) and is sent on to V2 from there as well.

Figure 35. Inputs of the magnocellular, parvocellular and koniocellular pathways to V1 from the LGN, and outputs from V1 to the V2, V3, V4 and V5 (MT) cortical areas, as well as back to the LGN and SC. Own graphic design based on materials from the Laboratory for Visual Neuroscience website

It is somewhat different in the case of the koniocellular pathway, which connects directly with the clusters of neurons forming the cylindrical blobs in V1, without the intermediation of the columns in layer IV. The koniocellular pathway transmits data concerning the colors blue and yellow. From the blobs, these flow further towards V2.

V2 cortex

A vast majority of the data reaching V1 from the retina through the LGN is forwarded to V2 or, in Brodmann's classification, area 18. Just like V1, it consists of neighbouring cell groupings of differing anatomical structure. After staining they can easily be told apart, as they form characteristic stripes (Fig. 36). Basically, three types of stripes can be distinguished in V2: thin stripes and thick stripes, which stain dark, and the so-called pale stripes, which remain bright during staining (Matthews, 2000). The stripes lie in the same repeating pattern along the entire V2 cortex, that is: thin — pale — thick — pale — thin — pale, and so on.

Figure 36. Lateral cross-section across the striped structure of the V2 cortex of a monkey after staining with cytochrome oxidase. Own graphic design based on Horton and Hocking (1996)

Moreover, half of the cells in V2 react almost identically to those in V1 (Willmore, Prenger and Gallant, 2010). This most probably means that they process data concerning simple features of a visual scene, such as edge fragments, their orientation, direction of movement, and colors (Sit and Miikkulainen, 2009). These cells, however, are not as well topographically organised as those in V1. The other half of the V2 cells is responsible for more complex features of visual scenes. They are active, among other things, in response to illusory contours or contours defined by surface texture (von der Heydt, Peterhans and Baumgartner, 1984; von der Heydt and Peterhans, 1989), as well as to complicated shapes and their orientations (Hegdé and van Essen, 2000; Ito and Komatsu, 2004; Anzai, Peng and van Essen, 2007).

Each type of stripe receives different data from V1, in accordance with the already-mentioned breakdown of the visual pathways into three functional categories. Thus, data from the blob areas in V1 are forwarded to the cells forming the thin stripes in V2. In these stripes, the anatomical differences between the cells forming the koniocellular and parvocellular pathways fade out, and all the data constituting the basis for seeing colors are integrated. In other words, the neurons forming the thin stripes in V2 are specialised in organising data on the various wavelengths of light which create the entire spectrum of the seen world. These data are then forwarded, both directly and via V3, to V4, also known as the cortical color centre (DeYoe and van Essen, 1985; Shipp and Zeki, 1985; van Essen, 2004). On the basis of fMRI results, Derrik E. Asher and Alyssa A. Brewer (2009) suggest that there are significant differences between the hemispheres in how V4 processes color. It turns out that in the cortical color centre on the right, neurons are much more sensitive to chromatic than to achromatic stimuli, whereas in the V4 field on the left no differences in neuronal response to chromatic and achromatic stimuli were found.

As we remember, high-resolution vision of shapes, based on contrasts in the luminance range, has its source in the data transmitted via the parvocellular pathway, which reach the pale stripes in V2 from the colonnades in V1. The pale stripe is one of the most important structures in the visual cortex: its activity constitutes the basis for organising the data that enable recognition of object shapes in visual scenes. Data from the pale stripes in V2 are also sent to V4.

The brain structures which receive projections from V4, namely the inferior temporal gyrus (IT) and the fusiform gyrus, with a particular area known as the fusiform face area (FFA), are responsible for integrating the data concerning the shapes of seen objects (Fig. 37).

Figure 37. Cortical structures involved in seeing: A — left hemisphere and B — view of the brain from the bottom. Graphic design: P.A. based on Yokochi, Rohen and Weinreb (2006)

The third type of stripes in V2 are the dark, thick stripes. Two categories of data from V1 reach this area: data on the spatial organisation of the visual scene, and data on movement. Data on global spatial organisation enable both verification of the presence of particular objects in visual scenes (thus they also constitute a basis for seeing their shapes) and verification of their distribution in relation to one another. Due to their relatively low resolution, these data serve general orientation in space and, depending on the situation, provide an impulse to focus attention and to carry out a more detailed analysis of selected parts of the scene by the systems responsible for high-resolution seeing. Signals coding information on global spatial orientation are forwarded to V3.

The second category of data flowing from V1 to the thick, dark stripes in V2 constitutes the basis for seeing movement. From there these data are led to the aforementioned MT area, namely V5, and further to V3A, which together constitute the cortical centre of movement perception (Roe and Ts'o, 1995; Shipp and Zeki, 1989) (Fig. 37).

Ventral and dorsal paths

Apart from integrating the data incoming from V1, V2 also constitutes an extremely important node of the visual pathway, in which two new, partially independent visual pathways begin: the ventral path and the dorsal path (Milner and Goodale, 2008). Their names derive from their location in the cerebral cortex. If we imagine the brain as an animal, for example a fish, then the structures located on its top surface, namely the parietal lobes, would correspond to the back (dorsum), whereas the structures located at the bottom, for example the temporal lobes, would correspond to the belly (Fig. 38). Hence the names of the paths. Just like the subcortical koniocellular, parvocellular and magnocellular pathways located at the earlier stages of the visual pathway, the two cortical paths differ from one another in both anatomy and function.

Figure 38. Two visual pathways: the dorsal path, type "where", and the ventral path, type "what". Graphic design: P.A. based on Milner and Goodale (2008)

The history of the discovery of these two pathways dates back to the 1980s. In 1982, Leslie G. Ungerleider and Mortimer Mishkin published the results of studies on the activity of the cerebral cortex in rhesus macaques during the performance of tasks which required either differentiating between objects or locating them using vision. Of course, at the time no magnetic resonance scanners were available for localizing the activity of various parts of the brain during task performance. Hypotheses regarding the function of certain brain parts were tested by surgically removing the part of the monkey's brain thought to be significant for proper performance of a given task; then, obviously after recovery, the monkey would perform the task. If it performed worse, or not at all, compared with monkeys whose brains remained undamaged, it was concluded that the removed part was responsible for proper performance of the task. In this way, Ungerleider and Mishkin established that if the temporal part of the brain, encompassing the surroundings of V4 and IT, was damaged, the monkeys had serious problems correctly recognising objects known to them, yet performed tasks requiring them to locate objects in space quite well. Conversely, monkeys with damaged parietal lobes, especially in the area of V5/MT, were unable to use spatial cues, but easily recognised familiar objects (Mishkin, Ungerleider and Macko, 1983; Ungerleider, 1985).

The results of the research conducted by Ungerleider and Mishkin (1982) gave rise to the concept according to which the data reaching V2 from the central parts of the retinas through the parvocellular and koniocellular pathways are transmitted towards the temporal lobe and form the basis for the identification and recognition of objects in the visual scene, whereas the data flowing through the magnocellular pathway (that is, from the peripheral parts of the retinas) are sent towards the parietal lobe and constitute the basis for locating objects in the scene.

The first path was conventionally named the "what"-type path, because activation of the brain structures located along it makes it possible to recognise objects, in particular their shapes and colors, providing an answer to the question "what is it?". The second path was named the "where"-type path, because the brain structures forming it are active when a cognitive task requires spatial orientation with respect to the objects present in a visual scene, leading to an answer to the question "where is it located?".

Ten years later, Melvyn A. Goodale and A. David Milner (1992) proposed an alternative conception of the function of the two visual pathways. They did not fundamentally question the view according to which the ventral path "deals with" such properties of visual scenes as the shapes and colors of the objects located in them. According to them, the brain structures active in the ventral path indeed constitute the basis for recognising objects noticed in visual scenes, including such complex objects as the human face, on the basis of the recorded sensory data concerning the distribution of brightness and colors.

The most important modification of Ungerleider and Mishkin's conception concerned the function of the dorsal path. According to Goodale and Milner, activation of the brain structures forming this pathway enables the observer not only to orient themselves in space and in the distribution of the seen objects in a scene, but also to control their own behaviour in that space.

They reached this conclusion while observing the behaviour of a patient, known by the initials D.F., who suffered from so-called visual agnosia, i.e. an inability to recognise seen objects. Her problems resulted from extensive damage to the brain structures in the area of the ventral path. Most amazingly, even though D.F. experienced significant difficulties in recognising objects, she could use them quite well, accurately performing motor tasks. This was because the dorsal path, responsible for carrying out such tasks, remained undamaged in this patient. A detailed description of the D.F. case and a presentation of their conception of the visual pathways was given by Milner and Goodale in a book published in 1995 (the Polish translation was published in 2008, on the basis of the 2nd English edition from 2006).

To summarise, from Ungerleider and Mishkin's perspective the dorsal path allows the observer to cognitively grasp the locations of, and spatial relations between, objects in a visual scene. The observer resembles a viewer watching a 3D movie in a cinema: space is presented to them in all its complexity, and they systematically track the relations between the seen objects. According to the suggestion of Goodale and Milner, on the other hand, visual orientation in space, made possible by the activity of the brain structures located in the dorsal path, is only the starting point for controlling one's own behaviour in that space. The function of the dorsal path, understood in this way, turns a passive observer of a visual scene, who occupies an external position with respect to it (like a cinema-goer), into an actor within the watched scene (like a mountaineer). What is more, in the watched space the actor assumes the central (egocentric) position, and all relations between the observed objects are relativised to the place they occupy in this space as well as to the motor tasks the observer is to perform.

Data integration from both visual pathways

It would be odd if the two discussed visual pathways operated in complete isolation from one another. The case of D.F. showed that damage to the ventral path, resulting in the inability to recognise objects by sight, does not impair the ability to manipulate them. Nonetheless, a growing body of data indicates that both pathways communicate, and that the results of their activity are integrated, creating the subjective experience of multifaceted contact with an object.

Mark R. Pennick and Rajesh K. Kana (2011) asked the studied persons to perform two types of tasks. One task involved identifying and naming the presented objects, and the other determining their position in space. While these tasks were being performed, the brain activity of the participants was recorded using functional magnetic resonance imaging (fMRI). In line with expectations, it turned out that the structures located along the ventral path were particularly active during the identification task, whereas those along the dorsal path were active during the location task. The researchers, however, were interested in whether there are structures which are active regardless of the type of task performed. It turned out that there are several structures of this kind, which most probably integrate the data from both visual pathways. They are located particularly in the frontal lobe, such as the left middle frontal gyrus (LMFG) and the left precentral gyrus (LPRCN), as well as in the parietal lobe: the right superior parietal lobule (RSPL) and the intraparietal sulcus (IPS) (Fig. 37), located directly beneath it.

All these structures form a kind of association network, collecting data from various sources and then compiling them into a form which we can sense not only as an experience of seeing, but also as an experience of the object's existence in the world we live in. The results of the studies conducted by Pennick and Kana confirm earlier reports regarding the functions of the above-listed brain structures (Buchel, Coull and Friston, 1999; Clayes, Dupont, Cornette, Sunaert et al., 2004; Jung and Haier, 2007; Schenk and Milner, 2006).

VISUAL SCENE FRAMING SYSTEM

Frame, or the range of vision

We more or less know the structure of the retina, the screen onto which the light entering the eye projects visual scenes. We also know something about the optical system of the eye. It is therefore worth establishing the shape and size of the field of vision, namely, again referring to the metaphor of the camera, the size of a single frame, bounded by the edges of the exposed field.

The topic of the field of view (FOV) is discussed in photography a lot; it refers to the angle between the most distant points of light, in the horizontal or vertical plane, recorded on the image sensor by a lens of a given focal length. For example, a lens with a focal length of 50 mm records an image on a sensor or frame with dimensions of 24 × 36 mm within a range of approx. 40° in the horizontal plane and 27° in the vertical plane.
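This relation between focal length and angle of view follows from simple trigonometry: the half-angle of view is the arctangent of half the sensor dimension divided by the focal length. A minimal sketch in Python (the function name is mine, for illustration only):

```python
import math

def field_of_view_deg(sensor_dim_mm, focal_length_mm):
    """Angle of view, in degrees, for one sensor dimension and a focal length."""
    return math.degrees(2 * math.atan(sensor_dim_mm / (2 * focal_length_mm)))

# A 50 mm lens on a 24 x 36 mm frame:
print(round(field_of_view_deg(36, 50), 1))  # horizontal plane, approx. 39.6 degrees
print(round(field_of_view_deg(24, 50), 1))  # vertical plane, approx. 27.0 degrees
```

The results match the approx. 40° and 27° quoted above.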

The concept of the field of vision is analogous to the concept of the field of view. The field of vision means the area recorded by a still eye in an immobilised head. In the horizontal plane it is equal to 150–160° (60–70° on the nasal side and approx. 90° on the temporal side), and in the vertical plane approx. 130° (50–60° above the visual axis and 70–80° below it) (Fig. 39).

Figure 39. The scope of the field of vision in the vertical and horizontal plane. Graphic design: P.A.

The scope of the field of vision is far less regular than the rectangular shape of the photographic sensor. The difference in shape and size between the two fields (of the eye and of the camera) is illustrated in Fig. 40. It schematically presents the shape of the retina of the right eye, projected onto the surface of a sphere, with the marked field of vision of a person subjected to a perimetric examination. The aim of such an examination is to mark the outline of the field of vision of a still eye in all possible directions.

Onto the central part of the retina I have superimposed the outline of a small-format frame of 24 mm × 36 mm, as exposed in a camera with a focal length of 50 mm (marked in orange). It covers a small part of the entire field of vision. As we remember, beyond approx. 20° of the field of vision from the fovea, in every direction, the number of cones on the retina radically decreases. It means that a 50-millimetre lens covers almost exactly the part of the field of vision which is characterised by the highest resolution and the greatest sensitivity to differences in the wavelength of visible light. For this reason, a lens of such focal length is known as the standard lens.

Figure 40. The scope of the field of vision of the right eye (the result of a perimetric examination) with a superimposed small-format frame defined by the field of view of a lens with a focal length of f = 50 mm. Graphic design: P.A. on the basis of the NASA Vision Group.

Due to the fact that we generally view things using both eyes (stereoscopically), it is worth adding an illustration of the scope of the binocular field of vision (Fig. 41). The changes are not significant in the vertical plane, but in the horizontal plane the field of vision widens considerably, to at least 180°. The chart also includes a small-format frame corresponding to the field of view of a 50 mm lens (marked red), as well as of a wide-angle lens with a focal length of 13 mm, which enables large objects to be photographed from a short distance (marked green). Only a wide-angle lens with such a small focal length covers most of the field of vision of a human watching a scene with both eyes.

Figure 41. The scope of the stereoscopic field of vision (white) with superimposed frames marked by the fields of view of lenses with focal lengths of f = 50 mm (red) and f = 13 mm (green). Grey parts indicate areas seen with only one eye. Graphic design: P.A. on the basis of the NASA Vision Group.

To summarise, the scope of the field of vision of the eyes is considerable, but, as we are about to find out, it does not translate into seeing scenes equally clearly at every point of this field. On the contrary, only a small area located in the vicinity of the fovea provides the cortex of the brain with data on the objects seen. The remaining (peripheral) part of the field of vision merely makes it possible to be aware of the presence of slightly blurred objects in visual scenes; in exchange, it is particularly sensitive to movement.

The scope of the field of vision and the size of an object in the visual scene frame

The scope of the field of vision tells us something about the shape and the size, expressed in angular units, of the framed visual scene on which we focus our vision for a fraction of a second. While describing the field of vision in angular units, it is worth pausing to play with some simple arithmetic. It will allow us to realize how the brain estimates the distance and the size of objects seen in a scene. Three parameters determine these estimates: (1) the size (S) of the viewed object (understood as its height or width), expressed in millimetres, centimetres or metres, (2) the angle (A) of the field of vision, expressed in degrees, and (3) the distance (D) of the object from the retina of the eye, expressed in millimetres, centimetres or metres (Fig. 42). As the size of the retina in the eye is constant, and the lens automatically adjusts its focus to the object located on the visual axis, these two factors can be omitted while estimating the parameters of objects located in the frame of a visual scene.

Figure 42. The parameters with which the scope of the field of vision can be described. Graphic design: P.A.

Knowing the real size of the object in metric units and its distance from the eyes of the observer, one can easily determine its angular size according to the following formula:

A = 2 · arctan(S / (2 · D))

If we know the angular size of the object and its distance from the eyes of the observer, we can, in turn, calculate its actual size:

S = 2 · D · tan(A / 2)

We can also calculate at what distance from the eyes of the observer the object is located, knowing its actual and angular size:

D = S / (2 · tan(A / 2))
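The three formulas can be sketched as Python functions (the names are mine); each is a rearrangement of the same right-triangle relation between half the object's size, the distance, and half the visual angle:

```python
import math

def angular_size(S, D):
    """Angular size A, in degrees, of an object of real size S seen from distance D."""
    return math.degrees(2 * math.atan(S / (2 * D)))

def real_size(A, D):
    """Real size S of an object subtending A degrees at distance D."""
    return 2 * D * math.tan(math.radians(A) / 2)

def distance(S, A):
    """Distance D to an object of real size S subtending A degrees."""
    return S / (2 * math.tan(math.radians(A) / 2))

# The functions are mutually consistent (converting there and back returns 61):
print(round(real_size(angular_size(61, 200), 200)))  # 61
```

S and D must simply be expressed in the same metric unit; the result of `distance` then comes out in that unit as well.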

We will apply these formulas to the situation of looking at, for example, a painting in a museum. Let us assume that someone is looking at a painting sized 61 cm (in height) × 43 cm (in width) from a distance of 200 cm. On the basis of the formulas we can calculate that its angular height and width are equal to, respectively:

A = 2 · arctan(61 / (2 · 200)) ≈ 17.3°

and

A = 2 · arctan(43 / (2 · 200)) ≈ 12.3°

On the basis of the perimetric examination it is known that the average scope of the human field of vision in the vertical plane is approx. 130°, which means that the entire painting, with an angular height of a little over 17°, falls within the field of vision of the observer. Alas, this does not mean that the whole of it is seen equally clearly.

As was signalled before, due to the structure of the retina, the human eye can clearly record at most only approx. 5° of the surface of the visual scene, and it turns out that from a distance of approximately 200 cm this corresponds to a circle on the surface of the painting approx. 17.5 cm in diameter (for a 5° angle of the field of vision):

S = 2 · 200 · tan(5° / 2) ≈ 17.5 cm

If we substitute the calculated diameter into the formula for the area of a circle, namely:

P = π · r²

where π is constant and equals approx. 3.14, and r is the radius of the circle, namely half of its diameter, it will turn out that from a distance of 2 metres we can see clearly an area of approx. 240 cm²:

P = 3.14 · (17.5 / 2)² ≈ 240 cm²

At first glance it seems like a lot, but let us compare this area to the area of the entire painting, which is over 2,600 cm²:

P = h · w = 61 cm · 43 cm = 2,623 cm²

where h is the height and w is the width. It then turns out that the clearly seen field of approx. 240 cm² covers less than 10% of the entire surface of the painting. This results from the following relation:

240 cm² / 2,623 cm² ≈ 0.09

namely approx. 9%.

It should be added that the brain "uses" the provided formulas for a variety of purposes, for example to assess the in-depth distance between two people. If it "knows" that they are of similar height (e.g. 180 cm), but the angular size of one of them is, for example, 21°, and of the other 12°, one can easily conclude that the first one is located less than 5 m away from the eyes of the observer:

D = 180 / (2 · tan(21° / 2)) ≈ 486 cm

and the other more than 8.5 m from them:

D = 180 / (2 · tan(12° / 2)) ≈ 856 cm
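All the calculations of this museum example can be checked end to end with the same trigonometric relations; this sketch (function names are mine) reproduces the numbers quoted above:

```python
import math

def angular_size(S, D):
    """Angular size, in degrees, of an object of size S at distance D."""
    return math.degrees(2 * math.atan(S / (2 * D)))

def real_size(A, D):
    """Real size of an object subtending A degrees at distance D."""
    return 2 * D * math.tan(math.radians(A) / 2)

def distance(S, A):
    """Distance to an object of size S subtending A degrees."""
    return S / (2 * math.tan(math.radians(A) / 2))

# Painting of 61 cm x 43 cm viewed from 200 cm:
height_deg = angular_size(61, 200)                 # approx. 17.3 degrees
width_deg = angular_size(43, 200)                  # approx. 12.3 degrees

# Region seen clearly in one fixation (5 degrees of visual angle):
clear_diameter = real_size(5, 200)                 # approx. 17.5 cm
clear_area = math.pi * (clear_diameter / 2) ** 2   # approx. 240 cm^2
fraction = clear_area / (61 * 43)                  # approx. 0.09, i.e. about 9%

# Two people 180 cm tall, subtending 21 and 12 degrees:
d_near = distance(180, 21)                         # approx. 486 cm, under 5 m
d_far = distance(180, 12)                          # approx. 856 cm, over 8.5 m
```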

We will return to the results of the perimetric examination and to the formulas with which we can calculate the scope of the field of vision or the distance of an object from the observer when discussing the perception of shapes.

Vision – holistic or sequential?

Up until now I have been writing about seeing an image as if it were presented to us in its entirety, with a single glance of the eye. Such is our subjective experience. Although we usually need a while to figure out what objects are present in a visual scene and what it means, we have the impression that we "grasp" images momentarily. But it is this subjective experience of seeing that obscures the truth about the sequential nature of the process and the incompleteness of the data which constitute its basis. Seeing is a process carried out in time, and it is, among other things, because this time is very short that we have the impression that images appear in front of our eyes all at once.

By the way, the momentary nature of the overall view of the image is also related to one of the most mysterious mental phenomena, namely the imagination. It is used, among other things, to recall scenes seen before, under the influence of momentary stimulation of the photoreceptors. It is a mechanism that draws on the resources of visual memory. Sometimes one glimpse is truly enough to stimulate the imagination to reconstruct an entire visual scene and experience it as seeing (Francuz, 2007a; 2007b; 2010a; 2010b). Imagination also plays a substantial role in the processes of perceptual categorisation, which underlie the forming of notions (Barsalou, 1999; Francuz, 2011).

Thus, the experience of seeing consists of two cooperating mechanisms. We have more or less learned about one of them. It is the mechanism of visual scene content analysis, presented in the previous chapter. It is responsible for recording the distribution of light reflected from objects in the scene, or emitted by them, and forwarding these data to the cerebral cortex. It is the bottom-up mechanism of seeing. The second mechanism draws on data concerning the appearance of the world which were earlier recorded in various brain structures. This is the visual memory, which suggests possible interpretations for the data currently reaching the brain bottom-up. We must, however, remember that we know significantly less about these top-down mechanisms than about the bottom-up ones.

I think that we are only at the very beginning of the road which will lead us to answering the question of how data are stored in the visual memory and how they are used to build the experience of seeing. In any case, given the current level of knowledge on visual perception, we can only say that what we subjectively experience as seeing particular images (scenes or objects) is more a figment of the observer's imagination than a recording of the available reality made by our eyes. I have already drawn attention to this several times, and I will keep on emphasising it. The seen scene is the effect of combining two types of data: those originating from the registration of the distribution of light in the part of the scene bounded by the field of vision, and those recorded in visual memory, which fill in the missing parts of the puzzle by means of the mechanism of imagery. Visual programs work so fast that we do not notice the subsequent phases of this image-puzzle formation, but have the impression that we immediately see it as a whole. It is similar with watching a film at the cinema. We see smooth movement on the screen although, in fact, we only see 24 still photos every second. The factor determining the effect is the very short exposure time.

Why do our eyes move?

The movement of the eyeballs is an observable sign of the sequential nature of seeing. To understand what role it plays while watching a visual scene, we must first answer two fundamental questions: why do the eyes move at all, and how do they move?

The scope of high-resolution vision, served by the receptors concentrated in the central part of the retina, is not great. Let us recall that it covers at most 5° of the angle of the field of vision, which corresponds to approx. 3% of the surface of a 21-inch screen watched from a distance of 60 cm. Compared to the entire surface of the screen, it really is not much. In order to see what lies in the other, equally small parts of the surface, we simply have to point our eyes at them, changing their position.

If the eyes were motionless, we would be forced to constantly move our head, which would consume disproportionately more energy than moving our eyes. It would therefore be maladaptive, and evolution does not normally reinforce such behaviours. To summarise, we move our eyes because, among other things, the scope of our field of vision is limited, and by changing their position we are able to see particular elements of a given scene and use them to put together the entire image. This statement has very far-reaching consequences for understanding how we accumulate data concerning observed scenes.

Scanning different fragments of a visual scene is not the only reason for eye movement. Another is, for example, following a moving object without changing the position of the head, or fixing the vision on a still object while the observer is moving. To put it shortly, there are many types of eye movements, and all of them have strictly defined functions in the process of optical framing of a visual scene. However, before I describe the different types of eye movements, it is worth taking a closer look at the neuromechanics of the eye.

Neuromechanics of the moving eye

The eye is embedded in the orbital cavity, lined with a layer of adipose tissue a few millimetres thick. Since the major part of the outer layer of the eyeball, i.e. the sclera, is smooth, the eye can move without any resistance in the orbital cavity, like a ball in a well-lubricated bearing. The eyeball is obviously smaller than the orbital opening and could easily fall out if it were not for the muscles holding it from the internal side of the skull. However, their role is not only to prevent the eye from falling out of the orbital cavity; they perform much more important functions. The muscles attached to the eyeball constitute the mechanical basis for changing its position, just as the muscles attached to the bones of an arm or a leg enable their movement.

Three pairs of antagonistic muscles, attached to the posterior, external surface of each eye, participate in every eye movement. The lateral (temporal) rectus and the medial rectus enable the eye to move left or right; the superior rectus and the inferior rectus move the eye up or down, whereas the oblique muscles, the superior oblique and the inferior oblique, enable the eyeball to rotate, thanks to which it is possible to direct the eyes to places lying between the horizontal and vertical planes, e.g. to the upper right or lower left corner of the image (Fig. 43). It is worth mentioning that the eyes can also move backwards, deeper into the skull. They perform the so-called retraction movement, which is the result of simultaneous contraction of all the muscles, e.g. in the event of an anticipated impact to the face. The muscles that move the eyeball contract the fastest among all the muscles that move the human body (Matthews, 2000).

Figure 43. Three pairs of antagonistic muscles moving the (left) eyeball: (a) the superior rectus and (b) the inferior rectus, enabling eye movement in the vertical plane, (c) the lateral rectus (on the temporal side) and (d) the medial rectus (on the side of the nose), enabling eye movement in the horizontal plane, as well as (e) the superior oblique and (f) the inferior oblique, enabling the rotational motion of the eye and seeing in oblique planes. Graphic design: P.A.

Due to the anatomical structure of the eye and skull, the way the eyeball is placed in the orbital cavity, and the characteristics of the listed pairs of extraocular muscles, the range of eye movements is limited to approx. 90° (maximum up to 100°) in the horizontal plane and approx. 80° (maximum up to 125°) in the vertical plane.

It is worth paying attention to the fact that although the range of eye movements in the orbital cavities is quite large, in practice it generally does not exceed 30° in any plane. This is due to head mobility. When looking upwards, for example, we rarely tighten the superior rectus to the limits of its strength; rather, we tilt the head slightly backwards and thus effortlessly focus our eyes on the object of interest, while keeping the eyes in a more natural position in the orbital cavity. If, however, we wanted to look to the left without moving the head, the visual axis of the right eye would cross the nose and the image from that eye would be limited. It is therefore better to help yourself by turning your head slightly to the left.

And finally, a few sentences concerning the connections between the extraocular muscles and the nearest brain structures, from which the signals affecting their work come. The pairs of muscles attached to the eyeballs react with a sudden contraction or relaxation under the influence of nerve impulses flowing along the cranial nerves from neurons (so-called motoneurons) located in the brainstem. The rectus muscles (excluding the lateral rectus) and the inferior oblique are activated by impulses flowing along the axons of cells located in the oculomotor nucleus, via the third cranial nerve (CN III), called the oculomotor nerve (Fig. 44). The superior oblique is controlled by the trochlear nucleus via the IV (trochlear) cranial nerve, whereas the lateral rectus is connected to the abducens nucleus via the VI (abducens) cranial nerve. For now, it is enough to know which nuclei activate the extraocular muscles; we will soon find out where the signals stimulating all three of these nuclei come from. It is their activation that translates into a specific movement of the eyeball.

Figure 44. Connections between the motoneuron nuclei, located in the brainstem, and the extraocular rectus muscles (excluding the lateral rectus) and the inferior oblique. Graphic design: P.A. based on Kandel, Schwartz and Jessell (2000)

General eye movement classification

We have already become familiar with the general concept of eye movement. It is time to take a closer look at the different types of eye movements, both in terms of their functions and the neurobiological mechanisms governing them. The typology of eye movements is quite broad, although from the point of view of the issues discussed in this book some types are more important than others. I will discuss them starting with those of less significance for image viewing, and ending with those that constitute the core of the visual scene framing mechanism. In fact, the overriding goal of all types of eye movements is to maintain the highest quality of vision under varying conditions of watching a visual scene.

In the first group, eye movement is associated with the apparent immobility of the eyeball, i.e. the moment when the eye fixates for a while on some fragment of the image. Surprisingly, even then the eyes perform at least three tiny micromovements, quite different in terms of trajectory, i.e. the fixational movements: microsaccades, ocular drift and ocular microtremor. In short, they prevent the adaptation of the photoreceptors to the same lighting during visual fixation.

In the second group, eye movements are reflexes, i.e. involuntary movements whose purpose is to keep the gaze on a selected object in the visual scene under specific viewing conditions. For instance, the purpose of the vestibulo-ocular reflex (VOR) is to keep the gaze on a given object when the head is moving. Such situations occur so often on a daily basis that a fully automated oculomotor mechanism has developed in the course of evolution. It enables us to hold our gaze on the object of interest when changing position in relation to it.

Another fully automated eye movement is the optokinetic reflex (OKR). Its purpose is to hold the gaze on an object when either it moves very quickly towards a stationary observer or the observer moves very quickly towards a stationary object. Finally, the third oculomotor reflex enables the observer to hold his or her gaze on an approaching or receding object, or when the observer approaches or recedes from the object. These are the so-called vergence reflexes. The ones that accompany a decreasing distance between the object and the observer are called convergence reflexes, whereas those that accompany an increasing distance are called divergence reflexes.

The third group of eye movements is especially important for understanding what vision is. These are the framing movements. The first of them is called the saccade, or intermittent eye movement, and its goal is to precisely position the visual axes of both eyeballs on the fragment of the visual scene which is, for some reason, the most important. Saccades, as we will see later, can be performed under the influence of impulses having their source in the visual scene, i.e. controlled bottom-up, or top-down, to a greater or lesser extent in accordance with the will of the observer. The second movement in this group is the so-called smooth pursuit. To understand its essence, it is enough to pay attention to how men (well, maybe not all of them) watch an attractive girl passing by. Their eyes are almost glued to some part of her body and move at exactly the same speed as she does. This is the smooth pursuit.

Unlike all other types of movements, the common feature of the saccades and the smooth pursuit is that they are to a much greater extent voluntarily controlled. Their goal is to frame the visual scene in such a way as to obtain as much data from it as possible, which will enable the construction of an accurate cognitive representation of this scene.

Fixational eye movements

Each saccade ends with a short-term fixation, i.e. the eye movement stops on the fragment of the visual scene which currently lies on the line of sight. The fixated eye seems to be completely motionless, but it actually performs three types of involuntary fixational movements: ocular microtremor (OMT), ocular drift and microsaccades.

Fixational eye movements are fast and short, and have complicated, seemingly chaotic trajectories. However, they perform very important functions during vision. First of all, they prevent the adaptation of the already activated group of photoreceptors to the light acting on them. By slightly changing the stabilised eye position during fixation, these movements ensure that new (not yet activated) photoreceptors are constantly engaged in recording the same image projected onto the retina. These movements also enable the collection of even more data from the framed fragment of the visual scene on which the eyes are fixated (Leigh and Zee, 2006; Martinez-Conde, Macknik and Hubel, 2004). All three types of movements were recorded for the first time, and their detailed description published, by Roy M. Pritchard in 1961 (Fig. 45).

Ocular microtremor, also called tremor of the eyes or physiological nystagmus, is a constant and completely involuntary activity of the eyeball during fixation. Among all the micromovements it has the lowest amplitude, with a length not exceeding the diameter of a single cone located in the fovea (Martinez-Conde, Macknik, Troncoso and Hubel, 2009), as well as an incomparably higher frequency, within 70–100 Hz (Bloger, Bojanic, Sheahan, Coakley et al., 1999). The trajectory of ocular microtremor is different in each eye.

Figure 45. Graphic records of three types of eyeball movement, recorded against the background of the mosaic of photoreceptors located in the fovea of the retina, over an area of approx. 0.05 mm. The longest straight sections illustrate the trajectories of microsaccades, the drift was recorded in the form of curved sections, and the records of ocular microtremor overlap with the drift and resemble a zigzag stitch made by a sewing machine. Own graphic design based on Pritchard (1961)

The other type of fixational movement is the ocular drift. It is a relatively slow movement of the eyeball in a rather random direction, thanks to which the image projected onto the retina constantly illuminates a slightly different group of photoreceptors. The amplitude of the drift oscillates between 1′ and 8′ (arc minutes, namely sixtieths of an angular degree), and its speed is below 30′/s (Rolfs, 2009). Unlike ocular microtremor, the ocular drift causes a global shift in the relation between the image projected onto the retina and the photoreceptors in it. This movement prevents the adaptation of photoreceptors to the same intensity of light and thus prevents the image from fading out. The image can fade out because photoreceptors which have already been stimulated for a moment stop absorbing new batches of light and become insensitive to them. The already mentioned Roy M. Pritchard (1961) proved that stabilising an image on the same group of photoreceptors results in seeing the image less and less clearly, until it fades out entirely. Much as with ocular microtremor, during eye fixation on the same fragment of a visual scene the trajectories of the ocular drift in the right and left eye are completely different (Fig. 46).

Figure 46. A record of the drift trajectory (black line) and microsaccades (red line), separately for the right and left eyeball, during a 1-second-long fixation of the eyes on a surface of approx. 1° of the field of vision angle. The positions of the subsequent points of eye stabilisation are expressed in arc minutes. Own graphic design based on Rolfs (2009)

Finally, the third type of eye fixational movements are the microsaccades. They have been a subject of growing interest in recent years (Engbert, 2006; Engbert and Kliegl, 2004; Martinez-Conde et al., 2004; 2009; Rolfs, 2009). Their amplitude is highly varied, oscillating between 3′ (arc minutes) and 2° of the field of vision angle, and their speed is usually incomparably higher than the speed of the ocular microtremor or the ocular drift (Martinez-Conde et al., 2009). The effect of the shift of the retina in relation to the image projected onto it, caused by a microsaccade, is analogous to that of the drift. A microsaccade is, however, faster and most often shifts the visual axis over a larger distance than the drift. Unlike the microtremors and drifts, microsaccades are performed synchronously by both eyes (see the red line in Fig. 46).

Martin Rolfs (2009) points to many various functions performed by the microsaccades. First and foremost, they radically change the relation of the image projected onto the retina surface, preventing it from fading out; moreover, they facilitate maintaining high clarity of vision, are used to scan the small surface of fixation (among other things, in order to detect the edges of objects in a visual scene more accurately), and play an important role in the process of programming subsequent saccades, by shifting the so-called visual attention to new areas of a visual scene.

Oculomotor reflexes

I shall begin the overview of the eye movements engaged when either the observer or the objects in a visual scene move by discussing three types of reflexes, namely reactions of the eyes which are almost entirely beyond the observer's control. These are: the vestibulo-ocular reflex (VOR), the optokinetic reflex, and the vergence reflexes, that is convergence and divergence.

At first glance one can see that the head and the eyes can move independently of each other. This has obvious advantages, because thanks to it the visual system is much better adapted to various life situations which require good orientation in space. Yet this arrangement also breeds a certain problem, with which evolution has dealt excellently. The problem consists in the fact that we often want to fix our vision on a certain detail of a visual scene while we are, at the same time, moving. It is enough to imagine how complicated the work performed by the eye muscles must be in order to fix the vision on one selected spot of a visual scene observed by a rider during a horse riding competition. The head not only moves in all directions, balancing the disturbances related to the movement of the horse, but, moreover, keeps changing its position in relation to the watched scene. A little less extreme, yet analogous, situation is looking at storefronts during a walk, or looking at paintings on the walls of a museum while moving.

The visual mechanism which prevents this loss of clarity related to head movements is the vestibulo-ocular reflex (VOR). The reflex can easily be observed while performing a simple task. Looking into a mirror, let us try to fix our vision on our own pupils and, simultaneously, move our head in all directions (Fig. 47). Each movement of the head in one direction will cause an involuntary reaction of the eyes in the opposite direction. The speed of the head movement and of the eye movement is the same.
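The observation that the eyes counter the head at the same speed can be written down as a one-line model. The sketch below assumes an ideal VOR gain of 1.0, i.e. perfect compensation; this is a simplifying assumption of the sketch, not a measured value:

```python
def vor_eye_velocity(head_velocity_deg_s, gain=1.0):
    """Ideal vestibulo-ocular reflex: the eyes rotate at the same
    speed as the head but in the opposite direction. A gain of 1.0
    (perfect compensation) is an assumption of this sketch."""
    return -gain * head_velocity_deg_s

head = 15.0                   # the head turns right at 15 deg/s
eye = vor_eye_velocity(head)  # the eyes turn left at 15 deg/s
gaze = head + eye             # gaze velocity relative to the scene: zero
```

With the gaze velocity at zero, the selected fragment of the scene stays projected onto the same part of the retina despite the moving head.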

Figure 47. Illustration of the vestibulo-ocular reflex. The task consists in looking at a still object while moving your head in various directions. Graphic design: P.A.

The physiological mechanism which lies at the bottom of the VOR is one that uses data from the kinaesthetic system, first and foremost responsible for maintaining the balance of the body, in order to facilitate the functioning of the visual system (Fig. 48).

Figure 48. Schematic representation of the neural subsystem responsible for the vestibulo-ocular reflex. Graphic design: P.A. based on Matthews (2000) and Nolte (2011)

Detection of the location and movement of the head takes place in the vestibular organ, located in the inner ear. The data from the vestibular organ are forwarded to the lateral vestibular nuclei, small clusters of neurons located on both sides of the brainstem (more precisely, in the lateral dorsal part of the medulla oblongata). From there, the signals are forwarded to two pairs of nuclei which are in direct charge of eye movement, the already mentioned nuclei of the oculomotor and abducens nerves (Matthews, 2000; Nolte, 2011).

In summary, even the slightest tilt of the head from the vertical position is recorded in the vestibular organ, and the information about it is diligently used by the system in charge of eyeball movement. Thanks to the vestibulo-ocular reflex we are able to constantly project a selected fragment of an image onto the central part of the retina without losing its clarity.

The optokinetic reflex occurs when the head is relatively stable and the eyes unwittingly fix on a relatively large object whose image swiftly changes its location on the retina. For example, when we are on a train, we from time to time fix our eyes on some element of the dynamic scene, and when it disappears from the field of vision the eyes "attach" to another object. The optokinetic reflex can co-occur with the vestibulo-ocular reflex, i.e. when the head is additionally moving (Mustari and Ono, 2009). It is characterised by a short latency time (below 100 ms) and a relatively high speed (even above 180°/s), and it is really difficult to control (Büttner and Kremmyda, 2007). It is basically of little significance while viewing still images.

The basic function of both reflexes (the vestibulo-ocular and the optokinetic one) is compensating for the loss of sharpness of an object in a visual scene, caused by the movement of either the observer or the object in a plane perpendicular to the visual axis. The so-called vergence reflex, on the other hand, occurs while looking at a visual scene in which the objects (or the observer) change their location along the visual axis, namely they either come closer or move away from one another.

Vergence occurs in two forms: convergence (convergent movement) and divergence (divergent movement). Their speed is similar, approx. 25°/s. Vergence reflexes are responsible for keeping the visual axes on an object as its distance from the observer changes. When the distance decreases, the pupils of the eyeballs approach each other (convergence), and when it increases, they move apart (divergence) (Fig. 49).

Figure 49. Vergence reflexes: A, divergence (objects which are moving away or are far) and B, convergence (objects which are coming closer or are close). Graphic design: P.A.

The reflex somewhat resembles squinting, either convergently or divergently. When an object is located further than 5-6 meters from the observer, the visual axes of both eyes are positioned almost parallel to each other. This is the limit beyond which vergence reflexes play a lesser and lesser role in stereoscopic, that is binocular, depth perception. Just like the previous ones, the reflex is of relatively small significance for depth recognition while looking at the flat surface of an image.
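The 5-6 meter limit follows from simple triangle geometry: the angle between the two visual axes shrinks rapidly with distance. The sketch below assumes a typical adult interpupillary distance of 6.5 cm, an illustrative value rather than anything taken from the cited sources:

```python
import math

def vergence_angle_deg(distance_m, interpupillary_m=0.065):
    """Angle between the two visual axes when both eyes fixate a
    point distance_m straight ahead. The 6.5 cm interpupillary
    distance is a typical adult value, assumed for illustration."""
    return math.degrees(2.0 * math.atan((interpupillary_m / 2.0) / distance_m))

near = vergence_angle_deg(0.3)  # an object 30 cm away
far = vergence_angle_deg(6.0)   # an object at the ~6 m limit
```

At 30 cm the axes converge by over 12°, while at 6 m the angle drops below 1°, which is why vergence stops being a useful depth cue beyond that distance.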

Basic mechanism of visual scene framing

Saccadic movements of the eyes are generally used to move the gaze from one point of a relatively stable visual scene to another. They can easily be observed by looking into the eyes of a person who is watching something. We will notice that the pupils of the observer's eyeballs every now and then simultaneously change their position, placing the visual axes on subsequent elements of the image being viewed.

After each saccade, the eyes are motionless for a shorter or longer period. This moment is called visual fixation and is one of the most important events in the process of seeing. This is when the data on the level of photoreceptor stimulation in the retina are recorded. A moment later they are transmitted to the brain as information about the currently seen fragment of the visual scene. This is the moment when the "frame is exposed".

In Fig. 50 B, I marked the saccadic movements (yellow lines) and places of visual fixation (yellow points) recorded while viewing the portrait of Krystyna Potocka by Angelica Kauffmann (the Wilanów collection) (Fig. 50 A). Twenty-four examined people looked at the image, displayed on a computer screen, while I recorded their eyeball movements using the SMI HighSpeed 1250 Hz oculograph. The results of the experiment will be discussed in the final chapter of the book, which is devoted to the perception of beauty.

Figure 50 A. Angelica Kauffmann, Portrait of Krystyna Potocka (1783/1784). Museum of King Jan III's Palace at Wilanów, Warsaw, Poland [155 x 113 cm] and B, the same painting with the record of the eye movement trajectories of 24 examined persons. Elaboration based on own research results.

During saccades, photoreceptors are also stimulated. Nevertheless, given the speed of eyeball movement, the images projected onto the eye's retina at that time are wholly blurred, to the same extent as the view outside a speeding TGV can be blurred, both in the foreground, due to the movement of the train, and in the background, because of the distance (Fig. 50 C).

Figure 50 C. A photograph from a window of a TGV in the south of France. Graphic design: P.F.

The brain 'is not interested' in these data and therefore temporarily turns off some components, to be more specific, those parts of the visual cortex which are responsible for receiving data on the status of photoreceptor stimulation (Paus, Marrett, Worsley and Evans, 1995; Burr, Morrone and Ross, 1994). This phenomenon is called saccadic suppression or saccadic masking. The mechanism is similar to the functioning of the shutter in a film camera (Fig. 51).

Figure 51. Operating principle of the shutter in the film camera as an analogy of the mechanism of saccadic masking. Own graphic design based on Łowicki (2005)

Recording a film involves the exposure of a sequence of frames on an immobilised celluloid film stock covered with a photosensitive emulsion. While the film stock is moved by one frame, the rotating shutter fully covers it for a moment. This way, the film is not exposed during the movement of the film stock, but only when it is motionless. Recording a visual scene with the eyes is similar. The brain only analyses images that were recorded by photoreceptors during visual fixation, not during a saccade. Thanks to this, the visual scene (experienced by us as vision) is constructed primarily on the basis of sharp and still (i.e. non-blurred and motionless) images. Saccade durations, like film stock shifting times, do not belong to the duration of the recorded visual scene. Fortunately, they are an order of magnitude shorter than fixations.
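The shutter analogy invites a back-of-the-envelope calculation of how much of each second is actually "exposed". The figures below, about 4 saccades per second lasting roughly 30 ms each, are round values within the ranges quoted in this chapter, assumed only for illustration:

```python
def fixation_duty_cycle(saccades_per_s=4, saccade_ms=30.0):
    """Fraction of each second spent fixating (the 'exposed frames')
    rather than moving the eyes. The round figures of 4 saccades/s
    and 30 ms each are assumptions within the quoted ranges."""
    moving_ms = saccades_per_s * saccade_ms
    return (1000.0 - moving_ms) / 1000.0

duty = fixation_duty_cycle()  # about 88% of each second is fixation
```

Even with several saccades a second, the "shutter" is closed for only about a tenth of the viewing time.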

Each second, the eyes perform 3-5 saccades. However, there may be many more of them in moments of increased stimulation. Saccadic movements are rapid: they begin and end equally suddenly. Depending on the situation, they last from 10 to 200 ms; while reading, for instance, they last on average 30 ms (Duchowski, 2007; Leigh and Zee, 2006). They differ in length, i.e. amplitude, as well as in speed. The amplitude of a saccade depends on many factors, including the size of the visual scene or the location of objects within it, but generally does not exceed 15° of the angle of view (the maximum can be up to 40°). Its duration grows with the amplitude, although the relationship is not strictly linear. Similarly, the saccadic speed increases (although not linearly) with the amplitude and can reach up to 900°/s in humans (Fischer and Ramsperger, 1984). This is the fastest movement of the human body. The relationships among the amplitude, duration and speed of each saccade are relatively constant. This feature of saccadic movements is called saccade stereotypicality (Bahill, Clark and Stark, 1975).
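The stereotypical amplitude-duration relationship can be approximated, for moderate amplitudes, by a simple rule of thumb. The coefficients below, about 2.2 ms per degree plus a 21 ms intercept, are a commonly quoted linear approximation rather than exact constants, and, as noted above, the true relationship is not strictly linear:

```python
def saccade_duration_ms(amplitude_deg):
    """Rule-of-thumb 'main sequence' estimate of saccade duration:
    about 2.2 ms per degree of amplitude plus a 21 ms intercept.
    This linear form is an approximation for moderate amplitudes."""
    return 2.2 * amplitude_deg + 21.0

reading = saccade_duration_ms(2.0)  # a small saccade, as in reading
large = saccade_duration_ms(15.0)   # near the usual amplitude ceiling
```

The estimate for a small reading saccade (about 25 ms) agrees well with the 30 ms average quoted above.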

The second feature of saccadic movements, next to stereotypicality, is saccade ballisticity (Duchowski, 2007). It is understood as the inertia and stability of the eye movement direction after the start of the saccade. Just as it is impossible to change the parameters of a stone's flight from the moment of its throw, the saccade's speed, acceleration and direction (programmed before its start) can generally no longer be changed during its execution (Carpenter, 1977).

There are many types of saccadic movements (for a full classification see Leigh and Zee, 2006, pp. 109-110), but two of them are particularly important from the point of view of the issues presented in this book. These are reflexive saccades and voluntary saccades. Reflexive (or involuntary) saccades appear in response to specific elements of the visual scene, such as the sudden appearance of an object, rapid movement, the saliency of its various elements, contrast, flashes or flickers, and even sounds towards which the eyes are reflexively directed. Voluntary saccades are controlled top-down, i.e. their trajectory is subordinated to such factors as knowledge, attitude, conscious search for something in an image, or the observer's expectations.

Neuronal basis of saccadic movements

The most important neural structures involved in controlling eye movements (both saccadic and smooth pursuit ones) are shown in Fig. 52. Their multiplicity and the complicated network of pathways connecting them make us well aware of how complex a mechanism controls an ordinary glance to the right or left. There is no place for coincidence here, although sometimes this 'machinery' may fail to keep up with the course of events.

Figure 52. Structures and neural pathways involved in forming decisions regarding eye movements. Graphic design: P.A. based on Duchowski (2007), Wurtz (2008), Yokochi, Rohen and Weinreb (2006)

Let us start with those structures of the eye movement control system that constitute the early stages of the visual pathway. One of the most important is the superior colliculus (SC), located in the midbrain tectum. Nerve impulses reach the outer layers of this structure from the retina via a bundle of optic nerve fibres called the retino-collicular pathway (Fig. 52, blue tract).

The SC's main task is to position the eyeballs so that their visual axes point directly at the most interesting part of the visual scene. The construction and functioning of this amazing structure is somewhat reminiscent of a modern sniper rifle sight. Like many subcortical (e.g. the LGN) and cortical (e.g. V1) structures in the brain, it is built in layers (Fig. 53).

Figure 53. The superior colliculus, located on the dorsal side of the midbrain: input and output layers. Graphic design: P.A. based on Wurtz (2008)

First of all, the SC's top layer, which is reached by the synaptic endings of the axons of ganglion cells from the observer's retinas (the retino-collicular pathway), has a retinotopic organisation analogous to that of the primary visual cortex (V1) (Nakahara, Morita, Wurtz and Optican, 2006).

There is a topographic relation between the reactions of photoreceptors located in specific retinal areas and the reactions of the cells forming the SC's top layer (see Fig. 54).

Figure 54. Retinotopic organisation of the first (grey) layer of the superior colliculus. The visual hemifield on the left side, and the corresponding retinotopic organisation of neurons in the SC on the right. The circle and the triangle represent two stimuli which caused a reaction of photoreceptors in a given spot in the retina and of cells in the SC. Own graphic design based on Nakahara, Morita, Wurtz and Optican (2006)

Just as in V1, the largest part of the map in the SC is occupied by a representation of the surroundings of the fovea. This is perfectly illustrated by the two figures (circle and triangle) marked in Fig. 54. The circle, whose position is difficult to determine on the basis of the retina chart, can easily be located on its retinotopic map in the SC. The significance of topographically organised retinal data in the SC for the performance of the next saccade is similar to the significance of a map at military headquarters for the precise determination of the target of the next attack.
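The foveal magnification of such a map can be illustrated with a toy logarithmic mapping: equal spans of eccentricity near the fovea and in the periphery occupy very different shares of the map. The constants below are arbitrary illustration values, not measured SC parameters, and the logarithmic form is itself only a common modelling convention:

```python
import math

def collicular_map_position(eccentricity_deg, a=3.0, b=1.4):
    """Toy logarithmic mapping from retinal eccentricity (degrees)
    to a position on the map (arbitrary units). The constants a and
    b are illustration values, not measured SC parameters."""
    return b * math.log(1.0 + eccentricity_deg / a)

# Two equal 2-degree spans of the visual field:
foveal_band = collicular_map_position(2.0) - collicular_map_position(0.0)
peripheral_band = collicular_map_position(42.0) - collicular_map_position(40.0)
```

Under these assumed constants, the same 2° of visual field takes up roughly ten times more of the map next to the fovea than at 40° of eccentricity, mirroring the magnification visible in Fig. 54.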

Another extremely interesting feature of the SC is the fact that cells located in its deeper layers react not only to visual, but also to auditory (Jay and Sparks, 1984) and somatosensory stimuli (Groh and Sparks, 1996). This would explain, for example, why a sudden sound or touch can direct one's eyes to its source. The collicular viewfinder is therefore not an organ isolated from the other senses, specialised only in the response to visual stimuli, but a biological device sensitive to a much wider spectrum of sensory stimulation. Thanks to such structures we feel that our senses do not work separately, but constitute a fully integrated system reacting in a coordinated way to various manifestations of reality (Cuppini, Ursino, Magosso, Rowland et al., 2010).

Finally, the SC coordinates the work of several various brain structures, sending them and receiving from them neural impulses in order to finally determine the location of the next saccade. After receiving from the retina the data on the current position of the eye in relation to a visual scene, the SC sends them towards two structures located in the cerebral cortex (Fig. 52).

The first track, or rather several separate pathways, leads to the parietal lobes, more precisely to the lateral intraparietal cortex (LIP), lining the surface of the intraparietal sulcus (IPS). One of the pathways leads to this area via the LGN, the primary visual cortex (V1) and further through the medial temporal (MT) area (the already mentioned V5, responsible for seeing movement) and the medial superior temporal (MST) area, which lie deep in the superior temporal sulcus (Fig. 52, green tract).

The other pathway runs along a shorter track through the medial pulvinar nuclei and from there directly to the superior temporal sulcus and further to the IPS (Wurtz, 2008) (Fig. 52, red tract).

The second important structure to which the SC sends data on the current position of the eye lies in the frontal cortex, more precisely in the posterior part of the middle frontal gyrus and the adjacent precentral sulcus. This structure is known under the rather peculiar name of the frontal eye field (FEF). This centre is responsible for saccadic movements of the eyes towards the opposite side of the visual field, and for smooth movements of the eyeballs. The track from the SC to the FEF leads through the dorsomedial nucleus, located in the thalamus (Fig. 52, yellow tract).

After obtaining all the necessary data from the thalamic nuclei, the pulvinar and the structures located in the visual dorsal pathway (especially MT and MST), both centres, that is the IPS and the FEF, analyse the data concerning the current position of the eye in relation to the frame of the visual scene, reconcile the results of these analyses with each other and subsequently send the decision on the change of eye position back to the SC (Gottlieb and Balan, 2010; Kable and Glimcher, 2009; Noudoost, Chang, Steinmetz and Moore, 2010; Wen, Yao, Liu and Ding, 2012) (Fig. 52, orange tract). The coordinates of the new eye position are plotted on the retinotopic map in the top layer of the SC and passed on in the form of an instruction to be performed.

From the SC, information travels along two pathways to a small group of nuclei located in the medial part of the paramedian pontine reticular formation (PPRF). One leads directly from the SC to the PPRF, and the second indirectly through the cerebellum, which participates in all operations related to the movement of the organism (Grossberg, Srihasam and Bullock, 2012) (Fig. 52, purple tract). Through the connections of the PPRF with the nuclei of the oculomotor, trochlear and abducens nerves, the signal is finally sent to the three pairs of extraocular muscles via the III, IV and VI cranial nerves (Fig. 52, violet tract). The muscles contract or relax, causing the eye to move, that is, the saccade, and thus establish a new position of the centre of the frame. This is how the operation of performing a single eye movement comes to an end; in normal conditions it usually takes no more than 1/3 of a second.

Smooth pursuit

Among the eye movements whose purpose is to keep a moving object in the frame in a plane perpendicular to the visual axis, the most important is smooth pursuit. The justification for discussing it in a book devoted to the flat, still image is the fact that more and more often we see such images on the bodies of trams or buses, as well as on billboards towed on trailers behind cars.

The average speed of smooth pursuit is much lower than the speed of saccades and usually does not exceed 30°/s, except for its first phase. Smooth pursuit begins approx. 100-150 ms after object movement is detected (Büttner and Kremmyda, 2007). From that moment, for 40 ms, the eye accelerates in the direction of object movement, trying to "catch up" with it. The speed of the eye can increase even to 50°/s during that time and, depending on the speed of the object and the distance from which it is viewed, the visual axis can move ahead of the object or remain somewhat behind it. In any case, for the next 60 ms the eye will roughly adjust its speed to the speed of the object. This is the first stage, called open-loop pursuit. Its aim is simply to initiate the movement of the eyeballs, shift the visual axes in the direction of the object's movement and, if possible, align the object at their intersection. The course of this stage is almost identical in humans and monkeys (Carl and Gellman, 1987; Krauzlis and Lisberger, 1994).

The second stage of smooth pursuit consists in holding the gaze on a constantly moving object. It is based on a closed-loop pursuit. Several times a second, the visual system checks the size of the difference between the position of the moving object and the position of the eye, or more precisely, the fovea. Initially, the adjustment of both positions takes place 3-4 times per second, and if the object moves at a more or less constant speed, this decreases to 2-3 times per second (Leigh and Zee, 2006). In this way, with the help of short saccades, the visual axes constantly intersect the moving object in approximately the same place. Since the saccades are small (and the further the object is from the observer, the shorter they are), the pursuit movement gives the impression of being smooth rather than saccadic.
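The closed-loop stage described above can be sketched as a simple simulation: between corrections the eye keeps its own (slightly wrong) speed, and each periodic check cancels the accumulated position error with a catch-up saccade. All constants here, including the 10% speed mismatch, are illustrative assumptions:

```python
def closed_loop_pursuit(target_speed, correction_hz=3, duration_s=1.0):
    """Toy closed-loop model of the second stage of smooth pursuit.

    Units are degrees and seconds. Between corrections the eye keeps
    its previous speed; at every correction (a few times per second,
    as in the text) a catch-up saccade cancels the position error."""
    interval = 1.0 / correction_hz
    eye_pos, target_pos, errors = 0.0, 0.0, []
    eye_speed = 0.9 * target_speed   # assumed 10% speed mismatch
    for _ in range(int(duration_s / interval)):
        target_pos += target_speed * interval
        eye_pos += eye_speed * interval
        errors.append(abs(target_pos - eye_pos))
        eye_pos = target_pos         # catch-up saccade re-centres gaze
    return errors

errors = closed_loop_pursuit(target_speed=20.0)
```

Because the error is wiped out several times a second, it never grows beyond a fraction of a degree, which is why the pursuit looks smooth to an outside observer.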

An interesting property of smooth pursuit is its inertia. When the pursued object suddenly disappears from the field of vision (e.g. a car being observed disappears behind roadside trees), for approx. 4 seconds the eyes will continue smooth pursuit in the direction established before the disappearance of the object (despite its absence from the scene), and their speed will begin to decrease slowly, though to no less than 60% of the initial speed. If 4 seconds after its disappearance the object does not reappear in the frame, smooth pursuit ends (Barnes, 2008; Becker and Fuchs, 1985).
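The numbers in this paragraph, a window of about 4 seconds and a floor of 60% of the initial speed, can be turned into a small model of pursuit inertia. The linear slow-down below is an assumption of the sketch; the text specifies only the window and the floor:

```python
def pursuit_speed(t_s, initial_speed, window_s=4.0, floor_ratio=0.6):
    """Eye speed t_s seconds after the pursued object vanishes:
    a linear slow-down towards 60% of the initial speed over a
    4-second window, after which pursuit stops. The linear shape
    is an assumption; the text gives only the window and the floor."""
    if t_s >= window_s:
        return 0.0
    return initial_speed * (1.0 - (1.0 - floor_ratio) * (t_s / window_s))

v_start = pursuit_speed(0.0, 30.0)  # speed at the moment of disappearance
v_mid = pursuit_speed(2.0, 30.0)    # halfway through the 4 s window
v_after = pursuit_speed(5.0, 30.0)  # past the window: pursuit has ended
```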

The mechanism of smooth pursuit inertia estimates the most likely time and place of the object's emergence from behind the obstruction, in order to enable the eyes to maintain vision on it. Of course, this effect is conditioned by the constant speed and known movement direction of the object, as well as the possibility of observing it for a certain amount of time before it disappears from the field of vision.

It is also worth adding that the human visual system is much better at maintaining vision on objects which move horizontally rather than vertically in a plane perpendicular to the visual axis, and if their movement is vertical, it is better when they are descending rather than ascending (Grasse and Lisberger, 1992). This order reflects the frequency with which we deal with a given type of movement in a visual scene. We most often see objects moving in a horizontal plane (to the right or to the left), more rarely in a vertical plane from top to bottom (for example, falling objects), and even more rarely in a vertical plane from bottom to top.

Neurophysiological smooth pursuit model

The neurophysiological mechanism of smooth pursuit combines three oculomotor programs: the vestibulo-ocular reflex, saccadic movement and smooth pursuit proper. The subcortical structures, especially in the brainstem, connect it to the vestibulo-ocular reflex (Cullen, 2009). Most often, when following a slowly moving object, we not only let our eyes follow it, but also move our head. Smooth pursuit, however, differs from the reflexive visual fixation on an object while moving the head, because it is primarily much more voluntary.

Similarly, both smooth pursuit and saccadic movement are controlled by many of the same structures lying on the visual pathway, ranging from the LGN, V1, MT and MST to the FEF (Grossberg, Srihasam and Bullock, 2012; Ilg, 2009). Unlike saccadic movement, however, smooth pursuit does not require the involvement of the superior colliculus (SC).

Finally, the third mechanism, which plays the most important role in initiating and maintaining smooth pursuit, involves cortical structures neighbouring the frontal eye field (FEF), the leading centre of the already known saccadic movements: the frontal pursuit area (FPA) and the supplementary eye field (SEF) (Fig. 52).
