Active vision
{{short description|Area of computer vision}}
Active vision, sometimes also called active computer vision, is an area of computer vision. An active vision system is one that can manipulate the viewpoint of its camera(s) in order to investigate the environment and get better information from it.[http://axiom.anu.edu.au/~rsl/rsl_active.html http://axiom.anu.edu.au/~rsl/rsl_active.html]{{Cite journal |doi = 10.1016/0004-3702(91)90080-4|title = Animate vision|journal = Artificial Intelligence|volume = 48|pages = 57–86|year = 1991|last1 = Ballard|first1 = Dana H.}}{{Cite journal |doi = 10.1007/BF00133571|title = Active vision|journal = International Journal of Computer Vision|volume = 1|issue = 4|pages = 333–356|year = 1988|last1 = Aloimonos|first1 = John|last2 = Weiss|first2 = Isaac|last3 = Bandyopadhyay|first3 = Amit| s2cid=25458585 }}{{cite journal |doi=10.1109/TAMD.2014.2341351 |url=https://www.researchgate.net/publication/263653330|title=Ecological Active Vision: Four Bioinspired Principles to Integrate Bottom–Up and Adaptive Top–Down Attention Tested with a Simple Camera-Arm Robot|journal=IEEE Transactions on Autonomous Mental Development|volume=7|pages=3–25|year=2015|last1=Ognibene|first1=Dimitri|last2=Baldassare|first2=Gianluca|doi-access=free|hdl=10281/301362|hdl-access=free}}
==Background==
Interest in active camera systems started as early as two decades ago. Beginning in the late 1980s, Aloimonos et al. introduced the first general framework for active vision in order to improve the perceptual quality of tracking results. Active vision is particularly important for coping with problems like occlusions, limited field of view and limited resolution of the camera.{{cite book |doi=10.1109/ICCV.2003.1238372 |chapter=Information theoretic focal length selection for real-time active 3D object tracking|title=Proceedings Ninth IEEE International Conference on Computer Vision|pages=400–407 vol.1|year=2003|last1=Denzler|last2=Zobel|last3=Niemann|s2cid=17622133|isbn=978-0-7695-1950-0|citeseerx=10.1.1.122.1594}} Other advantages include reducing the motion blur of a moving object{{cite journal |doi=10.1023/A:1008166825510 |url=https://www.researchgate.net/publication/220659402 |title=Control of a Camera for Active Vision: Foveal Vision, Smooth Tracking and Saccade|year=2000|last1=Rivlin|first1=Ehud|journal=International Journal of Computer Vision|volume=39|issue=2|pages=81–96|last2=Rotstein|first2=Héctor|s2cid=8737891 }} and enhancing the depth perception of an object by focusing two cameras on the same object or moving the cameras.
Active control of the camera viewpoint also helps in focusing computational resources on the relevant elements of the scene.{{Cite journal |doi = 10.1167/11.5.5|pmid = 21622729|pmc = 3134223|title = Eye guidance in natural vision: Reinterpreting salience|journal = Journal of Vision|volume = 11|issue = 5|pages = 5|year = 2011|last1 = Tatler|first1 = B. W.|last2 = Hayhoe|first2 = M. M.|last3 = Land|first3 = M. F.|last4 = Ballard|first4 = D. H.}} In this selective aspect, active vision can be seen as strictly related to (overt and covert) visual attention in biological organisms, which has been shown to enhance the perception of selected parts of the visual field. This selective aspect of human (active) vision can be easily related to the foveal structure of the human eye,{{cite journal |doi=10.1109/34.206959 |url=https://www.researchgate.net/publication/220183132|title=On the advantages of polar and log-polar mapping for direct estimation of time-to-impact from optical flow|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=15|issue=4|pages=401–410|year=1993|last1=Tistarelli|first1=M.|last2=Sandini|first2=G.|citeseerx=10.1.1.49.9595}} in which more than 50% of the colour receptors are located in about 5% of the retina.
It has also been suggested that visual attention and the selective aspect of active camera control can help in other tasks, like learning more robust models of objects and environments with fewer labeled samples or autonomously.{{cite journal |doi=10.1016/j.cviu.2004.09.004 |url=http://bwlab.utoronto.ca/wp-content/uploads/2014/10/walther_etal2005cviu.pdf|title=Selective visual attention enables learning and recognition of multiple objects in cluttered scenes|journal=Computer Vision and Image Understanding|volume=100|issue=1–2|pages=41–63|year=2005|last1=Walther|first1=Dirk|last2=Rutishauser|first2=Ueli|last3=Koch|first3=Christof|last4=Perona|first4=Pietro|citeseerx=10.1.1.110.976}}
==Approaches==
===The autonomous camera approach===
Autonomous cameras are cameras that can direct themselves in their environment. There has been some recent work using this approach. In the work of Denzler et al., the motion of a tracked object is modeled using a Kalman filter, and the focal length used at each step is the one that minimizes the uncertainty of the state estimates. A stereo set-up with two zoom cameras was used. A handful of papers have been written on zoom control, but they do not deal with total object-camera position estimation. An attempt to join estimation and control in the same framework can be found in the work of Bagdanov et al., where a pan-tilt-zoom camera is used to track faces.{{cite book |doi=10.1109/ICPR.2006.700 |chapter-url=https://www.researchgate.net/publication/220928949|chapter=Improving evidential quality of surveillance imagery through active face tracking|title=18th International Conference on Pattern Recognition (ICPR'06)|pages=1200–1203|year=2006|last1=Bagdanov|first1=A.D.|last2=Del Bimbo|first2=A.|last3=Nunziati|first3=W.|isbn=978-0-7695-2521-1|s2cid=2273696 }} Both the estimation and control models used are ad hoc, and the estimation approach is based on image features rather than 3D properties of the target being tracked.{{cite book |doi=10.1007/978-0-85729-997-0_2|chapter=Beyond the Static Camera: Issues and Trends in Active Vision|title=Visual Analysis of Humans|pages=11–30|year=2011|last1=Al Haj|first1=Murad|last2=Fernández|first2=Carles|last3=Xiong|first3=Zhanwu|last4=Huerta|first4=Ivan|last5=Gonzàlez|first5=Jordi|last6=Roca|first6=Xavier|isbn=978-0-85729-996-3}}
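The uncertainty-driven parameter selection described above can be illustrated with a short sketch. The following toy example is a simplified stand-in for the information-theoretic method of Denzler et al., with a made-up noise and field-of-view model: a 1-D constant-velocity Kalman filter tracks the target, and at each step the focal length is chosen from a candidate set so as to minimize the predicted posterior position variance while keeping the target in view.

<syntaxhighlight lang="python">
import numpy as np

# Toy sketch, not Denzler et al.'s actual algorithm. The noise model R(f)
# and the field-of-view model below are illustrative assumptions: a longer
# focal length gives a finer measurement (lower noise) but a narrower view.

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])               # only the position is measured
Q = 0.01 * np.eye(2)                     # process noise covariance

def measurement_noise(focal_length):
    """Hypothetical model: measurement variance shrinks as focal length grows."""
    return np.array([[1.0 / focal_length**2]])

def field_of_view(focal_length, sensor_width=1.0):
    """Hypothetical half-width of the visible region at the target's depth."""
    return 10.0 * sensor_width / focal_length

def select_focal_length(x, P, candidates):
    """Pick the focal length minimizing the predicted posterior position
    variance, among settings that keep the predicted target in view."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    best_f, best_var = None, np.inf
    for f in candidates:
        if abs(x_pred[0, 0]) > field_of_view(f):
            continue  # target would fall outside this zoom setting's view
        R = measurement_noise(f)
        S = H @ P_pred @ H.T + R                 # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        P_post = (np.eye(2) - K @ H) @ P_pred    # posterior covariance
        if P_post[0, 0] < best_var:
            best_f, best_var = f, P_post[0, 0]
    return best_f

x = np.array([[0.0], [0.1]])   # initial state estimate
P = np.eye(2)                  # initial covariance
print(select_focal_length(x, P, candidates=[1.0, 2.0, 4.0, 8.0]))
</syntaxhighlight>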
===The master/slave approach===
In a master/slave configuration, a supervising static camera is used to monitor a wide field of view and to track every moving target of interest. The position of each of these targets over time is then provided to a foveal camera, which tries to observe the targets at a higher resolution. Both the static and the active cameras are calibrated to a common reference, so that data coming from one of them can be easily projected onto the other, in order to coordinate the control of the active sensors. Another possible use of the master/slave approach consists of a static (master) camera extracting visual features of an object of interest, while the active (slave) sensor uses these features to detect the desired object without the need of any training data.{{Cite journal |doi = 10.1016/j.cviu.2011.09.011|title = Cognitive visual tracking and camera control|journal = Computer Vision and Image Understanding|volume = 116|issue = 3|pages = 457–471|year = 2012|last1 = Bellotto|first1 = Nicola|last2 = Benfold|first2 = Ben|last3 = Harland|first3 = Hanno|last4 = Nagel|first4 = Hans-Hellmut|last5 = Pirlo|first5 = Nicola|last6 = Reid|first6 = Ian|last7 = Sommerlade|first7 = Eric|last8 = Zhao|first8 = Chuan| s2cid=4937663 |url = http://eprints.lincoln.ac.uk/4823/1/Bellotto2011preprint.pdf}}
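The geometric core of this coordination can be sketched in a few lines. Assuming both cameras are already calibrated to a common world frame, the target position reported by the static master camera is converted into pan and tilt angles for the active slave camera; the function below is illustrative and not taken from any specific system.

<syntaxhighlight lang="python">
import math

def pan_tilt_to_target(slave_position, target_position):
    """Return (pan, tilt) in radians pointing the slave camera, located at
    slave_position, toward target_position (both in world coordinates)."""
    dx = target_position[0] - slave_position[0]
    dy = target_position[1] - slave_position[1]
    dz = target_position[2] - slave_position[2]
    pan = math.atan2(dy, dx)                    # rotation about the vertical axis
    tilt = math.atan2(dz, math.hypot(dx, dy))   # elevation above the horizontal
    return pan, tilt

# Example: target localized by the master camera at (4, 3, 1.5) m,
# active slave camera mounted at (0, 0, 2.5) m.
pan, tilt = pan_tilt_to_target((0.0, 0.0, 2.5), (4.0, 3.0, 1.5))
print(math.degrees(pan), math.degrees(tilt))
</syntaxhighlight>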
In recent years there has been growing interest in building networks of active cameras and optional static cameras, so that a large area can be covered while maintaining high resolution of multiple targets. This is ultimately a scaled-up version of either the master/slave approach or the autonomous camera approach. It can be highly effective, but also costly: not only are multiple cameras involved, but they must also communicate with one another, which can be computationally expensive.
===Controlled active vision framework===
Controlled active vision can be defined as the controlled motion of a vision sensor that maximizes the performance of a robotic algorithm involving a moving vision sensor. It is a hybrid of control theory and conventional vision. An application of this framework is real-time robotic servoing around static or moving arbitrary 3-D objects (see [[visual servoing]]). Algorithms that incorporate the use of multiple windows and numerically stable confidence measures are combined with stochastic controllers in order to provide a satisfactory solution to the tracking problem introduced by combining computer vision and control. Where only an inaccurate model of the environment is available, adaptive control techniques may be introduced. The above information and further mathematical representations of controlled active vision can be found in the thesis of Nikolaos Papanikolopoulos.{{cite thesis |last=Papanikolopoulos |first=Nikolaos Panagiotis |year=1992 |title=Controlled Active Vision |publisher=Carnegie Mellon University |type=PhD Thesis}}
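As an illustration of the control-theoretic side of this framework, the sketch below implements one step of classical image-based visual servoing: tracked point features s are driven toward desired positions s* by a camera velocity v = -λ L⁺ (s - s*), where L is the interaction matrix of the point features. The feature coordinates, depths, and gain are illustrative values, not drawn from the cited thesis.

<syntaxhighlight lang="python">
import numpy as np

def interaction_matrix(x, y, Z):
    """Interaction (image Jacobian) matrix of a normalized image point (x, y)
    at depth Z, relating feature velocity to the 6-DOF camera velocity."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x**2), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y**2, -x * y, -x],
    ])

def servo_velocity(features, desired, depths, gain=0.5):
    """One control step: stack the interaction matrices of all point features
    and return the camera twist (vx, vy, vz, wx, wy, wz)."""
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    error = (np.asarray(features) - np.asarray(desired)).ravel()
    return -gain * np.linalg.pinv(L) @ error

# Example: drive two tracked points toward their desired image locations.
v = servo_velocity(features=[(0.1, 0.2), (-0.15, 0.05)],
                   desired=[(0.0, 0.0), (-0.1, 0.0)],
                   depths=[1.0, 1.2])
print(v)
</syntaxhighlight>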
==Examples==
Examples of active vision systems usually involve a robot-mounted camera,{{cite journal
| last = Mak
| first = Lin Chi
|author2= Furukawa, Tomonari|author3= Whitty, Mark
| title = A localisation system for an indoor rotary-wing MAV using blade mounted LEDs
| journal = Sensor Review
| volume = 28
| issue = 2
| pages = 125–131
| year = 2008
| url = http://www.emeraldinsight.com/Insight/viewContentItem.do?contentType=Article&contentId=1714559
| doi = 10.1108/02602280810856688| url-access = subscription
| hdl = 1959.4/38231
| hdl-access = free
}} but other systems have employed human operator-mounted cameras (a.k.a. "wearables").[http://webdiis.unizar.es/~jdtardos/papers/2007_RSS_Clemente.pdf Mapping Large Loops with a Single Hand-Held Camera].
LA Clemente, AJ Davison, ID Reid, J Neira, JD Tardós - Robotics: Science and Systems, 2007. Applications include automatic surveillance, human-robot interaction [http://www.thrish.org/research-team/dimitri-ognibene (video)],{{Cite journal |doi = 10.1016/j.robot.2006.02.003|title = Hierarchical attentive multiple models for execution and recognition of actions|journal = Robotics and Autonomous Systems|volume = 54|issue = 5|pages = 361–369|year = 2006|last1 = Demiris|first1 = Yiannis|last2 = Khadhouri|first2 = Bassam|citeseerx = 10.1.1.226.5282}}[https://www.researchgate.net/publication/236631626_Towards_active_event_recognition Towards active event recognition D Ognibene, Y Demiris The 23rd International Joint Conference of Artificial Intelligence (IJCAI13)] SLAM, route planning,[http://www.surrey.ac.uk/eng/research/mechatronics/robots/Activities/ActiveVision/activevis.html http://www.surrey.ac.uk/eng/research/mechatronics/robots/Activities/ActiveVision/activevis.html] {{webarchive |url=https://web.archive.org/web/20070817170942/http://www.surrey.ac.uk/eng/research/mechatronics/robots/Activities/ActiveVision/activevis.html |date=August 17, 2007 }} etc. In the DARPA Grand Challenge, most of the teams used LIDAR combined with active vision systems to guide driverless vehicles across an off-road course.
An example of active vision can be seen in this video, which shows face tracking with a pan-tilt camera system: https://www.youtube.com/watch?v=N0FjDOTnmm0
Active vision is also important for understanding how humansFindlay, J. M. & Gilchrist, I. D., ''Active Vision: The Psychology of Looking and Seeing'', Oxford University Press, 2003{{cite journal |doi=10.1016/j.preteyeres.2006.01.002 |pmid=16516530|url=http://invibe.net/biblio_database_dyva/woda/data/att/e17f.file.09656.pdf|title=Eye movements and the control of actions in everyday life|journal=Progress in Retinal and Eye Research|volume=25|issue=3|pages=296–324|year=2006|last1=Land|first1=Michael F.|s2cid=18946141 }} and organisms endowed with visual sensors actually see the world, considering the limits of their sensors, the richness and continuous variability of the visual signal, and the effects of their actions and goals on their perception.{{Cite journal |doi = 10.1371/journal.pcbi.0020144|pmid = 17069456|pmc = 1626158|title = Mapping Information Flow in Sensorimotor Networks|journal = PLOS Computational Biology|volume = 2|issue = 10|pages = e144|year = 2006|last1 = Lungarella|first1 = Max|last2 = Sporns|first2 = Olaf|bibcode = 2006PLSCB...2..144L | doi-access=free }}{{Cite journal |doi = 10.1038/nature02024|pmid = 14534588|title = Environmentally mediated synergy between perception and behaviour in mobile robots|journal = Nature|volume = 425|issue = 6958|pages = 620–624|year = 2003|last1 = Verschure|first1 = Paul F. M. J.|last2 = Voegtlin|first2 = Thomas|last3 = Douglas|first3 = Rodney J.|bibcode = 2003Natur.425..620V| s2cid=4418697 }}
The controlled active vision framework can be used in a number of different ways, for example in vehicle tracking, robotics applications,{{Cite book |doi=10.1109/ACV.1994.341311 |citeseerx=10.1.1.40.3470|chapter=Application of the controlled active vision framework to robotic and transportation problems|title=Proceedings of 1994 IEEE Workshop on Applications of Computer Vision|pages=213–220|year=1994|last1=Smith|first1=C.E.|last2=Papanikolopoulos|first2=N.P.|last3=Brandt|first3=S.A.|isbn=978-0-8186-6410-6|s2cid=9735967 }} and interactive MRI segmentation.{{Cite book |doi = 10.1109/CDC.2011.6161453|pmid = 24584213|pmc = 3935399|chapter = Interactive MRI segmentation with controlled active vision|title = 2011 50th IEEE Conference on Decision and Control and European Control Conference |pages = 2293–2298|year = 2011|last1 = Karasev|first1 = Peter|last2 = Kolesov|first2 = Ivan|last3 = Chudy|first3 = Karol|last4 = Tannenbaum|first4 = Allen|last5 = Muller|first5 = Grant|last6 = Xerogeanes|first6 = John|isbn = 978-1-61284-801-3}}
Interactive MRI segmentation uses controlled active vision by employing a Lyapunov control design to establish a balance between the influence of the data-driven gradient flow and the human's input over time, smoothly coupling automatic segmentation with interactivity; more details can be found in the work of Karasev et al., cited above. Segmentation of MRIs is a difficult problem: because the MRI picks up all fluid and tissue, it normally takes an expert to trace out the desired segments, making a purely manual approach a very lengthy process. The controlled active vision methods described in the cited paper can improve the process while relying less on human input.
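The coupling idea can be sketched schematically. The 1-D toy below is not the actual algorithm of Karasev et al.; it evolves a labeling function under a data-driven gradient flow while injecting user corrections through a term whose positive gain makes a simple Lyapunov-like function of the user-marked error decrease, so the automatic flow and the interactive input are smoothly blended. The image, mask, and force models are all illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
image = np.concatenate([rng.normal(0.2, 0.05, 50),   # "background" intensities
                        rng.normal(0.8, 0.05, 50)])  # "tissue" intensities

u = np.zeros(100)            # labeling function; sign(u) gives the segmentation
u_user = np.zeros(100)
mask = np.zeros(100)         # 1 where the user has painted a correction
u_user[45:55] = 1.0          # user marks these pixels as foreground
mask[45:55] = 1.0

dt, k = 0.1, 2.0             # step size and positive user-input gain
for _ in range(200):
    f_data = np.tanh(10.0 * (image - 0.5)) - u   # pull u toward the data term
    f_user = mask * (u_user - u)                 # pull u toward the user labels
    u += dt * (f_data + k * f_user)
    # Lyapunov-like function of the user-marked error; it decreases as the
    # flow absorbs the correction, for any positive gain k.
    V = 0.5 * np.sum(mask * (u - u_user) ** 2)

print("final V on user-marked pixels:", V)
print((u > 0.0).astype(int))   # resulting segmentation labels
</syntaxhighlight>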
==External links==
* [http://www.robots.ox.ac.uk/ActiveVision/ Active Vision Group] at Oxford University.
* [http://www.psy.ed.ac.uk/people/jbrockmo/avl.html Active Vision Laboratory] at University of Edinburgh.
* [https://web.archive.org/web/20080720012025/http://cmr.mech.unsw.edu.au/research_areas?q=node%2F27 Active Vision Tracking System for MAV] developed by University of New South Wales.