conda create -n py3att python=3.6.5 anaconda
source activate py3att
2017 model for eye fixation prediction; basically VGG with some weird bias layers on top. Kruthiventi, S. S., Ayush, K., & Babu, R. V. (2017). DeepFix: A fully convolutional neural network for predicting human eye fixations. IEEE Transactions on Image Processing, 26(9), 4446-4456.
Early 2015 model for eye fixation prediction; fixation-centered with variable image sizes, much closer to what you want to do. Liu, N., Han, J., Zhang, D., Wen, S., & Liu, T. (2015, June). Predicting eye fixations using convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on (pp. 362-370). IEEE.
ImageNet: http://image-net.org/challenges/LSVRC/2012/index
Download from: http://www.inb.uni-luebeck.de/fileadmin/files/MITARBEITER/Dorr/EyeMovementDataSet.html
Video frames with eye tracking
Take an original video and eye-tracking data. For each frame:
- Crop the frame into 128x128, 256x256, and 512x512 pixel views, each downsampled to 64x64.
- Pass each view through a conv/relu/pool stage four times (64x64 -> 32x32 -> 16x16 -> 8x8), with K kernels at each layer (16 maybe?) and all convolutions 3x3.
- Concatenate the final 8x8x16x3 features into 8x8x48 and deconvolve to a 64x64 map in two steps (8x8x48 -> 16x16x16 -> 64x64x1).
- The final output is logistic, not ReLU.
- Compare the output to a Gaussian-blurred map of the eye position in the window +175 ms to +225 ms (the center can be shifted, but defaults to +200 ms).
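As a concreteness check, here is a minimal PyTorch sketch of that three-view net. All names are mine, and PyTorch itself is an assumption (the notes don't pick a framework). Since "four times" lists only three halvings (64 -> 32 -> 16 -> 8), the sketch pools at three of the four conv stages:

```python
import torch
import torch.nn as nn

class MultiScaleFixationNet(nn.Module):
    # Hypothetical sketch of the three-view architecture above; not a
    # reference implementation.
    def __init__(self, k=16):  # k = number of kernels per layer
        super().__init__()
        self.branches = nn.ModuleList([self._branch(k) for _ in range(3)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(3 * k, 16, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=4),  # 16x16 -> 64x64
            nn.Sigmoid(),  # logistic output, per the note
        )

    @staticmethod
    def _branch(k):
        # Four conv/ReLU stages, three of them pooled: 64 -> 32 -> 16 -> 8.
        layers, in_ch = [], 3
        for pool in (True, True, True, False):
            layers += [nn.Conv2d(in_ch, k, kernel_size=3, padding=1), nn.ReLU()]
            if pool:
                layers.append(nn.MaxPool2d(2))
            in_ch = k
        return nn.Sequential(*layers)

    def forward(self, views):
        # views: list of three (B, 3, 64, 64) tensors (the resized
        # 128/256/512 crops); returns a (B, 1, 64, 64) fixation map.
        feats = [b(v) for b, v in zip(self.branches, views)]   # each (B, k, 8, 8)
        return self.decoder(torch.cat(feats, dim=1))           # (B, 3k, 8, 8) -> map
```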
Cross-entropy loss
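A minimal sketch of the target and loss, assuming the Gaussian bump is drawn directly on the 64x64 output grid; `fixation_target`, `fixation_loss`, and `sigma=3.0` are my choices, not from the notes:

```python
import torch
import torch.nn.functional as F

def fixation_target(xy, sigma=3.0, size=64):
    # Hypothetical helper: a Gaussian bump at the gaze position ~200 ms
    # ahead (the +175 to +225 ms window above), drawn on the output grid.
    # xy: (B, 2) gaze coordinates already scaled to [0, size).
    grid = torch.arange(size, dtype=torch.float32)
    gx = torch.exp(-(grid[None, :] - xy[:, 0:1]) ** 2 / (2 * sigma ** 2))  # (B, 64)
    gy = torch.exp(-(grid[None, :] - xy[:, 1:2]) ** 2 / (2 * sigma ** 2))  # (B, 64)
    return (gy[:, :, None] * gx[:, None, :]).unsqueeze(1)  # (B, 1, 64, 64)

def fixation_loss(pred, gaze_xy):
    # Per-pixel cross-entropy between the logistic output map and the
    # Gaussian-blurred fixation map.
    return F.binary_cross_entropy(pred, fixation_target(gaze_xy))
```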
Pre-train the convolutional layers on ImageNet with FC layers on the back end; this approximates AlexNet but on very small images. Train a base model first on free-viewing data, then re-train the model for active viewing (e.g., search for people, etc.).
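A hedged sketch of that pretraining step, under the same assumptions as the model sketch above: one conv branch plus temporary FC layers trained as a small classifier on 64x64 ImageNet crops, with the head discarded afterwards. The branch builder is repeated here so the snippet stands alone:

```python
import torch.nn as nn

def conv_branch(k=16):
    # Same hypothetical conv stack as one branch of the fixation net:
    # four conv/ReLU stages, three pooled (64 -> 32 -> 16 -> 8).
    layers, in_ch = [], 3
    for pool in (True, True, True, False):
        layers += [nn.Conv2d(in_ch, k, kernel_size=3, padding=1), nn.ReLU()]
        if pool:
            layers.append(nn.MaxPool2d(2))
        in_ch = k
    return nn.Sequential(*layers)

branch = conv_branch()
pretrain_net = nn.Sequential(
    branch,
    nn.Flatten(),                # (B, 16, 8, 8) -> (B, 1024)
    nn.Linear(16 * 8 * 8, 512),  # small AlexNet-ish FC back end
    nn.ReLU(),
    nn.Linear(512, 1000),        # 1000 ImageNet classes
)
# Train pretrain_net on 64x64 ImageNet crops, then keep `branch` (its
# weights seed the three branches of the fixation net) and drop the head.
```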
Each dataset is pulled and stored in
Pulled from Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. In Advances in neural information processing systems (pp. 2204-2212).
- T. Judd, K. Ehinger, F. Durand, and A. Torralba, “Learning to predict where humans look,” ICCV 2009.
- N. Bruce, J. Tsotsos, “Attention based on information maximization,” Journal of Vision, 2007.
- S. Ramanathan et al., “An eye fixation database for saliency detection in images,” ECCV 2010.
- K. Ehinger, et al., “Modeling search for people in 900 scenes: A combined source model of eye guidance,” Visual Cognition, vol. 17, pp. 945-978, 2009.
- H. Hadizadeh, M. J. Enriquez, and I. V. Bajić, “Eye-tracking database for a set of standard video sequences,” IEEE Trans. Image Processing, vol. 21, no. 2, pp. 898-903, Feb. 2012.
- S. Mathe and C. Sminchisescu, “Dynamic eye movement datasets and learnt saliency models for visual action recognition,” ECCV 2012.
- L. Itti, R. Carmi, “Eye-tracking data from human volunteers watching complex video stimuli,” 2009 CRCNS.org.
- T. Judd, F. Durand, and A. Torralba, “A benchmark of computational models of saliency to predict human fixations,” Computer Science and Artificial Intelligence Laboratory Technical Report, 2012.
- S. Winkler and R. Subramanian, “Overview of eye tracking datasets,” 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), July 2013, Austria.
- T. Judd, F. Durand, and A. Torralba, “Fixations on low-resolution images,” J. Vision, vol. 11, no. 4, April 2011, http://people.csail.mit.edu/tjudd/SaliencyBenchmark/
- J. Li et al., “Visual saliency based on scale-space analysis in the frequency domain.” IEEE Trans. PAMI, vol. 35, no. 4, pp. 996–1010, April 2013.
- M. Cerf, J. Harel, W. Einhäuser, C. Koch, “Predicting human gaze using low-level saliency combined with face detection,” in Proc. NIPS, vol. 20, pp. 241–248, Vancouver, Canada, Dec. 3–8, 2007.
- M. Dorr, T. Martinetz, K. Gegenfurtner, E. Barth, “Variability of eye movements when viewing dynamic natural scenes,” J. Vision, vol. 10, no. 10, 2010.
- J. Wang, D. M. Chandler, P. Le Callet, “Quantifying the relationship between visual salience and visual importance,” in Proc. SPIE, vol. 7527, San Jose, CA, Jan. 17–21, 2010.
- G. Kootstra, B. de Boer, L. R. B. Schomaker, “Predicting eye fixations on complex visual stimuli using local symmetry,” Cognitive Computation, vol. 3, no. 1, pp. 223–240, 2011.
- I. van der Linde, U. Rajashekar, A. C. Bovik, L. K. Cormack, “DOVES: A database of visual eye movements,” Spat. Vis., vol. 22, no. 2, pp. 161–177, 2009.
- N. D. B. Bruce, J. K. Tsotsos, “Saliency based on information maximization.” in Proc. NIPS, vol. 19, pp. 155–162, Vancouver, Canada, Dec. 4–9, 2006.
- H. Liu, I. Heynderickx, “Studying the added value of visual attention in objective image quality metrics based on eye movement data,” in Proc. ICIP, Cairo, Egypt, Nov. 7–10, 2009.
- H. Alers, L. Bos, I. Heynderickx, “How the task of evaluating image quality influences viewing behavior,” in Proc. QoMEX, Mechelen, Belgium, Sept. 7–9, 2011.
- J. Redi, H. Liu, R. Zunino, I. Heynderickx, “Interactions of visual attention and quality perception,” in Proc. SPIE, vol. 7865, San Jose, CA, Jan. 24–27, 2011.
- U. Engelke, A. J. Maeder, H.-J. Zepernick, “Visual attention modeling for subjective image quality databases,” in Proc. MMSP, Rio de Janeiro, Brazil, Oct. 5–7, 2009.
- P. K. Mital, T. J. Smith, R. L. Hill, J. M. Henderson, “Clustering of gaze during dynamic scene viewing is predicted by motion,” Cognitive Computation, vol. 3, no. 1, pp. 5–24, March 2011.
- S. Péchard, R. Pépion, P. Le Callet, “Region-of-interest intra prediction for H.264/AVC error resilience,” in Proc. ICIP, Cairo, Egypt, Nov. 7–10, 2009.
- U. Engelke, R. Pepion, P. Le Callet, H.-J. Zepernick, “Linking distortion perception and visual saliency in H.264/AVC coded video containing packet loss,” in Proc. VCIP, Huang Shan, China, July 11–14, 2010.
- H. Alers, J. A. Redi, I. Heynderickx, “Examining the effect of task on viewing behavior in videos using saliency maps,” in Proc. SPIE, vol. 8291, San Jose, CA, Jan. 22–26, 2012.
- L. Itti, “Automatic foveation for video compression using a neurobiological model of visual attention,” IEEE Trans. Image Processing, vol. 13, no. 10, pp. 1304–1318, Oct. 2004.
- Z. Li, S. Qin, L. Itti, “Visual attention guided bit allocation in video compression,” Image Vis. Comp., vol. 29, no. 1, pp. 1–14, 2011.