Predictions from classifiers trained on captions in which nouns, verbs, adjectives, and adverbs are replaced with placeholder words:
- Complex predictions, nouns masked
- Noncomplex predictions, nouns masked
- Complex predictions, nouns and verbs masked
- Noncomplex predictions, nouns and verbs masked
- Complex predictions, nouns, verbs, and adjectives masked
- Noncomplex predictions, nouns, verbs, and adjectives masked
- Complex predictions, nouns, verbs, adjectives, and adverbs masked
- Noncomplex predictions, nouns, verbs, adjectives, and adverbs masked
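The four masking conditions listed above can be illustrated with a minimal sketch. It assumes captions are already tokenized and tagged with coarse part-of-speech labels. The tag names (`NOUN`, `VERB`, `ADJ`, `ADV`), the placeholder strings, and the `mask_caption` helper are hypothetical and are not taken from the original pipeline.

```python
# Hypothetical placeholder word for each coarse part-of-speech tag.
PLACEHOLDERS = {"NOUN": "noun", "VERB": "verb", "ADJ": "adjective", "ADV": "adverb"}

# The four cumulative masking conditions from the list above.
CONDITIONS = [
    ("NOUN",),
    ("NOUN", "VERB"),
    ("NOUN", "VERB", "ADJ"),
    ("NOUN", "VERB", "ADJ", "ADV"),
]


def mask_caption(tagged_tokens, masked_tags):
    """Replace every token whose tag is in masked_tags with its placeholder."""
    out = []
    for token, tag in tagged_tokens:
        if tag in masked_tags:
            out.append(PLACEHOLDERS[tag])
        else:
            out.append(token)
    return " ".join(out)


# A hand-tagged example caption (tags are assumed, not produced by a real tagger).
caption = [("a", "DET"), ("small", "ADJ"), ("dog", "NOUN"), ("quickly", "ADV"),
           ("chases", "VERB"), ("a", "DET"), ("ball", "NOUN")]

for tags in CONDITIONS:
    print(tags, "->", mask_caption(caption, tags))
```

In this sketch each condition masks a superset of the previous one, so a classifier trained on the last condition sees only function words and placeholders.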
MS-COCO 2017: Grayscale images excluded
- All complex/noncomplex val set images, using the number of distinct mean-shift segmented regions as ground truth
- All complex/noncomplex val set images containing instances of ‘bicycle’
- All complex/noncomplex val set images containing instances of ‘person’
BERT-predicted complex/noncomplex images:
- Predictions from best model trained on all 10,000 images in complexity dataset:
- Top 50 images predicted to be complex with highest probability
- Top 50 images predicted to be noncomplex with highest probability
- Predictions from best model trained on color image subset of full complexity dataset:
- Predictions from best model trained on color images with instances of category “accessory”:
- Predictions from best model trained on color images with instances of category “animal”:
- Predictions from best model trained on color images with instances of category “car”:
- Predictions from best model trained on color images with instances of category “food”:
- Predictions from best model trained on color images with instances of category “outdoor”:
- Predictions from best model trained on color images with instances of category “person”:
- Predictions from best model trained on color images with instances of category “vehicle”:
Click on the links below to view ~5,000 images from the MS-COCO 2017 validation set, ranked by automated visual complexity scores.
- Images ranked by # of distinct mean-shift segmented regions
Click on the links below to view ~118,000 images from the MS-COCO 2017 train set, ranked by automated visual complexity scores.
- Images ranked by # of distinct mean-shift segmented regions
- Top 11,828 most complex images (top 10% of train set)
- Bottom 11,828 least complex images (bottom 10% of train set)
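The ranking and the top/bottom 10% split above can be sketched as follows. This is a minimal illustration, assuming each image already has a region count from some mean-shift segmenter. The function names and the toy scores are hypothetical. The figure 118,287 is the published size of the MS-COCO 2017 train set and is used here only to show that a rounded-down 10% slice gives 11,828.

```python
def rank_by_score(scores):
    """Return image ids sorted from most to least complex (highest score first)."""
    return sorted(scores, key=lambda image_id: scores[image_id], reverse=True)


def slice_size(n_images, percent=10):
    """Number of images in a top or bottom slice, rounded down."""
    return n_images * percent // 100


def top_and_bottom(ranked_ids, percent=10):
    """Return (most complex, least complex) slices of a ranked id list."""
    k = slice_size(len(ranked_ids), percent)
    if k == 0:
        return [], []
    return ranked_ids[:k], ranked_ids[-k:]


# Toy region counts for ten hypothetical images.
scores = {"a": 12, "b": 85, "c": 40, "d": 3, "e": 67,
          "f": 29, "g": 91, "h": 54, "i": 18, "j": 76}

ranked = rank_by_score(scores)
top, bottom = top_and_bottom(ranked)
print("ranked:", ranked)
print("top 10%:", top, "bottom 10%:", bottom)
print("slice size for 118,287 images:", slice_size(118287))
```

Rounding down is a design choice in this sketch; a different rounding rule would shift the slice size by at most one image.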
Click on the links below to view ~5,000 images from the MS-COCO 2014 validation set, ranked by human and automated visual complexity scores. The scoring was conducted by vislang.
- Human-scored images
- Automatically scored images
- Images ranked by # of distinct mean-shift segmented regions
- Images ranked by feature congestion
Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. In D. Fleet, T. Pajdla, B. Schiele, & T. Tuytelaars (Eds.), Computer Vision – ECCV 2014 (Vol. 8693, pp. 740–755). Springer International Publishing. https://doi.org/10.1007/978-3-319-10602-1_48