COCO Dataset Visualizations

Predictions from classifier trained on captions with nouns, verbs, adjectives, and adverbs substituted with placeholder words:

MS-COCO 2017: Grayscale images excluded

All complex/noncomplex val set images, using the number of distinct mean-shift segmented regions as ground truth
All complex/noncomplex val set images containing instances of ‘bicycle’
All complex/noncomplex val set images containing instances of ‘person’

BERT-predicted complex/noncomplex images:

Predictions from best model trained on all 10,000 images in complexity dataset:
- Top 50 images predicted to be complex with highest probability
- Top 50 images predicted to be noncomplex with highest probability
Predictions from best model trained on color image subset of full complexity dataset:
- Complex
- Noncomplex
Predictions from best model trained on color images with instances of category “accessory”:
- Complex
- Noncomplex
Predictions from best model trained on color images with instances of category “animal”:
- Complex
- Noncomplex
Predictions from best model trained on color images with instances of category “car”:
- Complex
- Noncomplex
Predictions from best model trained on color images with instances of category “food”:
- Complex
- Noncomplex
Predictions from best model trained on color images with instances of category “outdoor”:
- Complex
- Noncomplex
Predictions from best model trained on color images with instances of category “person”:
- Complex
- Noncomplex
Predictions from best model trained on color images with instances of category “vehicle”:
- Complex
- Noncomplex

Click on the links below to view ~5,000 images from the MS-COCO 2017 validation set, ranked by automated visual complexity scores.

Images ranked by number of distinct mean-shift segmented regions
- Top 500 most complex images
- Bottom 500 least complex images
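
As a rough illustration of how a region-count complexity score could be computed, here is a minimal sketch of flat-kernel mean shift on a small grayscale image. It is not the pipeline used to produce these rankings, which this page does not describe. The function name `count_mean_shift_modes`, the bandwidth values, and the use of the number of converged modes as a stand-in for the number of segmented regions are all assumptions made for illustration.

```python
import numpy as np

def count_mean_shift_modes(gray, spatial_bw=8.0, range_bw=32.0, n_iter=20):
    """Count mean-shift modes in a small grayscale image (toy, O(N^2) memory).

    Hypothetical helper: bandwidths and the mode-count proxy are assumptions.
    """
    rows, cols = np.indices(gray.shape)
    # Joint spatial-range features, scaled so a unit window applies to every axis.
    feats = np.column_stack([
        rows.ravel() / spatial_bw,
        cols.ravel() / spatial_bw,
        gray.ravel().astype(float) / range_bw,
    ])
    shifted = feats.copy()
    for _ in range(n_iter):
        # Distance from each shifted point to every original point.
        dists = np.linalg.norm(shifted[:, None, :] - feats[None, :, :], axis=2)
        inside = (dists <= 1.0).astype(float)
        counts = inside.sum(axis=1, keepdims=True)
        means = inside @ feats / np.maximum(counts, 1.0)
        # Keep a point in place if its window happens to be empty.
        shifted = np.where(counts > 0, means, shifted)
    # Greedily merge converged points that lie within half a window of a mode.
    modes = []
    for p in shifted:
        if not any(np.linalg.norm(p - m) < 0.5 for m in modes):
            modes.append(p)
    return len(modes)

# Example input: a 6x6 image whose left half is black and right half is white.
two_tone = np.zeros((6, 6), dtype=np.uint8)
two_tone[:, 3:] = 255
n_modes = count_mean_shift_modes(two_tone)
```

Under this proxy, an image with more modes would rank as more complex. The pairwise distance matrix makes the sketch practical only for tiny inputs.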

Click on the links below to view the top and bottom 10% of images (11,828 each) from the MS-COCO 2017 train set, ranked by automated visual complexity scores.

Images ranked by number of distinct mean-shift segmented regions
- Top 11,828 most complex images (top 10% of train set)
  - Page 1/10
  - Page 2/10
  - Page 3/10
  - Page 4/10
  - Page 5/10
  - Page 6/10
  - Page 7/10
  - Page 8/10
  - Page 9/10
  - Page 10/10
- Bottom 11,828 least complex images (bottom 10% of train set)
  - Page 1/10
  - Page 2/10
  - Page 3/10
  - Page 4/10
  - Page 5/10
  - Page 6/10
  - Page 7/10
  - Page 8/10
  - Page 9/10
  - Page 10/10
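
The top and bottom 10% lists above, each split across 10 pages, can be sketched as a sort followed by slicing. This is a minimal sketch, assuming the scores are available as a dictionary from image id to score. The helper names `top_and_bottom` and `paginate` are hypothetical, and the train set size of 118,287 images is the published MS-COCO 2017 figure, not something stated on this page.

```python
import math

def top_and_bottom(scores, fraction=0.10):
    """Return (most_complex, least_complex) image ids for the given fraction."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    k = int(len(ranked) * fraction)                        # round down
    most_complex = ranked[:k]
    # Take the last k ids and reverse them so the least complex image comes first.
    least_complex = ranked[len(ranked) - k:][::-1]
    return most_complex, least_complex

def paginate(items, n_pages=10):
    """Split items into at most n_pages consecutive pages of near-equal size."""
    per_page = max(1, math.ceil(len(items) / n_pages))
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]

# Placeholder scores for illustration only: image i gets score i.
scores = {f"img{i}": float(i) for i in range(118287)}
top, bottom = top_and_bottom(scores)
pages = paginate(top)
```

Rounding 10% of 118,287 down gives the 11,828 images per list shown above.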

Click on the links below to view ~5,000 images from the MS-COCO 2014 validation set, ranked by human and automated visual complexity scores. The scoring was conducted by vislang.

Human-scored Images
Automatically scored Images
Images ranked by number of distinct mean-shift segmented regions
- Page 1/4
- Page 2/4
- Page 3/4
- Page 4/4
Images ranked by feature congestion
- Page 1/4
- Page 2/4
- Page 3/4
- Page 4/4

Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. In D. Fleet, T. Pajdla, B. Schiele, & T. Tuytelaars (Eds.), Computer Vision – ECCV 2014 (Vol. 8693, pp. 740–755). Springer International Publishing. https://doi.org/10.1007/978-3-319-10602-1_48