COCO Dataset Visualizations

Predictions from a classifier trained on captions in which nouns, verbs, adjectives, and adverbs were replaced with placeholder words:

MS-COCO 2017: Grayscale images excluded

BERT-predicted complex/noncomplex images:

  • Predictions from best model trained on all 10,000 images in complexity dataset:
    • Top 50 images predicted to be complex with highest probability
    • Top 50 images predicted to be noncomplex with highest probability
  • Predictions from best model trained on color image subset of full complexity dataset:
  • Predictions from best model trained on color images with instances of category “accessory”:
  • Predictions from best model trained on color images with instances of category “animal”:
  • Predictions from best model trained on color images with instances of category “car”:
  • Predictions from best model trained on color images with instances of category “food”:
  • Predictions from best model trained on color images with instances of category “outdoor”:
  • Predictions from best model trained on color images with instances of category “person”:
  • Predictions from best model trained on color images with instances of category “vehicle”:
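The "top 50 with highest probability" lists above could be produced roughly as follows. This is only a minimal sketch, not the code used to build these pages: the function name `top_k_by_class`, the input format (a dict mapping image id to the predicted probability of the "complex" class), and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def top_k_by_class(
    probs: Dict[str, float], k: int = 50, threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Return the k most confident "complex" and "noncomplex" image ids.

    probs maps an image id to P(complex) for a binary classifier
    (hypothetical input format). The threshold of 0.5 is an assumption.
    """
    # Image ids ordered from highest to lowest P(complex).
    by_complex = sorted(probs, key=lambda image_id: probs[image_id], reverse=True)

    # Predicted complex: P(complex) at or above the threshold, most confident first.
    complex_ids = [i for i in by_complex if probs[i] >= threshold][:k]

    # Predicted noncomplex: P(noncomplex) = 1 - P(complex), so the most
    # confident noncomplex predictions are those with the lowest P(complex).
    noncomplex_ids = [i for i in reversed(by_complex) if probs[i] < threshold][:k]

    return complex_ids, noncomplex_ids
```

For example, with `probs = {"a": 0.9, "b": 0.2, "c": 0.6, "d": 0.05}` and `k=1`, the sketch returns `["a"]` for the complex list and `["d"]` for the noncomplex list.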

Click on the links below to view ~5,000 images from the MS-COCO 2017 validation set, ranked by automated visual complexity scores.

Click on the links below to view ~5,000 images from the MS-COCO 2017 train set, ranked by automated visual complexity scores.

Click on the links below to view ~5,000 images from the MS-COCO 2014 validation set, ranked by human and automated visual complexity scores. The scoring was conducted by vislang.

Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. In D. Fleet, T. Pajdla, B. Schiele, & T. Tuytelaars (Eds.), Computer Vision – ECCV 2014 (Vol. 8693, pp. 740–755). Springer International Publishing. https://doi.org/10.1007/978-3-319-10602-1_48