Week 2 (May 23-27, 2022)

This week I solidified my understanding of BERT and the Transformer architecture. With the help of some tutorials from Professor Ordóñez-Román, I practiced fine-tuning models like ResNet and BERT for image and text classification on datasets like SUN and MM-IMDb. We spent some of our meetings this week reviewing these tutorials.
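To give a flavor of what fine-tuning looks like in practice, here's a minimal sketch of the usual recipe for adapting a pretrained ResNet: freeze the backbone and swap in a new classification head. The architecture choice and the 397-class output (which assumes the SUN397 variant of SUN) are my illustrative assumptions, not the exact tutorial code.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet (architecture choice is illustrative).
model = models.resnet18(pretrained=True)

# Optionally freeze the backbone so only the new head is trained at first.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with one sized for the target task;
# 397 classes assumes the SUN397 variant of the SUN dataset.
num_classes = 397
model.fc = nn.Linear(model.fc.in_features, num_classes)
```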

One of the most challenging exercises was to customize and train a CNN for scene classification with PyTorch, improving a basic model that initially topped out at 30% validation accuracy until it reached at least 50%. From this exercise I emerged with a better understanding of the convolution operation, batch normalization, and image augmentation. I also got a lot more practice with PyTorch syntax and commonly used functions and methods.
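For a sense of the ingredients (this is a minimal sketch, not the assignment's actual model): augmentation happens in the data transforms, and batch normalization slots in between each convolution and its activation. All the layer sizes and transform choices below are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import transforms

# Training-time augmentation: random crops and flips show the model
# a slightly different version of each image every epoch.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# A small conv -> batch norm -> ReLU -> pool stack, repeated twice,
# followed by a linear classifier.
class SceneCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```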

I ran into issues with GPU and RAM usage limits on Google Colab, which made me appreciate all the more getting access to the vislang lab servers this week. With some Linux CLI pointers from Professor Ordóñez-Román, I can now run scripts (and even remotely run Jupyter notebooks) over a VPN connection to Rice. I did have a bit of a problem at first matching my installations of PyTorch and CUDA to the lab machines' GPUs, but was able to fix this after some sleuthing on Stack Overflow. In general, I got a lot more practice with package managers (pip, Anaconda), setting up and properly using virtual environments, and the JupyterLab interface.
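For anyone hitting the same wall: a build mismatch like this shows up quickly with a few sanity checks in Python. These aren't the exact commands I ran, just the kind of check that exposes a PyTorch/CUDA mismatch after setting up a new environment.

```python
import torch

# Confirm the installed PyTorch build actually matches the machine's GPUs.
print(torch.__version__)          # PyTorch build version
print(torch.version.cuda)         # CUDA version this build was compiled against
print(torch.cuda.is_available())  # False usually means a build/driver mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g., the lab GPU model
```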

Another stumbling block was running out of space on one of the lab servers. Luckily, there are two servers, and for now I can use the other one.

In our last meeting this afternoon, Professor Ordóñez-Román and I discussed next steps for choosing a dataset and image complexity metric for my project (identifying complex images by their textual descriptions). Before our next meeting I’ll be reading more papers on possible datasets to use and taking notes on important information like size, availability of captions, etc., as well as different definitions of “complexity.” Once we choose a dataset, I’ll need to do some preliminary inspection of the data and generate summary statistics like average caption length. Depending on the dataset, we may have to collect our own caption/image descriptions or generate complexity scores, since not all datasets have all the components necessary for the task as formulated.
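Once we settle on a dataset, the summary-statistics pass could be as simple as the sketch below. The caption_stats helper and the word-level length measure are placeholder assumptions on my part; the real code will depend on the dataset's format.

```python
import statistics

# Hypothetical helper: assumes captions have already been loaded as a
# list of strings (the loading code will depend on the chosen dataset).
def caption_stats(captions):
    lengths = [len(c.split()) for c in captions]  # word count per caption
    return {
        "num_captions": len(lengths),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        "max_length": max(lengths),
    }

# Toy usage with made-up captions:
print(caption_stats(["a dog on a beach", "two people walking in a crowded market"]))
```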

I also attended the DREU Kick-off Webinar this afternoon, which shared some helpful tips for getting the most out of this program. Additionally, I got my housing assignment at Rice this week and more information on how to access things like laundry and the dining halls (hard to believe I'll be there in just over two weeks!). My assigned dorm is even near the Computer Science Department building.

Written on May 27, 2022