About Me

I'm currently a second-year Master's student in Computer Science at the University of Massachusetts Amherst, graduating in December 2018. I received my Bachelor of Technology in Electronics and Communication Engineering from the International Institute of Information Technology, Hyderabad.

I am in my element when working with any kind of visual data. My research interests lie in the areas of Computer Vision, Machine Learning, and Affective Computing. In the past, my work has involved data of multiple modalities (visual, audio, EEG, and gaze), all of which are both exciting and challenging to work with. I have also recently started exploring the domains of Visual Question Answering and NLP.

For the past two years, I have been a Google Summer of Code and Google Code-in mentor for an open-source organization called Red Hen Lab. Red Hen is a global laboratory and consortium for research on the theory of multimodal communication. My collaboration with them involves developing computational tools that help extract useful information from their large dataset, NewsScape. You can find out more about them here.


Affect Recognition in Advertisements

Advertisements contain strongly emotional content designed to leave a lasting impression on the viewer, so characterizing their affective content can facilitate online advertising and improve the end-user experience. This work examines affective content in ads from both human and computational perspectives. Specifically, a framework incorporating CNN features is designed to estimate affective content (emotional valence and arousal) in advertisements. We also explicitly compare content-centric and user-centric affect recognition (AR) methodologies for ads, and evaluate the impact of enhanced AR on computational advertising via a user study.
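The estimation step above can be sketched as a simple regressor from pooled CNN features to valence and arousal scores. This is a minimal illustration only, not the paper's actual model; the function names and the choice of ridge regression are my own assumptions:

```python
import numpy as np

def fit_affect_regressor(feats, targets, lam=1.0):
    """Fit a ridge regressor from CNN features to (valence, arousal).

    feats:   (N, D) pooled CNN features, one row per ad (hypothetical input)
    targets: (N, 2) ground-truth valence and arousal ratings
    Closed form: W = (X^T X + lam*I)^-1 X^T Y
    """
    d = feats.shape[1]
    return np.linalg.solve(feats.T @ feats + lam * np.eye(d), feats.T @ targets)

def predict_affect(feats, weights):
    # Returns an (N, 2) array of predicted (valence, arousal) scores.
    return feats @ weights
```

Any regressor (or a small MLP head) could stand in here; ridge regression just keeps the sketch closed-form and dependency-free.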

Two papers accepted at ACM International Conference on Multimedia (ACM MM) 2017 and ACM International Conference on Multimodal Interaction (ACM ICMI) 2017!

Visual Question Answering on FigureQA - Maluuba

In this group project, I worked with Microsoft Maluuba researchers to build models that perform visual reasoning on their FigureQA dataset. FigureQA is a synthetic dataset of over a million question-answer pairs over images of scientific charts and graphs. We implemented ideas based on task-specific architectures such as Relation Networks, as well as generic architectures like FiLM, and matched state-of-the-art performance on FigureQA.
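The core of FiLM (feature-wise linear modulation) is simple to state: a conditioning network reads the question and predicts a per-channel scale and shift, which is applied to the visual feature maps. A minimal sketch of that modulation step (shapes and names are my own, not Maluuba's code):

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation (Perez et al., 2018).

    features: (C, H, W) visual feature maps from a conv layer
    gamma, beta: (C,) per-channel scale and shift, predicted
                 from the question encoding by a separate network
    """
    # Broadcast the per-channel parameters over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]
```

In the full model this modulation is applied inside several residual blocks, with gamma and beta produced by an RNN over the question tokens.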

News Shot Classification (Google Summer of Code)

This is a Google Summer of Code project for the Red Hen Lab organization. The system automatically tags an archive of over 300,000 news videos based on their visual content. Specifically, I developed a classifier that categorizes news shots as one of Newsperson(s), Background roll, Graphics, Weather, or Sports. In addition, the system detects people and objects using YOLO, and extracts scene information using a model trained on the Places205 dataset. The entire pipeline is deployed on the Case Western Reserve University High Performance Computing cluster and tags each new day's news videos. For more detailed documentation, you can check out my blog!
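At its simplest, the shot-tagging step reduces to a five-way classifier over per-frame CNN features. The sketch below is a hypothetical illustration using a plain linear softmax layer; the deployed system's actual feature extractor and classifier are not shown here:

```python
import numpy as np

# The five shot categories from the project description.
CATEGORIES = ["Newsperson(s)", "Background roll", "Graphics", "Weather", "Sports"]

def classify_shot(frame_features, weights, bias):
    """Assign one of the five categories to a shot.

    frame_features: (D,) CNN features for a representative frame
    weights: (5, D), bias: (5,) -- a trained linear layer (hypothetical)
    Returns the predicted label and the softmax probabilities.
    """
    logits = weights @ frame_features + bias
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return CATEGORIES[int(np.argmax(probs))], probs
```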

Fast Neural Style Transfer and Artist Identification

This project explores methods for artistic style transfer based on convolutional neural networks. The core idea proposed by Gatys et al. became very popular, and with further research Johnson et al. overcame a major limitation, achieving style transfer in real time (as in the Prisma app, for example). We implement two research papers capable of fast mixed style transfer. Another problem we worked on is artist identification of fine art paintings: a challenging task primarily handled by art historians, and useful for digitizing a vast variety of artworks. Additionally, we assess our artist identification models on stylized images of unseen content.
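The style representation introduced by Gatys et al. is the Gram matrix of a layer's feature maps, and the style loss is the distance between the Gram matrices of the generated and style images. A minimal NumPy sketch of that computation (normalization conventions vary between implementations):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of (C, H, W) feature maps from one conv layer.

    Entry (i, j) is the correlation between channels i and j,
    which captures texture/style independently of spatial layout.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(generated, style):
    # Squared Frobenius distance between the two Gram matrices.
    return np.sum((gram_matrix(generated) - gram_matrix(style)) ** 2)
```

In the full method this loss is summed over several layers and combined with a content loss; the fast variants train a feed-forward network to minimize the same objective in a single pass.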

Soccer Video Analytics

While watching a game on TV, we automatically perceive certain information based on what we see. An analytics system should be able to process this information, as well as information that is less obvious, such as game statistics. This work aims to track player and ball positions and generate event statistics. The project mainly involves the exploration of various image processing techniques.

Gaze Driven Video Editing

With the wide variety of media devices in use, a video produced at one aspect ratio often needs to be viewed on screens with different aspect ratios. Displaying a video at a different aspect ratio may alter its content or reduce its appeal to viewers. This system aims to preserve the narrative and impact of the video by re-editing it based on the implicit information revealed by tracking viewers' gaze.

Things I Love to Do

  • Open Source Dev
  • Visit new places
  • Research, any kind
  • Whistle any tune or play it on a flute
  • Read Sci-Fi
  • Play Mario Bros.

Contact Me

  • Currently Located:
    University of Massachusetts Amherst, USA

  • Email:
  • Links