Amey Hengle


Hello! I am a Predoctoral Researcher / Research Associate at the Laboratory for Computational Social Systems (LCS2), IIT Delhi, advised by Dr. Tanmoy Chakraborty. My research interests lie at the intersection of machine learning and natural language processing, with a particular focus on social computing, LLMs, and low-resource languages. Before joining LCS2, I was a Machine Learning Engineer at a top Indian product-based startup, Skit.ai, where I led multiple projects spanning spoken language understanding (SLU), dialog systems, and speech.

I completed my bachelor's degree in Computer Engineering at PVG's College of Engineering and Technology (affiliated to Savitribai Phule Pune University). During my final year, I conducted research under the supervision of Prof. Manisha Marathe. Our study focused on discovering psychological cues in online mental-health communities related to PTSD (Post-Traumatic Stress Disorder) and CPTSD (Complex Post-Traumatic Stress Disorder). I also worked as a Research Intern (NLP) at Optimum Data Analytics, Pune, where I built NLP solutions for low-resource Indian languages and a conversational AI chatbot for addressing mental-health concerns.

For a quick overview of my work, please see the Projects section.

Links: [Resume] [Github] [Google Scholar]



Publications

  • Combining Context-Free and Contextualized Representations for Arabic Sarcasm Detection and Sentiment Identification. [Paper]
    Amey Hengle, Atharva Kshirsagar, Shaily Desai and Manisha Marathe.
    EACL 2021 Workshop on Arabic Natural Language Processing (WANLP).

  • Cluster Analysis of Online Mental Health Discourse using Topic-Infused Deep Contextualized Representations. [Paper]
    Atharva Kulkarni, Amey Hengle, Pradnya Kulkarni, and Manisha Marathe.
    EACL 2021 Workshop on Health Text Mining and Information Analysis (LOUHI).

  • An Attention Ensemble Approach for Efficient Text Classification of Indian Languages. [Paper]
    Atharva Kulkarni, Amey Hengle, and Rutuja Udyawar.
    The 17th International Conference on Natural Language Processing (ICON 2020).

  • Smart Cap: A Deep Learning and IoT Based Assistant for the Visually Impaired. [Paper]
    Amey Hengle, Atharva Kulkarni, Nachiket Bavadekar, Niraj Kulkarni, and Rutuja Udyawar.
    The Third IEEE International Conference on Smart Systems and Inventive Technology (ICSSIT 2020).


Experience

Machine Learning Engineer

Skit.ai
  • Currently working on the end-to-end design, implementation, and deployment of Skit's voicebot product.
  • My primary responsibilities include improving the voicebot's NLU, NER, and distress-detection capabilities by implementing multilingual, multimodal (audio and text) PLMs.
  • I am also working on projects like label-noise detection, unsupervised intent discovery, and domain-biasing our in-house speech-to-text software.
Aug 2021 - Present
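
Label-noise detection can be done in several ways, and the method used at Skit is not described here. As an illustration only, the sketch below implements a simplified version of the "confident joint" step from confident learning: each class gets a threshold equal to the model's average probability for that class over examples annotated with it, and an example is flagged when the class the model is confident about differs from its annotation. The function name and the toy probabilities are hypothetical.

```python
from collections import defaultdict


def flag_label_issues(labels, probs):
    """Return indices of examples whose annotated label looks suspicious.

    labels: annotated class id per example (possibly noisy)
    probs:  per-example class-probability lists from a model scored
            out-of-sample (e.g. via cross-validation)
    """
    # Per-class threshold: mean probability the model assigns to class c
    # over the examples annotated as c ("self-confidence").
    totals, counts = defaultdict(float), defaultdict(int)
    for y, p in zip(labels, probs):
        totals[y] += p[y]
        counts[y] += 1
    thresholds = {c: totals[c] / counts[c] for c in totals}

    flagged = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes whose probability clears their own threshold
        above = [c for c, t in thresholds.items() if p[c] >= t]
        if not above:
            continue  # the model is not confident about any class here
        # The model's "confident label" for this example
        confident = max(above, key=lambda c: p[c])
        if confident != y:
            flagged.append(i)
    return flagged


labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
         [0.1, 0.9], [0.4, 0.6], [0.05, 0.95]]
print(flag_label_issues(labels, probs))  # [2]
```

Example 2 is annotated as class 0 but the model is confidently predicting class 1, so it is flagged for review; example 4 is not flagged because the model is not confident about either class.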

Data Science Engineer

Twimbit
  • Led the design and development of Twimbit's unsupervised topic discovery, semantic similarity search, and Kubernetes integration projects.
  • Improved overall topic discovery by 35% by implementing a hybrid PLM that combined feature vectors from an LDA model with contextual embeddings from XLM-RoBERTa.
April 2021 - July 2021
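
Semantic similarity search usually reduces to ranking documents by the cosine similarity between their embeddings and a query embedding. The sketch below is a minimal brute-force version under that assumption; the two-dimensional vectors are toy stand-ins for real embeddings (for example from XLM-RoBERTa), and `search` is a hypothetical helper, not Twimbit's code.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, index, k=3):
    """Return the ids of the k documents most similar to the query.

    index: mapping of document id -> precomputed embedding
    """
    return heapq.nlargest(k, index, key=lambda doc_id: cosine(query_vec, index[doc_id]))


index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(search([1.0, 0.1], index, k=2))  # ['a', 'c']
```

A linear scan is fine for small collections; larger ones would typically swap in an approximate nearest-neighbour index.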

Research Collaborator

Cognitive and Behavioural Neuroscience Lab, IIT Bombay
  • Currently working on two research problems in the area of computational social science and clinical psychology
  • The first project entails the linguistic analysis and classification of Depression-Anxiety comorbid posts from Reddit.
  • As part of the second project, I am working on explainable deep neural networks for depression classification from social-media posts.
May 2021 - Present

Research Assistant

CS Department, PVG’s College of Engineering | Advisor: Prof. Manisha Marathe
  • Researched the literature revolving around applications of natural language processing in computational psychology, with a particular focus on the discourse pertaining to mental-health issues on social media platforms.
  • Proposed Topic-Infused Deep Contextualized Representations (TIDCR), a novel document representation method that combines contextualized representations from RoBERTa with topical representations from LDA using a concatenated denoising autoencoder.
  • Performed dimensionality reduction with UMAP and clustered the reduced representations using HDBSCAN and k-means.
  • Collaborated with a clinical-psychology professor to qualitatively categorize each cluster and identify its latent themes.
Aug 2020 - Feb 2021
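
The core idea behind TIDCR, fusing a topic view and a contextual view of each document before clustering, can be illustrated with a much smaller pipeline. The sketch below skips the autoencoder and UMAP steps entirely: it simply concatenates a toy LDA topic distribution with a toy contextual embedding and clusters the result with plain k-means. The `gamma` weight and all vectors are hypothetical, so treat this as an illustration of the fusion-then-cluster pattern, not as the TIDCR method itself.

```python
import math
import random


def combine(topic_vec, context_vec, gamma=1.0):
    """Concatenate an LDA topic distribution with a contextual embedding.

    gamma is a hypothetical weight that rescales the topic part so that
    neither view dominates the distance computations.
    """
    return [gamma * x for x in topic_vec] + list(context_vec)


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Toy documents: (topic distribution, contextual embedding)
docs = [
    combine([0.9, 0.1], [1.0, 0.0]),
    combine([0.8, 0.2], [0.9, 0.1]),
    combine([0.1, 0.9], [0.0, 1.0]),
    combine([0.2, 0.8], [0.1, 0.9]),
]
print(kmeans(docs, k=2))
```

The first two toy documents end up in one cluster and the last two in the other; which cluster gets id 0 depends on the random initialization.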

Research Intern (NLP)

Optimum Data Analytics
  • Researched the contemporary literature in computational psychology and natural language processing to formulate a solution that integrates the standard psychological conversational flows and tests with a chatbot.
  • Developed a retrieval-based chatbot using DialogFlow, Flask, and Firebase, and improved the data-analysis pipeline by caching user conversations on the server.
  • Worked on an attention ensemble CNN-BiLSTM model for subtask 1f of the TechDOfication shared task at ICON 2020. The submitted system topped the shared task rankings and also outperformed the organizers' system.
Aug 2020 - Dec 2020

Capstone Project Intern

Optimum Data Analytics
  • Built a deep learning and IoT-based assistant with multiple features: face recognition, image captioning, text recognition (OCR), and online news scraping.
  • Implemented face recognition using OpenCV and Python, and deployed the model on an Ubuntu web server using Flask, JavaScript, and AJAX.
May 2020 - July 2020
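
Face recognition built on dlib (for example through the `face_recognition` package) represents each face as a 128-dimensional encoding and treats two faces as the same person when the Euclidean distance between their encodings is within a tolerance; 0.6 is that package's default. The sketch below shows only this matching step, assuming encodings have already been computed. The `identify` helper and the three-dimensional toy vectors are hypothetical.

```python
import math


def identify(encoding, known_faces, tolerance=0.6):
    """Return the name of the closest known face, or None if nobody is close.

    encoding:    embedding of the face seen by the camera
    known_faces: mapping of person name -> reference embedding
    tolerance:   maximum Euclidean distance that still counts as a match
    """
    best_name, best_dist = None, math.inf
    for name, ref in known_faces.items():
        dist = math.dist(encoding, ref)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= tolerance else None


known = {"alice": [0.0, 0.0, 0.0], "bob": [1.0, 1.0, 1.0]}
print(identify([0.1, 0.0, 0.0], known))  # alice
print(identify([5.0, 5.0, 5.0], known))  # None
```

Lowering the tolerance trades fewer false matches for more "unknown" results.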

Application Developer Intern

Schlumberger
  • Developed an application for automating data pipelines in SAP using REST, Postman, and Tkinter.
  • Represented the Schlumberger Cloud-For-Customer (C4C) team at the SLB’s Global Hackathon challenge 2019.
  • Deployed existing API servers on Google Apigee using JavaScript and REST.
June 2019 - Aug 2019

Projects

A Hybrid transformer-based model for irony detection in Arabic.
  • Developed a multi-channel hybrid model to detect sarcasm in Arabic Tweets. The system was a part of our team SPPU_AASM’s submission in the ArSarcasm Shared Task-2021.
  • The model combines word representations generated by AraBERT, a language-specific transformer-based model, with static word vectors trained on an Arabic Twitter corpus.
  • Our model outperformed multiple baseline models for the task, ranking 2nd among 34 teams in the sarcasm detection subtask. [Project]
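
One simple way to fuse a contextual channel with a static channel, assumed here for illustration and not necessarily the exact SPPU_AASM architecture, is to average a transformer's subword vectors back to word level, concatenate each word's contextual vector with its static vector (a zero vector for out-of-vocabulary words), and pool the result into a sentence feature. All helper names and the tiny vectors below are hypothetical.

```python
def word_vectors_from_subwords(subword_vecs, word_ids):
    """Average subword embeddings back to one vector per word.

    word_ids[i] is the index of the word that subword i belongs to;
    every word is assumed to have at least one subword.
    """
    n_words = max(word_ids) + 1
    dim = len(subword_vecs[0])
    sums = [[0.0] * dim for _ in range(n_words)]
    counts = [0] * n_words
    for vec, w in zip(subword_vecs, word_ids):
        counts[w] += 1
        for d in range(dim):
            sums[w][d] += vec[d]
    return [[s / counts[w] for s in sums[w]] for w in range(n_words)]


def hybrid_features(words, contextual, static, static_dim):
    """Concatenate each word's contextual vector with its static vector.

    Words missing from the static vocabulary get a zero vector.
    """
    zero = [0.0] * static_dim
    return [list(ctx) + list(static.get(w, zero)) for w, ctx in zip(words, contextual)]


def mean_pool(vectors):
    """Average per-word features into one sentence-level vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Toy example: word "w1" is split into two subwords, "w2" into one
ctx = word_vectors_from_subwords([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1])
feats = hybrid_features(["w1", "w2"], ctx, {"w1": [0.5]}, static_dim=1)
print(feats)             # [[2.0, 3.0, 0.5], [5.0, 6.0, 0.0]]
print(mean_pool(feats))  # [3.5, 4.5, 0.25]
```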

An Attention Ensemble model for Marathi text classification.
  • Built a custom text-processing pipeline for low-resource languages written in the Devanagari script.
  • Implemented a parallel CNN-BiLSTM architecture with an attention mechanism to classify short Marathi paragraphs into their respective technical domains.
  • The proposed model was part of our team SPPU_AKAH's submission to the ICON 2020 shared task. It ranked 1st in subtask 1f, outperforming even the organizers' model. [Project]
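
The attention step in a CNN-BiLSTM model of this kind turns a sequence of BiLSTM hidden states into one summary vector by scoring each state, normalizing the scores with a softmax, and taking the weighted sum. The sketch below uses dot-product scoring against a context vector, a simplification of the usual additive form, and assumes the CNN branch's features are simply concatenated with the attended summary; the exact design in the paper may differ, and the numbers are toy values.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Attention-weighted sum of per-token hidden states.

    hidden_states: T vectors, e.g. BiLSTM outputs for one paragraph
    context:       the learned context vector (a fixed toy value here)
    """
    scores = [sum(h * c for h, c in zip(state, context)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights


def ensemble_features(cnn_features, lstm_states, context):
    """Concatenate the CNN branch's features with the attended BiLSTM summary."""
    pooled, _ = attention_pool(lstm_states, context)
    return list(cnn_features) + pooled


states = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(states, [0.0, 0.0]))  # ([0.5, 0.5], [0.5, 0.5])
print(ensemble_features([0.3], states, [0.0, 0.0]))  # [0.3, 0.5, 0.5]
```

With a zero context vector every token gets equal weight; a learned context vector shifts the weight toward tokens that matter for the domain label.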

BuddyBot: A conversational AI for stress detection
  • BuddyBot AI is a retrieval-based chatbot system designed to act as a conversation starter before users engage in psychological counselling sessions.
  • The system comes integrated with a sentiment identification mechanism to keep track of the polarity of a conversation and identify any implicit negative sentiments.
  • The chatbot's conversational flows are designed in accordance with the Perceived Stress Scale.
  • The chatbot's conversational user interface is built with DialogFlow. The backend combines Flask (for handling web requests) and Firebase (for real-time tracking of messages). To handle multiple concurrent conversations, the system is deployed on an Ubuntu server.
  • Currently, the chatbot supports Telegram as its primary user-interface. [Project]
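
How BuddyBot scores the Perceived Stress Scale is not described above. Assuming the common 10-item version (PSS-10), each answer is rated from 0 (never) to 4 (very often), the positively worded items 4, 5, 7 and 8 are reverse-scored, and the total ranges from 0 to 40. The bands in `stress_band` (0-13 low, 14-26 moderate, 27-40 high) are a widely used rule of thumb rather than a clinical cut-off, and both helper names are hypothetical.

```python
# Items 4, 5, 7 and 8 of the PSS-10 are positively worded and reverse-scored.
REVERSED_ITEMS = {4, 5, 7, 8}


def pss10_score(responses):
    """Total PSS-10 score (0-40) from ten answers on a 0 (never) to 4 (very often) scale."""
    if len(responses) != 10:
        raise ValueError("PSS-10 needs exactly 10 responses")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if answer not in range(5):
            raise ValueError(f"item {item}: answer must be an integer 0-4")
        total += 4 - answer if item in REVERSED_ITEMS else answer
    return total


def stress_band(score):
    """Commonly used, non-diagnostic bands: 0-13 low, 14-26 moderate, 27-40 high."""
    if score <= 13:
        return "low"
    if score <= 26:
        return "moderate"
    return "high"


answers = [3, 2, 3, 1, 1, 2, 1, 1, 2, 3]
score = pss10_score(answers)
print(score, stress_band(score))  # 27 high
```

In a chat flow, each scale item would be asked as one conversational turn and the answers collected into this list before scoring.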


Smart Cap: AI powered visual assistant
  • Smart Cap is a visual assistant built to aid people with partial or complete vision impairment. The system runs deep learning models on an embedded system (Raspberry Pi 4), which serves as the central computation unit.
  • The Face Recognition module is based on OpenCV and dlib's face recognition project.
  • The Image Captioning module uses an encoder-decoder architecture, with a ResNet-101 (pretrained on ImageNet) as the encoder and an attention-based LSTM as the decoder. The model is trained on the Microsoft COCO 2014 dataset.
  • OCR is implemented using Google's Vision API, combined with statistical approaches like z-scores to identify whether a page is divided into one or more columns.
  • The voice interface was integrated using pyttsx3 for Text-to-Speech and GSTT for Speech-to-Text. [Demo]
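
The exact z-score rule used for column detection is not spelled out above. One plausible heuristic, sketched here as an assumption, is to sort the left x-coordinates of the OCR'd text lines and look at the gaps between neighbours: within a column the gaps are tiny, while the jump from one column to the next is an outlier with a large z-score. The function name and threshold are hypothetical.

```python
import statistics


def count_columns(line_left_edges, z_threshold=2.0):
    """Estimate how many text columns a page has.

    line_left_edges: left x-coordinate of each OCR'd text line
    A gap between consecutive sorted edges whose z-score exceeds
    z_threshold is treated as the jump from one column to the next.
    """
    xs = sorted(line_left_edges)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    if len(gaps) < 2:
        return 1
    mean = statistics.mean(gaps)
    std = statistics.pstdev(gaps)
    if std == 0:
        return 1  # all gaps identical, so none stands out
    breaks = sum(1 for g in gaps if (g - mean) / std > z_threshold)
    return breaks + 1


print(count_columns([100, 101, 99, 100, 102, 100]))            # 1
print(count_columns([100, 101, 99, 100, 400, 401, 399, 400]))  # 2
```

One caveat: with a population standard deviation, a single outlier among n gaps can never score above sqrt(n - 1), so pages with very few lines need a lower threshold or an absolute minimum gap.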

Dynamic-ship-routing-algorithm
  • Developed a graph-based strategy to connect all latitude-longitude coordinates in a shipping lane.
  • Used beam search over this graph to find the best route between two ports. [Project]
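
The routing step can be pictured as beam search over a graph whose nodes are lat-long waypoints and whose edge costs are great-circle distances. In the sketch below, partial routes are ranked by distance travelled plus the straight-line distance still left to the destination, and only the best `beam_width` routes survive each step. The graph, the ranking heuristic, and the helper names are assumptions for illustration, not the repository's implementation.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def great_circle_nm(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def beam_search(graph, coords, start, goal, beam_width=3, max_depth=50):
    """Return (cost_nm, path) of the best route found, or None if none is found."""
    if start == goal:
        return 0.0, [start]
    beam = [(0.0, [start])]
    best = None
    for _ in range(max_depth):
        candidates = []
        for cost, path in beam:
            here = path[-1]
            for nxt in graph.get(here, []):
                if nxt in path:
                    continue  # avoid cycles
                new_cost = cost + great_circle_nm(coords[here], coords[nxt])
                if nxt == goal:
                    if best is None or new_cost < best[0]:
                        best = (new_cost, path + [nxt])
                else:
                    candidates.append((new_cost, path + [nxt]))
        if not candidates:
            break
        # Rank partial routes by distance so far plus distance left to the goal
        candidates.sort(key=lambda c: c[0] + great_circle_nm(coords[c[1][-1]], coords[goal]))
        beam = candidates[:beam_width]
    return best


coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (1.0, 1.0), "D": (0.0, 2.0)}
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
cost, path = beam_search(graph, coords, "A", "D", beam_width=2)
print(path)  # ['A', 'B', 'D']
```

A wider beam explores more alternatives at the cost of speed; unlike A*, beam search can miss the optimal route when the beam is too narrow.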


nautical-calculations
  • nautical-calculations is a first-of-its-kind Python library that implements geospatial calculations such as bearing angle, rhumb line, and great-circle distance.
  • The library also supports custom user functions, such as computing the midpoint of, or equidistant points along, a rhumb line or a great circle. [PyPI package]
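
The calculations listed above follow standard spherical-Earth formulas. The sketch below shows one version of each for initial bearing, great-circle (haversine) distance, rhumb-line distance, and the great-circle midpoint. The function names are hypothetical and are not the package's API; a spherical Earth with a mean radius of 3440.065 nautical miles is assumed.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def initial_bearing(a, b):
    """Initial great-circle bearing from a to b, in degrees clockwise from north."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360) % 360


def great_circle_nm(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def rhumb_line_nm(a, b):
    """Distance along the line of constant bearing (loxodrome) from a to b."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    if abs(dlon) > math.pi:  # take the shorter way across the antimeridian
        dlon -= math.copysign(2 * math.pi, dlon)
    # Difference in Mercator-projected latitude
    dpsi = math.log(math.tan(math.pi / 4 + lat2 / 2) / math.tan(math.pi / 4 + lat1 / 2))
    # On an east-west course dpsi is 0, so use cos(latitude) instead
    q = dlat / dpsi if abs(dpsi) > 1e-12 else math.cos(lat1)
    return math.hypot(dlat, q * dlon) * EARTH_RADIUS_NM


def great_circle_midpoint(a, b):
    """Midpoint of the great-circle arc from a to b, as (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlon = lon2 - lon1
    bx = math.cos(lat2) * math.cos(dlon)
    by = math.cos(lat2) * math.sin(dlon)
    lat_m = math.atan2(math.sin(lat1) + math.sin(lat2), math.hypot(math.cos(lat1) + bx, by))
    lon_m = lon1 + math.atan2(by, math.cos(lat1) + bx)
    return math.degrees(lat_m), (math.degrees(lon_m) + 540) % 360 - 180


a, b = (50.0, -5.0), (58.0, 3.0)
print(round(initial_bearing(a, b), 1), round(great_circle_nm(a, b), 1), round(rhumb_line_nm(a, b), 1))
print(great_circle_midpoint(a, b))
```

The rhumb-line distance is never shorter than the great-circle distance; the two coincide along the equator and along meridians.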

Achievements

  • Secured 1st rank in the ICON 2020 TechDOfication shared task.
  • Secured 2nd rank in the WANLP 2021 ArSarcasm shared task.
  • Selected among the top 20 (out of 30k) teams at the ZS healthcare innovation competition.
  • Won 2nd prize at ASPIRE 2020, a national-level project competition organized by the Computer Society of India (CSI) for final-year capstone projects.

Contact