I began my coding journey in the 6th grade, focusing on Arduino projects and IoT-based applications. Some of my early projects included obstacle-avoiding and line-following robots. As I grew older, my passion for music flourished, leading me to write and produce over six original tracks that have been published on more than 100 platforms and have amassed millions of streams worldwide. After high school, I returned to my engineering roots, pursuing Computer Science and Engineering. In my first year, I developed iOS applications in Swift, working extensively with frameworks like ARKit and RealityKit to integrate augmented reality into apps and building engaging animations and entities with SwiftUI. My interest in AI and ML emerged at the start of my second year, capturing my fascination and drawing me into Computer Vision and Generative AI. I am currently focused on research in this exciting field. While I haven't published any papers yet, I am eager to contribute to the body of knowledge soon!
15/05/2024 - 10/07/2024
Worked under the guidance of Prof. Debanga Raj Neog to develop a Continuous Sign Language Recognition and Translation Framework that interprets glosses from sign language videos and converts them into spoken sentences. This two-part system includes a Video-to-Gloss model using a CNN and a Bi-LSTM for spatiotemporal feature extraction and sequence modeling, and a Gloss-to-Text Retrieval System utilizing Transformer architectures for natural language understanding and generation. Additionally, it features an emotion model based on valence-arousal theory, employing multimodal sentiment analysis to classify the emotional content of the sign language video. The framework also includes a text-to-speech (TTS) module based on WaveNet and Tacotron models to render the translated text as audio. A minimal sketch of the video-to-gloss backbone appears below.
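The sketch below illustrates the kind of CNN + Bi-LSTM video-to-gloss architecture described above, using PyTorch. The ResNet-18 frame encoder, layer sizes, gloss vocabulary size, and the CTC-style blank class are illustrative assumptions, not the exact configuration used in the project.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class VideoToGloss(nn.Module):
    """CNN frame encoder + Bi-LSTM sequence model mapping video frames to gloss logits."""
    def __init__(self, gloss_vocab_size=1200, hidden_dim=512):
        super().__init__()
        cnn = resnet18(weights=None)          # per-frame spatial feature extractor (assumed)
        cnn.fc = nn.Identity()                # keep the 512-d pooled features
        self.cnn = cnn
        self.bilstm = nn.LSTM(512, hidden_dim, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, gloss_vocab_size + 1)  # +1 for a CTC blank

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))        # (B*T, 512)
        feats = feats.view(b, t, -1)                  # (B, T, 512)
        temporal, _ = self.bilstm(feats)              # (B, T, 2*hidden_dim)
        return self.classifier(temporal)              # per-frame gloss logits

model = VideoToGloss()
logits = model(torch.randn(2, 16, 3, 224, 224))       # 2 clips, 16 frames each
print(logits.shape)                                    # torch.Size([2, 16, 1201])
```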
27/01/2024 - 18/07/2024
Worked under the guidance of Prof. Anupam Agrawal to develop a Deep Learning framework for understanding and analyzing crowd behavior in video footage, focusing on the detection of anomalies and unusual events. The framework combines deep learning techniques such as LSTMs, ConvNets, and Multiple Instance Learning (MIL) to build a deep anomaly classification model that categorizes video segments into eight distinct classes of real-life crimes; a sketch of the MIL scoring idea follows below.
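The following is a minimal PyTorch sketch of MIL-style anomaly scoring of the kind used in such frameworks: each video is treated as a bag of segments, a small network scores every segment, and a ranking loss pushes the highest score in an anomalous video above the highest score in a normal one. The feature dimension (e.g., precomputed spatiotemporal features) and layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SegmentScorer(nn.Module):
    """Scores each video segment's anomaly level from precomputed spatiotemporal features."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, segments):               # segments: (num_segments, feat_dim)
        return self.net(segments).squeeze(-1)  # (num_segments,) anomaly scores in [0, 1]

def mil_ranking_loss(scores_anomalous, scores_normal, margin=1.0):
    """Hinge loss: the top segment of an anomalous video should outscore the top normal segment."""
    return torch.relu(margin - scores_anomalous.max() + scores_normal.max())

scorer = SegmentScorer()
loss = mil_ranking_loss(scorer(torch.randn(32, 2048)),   # one anomalous video, 32 segments
                        scorer(torch.randn(32, 2048)))   # one normal video, 32 segments
```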
20/10/2023 - 30/06/2024
Worked under the guidance of Prof. Kripabandhu Ghosh to develop a novel Ayurveda Retrieval Framework that takes patient representations as input, correlates them with medical representations, and predicts the corresponding Ayurvedic disease from Ayurvedic symptom categories (Doshas, Dhatus, Srotas). The framework is built on a novel dataset curated with doctors from an Ayurvedic institute. Our approach achieved a score 77.01 percent higher than vanilla large language models. Specifically, the framework uses Gemini Pro as its core component while integrating Retrieval-Augmented Generation (RAG) to augment its capabilities with domain-specific Ayurvedic knowledge, allowing more accurate and contextually relevant predictions of Ayurvedic diseases from patient representations. A sketch of this retrieve-then-generate pattern appears below.
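The sketch below shows the general retrieve-then-generate pattern described above, not the project's actual pipeline: the knowledge-base entries are invented placeholders, the TF-IDF retriever is a stand-in for whatever retrieval scheme the framework uses, and only the Gemini Pro call reflects a component named in the text.

```python
import google.generativeai as genai
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical knowledge-base entries describing diseases via Doshas, Dhatus, and Srotas.
knowledge_base = [
    "Amavata: aggravated Vata dosha with Ama accumulating in the joints ...",
    "Prameha: Kapha-dominant disorder affecting the Medas dhatu and Mutravaha srotas ...",
]

def retrieve(patient_text, docs, k=2):
    """Return the k knowledge-base entries most similar to the patient representation."""
    vec = TfidfVectorizer().fit(docs + [patient_text])
    sims = cosine_similarity(vec.transform([patient_text]), vec.transform(docs))[0]
    return [docs[i] for i in sims.argsort()[::-1][:k]]

def predict_disease(patient_text):
    """Augment the prompt with retrieved Ayurvedic context, then ask Gemini Pro for the disease."""
    context = "\n".join(retrieve(patient_text, knowledge_base))
    prompt = (f"Using the Ayurvedic context below, name the most likely disease.\n"
              f"Context:\n{context}\n\nPatient: {patient_text}\nDisease:")
    model = genai.GenerativeModel("gemini-pro")
    return model.generate_content(prompt).text

# genai.configure(api_key="...")  # required once before calling predict_disease
```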
As a Beta Microsoft Learn Student Ambassador (MLSA) in a community of over 4,000 peers, I developed and hosted web applications and machine learning models using Microsoft Azure and Azure ML. I actively engaged with the community by organizing events and inspiring peers with Microsoft technologies. Additionally, I conducted various sessions on AI, including several in the NLP domain.
As the Project & Research Head at Manipal University Jaipur’s ACM Student Chapter, I led various events and developed multiple projects. My responsibilities included setting goals and objectives, allocating resources, and ensuring projects were completed within specified timelines and budgets.