About

4th year Ph.D. Student

Department of Electrical & Computer Engineering

Worcester Polytechnic Institute, MA, USA

Welcome to my website! I am pursuing my Ph.D. under the supervision of Dr. Bashima Islam at the Department of Electrical and Computer Engineering, WPI.

My research centers on spatial acoustic reasoning for Audio Large Language Models, focusing on how AI systems can infer geometry, location, and physical context from sound using stepwise, verifiable rewards for transparent and interpretable reasoning. I also study multi-modal large language models (MLLMs) that integrate audio, speech, sensor, and vision inputs, with an emphasis on efficient modality switching for egocentric perception in dynamic real-world environments.

I worked as a Research Scientist Intern with the Audio Research Group at Meta Reality Labs in the summer of 2024 and as a Part-Time Student Researcher with the same group in Fall 2024. Previously, I worked as a Software Engineer, AI & IoT at Advanced Chemical Industries Limited (ACI).

Prior to joining WPI, I completed my Bachelor's in Electrical and Electronic Engineering from Bangladesh University of Engineering and Technology (BUET).

Please take some time to explore my website to learn more about me, my research, and my professional experience. Whether you are a fellow engineer, researcher, potential collaborator, or simply interested in talking with me about something, please feel free to contact me via email.


Seeking Opportunities

I'm actively looking for internship, full-time industrial research scientist, and post-doc positions focused on Reasoning Models with Verifiable Rewards, Multimodal Learning, Generative Modeling, and related fields. Reach out to me at sbiswas@wpi.edu if you think I would be a good fit!


News

01-2026

My internship work at Meta Reality Labs on hair-noise suppression for Ray-Ban Meta Glasses was accepted at ICASSP, 2026.

10-2025

Received Peter B. Myers Graduate Fellowship from Dept. of ECE, WPI.

08-2025

RAVEN got accepted at EMNLP Main Conference, 2025.

07-2025

LOCUS got accepted at EWSN, 2025.

06-2025

EgoAdapt got accepted at ICCV, 2025.

05-2025

Our paper QUADS got accepted at INTERSPEECH, 2025.

12-2024

Received Master of Science in Electrical and Computer Engineering from WPI.

08-2024

I will be joining Meta Reality Labs as a Part-Time Student Researcher.

06-2024

Our paper on multimodal disfluency detection got accepted at INTERSPEECH, 2024.

05-2024

I will be joining Meta Reality Labs as a Research Scientist Intern.

05-2024

Our paper FreeML got accepted at EWSN, 2024.

11-2023

Passed the Ph.D. diagnostic exam.

05-2023

Started working as a Graduate Research Assistant at BASH Lab, ECE, WPI.

08-2022

Started working as a Graduate Teaching Assistant at the Dept. of ECE, WPI.

08-2022

I will be starting my Ph.D. at BASH Lab, ECE, WPI.

02-2021

Received my Bachelor of Science in Electrical and Electronic Engineering from Bangladesh University of Engineering and Technology (BUET).

02-2021

Defended my undergrad thesis titled "A Deep Learning Based Energy Efficient Downlink Power Control Mechanism for Cellular Networks".

01-2021

I will be starting as a Software Engineer, AI & IoT at ACI Limited.


Education

Ph.D. in Electrical and Computer Engineering
Worcester Polytechnic Institute, Worcester, MA, USA
August 2022 - May 2027 (Expected)
Tentative Thesis Title: Toward Robust and Efficient Reasoning in Perceptually Grounded Multi-modal Large Language Models
MSc. in Electrical and Computer Engineering
Worcester Polytechnic Institute, Worcester, MA, USA
August 2022 - December 2024
BSc. in Electrical and Electronic Engineering
Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
February 2016 - February 2021
Thesis: A Deep Learning Based Energy Efficient Downlink Power Control Mechanism for Cellular Networks

Publications

Hair Noise Analysis and Mitigation for Smart Glasses Audio Captures
Subrata Biswas, Daniel Wong, Bashima Islam, Sanjeel Parekh, Vladimir Tourbabin
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'26)
DOI PDF Video Code
Head-worn devices such as augmented-reality (AR) and smart glasses introduce a previously overlooked form of audio degradation: hair noise, caused by the wearer’s hair brushing against device frames and embedded microphones. To the best of our knowledge, this phenomenon has not been systematically studied. This paper addresses this gap through three contributions. First, we conduct a user study quantifying the perceptual annoyance of hair noise. Second, we introduce the Hair Noise Mitigation (HNM) dataset, the first multi-channel corpus of hair noise collected across diverse real-world conditions. We further characterize its spectral and spatial properties, revealing a non-stationary and directionally dependent nature. Finally, we propose online and offline semi-supervised non-negative matrix factorization (NMF) methods as benchmark mitigation approaches, showing perceptual gains that motivate further research. Together, these contributions establish hair noise as a distinct challenge for wearable audio systems and lay the groundwork for tailored enhancement techniques.
RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam
Empirical Methods in Natural Language Processing (EMNLP'25) (main)
DOI PDF Video Code
Multimodal question answering (QA) often requires identifying which video, audio, or sensor tokens are relevant to the question. Yet modality disagreements are common: off-camera speech, background noise, or motion outside the field of view often mislead fusion models that weight all streams equally. We present RAVEN, a unified QA architecture whose core is QuART, a query-conditioned cross-modal gating module that assigns scalar relevance scores to each token across modalities, enabling the model to amplify informative signals and suppress distractors before fusion. RAVEN is trained through a three-stage pipeline comprising unimodal pretraining, query-aligned fusion, and disagreement-oriented fine-tuning - each stage targeting a distinct challenge in multi-modal reasoning: representation quality, cross-modal relevance, and robustness to modality mismatch.
LOCUS – LOcalization with Channel Uncertainty and Sporadic Energy
Subrata Biswas, Mohammad Nur Hossain Khan, Alex Colwell, Jack Adiletta, Bashima Islam
International Conference On Embedded Wireless Systems and Networks (EWSN'25)
DOI PDF Video Code
Accurate sound source localization (SSL), such as direction-of-arrival (DoA) estimation, relies on consistent multichannel data. However, batteryless systems often suffer from missing data due to the stochastic nature of energy harvesting, degrading localization performance. We propose LOCUS, a deep learning framework that recovers corrupted features in such settings. LOCUS integrates three modules: (1) Information-Weighted Focus (InFo) to identify corrupted regions, (2) Latent Feature Synthesizer (LaFS) to reconstruct missing features, and (3) Guided Replacement (GRep) to restore data without altering valid inputs.
QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding
Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam
INTERSPEECH'25
DOI PDF Video Code
Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression as distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy.
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
Payal Mohapatra, Shamika Likhite, Subrata Biswas, Bashima Islam, Qi Zhu
INTERSPEECH'24
DOI PDF Video Code
Most existing speech disfluency detection techniques only rely upon acoustic data. In this work, we present a practical multimodal disfluency detection approach that leverages available video data together with audio. We curate an audio-visual dataset and propose a novel fusion technique with unified weight-sharing modality-agnostic encoders to learn the temporal and semantic context.
Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems
Subrata Biswas, Pietro Farina, Eren Yildiz, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasim Sinan Yildirim
International Conference On Embedded Wireless Systems and Networks (EWSN'24)
DOI PDF Video Code
Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a memory space for storing ultra-tiny deep neural networks (DNNs). We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems.

Work Experience

Graduate Research Assistant
BASH Lab, Worcester, MA, USA
May 2022 – Present
Part-Time Student Researcher
Meta Reality Labs, Redmond, WA, USA
August 2024 – Present
Research Scientist Intern
Meta Reality Labs, Redmond, WA, USA
May 2024 – August 2024
Graduate Teaching Assistant
Dept. of ECE, WPI, Worcester, MA, USA
August 2022 – May 2023
Software Engineer, AI & IoT
Advanced Chemical Industries Limited, Dhaka, Bangladesh
February 2021 – August 2022

Awards

10-2025

Received Peter B. Myers Graduate Fellowship from Dept. of ECE, WPI.

08-2022

1st Runner up at Robi Datathon 2.0.

10-2020

5th at IEEE Video and Image Processing Cup.

04-2020

4th at IEEE Signal Processing Cup.

06-2019

Winner of Bangladesh Section, IEEE YESIST12 Innovation Challenge 2019.