Papers

Outrunning LLM Cutoffs - A Live Kernel Crash Resolution Benchmark For All

Resources: Paper | ICML

I worked with Chenxi Huang on Live-kBench, which mitigates drawbacks of static benchmarks such as contamination from data that predates an LLM's knowledge cutoff. We built KEnv, an agent-agnostic computer-use environment that decouples agent workflows from heavyweight kernel compilation and execution. We curated Live-kBench, a continuously updated benchmark of kernel bugs, and showed empirically that agents perform up to 25% better on bugs from before their knowledge cutoff than on freshly discovered ones.

✦

kAgent: An execution-guided crash resolution agent for the Linux Kernel

Resources: Paper | DL4C - ICML

I worked with Chenxi Huang on a workflow-based agent inspired by how kernel developers diagnose and fix kernel bugs. The agent inspects relevant execution logs, generates execution-grounded hypotheses, and iteratively synthesizes and validates candidate patches. To support this workflow, we also built KGym++, a toolstack for efficient crash reproduction, execution-trace extraction, and patch validation.

✦

kBench - A Benchmark & Platform to test LLMs On Linux Kernel Crash Resolution

Resources: Paper, Project | NeurIPS

I worked with Chenxi Huang to establish the first benchmark and platform for testing LLMs on bug resolution in the Linux kernel. Our experiments showed that LLMs have substantial room for improvement when resolving bugs in complex, low-level software such as kernel code.

✦

COMEX - Generating Customized Source Code Representations

Resources: Paper, Project | ASE

I worked with Debeshee Das, Noble Saji Mathews, and Srikanth Tamilselvam (Manager, IBM Research) on tools that generate customized source code representations for any code snippet, whether complete, incomplete, or uncompilable. This capability is especially useful for applying static analysis to incomplete code that is fed to an LLM.

✦

Graph Neural Networks For The Recommendation Of Candidate Microservices

Resources: Paper, Project | IJCAI

I worked with Srikanth Tamilselvam (Manager, IBM Research) on the Candidate Microservice Advisor project. In this research thread, we experimented with different techniques for representing application software as graphs, which we then partitioned into smaller groups using clustering. I helped formulate this decomposition task as a constrained clustering problem over an embedding space learned by a heterogeneous graph neural network.

✦

Knowledge Graph Modelling For Mainframe Application Modernization

Resources: Paper | CODS-COMAD

I worked with Amith Singhee (Director, IBM Research) on research tools that simplify the modernization of legacy applications. I helped model legacy mainframe codebases as a fine-grained knowledge graph (KG). Using this KG, we developed methods that let application architects make data-driven decisions, enabling a smooth, incremental modernization of legacy codebases.

✦

Network Traffic Classification And Estimating User Experience

Resources: Paper | IWQoS

I worked on classifying encrypted network traffic and estimating user experience in internet applications. The network domain is considerably more challenging than language and vision because of the many error-inducing factors involved, such as network congestion, differing network architectures, and varying bandwidth capacities. I researched and extracted robust, reliable features that hold up under varying network conditions and provide a clear signal for fast and accurate classification.

✦

Adversarial Black-Box Attacks On Text Classifiers Using Genetic Algorithms Guided By Deep Networks

Resources: Paper | arXiv

My work focused on the robustness of popular text classification models against adversarial attacks. I generated adversarial examples using genetic algorithms guided by deep neural networks for many state-of-the-art text classifiers, including BERT, RoBERTa, and DistilBERT. This research contributed to understanding vulnerabilities in natural language processing models and to improving their defenses.