About Me

Welcome to my corner of the internet! I am a 4th year PhD student at the University of Pennsylvania advised by Prof. Mayur Naik. My research interests span Programming Languages and Machine Learning. Specifically, my research leverages techniques from program synthesis and analysis to build tools and frameworks that enable machine learning practitioners to effectively understand where their models fail and how to fix them. My other research interests include developing program synthesis techniques to streamline software analysis, bug finding, and code generation.


News

  • Our paper "TorchQL: A Programming Framework for Integrity Constraints in Machine Learning" is conditionally accepted at OOPSLA 2024. Try out TorchQL here, and read our manuscript here.
  • I am very grateful to be awarded the 2023 Google PhD Fellowship in Programming Technology and Software Engineering.
  • Our paper "Relational Query Synthesis ⨝ Decision Tree Learning" will appear in VLDB 2024.

Research

While machine learning has seen several advances in recent years, with models achieving state-of-the-art performance on a variety of tasks, analyzing and understanding these models and their failures remains an ad hoc and often chaotic process. This is exacerbated by the lack of tools and frameworks that let practitioners interactively explore their models in an intuitive and accessible way. My research aims to bridge this gap by developing novel techniques and tools for the systematic analysis and debugging of machine learning models.

TorchQL

One of the key challenges in analyzing machine learning models is the lack of a uniform interface for interacting with them. While several tools allow practitioners to analyze their models, these tools are often limited in scope and are not easily extensible to new models or tasks. To address this, I am developing TorchQL, a novel query language that lets practitioners directly query their models and datasets in a style akin to querying systems such as SQL and MongoDB. Practitioners can craft complex queries that characterize the errors in their models and that generalize to identify similar errors in unseen data as well as in other models. These queries can identify a range of issues, from simple classification errors to violations of domain knowledge, biases and labeling errors in the training data, distribution shift, and more. Read more about TorchQL in our paper here.

SQRL

In addition to TorchQL, my framework SQRL (pronounced "squirrel") uses data-driven program synthesis techniques to characterize the errors in machine learning models in terms of grounded concepts and relations that are intuitive to practitioners. You can read more about SQRL in our blog post here.

Program Synthesis

I am also interested in using machine learning to develop program synthesis techniques, specifically Inductive Logic Programming (ILP), and in applying them to a variety of tasks, including software analysis, bug finding, and code generation.

If any of this interests you, I am actively looking for collaborators and would love to chat! Feel free to reach out to me by email here.


Publications

Recent Manuscripts

Interactive Code Generation via Test-Driven User-Intent Formalization
Shuvendu K. Lahiri*, Aaditya Naik*, Georgios Sakkas*, Piali Choudhury, Curtis von Veh, Madanlal Musuvathi, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao

Conference Papers

TorchQL: A Programming Framework for Integrity Constraints in Machine Learning
Aaditya Naik, Adam Stein, Yinjun Wu, Mayur Naik, Eric Wong
Relational Query Synthesis ⨝ Decision Tree Learning
Aaditya Naik, Aalok Thakkar, Adam Stein, Mayur Naik, Rajeev Alur
Do Machine Learning Models Learn Statistical Rules Inferred from Data?
Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation
Pardis Pashakhanloo, Aaditya Naik, Yuepeng Wang, Hanjun Dai, Petros Maniatis, Mayur Naik
Sporq: An Interactive Environment for Exploring Code Using Query-by-Example
Aaditya Naik, Jonathan Mendelson, Nathaniel Sands, Yuepeng Wang, Mayur Naik, Mukund Raghothaman
Example-Guided Synthesis of Relational Queries
Aalok Thakkar, Aaditya Naik, Nate Sands, Mukund Raghothaman, Mayur Naik, Rajeev Alur
GenSynth: Synthesizing Datalog Programs without Language Bias
Jonathan Mendelson*, Aaditya Naik*, Mukund Raghothaman, Mayur Naik
Code2Inv: A Deep Learning Framework for Program Verification
Xujie Si*, Aaditya Naik*, Hanjun Dai, Mayur Naik, Le Song

Workshop Papers

Learning to Walk over Relational Graphs of Source Code
Pardis Pashakhanloo, Aaditya Naik, Hanjun Dai, Petros Maniatis, Mayur Naik