-
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations; 2019 [pdf]
Dan Hendrycks, Thomas Dietterich
-
Unsupervised Domain Adaptation by Backpropagation; ICML 2015 [pdf]
Yaroslav Ganin, Victor Lempitsky
-
Detecting and Correcting for Label Shift with Black Box Predictors; ICML 2018 [pdf]
Zachary C. Lipton, Yu-Xiang Wang, Alex Smola
-
Intriguing properties of neural networks; 2014 [pdf]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
-
Explaining and Harnessing Adversarial Examples; 2015 [pdf]
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
-
Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks; CAV 2017 [pdf]
Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer
-
An abstract domain for certifying neural networks; POPL 2019 [pdf]
Gagandeep Singh, Timon Gehr, Markus Püschel, Martin Vechev
-
Provable defenses against adversarial examples via the convex outer adversarial polytope; ICML 2018 [pdf]
Eric Wong, J. Zico Kolter
-
Certified adversarial robustness via randomized smoothing; ICML 2019 [pdf]
Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter
-
Jailbreaking Black Box Large Language Models in Twenty Queries; 2023 [pdf]
Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
-
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks; 2023 [pdf]
Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
-
On Calibration of Modern Neural Networks; 2017 [pdf]
Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger
-
Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation; 2020 [pdf]
Sangdon Park, Osbert Bastani, Jim Weimer, Insup Lee
-
A tutorial on conformal prediction; 2007 [pdf]
Glenn Shafer, Vladimir Vovk
-
PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction; 2020 [pdf]
Sangdon Park, Osbert Bastani, Nikolai Matni, Insup Lee
-
PAC Prediction Sets Under Label Shift; 2024 [pdf]
Wenwen Si, Sangdon Park, Insup Lee, Edgar Dobriban, Osbert Bastani
-
PAC Prediction Sets Under Covariate Shift; 2022 [pdf]
Sangdon Park, Edgar Dobriban, Insup Lee, Osbert Bastani
-
PAC Prediction Sets for Large Language Models of Code; 2023 [pdf]
Adam Khakhar, Stephen Mell, Osbert Bastani
-
TRAC: Trustworthy Retrieval Augmented Chatbot; 2024 [pdf]
Shuo Li, Sangdon Park, Insup Lee, Osbert Bastani
-
Distinguishing Two Dimensions of Uncertainty; 2011 [pdf]
Craig Fox, Gülden Ülkümen
-
Deep Exploration via Bootstrapped DQN; 2016 [pdf]
Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy
-
Simple and scalable predictive uncertainty estimation using deep ensembles; 2017 [pdf]
Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
-
Fairness Through Awareness; 2012 [pdf]
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Rich Zemel
-
Equality of Opportunity in Supervised Learning; 2016 [pdf]
Moritz Hardt, Eric Price, Nathan Srebro
-
Inherent Trade-Offs in the Fair Determination of Risk Scores; 2016 [pdf]
Jon Kleinberg, Sendhil Mullainathan, Manish Raghavan
-
Counterfactual Fairness; 2017 [pdf]
Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva
-
Calibration for the (Computationally-Identifiable) Masses; 2018 [pdf]
Úrsula Hébert-Johnson, Michael P. Kim, Omer Reingold, Guy N. Rothblum
-
FairSquare: probabilistic verification of program fairness; 2017 [pdf]
Aws Albarghouthi, Loris D'Antoni, Samuel Drews, Aditya Nori
-
Verifying Fairness Properties via Concentration; 2019 [pdf]
Osbert Bastani, Xin Zhang, Armando Solar-Lezama
-
Algorithms for Fairness in Sequential Decision Making; 2021 [pdf]
Min Wen, Osbert Bastani, Ufuk Topcu
-
Rethinking Fairness for Human-AI Collaboration; 2024 [pdf]
Haosen Ge, Hamsa Bastani, Osbert Bastani
-
SmoothGrad: removing noise by adding noise; Workshop on Visualization for Deep Learning, ICML 2017 [pdf]
Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg
-
A Unified Approach to Interpreting Model Predictions; NeurIPS 2017 [pdf]
Scott Lundberg, Su-In Lee
-
"Why Should I Trust You?": Explaining the Predictions of Any Classifier; KDD 2016 [pdf]
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
-
Stability Guarantees for Feature Attributions with Multiplicative Smoothing; NeurIPS 2023 [pdf]
Anton Xue, Rajeev Alur, Eric Wong
-
Counterfactual Visual Explanations; ICML 2019 [pdf]
Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee
-
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV); ICML 2018 [pdf]
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, Rory Sayres
-
Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR; Harvard Journal of Law and Technology 2018 [pdf]
Sandra Wachter, Brent Mittelstadt, Chris Russell
-
Understanding Black-box Predictions via Influence Functions; ICML 2017 [pdf]
Pang Wei Koh, Percy Liang
-
Datamodels: Predicting Predictions from Training Data; ICML 2022 [pdf]
Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, Aleksander Madry
-
TRAK: Attributing Model Behavior at Scale; ICML 2023 [pdf]
Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry
-
Scallop: A Language for Neurosymbolic Programming; PLDI 2023 [pdf]
Ziyang Li, Jiani Huang, Mayur Naik
-
Relational Programming with Foundation Models; AAAI 2024 [pdf]
Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao, William Dodds, Neelay Velingker, Rajeev Alur, Mayur Naik