Machine Learning Interpretability: The Most Comprehensive Collection of Methods, Papers, and Resources


    Machine learning applications in business aim to output decisions and judgments. Interpretability is the degree to which a human can understand the reason behind a decision. The more interpretable a machine learning model is, the easier it is for people to understand why certain decisions or predictions were made. Model interpretability covers understanding both the model's internal mechanism and the model's outputs. Its importance shows up in two phases: during modeling, it helps developers understand the model, compare and select candidate models, and adjust or optimize them when necessary; once the model is in production, it lets you explain the model's internal mechanism to business stakeholders and explain its outputs. For example, a fund recommendation model needs to explain why a particular fund was recommended to a given user.

     

    This resource collects the main methods in interpretable machine learning, together with the latest papers, code, and data for each method, and is shared here for anyone who needs it.

     

    The resources were compiled from the web; original source:

https://github.com/stefanoteso/awesome-explanatory-supervision

 


 

Passive Learning

     Approaches that supervise the model's explanations.

    

    Tangent Prop - A formalism for specifying selected invariances in an adaptive network Patrice Simard, Bernard Victorri, Yann Le Cun, John Denker; NeurIPS 1992 paper Notes: injects invariances into a neural net by regularizing its gradient; precursor to learning from gradient-based explanations.
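
    A minimal PyTorch sketch of the idea behind this penalty (my own illustration, not the paper's code): the directional derivative of the network output along a tangent vector of a known transformation is pushed toward zero, making the model locally invariant to that transformation. The names `model`, `tangent`, and `lambda_reg` are illustrative assumptions.

```python
# Tangent-prop-style penalty: penalize the directional derivative of the
# network output along a tangent vector t(x) of a known transformation.
# Illustrative sketch only; names are assumptions, not the paper's code.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

def tangent_prop_penalty(x, tangent, model):
    """Squared directional derivative of the output along `tangent`."""
    x = x.clone().requires_grad_(True)
    y = model(x).sum()
    grad_x, = torch.autograd.grad(y, x, create_graph=True)
    # Inner product of the input gradient with the tangent vector, per example.
    dir_deriv = (grad_x * tangent).sum(dim=1)
    return (dir_deriv ** 2).mean()

x = torch.randn(8, 2)
target = torch.randn(8, 1)
tangent = torch.randn(8, 2)   # tangent of the transformation at each x
lambda_reg = 0.1

loss = nn.functional.mse_loss(model(x), target) \
       + lambda_reg * tangent_prop_penalty(x, tangent, model)
loss.backward()
```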

    

    Right for the right reasons: training differentiable models by constraining their explanations Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez; IJCAI 2017 paper
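
    As an illustration of constraining explanations (a hedged sketch under my own assumptions, not the authors' implementation): the loss below adds an input-gradient penalty on features that an annotation mask marks as irrelevant, the "right reason" term alongside the usual "right answer" term.

```python
# Sketch of a "right for the right reasons"-style loss: cross-entropy plus a
# penalty on the input gradient of the log-probabilities over features that a
# mask marks as irrelevant. Illustrative only; `lam` and the mask are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

def rrr_loss(x, y, mask, model, lam=10.0):
    """mask[i, j] == 1 means feature j must not matter for example i."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    log_probs = F.log_softmax(logits, dim=1)
    grad_x, = torch.autograd.grad(log_probs.sum(), x, create_graph=True)
    right_answer = F.cross_entropy(logits, y)
    right_reason = ((mask * grad_x) ** 2).sum() / x.shape[0]
    return right_answer + lam * right_reason

x = torch.randn(16, 10)
y = torch.randint(0, 3, (16,))
mask = torch.zeros(16, 10)
mask[:, 5:] = 1.0   # say the last five features are known confounders
loss = rrr_loss(x, y, mask, model)
loss.backward()
```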

    

    Interpretable Machine Teaching via Feature Feedback Shihan Su, Yuxin Chen, Oisin Mac Aodha, Pietro Perona, Yisong Yue; Workshop on Teaching Machines, Robots, and Humans 2017 paper

    

    Deriving Machine Attention from Human Rationales Yujia Bao, Shiyu Chang, Mo Yu, and Regina Barzilay; ACL 2019 paper

    

    Do Human Rationales Improve Machine Explanations? Julia Strout, Ye Zhang, Raymond Mooney; ACL Workshop BlackboxNLP 2019 paper

    

    Concept bottleneck models Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang; ICML 2020 paper
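
    A rough sketch of the architecture (my own simplification, not the authors' code): the network first predicts human-interpretable concepts and then predicts the label only from those concepts, so both stages can be supervised and the concept layer can be inspected or intervened on.

```python
# Concept-bottleneck sketch: x -> concepts -> label, trained jointly with
# supervision on both the concepts and the label. Sizes and the trade-off
# weight are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_features, n_concepts, n_classes = 20, 5, 3

concept_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                            nn.Linear(64, n_concepts))
label_net = nn.Linear(n_concepts, n_classes)

x = torch.randn(32, n_features)
c = torch.randint(0, 2, (32, n_concepts)).float()   # binary concept annotations
y = torch.randint(0, n_classes, (32,))

concept_logits = concept_net(x)
label_logits = label_net(torch.sigmoid(concept_logits))

# Joint objective: class labels supervise the head, concept labels supervise
# the bottleneck.
loss = F.cross_entropy(label_logits, y) \
       + 0.5 * F.binary_cross_entropy_with_logits(concept_logits, c)
loss.backward()
```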

    

    Debiasing Concept Bottleneck Models with Instrumental Variables Mohammad Taha Bahadori, and David E. Heckerman; arXiv 2020 paper

    

    Learning Global Transparent Models Consistent with Local Contrastive Explanations Tejaswini Pedapati, Avinash Balakrishnan, Karthikeyan Shanmugam, Amit Dhurandhar; NeurIPS 2020 paper

    

    Reflective-Net: Learning from Explanations Johannes Schneider, Michalis Vlachos; arXiv 2020 paper

    

    Teaching with Commentaries Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, and Geoffrey Hinton; arXiv 2020 paper

    

    Improving performance of deep learning models with axiomatic attribution priors and expected gradients Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, Su-In Lee; arXiv 2020 paper

    

    Interpretations are useful: penalizing explanations to align neural networks with prior knowledge Laura Rieger, Chandan Singh, William Murdoch, Bin Yu; ICML 2020 paper

    

    When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data Peter Hase, Mohit Bansal; arXiv 2020 paper

    

Interactive Learning

    Approaches that combine supervision on the explanations with interactive machine learning:

    

    Principles of Explanatory Debugging to Personalize Interactive Machine Learning Todd Kulesza, Margaret Burnett, Weng-Keen Wong, Simone Stumpf; IUI 2015 paper

    

    Explanatory Interactive Machine Learning Stefano Teso, Kristian Kersting; AIES 2019 paper Note: introduces explanatory interactive learning, focuses on active learning setup.
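
    To make the setup concrete, here is a schematic sketch of such an interaction loop (a simplification under my own assumptions; `explain`, `query_user`, and `augment_with_feedback` are hypothetical stand-ins, not an existing API): the learner queries an instance, shows its prediction together with an explanation, and the user's feedback on the explanation is folded back into training.

```python
# Schematic explanatory interactive learning loop (illustrative skeleton only).
import random

def explain(model, x):
    # placeholder: return the features the model relies on for this prediction
    return {"relevant_features": [0, 1]}

def query_user(x, prediction, explanation):
    # placeholder: the user confirms the label and flags wrongly-used features
    return {"label": prediction, "irrelevant_features": [1]}

def augment_with_feedback(dataset, x, feedback):
    # e.g. add counterexamples with the flagged features perturbed, or record
    # a mask for an explanation penalty such as the RRR loss sketched above
    dataset.append((x, feedback["label"], feedback["irrelevant_features"]))

def explanatory_interactive_learning(model, pool, dataset, fit, predict, rounds=10):
    for _ in range(rounds):
        x = random.choice(pool)            # in practice: an active-learning query
        prediction = predict(model, x)
        feedback = query_user(x, prediction, explain(model, x))
        augment_with_feedback(dataset, x, feedback)
        model = fit(model, dataset)        # retrain on label + explanation feedback
    return model

# Trivial stand-ins just to exercise the loop.
model = explanatory_interactive_learning(
    model=None,
    pool=[[0.1, 0.2], [0.3, 0.4]],
    dataset=[],
    fit=lambda m, d: m,
    predict=lambda m, x: 0,
)
```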

    

    Taking a hint: Leveraging explanations to make vision and language models more grounded Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, and Devi Parikh; ICCV 2019 paper

    

    Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets Stefano Teso; IAL Workshop 2019 paper Note: explanatory active learning with self-explainable neural networks.

    

    Making deep neural networks right for the right scientific reasons by interacting with their explanations Patrick Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Franziska Herbert, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting; Nature Machine Intelligence 2020 paper Note: introduces end-to-end explanatory interactive learning, fixes Clever Hans behaviour in deep neural nets.

    

    One explanation does not fit all Kacper Sokol, Peter Flach; Künstliche Intelligenz 2020 paper

    

    Human-in-the-loop Debugging Deep Text Classifiers Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni; EMNLP 2020 paper

    

    Human-driven FOL explanations of deep learning Gabriele Ciravegna, Francesco Giannini, Marco Gori, Marco Maggini, Stefano Melacci; IJCAI 2020 paper Notes: first-order logic.

    

    Machine Guides, Human Supervises: Interactive Learning with Global Explanations Teodora Popordanoska, Mohit Kumar, Stefano Teso; arXiv 2020 paper Note: introduces narrative bias and explanatory guided learning, focuses on human-initiated interaction and global explanations.

    

    Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations Wolfgang Stammer, Patrick Schramowski, and Kristian Kersting; arXiv 2020 paper Notes: first-order logic, attention.

    

    Right for Better Reasons: Training Differentiable Models by Constraining their Influence Function Xiaoting Shao, Arseny Skryagin, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting; AAAI 2021 preliminary paper

    

    Bandits for Learning to Explain from Explanations Freya Behrens, Stefano Teso, Davide Mottin; XAI Workshop 2021 paper Notes: preliminary.

    

Distillation

    Model reconstruction from model explanations Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt; FAccT 2019 paper

    

    Evaluating Explanations: How much do explanations from the teacher aid students? Danish Pruthi, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, and William W. Cohen; arXiv 2020 paper Notes: defines importance of different kinds of explanations by measuring their impact when used as supervision.
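
    A hedged sketch of the general recipe evaluated here (my own simplification, not the paper's code): a student is trained on the teacher's labels plus a term that pulls the student's input-gradient saliency toward the teacher's per-example explanation.

```python
# Student trained with the teacher's labels and explanations as supervision.
# The explanation format (per-feature attributions) and the 0.1 weight are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(16, 10)
teacher_labels = torch.randint(0, 2, (16,))
teacher_explanations = torch.randn(16, 10)   # e.g. per-feature attributions

x_req = x.clone().requires_grad_(True)
logits = student(x_req)
saliency, = torch.autograd.grad(F.log_softmax(logits, dim=1).sum(),
                                x_req, create_graph=True)

loss = F.cross_entropy(logits, teacher_labels) \
       + 0.1 * F.mse_loss(saliency, teacher_explanations)
loss.backward()
```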

    

Regularization without Supervision

    Approaches that regularize the model's explanations in an unsupervised manner, often for improved interpretability.
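
    For intuition, a minimal sketch of one such unsupervised regularizer (an illustrative choice of mine, not taken from any of the papers below): an L1 penalty on the input gradient that encourages sparse saliency maps without any explanation annotations.

```python
# Unsupervised explanation regularizer: L1 penalty on the input gradient,
# encouraging sparse saliency maps. Weight 0.01 is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(16, 10)
y = torch.randint(0, 2, (16,))

x_req = x.clone().requires_grad_(True)
logits = model(x_req)
grad_x, = torch.autograd.grad(F.log_softmax(logits, dim=1).sum(),
                              x_req, create_graph=True)

loss = F.cross_entropy(logits, y) + 0.01 * grad_x.abs().mean()
loss.backward()
```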

    

    Beyond sparsity: Tree regularization of deep models for interpretability Mike Wu, Michael Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2018 paper

    

    Regional tree regularization for interpretability in deep neural networks Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2020 paper

    

    Regularizing black-box models for improved interpretability Gregory Plumb, Maruan Al-Shedivat, Ángel Alexander Cabrera, Adam Perer, Eric Xing, Ameet Talwalkar; NeurIPS 2020 paper

    

Related Works

    Explanation-based learning, which focuses on logic-based formalisms and learning strategies:

    

    Explanation-based generalization: A unifying view Tom Mitchell, Richard Keller, Smadar Kedar-Cabelli; MLJ 1986 paper

    

    Explanation-based learning: A survey of programs and perspectives Thomas Ellman; ACM Computing Surveys 1989 paper

    

    Probabilistic explanation based learning Angelika Kimmig, Luc De Raedt, Hannu Toivonen; ECML 2007 paper

    

    Injecting invariances / feature constraints into models:

    

    Training invariant support vector machines Dennis DeCoste, Bernhard Schölkopf; MLJ 2002 paper

    

    The constrained weight space svm: learning with ranked features Kevin Small, Byron Wallace, Carla Brodley, Thomas Trikalinos; ICML 2011 paper

    

    Dual label-feature feedback:

    

    Active learning with feedback on features and instances Hema Raghavan, Omid Madani, Rosie Jones; JMLR 2006 paper

    

    An interactive algorithm for asking and incorporating feature feedback into support vector machines Hema Raghavan, James Allan; ACM SIGIR 2007 paper

    

    Learning from labeled features using generalized expectation criteria Gregory Druck, Gideon Mann, Andrew McCallum; ACM SIGIR 2008 paper

    

    Active learning by labeling features Gregory Druck, Burr Settles, Andrew McCallum; EMNLP 2009 paper

    

    A unified approach to active dual supervision for labeling features and examples Josh Attenberg, Prem Melville, Foster Provost; ECML-PKDD 2010 paper

    

    Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances Burr Settles; EMNLP 2011 paper

    

    Learning from rationales:

    

    Using “annotator rationales” to improve machine learning for text categorization Omar Zaidan, Jason Eisner, Christine Piatko; NAACL 2007 paper

    

    Modeling annotators: A generative approach to learning from annotator rationales Omar Zaidan, Jason Eisner; EMNLP 2008 paper

    

    Active learning with rationales for text classification Manali Sharma, Di Zhuang, Mustafa Bilgic; NAACL 2015 paper

    

    Critiquing in recommenders:

    

    Critiquing-based recommenders: survey and emerging trends Li Chen, Pearl Pu; User Modeling and User-Adapted Interaction 2012 paper

    

    Coactive critiquing: Elicitation of preferences and features Stefano Teso, Paolo Dragone, Andrea Passerini; AAAI 2017 paper

    

Resources

    A selection of general resources on Explainable AI focusing on overviews, surveys, societal implications, and critiques:

    

    Survey and critique of techniques for extracting rules from trained artificial neural networks Robert Andrews, Joachim Diederich, Alan B. Tickle; Knowledge-based systems 1995 page

    

    The Mythos of Model Interpretability Zachary Lipton; CACM 2016 paper

    

    A survey of methods for explaining black box models Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi; ACM Computing Surveys 2018 paper

    

    Explanation in Artificial Intelligence: Insights from the Social Sciences Tim Miller; Artificial Intelligence, 2019 paper

    

    Unmasking Clever Hans predictors and assessing what machines really learn Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller; Nature Communications 2019 paper

    

    Interpretation of neural networks is fragile Amirata Ghorbani, Abubakar Abid, James Zou; AAAI 2019 paper

    

    Is Attention Interpretable? Sofia Serrano, Noah A. Smith; ACL 2019 paper

    

    Attention is not Explanation Sarthak Jain, and Byron C. Wallace; ACL 2019 paper

    

    Attention is not not Explanation Sarah Wiegreffe, and Yuval Pinter; EMNLP-IJCNLP 2019 paper

    

    The (un)reliability of saliency methods Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim; Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 2019 paper

    

    Explanations can be manipulated and geometry is to blame Ann-Kathrin Dombrowski, Maximillian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel; NeurIPS 2019 paper

    

    Fooling Neural Network Interpretations via Adversarial Model Manipulation Juyeon Heo, Sunghwan Joo, and Taesup Moon; NeurIPS 2019 paper

    

    Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead Cynthia Rudin; Nature Machine Intelligence 2019 page

    

    Related Resources

    Awesome explainable AI

    

    Awesome machine learning interpretability

    

    Not Yet Sorted

    e-SNLI: natural language inference with natural language explanations Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom; NeurIPS 2018 paper

    

    Multimodal explanations: Justifying decisions and pointing to the evidence Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach; CVPR 2018 paper

    

    "Learning Deep Attribution Priors Based On Prior Knowledge Ethan Weinberger, Joseph Janizek, Su-In Lee; NeurIPS 2020 paper

    

    Explain and Predict, and then Predict Again Zijian Zhang, Koustav Rudra, Avishek Anand; arXiv 2021 paper

    
