One of the great challenges of modern neural network computing is the mysterious internal organization of these systems. We know exactly how they are designed and what information they use to reach their decisions. Yet we generally do not understand the underlying logic by which they arrive at those decisions, including decisions of great consequence.
Max Tegmark’s project dives into the workings of artificial neural networks to develop a system for rendering their inner functions more comprehensible.
The project addresses the challenge to “investigate making artificial intelligence’s internal workings interpretable so that seemingly moral decisions made by machines can be fully understood by humans.” (Diverse Intelligences Morality in the Machine Age Challenge Statement, Part 6). Such full understanding may reveal that seemingly moral systems harbor hidden biases or unnoticed flaws that sometimes produce unacceptable behavior.
The core technical work of this project is to develop an algorithm that transforms inscrutable neural networks into maximally simple traditional computer code, code that humans can read directly or that traditional formal software verification tools can analyze. Applications include demystifying tools that compute probation recommendations and “risk scores.”
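To make the general idea concrete, here is a minimal, hedged sketch of one way a black-box network could be approximated by simple, human-readable logic: training a shallow decision tree to mimic the network’s predictions. This is only an illustration of the concept using assumed scikit-learn tooling and made-up features; it is not the project’s actual algorithm, which aims at far more rigorous transformations.

```python
# Illustrative sketch only (assumed tooling: scikit-learn; hypothetical data).
# A shallow decision tree is trained to imitate a small neural network,
# standing in for the "simple traditional code" a human or verifier could inspect.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                  # hypothetical input features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # hypothetical labels

# The "black box": a small neural network classifier.
net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
net.fit(X, y)

# Distillation step: fit an interpretable surrogate to the network's outputs.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, net.predict(X))

# The surrogate's rules can be printed and audited by a person,
# or handed to conventional program-analysis tools.
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))
print("fidelity to the network:", (surrogate.predict(X) == net.predict(X)).mean())
```

The fidelity score reported at the end indicates how faithfully the simple surrogate reproduces the network’s behavior; a transformation of the kind the project envisions would need to preserve that behavior far more completely than a toy example like this one.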
This project is vital because the deep learning systems dominating today’s AI progress are largely unintelligible black boxes. Outputs will include peer-reviewed publications and software shared online with the public. Because an ever-growing fraction of AI systems relies on unintelligible neural networks, the impact of success would be major.