
The one tip to decrease AI agent complexity and increase explainability
AI is getting more powerful every day, and we see it in our daily lives solving ever tougher business and consumer problems. If with great power comes great responsibility, then with great AI capabilities comes great complexity.
In turn, complexity breeds increased training costs and challenges in designing effective AI architectures. As the models get more complex, they require more advanced AI design skills as well as more training data.
In addition, as AI models become more complex, they become harder and harder to explain. This amplifies the black-box effect, where it is virtually impossible for the user or the developer to know how the model reaches its conclusions.
To help reduce AI complexity and increase its explainability, our teams of experts from different domains work with customers to limit the AI agent to its minimal necessary scope, a kind of “Minimum Viable Product” agile engineering strategy, and then to apply other techniques upstream of the AI model, such as input pre-processing.
The AI conundrum: Abstracting complexity through brute force
Deep Neural Networks (DNNs), i.e., AI in the context of this article, can model complex non-linear relationships by learning what output should be produced for a given input. With both advances in DNN architectures (from the Perceptron to deep convolutional networks, LSTMs, Transformers, and more) and the increase in cloud-computing power, it is possible to keep climbing abstraction levels by outsourcing this complexity to a more complex neural network.
For instance, let’s imagine you want to teach an AI how to play foosball. You could feed raw images of the table and its 22 players as the input to a reinforcement learning model, thereby completely abstracting away the individual players’ and ball’s positions and speeds. The first layers of the DNN would extract those characteristics from the captured images automatically.
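To make this “brute force” setup concrete, here is a minimal, hypothetical sketch in PyTorch of a policy network that consumes raw frames directly and leaves the feature extraction to its convolutional layers. The layer sizes and action count are invented for illustration and are not taken from any system described in this article:

```python
# Hypothetical sketch: a "brute force" policy that takes raw camera frames as
# its observation. Layer sizes and n_actions are invented for illustration.
import torch
import torch.nn as nn

class RawPixelPolicy(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        # Convolutional layers must learn to extract bar/ball features implicitly
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
        )
        self.head = nn.Linear(32 * 8 * 8, n_actions)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (batch, 3, 1080, 1920) raw image, i.e., millions of input values
        return self.head(self.features(frame))
```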
Similarly, if you check out Tesla’s AI Day from August 2021, Tesla engineers built an AI that creates a 3D model of the road and all its obstacles by directly feeding the raw images from the car’s eight cameras into the network.
However, this “brute force” approach is not necessarily the best option in many business scenarios. It requires more computing resources, more training data, and more advanced DNN architectures, and it increases the black-box effect of AI, where no one can explain how the AI reaches its decisions.
How do we circumvent this, when possible?
Designing pre-processor models to build better DRL-trained AI agents
Although Tesla explained during its AI Day why this approach made total sense for its use case, there is often a way to de-complexify an AI agent.
For instance, in the foosball example, one could build a visual ML model to extract each bar’s position (each side’s 11 players are mounted on only 4 bars) and the ball’s position, and then compute their speeds and accelerations from successive frames. With this approach, the foosball-playing AI’s inputs come down to 9 × 6 = 54 variables: 4 bars per side plus the ball, each described by its position (X and Y for the ball; lateral offset and rotation angle for a bar), its speed, and its acceleration (two values each).
This immediately reduces the input from a 1080×1920 image (about 2 million pixels, and four times more for a 4K image) to 54 data inputs.
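As a rough illustration of what that pre-processed input could look like, here is a minimal Python sketch. The tracker that produces the measurements is assumed to exist upstream; the names and units are invented:

```python
# Hypothetical sketch: turning tracked bar and ball measurements into a
# compact state vector for the RL agent. Names and units are invented.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    position: tuple[float, float]      # (x, y) for the ball; (offset, angle) for a bar
    velocity: tuple[float, float]
    acceleration: tuple[float, float]

def build_state(bars: list[TrackedObject], ball: TrackedObject) -> list[float]:
    """Flatten 8 bars + 1 ball, each with 6 values, into 54 floats."""
    state: list[float] = []
    for obj in [*bars, ball]:
        state.extend([*obj.position, *obj.velocity, *obj.acceleration])
    return state  # the agent sees 9 x 6 = 54 numbers instead of ~2 million pixels
```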
One can easily replicate this approach in many processes to simplify the AI agent’s architecture and training. Let’s take another, more business-relevant example: optimizing aircraft landing patterns to support air traffic controllers. As in the foosball example, one could imagine feeding in a raw radar image and building a massive DNN to propose landing options to the air traffic controller.
However, this approach raises two main issues. First and foremost, creating the right dataset or, in the case of a Bonsai brain trained with reinforcement learning, an appropriate simulator would be extremely complex. It would require logging potentially thousands of hours of real-life images together with the decision taken for each one. This would not only be daunting, expensive, and error-prone, but it would also apply to a single airport configuration, so it clearly would not scale across airports. The second issue is that, because of its complexity, this AI “black box” (aircraft pun intended) would be impossible for controllers to interpret. There would be no way even to begin to understand why one action or another was advised.
The alternative is to decouple different elements of this problem and pre-process the input to minimize the complexity of the AI agent.
In this situation, for instance, one could use separate models (ML or not, combined as sketched after this list) to:
- Detect aircraft in the image
- Compute their trajectory and speed
- Predict aircraft health based on this trajectory and maybe additional non-visual telemetry
- Interpret, through speech recognition and NLP, the communications between the control tower and the pilots
- Assess the airport’s immediate environment, e.g., the weather pattern.
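Here is a minimal sketch of how those separate assessments could be merged into a compact observation for the downstream brains. All field names, upstream models, and units are hypothetical:

```python
# Hypothetical sketch: combining the outputs of separate pre-processing models
# into one compact observation for the landing-assistance brains.
from dataclasses import dataclass

@dataclass
class AircraftTrack:
    callsign: str
    position: tuple[float, float, float]   # lat, lon, altitude (assumed units)
    speed: float
    heading: float
    health_score: float                     # from the trajectory/telemetry model

@dataclass
class AirportContext:
    wind_speed: float
    wind_direction: float
    visibility_km: float
    runway_in_use: str

def build_observation(tracks: list[AircraftTrack], ctx: AirportContext) -> dict:
    """Each upstream model contributes a few numbers; the brain never sees raw radar pixels."""
    return {
        "aircraft": [
            [t.position[0], t.position[1], t.position[2], t.speed, t.heading, t.health_score]
            for t in tracks
        ],
        "weather": [ctx.wind_speed, ctx.wind_direction, ctx.visibility_km],
        "runway_in_use": ctx.runway_in_use,
    }
```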
Then, based on all these parallel assessments and on where each plane is in its approach, one could build a Project Bonsai brain for each separate phase: queue management, runway assignment, and final landing directions.
Those brains, i.e., the AI agents, could then themselves be supervised by a rule-based safety engine with hard-coded procedures that they won’t be able to overrule.
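For illustration, such a safety layer could be as simple as a function that checks each recommendation against hard-coded procedures before it reaches the controller. The rules and thresholds below are invented for the sketch:

```python
# Hypothetical sketch: a rule-based safety layer wrapping the brains'
# recommendations. Rules and thresholds are invented for illustration.
MIN_SEPARATION_NM = 3.0  # assumed hard minimum separation

def supervise(recommendation: dict) -> dict:
    """Apply hard-coded procedures that the AI agents cannot overrule."""
    if recommendation.get("separation_nm", float("inf")) < MIN_SEPARATION_NM:
        return {"action": "go_around", "reason": "below minimum separation"}
    if recommendation.get("runway_occupied", False):
        return {"action": "hold", "reason": "runway occupied"}
    return recommendation  # safe recommendations pass through unchanged
```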
The explainability bonus
In addition to making the AI agent easier and faster to design and train, input pre-processing has another, non-negligible side benefit: it allows for more focused AI agents and therefore better explainability for each of them.
In the above example, if an AI agent only manages queues based on plane positions, it will be much easier to explain its behavior (and spot odd decisions) than if it were part of a massive, all-inclusive model.
How to start defining the right input pre-processing strategy
Knowing what to pre-process and what to embed in your AI agent is not a science; it’s an art. But it’s an art that only experts in data science, machine learning, and deep reinforcement learning master.
Through a proven process used across multiple projects, customers, use cases, and industries, Neal Analytics can help your team define and design the best way to approach this aspect, and then implement both the pre-processing elements and the Project Bonsai brain, or whichever other AI or ML model is appropriate.
Read more:
- Autonomous Systems
- Microsoft Project Bonsai
- Advanced simulations blog post
- Unlocking the potential of AI in manufacturing with machine teaching and deep reinforcement learning