Composabl’s new benchmarking feature allows you to quickly visualize and compare how your multi-agent system designs perform.
When you build a multi-agent system with Composabl, chances are you’re trying to beat a benchmark: to improve on a current process or get the best possible results on a key metric like yield or efficiency.
To do this, you need performance data – and not just once, but throughout the design process, as you build and test different agent arrangements for your use case.
Think of it like a team that you’re coaching. To find the winning formation, you need to experiment with different player combinations and see how they perform in scrimmages and pre-season matches.
Building multi-agent systems is the same. Performance data tells you which multi-agent system designs are working best, allowing you to iterate and improve until you get to a well-engineered multi-agent system that’s ready for deployment.
This is why we’re so excited to release Composabl’s new benchmarking feature. With it, the Composabl platform now automatically runs performance tests after every round of training, giving you data on how well your trained agent teams control the process in simulation.
Here’s how it works. You start by choosing a key performance indicator (KPI) for overall success – the value that tells you how well your system is meeting its goal.
For example, if your goal is to maximize throughput, your KPI might be a yield variable that indicates how much product is produced. The designs that return the highest value for this variable are the most successful. (If your overall goal is to minimize a variable, like cost or energy consumption, then you’ll be looking for the agent formations that return the lowest values.) On the benchmark page, you can select your KPI and enter a value that represents the benchmark – the current solution or performance level that you are trying to beat.
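To make that comparison concrete, here’s a rough sketch of the selection logic in plain Python. The `DesignResult` record, the `pick_best` helper, and the design names and KPI values are all hypothetical illustrations, not part of the Composabl API:

```python
# A minimal sketch of the benchmark comparison described above. The names
# here (DesignResult, pick_best) are hypothetical illustrations, not part
# of the Composabl API.
from dataclasses import dataclass

@dataclass
class DesignResult:
    name: str         # a multi-agent system design in the project
    kpi_value: float  # KPI measured in post-training performance tests

def pick_best(results, benchmark, maximize=True):
    """Return the design whose KPI best beats the benchmark."""
    best = (max(results, key=lambda r: r.kpi_value) if maximize
            else min(results, key=lambda r: r.kpi_value))
    beats = best.kpi_value > benchmark if maximize else best.kpi_value < benchmark
    status = "beats" if beats else "does not beat"
    print(f"{best.name}: KPI = {best.kpi_value:.2f} {status} benchmark {benchmark:.2f}")
    return best

# Hypothetical results for three design patterns against a yield benchmark.
designs = [
    DesignResult("strategy-pattern", 0.91),
    DesignResult("plan-and-execute", 0.87),
    DesignResult("single-agent", 0.83),
]
pick_best(designs, benchmark=0.85)  # maximize yield
# pick_best(designs, benchmark=0.85, maximize=False) would instead pick the
# lowest value, e.g. for a cost or energy KPI.
```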
After your training cycles and performance tests are complete, Composabl’s benchmark page displays the performance data for each multi-agent system within the same project. This gives you an at-a-glance comparison of your entire suite of multi-agent system designs.
Benchmark results for our Industrial Mixer use case show the relative performance of three different design patterns on the KPI of total yield of usable chemical.
Composabl’s flexible visualization dashboard allows you to explore and compare performance using any variable as a KPI. You can also customize the display statistics depending on what comparison method makes the most sense for your use case.
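The right statistic depends on what you care about: average performance, typical performance, or worst-case behavior. Here’s a minimal sketch of those options over a set of test episodes, with made-up per-episode KPI values:

```python
# A minimal sketch of choosing a comparison statistic across test episodes.
# The per-episode KPI values below are made up for illustration.
import statistics

episode_kpis = [0.88, 0.91, 0.86, 0.93, 0.90]  # KPI from each test episode

summary = {
    "mean": statistics.mean(episode_kpis),      # average performance
    "median": statistics.median(episode_kpis),  # robust to outlier episodes
    "best": max(episode_kpis),                  # peak performance
    "worst": min(episode_kpis),                 # worst-case behavior
}
print(summary)
```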
To make the business value of engineering work visible, you can go a step further and quantify the financial impact of optimizing your process by having the system calculate the ROI associated with each percentage point of process improvement. For example, a 2% improvement in the KPI over the benchmark might be worth $1M per year.
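As a back-of-the-envelope sketch of that arithmetic, here’s how the calculation might look, assuming a dollar value per percentage point of improvement (the $500K figure is an assumption chosen so the 2% example above works out to $1M per year):

```python
# A back-of-the-envelope sketch of the ROI arithmetic. The value of a
# percentage point of improvement ($500K/year here) is an assumption;
# in practice it comes from your own process economics.
def annual_roi(kpi_value, benchmark, value_per_point=500_000.0):
    """Annual dollar value of the KPI's percent improvement over the benchmark."""
    improvement_pct = (kpi_value - benchmark) / benchmark * 100
    return improvement_pct * value_per_point

# A 2% improvement at $500K per point is worth $1M per year, matching the
# example above.
print(f"${annual_roi(kpi_value=1.02, benchmark=1.00):,.0f} per year")
```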
Benchmarking is a key tool for designing multi-agent systems with machine teaching. That’s because machine teaching is an experimental, iterative, and even creative process.
There can be multiple ways to decompose a problem, and many different combinations of specialized agents can be trained to address a use case. Experimentation is the most reliable way to discover which agent formations perform best. Individual agent configurations can also be tweaked, like changing up the training regimen for a high-performing athlete to eke out better results. Composabl’s benchmarking feature makes this experimentation easy.