AgentML: Streamlining the ML Pipeline with AI Agents

18 July 2024

Authored by Punit Arani

Supervised by Dr. Huan Liu and Amrita Bhattacharjee

Code: github.com/punitarani/AgentML

Abstract

AgentML is a tool designed to simplify and enhance the machine learning workflow, encompassing exploratory data analysis, model development, evaluation, and explanation. Offering both simplicity and power, it caters to a diverse audience ranging from students to professionals and non-coders. AgentML supports various datasets and provides dual modes of operation—Supervised and Autonomous—facilitating user interaction or fully automated processes. This paper presents the key features, technical architecture, user accessibility, and deployment methods of AgentML, highlighting its role in democratizing machine learning and making sophisticated AI processes accessible to everyone.

Introduction

The rapidly evolving field of artificial intelligence demands accessible and efficient machine learning tools. Traditional workflows can be complex and time-consuming, often requiring substantial coding expertise and specialized knowledge. AgentML addresses these challenges by streamlining the entire machine learning process—from exploratory data analysis to model development, evaluation, and explanation. Designed for users with varying levels of expertise, AgentML simplifies machine learning tasks while maintaining advanced capabilities. This paper explores the features and architecture of AgentML, emphasizing its role in democratizing machine learning.

Methodology

Dual Capability in Machine Learning Workflow

AgentML handles both small and complex datasets:

Small Datasets

  • Data Analysis: Analyzes dataset characteristics to understand the problem statement.
  • Pipeline Creation: Develops a comprehensive pipeline, including preprocessing, model building, and evaluation.
  • Iterative Refinement: Continuously refines the model for optimal performance.
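
The iterative refinement step above can be pictured as a loop that scores candidate configurations and stops once the score no longer improves. The sketch below is illustrative only: `refine`, `toy_score`, the `depth` parameter, and the patience rule are assumptions, not AgentML's actual refinement logic, and the scoring function is a stand-in for training and evaluating a real model.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, int]


def refine(
    candidates: Iterable[Config],
    score: Callable[[Config], float],
    patience: int = 2,
) -> Tuple[Config, float]:
    """Return the best-scoring config, stopping after `patience` tries in a row
    that fail to beat the best score seen so far."""
    best_config: Optional[Config] = None
    best_score = float("-inf")
    stale = 0
    for config in candidates:
        current = score(config)
        if current > best_score:
            best_config, best_score, stale = config, current, 0
        else:
            stale += 1
            if stale >= patience:
                break
    if best_config is None:
        raise ValueError("no candidates were provided")
    return best_config, best_score


def toy_score(config: Config) -> float:
    """Hypothetical validation score that peaks at depth 4."""
    return 1.0 - abs(config["depth"] - 4) / 10


# Try depths 1..10 in order; the loop stops early once scores start falling.
best_config, best_score = refine(({"depth": d} for d in range(1, 11)), toy_score)
print(best_config, best_score)
```

In a real pipeline, `score` would train a model with the given configuration and return a held-out metric.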

Complex Datasets

  • Advanced Data Handling: Manages intricate datasets with enhanced preprocessing and analysis.
  • Custom Model Building: Constructs tailored models to address complex problems.
  • In-depth Evaluation: Provides thorough model evaluation and tuning.

Technical Architecture

AgentML’s modular architecture incorporates several specialized agents:

Manager Agent

  • Central Coordination: Manages inputs and oversees other agents to achieve user-defined goals.
  • Integration: Ensures cohesive operation of all agents and optimizes resource allocation.

Planner Agent

  • Task Analysis: Breaks down goals into manageable tasks.
  • Efficient Delegation: Allocates tasks to appropriate agents, ensuring systematic problem-solving.

Coder Agent

  • Coding Tasks: Handles writing, modifying, and debugging code.
  • Model Development: Builds and visualizes machine learning models.
  • Automation: Ensures thorough completion of coding tasks with error handling.

Vision Agent

  • Data Visualization Analysis: Interprets visual data for comprehensive insights.
  • Enhanced Perception: Supplements language models with visual understanding.

Validator (Pseudo-Agent)

  • Autonomous Mode: Validates each step for alignment with goals, ensuring consistency.
  • Supervised Mode: Allows user oversight for guidance and validation.
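
One way to picture how these agents fit together is a loop in which the Manager asks the Planner for tasks, hands each task to the Coder, and gates every result through a validator. The sketch below is a minimal illustration under stated assumptions: all function and class names are hypothetical, the Planner and Coder are stubs standing in for language-model calls, and the only difference between the two modes is which validator is passed in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    description: str
    result: str = ""
    approved: bool = False


# A validator receives a finished task and returns True to accept it.
Validator = Callable[[Task], bool]


def plan(goal: str) -> List[Task]:
    """Hypothetical Planner: split a goal into ordered tasks."""
    steps = ["explore data", "build model", "evaluate model"]
    return [Task(f"{step} for: {goal}") for step in steps]


def code(task: Task) -> str:
    """Hypothetical Coder: stand-in for writing and running code."""
    return f"done: {task.description}"


def autonomous_validator(task: Task) -> bool:
    """Stand-in for the validator pseudo-agent: accept any non-empty result."""
    return bool(task.result)


def supervised_validator(task: Task) -> bool:
    """Human-in-the-loop check: ask the user to approve each step."""
    answer = input(f"{task.description} -> {task.result}. Accept? [y/n] ")
    return answer.strip().lower() == "y"


def manage(goal: str, validate: Validator, max_retries: int = 2) -> List[Task]:
    """Hypothetical Manager: run each planned task until it is approved
    or the retry budget is used up."""
    tasks = plan(goal)
    for task in tasks:
        for _ in range(max_retries + 1):
            task.result = code(task)
            task.approved = validate(task)
            if task.approved:
                break
    return tasks


# Autonomous run; pass supervised_validator instead for human approval.
tasks = manage("classify Iris species", autonomous_validator)
for task in tasks:
    print(task.approved, task.result)
```

Treating the validator as a plain callable keeps the rest of the loop identical in both modes, which matches the description of the Validator as a pseudo-agent that replaces human validation.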

User Interaction and Customization

Code Execution

  • Secure Environment: Executes Python code in a sandbox for safety.
  • Interactive Development: Enables real-time code writing and testing.
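
The paper does not describe how the sandbox is implemented. As a rough sketch of the idea, generated code can be run in a separate interpreter process with a timeout and a throwaway working directory. The helper name `run_snippet` and its return format are assumptions, and process isolation of this kind is weaker than a real security sandbox such as a container or restricted virtual machine.

```python
import subprocess
import sys
import tempfile


def run_snippet(source: str, timeout_s: float = 10.0) -> dict:
    """Run Python source in a separate interpreter and capture its output.

    Note: this isolates the process and limits run time, but it does not
    restrict file or network access on its own.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", source],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


result = run_snippet("print(1 + 1)")
print(result["ok"], result["stdout"].strip())
```

Returning the captured stderr lets a coding agent read the traceback and attempt a fix, which is one plausible way to support the error handling mentioned for the Coder Agent.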

Template Utilization

  • Customizability: Imports and builds on user-provided code templates.
  • Flexibility: Adapts to various coding styles and requirements.
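
The template format AgentML accepts is not specified here. Purely as an illustration, a user-provided code template could contain named placeholders that are filled in and syntax-checked before execution. The placeholder names, the `fill_template` helper, and the use of `string.Template` are all assumptions made for this sketch.

```python
import ast
from string import Template

# Hypothetical user-provided template with two placeholders.
USER_TEMPLATE = Template(
    "def train(rows):\n"
    "    features = [[row[c] for c in $feature_columns] for row in rows]\n"
    "    labels = [row[$target_column] for row in rows]\n"
    "    return features, labels\n"
)


def fill_template(template: Template, **values: object) -> str:
    """Substitute placeholders with Python literals and check the result parses."""
    source = template.substitute({key: repr(value) for key, value in values.items()})
    ast.parse(source)  # raises SyntaxError if the filled template is invalid
    return source


source = fill_template(
    USER_TEMPLATE,
    feature_columns=["sepal_length", "petal_length"],
    target_column="species",
)
print(source)
```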

Results

Demonstrations of AgentML’s capabilities are showcased through videos in both Supervised and Autonomous modes.

In Supervised Mode, AgentML operates with human-in-the-loop interaction, allowing users to guide the training of a classifier on datasets like the Iris dataset. In Autonomous Mode, it employs a validator pseudo-agent to replace human validation, enabling fully automated machine learning processes. These demonstrations highlight AgentML’s ability to handle datasets, write code, and effectively train and evaluate machine learning models.

Demo

The following demo videos showcase AgentML's capabilities in both Supervised and Autonomous modes.

AgentML can handle datasets, write code, and train and evaluate machine learning models.

AgentML Highlight

AgentML Output

Supervised Mode Video

Human-in-the-loop mode to train a classifier on the Iris dataset.

AgentML Supervised Demo

Autonomous Mode Video

Autonomous mode with validator pseudo-agent to replace human validation.

AgentML Autonomous Demo

Video is sped up for brevity.

Discussion

AgentML represents an innovative approach to simplifying machine learning workflows. By integrating multiple specialized agents, the system streamlines complex tasks and makes machine learning accessible to a broader audience. The dual modes of operation cater to different user preferences and expertise levels, offering both interactive and autonomous experiences. The inclusion of a Vision Agent enhances data interpretation through visual analysis, supplementing traditional language models. AgentML’s design emphasizes user accessibility, with an intuitive interface and support for non-coders, while also providing flexibility and customization options for experienced professionals.

Conclusion

AgentML serves as a transformative tool in the field of machine learning, offering ease, efficiency, and advanced insights. By democratizing access to sophisticated ML processes, it exemplifies the potential of artificial intelligence to empower users across varying levels of expertise. The system’s modular architecture and user-friendly design make it a valuable asset for those seeking to streamline their machine learning workflows.

Future Work

Future developments for AgentML may include expanding its capabilities to handle more complex datasets, integrating additional data sources, and enhancing autonomous decision-making processes. Improvements to the user interface and support for more customization options could further enhance user experience. Additionally, incorporating advanced visualization techniques and expanding the Vision Agent’s functionalities may provide deeper insights, improve accuracy, and raise output quality relative to cost.


Appendix

Running the Application

AgentML can be operated in two modes: Supervised and Autonomous. Follow these simple steps to get started:

  1. Clone the Repository:
git clone https://github.com/punitarani/AgentML.git
cd AgentML
  2. Install Dependencies using Poetry:
poetry install

Poetry should automatically create a virtual environment for you. If it doesn't, you can activate one manually:

poetry shell
  3. Set Up Environment Variables:
  • Copy .config/.env.template to .env in the root directory.
  • Fill out the necessary environment variables in the .env file.
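
Before launching the app, it can help to confirm that every variable listed in the template has a value in .env. The sketch below is a convenience check and not part of AgentML; the helper names and the example key names are hypothetical, and the parser handles only simple KEY=value lines.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(template_text: str, env_text: str) -> List[str]:
    """Return template keys that are absent or empty in the .env text."""
    required = parse_env(template_text)
    provided = parse_env(env_text)
    return [key for key in required if not provided.get(key)]


# Hypothetical contents; in practice read .config/.env.template and .env.
template_text = "# keys\nEXAMPLE_API_KEY=\nEXAMPLE_MODEL_NAME=\n"
env_text = "EXAMPLE_API_KEY=abc123\nEXAMPLE_MODEL_NAME=\n"
print(missing_keys(template_text, env_text))
```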

Supervised Mode

python -m streamlit run app.py

Autonomous Mode

python -m streamlit run auto.py
