What is a Large Action Model (LAM)?

Large Action Models (LAMs) are AI models designed to take action by translating human intentions into executable actions within a given environment or system.

Think of it like this:

  • You give the LAM a command, such as “Book me a flight to London next week.”
  • The LAM understands your request, including the destination, date, and your implied need for a flight.
  • It then interacts with various systems, like airline websites or booking platforms, to find suitable flights, compare prices, and ultimately make the reservation.

Generative AI provides the building blocks for LAMs to understand, reason, and act in the world. It’s a core component that enables LAMs to go beyond passive language processing and become active problem-solvers.

LAMs and AI agents

The term LAM is sometimes used interchangeably with AI agent. AI agents are software programs that can perceive their environment, make decisions, and take actions to achieve specific goals. LAMs can be seen as a type of AI agent that uses large language models (LLMs) for advanced language understanding and action planning.

How does a LAM work?

1. Foundation layer: LAMs often begin by integrating a powerful existing large language model. This LLM is fine-tuned with specific data to prepare it for the LAM’s unique purpose. This foundation allows the LAM to understand natural language and figure out what the user wants.
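
To make the fine-tuning step concrete, one plausible (hypothetical) format pairs a natural-language instruction with the action sequence the model should learn to emit, stored one record per line as JSONL. The field names and tool names below are assumptions for illustration, not a standard schema:

```python
import json

# Hypothetical fine-tuning record: a user instruction paired with the
# structured action sequence the foundation LLM should learn to produce.
record = {
    "instruction": "Book me a flight to London next week.",
    "actions": [
        {"tool": "search_flights", "args": {"destination": "London", "date_range": "next week"}},
        {"tool": "compare_prices", "args": {"max_results": 5}},
        {"tool": "create_booking", "args": {"flight_id": "<selected>"}},
    ],
}

# Most fine-tuning pipelines accept one JSON object per line (JSONL).
with open("lam_finetune.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```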

2. Multimodal input processing: LAMs can handle more than just text. They can also process images, such as screenshots of an application, and potentially even records of user interactions. Each input type is preprocessed so the model receives the key information it needs.
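
As a rough sketch, the input to a LAM at each step can be bundled into a single observation object combining the user’s text, a screenshot of the current interface, and recent interaction events. The structure below is illustrative, not a standard API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Observation:
    """One multimodal snapshot handed to the LAM at each step (illustrative)."""
    user_text: str                            # the natural-language request
    screenshot_png: Optional[bytes] = None    # raw image of the current UI, if any
    recent_events: list[dict] = field(default_factory=list)  # e.g. clicks, key presses

obs = Observation(
    user_text="Book me a flight to London next week.",
    screenshot_png=None,  # would normally hold a captured screenshot
    recent_events=[{"type": "click", "target": "search box"}],
)
print(obs.user_text)
```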

3. Goal inference: The LAM analyzes the user’s request, considering things like their past behavior and what’s happening in the application they’re using. This helps the LAM understand the user’s true goal, which might be more than what they literally said.
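
As a rough sketch, goal inference can be framed as asking the model to return the inferred goal as structured JSON, combining the literal request with contextual signals. Here `complete()` is a hypothetical stand-in for whatever LLM client a real LAM would use, and the key names are assumptions:

```python
import json

def complete(prompt: str) -> str:
    """Stand-in for the LAM's underlying LLM call (hypothetical).
    Returns a canned response here so the sketch runs end to end."""
    return json.dumps({
        "task": "book_flight",
        "parameters": {"destination": "London", "date_range": "next week"},
        "constraints": ["economy class", "no overnight layovers"],
    })

def infer_goal(request: str, context: dict) -> dict:
    """Ask the LLM for the user's underlying goal as structured JSON."""
    prompt = (
        "Given the user's request and context, return the inferred goal as JSON "
        'with keys "task", "parameters", and "constraints".\n'
        f"Request: {request}\nContext: {json.dumps(context)}"
    )
    return json.loads(complete(prompt))

# The literal request only mentions a flight; past behavior supplies the
# unstated preferences the LAM should respect.
goal = infer_goal(
    "Book me a flight to London next week.",
    {"home_airport": "JFK", "past_bookings": ["economy class", "no overnight layovers"]},
)
print(goal["task"], goal["constraints"])
```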

4. User interface understanding: LAMs use computer vision to interpret visual information from application interfaces. They can recognize elements like buttons and menus and understand how these elements work within the application.
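
The output of this vision step can be pictured as a list of labeled screen regions. `detect_elements()` below is a hypothetical stand-in for whatever detection model a real LAM would use, and the coordinates are made up:

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    role: str                         # e.g. "button", "text_field", "menu"
    label: str                        # visible text or accessibility name
    bbox: tuple[int, int, int, int]   # x, y, width, height in pixels

def detect_elements(screenshot_png: bytes) -> list[UIElement]:
    """Hypothetical stand-in for a vision model that locates interface elements."""
    return [
        UIElement("text_field", "Destination", (120, 200, 300, 40)),
        UIElement("button", "Search flights", (440, 200, 140, 40)),
    ]

# Downstream planning can refer to elements by role and label instead of raw pixels.
elements = detect_elements(b"")
buttons = [e for e in elements if e.role == "button"]
print(buttons[0].label)
```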

5. Task decomposition and action planning: Once the LAM understands the goal, it breaks it down into smaller, actionable subtasks. It then creates a plan, prioritizing actions based on efficiency, user preferences, and learned rules of thumb.
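
A toy sketch of that decomposition step, with the subtasks and priorities hard-coded for illustration (a real LAM would derive both from the inferred goal and learned preferences):

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    name: str
    priority: int  # lower value = execute earlier

def plan(goal: str) -> list[Subtask]:
    """Hypothetical decomposition of a flight-booking goal into ordered subtasks."""
    subtasks = [
        Subtask("confirm reservation", 5),
        Subtask("open the booking site", 1),
        Subtask("enter destination and travel dates", 2),
        Subtask("compare prices across results", 3),
        Subtask("select the flight matching user preferences", 4),
    ]
    return sorted(subtasks, key=lambda s: s.priority)

for step in plan("book_flight"):
    print(step.priority, step.name)
```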

6. Decision-making and reasoning: LAMs use advanced algorithms that combine neural networks and symbolic AI techniques for decision-making. This allows them to use pattern recognition and logical reasoning to determine the best action.
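
One way to picture the combination of neural and symbolic techniques, using made-up scores and rules: a model proposes scored candidate actions (the pattern-recognition side), and explicit constraints veto anything that violates a rule (the logical side):

```python
# Neural side (illustrative): candidate actions with model-assigned confidence scores.
candidates = [
    {"flight": {"price": 420, "stops": 0, "overnight_layover": False}, "score": 0.91},
    {"flight": {"price": 260, "stops": 2, "overnight_layover": True}, "score": 0.88},
]

# Symbolic side: hard rules that must hold regardless of the model's score.
def satisfies_rules(flight: dict, budget: int = 500) -> bool:
    return flight["price"] <= budget and not flight["overnight_layover"]

# Decision: pick the highest-scoring candidate that passes every rule.
valid = [c for c in candidates if satisfies_rules(c["flight"])]
best = max(valid, key=lambda c: c["score"]) if valid else None
print(best)
```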

7. Task execution: LAMs interact with external systems using tools like web automation frameworks. They can simulate user actions like clicking, typing, or navigating between pages. Some LAMs can also interact directly with other software systems.
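
As an example of the web-automation side, the sketch below uses Playwright’s synchronous API to simulate typing and clicking. The URL and selectors are placeholders that a real LAM would derive from its interface-understanding step:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/flights")      # placeholder booking site
    page.fill("#destination", "London")           # simulate typing into a field
    page.fill("#departure-date", "2025-01-22")    # placeholder date field
    page.click("text=Search")                     # simulate clicking a button
    page.wait_for_load_state()                    # wait for results to load
    browser.close()
```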

8. Continuous learning and human oversight: LAMs use various machine learning techniques to improve with each interaction. Many LAMs also allow for human oversight, enabling intervention in complex scenarios.
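
A minimal sketch of the oversight part, assuming anything that spends money is flagged as sensitive: such actions pause for human approval, and every outcome is appended to a log that later training runs can learn from. The action names and log format are assumptions:

```python
import json

# Assumed set of actions that require a human in the loop before execution.
SENSITIVE_ACTIONS = {"create_booking", "submit_payment"}

def execute_with_oversight(action: str, args: dict) -> None:
    if action in SENSITIVE_ACTIONS:
        answer = input(f"Approve '{action}' with {args}? [y/N] ")
        if answer.strip().lower() != "y":
            print("Skipped by human reviewer.")
            return
    # ... the action would be carried out here ...
    # Record the outcome so future fine-tuning can learn from real interactions.
    with open("lam_action_log.jsonl", "a") as f:
        f.write(json.dumps({"action": action, "args": args, "status": "executed"}) + "\n")

execute_with_oversight("create_booking", {"flight_id": "example-123", "price": 420})
```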

What can LAMs do?

LAMs have the potential to automate a wide range of tasks, including:

  • Personal assistance: Managing schedules, making appointments, booking travel.
  • Customer service: Resolving issues, processing orders, answering questions.
  • Business process automation: Streamlining workflows, generating reports, analyzing data.
  • Robotics: Controlling robots to perform physical tasks in various environments.

Large action model examples

  • Rabbit AI’s R1: A physical device that acts as an AI assistant, claiming to use LAMs to perform tasks like ordering food or booking appointments.
  • Claude’s computer use: Anthropic’s Claude can operate a computer much like a person, viewing screenshots of the screen and issuing mouse clicks and keystrokes to complete tasks such as filling out forms or navigating applications on the user’s behalf.
  • Adept AI’s ACT-1: A model that uses a human-like cursor to complete tasks in a digital environment, such as web browsing and using software.
  • Salesforce xLAM: A family of large action models from Salesforce AI Research designed to power AI agents, with a focus on function calling and automating complex tasks such as those found in CRM systems.

Large action models vs large language models

LAMs often have LLMs at their core. LLMs excel at understanding and generating human-like text. This allows LAMs to:  

  • Interpret instructions: Understand natural language commands from users.
  • Communicate with users: Provide updates, explanations, or summaries in a clear and understandable way.
  • Generate text: Create reports, emails, or other text-based outputs as part of their tasks.

Despite the similarities and overlap between the two concepts, here is a closer look at the key differences between LLMs and LAMs:

| | LLMs | LAMs |
| --- | --- | --- |
| Core functionality | Primarily focused on understanding, generating, and manipulating text. Excel at tasks like writing, translation, and summarization. | Extend beyond text to include action execution. Designed to understand and execute complex tasks by interacting with various systems and interfaces. |
| Data modalities | Primarily process textual data. Trained on massive text datasets to learn language patterns and semantics. | Handle multiple data types, including text, images, and potentially other sensory data, which allows them to process and act on a broader range of information. |
| Action and interaction | Generate text-based outputs and insights but do not inherently interact with external environments or systems. | Execute actions based on their understanding. This could include navigating software interfaces, making API calls, or even controlling physical robots. |
| Feedback and learning | Typically do not incorporate feedback from actions. Focus on language tasks without direct environmental interaction. | Use feedback from their actions to refine their performance. This allows for adaptive learning and continuous improvement in task execution. |
| Decision making | Reasoning is primarily based on language patterns and relationships between words and concepts. | Employ more complex reasoning and decision-making processes, potentially incorporating planning, logic, and knowledge retrieval. |
| Applications | Typical applications include chatbots, virtual assistants, content creation, and language translation. | Used in applications that require task execution, such as robotic process automation, advanced virtual assistants, customer service automation, and complex workflow management. |
| Examples | GPT-3, Bard, LaMDA | Adept AI’s ACT-1, (potentially) Rabbit AI’s R1 device |

LAM use cases and applications

LAMs have potential applications across various industries, including:

  • Customer service: Automating support interactions, resolving issues, and personalizing experiences.
  • E-commerce: Assisting with product discovery, purchase decisions, and order fulfillment.
  • Healthcare: Analyzing medical data, scheduling appointments, and managing patient records.
  • Finance: Automating financial tasks, providing personalized recommendations, and detecting fraud.
