What is LAM, the new AI model developed by Microsoft to carry out tasks?

As we continue to see rapid advancements in AI, large language models (LLMs) have been at the forefront, driving innovations in chatbots, text generation, and even code writing. While LLMs excel at understanding and generating text, they struggle with performing tasks in real-world environments. To address this gap, Microsoft researchers have developed a new AI model called the Large Action Model (LAM), which can autonomously operate Windows programs.

LAMs represent a significant leap forward in AI technology, enabling systems to perform complex tasks based on human instructions. They shift the focus from AI models that merely process text to those capable of taking action in real-world scenarios.

What are LAMs?

Traditional AI models mainly understand and generate text; LAMs go a step further by turning user requests into actionable steps, such as operating software or even controlling robots. The idea is not entirely new: it gained mainstream attention in 2024 with the launch of Rabbit's AI device, which could interact with mobile applications autonomously. Microsoft's model, however, has been designed specifically for use with Microsoft Office products.

LAMs can process various inputs, such as text, voice, or images, and translate them into detailed, step-by-step plans, adjusting their approach based on real-time feedback. Simply put, LAMs are designed not only to understand instructions but also to act on them.

According to the research paper Large Action Models: From Inception to Implementation, LAMs can interact with both digital and physical environments. For example, instead of asking an AI how to create a PowerPoint presentation, a user could instruct the AI to open the app, generate slides, and format them according to specific preferences. LAMs combine three core components: understanding intent (accurately interpreting user commands), action generation (creating actionable steps), and dynamic adaptation (adjusting based on environmental feedback).
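To make the three components concrete, here is a minimal sketch of how they might fit together in code. Every function name and the retry logic are assumptions for illustration only, not Microsoft's actual implementation; a toy keyword matcher stands in for the underlying language model.

```python
# Hypothetical sketch of a LAM-style pipeline. All names are illustrative,
# not taken from Microsoft's implementation.

def understand_intent(instruction: str) -> str:
    """Intent understanding: map a free-form request to a high-level goal
    (a toy keyword matcher stands in for an LLM here)."""
    if "presentation" in instruction.lower():
        return "create_presentation"
    return "unknown"

def generate_actions(goal: str) -> list:
    """Action generation: expand a goal into concrete, executable steps."""
    plans = {
        "create_presentation": [
            "open_app:PowerPoint",
            "add_slide:Title",
            "apply_format:theme=default",
        ],
    }
    return plans.get(goal, [])

def execute_with_adaptation(actions: list, feedback: dict) -> list:
    """Dynamic adaptation: adjust a step when the environment reports failure."""
    executed = []
    for action in actions:
        succeeded = feedback.get(action, True)  # environment feedback signal
        if not succeeded:
            action += ":retry"                  # naive recovery strategy
        executed.append(action)
    return executed

goal = understand_intent("Create a PowerPoint presentation about Q3 results")
steps = execute_with_adaptation(generate_actions(goal), {"add_slide:Title": False})
print(goal)   # create_presentation
print(steps)  # ['open_app:PowerPoint', 'add_slide:Title:retry', 'apply_format:theme=default']
```

The point of the sketch is the separation of concerns: interpretation, planning, and execution are distinct stages, with the execution loop consuming environmental feedback rather than running a fixed script.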

How are LAMs built?

Creating LAMs is more complex than building LLMs and involves five key stages. First, data is gathered; LAMs require two types: task-plan data, which captures high-level steps for tasks such as opening a Word document or highlighting text, and task-action data, which consists of specific, executable steps. Second, the models are trained using supervised fine-tuning, reinforcement learning, and imitation learning techniques. Third, the trained models are evaluated offline in controlled environments. Fourth, they are integrated into agent systems, such as Windows GUI agents, so they can interact with real software. Finally, LAMs undergo testing in live scenarios to evaluate their adaptability and performance.
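The distinction between the two data types can be illustrated with a pair of example records. The field names and values below are assumptions made for this sketch, not the schema used in the paper:

```python
# Illustrative records for the two training-data types. Field names and
# values are hypothetical, chosen only to show the difference in granularity.

# Task-plan data: high-level, human-readable steps.
task_plan = {
    "task": "Highlight the title in a Word document",
    "plan": [
        "Open the document",
        "Select the title text",
        "Apply the highlight formatting",
    ],
}

# Task-action data: the same task as specific, executable steps.
task_action = {
    "task": "Highlight the title in a Word document",
    "actions": [
        {"tool": "open_file", "args": {"path": "report.docx"}},
        {"tool": "select_text", "args": {"target": "title"}},
        {"tool": "apply_format", "args": {"style": "highlight"}},
    ],
}

print(len(task_plan["plan"]), len(task_action["actions"]))  # 3 3
```

Task-plan records teach the model what a sensible sequence of steps looks like, while task-action records ground each step in a call an agent can actually execute.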

LAMs mark a major advancement, moving beyond text generation to action-oriented AI agents. These models hold the potential to automate workflows and assist people with disabilities, making them more practical for everyday use. As the technology continues to evolve, LAMs may become a standard AI tool across various industries.
