This article shares my thoughts on the following two questions:

  1. How will agents reshape the future of app experiences?
  2. Is mimicking human-computer interaction the right path for agents?

<aside> ⚠️

"Computers" in this article refers not only to desktops and laptops but also to smartphones.

</aside>

This article focuses only on time-saving apps, not time-killing apps, because their sources of value are fundamentally different.


How Traditional Apps Work

First, let me describe the essence of computer applications (i.e. apps) using today's popular terminology:

A collection of workflows that automatically captures user context and invokes computing resources to respond to and execute human intent.

Here, let me use DoorDash as an example:

For cases that require absolutely no user context, such as "checking the weather in a specific location," these are typically implemented as stateless APIs and can easily be turned into MCP servers.
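A fully stateless capability like this can be sketched as a single pure function. Everything below (the `get_weather` name, the stubbed data, the schema shape) is illustrative, not a real MCP SDK; the actual server wiring is omitted:

```python
# A minimal sketch of a stateless "tool": no user context, no session state.
# In practice, a function like this is what gets registered with an MCP
# server and described to the model via a JSON schema.

def get_weather(location: str) -> dict:
    """Return current weather for a named location (stubbed for illustration)."""
    # A real implementation would call a weather API here.
    fake_db = {"San Francisco": {"temp_c": 18, "condition": "foggy"}}
    return fake_db.get(location, {"temp_c": None, "condition": "unknown"})

# The tool description an agent would see (shape is illustrative):
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Current weather for a named location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}
```

Because the function depends on nothing but its arguments, exposing it to an agent is trivial; the interesting cases are the stateful, context-heavy flows discussed next.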

Next, let me describe how traditional apps operate, again using DoorDash as an example:

  1. User selects an app based on intent, e.g. "I'm hungry, I want pizza" → DoorDash;
  2. User selects a specific feature and inputs function parameters, e.g., food_search("pizza");
  3. User browses results and decides what to order, e.g. Domino's Pizza;
  4. User inputs more precise context for delivery, e.g. <address>.
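The four steps above can be sketched as explicit function calls that the human, not an agent, drives. Only `food_search` comes from this article; every other name and the stubbed data are illustrative, not DoorDash's API:

```python
# Traditional flow: the human performs every step by hand.

def food_search(query: str) -> list[dict]:
    """Step 2: the user types a query into the app's search feature."""
    # Stub standing in for the app's real search backend.
    return [{"restaurant": "Domino's Pizza", "item": "Pepperoni pizza", "price": 12.99}]

def browse_and_pick(results: list[dict]) -> dict:
    """Step 3: the user scrolls the results page and taps one item."""
    return results[0]

def place_order(item: dict, address: str) -> str:
    """Step 4: the user enters delivery context and confirms."""
    return f"Ordered {item['item']} from {item['restaurant']} to {address}"

# Step 1 happened before any code ran: the user picked DoorDash for "pizza".
confirmation = place_order(browse_and_pick(food_search("pizza")), "<address>")
```

Note that the "glue" between steps (choosing the app, choosing the feature, carrying context forward) lives entirely in the user's head and fingers.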

Future — Restructuring Apps for Agents

Breaking this process down, it's easy to see that every step falls within the capabilities of LLM agents. Let's replace the app with a general-purpose personal assistant agent. One possible implementation of the process above:

  1. User talks to the agent: "I'm hungry, I want pizza";
  2. Agent calls the food_search tool to execute the search;
  3. ❗️Agent aggregates and reorganises the "atomic results" and presents them in the most user-friendly view;
  4. User selects and confirms payment.
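The agent-driven flow above can be sketched as follows. Again, only `food_search` comes from the article; the orchestration and view-generation functions are illustrative assumptions:

```python
# Agent flow: the agent orchestrates steps 1-3, but step 4 (selection
# and payment) deliberately stays with the user.

def food_search(query: str) -> list[dict]:
    # Stub for the same search tool the traditional app exposed.
    return [
        {"restaurant": "Little Caesars", "item": "Pepperoni pizza", "price": 8.99},
        {"restaurant": "Domino's Pizza", "item": "Margherita", "price": 10.99},
    ]

def generate_view(results: list[dict]) -> dict:
    """Step 3: aggregate the 'atomic results' into a user-friendly view."""
    return {"view": "item_grid", "items": sorted(results, key=lambda r: r["price"])}

def handle_user_message(msg: str) -> dict:
    # Steps 1-2: the agent maps the utterance to a tool call.
    # (A real agent would have an LLM extract "pizza" from msg.)
    results = food_search("pizza")
    return generate_view(results)  # step 3: hand the user a view, not raw JSON

view = handle_user_message("I'm hungry, I want pizza")
# Step 4 happens in the generated UI: the user taps an item and pays.
```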

I've specifically marked step 3 with a ❗️ for the following reasons:

  1. GUI remains irreplaceable: it's the most efficient way to convey information to users;

    (e.g. products like AI Pin that almost abandon GUI have poor user retention)

  2. For the same information, the choice of view matters greatly. Using food delivery as an example, users may have two completely different needs:

    1. "Restaurant" — "I just want sushi today," in which case the top-level page should display different sushi restaurants;
    2. "Pizza" — "I just want a pizza today," in which case the top-level page should display pizzas from all restaurants.
  3. ❗️Users only know what they want when they see concrete results:

    "A lot of times, people don't know what they want until you show it to them." - Steve Jobs

  4. ❗️The selection process itself is enjoyable! When efficiency and time are priorities, agents can certainly help. But not everything should be delegated to agents! Don't deprive users of their enjoyment!

    <aside>

    More deeply, this isn't just about enjoyment. It's about user trust. I believe agents should be positioned as "decision-support copilots". Users should naturally feel in control during interactions, which requires a transparent and user-led decision-making process.

    </aside>
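To make the view distinction in point 2 concrete, here is a toy sketch (illustrative names and data, not DoorDash's data model) rendering the same search results under the two views:

```python
# Same results, two top-level views. The view choice comes from the
# user's intent; a real agent would let the LLM make this call.

RESULTS = [
    {"restaurant": "Sushi Ran", "item": "Salmon nigiri"},
    {"restaurant": "Sushi Ran", "item": "Dragon roll"},
    {"restaurant": "Umi", "item": "Chirashi bowl"},
]

def render(view_type: str, results: list[dict]) -> dict:
    if view_type == "restaurant_list":
        # "I just want sushi today" -> top level shows restaurants
        by_venue: dict[str, list[str]] = {}
        for r in results:
            by_venue.setdefault(r["restaurant"], []).append(r["item"])
        return {"view": "restaurant_list", "rows": list(by_venue)}
    # "I just want a pizza today" -> top level shows dishes across venues
    return {"view": "item_grid", "rows": [r["item"] for r in results]}
```

The underlying data is identical; only the grouping changes, which is exactly why the agent's choice of view in step 3 matters so much.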

Of course, in terms of implementation, there are various protocols for step 3, such as Google's GenUI.
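I won't vouch for GenUI's exact schema here; the snippet below only sketches the generic shape such view-generation protocols share: the agent emits a declarative, serializable view description, and the client owns the actual rendering.

```python
# NOT the actual GenUI schema -- an illustrative, generic view spec.
# The agent produces this; the client app renders it natively.

import json

view_spec = {
    "type": "item_grid",                 # view chosen by the agent (see above)
    "title": "Pizza near you",
    "children": [
        {"type": "card", "text": "Margherita - Domino's Pizza", "price": 10.99},
        {"type": "card", "text": "Pepperoni - Little Caesars", "price": 8.99},
    ],
    # Step 4 stays with the user: the spec exposes an action, not a decision.
    "actions": [{"type": "confirm_payment", "label": "Order"}],
}

payload = json.dumps(view_spec)  # what crosses the agent -> client boundary
```

The key property is that the payload is declarative: the agent decides *what* to show, while the client keeps control of *how* it is drawn, which preserves the native GUI quality argued for above.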

From the comparison above, it's clear that the technical elements for agents to serve directly as interfaces are already in place.

Future Trend — Agents Don't Need to Mimic Human Operations

Since time-saving apps can technically be replaced by an equivalent "agent-orchestrated workflows + view/UI generation," let's see whether the latter actually saves time. First, let's look at the single-operation chains:

  1. Traditional app: User intent → OS → App front page → … → App function page;