A two-axis model for understanding LLM strengths and weaknesses


Lily Chen


Working with LLMs can feel magical one moment and infuriating the next. As a software engineer who has used them daily for over a year, I’ve learned that this inconsistency isn’t random. There’s a clear pattern to where they excel and where they consistently fall short.

I’ve mapped this pattern onto a simple two-axis grid that explains my observations. Understanding it comes down to asking two questions about the task at hand:

  1. How open-ended is the problem? (From many possible answers to a single correct one)
  2. How context-specific is the problem? (From general knowledge to deep, domain-specific expertise)

Though most examples here come from my world of software engineering, I believe these principles apply to almost any domain.

The Two Axes Explained

  • The Open-Ended Axis: This measures the “creativity” required. A highly open-ended task is something like “Brainstorm ways to improve the performance of this API.” There are many valid answers. A low open-endedness task is “What’s the syntax for a useEffect hook?” or “Does this code have a race condition?” There is an objectively correct answer (a minimal useEffect sketch follows this list).
  • The Context-Specific Axis: This measures the amount of domain-specific knowledge required. A low-context problem is self-contained, like “Write a Python function to perform a depth-first search.” You don’t need to know anything about my project. A high-context problem, by contrast, can only be answered with deep knowledge of your specific codebase, team, or domain.
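To make the low end of both axes concrete, here is roughly what the useEffect question above is asking for. This is a minimal sketch; the Clock component is just a placeholder of my own, not something from a real project.

```tsx
import { useEffect, useState } from "react";

// Low open-endedness: useEffect takes an effect callback (which may return a
// cleanup function) and a dependency array. There is one correct shape.
function Clock() {
  const [now, setNow] = useState(() => new Date());

  useEffect(() => {
    const id = setInterval(() => setNow(new Date()), 1000);
    return () => clearInterval(id); // cleanup runs on unmount
  }, []); // empty deps: run the effect once after the first render

  return <time>{now.toLocaleTimeString()}</time>;
}
```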

Low Context + High Open-Endedness (The Brainstormer) 👍

This is where LLMs truly shine and can feel like magic; this quadrant is their sweet spot.

This is the “blue-sky thinking” quadrant. The LLM doesn’t need to know the intimate details of your code, just the general problem space. Below are examples of problems in this space.

  • Architectural Ideas: “What are some common design patterns for handling real-time notifications in a React app?”
  • High-Level Planning: If you feed an LLM profiling data, it can be brilliant at identifying bottlenecks. It might immediately spot “You have a lot of GC (Garbage Collection) pressure” and generate a detailed, actionable plan to address it. It’s a great starting point that can reveal unknown unknowns — problems you didn’t even know you had.
  • Refining Content: This very blog post was polished with the help of LLMs. While it still required my editing and ideas, the AI probably saved me hours of writing time. If you’re curious, this was the original prompt (unlisted) I fed into these LLMs.

Low Context + Low Open-Endedness (The Encyclopedia) 👍

This is the “solved problems” quadrant. These are tasks with a correct answer that doesn’t depend on your unique environment.

  • Algorithms and Boilerplate: “Write an algorithm that does DFS” or “Create a simple Todo App in React.”
  • Fixing Common Bugs: I recently had a race condition in my React app. A callback on a popover element was sometimes firing after the popover was closed by an onBlur event on a nearby input. Claude correctly suggested I use onMouseDown instead of onClick for the popover's callback, because onMouseDown fires before onBlur (see the sketch after this list). This solution is rooted in general web API knowledge, not my specific codebase.
  • Finding needles in a haystack (in your own codebase):

Cursor (with Claude-4-Sonnet at least) is surprisingly good at handling questions like: “Is there a helper function in our codebase that does XYZ?”

Though this feels like a high-context question, the nature of the problem is still a low-context pattern match. The LLM isn’t reasoning about your architecture; it’s performing a sophisticated semantic search. It’s excellent at finding code snippets that match a functional description, which is incredibly useful for navigating large codebases worked on by hundreds of developers.
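To ground the popover bug from the “Fixing Common Bugs” bullet above, here is a hedged sketch of the pattern; the component and its props are hypothetical, and the only point is the event order: mousedown fires before the input’s blur, while click fires after the popover has already been torn down.

```tsx
import { useState } from "react";

// Hypothetical reproduction of the race: the popover closes on the input's
// onBlur, so an onClick handler on a suggestion can fire too late, or never,
// because the element has already been unmounted.
function SearchWithSuggestions({ suggestions }: { suggestions: string[] }) {
  const [open, setOpen] = useState(false);
  const [value, setValue] = useState("");

  return (
    <div>
      <input
        value={value}
        onChange={(e) => setValue(e.target.value)}
        onFocus={() => setOpen(true)}
        onBlur={() => setOpen(false)} // closes the popover when focus leaves
      />
      {open && (
        <ul>
          {suggestions.map((s) => (
            <li
              key={s}
              // onMouseDown fires before the input's onBlur, so the selection
              // is recorded before the popover is torn down. An onClick handler
              // here would race with onBlur and sometimes lose.
              onMouseDown={() => setValue(s)}
            >
              {s}
            </li>
          ))}
        </ul>
      )}
    </div>
  );
}
```

An alternative is to call preventDefault() inside onMouseDown so the input never blurs in the first place; either way, the fix comes from general DOM event ordering, not from anything specific to the surrounding codebase.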

High Context + High Open-Endedness (The Therapist) 😐

This example isn’t tech related, but it’s revealing. LLMs are surprisingly adept at analyzing patterns in human interaction. I use them for questions like “Analyze the following exchanges and the personality of the writers: [block of text]”. Because they’re trained on enormous amounts of human writing, they are good at spotting patterns, and understanding how and why people interact is largely a pattern-recognition problem. The caveat is that LLMs don’t have their own moral framework because they don’t have lived experiences. Treat their analysis as generalized human behavior: they’ll miss nuances, and they’re not speaking from a place of genuine empathy or human connection.

High Context + Low Open-Endedness (The Curse) 👎

This is where developers get frustrated. As soon as a task requires a deep understanding of the intricate, interconnected parts of your specific project, the LLM’s performance plummets.

  • Complex UI Styling: I asked a model to make a specific part of our UI overflow in a specific way. It failed miserably. Why? Because the correct solution depended on the entire React component tree, the existing CSS overflow properties, parent div structures, and z-index stacking. The LLM gave a generic CSS solution that was useless without understanding the full context (a hypothetical sketch follows this list). It’s like asking a stranger to rewire a single switch in your house without giving them the building’s electrical schematic.
  • Implementing a Refactor: This is the follow-up to the GC pressure problem. The LLM was great at creating the plan, but when asked to implement the code refactoring to alleviate the pressure, it couldn’t. The changes required modifying multiple interdependent files, understanding data flow, and respecting existing abstractions. LLMs often lack a persistent, holistic mental model of the codebase, so they miss side effects and implicit contracts.
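To make the styling failure concrete (the first bullet above), here is a hypothetical sketch rather than my actual component tree: the generic fix lands on the inner element, but the visible behavior is decided by ancestors the model never saw.

```tsx
// Hypothetical component tree, invented for illustration. The generic advice
// ("make the tooltip overflow: visible with a huge z-index") changes nothing,
// because clipping and stacking are decided higher up the tree.
function Card() {
  return (
    // Set elsewhere, for unrelated reasons: overflow: hidden clips any
    // descendant that tries to spill out, and position + z-index create a
    // stacking context that caps the tooltip's z-index against the page.
    <div style={{ overflow: "hidden", position: "relative", zIndex: 0 }}>
      <div style={{ position: "relative" }}>
        {/* The suggested fix lands here and has no visible effect. */}
        <div style={{ position: "absolute", overflow: "visible", zIndex: 9999 }}>
          Still clipped by the outer overflow: hidden
        </div>
      </div>
    </div>
  );
}
```

The specific properties aren’t the point; the point is that the correct change depends on parents and stacking contexts the model was never shown.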