LLMs: Context Matters. A LOT.

April 4, 2024

Context matters a lot in LLMs; I'd argue it matters far more than most people think. Anthropic recently released Claude 3, a family of three models in increasing order of capability: Haiku, Sonnet, and Opus. Opus is expensive at $15/1M input tokens and $75/1M output tokens. Haiku, in contrast, is 60x cheaper, and benchmarks have shown it performing comparably to GPT-4.

I recently made a CLI tool called howdoi that makes extensive use of Haiku. The interface is simple:

howdoi "write a server in Go" -c <optional_context, can be URLs or local files>

Even without context, Haiku is useful, but with context, it really shines.

To see this, let's first try Opus without any extra context:

λ ~/code/howdoi: howdoi "how do i write a web server with bun" -m "opus" > out

2024/04/04 12:52:13 Usage: Input Tokens: 17, Output Tokens: 576, Total Cost: $0.043455

This cost ~$0.04, and the output wasn't quite right.

That's not surprising: Bun is new, and the base model is unlikely to have much knowledge of it.

Let's add some context and use Haiku:

λ ~/code/domluna.com/content: howdoi "how do i write a web server with bun" -c "https://bun.sh/guides/http/simple"
2024/04/04 12:56:22 Scraping the web page: https://bun.sh/guides/http/simple
To write a web server with Bun, you can follow the example provided in the document. Here's a step-by-step guide:

1. **Import the Bun module**: In your JavaScript file, import the Bun module at the beginning of your code.

import Bun from "bun";

2. **Create the server**: Use the `Bun.serve()` method to create the server. This method takes an object as an argument, which defines the server's configuration.

const server = Bun.serve({
  port: 3000, // The port number the server will listen on
  fetch(request) {
    // The function that handles incoming requests
    return new Response("Welcome to Bun!");
  },
});

In the example, the server listens on port 3000 and responds to all requests with the message "Welcome to Bun!".

3. **Log the server URL**: After creating the server, log the URL where the server is listening.

console.log(`Listening on ${server.url}`);

4. **Run the server**: To start the server, simply run your JavaScript file.

bun your-file.js

This will start the Bun server, and it will be ready to handle incoming requests.

The key points to remember are:

1. Import the Bun module.
2. Use `Bun.serve()` to create the server, passing in the configuration object.
3. Implement the `fetch` function to handle incoming requests.
4. Log the server URL.
5. Run the server using the `bun` command.

This is a basic example, but you can further customize the server by adding more routes, handling different HTTP methods, and implementing more complex logic in the `fetch` function.

2024/04/04 12:56:27 Usage: Input Tokens: 382, Output Tokens: 419, Total Cost: $0.000619
λ ~/code/domluna.com/content: v server.js
λ ~/code/domluna.com/content:
λ ~/code/domluna.com/content: bun run server.js
Server listening on port 3000
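
For reference, the pieces above assemble into a single small file, something like this (a reconstruction rather than a verbatim copy of what I ran; the log line is written to match the output shown):

// server.js, assembled from the steps above.
// Note: Bun is available as a global in the Bun runtime, so no import is needed.
const server = Bun.serve({
  port: 3000,
  fetch(request) {
    // Every request gets the same response in this minimal example.
    return new Response("Welcome to Bun!");
  },
});

console.log(`Server listening on port ${server.port}`);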

We get a working solution for roughly 70x less: $0.0006 versus $0.04 for the Opus attempt, even after paying for the extra context tokens.

Context is incredibly important. The most powerful aspect of LLMs is their ability to do in-context learning. Of course, the base model still has to be capable enough to make use of the context; otherwise the context is wasted.

Let's think about this a bit more. Opus input costs $15/1M tokens, so 1k input tokens cost $0.015. At Haiku's rate of $0.25/1M, that same $0.015 buys a 60k-token context. Even then, the input cost dominates: adding 4k output tokens at Haiku's $1.25/1M output rate costs another $0.005, for a total of about $0.02, which is still roughly half the cost of the original Opus query!
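
Spelled out (Opus input at $15/1M; Haiku at $0.25/1M input and $1.25/1M output, i.e. the 60x ratio on both sides):

// Back-of-the-envelope cost math from the paragraph above.
const opusInputPerTok   = 15 / 1_000_000;   // $15 per 1M input tokens
const haikuInputPerTok  = 0.25 / 1_000_000; // $0.25 per 1M input tokens (60x cheaper)
const haikuOutputPerTok = 1.25 / 1_000_000; // $1.25 per 1M output tokens (60x cheaper)

const opus1kInput      = 1_000 * opusInputPerTok;                // ≈ $0.015
const haikuContextToks = opus1kInput / haikuInputPerTok;         // ≈ 60,000 tokens for the same spend
const haikuTotal       = opus1kInput + 4_000 * haikuOutputPerTok; // ≈ $0.015 + $0.005 = $0.02

console.log(opus1kInput, haikuContextToks, haikuTotal); // ≈ 0.015 60000 0.02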

In-context learning will only get better, but it seems to me we've already reached an inflection point where adding context is more worthwhile than reaching for a larger model.