I Was Burning Through AI Tokens Without Knowing It. Here's What I Changed.

Published on April 03, 2026

I build with AI every day. Not casually. I run an AI assistant that handles reporting, automates workflows, manages a website, processes emails, and coordinates tasks across two organizations. It has over 30 custom skills, automated agents, scheduled jobs, and integrations with Google Drive, Gmail, Notion, and more.

So when I noticed my sessions draining faster than they should, I didn't just shrug it off. I dug in.

What I found changed how I think about working with AI tools entirely. And if you're using ChatGPT, Claude, Copilot, Cursor, or any AI coding assistant, this applies to you too.

The Thing Nobody Tells You About AI Conversations

Here's what caught me off guard: AI doesn't remember your conversation. It rereads it. Every single time.

When you send message number 30 in a long chat, the model isn't picking up where it left off. It's reading messages 1 through 29 all over again, then generating a response to number 30. That means your first message is cheap. But by the time you're deep into a session, every new message is carrying the weight of every message before it.

The cost doesn't add up. It compounds.

This is how all large language models work. ChatGPT, Claude, Gemini, all of them. Subscription plans hide it behind a flat fee, but the compute cost is still there. It's why long conversations get slower, hit limits faster, and start giving you worse answers. The model is drowning in its own history.
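The compounding is easy to see with a little arithmetic. This is a rough sketch, not real pricing: the 500-tokens-per-message figure is a made-up round number, and real messages vary in size.

```python
# Rough sketch of why long threads compound: each new turn
# re-sends the entire history before it. Token counts are
# illustrative round numbers, not real API figures.

def tokens_processed(messages, tokens_per_message=500):
    """Total input tokens the model reads across a whole thread."""
    total = 0
    for n in range(1, messages + 1):
        # Turn n re-reads all n messages so far.
        total += n * tokens_per_message
    return total

# One 30-message thread vs. three fresh 10-message threads
# covering the same amount of conversation:
long_thread = tokens_processed(30)       # 500 * (1+2+...+30) = 232,500
three_short = 3 * tokens_processed(10)   # 3 * 500 * 55      =  82,500
```

Same thirty messages, nearly three times the reading, just because they sat in one thread instead of three.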

Once I understood this, a lot of things clicked. And I started making changes that had an immediate impact.

What Actually Moved the Needle for Me

I tried a lot of optimizations. Some were game-changers. Others sounded smart but didn't matter much in practice. Here's what actually made a difference in my day-to-day.

Clear the conversation between tasks

This was the single biggest change. I used to let conversations run long, jumping from one task to the next in the same thread. It felt efficient. It wasn't.

Now I start fresh between unrelated tasks. The same request that was getting expensive 30 messages deep costs almost nothing in a clean session. It also produces better results because the AI isn't sifting through irrelevant context from three tasks ago.

Audit your connected tools

This one surprised me. I had three integration servers connected to my setup: one for Microsoft Graph, one for image generation, one for meeting notes. Every connected integration loads its full set of tool definitions into the AI's context on every single message, whether I use it or not.

When I disconnected the ones I wasn't actively using, the difference was noticeable. I reconnect them when I need them. The rest of the time, they're not eating up my budget in the background.
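The overhead is easy to underestimate because it is charged on every turn. The per-server token figures below are assumptions for illustration; real tool schemas vary widely in size.

```python
# Back-of-the-envelope cost of idle integrations. Each connected
# server's tool definitions ride along on every message. The token
# figures per server are hypothetical.

TOOL_DEF_TOKENS = {
    "microsoft_graph": 4000,   # assumed schema size
    "image_generation": 1500,
    "meeting_notes": 2000,
}

def idle_overhead(connected, messages):
    """Tokens spent re-loading tool definitions over a session."""
    per_message = sum(TOOL_DEF_TOKENS[name] for name in connected)
    return per_message * messages

# All three connected across a 40-message day:
all_on = idle_overhead(TOOL_DEF_TOKENS, 40)        # 7,500 * 40 = 300,000
one_on = idle_overhead(["meeting_notes"], 40)      # 2,000 * 40 =  80,000
```

Hundreds of thousands of tokens a day for tools that may never get called.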

Say what you mean the first time

I used to send quick, vague prompts and then course-correct with follow-ups. "Fix the auth issue." Then "No, I meant the login flow." Then "Actually, just the token refresh part."

Every follow-up stacks permanently in the conversation. Three vague messages cost more than three times what one specific message costs, because each follow-up also re-reads the ones before it. Now I take an extra 30 seconds to write a precise prompt upfront. It's faster and cheaper.
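Here is that stacking effect in numbers. The token counts per exchange are made up for illustration.

```python
# Each turn re-reads the full history so far, so follow-ups
# cost more than their own length. Token counts are illustrative.

def thread_cost(message_tokens):
    """Input tokens for a thread; message_tokens is a list of
    token counts per exchange, in order."""
    total = 0
    history = 0
    for t in message_tokens:
        history += t
        total += history   # this turn processes everything so far
    return total

vague = thread_cost([600, 600, 600])   # 600 + 1200 + 1800 = 3,600
precise = thread_cost([800])           # one slightly longer prompt = 800
```

A somewhat longer prompt that lands on the first try comes out far cheaper than three short ones that don't.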

Make the AI plan before it acts

The most expensive thing an AI tool can do is confidently go down the wrong path for 20 messages before you catch it. I added a simple rule to my configuration: don't make changes until you're 95% confident in the approach, and ask me questions until you get there.

This one change eliminated most of my wasted sessions. Five minutes of back-and-forth planning is far cheaper than a 20-message dead end.

The Stuff That Matters More Than You'd Think

Your configuration file is always running

Whatever instructions you give your AI tool (a CLAUDE.md, a .cursorrules file, a system prompt), that file gets reloaded on every message. If it's bloated with things the AI doesn't need for most tasks, you're paying for that bloat continuously.

I keep mine under 200 lines. It works like an index that points to detailed information stored elsewhere, not a dump of everything the AI might ever need. Every line I cut saves tokens on every turn for the rest of the session.
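To make the index idea concrete, here is a hypothetical excerpt of what that kind of slim config file can look like. The file names and rules are examples, not my actual configuration.

```markdown
# CLAUDE.md — an index, not an encyclopedia (hypothetical excerpt)

## Project map
- Reporting scripts: see docs/reporting.md
- Email workflows: see docs/email.md
- Website deployment: see docs/deploy.md

## Rules
- Ask clarifying questions until ~95% confident before making changes.
- Point to the docs above instead of restating their contents here.
```

The details live in files the AI reads only when a task actually needs them, so they don't ride along on every turn.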

Context is expensive, so be precise

When something breaks, my instinct used to be: paste the whole file and let the AI figure it out. Now I paste the specific function. If I know the error is in the authentication logic, I point directly to that function, not the entire 500-line file.

Better input, better output, lower cost. It works in every direction.

Don't wait for auto-compaction

Most AI tools will automatically compress your conversation when it hits about 95% of the context limit. By that point, quality is already degraded. The AI is struggling with too much context and giving you mediocre responses.

I check my context usage during long sessions and manually compact around the 60% mark. I tell the AI exactly what to preserve and what to drop. After a few rounds of this, I'll take a summary, clear everything, and restart fresh with that summary as my starting point. It's a small habit that keeps sessions sharp.

The five-minute rule

Many AI tools cache your recent context so it doesn't need to be fully reprocessed on the next turn. But that cache expires, usually after about five minutes. If you step away for a call or a meeting and come back, your next message reprocesses everything from scratch at full price.

Before I step away now, I either compact or just plan to start fresh when I come back. A simple habit that avoids a hidden cost spike.
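The price difference between a warm and a cold cache is stark. The 10x discount ratio below is an assumption for illustration; real cache pricing varies by provider.

```python
# Why a coffee break can spike cost: providers typically bill
# cached input tokens at a steep discount. The 10x ratio and the
# context size here are assumptions, not real pricing.

CONTEXT_TOKENS = 100_000
FULL_RATE = 1.0      # relative cost per token on a cache miss
CACHED_RATE = 0.1    # relative cost per token on a cache hit

def next_message_cost(cache_warm):
    """Relative input cost of the next message in a long session."""
    rate = CACHED_RATE if cache_warm else FULL_RATE
    return CONTEXT_TOKENS * rate

warm = next_message_cost(True)    # cache hit: 10,000 cost units
cold = next_message_cost(False)   # expired cache: 100,000, a 10x spike
```

Same message, same context; the only difference is whether five minutes passed.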

For the Power Users

If you're going deeper with AI tools, building automations, running agents, or using AI across your workflow, there are a few more things worth knowing.

Not every task needs the biggest model

I run different models for different jobs. Complex architectural decisions get the most capable model. Routine coding and formatting tasks go to a lighter, faster model. Simple sub-tasks get the lightest option available.

This isn't about cutting corners. It's about matching the tool to the job. A mid-tier model handles 80% of coding work just fine, and it costs a fraction of what the top-tier model charges.
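The routing itself can be as simple as a lookup. The model names and the complexity labels below are placeholders, not a real API.

```python
# A minimal sketch of tiered model routing. Model names and the
# complexity categories are hypothetical placeholders.

MODEL_TIERS = {
    "complex": "big-model",     # architecture, tricky debugging
    "routine": "mid-model",     # everyday coding, formatting
    "simple": "small-model",    # classification, extraction
}

def pick_model(task_complexity):
    """Route a task to a tier; unknown tasks fall back to mid."""
    return MODEL_TIERS.get(task_complexity, MODEL_TIERS["routine"])
```

The fallback matters: when in doubt, the mid tier handles it, and only clearly hard problems get escalated to the expensive model.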

Multi-agent workflows are expensive

If your tool can spawn sub-agents (parallel AI workers that tackle different parts of a problem), understand that each one starts with its own full context. A multi-agent session can use 7 to 10 times the tokens of a single-agent session doing the same work.

I use agents for genuinely parallel, complex tasks where the speed matters. For everything else, a single focused session is the better call.
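The multiplier comes from the fixed context each agent carries. These numbers are illustrative, but the shape of the math holds.

```python
# Why sub-agents multiply cost: each one starts with its own
# full copy of the shared context. Token figures are illustrative.

SHARED_CONTEXT = 20_000   # instructions + project context per agent
TASK_TOKENS = 5_000       # actual new work per agent

def session_tokens(agents):
    """Rough token total for a session with n parallel agents."""
    return agents * (SHARED_CONTEXT + TASK_TOKENS)

single = session_tokens(1)   #  25,000
swarm = session_tokens(8)    # 200,000: 8x, before coordination overhead
```

The work per agent is small; it's the duplicated context that dominates the bill.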

Work with the clock, not against it

Peak hours (weekday mornings, Eastern time) mean more competition for compute and faster rate limiting. I schedule my heaviest AI work (big refactors, multi-file changes) for afternoons or evenings when things are quieter.

It's not always possible to plan around it, but when I can, the sessions run smoother and stretch further.

What This Really Comes Down To

After months of running a full AI-powered system, here's what I've learned: most people don't need a bigger plan or a more expensive subscription. They need better habits.

The AI tools are powerful. The models are capable. But if you're feeding them a bloated conversation history, leaving unused integrations running, and sending vague prompts that need three follow-ups, you're paying a premium for a fraction of what these tools can actually do.

It's not a limits problem. It's a context hygiene problem.

The good news? Every change I've described here takes minutes to implement. No new tools, no complex configuration. Just awareness of how the system actually works under the hood, and a few small shifts in how you use it.

That's the difference between AI being an expensive experiment and AI being something you can build on every day.


Melanie Markes is the Director of Business Intelligence at CareerSource Central Florida and founder of Blue Dawn Tech. She builds practical AI solutions and writes about making AI work in the real world.
