AI Music Creative Team Building — The Start

I started this because I wanted to find out what AI music generation could actually do.

Not what people were saying it could do. Not the usual arguments about whether it was good, bad, ethical, lazy, brilliant, soulless, revolutionary, or the end of music as we know it. I wanted to try it myself, with my own ideas, on my own equipment, and see what came out.

The first serious step was reading about ACE-Step 1.5. What caught my attention was that I could run it locally on a machine with a 16GB GPU. That mattered to me. I have always preferred to understand systems by getting them running myself, rather than only using them from a distance through someone else’s interface.

So I installed it, played with it, and wrote one song.

That first song was the title track for a very vague idea I had about a rock opera based on Hammurabi. At that point it was not a project in any serious sense. It was more an excuse to test the process. I had a concept, some lyrics, a style direction, and enough curiosity to see what the system would make of it.

What surprised me was not that it generated music. I expected that.

What surprised me was that the result was listenable.

It had shape. It had enough of the style I was aiming for. It was not perfect, and it was not magic, but it was appealing in a way I did not expect from an early experiment. There was something there that felt worth following.

That was the point where the project changed from “I wonder what this can do” to “there might be a workflow here.”

Building the first specialist

The first problem I ran into was not really musical. It was communication.

AI music tools are extremely sensitive to how an idea is described. Style words matter. Structure matters. The difference between a useful prompt and a vague prompt can be the difference between something focused and something that sounds like the machine has grabbed three genres out of a hat.

I realised I needed help turning musical intent into usable prompt language.

That was where Ace came from.

Ace began as an AI specialist to help me format music styles and prepare prompts for song generation. Not as a songwriter in the traditional sense, and not as some imaginary bandmate pretending to hold a guitar, but as a specialist assistant with a specific job: help translate what I wanted into instructions that a music generation system had a better chance of understanding.

That distinction is important to me.

I am not trying to create a fake band. I am not interested in pretending there are human session players in a room somewhere. The workflow is AI-assisted, and I am open about that. My role is to write, direct, curate, edit, reject, keep, revise, and make the final creative decisions. The tools generate things. I decide what belongs.

Ace was the first sign that this could work better if I stopped thinking of AI as one general assistant and started thinking of it as a set of specialists.

Why build a team?

As a solo creator, one thing I miss is the side conversation.

In a workplace, you often get feedback without formally asking for it. Someone looks over your shoulder and says, “That bit works,” or “That heading is confusing,” or “You are trying to do too many things at once.” Sometimes the comment is useful. Sometimes it is not. But the presence of other minds around the work changes the way you think.

When you are working alone, that is easy to lose.

AI assistants can fill part of that gap. Not all of it. They do not replace taste, lived experience, or proper human collaboration. But they can provide friction, structure, suggestions, objections, and alternative angles at the moment you need them.

That is the core of the workflow I have been building.

I develop AI specialists for specific tasks I have. Ace was one of the first. Others followed for visual direction, cover art prompting, release strategy, website content, and music promotion. Each one has a defined role. Each one has a system prompt that describes what it is meant to do, how it should speak, what it should avoid, and where its boundaries are.

That matters because a general assistant will often try to be helpful in every direction at once. A specialist assistant is more useful because it has constraints.

Ace should care about lyrics, structure, music style, and prompt control.

Dawn should care about image prompting, visual composition, artwork, colour, lighting, and design direction.

Mandy should care about release planning, audience positioning, platform strategy, and practical promotion.

I do not need every assistant to have an opinion on everything. In fact, that would make the system worse. The point is to get focused help from the right specialist at the right stage of the work.
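The split above can be sketched as a tiny bit of data kept separate from any particular model. The role names come from this post; everything else here (the field names, the rendering function, the `avoid` lists) is my own illustrative assumption, not the author's actual prompt files.

```python
# Sketch: specialist roles as data, rendered into system prompts.
# Names (Ace, Dawn, Mandy) are from the post; structure is hypothetical.
from dataclasses import dataclass, field


@dataclass
class Specialist:
    name: str
    focus: list[str]                                  # what this assistant should care about
    avoid: list[str] = field(default_factory=list)    # explicit boundaries

    def system_prompt(self) -> str:
        """Render the role definition into a system prompt string."""
        lines = [f"You are {self.name}, a specialist assistant."]
        lines.append("Focus only on: " + ", ".join(self.focus) + ".")
        if self.avoid:
            lines.append("Do not give advice on: " + ", ".join(self.avoid) + ".")
        return "\n".join(lines)


TEAM = {
    "ace": Specialist("Ace", ["lyrics", "structure", "music style", "prompt control"],
                      avoid=["artwork", "promotion"]),
    "dawn": Specialist("Dawn", ["image prompting", "composition", "colour", "lighting"]),
    "mandy": Specialist("Mandy", ["release planning", "platform strategy", "promotion"]),
}

print(TEAM["ace"].system_prompt())
```

The useful property is that the role outlives the model: swap the backend and the same rendered prompt carries over unchanged.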

The working environment

Most of these assistants live in Open WebUI.

That has become the main workshop space for this system: a place where I can run different models, maintain specialist prompts, and build a workflow around repeatable roles rather than starting from a blank chat every time.

The wider setup has grown to include OpenAI Codex and a paid ChatGPT subscription as well. I will write more about those parts in later posts, because they each solve different problems. For now, the important point is that the assistants are not one product or one model. They are a working method.

The model underneath can change.

The specialist role remains.

That has become one of the more important lessons. If the workflow depends entirely on one model, it is fragile. If the workflow is built around clear roles, good prompts, useful context, and a practical understanding of what each tool is good at, it is much easier to move as the technology changes.

And it changes quickly.

Models, cost, and reality

This part of the process has been a moving target.

I started out using GPT-5.2, then progressively migrated the assistant prompts to Claude, updating them through the 4.5, 4.6, and 4.7 models. My experience was that Claude 4.6 was the most capable for creative work. It had a good feel for structure, nuance, and the kind of collaborative back-and-forth I wanted.

But it was expensive.

Very expensive, if you are not careful.

Claude 4.7, in my opinion, was less capable creatively for this kind of work, while still carrying a very high cost. That made it harder to justify as the centre of the system.

So I have pivoted back to GPT-5.5, and at the moment all of the assistant prompts are running there.

That may change again. In fact, it probably will. This is one reason I want to document the process as it evolves. The tool choices are not fixed forever. They reflect the best balance I can find at the time between capability, cost, speed, and usefulness.

The goal is to do this with a minimal budget.

That is not just because I like saving money, although I do. It is also because I want the workflow to be realistic. If a solo creator needs hundreds or thousands of dollars a month to make the system function, then it becomes a toy for people with spare cash rather than a practical creative method.

At the moment, the full creative team uses less than $30 per month in API tokens when managed carefully.

The important phrase there is “when managed carefully.”

With capable models, it would be very easy to burn $30 in an hour if the system was left unchecked or if every task was thrown at the most expensive model without thinking. Part of the workflow is knowing when a high-end model is worth using and when a cheaper or local tool is good enough.
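That routing decision can be made explicit rather than ad hoc. The sketch below is a minimal illustration of the idea, assuming made-up task names and tier labels; it is not the author's actual setup or real pricing.

```python
# Sketch: route each task to a model tier, defaulting to the cheap one.
# Task names and tiers are illustrative assumptions only.
ROUTES = {
    "brainstorm lyrics": "premium",   # nuance matters; worth the expensive model
    "reformat style tags": "cheap",   # mechanical; a small model is fine
    "draft release blurb": "cheap",
    "critique song concept": "premium",
    "upscale cover art": "local",     # handled in ComfyUI, no API cost
}


def pick_model(task: str, default: str = "cheap") -> str:
    """Return the model tier for a task; unknown tasks fall to the cheap tier."""
    return ROUTES.get(task, default)
```

The design choice that keeps the budget safe is the default: anything not deliberately marked as worth the premium model falls to the cheap tier, so cost surprises only come from explicit decisions.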

For images, I use local models in ComfyUI where I can. That keeps costs down and gives me control over the process. I also keep a $20 per month ChatGPT subscription because it is very capable for some image manipulation and upscaling tasks that are difficult or less convenient through my local ComfyUI setup.

Again, this is not about finding one perfect tool. It is about building a practical bench of tools that I can use without blowing the budget.

What this has become

What started as a test of AI music generation has become something broader.

It is now a creative team structure.

Not a human team. Not a company. Not a fake studio. A set of AI specialists that help me think through different parts of the work.

The music still starts with an idea I care about. Sometimes that is a story, sometimes a mood, sometimes a lyric, sometimes a sound I want to chase. The assistants help shape the process around that idea. They provide feedback, structure, formatting, alternatives, and occasional resistance when something is weak or unclear.

That last part matters.

A useful assistant should not just agree. If the idea is vague, it should say so. If the prompt is confused, it should help untangle it. If the promotion angle is weak, it should suggest a better one. If the visual idea does not match the song, it should challenge it.

That is where the value is for me.

The AI does not replace the creative decision. It makes the decision points more visible.

It gives me something to react to.

The start of the archive

This post is the beginning of documenting that workflow.

I want to write about how the system develops, what works, what fails, what costs too much, what becomes useful, and what turns out to be a dead end. I also want to be clear about the role of AI in the work, because there is no point pretending it is not part of the process.

For me, the interesting question is not “Can AI make a song?”

Clearly, it can make something that sounds like a song.

The more interesting question is: can a solo creator use AI tools deliberately, honestly, and creatively enough to build a body of work that has direction, taste, and continuity?

That is what I am testing.

The Hammurabi track was the first surprise. Ace was the first specialist. The rest of the team grew from there.

This is the start of that story.
