Most AI video tools forgot the film part.
A cinematic AI canvas for short-form video. Plan, generate, script, and storyboard in one fluid workflow — no node spaghetti, no app-hopping, no slop. I joined as the founding designer and built the product from a blank canvas to a live public app.
The stakes
AI video tooling in 2026 is at the same place AI image tooling was in 2022 — spectacular demo clips, but a workflow that collapses the moment you try to actually make something. Filmmakers and creators who want to ship a 30-second piece end up with a tab of Runway open, a tab of Midjourney, a Notion doc with the script, a voice generator on the side, and a video editor at the end of the chain. The cuts never quite match because nothing in this chain shares state — the character in shot 3 has a different jawline than shot 1, and there's no single place that knows what the story is supposed to be.
The market response so far has been two extremes. Node-based canvases (ComfyUI and its descendants) give you precision and power, but the audience that can actually drive a graph is engineering-shaped, not film-shaped. Prompt-only tools collapse the other way — one input, one output, no narrative scaffolding, and the same character looks different every time you re-roll. There's a missing middle: a canvas that feels like filmmaking — script, characters, locations, shots, cuts — with the AI doing the generation underneath, not on top.
The problem
People who want to direct short-form video end up running their workflow across four or five disconnected tools, none of which share state, so characters drift, voices don't match from shot to shot, and the script lives in a separate document from the cut. Current AI video tools are built either for prompt engineers or for node-graph power users, not for the filmmaker who wants to think in scenes and shots and have the model generate inside that structure.
What we found
Where time actually goes
In conversations with creators who'd shipped at least one AI-assisted short, we found that most of their time went not to generation but to orchestration: copying a prompt from Notion into Midjourney, downloading the still, uploading it to Runway, re-prompting the next shot to match the previous character, manually stitching cuts in Premiere. The generation step takes minutes. The plumbing takes hours.
Consistency is the failure mode
Across every workflow we observed, the single most damaging gap was character and location consistency between shots. A 6-shot ad takes 30 minutes to generate and another two hours to re-roll because shot 4's character has the wrong eye colour. The tools generate in isolation; the story needs them to generate as a set.
Filmmakers think in templates
Creators who do this commercially (UGC ads, micro dramas, product shoots, animation cuts) told us they always start from a template in their head. Almost no one starts from a literal blank canvas. The most-loved tools we benchmarked were the ones that surfaced this implicit template as the explicit entry point.
Prompt-only chat fails
Chat-only interfaces work for ideation but fall apart at the production step. There's nothing for the user to point at, edit, or override at the asset level — they're stuck re-prompting until they give up. The canvas needs to be the source of truth; chat is a control surface on top of it, not the surface itself.
Options considered
Option A — Node-based canvas (a ComfyUI for video)
Rejected. Maximum power, maximum learning cost. The audience that can drive a node graph isn't the audience we wanted to serve; that audience already has tools. Node UIs also actively discourage thinking in cinematic terms: a node graph is about data flow, not about scenes and shots.
Option B — Single chat with model orchestration behind it
Rejected. Easy to ship, easy to demo. But chat with no visible state means every re-roll is a re-prompt, the user can't point at the thing they want to fix, and consistency between shots becomes an LLM-memory problem. Shipping it would have meant shipping the same gap the existing prompt-only tools have.
Option C — A filmmaking-shaped canvas with chat layered into it
Chosen. The canvas is the source of truth (script, characters, locations, storyboard, shots), and the chat sits inside it as a control surface for generation and edits. The model has shared state across every asset, so character and location consistency stops being a re-prompt problem and starts being a property of the canvas itself.
Tradeoffs
Single-user canvas in v1, no team collaboration
Real-time multi-user editing is one of the most expensive things a canvas product can take on. We chose to ship single-user, single-project first — every creator works in their own canvas, no presence, no comments, no shared cursor. The cost: studios and agencies who want to review-in-place have to export and review out-of-band for now. We accepted this because the v1 audience is solo creators making one piece at a time, not teams; multi-user is on the roadmap when we have user data telling us how teams actually want to collaborate, instead of guessing.
Web-only, no native app
Most professional video tooling lives on the desktop — Premiere, Final Cut, DaVinci. We chose web for v1 because it ships faster, updates instantly, and is the right surface for AI-generation-heavy workflows where the compute is server-side anyway. The cost: large project files and long renders feel less native than a local app, and some power users will compare us to desktop tools and find us slower. Native apps come once the web product proves the workflow.
Capability-led entry over job-shaped templates
The home surface is four capability cards (Generate Image, Create Video, Write Script, Storyboard), not a grid of job templates like "Micro Drama" or "UGC Ad". We tested a template-led entry earlier; even though templates match how creators describe their process, it scaffolded too aggressively and locked users into a shape before they knew what they wanted. Capability cards are slower to start (the user has to work out for themselves that they probably want to script before generating), but they don't constrain the work. The chat input below the cards is the catch-all: type what you want, and the assistant routes you to the right capability without making the choice on your behalf.
One generation provider per asset class in v1
For each asset class — voices, characters, locations, shots, video — we picked one provider and locked it in. Multi-provider routing (try voice on ElevenLabs, fall back to PlayHT) is the obvious v2 move, but routing well is its own product problem. Shipping one curated stack now lets us tune prompts and parameters deeply per provider, and gives users a consistent quality bar instead of a lottery.
What we built
Generate Image — visuals as first-class canvas objects
The entry point for stills. Type a prompt, get an image; the image lands on the canvas as a draggable object, not a download. Same character refs, same style controls, regeneratable in place. The canvas — not a downloads folder — is where the work lives.

Asset library — consistency as a data model
Characters, locations, and voices are persistent objects in the project, not one-off prompt outputs. A character has a name, appearance, personality, and a reference voice; a location has a description and conditions; a voice is regeneratable inline. Every generation in the canvas — whether the user is in Generate Image, Create Video, Write Script, or Storyboard — reads from the same library, so the character in shot 12 is the same object as in shot 1. The consistency that defeats most multi-shot AI workflows isn't a re-prompt problem here; it's the data layer underneath all four entry points.
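To make "consistency as a data model" concrete, here is a minimal Python sketch under stated assumptions: the class names, the `name` field on locations, and the `build_prompt` and `voice_for` helpers are hypothetical and illustrative, not KYNE's actual schema. What it shows is the idea described above: prompts are composed from stored library objects rather than pasted text, so every shot that references a character reads the same definition.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only; not KYNE's real data model.


@dataclass
class Voice:
    voice_id: str
    description: str


@dataclass
class Character:
    name: str
    appearance: str
    personality: str
    voice_id: str  # points at a Voice in the same library instead of copying it


@dataclass
class Location:
    name: str
    description: str
    conditions: str  # e.g. time of day, weather


@dataclass
class AssetLibrary:
    voices: dict[str, Voice] = field(default_factory=dict)
    characters: dict[str, Character] = field(default_factory=dict)
    locations: dict[str, Location] = field(default_factory=dict)

    def voice_for(self, character_id: str) -> Voice:
        # Dialogue for a character always resolves to the same stored voice.
        return self.voices[self.characters[character_id].voice_id]

    def build_prompt(self, character_id: str, location_id: str, action: str) -> str:
        # Prompts are rebuilt from the library at generation time, so shot 1
        # and shot 12 get identical character and location text.
        c = self.characters[character_id]
        loc = self.locations[location_id]
        return f"{c.name}, {c.appearance}, {action}, {loc.description}, {loc.conditions}"


lib = AssetLibrary()
lib.voices["v1"] = Voice("v1", "low, calm")
lib.characters["mara"] = Character("Mara", "red coat, short grey hair", "guarded", "v1")
lib.locations["pier"] = Location("Pier", "a fog-bound pier", "night, light rain")

shot_1 = lib.build_prompt("mara", "pier", "walking toward camera")
shot_12 = lib.build_prompt("mara", "pier", "turning back")
```

Because shots refer to library ids instead of carrying their own descriptions, a wardrobe change on a character is one edit to the library, and the next regeneration of any shot picks it up.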

Write Script — scenes you can edit, not transcripts you can't
Drafting and editing the narrative scene by scene. The model proposes a scene; the user can edit any scene in prose, regenerate a single scene, or rewrite the whole arc. The script view is intentionally not a chat — chat is for direction, not authorship — so the writer has a stable surface to work in, with full keyboard editing, before proceeding to the storyboard.

Storyboard — the canvas where the cut lives
Every scene breaks into shots; every shot has its own description, dialogue, image prompt, keyframes, and video prompt — all visible at once, all editable in place. Generate keyframes for a shot and they appear inline. Generate the video for the shot and it slots in. The Storyboard tab is the planning surface; the Video Editor tab is the final assembly — both read from the same canvas state, so a change in one shows up in the other without an export step.
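The "no export step" property can be sketched as two views derived from one state object. The following Python sketch assumes hypothetical `Scene`, `Shot`, and `Canvas` types built from the shot fields listed above, plus two illustrative functions, `storyboard_view` and `editor_timeline`. It is one way the described behaviour could work, not a description of how KYNE is built.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical types for illustration only; not KYNE's real canvas model.


@dataclass
class Shot:
    description: str
    dialogue: str
    image_prompt: str
    video_prompt: str
    keyframes: list[str] = field(default_factory=list)  # generated keyframe ids
    video: Optional[str] = None  # generated clip id; None until generated


@dataclass
class Scene:
    title: str
    shots: list[Shot] = field(default_factory=list)


@dataclass
class Canvas:
    scenes: list[Scene] = field(default_factory=list)


def storyboard_view(canvas: Canvas) -> list[tuple[str, int, str]]:
    # Planning surface: every shot in every scene, generated or not.
    return [
        (scene.title, i, shot.description)
        for scene in canvas.scenes
        for i, shot in enumerate(scene.shots)
    ]


def editor_timeline(canvas: Canvas) -> list[str]:
    # Assembly surface: generated clips in story order, recomputed from the
    # same Canvas on every call, so there is no exported copy to fall out of sync.
    return [
        shot.video
        for scene in canvas.scenes
        for shot in scene.shots
        if shot.video is not None
    ]
```

Since neither view keeps its own copy of the shots, generating a clip for a shot in the storyboard is visible to the editor the next time it reads the canvas.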
Design targets
KYNE is live and public at kyneai.com, and the team is running early acquisition through prompt-to-pixels and AI storytelling challenges. Usage numbers are still early. The metric we're really watching isn't weekly active users yet; it's the ratio of projects that reach a rendered cut to projects that get abandoned at the storyboard step. The hypothesis the product is testing is that keeping the script, characters, and storyboard in one canvas with shared state reduces that drop-off compared to multi-tool workflows. We don't have enough data to claim that yet; that's the first thing we'll instrument and report on.
What's next
Team canvases
Real-time multi-user editing for studios and agencies — presence, comments, review mode. The hardest single product addition on the roadmap, gated on having enough user data to design it properly rather than guess.
Multi-provider generation routing
Each asset class is locked to one provider today. As the model landscape changes monthly, the v2 system needs to route automatically (quality-first for hero shots, cost-first for iteration) with all of it invisible to the user.
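Since routing is still a roadmap item, the following is only a minimal Python sketch of the policy described above. It assumes hypothetical per-provider `quality` and `cost` scores and a `hero` flag on each request; none of these names come from the product, and a real router would also need fallbacks and live quality signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    asset_class: str  # e.g. "voice", "character", "location", "shot", "video"
    quality: float    # hypothetical internal quality score, higher is better
    cost: float       # hypothetical cost per generation


def route(providers: list[Provider], asset_class: str, hero: bool) -> Provider:
    """Pick a provider for one generation request.

    hero=True  -> quality-first: highest quality; the cheaper one wins ties.
    hero=False -> cost-first (iteration): cheapest; higher quality wins ties.
    """
    candidates = [p for p in providers if p.asset_class == asset_class]
    if not candidates:
        raise ValueError(f"no provider registered for {asset_class!r}")
    if hero:
        return max(candidates, key=lambda p: (p.quality, -p.cost))
    return min(candidates, key=lambda p: (p.cost, -p.quality))
```

The caller only says whether a request is a hero shot; which provider serves it stays hidden behind `route`, which matches the goal of keeping routing invisible to the user.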
Native render pipeline
Today the cut is assembled in-app but the heavy render still depends on the upstream provider. Bringing the render in-house means tighter feedback loops on iteration and full control over output quality.
Desktop apps
Web-first ships faster; desktop wins for people who treat the canvas as a daily driver. Native macOS first, Windows after.