Visualization Aid with AI Assisted Tools (Text/Image to Video Generation)

mattx · February 1

HEADER DISCLAIMER: This guide was written in early 2026, so the tools used were those available at the time. If you are reading this way ahead in the future and the tools aren't available anymore, my suggestion is to find something similar, but apply the same concepts - as those work with pretty much any diffusion model (Text to Image/Video Generative Models).

Sure! Here’s a practical, ready to use community-friendly guide you can use for a tulpamancing forum to introduce and structure discussions around visualization with AI-assisted tools, while keeping expectations realistic and grounded. /s

Recently I was made aware by some friends about the improvements of AI Generated content, namely images and videos.
A few notable names come to mind, such as OpenAI's Sora, Grok Imagine and Google's Nano Banana - in one way or another, those tools will pretty much generate anything you wish for, as long as you are very specific, don't overdo it, are willing to accept inconsistencies (we'll talk about this later) and eventually pay a small sum (the SaaS tax).

Now, regardless of what tools you use, one thing has always been incredibly consistent with all of them up until now: they sort of sucked.

Most notably, they couldn't keep track of the source's fine details, and the results they produced had a strong "uncanny valley" effect.

Things have improved and still are improving at a remarkable pace, to the point where it becomes quite a challenge to distinguish real from fake.

Here's an example: Will Smith eating spaghetti, what your typical AI used to generate just a few years ago vs what it can do now:

Spoiler

Recently, I was tinkering with one tool in particular, Grok (yes, the one from Twitter/X) and more specifically, a feature called "Grok Imagine" - the results honestly shocked me.
With Grok Imagine, you can input a pre-existing Image/Video and have it "re-imagine" the contents based on a prompt you supply - there's an example in the spoiler.

Prompt: Make him wear a plumber's clothing, give him a pipe wrench and make him wear a sailor's hat, with a full red beard

Spoiler

Now, aside from the silly floating head of the pipe wrench (which could be fixed with another prompt) there's fairly little to criticize here - the uncanny valley effect is extremely minimal and every detail seems to be in its place.

Not to mention... this can also be turned into video with one click.

Prompt: "Have him run towards a beautiful beach"

Spoiler

It's clearly not perfect (especially the part where he turns into William T. Riker from Wish), nor is it likely to ever be, given that Generative AI is by nature an "autoregressive" model, but we can safely say it's decent enough to do some good for our needs.
(Arguably, this has gotten popular because it has been used for VERY bad things, obviously, but that's to be expected from humans.)

The AI preamble is over, let's get to Tuppering. The keen among you might have already figured out where I'm going with this. As someone who, in his early tulpamancing career STRUGGLED beyond words on visualization, I would have sold my soul to have something like this for visualization practice.
Namely, the key objective of this would be to create purpose-built content of your tulpa/tulpas/tulpae/tulipans to aid in visualization.

Here's Cheryl, my 13 year old tulpa (today 02/02/26 is her birthday too, woohoo!):

This portrait was professionally done about 10 years ago by an artist, and not much has changed about her, since tulpas don't really age. Arguably there should be a few hints of aging though, like small wrinkles or something, but I'm getting the Whitebeard death stare so we'll keep it as it is :) :) :)

Now, onto some fun stuff - and the part that truly made me stare at the screen for a good ten minutes in awe - how about we bring her (digitally) to life?

Prompt: Make her slowly spin around 360 degrees, as if we're a painter trying to make a portrait of her. She's calm, collected and at the end of the spin she leans towards the camera, as if she's trying to break the 4th wall:

Spoiler

Again, I am biased obviously but I won't be able to put into words how much I would have sold my SOUL to have a video like this in my youth - back when I was struggling to fall asleep during meditation and all I had of her was her mindvoice and a bunch of static reference material.

Objectively, the video isn't 100% perfect - her hair comb turns into a sort of two part thing and a few extra yellow lines appear on her sweater (likely from the AI not knowing what's supposed to go there) but I mean... you can't tell me that isn't impressive.

Here's another fun one (at her request)

Prompt: Have her hold a morbidly obese Pallas cat. She's completely astonished by its weight but regardless she thinks its cute so she cuddles it.

Spoiler

Didn't I also mention that it's her birthday?

Prompt: Make her hold cake that says "13" on it, and has lit candles. The cake is like the one from the "Portal / Portal 2" video games. The cake is handed to her from the camera - she doesn't say anything and just smiles - try not to change her artstyle and her looks.

Spoiler

Now this last one is kind of interesting, because as you might have noticed I had to "guard" the AI a little bit against getting too "carried away" with filling in the gaps of information.

Funsies aside, I hope the value of this is implied and doesn't need to be stated: to anyone (like me) who used to have trouble with visualization, tools like these are a godsend for generating material to aid in visualization.
Here's one of the earliest iterations of this, which we'll discuss shortly.

Prompt: Have her walk towards the camera slowly, and then lean incredibly close to it, as if she's looking directly at us and breaking the 4th wall.

Spoiler

Now, regarding this last one - it's very interesting because it provokes a sense of "presence" that I haven't felt in years (back when we used to practice wonderland immersion and such) and in recent times I've only ever come close to this during a few lucid dreams I've shared with her - regardless, even though it's not perfect I do believe this is something that I would have never dreamt of having "physically" at my disposal (I am legally bound by the tulpa law to proxy her "resentment" for the frown she makes in this video, she doesn't like it ~~but I kinda do~~).

If you want to replicate, here's what you need to do:

1. Sign up to Grok Imagine (or whatever the tool at the time will be) - you can use a disposable e-mail, it works just fine.
2. Get an artwork / illustration / reference material of your tulpa, ideally with a muted background (black or white) and as little noise as possible.
   - If you do not have one, we can use AI to try and make one - input your drawing or similar artwork into Grok and use something like:
     Prompt: The goal is to make a professional character portrait, starting from this image. Try to make it *realistic/cartoonish/stylized* and try to imagine this new version while keeping as much of the original details as possible. Start from an internal detailed description of the image first before generating if that helps in the process. Put the character in a black background and don't waste time generating any background.
3. If the tulpa is in an unusual pose and/or is surrounded by too much detail, use Grok Imagine to strip out all the unnecessary details:
   - Prompt: While keeping as much of this character's original artwork and detail as possible, remove the background and any extra object or information from this image that isn't part of the character itself.
4. In Grok Imagine, select "Animate Image" and upload your character's portrait. It will likely start generating something automatically - you can safely ignore that as the usage limit is pretty high (for now).
5. Input your prompt, being as detailed and as precise as possible - ambiguity generates weird results.
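
If you end up redoing the portrait step a few times with different styles, it can help to keep the prompt as a template so the wording stays identical between attempts. Here's a minimal Python sketch of that idea. The names `PORTRAIT_TEMPLATE`, `ALLOWED_STYLES` and `portrait_prompt` are made up for illustration and aren't part of Grok or any other tool; the script only builds the text, and you still paste it into the tool by hand.

```python
# Hypothetical helper (not part of any tool): fills in the style placeholder
# of the portrait prompt quoted in the steps above.
PORTRAIT_TEMPLATE = (
    "The goal is to make a professional character portrait, starting from this image. "
    "Try to make it {style} and try to imagine this new version while keeping as much "
    "of the original details as possible. Start from an internal detailed description "
    "of the image first before generating if that helps in the process. Put the "
    "character in a black background and don't waste time generating any background."
)

# The three style options offered in the guide's prompt.
ALLOWED_STYLES = ("realistic", "cartoonish", "stylized")


def portrait_prompt(style: str) -> str:
    """Return the portrait prompt with exactly one style filled in."""
    cleaned = style.strip().lower()
    if cleaned not in ALLOWED_STYLES:
        raise ValueError(f"style must be one of {ALLOWED_STYLES}, got {style!r}")
    return PORTRAIT_TEMPLATE.format(style=cleaned)


if __name__ == "__main__":
    print(portrait_prompt("stylized"))
```

Picking a single style rather than leaving all three in the prompt keeps things unambiguous, which matters since ambiguity generates weird results.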

A few tips/tricks/caveats:

- "Less is More" and "More is Less": Too much detail in your prompt and Grok will likely overdo it or obsess over a tiny detail (for example, getting the cake to somewhat look right took a few attempts).
- Grok generates audio as well by default. You can't reliably tell it not to, but a good trick is adding to your prompt: "The character doesn't say a word".
- (Unconfirmed) It appears that the video generation is influenced by prior requests, so if you get some weird results (like movements that you didn't request but feel similar to something asked previously) try flushing your cache and deleting any old chat/image/video requests (after downloading the content you created of course).
- Due to the nature of Generative AI, the content produced will "stem" from the supplied image. What this means is: if you input an image of your tulpa on a beach for example, and tell the AI to imagine it in a cyberpunk city, the first few seconds of the video are gonna be really weird. The goal of this is to create reference content for visualization, so a muted background is important for this.
- As of (almost) May 2026, Grok is now paywalled, which was something that was bound to happen. My recommendation is just to look for "image to video" functionality from major reputable brands (OpenAI, Anthropic, Google, Microsoft) and find the one that currently isn't paywalled to hell. Currently Nano Banana seems to be the right choice.
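
The "guard" phrases from the examples and tips above (keeping the artstyle, keeping the character silent) are easy to forget when you're iterating on a prompt. As a rough sketch, and not something any tool requires, here's how you could tack them on automatically in Python. `animation_prompt`, `STYLE_GUARD` and `SILENT_CLAUSE` are hypothetical names; the style guard wording is adapted from the birthday cake prompt, and the silent clause is the one from the audio tip.

```python
# Hypothetical helper (not part of any tool): appends the guard phrases
# discussed in this guide to an animation prompt.
STYLE_GUARD = "Try not to change the character's artstyle and looks."
SILENT_CLAUSE = "The character doesn't say a word."


def animation_prompt(action: str, keep_style: bool = True, silent: bool = True) -> str:
    """Build an image-to-video prompt: the action first, then optional guards."""
    action = action.strip()
    if not action:
        raise ValueError("describe what the character should do")
    # Close the action with a period so the guards read as separate sentences.
    if action[-1] not in ".!?":
        action += "."
    parts = [action]
    if keep_style:
        parts.append(STYLE_GUARD)
    if silent:
        parts.append(SILENT_CLAUSE)
    return " ".join(parts)


if __name__ == "__main__":
    print(animation_prompt("Have her walk towards the camera slowly"))
```

Both guards can be switched off per prompt, which fits the "Less is More" tip: add only the constraints you actually need for that generation.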

The Footnote (Author's ramblings, you can safely ignore)

The reason for the creation of this guide wasn't really to signal a "groundbreaking" discovery or to "finally solve" the problem of aphantasia (self-diagnosed or not) and lack of visualization skill, but rather to offer yet another tool that can be safely used in modern times to aid in a process that would have otherwise required months of constant practice.

We live in a chaotic world nowadays, full of stress and deadlines - I've got to say that personally it's been quite a number of years since I last did some "active tulpaforcing" on Cheryl - that's to be expected, given her ag- her maturity (death stare again). In the process of growing up and becoming an adult, you lose something more important than the skills you achieve with your tulpa: you lose the ability to spend quality time with them - more often than not, when I return home and I am done with the chores (shower, housekeeping etc) I am beyond exhausted and unable to do some active tulpaforcing, to the point where I crash and fall asleep as soon as I hit the pillow.

That being said, ever since I started toying with this AI Image/Video Generation thing (which quickly turned into a small hyperfixation given the way I am) I am very pleased to report that I "re-discovered" a skill I had long lost, which is the ability to see "instant" flashes of her whenever she speaks to me (almost imposed, but not really) - so whenever she speaks to me lately or she's imagining something I get a sort of "vivid imagery" in great detail of her, speaking and moving in much of the same way as you saw in the previous videos, which in turn amplifies her sense of "presence" and almost gives her a "physical weight" - I am doing a terrible job at explaining this last bit, aren't I?

Regardless, we haven't had a progress report in a long time, and given the way we operate as a system there wouldn't be much of a reason to make one, as it would get updated very infrequently - so this footnote here is meant to be the first and only "real" meaningful progress report I've had in almost a decade. Still, happy to give help to young tulpamancers and their tulpae/tuppers/toblerones should they need it - just shoot me a DM.

Edited April 14 by Shin Matt

Wildblume · April 16

@Shin Matt I didn't say you had to be an addict to use AI, only that it was a potential addiction. I understand it's meant as an aid, but the danger is that it may become a permanent crutch rather than a temporary aid if generating pictures of your tulpa is too satisfying. Or you may constantly tell yourself you can quit anytime.

I didn't mean 2013 vs. now, but rather the time before image generators became good enough, which was late 2022, vs. now.

You used the many hours of MLP as an old example, but with a generator, you can request something specific and roughly get it most of the time. Your example is a best-case scenario; the median tulpamancer who used references likely only had one or a handful of pictures - you only had one - and had to supplement with separate pictures of clothing, accessories, hairstyles etc. or plain descriptions if no fitting picture was available. Now AI can add those things to a real photograph or drawing, yet even that can become an addiction, like playing a video game that has your tulpa in it instead of... actually working on your tulpa.

I guess what I'm ultimately saying is that modern society has even more addictive traps - AI being but one of them - than just five or ten years ago, and you have to go out of your way to avoid them, or at least be very careful around them, if you want to successfully summon a tulpa into your life and not be dependent on or addicted to technology forever.

mattx · April 16

4 minutes ago, Wildblume said:

I guess what I'm ultimately saying is that modern society has even more addictive traps - AI being but one of them - than just five or ten years ago, and you have to go out of your way to avoid them, or at least be very careful around them, if you want to successfully summon a tulpa into your life and not be dependent on or addicted to technology forever.

I am curious about your position on this - I understand that the new age has brought a lot of dopamine-hacking technology and that people, especially youngsters, are really at a disadvantage because of their shorter than ever attention span, but given how firm you are on this subject it almost sounds like there's a story behind it.

Like I said, I don't see it as a "black/white" situation - as a sentient functional human being you have to learn moderation, and know when something is a tool and not a trap to fall into. A Tulpa cannot be replaced by an AI chatbot and its purpose goes beyond just mental visualization - but in this highly competitive world we live in I have learned to welcome any and all help that you can get.

Wildblume · April 19

@Shin Matt Hmm, I mean competitive might make sense if we're talking about working in a company and the boss wants you to churn out stuff. But I get what you mean. It's fine to use aids as long as you don't become dependent on them. Some things have a higher risk of causing addiction or dependence than others... Especially when the choice is between instant gratification with AI and very delayed gratification with meditation. I'm careful about addictions in general and modern tech is an easy way to get addicted if you're not careful. Speaking in part from experience and in part from what I've seen in other people.
