Visualization Aid with AI Assisted Tools (Text/Image to Video Generation)

mattx · February 1

HEADER DISCLAIMER: This guide was written in early 2026, so the tools used were those available at the time. If you are reading this way ahead in the future and the tools aren't available anymore, my suggestion is to find something similar, but apply the same concepts - as those work with pretty much any diffusion model (Text to Image/Video Generative Models).

Sure! Here’s a practical, ready to use community-friendly guide you can use for a tulpamancing forum to introduce and structure discussions around visualization with AI-assisted tools, while keeping expectations realistic and grounded. /s

Recently some friends made me aware of the improvements in AI-generated content, namely images and videos.
A few notable names come to mind, such as OpenAI's Sora, Grok Imagine and Google's Nano Banana - in one way or another, those tools will pretty much generate anything you wish for, as long as you are very specific, don't overdo it, are willing to accept inconsistencies (we'll talk about this later) and eventually pay a small sum (the SaaS tax).

Now, regardless of what tools you use, one thing has always been incredibly consistent with all of them up until now: they sort of sucked.

Most notably, their inability to keep track of the source's fine details and the "uncanny valley" effect of the produced results.

Things have improved and still are improving at a remarkable pace, to the point where it becomes quite a challenge to distinguish real from fake.

Here's an example: Will Smith eating spaghetti, what your typical AI used to generate just a few years ago vs what it can do now:

Spoiler

Recently, I was tinkering with one tool in particular, Grok (yes, the one from Twitter/X) and more specifically, a feature called "Grok Imagine" - the results honestly shocked me.
With Grok Imagine, you can input a pre-existing Image/Video and have it "re-imagine" the contents based off an external prompt - in the spoiler, an example.

Prompt: Make him wear a plumber's clothing, give him a pipe wrench and make him wear a sailor's hat, with a full red beard

Spoiler

Now, aside from the silly floating head of the pipe wrench (which could be fixed with another prompt) there's fairly little to criticize here - the uncanny valley effect is extremely minimal and every detail seems to be in its place.

Not to mention... this can also be turned into video with one click.

Prompt: "Have him run towards a beautiful beach"

Spoiler

It's clearly not perfect (especially the part where he turns into William T. Riker from Wish), nor is it likely to ever be due to the nature of Generative AI being an "autoregressive" model, but we can safely say it's decent enough to do some good for our needs.
(Arguably, this has gotten popular because it has been used for VERY bad things, obviously, but that's to be expected from humans.)

The AI preamble is over, let's get to Tuppering. The keen among you might have already figured out where I'm going with this. As someone who, in his early tulpamancing career STRUGGLED beyond words on visualization, I would have sold my soul to have something like this for visualization practice.
Namely, the key objective of this would be to create purpose-built content of your tulpa/tulpas/tulpae/tulipans to aid in visualization.

Here's Cheryl, my 13 year old tulpa (today 02/02/26 is her birthday too, woohoo!):

This portrait was professionally done about 10 years ago by an artist, and not much has changed about her since Tulpas don't really age, arguably there should be a few hints of aging tho like small wrinkles or something but I'm getting the Whitebeard death stare so we'll keep it as it is :) :) :)

Now, onto some fun stuff - and the part that truly made me stare at the screen for a good ten minutes in awe - how about we bring her (digitally) to life?

Prompt: Make her slowly spin around 360 degrees, as if we're a painter trying to make a portrait of her. She's calm, collected and at the end of the spin she leans towards the camera, as if she's trying to break the 4th wall:

Spoiler

Again, I am biased obviously but I won't be able to put into words how much I would have sold my SOUL to have a video like this in my youth - back when I was struggling to fall asleep during meditation and all I had of her was her mindvoice and a bunch of static reference material.

Objectively, the video isn't 100% perfect - her hair comb turns into a sort of two part thing and a few extra yellow lines appear on her sweater (likely from the AI not knowing what's supposed to go there) but I mean... you can't tell me that isn't impressive.

Here's another fun one (at her request)

Prompt: Have her hold a morbidly obese Pallas cat. She's completely astonished by its weight but regardless she thinks its cute so she cuddles it.

Spoiler

Didn't I also mention that it's her birthday?

Prompt: Make her hold cake that says "13" on it, and has lit candles. The cake is like the one from the "Portal / Portal 2" video games. The cake is handed to her from the camera - she doesn't say anything and just smiles - try not to change her artstyle and her looks.

Spoiler

Now this last one is kind of interesting, because as you might have noticed I had to "guard" the AI a little bit against getting too "carried away" with filling in the gaps of information.

Funsies aside, I hope the value of this is implied and doesn't need to be stated: to anyone (like me) who used to have trouble with visualization, tools like these are a godsend for generating material to aid in visualization.
Here's one of the earliest iterations of this, which we'll discuss shortly.

Prompt: Have her walk towards the camera slowly, and then lean incredibly close to it, as if she's looking directly at us and breaking the 4th wall.

Spoiler

Now, regarding this last one - it's very interesting because it provokes a sense of "presence" that I haven't felt in years (back when we used to practice wonderland immersion and such) and in recent times I've only ever come close to this during a few lucid dreams I've shared with her - regardless, even though it's not perfect I do believe this is something that I would have never dreamt of having "physically" at my disposal (I am legally bound by the tulpa law to proxy her "resentment" for the frown she makes in this video, she doesn't like it ~~but I kinda do~~).

If you want to replicate, here's what you need to do:

1. Sign up to Grok Imagine (or whatever the tool at the time will be) - you can use a disposable e-mail, it works just fine.
2. Get an artwork / illustration / reference material of your tulpa, preferably with a muted background (black or white) and as little noise as possible.
   - If you do not have one, we can use AI to try and make one - input your drawing or similar artwork into Grok and use something like:
     Prompt: The goal is to make a professional character portrait, starting from this image. Try to make it *realistic/cartoonish/stylized* and try to imagine this new version while keeping as much of the original details as possible. Start from an internal detailed description of the image first before generating if that helps in the process. Put the character in a black background and don't waste time generating any background.
3. If the tulpa is in an unusual pose and/or is surrounded by too much detail, use Grok Imagine to strip out all the unnecessary details:
   - Prompt: While keeping as much of this character's original artwork and detail as possible, remove the background and any extra object or information from this image that isn't part of the character itself.
4. In Grok Imagine, select "Animate Image" and upload your character's portrait. It will likely start generating something automatically - you can safely ignore that as the usage limit is pretty high (for now).
5. Input your prompt, being as detailed and as precise as possible - ambiguity generates weird results.

A few tips/tricks/caveats:

- "Less is More" and "More is Less": Too much detail in your prompt and Grok will likely overdo it or obsess over a tiny detail (for example, getting the cake to somewhat look right took a few attempts).
- Grok generates audio as well by default. You can't reliably tell it not to, but a good trick is adding to your prompt: "The character doesn't say a word".
- (Unconfirmed) It appears that the video generation is influenced by prior requests, so if you get some weird results (like movements that you didn't request but feel similar to something asked previously) try flushing your cache and deleting any old chat/image/video requests (after downloading the content you created of course).
- Due to the nature of Generative AI, the content produced will "stem" from the supplied image. What this means is: if you input an image of your tulpa on a beach for example, and tell the AI to imagine it in a cyberpunk city, the first few seconds of the video are gonna be really weird. The goal of this is to create reference content for visualization, so a muted background is important for this.
- As of (almost) May 2026, Grok is now paywalled, which was something that was bound to happen. My recommendation is just to look for "image to video" functionality from major reputable brands (OpenAI, Anthropic, Google, Microsoft) and find the one that currently isn't paywalled to hell. Currently Nano Banana seems to be the right choice.
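If you end up generating a lot of clips, the prompt tips above can be sketched as a tiny helper that keeps each prompt to one action and appends the two "guard" sentences used in this guide. This is only an illustration: `build_animation_prompt` and its parameters are made-up names, it just builds a string that you paste into the tool, and it doesn't talk to Grok or any other service.

```python
def build_animation_prompt(action, silent=True, keep_style=True):
    """Compose one image-to-video prompt: a single action plus optional guard sentences.

    Hypothetical helper for illustration only; it returns plain text to paste
    into whatever tool you use.
    """
    # Keep the action short and end it with exactly one period.
    parts = [action.strip().rstrip(".") + "."]
    if silent:
        # Guard against unwanted generated speech (see the audio tip above).
        parts.append("The character doesn't say a word.")
    if keep_style:
        # Guard against the model "filling in the gaps" on its own.
        parts.append("Try not to change the character's art style or looks.")
    return " ".join(parts)


print(build_animation_prompt("Have her walk towards the camera slowly"))
```

Keeping the guards as switches makes it easy to drop one if the tool starts obsessing over it, which fits the "Less is More" tip.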

The Footnote (Author's ramblings, you can safely ignore)

The reason for the creation of this guide wasn't really to signal a "groundbreaking" discovery or to "finally solve" the problem of aphantasia (self-diagnosed or not) and lack of visualization skill, but rather to offer yet another tool that can be safely used in modern times to aid in a process that would have otherwise required months of constant practice.

We live in a chaotic world nowadays, full of stress and deadlines - I've got to say that personally it's been quite a number of years since I have done some "active tulpaforcing" on Cheryl - that's to be expected, given her ag- her maturity (death stare again). In the process of growing up and becoming an adult, you lose something more important than the skills you achieve with your tulpa, you lose the ability to spend quality time with them - more often than not, when I return home and I am done with the chores (shower, housekeeping etc) I am beyond exhausted and unable to do some active tulpaforcing, to the point where I crash and fall asleep as soon as I hit the pillow.

That being said, ever since I started toying with this AI Image/Video Generation thing (which quickly turned into a small hyperfixation given the way I am) I am very pleased to report that I "re-discovered" a skill I had long lost, which is the ability to see "instant" flashes of her whenever she speaks to me (almost imposed, but not really) - so whenever she speaks to me lately or she's imagining something I get a sort of "vivid imagery" in great detail of her, speaking and moving in much the same way as you saw in the previous videos, which in turn amplifies her sense of "presence" and almost gives her a "physical weight" - I am doing a terrible job at explaining this last bit, aren't I?

Regardless, we haven't had a progress report in a long time and since the way we operate as a system there wouldn't be much of a reason to make one, it would get updated very infrequently - so this footnote here is meant to be the first and only "real" meaningful progress report I've had in almost a decade. Still, happy to give help to young tulpamancers and their tulpae/tuppers/toblerones should they need it - just shoot me a DM.

Edited April 14 by Shin Matt

Saruzer · February 1

Wow, that was insane. Like I remember a couple of years ago, I tried to use some AI image generation tools to get more Pearl references. It was... bad. Really bad. But now, seeing how far AI has come already... I am at a loss for words. The AI by itself is a tool and the person behind it is the one who is responsible for its usage. I am more concerned about how horrible, disgusting and addicted humans will be with instruments that are capable of something like that (heard about people being addicted to character.ai lol)... But besides that, this is really a blessing for others. Don't think I will use it. I already have decent visualization and other practices that allow me to see Pearl in enough detail. I just don't really feel any "life" and "soul" from content that AI generates. But as a reference, to observe and analyze the details, it can be pretty useful. Just don't forget that you have your real tulpas to spend time with.

Ranger · February 3

Oh wow, that's really impressive. Those videos are surprisingly really clean.

It makes me sad that some people may turn to AI rather than learn visualization skills. But for the immensely frustrated, this could be a life saver. And I do mean that literally. I would much rather people be happy than suffer working hard only to get abysmal results. Or even if they are impatient and just want creation now, and maybe build their visualization skills later.

I'm a stickler. We are skeptical of AI and don't want to depend on it. We made significant progress making art, even if it isn't professional grade. Our visualization skills are decent. I want to teach people the skills because I love teaching this practice, despite my personal issues.

But my wants are not always what other people need. And some people need this. Thank you for sharing.

mattx · February 3

7 hours ago, Ranger said:

It makes me sad that some people may turn to AI rather than learn visualization skills

Just for the record before this thread gets too long and people get the wrong idea: this isn't meant to be a replacement for visualization practice, but I thought that was obvious?

Akin to how an artist might use multiple pieces of reference material instead of "just winging it using his mind", this is meant to be a tool to create reference material for your brain to fill in the gaps of visualization that would otherwise be missing. Using it wouldn't make you a better/worse tulpamancer (especially given how watered down the process of creating a tulpa has become in the past 10 years).

Whether you like AI or not, the impact it's had on the world is undeniable, and many people have used it to jumpstart their career in fields that would have normally had an almost inaccessible learning curve (think coding, for example) - I am not an AI enthusiast or anything, just a community boomer sharing a tool that could be helpful to someone who might have struggled in the same way I did back when these technologies didn't exist.

Ranger · February 3

3 hours ago, Shin Matt said:

Just for the record before this thread gets too long and people get the wrong idea: this isn't meant to be a replacement for visualization practice, but I thought that was obvious?

I'm aware. I'm just cynical about AI usage- people tend to use chat/image AI to talk for them or skip doing art all the time. I know there are people who will use AI as a jumping point. But I'm also aware people will skip visualization skills if given the chance- a lot of people just want the creation part.

But like I said earlier- this is me airing out my sadness. It's on-topic because I expect others to feel similarly, and I hope that my post validates their feelings while pointing out why someone could be better off turning to AI.

It's not a problem with your guide, it's a problem with how people use AI. Also in my case, me griping because not everyone cares about learning the skills, and this practice is my favorite thing 😅

cptyossarian · February 26

im not rly on the AI train, like the other person, but i also used it for a reference. i tried every other option first tbh. my mind's eye is very weak and i can barely visualise anything for a split-second, but it's easier to remember smthn ive already seen. i wanted my tulpa to look like a real person, not drawing, n the first thing i wanted to use was Metahuman creator, which i used in the past, but now it's tied to unreal engine which my computer can't run. i spent a couple days struggling with other realistic character creators, but they either had too crappy graphics or not enough customisation. i considered photoshopping (despite my low skill) but couldnt find any decent reference images n it'd just look like a nightmare. and using a real person's selfie would feel weird. finally i got a picture from chatgpt and it's just perfect

The Incans · February 26

I use video games: pick one with a character creator ... make it look like your Tulpa and then it puts them in an adventure, so instead of just having videos of a few secs with a plain background you're in a virtual reality with your Tulpa.

what we do too is research games and find out what NPCs look like... if there's one that looks most like me or Jess then our created character is the other person. we can then do role play and pretend we're both hanging out in the world adventure...sometimes conversation is limited in the game so we turn the voice sound off and create our own dialogue to fit with the situation ....it's often hilarious!

Currently we're playing Hogwarts Legacy..I (host) am NPC Poppy Sweeting as she looks like I did when younger and Jess has made her character look like her, it's her adventure. Jess is a Kitsune (shapeshifter) so we have a lot of flexibility with her also being able to play multiple NPCs or animal sidekicks too if our character is based on me.

Albireo · February 28

At first, I thought this would be a guide on how to use AI to overlay a tulpa's image onto reality (for example, using VR glasses, a webcam feed of your room, or something similar)

I was already preparing to be skeptical because I believe those methods are illusory and lead only to an illusion (after all, the tulpa wouldn't actually be controlling those bodies)

That being said, for the early stages of visualization, AI isn't bad as a means of creating pictures or short videos to assist in visualization practice

BUT. Regardless, one still needs to set limits and develop their own imagination to see significant results in forcing (I use it this way too, but I try to do so very rarely)

AI can be used as a supplementary tool for beginners, and only from time to time when it is truly necessary

mattx · April 14

Added a small edit to signal that unfortunately Grok Imagine has been paywalled, which means you need to buy X Premium to use it (and it has been severely limited.)
As things like these continue to evolve, there's no definite set in stone method - my best recommendation right now is to use Google's Nano Banana as my guess is that it'll be the last one to get paywalled due to Google's revenue strategy being around ads, and not premium subscriptions.

Wildblume · April 15

I can see how image and video generators can be useful. The lack of a good visual reference of one's tulpa used to be a common problem. The problem is that it's way too easy to just continue making more images and videos. The button is right there. You have to go out of your way not to if you want to make progress on what you actually came for: Your tulpa. Otherwise you're just an AI gooner.

In earlier times, if you based your tulpa's appearance on somebody, all you had was photos or drawings of that person or character. You knew you couldn't just make more. This forced you to move on. And many people didn't base their tulpa's appearance on anything but let it come spontaneously. But now? AI is a double-edged sword. If you have the willpower to only create the necessary references to give your imagination something to work with and nothing more, go ahead. But if you have low willpower - if you can't resist eating the metaphorical marshmallow on the table - then don't touch AI unless you want to add another addiction to your life and feel miserable about lack of tulpa progress months or years later with nothing to show for it but a very large folder of “references”.

mattx · April 15

5 hours ago, Wildblume said:

You have to go out of your way not to if you want to make progress on what you actually came for: Your tulpa. Otherwise you're just an AI gooner.

5 hours ago, Wildblume said:

If you have the willpower to only create the necessary references to give your imagination something to work with and nothing more, go ahead. But if you have low willpower - if you can't resist eating the metaphorical marshmallow on the table - then don't touch AI unless you want to add another addiction to your life and feel miserable about lack of tulpa progress months or years later with nothing to show for it but a very large folder of “references”.

I'm sorry, but I disagree.

Aside from the fact that you don't have to necessarily be a "gooner" (god I'm old for these terms) to use AI, like it has been said repeatedly in this thread this isn't meant to be a way to stop practicing visualization altogether, or replace it. It is meant to be a tool for people who, for one reason or the other, can't do proper visualization and therefore struggle in the "quality" of their forcing sessions.

You sound like an old timer, so let me give you an example: this isn't Fede's Tones, or anything like that. It's just yet another angle at a problem as old as the community itself. If you find it useful, good - if you don't, plenty of other guides out there, some 20x more wild than using AI to generate reference imagery of your tulpa tbh (remember the pony hypnosis? lol)

5 hours ago, Wildblume said:

In earlier times, if you based your tulpa's appearance on somebody, all you had was photos or drawings of that person or character. You knew you couldn't just make more.

That makes no sense - especially if you wanted a tulpa based off a popular show's character. You had INFINITELY more content than AI would ever give you.

For example, let's take your average Joe creating a tulpa in early 2014.

Fresh out of the My Little Pony hype (those were the days *sip*) he joins the community and starts the process. Obviously, it's either Rainbow Dash or Pinkie Pie or Vinyl.

What does he have to work with?

According to the evil ChatGPT:

Season 1 (2010–2011): 26 episodes
Season 2 (2011–2012): 26 episodes
Season 3 (2012–2013): 13 episodes
Season 4 (2013–2014): 26 episodes

Each episode has a runtime of about 20 minutes (not accounting for the opening/ending). 20 minutes is 1,200 seconds, so at 24 fps we're talking about 28,800 frames per episode. Multiplied by the 91 episodes listed above, that's roughly 2.6 million frames of reference, or about 30 hours of straight footage.
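For anyone who wants to double-check the napkin math, here's a quick sketch. It only uses the numbers quoted above (episode counts per season, about 20 minutes per episode, 24 fps); the variable names are just for illustration.

```python
# Episode counts per season, as listed above.
episodes_per_season = {1: 26, 2: 26, 3: 13, 4: 26}

minutes_per_episode = 20  # approximate runtime, excluding opening/ending
fps = 24                  # assumed frame rate

total_episodes = sum(episodes_per_season.values())
frames_per_episode = minutes_per_episode * 60 * fps
total_frames = total_episodes * frames_per_episode
total_hours = total_episodes * minutes_per_episode / 60

print(f"{total_episodes} episodes")
print(f"{frames_per_episode:,} frames per episode")
print(f"{total_frames:,} frames in total")
print(f"{total_hours:.1f} hours of footage")
```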

Let's not mention the fanart. Let's not mention the youtube content. Let's not mention the real gooning material (r34 unfortunately didn't fail that time) and you have an almost endless source of reference material.

Now, of course not every frame of every episode is gonna have Joe's tulpa form in it, but it'll contain reference material nonetheless about how the cartoon horses move, the way the mouth speaks, the color palette, finer details etc.

So again: this "crusade" about generative AI feels kind of out of place for the reasons mentioned above.

Skipping our pony era (which thankfully lasted only a month or two) Cheryl's form has been the same since 2013 - the original reference art was one image off of DeviantArt. One.

That's all I had, because the artist disappeared and any attempts for me to contact her were fruitless.

So, for years, I had to rely on friends doing drawings of her or hire artists to create variations that were always off on the style that I actually liked. Of course, as the years go by and your tulpa reaches double digits in age you don't really need to rely on these as much, you literally grew up with that fixed idea in mind, so it becomes almost automatic to picture her in detail - but that doesn't mean I wouldn't have paid with my blood to have a tool like Grok in my youth to generate more stable reference material because let me tell you, if you were an adhd overthinking teenager like I was back then, every forcing session (that didn't end prematurely by me falling asleep) was something that not even Picasso under LSD could dream of seeing. Her body was always too tall/too short, her hairstyle was all over the place, and when she turned her face in the wonderland I could literally feel my brain going like: "uhhhh what do we do now boss?" because the details were just not there.

Some people had it easier of course, but not everyone was like that.

I do agree with one bit though, there were people with "self-diagnosed aphantasia" that claimed they couldn't see a thing, but those were more often than not just really lazy people that never actually put in the effort.

Ending this way too long reply as my food's cooked, there's something to say about discipline and the modern community too, given how "liberal" it became and how every mental phenomenon is now considered a full blown tulpa - those who were there back then know what I am talking about, but there's a difference (or, there used to be a difference) between tulpas made with years of trial and error and those that were made in a day or two - one of the reasons why I have that message in my signature, by the way.
