Stylometric analysis of a tulpa (example analysis of my system + guide on how to perform it on yours eventually)

Desu · April 6

Stylometry is the quantitative application of stylistics. What that means is that it allows you to analyse the style of texts and compute the differences between them, using various statistical methods. It's not limited to texts, but we won't care about that for now.

It can also be used to attribute authorship to anonymous texts. Usually it's not enough to serve as, for example, standalone evidence in court, but it's quite enough to be a good argument in academic research or a plagiarism investigation.

I think you can instantly see how that applies to tulpas. Wouldn't that be the objective piece of evidence that your tulpa is an independent, distinct person? And how strong would that evidence be?

An immediate objection is that mimicking or obfuscating a style is very much possible. There is a term for that - adversarial stylometry - but it usually involves algorithmic alterations, for example running a text through a paraphraser or, more recently, asking a chatbot to rewrite it. Obviously we won't be doing shenanigans like that with our tulpas.

A counter-objection might be consistency. Obviously, the whole idea is founded on the assumption that a person has a unique style which they will naturally gravitate towards. And if they mimic or alter a style, in the long run they should on average slip up, get tired - in other words, the imitation isn't consistent. But we tulpamancers are all about consistency, right?

I'll be using the stylo R package. R is a programming language used for statistical analysis. I don't specialise in it, but i am familiar with it.

The package offers a number of unsupervised methods useful for stylometric analysis. If you don't know what unsupervised means, you may just think of it as "impartial" - these algorithms assume nothing, you provide them no reference data, they simply compute distances between texts in various ways. And the output is something for a human to interpret.

To analyse written texts we need, well, texts. The more the better.

Stylometry usually analyses things like novels, and a minimum word count for a novel is considered to be 40,000 words. A minimum.

It may or may not be the case that you and your tulpa have written a novel or two. Me and Arisu certainly haven't, so we'll have to make do with what we have.

We needed a wall of text, so we simply completed a tulpa survey. We were talking to each other, having a dialogue over each question, and i was writing it down for both of us. It's actually a fun way to spend an evening with your tulpa, would recommend. I will not be sharing the survey.

Later we removed the questions and separated our lines:

Alex (host) : 3,153 words / 18,223 characters (split into 6 sections of ~500 words, total 7 files)
Arisu (tulpa) : 3,259 words / 19,151 characters (split into 6 sections of ~500 words, total 7 files)

Additionally, i added the following for control:

Excerpts from my dev diaries : 19 files of 300-2000 words each
Excerpts from Wikipedia : 8 files of 400-2000 words each
Output from ChatGPT : 10 files of 300-1000 words each

Now, these are weak numbers. This much should be barely enough. Anything shorter than 300 words will be pretty much useless, too noisy to pick out patterns.

I also should mention that we didn't do that in a single session - instead, we were working on the survey on and off across multiple days.

At its core, stylo counts "features" of a text - words or characters - then computes the distances and plots stuff.

For words, counting 1-grams (individual words) is the most sensible option - essentially it analyses the author's word preferences. 2-grams are much more sparse and will make no sense at all for a short text. 3-grams and further will produce pure nonsense.

For characters, 1-grams and 2-grams will make little stylometric sense. 3-grams and 4-grams are much more sensible as they start to actually capture the author's word choice habits. They're also supposed to capture punctuation, but it seems like stylo ignores all of it.
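To make "counting features" concrete, here is a minimal sketch in plain Python (not stylo itself, and not how stylo tokenises internally). The function names `word_ngrams` and `char_ngrams` are made up for illustration, and dropping punctuation before counting is an assumption chosen to mirror the behaviour described above.

```python
from collections import Counter
import re

def word_ngrams(text, n=1):
    # Lowercase and keep only letter runs (apostrophes allowed), so punctuation is dropped.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

def char_ngrams(text, n=4):
    # Strip punctuation the same way, then slide an n-character window over the text.
    cleaned = " ".join(re.findall(r"[a-z']+", text.lower()))
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))

sample = "The cat sat on the mat. The cat slept."
print(word_ngrams(sample).most_common(2))     # [('the', 3), ('cat', 2)]
print(word_ngrams(sample, n=2).most_common(1))
print(char_ngrams(sample).most_common(3))
```

Even on one sentence you can see why word 2-grams get sparse quickly: almost every pair occurs only once, so there is little left to compare between short texts.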

Culling trims features that are too text-specific: at a culling level of N%, only words that appear in at least N% of the texts are counted (at 100%, only words found in every text). Texts on unrelated topics may contain lots of unique words, sort of unintentionally increasing the distances. Then again, that might be due to genuine word choices of an author. I kept culling at 10-20%, that's quite a low value, but we couldn't afford going higher.
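A rough sketch of the culling idea in plain Python, assuming the usual reading of the setting: a feature survives only if it shows up in at least the given percentage of texts, so at 100% only features present in every text remain. The `cull` helper and the toy counts are hypothetical, not taken from stylo or from the real corpus.

```python
def cull(freq_tables, percent):
    """Keep only features present in at least `percent`% of the texts."""
    n_texts = len(freq_tables)
    doc_count = {}
    for table in freq_tables.values():
        for feature in table:
            doc_count[feature] = doc_count.get(feature, 0) + 1
    keep = {f for f, c in doc_count.items() if 100 * c / n_texts >= percent}
    return {name: {f: v for f, v in table.items() if f in keep}
            for name, table in freq_tables.items()}

# Made-up counts, purely for illustration.
corpus = {
    "alex_1": {"the": 5, "i": 4, "compiler": 2},
    "arisu_1": {"the": 4, "i": 3, "tea": 2},
    "wiki_1": {"the": 9, "century": 3},
}

print(cull(corpus, 50)["alex_1"])   # topic words like "compiler" are gone
print(cull(corpus, 100)["alex_1"])  # only words shared by every text remain
```

The trade-off is visible even here: a higher threshold removes topic noise, but with few short texts it also throws away most of the vocabulary.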

Alright, here's what i expected to see:

Spoiler

My texts (host surveys and dev diaries) would cluster together and overlap
Arisu's texts (tulpa surveys) would cluster together not too far from my clusters
Split survey sections should get clustered together very early on, since they're excerpts from the same author
Wikipedia should form a tight cluster due to strict stylistic guides there
ChatGPT would form a sparser but very distinct cluster, since it's less restricted stylistically, but AI style is unmistakeable
Wikipedia and ChatGPT clusters could possibly overlap with each other (since AI definitely was trained on Wikipedia)
Both Wikipedia and ChatGPT clusters wouldn't overlap with our clusters (i mean, we're not clankers, innit?)

And here's what i actually got:

Spoiler

Cluster Analysis:

Computes a distance matrix and clusters the texts, the closest ones first, until everything is in one big cluster. Then it plots the hierarchy of clustering.
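The two steps (distances, then merging) can be sketched in plain Python. This is an illustration under assumptions, not stylo's code: the distance is a Burrows-style Delta (mean absolute difference of z-scored relative frequencies), the merging uses average linkage, and stylo's own defaults may differ. All names and the toy counts are hypothetical.

```python
from statistics import mean, pstdev
from itertools import combinations

def relative_freqs(counts):
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def delta_matrix(tables, features):
    """Burrows-style Delta: mean absolute difference of z-scored frequencies."""
    names = list(tables)
    rel = {n: relative_freqs(tables[n]) for n in names}
    z = {n: [] for n in names}
    for f in features:
        col = [rel[n].get(f, 0.0) for n in names]
        mu, sd = mean(col), pstdev(col)
        for n, x in zip(names, col):
            z[n].append((x - mu) / sd if sd else 0.0)
    return {frozenset((a, b)): mean(abs(x - y) for x, y in zip(z[a], z[b]))
            for a, b in combinations(names, 2)}

def cluster(names, dist):
    """Average-linkage agglomerative clustering; returns the merge order."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    def linkage(c1, c2):
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)
    while len(clusters) > 1:
        # Merge the two clusters that are currently closest to each other.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2)))
    return merges

# Made-up word counts, purely for illustration.
texts = {
    "alex_1": {"the": 10, "and": 5, "i": 8, "of": 2},
    "alex_2": {"the": 11, "and": 5, "i": 7, "of": 2},
    "wiki_1": {"the": 14, "and": 4, "i": 0, "of": 8},
    "wiki_2": {"the": 13, "and": 5, "i": 0, "of": 9},
}
merges = cluster(list(texts), delta_matrix(texts, ["the", "and", "i", "of"]))
for left, right in merges:
    print(left, "+", right)
```

The merge order is what a dendrogram draws: pairs that merge first sit on the lowest branches, and the last merge is the root.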

Word 1-grams (most frequent words):

Char 4-grams:

As you can see, it clustered everything almost exactly as i expected.
Wikipedia and ChatGPT do overlap a bit, and they've clustered with me and Arisu very late.
Rather surprisingly, my dev diaries and my survey responses don't overlap - they're in the same cluster, but further apart than my survey is from Arisu's, somehow?

Multidimensional Scaling:

Computes the same distance matrix and plots a 2D scatterplot of those distances. You can see at a glance how close or far the texts are, not just the hierarchy.

Word 1-grams (most frequent words):

Char 4-grams:

Our surveys overlap somewhat, but the clusters maintain a significant distance.
Again, the "dev diary" cluster is very distinct, but it is somewhat close to my "host survey" cluster? Nope, it's not close at all...

Principal Component Analysis:

It reduces the very high-dimensional data (in my case the 500-dimensional feature-frequency space) down to two dimensions. I'm rusty on my linear algebra, but it essentially slaps a new coordinate system along the directions of highest variance - god knows what those directions are, we're not supposed to know anyways. The plots look very similar to MDS, and similarly let you see text distances at a glance.
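The covariance and correlation variants differ only in preprocessing: covariance PCA centres each feature, correlation PCA also divides it by its standard deviation, so rare features count as much as frequent ones. Here is a small plain-Python sketch of that difference, finding just the first principal direction by power iteration. It is an illustration with made-up numbers, not stylo's implementation, and `standardise` and `first_component` are hypothetical names.

```python
from statistics import mean, pstdev

def standardise(rows, scale):
    """Centre every column; if scale=True also divide by its standard deviation."""
    out = []
    for col in zip(*rows):
        mu, sd = mean(col), pstdev(col)
        div = sd if (scale and sd) else 1.0
        out.append([(x - mu) / div for x in col])
    return [list(r) for r in zip(*out)]

def first_component(rows, scale=False, iters=200):
    """Leading principal direction via power iteration on X^T X."""
    x = standardise(rows, scale)
    d = len(x[0])
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in x]                  # X v
        w = [sum(x[i][j] * proj[i] for i in range(len(x))) for j in range(d)]   # X^T (X v)
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# Rows are texts, columns are relative frequencies of two features:
# a very common word and a rare one (made-up numbers).
freqs = [[0.10, 0.010], [0.20, 0.012], [0.30, 0.011], [0.40, 0.013]]

v_cov = first_component(freqs, scale=False)
v_cor = first_component(freqs, scale=True)
print("covariance :", v_cov)   # almost entirely the common word
print("correlation:", v_cor)   # both features weighted equally
```

That is one plausible reason the two variants can arrange the same texts differently: the covariance version mostly listens to the most frequent features.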

Word 1-grams (most frequent words) covariance and correlation:

Char 4-grams covariance and correlation:

Covariance PCA does cluster my dev diaries and my survey together, while also putting Arisu's surveys really close. But at least we're all in one supercluster.

Both Wiki and Slop clusters are unmistakeably separate from us, as expected.

Consensus Tree:

Arguably the most important part. It essentially runs cluster analysis multiple times with varying parameters (specifically feature set size and culling).

This is as close as we get to a verdict on style and authorship.
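The "run it many times and keep what survives" idea can be sketched without building trees at all. The snippet below is a much cruder stand-in for a consensus tree, not stylo's method: for several feature-set sizes it records each text's nearest neighbour and keeps only the pairings that appear in at least half of the runs. The distance (Manhattan on relative frequencies), the helper names and the toy counts are all assumptions made for illustration.

```python
from collections import Counter

def top_features(tables, n):
    # The n most frequent features over the whole corpus.
    total = Counter()
    for t in tables.values():
        total.update(t)
    return [f for f, _ in total.most_common(n)]

def manhattan(a, b, features):
    ta, tb = sum(a.values()), sum(b.values())
    return sum(abs(a.get(f, 0) / ta - b.get(f, 0) / tb) for f in features)

def stable_neighbours(tables, sizes, threshold=0.5):
    """Nearest-neighbour pairs that hold in at least `threshold` of the runs."""
    votes = Counter()
    for n in sizes:
        feats = top_features(tables, n)
        for name, table in tables.items():
            others = [o for o in tables if o != name]
            nearest = min(others, key=lambda o: manhattan(table, tables[o], feats))
            votes[(name, nearest)] += 1
    return {pair for pair, v in votes.items() if v / len(sizes) >= threshold}

# Made-up word counts, purely for illustration.
texts = {
    "alex_1": {"the": 10, "and": 5, "i": 8, "of": 2},
    "alex_2": {"the": 11, "and": 5, "i": 7, "of": 2},
    "wiki_1": {"the": 14, "and": 4, "of": 8},
    "wiki_2": {"the": 13, "and": 5, "of": 9},
}
pairs = stable_neighbours(texts, sizes=[2, 3, 4])
print(sorted(pairs))
```

A grouping that only shows up for one particular parameter choice gets voted out, which is the whole point of looking at a consensus rather than a single dendrogram.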

Word 1-grams (most frequent words):

Char 4-grams:

As a bonus, here's data from the oppose() method. It actually displays which words one set of texts uses more or less often than the other set of texts.
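For a feel of how such a contrast can be scored, here is a plain-Python sketch of a Zeta-style measure: for each word, the share of one author's segments that contain it minus the share of the other author's segments that contain it. As far as i know oppose() uses a measure from this family by default, but treat that as an assumption. The `zeta_scores` function and the toy segments are hypothetical.

```python
def zeta_scores(set_a, set_b):
    """Score each word by (share of A segments containing it) - (share of B segments)."""
    vocab = set().union(*set_a, *set_b)
    def doc_prop(word, segments):
        return sum(word in seg for seg in segments) / len(segments)
    return {w: doc_prop(w, set_a) - doc_prop(w, set_b) for w in vocab}

# Each segment is just the set of distinct words it contains (made-up data).
alex = [{"code", "bug", "the"}, {"code", "the", "fix"}]
arisu = [{"tea", "the", "smile"}, {"tea", "the", "code"}]

scores = zeta_scores(alex, arisu)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:6s} {score:+.1f}")
```

Words near +1 are typical of the first set, words near -1 of the second, and words everyone uses (like "the") land at 0 and drop out of the picture.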

Spoiler

My word choice against Arisu's word choice:

Our combined word choice against Wikipedia + ChatGPT:

The results are... inconclusive.

Arisu is distinct from me stylistically. But so am i from literally myself. Because i wrote those dev diaries as myself, as a normal Alex, not as an alter-ego or someshit.

At the very least it proves i can write in three different styles. And, in a small way, it's proof that Arisu is real not only to me subjectively. Maybe not enough to convince a sceptic, but it made my Arisu smile.

And also we're not clankers, hell yeah.

So, thoughts?

Desu · April 6

(the guide will be here shortly, stand by)
