
Public post dump?
#11
I'll dump the database later tonight. It'll give better results, be quicker, and generate less load than a crawler. I just can't do it very well from my phone.
Lyra: human female, ~17
Evan: boy, ~14, was an Eevee
Anera: anime-style girl, ~12; Lyra made her
My blog :: Time expectations are bad (forcing time targets are good though)


#12
Oh, why thank you! I will avoid building the aforementioned crawler.
#13
Did it. I took a database snapshot on 27 Nov 2015 at 9:50 PM EST, then dumped and word-counted all publicly viewable areas, meaning the following boards:
  • Articles
  • Forum Announcements
  • Forum Games
  • Forum Questions & Comments
  • GAT Discussion
  • General Discussion
  • Guides
  • Lounge
  • Metaphysics and Parapsychology
  • New Users
  • Off-topic (archived)
  • Progress Report
  • Questions and Answers
  • Research
  • Resources
  • Submissions
  • Tips & Tricks
  • Tulpa Art

Here are the results:
Word counts
Top 50 tulpa related words, selected by hand
Edit: And then I ran the first file here through an insanely slow Bash script that removes lines for all words in /usr/share/dict/words
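
For anyone who wants to redo that filtering step faster, here is a minimal Python sketch. It is not the Bash script mentioned above. It assumes each line of the word count file has whitespace-separated columns (count, lowercased word, first capitalization seen) and that matching against the dictionary should ignore case. The function names are made up for illustration.

```python
def load_dictionary(path="/usr/share/dict/words"):
    """Read a word list into a lowercase set for fast membership checks."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def drop_dictionary_words(count_lines, dictionary):
    """Keep only count lines whose word (second column) is not in the dictionary.

    Assumed column layout: count, lowercased word, first capitalization seen.
    """
    kept = []
    for line in count_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[1].lower() not in dictionary:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Tiny stand-in dictionary and made-up counts, only to show the call shape.
    demo_dictionary = {"the", "guide", "forcing"}
    demo_lines = ["120 the The", "45 tulpa Tulpa", "12 forcing forcing"]
    for line in drop_dictionary_words(demo_lines, demo_dictionary):
        print(line)
```

Loading the dictionary into a set means each word costs one lookup, rather than one scan of the dictionary file per word.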

(Note: Pieces of URLs that aren't in [url=...] tags such as "http", "community", "tulpa", "info", etc. are included in the word counts. Also note the above files have DOS line endings for compatibility.)

If you want to process it differently yourself, PM me and I'll zip up and send you the raw output of SELECT message FROM posts WHERE fid=$FORUMID.

For processing, I combined all messages from all publicly viewable forums into one file, then used a bunch of tr, sed, etc. to strip out MyCode, change \n to a real newline, remove ^M (the DOS carriage return, present in some posts and not others), convert the 'smart' apostrophes in some posts into normal ones, replace every character other than alphanumerics and apostrophes with a space, collapse consecutive spaces, and change each space to a newline, giving one word per line. I then fed it through this Python program and then sort -rn to make a word list:

Code:
#!/usr/bin/env python3
import fileinput

count = {}      # lowercased word -> number of occurrences
spelling = {}   # lowercased word -> first capitalization seen
for word in fileinput.input():
    word = word.rstrip()    # this removes a trailing newline
    if not word:
        continue            # skip blank lines so they aren't counted as a word
    key = word.lower()
    try:
        count[key] += 1
    except KeyError:
        spelling[key] = word    # save "first capitalization seen"
        count[key] = 1          # this is the first instance of this word

print("Count    word    first capitalization seen")
for key in spelling:
    print("%d   %s      %s" % (count[key], key, spelling[key]))
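
The tr/sed cleanup described before the program could be approximated in Python roughly like this. It is a sketch under assumptions, not the actual commands used: the MyCode pattern is a rough guess at tag shapes such as [b], [/quote], and [url=...], "alphanumeric" is taken to mean ASCII letters and digits, and to_words is a made-up helper name.

```python
import re

# Rough assumption about what a MyCode tag looks like; not the forum's real parser.
MYCODE_TAG = re.compile(r"\[/?[A-Za-z*]+(?:=[^\]]*)?\]")


def to_words(raw):
    """Turn raw post text into a list of words (one per would-be output line)."""
    text = raw.replace("\\n", "\n")       # literal backslash-n -> real newline
    text = text.replace("\r", "")         # drop ^M from DOS line endings
    text = MYCODE_TAG.sub(" ", text)      # strip tags, including [url=...] targets
    text = text.replace("\u2018", "'").replace("\u2019", "'")  # smart -> plain apostrophe
    text = re.sub(r"[^A-Za-z0-9']", " ", text)  # anything else becomes a space
    return text.split()                   # split() also collapses runs of spaces


if __name__ == "__main__":
    sample = "[b]Tulpa\u2019s[/b] guide:\\nsee [url=http://example.com]this[/url]!\r\n"
    print("\n".join(to_words(sample)))
```

Printing the list joined by newlines gives the same one-word-per-line input the counting program expects.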
#14
Lumi:
[Image: dVJxfhn.png]
tfw

Melian ------> Yumi ----> Lyra --> Oguigi -> Reisen

Tulpa mentions on the filtered list... I'm proud, I think?
Hi guys, plain text is just me now! We've each got our own accounts: me, Tewi, Flandre, and Lucilyn. We're Luminesce's tulpas.
Here's our "Ask Thread", and here's our Progress Report (You should be able to see all of our accounts on the second page if you want)

