
Public post dump?


jean-luc


I'll dump the database later tonight. It'll give better results, be quicker, and generate less load than a crawler. I just can't do it very well from my phone.

Lyra: human female, ~17

Evan: boy, ~14, was an Eevee

Anera: anime-style girl, ~12; Lyra made her

My blog :: Time expectations are bad (forcing time targets are good though)


Did it. I took a database snapshot on 27 Nov 2015 at 9:50 PM EST, then dumped and word-counted all publicly viewable areas, meaning the following boards:

  • Articles
  • Forum Announcements
  • Forum Games
  • Forum Questions & Comments
  • GAT Discussion
  • General Discussion
  • Guides
  • Lounge
  • Metaphysics and Parapsychology
  • New Users
  • Off-topic (archived)
  • Progress Report
  • Questions and Answers
  • Research
  • Resources
  • Submissions
  • Tips & Tricks
  • Tulpa Art

 

Here are the results:

Word counts

Top 50 tulpa related words, selected by hand

Edit: And then I ran the first file here through an insanely slow Bash script that removes the line for every word found in /usr/share/dict/words.

 

(Note: Pieces of URLs that aren't in tags such as "http", "community", "tulpa", "info", etc. are included in the word counts. Also note the above files have DOS line endings for compatibility.)
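For anyone who wants to redo the dictionary filter from the edit above without waiting on a slow shell loop, here is a rough Python sketch. It is not the Bash script that was actually run. It assumes the word-count file has the count in the first column and the lowercased word in the second, and the function names `load_dictionary` and `filter_known_words` are made up for illustration. Loading the dictionary into a set makes each lookup constant time instead of rescanning the dictionary for every word.

```python
def load_dictionary(path="/usr/share/dict/words"):
    """Return the dictionary words as a lowercase set for fast lookups."""
    with open(path) as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def filter_known_words(count_lines, dictionary):
    """Keep only count lines whose word (second column) is not in the dictionary.

    Assumes each data line looks like "<count> <word> <first capitalization>".
    """
    kept = []
    for line in count_lines:
        fields = line.split()
        # Skip blank lines and the header, whose first field is not a number.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        if fields[1].lower() not in dictionary:
            kept.append(line.rstrip("\r\n"))  # also drops DOS line endings
    return kept


if __name__ == "__main__":
    # Tiny made-up sample, just to show the call shape.
    sample = ["120   tulpa      Tulpa", "95   the      The"]
    print("\n".join(filter_known_words(sample, {"the"})))
```

On a real run you would pass `load_dictionary()` as the second argument and the lines of the word-count file as the first.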

 

If you want to process it differently yourself, PM me and I'll zip up and send you the raw output of SELECT message FROM posts WHERE fid=$FORUMID.

 

For processing, I combined all messages from all public-viewable forums into one file, then used a chain of tr, sed, etc. to: strip out MyCode; change literal \n to a real newline; remove ^M (DOS carriage returns, present in some posts and not others); convert 'smart' apostrophes in some posts into normal ones; replace every character other than alphanumerics and apostrophes with a space; collapse consecutive spaces; and change each space to a newline, giving one word per line. I then fed the result through this Python program and piped its output to sort -rn to make a word list:

 

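#!/usr/bin/env python
# Reads one word per line (stdin or files named on the command line) and
# prints a count for each distinct word, compared case-insensitively.
import fileinput

count = {}       # lowercased word -> number of occurrences
spelling = {}    # lowercased word -> first capitalization seen
for word in fileinput.input():
    word = word.rstrip()    # remove the trailing newline
    key = word.lower()
    try:
        count[key] += 1
    except KeyError:
        spelling[key] = word    # save "first capitalization seen"
        count[key] = 1          # this is the first instance of this word

# print() with a single argument works under both Python 2 and Python 3
print("Count    word    first capitalization seen")
for key in spelling:
    print("%d   %s      %s" % (count[key], key, spelling[key]))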
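The tr/sed preprocessing described before the script was not posted, so the following is only a Python approximation of those steps, not the commands that were actually used. The function name `tokenize` is hypothetical, the MyCode regex is a crude guess that treats anything shaped like `[tag]`, `[/tag]` or `[tag=value]` as markup, and "alphanumeric" is taken to mean ASCII letters and digits.

```python
import re

# Crude MyCode matcher (assumption): [b], [/b], [url=...], [*], and so on.
MYCODE_TAG = re.compile(r"\[/?[A-Za-z*][^\]\n]*\]")


def tokenize(raw):
    """Turn raw post text into a list of words, one per eventual output line."""
    text = MYCODE_TAG.sub(" ", raw)            # strip MyCode tags
    text = text.replace("\\n", "\n")           # literal backslash-n -> real newline
    text = text.replace("\r", "")              # drop DOS carriage returns (^M)
    text = text.replace("\u2018", "'").replace("\u2019", "'")  # smart apostrophes
    text = re.sub(r"[^A-Za-z0-9']", " ", text)  # everything else becomes a space
    return text.split()                        # split() also collapses repeated spaces


if __name__ == "__main__":
    # Made-up sample post, printed one word per line like the real pipeline.
    print("\n".join(tokenize("[b]Tulpa\u2019s[/b] host\\nsaid: hi!\r")))
```

The one-word-per-line output is the form the counting script above expects on its input. Text between `[url]` and `[/url]` is left in place, which matches the note about URL pieces showing up in the counts.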


Lumi:

dVJxfhn.png

tfw

 

Melian ------> Yumi ----> Lyra --> Oguigi -> Reisen

 

Tulpa mentions on the filtered list... I'm proud, I think?

Hi guys, plain text is just me now! We've each got our own accounts: me, Tewi, Flandre, and Lucilyn. We're Luminesce's tulpas.

Here's our "Ask Thread", and here's our Progress Report (You should be able to see all of our accounts on the second page if you want)
