Chupi November 27, 2015

I'll dump the database later tonight. It'll give better results, be quicker, and generate less load than a crawler. I just can't do it very well from my phone.

Lyra: human female, ~17
Evan: boy, ~14, was an Eevee
Anera: anime-style girl, ~12; Lyra made her
My blog :: Time expectations are bad (forcing time targets are good though)
jean-luc (Author) November 28, 2015

Oh, why thank you! I will avoid building the aforementioned crawler.

I don't visit as often as I used to. If you want me to see something, make sure to quote a post of mine or ping me @jean-luc
Chupi November 28, 2015

Did it. I took a database snapshot on 27 Nov 2015 at 9:50 PM EST, then dumped and word counted all public-viewable areas, meaning the following boards: Articles, Forum Announcements, Forum Games, Forum Questions & Comments, GAT Discussion, General Discussion, Guides, Lounge, Metaphysics and Parapsychology, New Users, Off-topic (archived), Progress Report, Questions and Answers, Research, Resources, Submissions, Tips & Tricks, Tulpa Art.

Here are the results:
Word counts
Top 50 tulpa related words, selected by hand

Edit: And then I ran the first file here through an insanely slow Bash script that removes lines for all words in /usr/share/dict/words.

(Note: Pieces of URLs that aren't in tags, such as "http", "community", "tulpa", "info", etc., are included in the word counts. Also note the above files have DOS line endings for compatibility.)

If you want to process it differently yourself, PM me and I'll zip up and send you the raw output of SELECT message FROM posts WHERE fid=$FORUMID.

For processing, I combined all messages from all public-viewable forums in one file, then used a bunch of tr, sed, etc. to strip out MyCode, change \n to a real newline, remove ^M (DOS newline, present in some posts and not others), convert 'smart' apostrophes in some posts into normal ones, replace any character other than alphanumeric and apostrophe with a space, collapse consecutive spaces, and change each space to a newline, giving one word per line.
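The cleanup steps described above can be sketched in Python 3 for anyone who wants to reprocess the raw dump without the tr/sed chain. This is only an approximation under stated assumptions, not the commands that were actually run: the function name words_from_message is made up for illustration, the pattern used for MyCode tags is a guess at their general shape ([b], [/quote], [url=...]), and "alphanumeric" is taken to mean ASCII letters and digits.

```python
import re

# Assumed shape of a MyCode tag: [name], [/name] or [name=value].
MYCODE_TAG = re.compile(r"\[/?[A-Za-z*][^\[\]]*\]")
# Anything that is not an ASCII letter, digit or apostrophe (assumption:
# "alphanumeric" in the post means ASCII only).
NON_WORD = re.compile(r"[^A-Za-z0-9']+")


def words_from_message(message):
    """Return the list of words in one raw post body (hypothetical helper)."""
    text = MYCODE_TAG.sub(" ", message)       # strip out MyCode tags
    text = text.replace("\\n", "\n")          # literal backslash-n -> real newline
    text = text.replace("\r", "")             # remove ^M (DOS newline)
    text = text.replace("\u2019", "'")        # 'smart' apostrophes -> normal ones
    text = text.replace("\u2018", "'")
    text = NON_WORD.sub(" ", text)            # everything else becomes a space
    return text.split()                       # collapses runs of whitespace
```

Joining the returned list with "\n" gives the one-word-per-line form. As in the note above, URL pieces outside tags survive as separate words, so "http://community.tulpa.info" comes out as http, community, tulpa, info.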
I then fed it through this Python program and then sort -rn to make a word list:

    #!/usr/bin/env python
    import fileinput

    count = {}
    spelling = {}
    for word in fileinput.input():
        word = word.rstrip()  # this removes a trailing newline
        key = word.lower()
        try:
            count[key] += 1
        except KeyError:
            spelling[key] = word  # save "first capitalization seen"
            count[key] = 1  # this is the first instance of this word

    print("Count word first capitalization seen")
    for key in spelling:
        print("%d %s %s" % (count[key], key, spelling[key]))
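For comparison, the count, the sort -rn step, and the later dictionary filter can all be done in one Python 3 pass. This is a sketch under assumptions, not the script used for the posted files: the function names are invented for illustration, ties between equal counts are broken alphabetically (sort -rn may order them differently), and dictionary matching is assumed to be case-insensitive.

```python
from collections import Counter


def count_words(words):
    """Count words case-insensitively, remembering the first capitalization seen."""
    counts = Counter()
    first_seen = {}
    for word in words:
        key = word.lower()
        counts[key] += 1
        first_seen.setdefault(key, word)  # only stored the first time
    return counts, first_seen


def sorted_rows(counts, first_seen):
    """Rows of (count, word, first capitalization), highest count first.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(n, key, first_seen[key]) for key, n in ordered]


def drop_dictionary_words(rows, dictionary_words):
    """Remove rows whose word appears in the dictionary (case-insensitive)."""
    known = {w.lower() for w in dictionary_words}  # set lookup keeps this fast
    return [row for row in rows if row[1] not in known]
```

To filter against the system word list, the lines of /usr/share/dict/words (stripped of their newlines) would be passed as dictionary_words. Holding them in a set is what avoids the slowness of checking every word against the file one at a time.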
Reisen November 28, 2015

Lumi: tfw Melian ------> Yumi ----> Lyra --> Oguigi -> Reisen

Tulpa mentions on the filtered list... I'm proud, I think?

Hi guys, plain text is just me now! We've each got our own accounts: me, Tewi, Flandre, and Lucilyn. We're Luminesce's tulpas. Here's our "Ask Thread", and here's our Progress Report (You should be able to see all of our accounts on the second page if you want)