IP 97.93.98.254 - we blocked your bot

By: Maynard Handley (name99.delete@this.name99.org), December 18, 2018 12:11 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on December 18, 2018 11:24 am wrote:
> Hi 97.93.98.254,
>
> Your bot is hitting our forums really hard in an automated fashion. That is putting an unreasonable
> load on our servers. We have blocked your bot, as we would rather not block you personally.
>
> If you are doing something legitimate, please contact me directly and we'd be happy to work something out.
>
> Thanks,
>
> David

I already explained this to David, but I'll explain to everyone else:
It may look like I (or at least my IP address) was abusing the site! That was not my intention. I was actually playing with a fun project.
I got curious about whether we can answer the question "is RWT biased against Apple?" with actual data, not just impressions. So I wrote some Mathematica that essentially scrapes the site (grabs every post from about 2009 till today), runs sentiment analysis on each post, classifies posts as apparently being about Apple or Intel, and then displays the results over time.
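As a rough illustration of the "classify posts as being about Apple or Intel" step, here is a minimal sketch in Python rather than the Mathematica actually used. The keyword lists and the function name `classify_topics` are assumptions made up for the example; the post does not say which terms or method were used.

```python
import re

# Hypothetical keyword lists; the original post does not say which terms were used.
TOPIC_KEYWORDS = {
    "Apple": ["apple", "iphone", "ipad", "macbook", "a12"],
    "Intel": ["intel", "xeon", "skylake", "core i7"],
}

def classify_topics(text):
    """Return the set of topics whose keywords appear as whole words in text."""
    lowered = text.lower()
    topics = set()
    for topic, keywords in TOPIC_KEYWORDS.items():
        for keyword in keywords:
            # \b keeps "apple" from matching inside "pineapple".
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                topics.add(topic)
                break
    return topics
```

A post can land in both buckets (or neither), which is one of the technical choices you could quibble with.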

Obviously I started small, to catch every bug I could (this sort of text processing code is always maddening: you think you have the text patterns correct, then at some point something about the site changes, or you hit an anomaly you didn't expect). It's also interesting to note that something (I guess it's the site, but it could be Mathematica or my OS) is limiting how many connections I get per second, so I can only pull in about ten posts per second. That means the whole site (at least from 2009 till now, basically postIDs from 100,000 up to the current one, which is about 183,000) was expected to take about three hours.
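The time estimate is simple arithmetic on the numbers above. A small Python sketch, where `crawl_hours` is a name invented for the example and every postID in the range is assumed to be fetched at the full rate:

```python
def crawl_hours(first_id, last_id, posts_per_second):
    """Estimate crawl time in hours for a contiguous range of post IDs."""
    posts = last_id - first_id
    return posts / posts_per_second / 3600

# 83,000 posts at ten per second is 8,300 seconds.
print(round(crawl_hours(100_000, 183_000, 10), 1))  # 2.3
```

That is the best case; any slowdown or retry pushes it toward the rough three-hour figure.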

Obviously this is not a perfect methodology. Sentiment analysis is not an exact science, especially when targeting anything beyond tweet-sized text, and you can quibble with various technical choices I made. But it's fun!

The graphs I obtained with just small samples (initially 1000, then 10000 posts) were interesting.

Some obvious patterns are:
- RWT as a whole is consistently negative. We complain and complain, and very rarely praise!
- Apple sentiment in about the past three months (10000 most recent samples) is generally below Intel sentiment. I didn't get to running longer term means yet, just drawing graphs, but eyeballing it, Apple average sentiment is at maybe -.6, Intel at maybe -.5, overall RWT at about -.45.

- Apple sentiment is much more volatile than Intel or RWT as a whole, with months that go very low (-.8, -.9), i.e. for that stretch of thirty days, 80% or 90% of RWT posts that mentioned Apple had negative sentiment. But there are also months where Apple sentiment is only about -.3, and there are even months with occasional substantial spikes in positive Apple sentiment, something that never happens for Intel or RWT (where the positive sentiments are generally limited to about 5% or less of posts).
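For readers wondering how a figure like -.8 can come out of per-post labels, here is one possible reading, sketched in Python. It assumes each post gets a label of -1, 0 or +1 and that the score for a month is the plain mean of those labels; the post does not state the exact formula, and `monthly_mean_sentiment` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

def monthly_mean_sentiment(posts):
    """posts: iterable of (date, label) pairs with label in {-1, 0, +1}.

    Returns {(year, month): mean label}, ordered by month.
    """
    buckets = defaultdict(list)
    for day, label in posts:
        buckets[(day.year, day.month)].append(label)
    return {
        month: sum(labels) / len(labels)
        for month, labels in sorted(buckets.items())
    }
```

Under this assumption a month with four negative posts and one neutral post scores -0.8. Note that the score equals the share of negative posts only when there are no positive ones; otherwise it is the negative share minus the positive share.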

The initial data I grabbed at the very start of the run (so around 2009) showed the same sort of thing but with more volatility in sentiment for everyone. Apple in particular showed patterns of a week or so of extreme negativity, followed by a week of sort of normal (50% negativity or so), then another extremely negative week.

OK, so that's background.
Last night I figured everything looked correct and seemed to be working, so I pulled the trigger and tried to pull in the entire site, running overnight. Sadly when I woke up, I found that something about posts in the middle of the data stream made my code unhappy. So I started running an aggressive debug version of the code (much slower, but supposed to stop when it hits an error, giving me full details, so I can look at the offending posts manually and fortify my text processing appropriately).
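The "stop at the first bad post and show me everything" idea can be sketched like this in Python. The `HEADER` pattern, `PostParseError` and `parse_post` are assumptions for illustration, modelled on the "By: name (email), date" line at the top of each post, and are not the actual Mathematica code.

```python
import re

class PostParseError(Exception):
    """Raised when a page does not match the expected layout."""

    def __init__(self, post_id, reason, raw):
        super().__init__(f"post {post_id}: {reason}")
        self.post_id = post_id
        self.raw = raw  # keep the full text for manual inspection

# Hypothetical pattern for the "By: NAME (email), DATE" header line.
HEADER = re.compile(
    r"^By: (?P<author>.+?) \((?P<email>[^)]*)\), (?P<date>.+)$",
    re.MULTILINE,
)

def parse_post(post_id, raw, strict=True):
    """Extract author and date; in strict mode, fail loudly on anomalies."""
    match = HEADER.search(raw)
    if match is None:
        if strict:
            raise PostParseError(post_id, "no 'By:' header found", raw)
        return None  # lenient mode: skip the post and keep going
    return {"id": post_id, "author": match["author"], "date": match["date"]}
```

Running with `strict=True` halts on the first anomaly with the post ID and raw text attached, while `strict=False` matches the overnight behaviour of skipping whatever does not parse.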

Unfortunately it seems that my code (even though the bandwidth is not that extreme, maybe 150K/sec, 10 hits/sec) is slowing down the site noticeably, so I will have to terminate the experiment.

My apologies to everyone. I really did not think that a non-optimized (i.e. not firing out requests very fast) spider running on a home-grade internet connection could cause enough work on a real website to be disruptive.
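For anyone tempted to repeat the experiment, the usual remedy is to put a minimum delay between requests. A minimal Python sketch follows; the `Throttle` class is hypothetical, and the right interval for this site is not something the thread specifies.

```python
import time

class Throttle:
    """Enforce a minimum delay between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each fetch with, say, `Throttle(2.0)` would cap the crawl at one request every two seconds, at the cost of a much longer run.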

But this was interesting. I think it shows that one can answer questions like this, at least to some extent (like I said, if you want to complain about the current state of sentiment analysis, I won't disagree with you). It also shows, once again, that Mathematica is just the most amazing thing ever created! In less than a day, with no prior expertise in any of these fields, I could slap together something that crawled a web site, processed the HTML to extract relevant text, applied sentiment analysis to it, and created appropriate time series.
(You think I am bad in my rants about Apple, you ain't seeing nothing compared to my praise of Mathematica!)

You can see something of what I was doing here (I think the data is correct, not going to bother to explain the graphs because they were temporary, hence not nicely annotated or anything else, but you get the idea).
