Last quarter, I had a thread presenting some analysis of the writing style and grade level for top posters. This quarter, I decided to focus on something different - to see how posters interact with each other through the use of quotes. I'm talking about using the Quote button to respond to all or part of another poster's post. I've long been interested in how online communities operate: the internal politics and conventions, conflict resolution, etc. As with last quarter, I thought this would be a quick little exercise in automated, objective analysis of a collection of raw post data. I just wanted to know who the biggest "quote buddies" are and whether they match up with what we think of as the most intense feuds. I also felt like I was spending way too much time engaging with SilverMan and wanted a way to quantify that. As usual, once I had the data it sort of spiraled out from there into a bunch of other related questions.
Since I wasn't going to go through each post individually, I had to focus on things that can be determined automatically, specifically who's quoting who and how much. What I didn't measure is the context of the quote and whether it was positive (i.e., supportive), negative, or neutral. While there are obviously lots of quotes done with a supportive purpose (even just "this" or "+1"), my gut tells me that a large majority of quotes are done in a negative way, i.e., for the purpose of disagreement or insulting the poster.
But regardless of context, we might think of quoting as a sort of currency - a proxy for the amount of attention each poster gives and receives. What might it mean if somebody gets quoted a lot? Are they an impressive original thinker, or just an imbecile or an effective troll? What conclusions can be drawn from an analysis of the economy of quoting?
Methodological Note: Basically all posts in both Politics and Unchained for Q2 were collected and analyzed for instances of "Originally Posted by". Most posts that have a quote have just one, but sometimes you want to break apart somebody's post and respond to pieces of it individually. There are two basic approaches to this.
This post quotes the original once, then breaks it apart manually and splices in the response content. In contrast, this post quotes the original numerous times and pares each quote down to the desired content. (Some people deliberately repeat a quote for effect; extreme example here.) In any of these cases, a post that quotes the same poster several times should count as one quote in the "quoting economy", and that's how it's counted for this analysis: the quoter gets credit for one outgoing quote and the quoted poster gets credit for one incoming quote.
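For anyone wanting to try this on their own scrape, here's a minimal Python sketch of that counting rule. It assumes each post is stored as an (author, text) pair and that every quote box in the text begins with a line reading "Originally Posted by <name>"; `quote_sources` and `count_quotes` are hypothetical helpers for illustration. Collecting the names into a set is what makes a post that splits one poster's post into several quote boxes count only once.

```python
import re
from collections import Counter

# Assumption: each quote box in the scraped text begins with a line like
# "Originally Posted by <name>" (the header vBulletin-style forums render).
QUOTE_HEADER = re.compile(r"^\s*Originally Posted by (.+?)\s*$", re.MULTILINE)


def quote_sources(post_text: str) -> set:
    """Distinct sources quoted in one post (hypothetical helper).

    Using a set means a post that quotes the same poster in several
    quote boxes still counts as a single quote of that poster.
    """
    return set(QUOTE_HEADER.findall(post_text))


def count_quotes(posts: list) -> Counter:
    """Count (quoter, source) pairs, at most once per source per post."""
    pairs = Counter()
    for author, text in posts:
        for source in quote_sources(text):
            pairs[(author, source)] += 1
    return pairs


posts = [
    ("alice", "Originally Posted by bob\nfirst point\nnope\n"
              "Originally Posted by bob\nsecond point\nalso nope"),
    ("carol", "Originally Posted by John Bland\nthis"),
]
print(count_quotes(posts))
```

A quote box edited to remove its header line slips past this rule entirely, which is the same blind spot that affects the Fly/John Bland case in section 4.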
1. General Post Metrics
For the three-month period 4/1/2014 - 6/30/2014, I collected 41,990 posts from 632 different posters. It's a very top-heavy distribution, with the top 10 posters having 30% of the posts on one end, and more than half the posters having 6 or fewer posts on the other end. The top 100 posters (minimum 78 posts) account for more than 85% of all posts. The analysis below will focus only on these top 100 posters. In terms of sheer number of posts, ikestoys is untouchable with an average of almost 30 posts/day. The top 25:
2. General Quote Metrics
There were a total of 23,994 quotes from 473 different quoters, drawn from 592 different quote sources. While the vast majority of quote sources are other posters, it's possible to manually create a quote from an arbitrary source. Examples of engineered quote sources include:
Bill of Rights
Urban Dictionary
President of the NRA
Ernest Hemingway
me, hundreds of posts ago
Dana "Atheists should leave the country" Perino
And my personal favorite:
ME, IN THE POST YOU ****ING QUOTED
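The quoter and source totals above can be tallied from the same (quoter, source) counts. Here's a rough Python sketch; `quote_summary` is a hypothetical name, and treating any source that doesn't match a known poster name as "engineered" is an assumption made here for illustration.

```python
from collections import Counter


def quote_summary(pairs: Counter, posters: set) -> dict:
    """Summarize a Counter mapping (quoter, source) -> number of quotes.

    Assumption: a source that isn't a known poster name is treated as an
    engineered quote source (a quote box with a hand-typed name).
    """
    quoters = {quoter for quoter, _ in pairs}
    sources = {source for _, source in pairs}
    return {
        "total_quotes": sum(pairs.values()),
        "quoters": len(quoters),
        "sources": len(sources),
        "engineered": sorted(sources - posters),
    }


pairs = Counter({("alice", "bob"): 3, ("bob", "alice"): 2,
                 ("carol", "Bill of Rights"): 1})
print(quote_summary(pairs, {"alice", "bob", "carol"}))
```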
3. Individual Quote Metrics
In terms of absolute numbers, it's unsurprising and uninteresting that ikestoys did the most quoting and was quoted the most. It's more illuminating to look at ratios that factor in posting volume. First, let's look at who's quoting the most and least relative to their posting frequency, i.e., quotes made per post. I call this metric "reactivity", since a lot of quoting generally means you're reacting to other posts more than you're posting your own unsolicited thoughts. Of course, different posters have different styles, and some people often just start replying to something without bothering to quote (especially for "lol" drive-by posts). I knew I was a pretty reactive poster. For the most part I don't have enough time to get into long back-and-forth discussions, so I mostly just lurk until I see something so stupid or wrong I can't control myself. (Note: for all the individual metrics, I'll chart the top and bottom quarters of the top 100 posters.)
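As a sketch, reactivity and the top/bottom-quarter split used for the charts could be computed like this in Python. The per-poster dictionaries of post counts and quotes made, and both function names, are hypothetical; the example numbers are made up.

```python
def reactivity(posts: dict, quotes_made: dict) -> dict:
    """Quotes made per post; higher means more replying to other posters."""
    return {name: quotes_made.get(name, 0) / n
            for name, n in posts.items() if n > 0}


def top_and_bottom_quarter(scores: dict) -> tuple:
    """Return (highest quarter, lowest quarter) of posters by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = len(ranked) // 4
    # Slice from len - k so that k == 0 yields an empty list, not everything.
    return ranked[:k], ranked[len(ranked) - k:]


posts = {"a": 100, "b": 100, "c": 100, "d": 100}
quotes_made = {"a": 80, "b": 10, "c": 50, "d": 30}
scores = reactivity(posts, quotes_made)
print(top_and_bottom_quarter(scores))  # (['a'], ['b'])
```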
Next, how often you get quoted relative to your post count might be called "provocativeness", since you're inspiring others to respond. Again, there are different kinds of provocation: both brilliance and idiocy can be provocative, so a high provocativeness score might not be a good thing.
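Provocativeness is the mirror image of reactivity: incoming quotes per post. A minimal Python sketch, assuming quotes are stored as a Counter of (quoter, source) pairs (a hypothetical data layout, with made-up numbers):

```python
from collections import Counter


def times_quoted(pairs: Counter) -> Counter:
    """Total incoming quotes per source."""
    received = Counter()
    for (_quoter, source), n in pairs.items():
        received[source] += n
    return received


def provocativeness(posts: dict, pairs: Counter) -> dict:
    """Incoming quotes per post for each poster with at least one post."""
    received = times_quoted(pairs)
    return {name: received[name] / n for name, n in posts.items() if n > 0}


pairs = Counter({("a", "b"): 3, ("c", "b"): 2, ("b", "a"): 1})
print(provocativeness({"a": 10, "b": 5, "c": 4}, pairs))
# {'a': 0.1, 'b': 1.0, 'c': 0.0}
```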
I thought it might be interesting to combine the above metrics, in other words to see who's generating a lot of quotes without doing a lot of quoting themselves. In terms of the quoting economy, these people are getting a lot of bang for their buck, so I'll call this metric "quote efficiency": provocativeness divided by reactivity. Since both are per-post ratios, the post count cancels out and the score is simply times quoted divided by quotes made. A high efficiency combines high provocativeness with low reactivity. For example, Riverman made only 42 quotes in his 378 posts, but was quoted 229 times, for a score of 229/42 = 5.45. This is much higher than anybody else's.
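Plugging Riverman's numbers into the ratio shows the post count dropping out. This is just a worked check of the arithmetic in Python; the `efficiency` helper is hypothetical, and returning infinity for a poster who never quotes is an assumption, since the score is undefined there.

```python
import math


def efficiency(posts: int, quotes_made: int, times_quoted: int) -> float:
    """Provocativeness divided by reactivity (= times_quoted / quotes_made)."""
    if quotes_made == 0:
        return math.inf  # assumption: never quoting anyone counts as unbounded
    provocativeness = times_quoted / posts
    reactivity = quotes_made / posts
    return provocativeness / reactivity


print(round(efficiency(posts=378, quotes_made=42, times_quoted=229), 2))  # 5.45
```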
One factor behind a low efficiency score might be post count: low-post-count posters might be more likely to be ignored. This, at least, is how I rationalize my own low score in this area.
Finally, how promiscuous are you with your quoting? Some posters are focused and discriminating with their quotes; others are more willing to respond to a wider variety of posters. The score is effectively the number of different sources divided by the total number of quotes, so it ranges from a theoretical maximum of 1 if you never quote the same person twice down toward 0 if all your quotes are of the same poster. Top rank here goes to kioshk, who made 16 quotes of 15 different sources (15/16 ≈ 0.94). At the other end is Proph, with 446 quotes of only 33 different sources (33/446 ≈ 0.07).
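A sketch of the promiscuity score in Python, taking it as distinct sources divided by total quotes, an interpretation that matches the 0-to-1 range described above. The input is a hypothetical list of the sources one poster quoted, one entry per counted quote; the poster names are placeholders.

```python
def promiscuity(sources_quoted: list) -> float:
    """Distinct sources / total quotes: 1.0 if nobody is quoted twice,
    1/n if all n quotes target the same poster."""
    if not sources_quoted:
        raise ValueError("poster made no quotes")
    return len(set(sources_quoted)) / len(sources_quoted)


# 16 quotes of 15 different sources, matching kioshk's counts
kioshk_like = [f"poster{i}" for i in range(15)] + ["poster0"]
print(promiscuity(kioshk_like))  # 0.9375
```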
4. Interactive Quote Metrics
In this section, we'll look at individual pairings. In raw terms, the top pairings are pretty unsurprising.
When we factor in posting volume, it gets a little more interesting. This measures how much of the pair's combined posting footprint is devoted to interacting with each other. My suspicion that I was wasting too much of my life engaging with SilverMan was confirmed. There's another big outlier here, with kerowo and Proph spending nearly 30% of their collective activity interacting with each other. Obsessive co-dependency? Elaborate gimmick? Who knows?
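One way to compute both pairing tables in Python: raw mutual quote counts, then those counts divided by the pair's combined post count. That normalization is an interpretation of "posting footprint", not a confirmed definition, and the helper names and example numbers are hypothetical.

```python
from collections import Counter


def pair_totals(pairs: Counter) -> Counter:
    """Merge A->B and B->A quote counts under one unordered key."""
    totals = Counter()
    for (quoter, source), n in pairs.items():
        if quoter != source:  # ignore self-quotes
            totals[frozenset((quoter, source))] += n
    return totals


def interaction_share(pairs: Counter, posts: dict) -> dict:
    """Mutual quotes divided by the pair's combined posts (assumed formula)."""
    shares = {}
    for pair, n in pair_totals(pairs).items():
        a, b = sorted(pair)
        if a in posts and b in posts:  # skip engineered or unknown sources
            shares[(a, b)] = n / (posts[a] + posts[b])
    return shares


pairs = Counter({("x", "y"): 30, ("y", "x"): 20, ("x", "z"): 5})
shares = interaction_share(pairs, {"x": 100, "y": 80, "z": 50})
print(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))
```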
Finally, I wondered which pairings are the most lopsided. This might be called "unrequited quoting" or maybe "quote stalking". I used an arbitrary floor of 10 combined quotes in order to filter out the uninteresting "1 to 0" relationships. Tied for top honors here was John Bland, who quoted Fly 12 times without getting quoted back even once. This seems pretty surprising, since they clearly went head-to-head in the "**** People" thread in Unchained. Fly responded directly to Bland numerous times; he just didn't use quotes to do so. Actually, he did make one quote (#12), but it was engineered to exclude the "Originally Posted by John Bland" part, so it was not officially counted.
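The lopsidedness ranking with the 10-quote floor might look like this in Python. It's a sketch using the same hypothetical (quoter, source) Counter layout; ranking by the heavier side's share of the pair's quotes is an assumption about what "most lopsided" means, and apart from the 12-to-0 pattern the example counts are made up.

```python
from collections import Counter


def lopsided_pairs(pairs: Counter, floor: int = 10) -> list:
    """Unordered pairs with at least `floor` combined quotes, most one-sided first.

    Each row is (heavier quoter, lighter quoter, heavier count, lighter count).
    """
    seen = set()
    rows = []
    for a, b in pairs:
        key = frozenset((a, b))
        if a == b or key in seen:
            continue
        seen.add(key)
        ab, ba = pairs[(a, b)], pairs[(b, a)]  # missing keys read as 0
        if ab + ba < max(floor, 1):
            continue  # drop trivial "1 to 0" relationships (and empty pairs)
        if ba > ab:
            a, b, ab, ba = b, a, ba, ab
        rows.append((a, b, ab, ba))
    # Rank by the heavier side's share of the pair's quotes, then by volume.
    rows.sort(key=lambda r: (r[2] / (r[2] + r[3]), r[2]), reverse=True)
    return rows


pairs = Counter({("John Bland", "Fly"): 12, ("a", "b"): 6,
                 ("b", "a"): 5, ("c", "d"): 3})
for row in lopsided_pairs(pairs):
    print(row)
```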
5. Conclusions
As with the last thread, there are obviously severe limitations and caveats to this type of data analysis, for a huge number of reasons. At the same time, it's not completely meaningless either. In my own case, it's prompted some self-reflection about both my high reactivity score and the amount of time wasted on what I know are unproductive interactions. Beyond that, I hope others find the data useful or at least interesting.