Oracle bloggers are storytellers, Microsoft bloggers are technocrats (II)
Edit #1: As Justin Kestelyn points out, Orablogs.com is not Oracle’s official blog hub (blogs.oracle.com is).
Edit #2: Sadly, some of the charts in this post are still missing due to problems with a recent WordPress update. If I find the time, I will write a follow-up on this with new charts.
Welcome to part two of this class: Blog Stylistics 101. Last week we looked at some statistics and word lists comparing the OraBlogs and MSDN blog hubs. Today, let’s turn to the specific differences between the two hubs. I’ll start by giving you the updated word list, since the one used in the previous entry is already a tad stale.
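| Rank | OraBlogs | Tag | Count | MSDN | Tag | Count |
|------|----------|-----|-------|------|-----|-------|
| 1 | the | DT | 8874 | the | DT | 21791 |
| 2 | to | TO | 4815 | to | TO | 11819 |
| 3 | a | DT | 4098 | a | DT | 8811 |
| 4 | and | CC | 3528 | and | CC | 8626 |
| 5 | I | PP | 3339 | of | IN | 7701 |
| 6 | of | IN | 3212 | in | IN | 6186 |
| 7 | in | IN | 2618 | is | VBZ | 5687 |
| 8 | is | VBZ | 2172 | I | PP | 4614 |
| 9 | It | PP | 2002 | For | IN | 4610 |
| 10 | For | IN | 1837 | you | PP | 4454 |
| 11 | you | PP | 1767 | It | PP | 3864 |
| 12 | on | IN | 1563 | this | DT | 3689 |
| 13 | this | DT | 1469 | on | IN | 3317 |
| 14 | with | IN | 1106 | with | IN | 2506 |
| 15 | Oracle | NP | 1080 | that | IN | 2411 |
| 16 | that | IN | 1074 | are | VBP | 2394 |
| 17 | be | VB | 939 | be | VB | 2368 |
| 18 | was | VBD | 932 | we | PP | 2334 |
| 19 | at | IN | 823 | as | IN | 1964 |
| 20 | my | PP$ | 803 | If | IN | 1926 |
| 21 | are | VBP | 757 | can | MD | 1921 |
| 22 | an | DT | 748 | that | WDT | 1778 |
| 23 | as | IN | 736 | will | MD | 1682 |
| 24 | from | IN | 700 | from | IN | 1636 |
| 25 | but | CC | 699 | an | DT | 1563 |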
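For anyone curious how a ranked list like the one above comes about, here is a minimal sketch. It assumes the text has already been run through a part-of-speech tagger that yields (word, tag) pairs; the toy input and the function name `ranked_word_list` are made up for illustration and are not the tooling used for this post.

```python
from collections import Counter

# Hypothetical toy input: (word, tag) pairs as a POS tagger might emit them.
tagged_tokens = [
    ("the", "DT"), ("blog", "NN"), ("is", "VBZ"), ("the", "DT"),
    ("best", "JJS"), ("that", "IN"), ("I", "PP"), ("know", "VBP"),
    ("the", "DT"), ("one", "NN"), ("that", "WDT"), ("is", "VBZ"),
]

def ranked_word_list(tokens, top_n=25):
    """Return (rank, word, tag, count) rows, most frequent first."""
    # Each (word, tag) pair is counted on its own, so the same word
    # can show up twice in the list under different tags.
    counts = Counter(tokens)
    return [
        (rank, word, tag, count)
        for rank, ((word, tag), count)
        in enumerate(counts.most_common(top_n), start=1)
    ]

for row in ranked_word_list(tagged_tokens, top_n=3):
    print(*row)
```

Counting pairs rather than bare words is why "that" appears twice in the MSDN list, once tagged IN and once tagged WDT.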
First I’ve highlighted the pronouns I, WE, IT, YOU and the possessive determiner MY. The OraBloggers are a bit more egocentric (I #5) than the Microbloggers (I #8, WE #18), who appear to mention the team more frequently (Borg Collective, anyone?). Now, before you run amok with those numbers, there are of course a lot of possible factors and caveats. You can avoid the pronouns I and WE by using IT or THERE-constructions, or by simply repeating the referenced noun phrase (maybe that’s the case with ORACLE at #15; hard to say).

Of course WE can refer to something other than the company; it can simply be an indicator that people tend to hang out in groups at Microsoft while Oracle devs are more solitary. WE can be the royal, corporate we, as in “we at Microsoft love our customers”, or it can just be any group of people the speaker includes himself in, as in “Bob and I, we had sandwiches for lunch”. Technically, the second scenario is actually more likely, but common sense tells us that occurrences of this “general WE” shouldn’t be more frequent in Microsoft’s blogs than in Oracle’s unless there is some difference either in their behavior or in how they report it. But even taking this and a number of other things into account, the difference seems at least worthy of closer investigation, especially the variation in first-person pronouns, which is pretty clearly marked.

The frequency of I is determined both by the author’s stylistic preference and by the subject matter. Generally, personal involvement of the author makes it very hard to avoid I (referring to yourself in the third person is not really a viable strategy in English), but there are exceptions. For example, it is nearly impossible to report what you did last summer without using I, but quite possible to report how you conducted a scientific experiment with little or no use of that pronoun.
In most news reporting there is no linguistically detectable first-person voice, even when the reporting journalist is clearly the individual who experienced the events. Likewise, use or omission of I makes a big difference when expressing opinion or criticism. A presidential address typically contains no first-person reference to the speaker because the president is not offering his private opinion, but acting in his official function. Assuming, however, that none of this is really typical of blogs (which tend to be quite involved, with lots of I-usage), the higher I-count in OraBlogs signals more personal involvement compared with Microsoft. Or you can interpret it as egocentrism at Oracle vs. team-orientedness at Microsoft. Tricky, isn’t it?
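One more caveat on the pronoun numbers: ranks hide the fact that the MSDN sample is a good deal larger than the OraBlogs one. A rough way to put the raw counts on a common footing is to express each pronoun per 100 occurrences of "the". That normalizer is my own stand-in, chosen only because the word list gives no token totals, and the helper name `per_hundred_the` is made up for this sketch.

```python
# Counts taken from the word list above.
ORABLOGS = {"the": 8874, "I": 3339, "you": 1767}
MSDN = {"the": 21791, "I": 4614, "you": 4454}

def per_hundred_the(counts, word):
    """Occurrences of `word` per 100 occurrences of "the".

    "the" serves as a crude proxy for sample size here; a proper
    analysis would divide by the total number of tokens instead.
    """
    return 100 * counts[word] / counts["the"]

for word in ("I", "you"):
    ora = per_hundred_the(ORABLOGS, word)
    msdn = per_hundred_the(MSDN, word)
    print(f"{word:<4} OraBlogs {ora:5.1f}   MSDN {msdn:5.1f}")
```

On these counts, I comes out at roughly 38 per 100 "the" in OraBlogs against roughly 21 in MSDN, while YOU sits near 20 in both, so the gap in I does not look like a mere side effect of sample size.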
Next I’ve marked past tense BE (WAS, #18 in OraBlogs) and the modals CAN (#21 in MSDN) and WILL (#23 in MSDN). It is notable that the modals rank higher than average in MSDN but lower in OraBlogs, where neither makes the top 25 (their corpus-wide ranks are #33 and #32). Past-tense WAS shows the opposite pattern: it does not make MSDN’s top 25, and there is more past tense usage in Oracle’s blogs than in the corpus mean. Since the corpus includes personal blogs and other types which tend to have a knack for storytelling, that tendency is actually a relatively strong one. MSDN, by contrast, is more about future events and possibility than storytelling.
So far so good – let’s look at word classes.
OraBlogs (left), MSDN (right)


This chart probably needs a little explanation. Start with the leftmost column, where the first line starts with “CC”. This column holds the part-of-speech tag, which labels word classes such as noun, verb, adjective etc. (CC itself marks coordinating conjunctions). The second column has the absolute frequency of that part of speech, and the third gives its share of all words. So the adjective (JJ) count for OraBlogs is 10,428, which means that adjectives make up 5.3% of all words in Oracle’s blogs. The bar to the right of the percentage visualizes this share, which is why it’s so long for the NN type. NN stands for common noun (things like man, dog, or cable connector all belong to this category), which is usually a heavily represented class.
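If the column layout is easier to follow in code, here is a small sketch of how such a table can be derived from a sequence of tags. The tag sequence is a toy example, not data from either blog hub, and `pos_shares` is a name I made up for the illustration.

```python
from collections import Counter

def pos_shares(tags):
    """Map each POS tag to (absolute count, percentage of all tokens)."""
    counts = Counter(tags)
    total = sum(counts.values())
    # most_common() orders the result from most to least frequent tag
    return {tag: (n, 100 * n / total) for tag, n in counts.most_common()}

# Hypothetical toy tag sequence (20 tokens).
tags = ["NN"] * 10 + ["IN"] * 5 + ["JJ"] * 3 + ["CC"] * 2

for tag, (n, pct) in pos_shares(tags).items():
    bar = "#" * round(pct / 5)  # one '#' per 5 percentage points
    print(f"{tag:<4}{n:>4}{pct:>6.1f}%  {bar}")
```

Run backwards, the same arithmetic gives a rough size for the OraBlogs sample: 10,428 adjectives at 5.3% implies about 197,000 words in total, give or take the rounding of the percentage.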
So where are the differences? One notable thing is the higher IN-frequency in OraBlogs (9.1%) compared with MSDN (8.1%). The IN tag is used for both prepositions (e.g. behind, on) and subordinating conjunctions (e.g. whether, although), which makes it rather difficult to say what exactly is more frequent here. However, the higher IN-frequency in OraBlogs makes sense in the context of the greater average sentence length: longer sentences demand either coordination (measured with the CC tag) or subordination. The other interesting thing is the frequency of NN (common nouns) and NP (proper nouns), because that’s where Microsoft’s bloggers score very high, much higher than Oracle’s, who are actually below the corpus average. So what are all those nouns needed for? My assumption is they’re mostly for talking about inanimate subjects (stuff), because that would fit with the comparatively low pronoun (PP) count. The table is actually incomplete; the figures for verbs (which would appear further down the list, after TO) are missing, but there isn’t a lot of observable variation there, except for higher past-tense usage on the part of the Oracles.
Okay, enough to digest for one sitting. I’ll put the grand conclusion into the third part of this series. And yes, I’ll try to post that in less than a week from now.