Presentation at the DGfS last week

Last week I presented part of my work at the annual meeting of the Deutsche Gesellschaft fuer Sprachwissenschaft (German Linguistic Society). I’ve decided to make the presentation available on Slideshare, although it might be a little opaque without any accompanying narration and some of the terms used may not be familiar to non-linguists.

Let me know if you have any questions…

A great conference that I’ll probably miss (ICWSM, Boulder)

In case you’re interested in what researchers from a number of academic disciplines are doing with blogs (things such as social network analysis, how we use language in blogs etc): check out the International Conference on Weblogs and Social Media (ICWSM) that will take place in Boulder, Colorado, March 26 to 28.

It promises to be a very exciting conference, especially since it has both a strong academic line-up and a number of very interesting speakers from major media and tech companies (Microsoft, Google, Yahoo! and Nielsen, to name just a few). Scott Nowson is giving a presentation, danah boyd is an invited speaker and I’m especially curious about a presentation entitled Building Trust with Corporate Blogs by Paul Dwyer.

Frankly, I was in shock when I first heard about the event only about a week ago. Blog researchers from all over the world will meet in Colorado to discuss their data, methods and to-date findings, as well as the implications of the social media for companies, governments and society as such. This is not some promotional fluff-fest either: the vast majority of presentations are concerned with empirical research - comparing notes with others who, like me, are investigating blogs would be immensely valuable for my thesis project.

So why am I (probably) not going? Quite simply - lack of funds. I have been invited to present at four conferences this year: the DGfS meeting in Germany next week, the IPrA in Sweden, the WebGenres Colloquium in Britain and finally the SIGET in Brazil, and though I have some support from the university I am covering most of the costs myself. The schedule may look like I have enough on my plate already, but the input from a good conference is simply invaluable, especially when you’re dealing with something as largely unexplored as social media. ICWSM is unique in that it is specifically about blogs and thus it’s something I really can’t miss.

I must have buried my head too deep in the sand lately (the sand being my f-score stuff), because I somehow managed to overlook the whole event up to now. If I hadn’t, I certainly would have submitted something there.

Now, in the unlikely case that you happen to know a potential sponsor willing to support a highly motivated PhD student whose resources are sadly outpaced by his passion for research, please contact me. Of course I would gladly write a detailed report to outline how ICWSM relates to my research and what my preliminary results look like. A collaboration with a company involved in blogging would be interesting for me in other ways as well – for example, to compare internal with public-facing blogs and to assess the practical considerations of corporate bloggers.

Visualizing blog language data

I’ve been playing around with this great little tool for several days now and thought I’d share some of the results with you.

But first, here’s a brief recap of what I’ve been doing before I start throwing statistics at you.

I am in the process of building a textual database (or corpus, as linguists call it) of corporate and enterprise web logs. The purpose of this corpus is to investigate corporate blogs as a text type. In the current phase of my research, I am especially interested in the following questions:

- how do corporate blogs compare stylistically with non-corporate blogs, news texts and other types?

- is there a typical ‘corporate blogging style’ in terms of how people write?

- are there recognizable differences in style that correspond with differences in purpose or authorship (in other words, do CEOs, marketers, software developers, etc have distinct styles?)

- how much variation is there stylistically between different blogs, different bloggers in the same hub (e.g. MSDN) and between different posts by the same blogger?

- are there patterns of change in style over time?

You might wonder what such a description is good for (well, apart from furthering the pursuit of knowledge and all that). I think that, on the practical level, it will enable us to better understand what people are trying to achieve with blogs and how they do it. Ultimately blogging is about good writing. The trouble is that ‘good’ is neither easily defined nor the same to everyone on every occasion. Blogging styles are highly dynamic and situation-dependent and I think the most successful bloggers very consciously adopt different styles to address different people and issues.

Right, so what do I have so far?

One of the first measures I’ve implemented into my database is a relatively simple formula for calculating how formal/informational or (on the other end of the scale) involved/context-dependent a text is. This is done by adding the frequencies of certain types of words together and subtracting others, under the assumption that (for example) nouns are more numerous in texts which are primarily informational, while a high frequency of pronouns indicates involvement. The formula looks like this:

0.5 * ((NOUNS + ADJECTIVES + PREPOSITIONS + DETERMINERS) - (PRONOUNS + VERBS + ADVERBS + INTERJECTIONS) + 100)

(see Heylighen and Dewaele 2002)
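For the programmers among you, here is a minimal sketch of how the formula above could be computed. It assumes that each category is expressed as a percentage of all words in a text; how the counts behind the scores in this post were normalized isn't stated, so treat that scaling as an assumption. The function names and the sample counts are made up for illustration and are not the code behind the database.

```python
def pos_percentages(tag_counts):
    """Convert raw part-of-speech counts into percentages of all words."""
    total = sum(tag_counts.values())
    return {tag: 100.0 * n / total for tag, n in tag_counts.items()}


def f_score(freq):
    """Apply the formula above to a dict of part-of-speech percentages."""
    formal = (freq["nouns"] + freq["adjectives"]
              + freq["prepositions"] + freq["determiners"])
    involved = (freq["pronouns"] + freq["verbs"]
                + freq["adverbs"] + freq["interjections"])
    return 0.5 * (formal - involved + 100)


# Hypothetical counts for a single 100-word post (not real data).
counts = {
    "nouns": 30, "adjectives": 10, "prepositions": 12, "determiners": 10,
    "pronouns": 8, "verbs": 18, "adverbs": 6, "interjections": 1,
    "other": 5,
}

score = f_score(pos_percentages(counts))
print(score)
```

Words that fall into none of the eight categories (the "other" bucket here) still count towards the total, so they lower every percentage without entering the formula directly.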

As you can guess, the results are potentially ambiguous - in other words, texts can have a very high or low score for a variety of reasons - and should be used with care. That being said, the measure produces some pretty interesting results.

This is a chart of f-scores from Robert Scoble’s blog:




Each data point in the graph is the f-score for a single post, or the average for several posts made on a single day. As the graph shows, Scoble’s posts are fairly consistently in the 50s in August and September. They surge to over 100 in mid-October and make overall gains in November and December, though these gains aren’t really as significant as they might look at first. The more notable change is the high degree of variation in these months compared to the time span before that.
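The one-point-per-day rule described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and apart from the score of 102 on 16 October 2006 the sample values are invented.

```python
from collections import defaultdict
from statistics import mean


def daily_points(posts):
    """Collapse (date, f_score) pairs into one averaged data point per day."""
    by_day = defaultdict(list)
    for day, score in posts:
        by_day[day].append(score)
    # A day with a single post keeps that post's score unchanged.
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Illustrative input: one post on the 16th, two posts on the 17th.
posts = [
    ("2006-10-16", 102.0),
    ("2006-10-17", 55.0),
    ("2006-10-17", 61.0),
]

points = daily_points(posts)
print(points)
```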

You might wonder which posts exactly get a high or low f-score. Here are the entries with the highest score, by date.

Comparing new TailRank/DiggTech/TechMeme to Google Reader, 16 October 2006 (f-score 102)

Grapes on a Plane, 29 October 2006 (f-score 97)

The highs and lows of CES, 15 January 2007 (f-score 93)

Photo “training”, 21 January 2007 (f-score 106)

If you have a look at those posts, you’ll probably notice that they aren’t really in any way more formal than Scoble’s other writing. The difference is that they tend to be more informational, i.e. they have more, and more condensed, information crammed into them than most entries. Lists and enumerations will immediately lead to a high score (because they usually translate into a high noun count), and for Scoble those entries which are written in a sort of telegraph style to convey information about a photowalk or CES thus have a high score. This doesn’t really discredit the f-score as a metric - it simply means that it’s context-sensitive. What’s important is that, with an overall mean score of 60, Scobleizer ranks on the extreme low end of the formal/informational vs involved/contextual scale. To Scoble, blogs really are conversations, not just metaphorically but in a quite literal stylistic way.

That’s the score for one source over time. Let’s compare a bunch of sources.




If you have trouble seeing anything on the chart, look for a little dropdown menu on the lower right hand side labeled dot size. Change it from ‘posts’ to ‘no selection’ and all the dots will be changed to have the same size, which should make the whole thing a lot easier to read.

The chart is a representation of scores for 137 different blogs, computed from data collected during the last five months. Each dot represents a single blog and its average f-score on the x axis. The position of a dot on the y axis indicates the standard deviation of values inside of that blog, i.e. the degree of internal variation.
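The two coordinates of each dot can be computed like this. It is a sketch under assumptions: I use the sample standard deviation here, but the chart may just as well be based on the population version, and the blog names and scores are invented.

```python
from statistics import mean, stdev


def blog_position(scores):
    """Return (x, y) for one blog: mean f-score and standard deviation."""
    return mean(scores), stdev(scores)


# Hypothetical per-post f-scores for two blogs with the same average.
blogs = {
    "steady": [58, 60, 62],
    "varied": [50, 60, 70],
}

for name, scores in blogs.items():
    x, y = blog_position(scores)
    print(name, x, y)
```

Both blogs land at the same spot on the x axis, but the second one sits higher on the y axis because its posts differ more from one another.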

The vast majority of the sources I’ve used are corporate blogs - after all that’s what my research is about. But in addition I’ve also thrown in a few non-corporate sources, simply to be able to compare one type of blog with another one. Thus the list contains 17 personal blogs randomly found via blogger.com, 1 a-list professional blogger (Scoble), 1 political blog hub (huffingtonpost.com) and 3 non-blog sources, namely editorials from the New York Times, the Washington Post and the LA Times collected in the course of this week (see below for a full list of sources).

The first things likely to catch your eye are the outliers. On the far right hand side, there is one source simply tagged “Blog” (informative, I know) with a record f-score of 195 and a standard deviation of 92. That’s Ray Ozzie, Chief Software Architect of Microsoft. Now, if you have a look at his blog you might find that the best description for his writing is not so much formal, but rather “technical” or maybe “information-oriented”. The reasons for the high scores are the many compound nouns (things like development ecosystem, application components, clipboard data formats, etc) coupled with the overall significant length of entries. Like the other outlier, Irving Wladawsky-Berger of IBM, Ozzie produces very long posts. Ozzie’s longest has 1,700 words, while Wladawsky-Berger is a close second with 1,500. Length tends to coincide with somewhat higher f-scores and brevity with lower ones, but there are counter-examples: Heather Hamilton has one post with a whopping word count of over 2,000 and an f-score of only 105.

Overall it is important to consider a few things, especially with regard to those sources with a high standard deviation and a high f-score:

- the deviation is often high simply because there aren’t many posts (for example, Ozzie only has 6 entries)

- several of the high-deviation blogs are hubs, i.e. they aggregate a number of individual blogs (e.g. MSDN and HuffPo)

But the cool part is that the remaining sources usually contain very conscious stylistic variation (Jonathan Schwartz is a prime example). In other words, their authors write differently to address different people and achieve different things, and this is - at least to some extent - stylistically visible. Compare that with the scores for the three newspaper editorials grouped together in the lower right area of the plot. They are surprisingly consistent if you consider that we’re looking at texts published in three different papers, written by an even larger number of journalists. Which just shows that the editorial is a pretty solidified type of text in terms of style, while the (corporate) blog isn’t - at least not yet.

Anyway, I’ll wrap it up for now and save the more in-depth look for another post.

Sources

iUpload InSights
http://hopper.iupload.com/default.asp

Time Leadership
http://www.jimestill.com/

I Love Me, vol. I
http://www.michaelocc.com/

Simply Albert
http://simplyalbert.blogspot.com/

ChristianLindholm.com
http://www.christianlindholm.com/christianlindholm/

PR Thoughts
http://www.prthoughts.com/

Occam’s Razor
http://mgoldberg.typepad.com/occams_razor/

Loic Le Meur Blog
http://www.loiclemeur.com/

CTO Blog
http://www.capgemini.com/ctoblog/

Lakattack
http://spreadlog.net/

Marcel Reichart Blog
http://marcellomedia.blogs.com/mrb/

stefan
http://stefan.21publish.com/

Amazon Web Services Blog
http://aws.typepad.com/

Cisco High Tech Policy Blog
http://blogs.cisco.com/gov/

Digital Straight Talk
http://www.digitalstraighttalk.com/

Direct2Dell, Dell’s Weblog
http://www.direct2dell.com/default.aspx

eBay Developers Program
http://ebaydeveloper.typepad.com/

EDS’ Next Big Thing Blog
http://www.eds.com/sites/cs/blogs/eds_next_big_thing_blog/default.aspx

From Edison’s Desk - GE Global Research Blog
http://www.grcblog.com/

Real Baking with Rose Levy Beranbaum
http://www.realbakingwithrose.com/

GM Fastlane Blog
http://fastlane.gmblogs.com/

Google Blog
http://googleblog.blogspot.com/

Dan Socci’s Blog
http://h20325.www2.hp.com/blogs/socci

Kara R
http://www.honeywellblogs.com/kara_r/

ING Asia/Pacific’s Blog
http://mycupofcha.ingblogs.com/

TinyScreenfuls.com
http://www.tinyscreenfuls.com/

Open for Discussion
http://csr.blogs.mcdonalds.com/default.asp

One Louder
http://blogs.msdn.com/heatherleigh/

NIKEBASKETBALL
http://blog.nikebasketball.com/

OraBlogs
http://www.orablogs.com/orablogs/

Things That Make You Go Wireless
http://businessblog.sprint.com/1/1/

The Lobby from SPG
http://www.thelobby.com/

Jonathan Schwartz’s Weblog
http://blogs.sun.com/jonathan

Texas Instruments Video360 Blog
http://blogs.ti.com/

The Jason Calacanis Weblog
http://www.calacanis.com/

Boeing Blog: Randy’s Journal
http://www.boeing.com/randy/

Guided By History
http://blog.wellsfargo.com/guidedbyhistory/

PlayOn
http://blogs.parc.com/playon/

Yahoo! Search Blog
http://www.ysearchblog.com/

The CEO’s Blog - John Mackey
http://www.wholefoodsmarket.com/blogs/jm/

Blog
http://www.nixonmcinnes.co.uk/about-us/blog/

Kate’s Blog
http://katesblog.u3.com/

The Bocada Blog
http://bocada.typepad.com/bocadablog/

Michael M’s X10 Blog
http://www.x10community.com/michaelm/

Notes from MNR
http://blogs.adobe.com/notesfrommnr/

Entrepreneurial Marketing
http://blogs.accenture.nl/EntrepreneurialMarketing/

TiVo Blog
http://blog.tivo.com/tivo_blog/

Guinness Blog
http://www.guinnessblog.co.uk/blogs/home.aspx?App=guinnessblog&allowAccess=4r7a6h

Hu Yoshida’s Blog
http://blogs.hds.com/hu/

Forta Blog
http://www.forta.com/blog/

Novell Open PR
http://www.novell.com/prblogs/

Jeff Jaffe’s Blog
http://www.novell.com/ctoblog/

Blog
http://rayozzie.spaces.live.com/blog/

Mena’s Corner
http://www.sixapart.com/about/corner/

Alan Meckler
http://weblogs.jupitermedia.com/meckler/

Infrablog
http://blogs.verisign.com/infrablog/

Thomson Holidays Blog
http://thomsonholidays.blogs.com/my_weblog/

Baby Babble
http://stonyfield.typepad.com/babybabble/

The Bovine Bugle
http://stonyfield.typepad.com/bovine/

Stone Creek Coffee Blog
http://sccv3.stonecreekcoffee.com/blog.cfm

bugBlog
http://rescuebugblog.typepad.com/rescue_bugblog/

Speaking of Security
http://www.rsasecurity.com/blog/

Hybrid Talk
http://hybridtalk.nyse.com/

Jonathan Bruce’s WebLog
http://jonathanbruceconnects.com/jonathan_bruce/

The Tinbasher Sheet Metal Blog
http://www.butlersheetmetal.com/tinbasherblog/

The NCC Weblog
http://www.northfieldconstruction.net/

Signs Never Sleep
http://signsneversleep.typepad.com/

ACCAbuzz
http://www.accabuzz.com/

English Cut
http://www.englishcut.com/

Life at Wal-Mart
http://walmartfacts.com/lifeatwalmart/

Scobleizer
http://scobleizer.wordpress.com/

The DustBlog
http://thedustblog.blogspot.com/

The Baby Blawg
http://babyblawg.blogspot.com/

life’s short…make it sweet…
http://dunlin.blogspot.com/

xbsg
http://mi50.blogspot.com/

I am the evil master genius
http://arnique.blogspot.com/

i want you
http://nuratikahnabilah.blogspot.com/

44 Words for 365 People
http://44for365.blogspot.com/

neurotic kitten
http://nkitten.blogspot.com/index.html

Discover Norwegian Music
http://discovernorwegianmusic.blogspot.com/

my smiles arent a facade
http://badass-freak.blogspot.com/

[title unreadable]
http://chibinyu.blog.com/

Flying Tragic
http://tragicflyer.blog.com/

The Irony of Life
http://mujerlatina319.blog.com/

cudgeland
http://cudge.blogspot.com/

Over the Horizon
http://blogs.zdnet.com/OverTheHorizon/

DaveBlog
http://blogs.netapp.com/dave/

Earthling
http://blogs.earthlink.net/

developerWorks blogs
http://www-03.ibm.com/developerworks/blogs/

Irving Wladawsky-Berger
http://irvingwb.typepad.com/

Forum Nokia Blogs
http://blogs.forum.nokia.com/author_group.html?id=2

Nokia N90 Blog
http://n90.bloggercomm.com/

Sparkle Like The Stars
http://www.sparklelikethestars.com/

FYI Blog
http://fyi.gmblogs.com/

Southwest Airlines Blog
http://www.blogsouthwest.com/

Benra Blog: ZoomAlbum, Photos & Photo Sharing
http://benra.typepad.com/

WeatherBug Corporate Blog
http://blog.weatherbug.com/

CTO Blog - TalkBMC
http://talk.bmc.com/blogs/blog-bishop/cto/

Commentary from Cape Clear’s CEO […]
http://www.capeclear.com/annrai/

QuickBooks Online Edition The Team Blog
http://quickbooks_online_blog.typepad.com/

The QuickBooks Team Blog
http://www.quickbooks.blogs.com/

The Mindjet Blog
http://blog.mindjet.com/

Warehousing and Distribution
http://thirdpartylogistics.blogspot.com/

The Official Salesforce Blog
http://blogs.salesforce.com/

Park City Mountain Resort
http://parkcity.typepad.com/park_city_mountain_resort/

SunbeltBLOG
http://sunbeltblog.blogspot.com/

TaylorMade Blogs
http://www.taylormadeblogs.com/

Scenic Nursery Gardening Blog
http://www.scenicnursery.com/

Lightning Labels Blog
http://lightninglabels.typepad.com/blog/

Wiggly Wigglers
http://wigglywigglers.blogspot.com/

EIE FLUD
http://www.eieflud.co.uk/blog/

Eriska, Scottish Islan
http://www.isleoferiska.com/

Outdoor Landscape Lighting
http://www.residential-landscape-lighting-design.com/blogger.html

Thoughts of Beauty
http://www.overallbeauty.com/beauty-blog/

Stormhoek Winery
http://www.stormhoek.com/

Chevron Collectible Toy Cars
http://chevroncarsblog.com/

MSDN Blogs
http://blogs.msdn.com/

Ruby is Coming
http://rubyiscoming.blogspot.com/

am I lonely
http://rongsheng.blogspot.com/

Pineywoods Opinings
http://longleaf.blogspot.com/

Tangent, Oregon
http://tangentcity.blogspot.com/

Verizon - PoliBlog
http://poliblog.verizon.com/PoliBlog/Blogs/poliblog.aspx

Ted’s Take
http://ted.aol.com/

The Student LoanDown
http://blog.wellsfargo.com/StudentLoanDown/

Emerson Process Experts
http://www.emersonprocessxperts.com/

A Thousand Words
http://1000words.kodak.com/

Glenfiddich Blog
http://blog.glenfiddich.com/

IT@Intel Blog
http://blogs.intel.com/it/

All My Eye
http://allmyeye.blogspot.com/

HuffPo Full Blog Feed
http://www.huffingtonpost.com/theblog/

News@Cisco Notes
http://blogs.cisco.com/news/

Mobile Visions
http://blogs.cisco.com/wireless/

Open standards, open source, open minds, open opportunities
http://www-03.ibm.com/developerworks/blogs/page/BobSutor

Marriott on the Move
http://www.blogs.marriott.com/

NYT Editorials
http://topics.nytimes.com/top/opinion/editorialsandoped/editorials/

Washington Post Editorials
http://www.washingtonpost.com/wp-dyn/content/opinions/columnsandblogs/?nav%3Dleft&sub=new

LA Times Editorials
http://www.latimes.com/news/opinion/editorials/

Why you’re interesting but your company just isn’t

Alright, alright - I know I’ve been a bad, bad blogger these past few weeks, as the abhorrent lack of posts aptly demonstrates. Other activities (more precisely reading, reading and reading) have kept me busy. But I’ll make up for the neglect now and I promise not to let things slide again, even if that means having to tear myself away from fascinating stuff such as this.

Northeastern University and Backbone Media recently conducted a study on corporate blogging where they asked 21 company bloggers for their experiences and opinions. I think the study is interesting not only because of the responses it cites, but because the responses say something about the bloggers who were interviewed and their take on how corporate blogging works. I’ve argued before that corporate blogs - as much as, perhaps even more than, private blogs - serve a social function, that is, that they seek to establish a relationship between the blogger (acting as a representative of the company) and his readers. Obviously a blog benefits from being informative, but before it can inform it has to achieve the status of a trusted source. However, the only way it can become a trusted source is by making the blogger a familiar, tangible person to his readers - someone with a personality, humor, interests, quirks, etc. The trouble with brochures, CEO interviews, mission statements and normal company websites isn’t that they aren’t informative, it’s that they lack what the cluetrainers have dubbed “a human voice”*.

Let’s look at some of the findings:

After careful review, the research team identified five factors for success. The majority of the twenty participant bloggers pointed to these factors as important to the success of their blog. We focus in on these factors in Section Three.

The five factors identified by the participants were:

1. Culture
2. Transparency
3. Time
4. Dialogue
5. Entertaining Writing Style and Personalization

A company should carefully consider all of these factors before making a decision to blog:

Culture: If a company has particular cultural traits worth revealing, or conversely, a bad reputation they want to repudiate, blogging could be an attractive option. A great example of the latter is Microsoft. Microsoft had a distinct problem—distrust on the part of many customers. The company was seen as being very big and unresponsive to customers. Microsoft used blogs to reveal that individual employees do care about customers, and they are willing to provide a lot of value by way of product and developer information. Blogging at Microsoft has worked well because Microsoft and Microsoft bloggers were able to show the public what Microsoft’s culture was really like behind the big company image.

I find the internal factors culture and cultural traits somewhat strange in this context, especially since they appear entangled with external factors, specifically reputation and image. Somehow culture seems to be understood as an amalgam of positive communicative, interactional and organizational traits. Revealing these traits to the public via blogging is presented as a strategy to counter a negative reputation or image, to show what the company is “really like” - but only if it is “worth revealing”, i.e. if it is likely to be perceived as positive. But doesn’t the second factor, transparency, imply that what your organization is like should be, well, transparent, even if not all of what is revealed is positive? In other words: if blogging is a strategy for replacing the big company image with another - friendlier - image, isn’t that different from showing people what things are “really like”?

A bit further down, the authors provide a list of personal characteristics which they consider important for blogging:

In preparing to blog, it is important to pick the right people to begin blogging for your company. Several of the corporate bloggers gave their insights into what characteristics to look for in a good corporate blogger. These characteristics include:

* The ability to listen to your audience
* Passion for the topics
* The ability to communicate a personality online
* Perseverance and commitment
* Expertise in a field or variety of topics
* A warm and friendly approach
* Good writing ability
* The necessary amount of time for blogging
* Openness to criticism

A company can use these insights as a yardstick when identifying the right corporate blogger.

I’ve highlighted those qualities that I think of as associated with social or communicative competence. Several of these qualities are closely related to (good) writing ability, in the sense that communicating a personality via a blog is achieved through text. I can be warm and friendly in person, but if I decide to write like a lawyer people may infer that I’m distant, aloof, etc, because their impression of me is created by subconsciously analyzing my language and inferring from that what kind of person I am. The ability to listen and openness to criticism alone aren’t enough - I have to actively prove that I “got the message” by saying so, otherwise nobody will know about it.

To summarize, the assumptions supported by the study are that:

a) bloggers can use their personal credibility to make their company appear more credible

b) bloggers can interact with their readers directly and demonstrate communicative competence, which is also assumed to benefit their employer

c) bloggers achieve this by being competent communicative actors, i.e.:

- by what they say

- by how they say it

- by if and how they interact with others rhetorically

d) general social competence is decisive to the success of a blogger, especially whether or not this competence is visible on the screen

Right now you might be thinking that none of this is really new. But I have the nagging feeling that there is a hidden caveat blogger in there. How can we be sure that the relationship between the blogger and his readers actually has any positive effect on how the company is perceived?

As the study describes:

Many bloggers described how it was often personal posts unrelated to the main topic of the blog that generated a lot of comments and traffic. A post that is about unrelated subject matter demonstrates the connections between an audience member and a blogger, and so builds a closer connection between blogger and their readership, precisely because the post is less about business and more about living life.

Wonderful, but isn’t it problematic that readers tend to find those topics most interesting which are least connected to the company? Doesn’t that imply that blogs have a tendency to be personal and that the relationship between the reader and the blogger has little effect on the relationship between customers and the company? I’m not sure, but I’d be careful not to dismiss these questions.

* I’ll ramble about Cluetrain another time. Let’s just say there’s a lot of interesting stuff there that needs to be scrutinized.

Screenshots of the Corporate Blogging Corpus

I feel guilty for not blogging enough lately, but I’ve just been too darn busy. Or maybe I should say I’ve felt too darn busy. If FT500 executives can find the time to blog, a leisure-spoiled PhD student with a laughable 30-hour workweek (that’s just the day job though, research comes on top of that) should really not complain.

Let’s just say that I have been distracted. And because I’m a nerd I feel the need to share the origin of my distraction with my readers.

Here are a few screenshots of what has been keeping me busy over the last few weeks:

Corporati Main Page

Global word frequency list

List of blogs

Blog details for GM Fastlane

Word frequency list for Jonathan Schwartz' blog

Example of a pos-tagged post

In case you are wondering what on earth Corporati is exactly: it is a linguistic database (or corpus) that I’ve developed for the empirical part of my thesis project. It automatically indexes posts from a number of corporate blogs (about 120 at the moment) and performs statistical language analysis. Before, it was just able to count words and sentences and build a list of the most common words in the collection. Since last weekend, however, it can also automatically get grammatical information about the words in a text - whether something is a noun or adjective, whether it is singular or plural etc. I didn’t code that part myself but used this great tool. Automating the task (called part-of-speech tagging) is not just for lazy people. I have close to 9,000 posts in that database now… and I do hope to finish that PhD while I’m still young. Before statistical taggers were common, people (=brave/crazy linguists) did all tagging by hand. Ouch.
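For the curious, the basic counting step (words, sentences and a frequency list) can be sketched in a few lines. This is not Corporati's actual code: the function name is hypothetical, and the regular expressions are a deliberately naive stand-in for proper tokenization. Part-of-speech tagging is left out, since that needs a trained tagger.

```python
import re
from collections import Counter


def basic_stats(text):
    """Return word count, sentence count and a word frequency list."""
    # Naive tokenization: runs of letters and straight apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Naive sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(words), len(sentences), Counter(words)


# Made-up sample post.
post = "Blogs are conversations. Blogs are not brochures!"

n_words, n_sentences, freq = basic_stats(post)
print(n_words, n_sentences)
print(freq.most_common(3))
```

Summing the per-post counters over a whole collection would give a global frequency list like the one in the screenshot.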

Next time we return to our regularly scheduled program.

Note: I’m aware that it probably doesn’t look impressive at all to the non-linguist (not sure if most linguists would find it impressive either, but perhaps at least somewhat interesting). I plan to make it look prettier in the future, but since it’s mostly a research tool I doubt normal (non-nerd) people will want to use it anyway. ;-)

Corporate Blogging ROI: Hard Return vs. Soft Return

I’ve been both busy and a little bit blog-lazy this last week, so my apologies for the long silence. I hope to post more later this week as I should finally have enough time to catch up on feed-reading.

Last month Charlene Li made a very interesting proposal for a framework to measure the ROI of corporate blogging. Being neither a marketer/economist nor a corporate blogger, I can hardly add any suggestions for a precise system of measurement. However, I think that there are some basic ways of describing the different kinds of return blogging can net a business. I’ll start by looking at the aims associated with company blogs.

Functional areas of corporate blogs

First, let’s examine six different basic functions for which corporate blogs are utilized. I’ve included the main target groups for each function in brackets.

a) PR/Image (targeting: public, customers)

b) Marketing (targeting: customers, potential customers)

c) Customer Relations (targeting: customers)

d) HR/Recruiting (targeting: potential employees)

e) Intra-company/intra-industry Communication (targeting: staff, industry experts)

f) Strategy (targeting: shareholders, journalists, staff)

Note that I don’t call these different functions “categories” or “types”. The problem with that would be that no corporate blog fits neatly into a single category. Instead, virtually all of them are hybrids, serving a combination of purposes. GM’s Fastlane Blog has posts concerned with a diverse set of issues, such as customer relations, marketing/market research, corporate strategy and the company’s image. While Fastlane deals with a range of topics and has a number of different authors, Sun’s CEO blog has just one author - Jonathan Schwartz - but the array of functions is equally large. And while developer blog hubs, such as those maintained or supported by Microsoft, Oracle or SAP have hundreds of authors, they serve comparably few functions (intra-industry/intra-company communication and potentially customer relations).

I have grouped the functions outlined above according to what could be called their broader readership orientation, i.e. whether they are concerned with customers and the general public (functions a, b and c), with the company itself (d and e) or with interest groups vital to the company, such as shareholders and journalists (f).

Here’s an illustration of the function groups, their focus and orientation:

Functions and orientations of corporate blogs

Stress and hard vs. soft return

Assuming that corporate blogs generally have the objective of somehow generating a return on investment for the companies which maintain them, it makes sense to look at the kinds of return and how they are related to different functions.
I’m going to make two basic distinctions:

hard return = more readers, more sales, greater visibility (usually quantitative)

soft return = trust, positive image, human face (usually qualitative)

Any blog can potentially yield both types of return, but the distribution between the two depends on which aspect is prioritized by the authors. Take Robert Scoble when he was still blogging for Microsoft. Scoble’s role was not really to make Microsoft more visible, nor would anyone suggest an immediate connection between his blog and the company’s sales. What Scoble changed - at least for some of his readers - was how people perceived Microsoft, rather than how many people did. Large companies are mostly interested in influencing how they are perceived by the public or vital interest groups, thus their blogs are more focused on soft return factors (see McDonald’s, Cisco). Since soft return normally has a qualitative effect*, it is more difficult to measure than hard return.

Most copywriters, marketers and smaller businesses will have hard return in mind when blogging - increasing visibility and sales via an informative or entertaining blog is the most common goal. Blog-SEO is another significant hard return factor. Because hard-return bloggers aim for growth in readership, they write to deliver, that is, they seek to provide some kind of added value to the reader, whether this value is information, entertainment, instruction etc.

By contrast, soft return-oriented blogs aim to engage. They tend to focus on the discussion around an issue more than on the issue itself. The dynamic social relationship between the author and his readers is the decisive element. Soft return as a concept works somewhat analogously to what Doc Searls calls the because effect, in the sense that it highlights the interaction over the item, just as the because effect highlights the business opportunities created because of the propagation of a technology over the technology itself.

The basic decision to prioritize delivering over engaging or vice versa determines the stress of the blog. A blogger may choose to emphasize content (deliver), or the interaction with his readers (engage):
Types of return

Measuring different types of return

If we assume that different function-focus combinations favor one type of return over the other, and therefore stress either delivery or engagement, it follows that different measurements are needed to determine the return a blog produces. The frequency and length of comments, for example, seem an appropriate measurement for how engaging a blog is, but arguably a blog can be informative without provoking a lot of feedback. Blogs written for functions such as recruitment or intra-industry/intra-company communication usually aim to both deliver and engage, but focus on very specific groups and have specific objectives in mind (recruiting qualified staff and solving technical problems).
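As a rough illustration of what an engagement measurement could look like, here is a minimal Python sketch that computes comments per post and average comment length in words. The data layout (one list of comment texts per post) and the function name engagement_stats are assumptions made up for this example, not part of any existing tool.

```python
def engagement_stats(posts):
    """posts: one list of comment texts per post (hypothetical layout)."""
    total_comments = sum(len(comments) for comments in posts)
    total_words = sum(len(c.split()) for comments in posts for c in comments)
    # Guard against empty input so the sketch never divides by zero
    comments_per_post = total_comments / len(posts) if posts else 0.0
    words_per_comment = total_words / total_comments if total_comments else 0.0
    return comments_per_post, words_per_comment


# Invented sample data: three posts, one of them without any comments
sample_posts = [
    ["Great post!", "I disagree with the second point."],
    [],
    ["Thanks for sharing this."],
]
cpp, wpc = engagement_stats(sample_posts)
print(cpp, wpc)
```

Numbers like these only say something about engagement; a blog that delivers well may score low on both.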

Simply put, measurement of your blog’s ROI should be tailored to your type of blog. Taking aspects such as orientation, focus, function and stress into account makes that task much more workable than coming up with a one-size-fits-all solution.

* I would argue that frequency of comments can be counted as a quantitative effect of soft return. Most other metrics tend to apply to hard return, however.

Dissecting Robert Scoble (2)

As promised earlier, today I’m going to look at how Robert Scoble’s blog differs from other corporate blogs, and from blogs in general (apologies for the delay, this should have been up two days ago).

The earlier entry focused on a number of language-related statistics: word length, sentence length, words per post etc. In this second step, I want to look at the distribution of individual words in the three different collections analyzed and draw some (lofty) conclusions based on the results.

Here are the top ten most frequent words for Scobleizer, the corporate blogs collection and the random blog comparison group:

Scobleizer
Rank Word Frequency
1 THE 625
2 TO 442
3 A 431
4 I 414
5 AND 332
6 OF 313
7 IS 255
8 THAT 243
9 ON 221
10 IN 175

Corporate Blogs
Rank Word Frequency
1 THE 35432
2 TO 19714
3 AND 17692
4 A 16457
5 OF 16154
6 IN 11110
7 IS 8475
8 THAT 7819
9 I 7342
10 FOR 7220

Random Blogs Comparison Group
Rank Word Frequency
1 THE 4374
2 TO 2985
3 AND 2975
4 I 2951
5 OF 2097
6 A 2025
7 IN 1335
8 YOU 1146
9 THAT 1120
10 MY 1065

At first glance, you’re likely to think that the three lists look very much alike. This is not unusual in any way - in virtually any given English text “THE” will rank at number 1, whether you are looking at the Bible or Personal Finance for Dummies. The same is the case with common function words such as prepositions, which form the basic building blocks of pretty much any text you can come across.
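Frequency lists like the three above can be produced with a few lines of Python. The sketch below is only an illustration, not the indexing tool used for the corpus; the tokenizer (runs of letters and apostrophes, uppercased) and the function name top_words are assumptions.

```python
import re
from collections import Counter


def top_words(text, n=10):
    # Uppercase first so that "The" and "the" are counted as the same word
    tokens = re.findall(r"[A-Z']+", text.upper())
    return Counter(tokens).most_common(n)


sample = "The cat ate the mouse. The mouse was not amused."
print("Rank Word Frequency")
for rank, (word, freq) in enumerate(top_words(sample, 3), start=1):
    print(rank, word, freq)
```

Even in this toy sample “THE” comes out on top, which is the pattern described above.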

An interesting variation that I want to focus on for the moment is the distribution of the personal pronoun “I” and the possessive determiner “MY”. Both for Scobleizer and the Random Blog Comparison Group “I” ranks at number 4, well ahead of any other pronouns (for example “WE”). In the Corporate Blogs Collection “I” is at rank number 9, making it significantly less frequent. Further down the list, “MY” ranks at 14 in Scobleizer and at 28 in the Corporate Blogs Collection. Consequently, “WE” ranks higher in Corp. Blogs than it does in the other two collections.
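Ranks hide how large the gap really is, so it can help to normalize the raw counts by collection size. The sketch below divides the “I” counts from the lists above by the token totals reported in the first Scoble entry. It assumes that those totals still describe the collections the frequency lists were drawn from, which may not be exactly true since the corpus keeps growing.

```python
# Counts of "I" taken from the three top-ten lists
counts_i = {"Scobleizer": 414, "Corporate Blogs": 7342, "Random Blogs": 2951}
# Token totals from the earlier entry (assumed to still apply)
tokens = {"Scobleizer": 17014, "Corporate Blogs": 667969, "Random Blogs": 105253}


def per_thousand(count, total):
    # Occurrences per 1,000 tokens
    return 1000 * count / total


rates = {name: round(per_thousand(counts_i[name], tokens[name]), 1)
         for name in counts_i}
print(rates)
```

Under that assumption, “I” occurs roughly 24 times per thousand words in Scobleizer and 28 times in the random blogs, but only about 11 times in the corporate blogs.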

Big surprise there, you might think. Obviously Scoble speaks only for himself, thus he is unlikely to use “WE” as frequently as it is used in blogs on corporate responsibility or policy, most of which are authored by a team of people. Even in those cases where there is just one author, he or she often prefers the corporate “WE”, especially when the person in question is an executive. And of course there’s the possibility of largely writing without a personal agent. What is intriguing to me, however, is just how close Scoble is to the Random Blogs Group with regard to “I”-use. The Random Blogs Group largely consists of blogs written by teenagers, housewives, activists and other private individuals. As with their writing, the question of personal involvement is always relevant in Scoble’s blogging - it all relates to him as an individual in some way. I find it likely that this level of involvement in turn engages his readers more strongly than a less personal (that is, less “self-centric”) approach would. Telling others about yourself serves a social function; it allows them to empathize with you, to better understand your motives. “Talking about yourself” does not necessarily always mean relating thoughts or emotions, though. Scoble very often describes where he is and what he is doing because this gives his readers a better understanding of who he is, which allows them to better judge whether they value his opinion on whatever gadget, trend or company he then proceeds to discuss. He makes a conscious effort to overcome the decisive asymmetry in the relationship with his readers: the fact that they aren’t in the same place at the same time as he is. When you’re having a chat with your friend, all or some of the following apply:

- you are physically in the same place, at the same time

- you can hear the other person’s voice

- you can see the other person

- the other person is actively addressing you

- you can immediately respond to what he or she is saying

In a real-life, face-to-face conversation all of these points usually apply. In a technically mediated interaction, whether it’s texting on AIM or talking on the phone, normally some (but not all) criteria are applicable. The more of them apply, the more closely the interaction resembles a “real” conversation, simply because a real-life conversation has all of these characteristics. Notice how blogs are different. Only the last point works - you can respond to a blog, but not quite immediately. A blog author is very unlikely to exclusively address just one other person; the readership is usually plural and largely unknown.

So what does Scoble do to overcome these limitations? He tells you where he is and what he’s doing to make the kind of communication between him and his readers seem more like a conversation. Of course you could argue that it really is a conversation since you can respond to him - and you’d be right - but he aims to overcome the other impairments as well. The innovation here is that Scoble doesn’t pretend to address his readers directly (unless he really is responding to another blogger) since he doesn’t really know who they are individually. Instead he focuses on his part of the equation by making sure that you know where he’s coming from and where he’s going with something – both physically and metaphorically speaking.

While the figures cited above are pretty vague indicators which should not be over-interpreted, I think they support the basic idea that blogs can function as time-delayed conversations and are naturally used in that manner by individuals. When organizations blog they are confronted with their inherent inability to have conversations in the same way that individuals do. The options are thus to either let individuals speak for the company – which is risky for a plethora of reasons – or to (mis)use blogs as a broadcast medium. I’m not even saying that the latter can’t work, just that people are likely to be very critical of such a usage, because they expect blogs to work differently.

One thing to always keep in mind is that you’re not real to your readers unless you have a face, name, identity and physical location. We like to think that we can relate to abstractions just as easily as we relate to concrete things, but our instincts often say otherwise.

Dissecting Robert Scoble

Disclaimer: No bloggers were harmed in the course of this experiment.

As I’ve hinted at in the past, I’m in the process of building a textual database that contains thousands of posts culled from the RSS feeds of about a hundred corporate blogs, plus a comparison group of several “miscellaneous” blogs randomly picked through blogger.com and blog.com. The corpus currently has a little under 800,000 words and is expected to reach a round million words (or tokens) in about two to three weeks’ time.

So far, I’m just calculating a few very basic statistics: post word count, post sentence count and average word/sentence/post length, along with a top 100 list of the most frequent words. Though these are very basic figures, they nevertheless give a few interesting clues about the sources in question, especially when you compare one collection of blogs with another.

My test subject today will be Robert Scoble’s blog, Scobleizer. I’ll compare it to a) a large collection of other company blogs and b) a collection of randomly chosen non-corporate blogs. My reasons for picking Robert are pretty unspectacular. I happened to add him to the database fairly early on so that now I have a reasonable amount of data. Also, his immense popularity should make for some interesting results… note that I say “interesting” and not conclusive – a few language statistics don’t equate to the recipe for the Scoble Special Sauce of Blogging Fame. Anyway, let’s crunch a few numbers.

Scobleizer

Posts: 327

First Post to Last Post (FPLP): 2 August 2006, 03:26 - 30 September 2006, 22:07

Tokens / Types (Ratio): 17014 / 3743 (4.55)

Sentences (SC): 1950

Average Word Length (AWL): 4.9

Average Sentence Length (ASL): 10.1

Average Words per Post (AWpP): 52.9*

* not relevant because Scoble’s RSS doesn’t include complete posts but only summaries (the first 56 words)

Corporate Blogs

(Blogs: 107)

Posts: 4443

First Post to Last Post (FPLP): 2 May 2005, 00:00 - 2 October 2006, 00:50

Tokens / Types (Ratio): 667969 / 62230 (10.73)

Sentences (SC): 44350

Average Word Length (AWL): 5.5

Average Sentence Length (ASL): 15.9

Average Words per Post (AWpP): 155.1

Random Blogs Comparison Group

(Blogs: 18)

Posts: 576

First Post to Last Post (FPLP): 17 November 2004, 03:17 - 2 October 2006, 00:48

Tokens / Types (Ratio): 105253 / 16979 (6.2)

Sentences (SC): 10335

Average Word Length (AWL): 5.1

Average Sentence Length (ASL): 10.8

Average Words per Post (AWpP): 184.5

The stats

The first thing to note is that the three collections differ significantly in terms of size. The Scobleizer collection only has a size of 17,014 tokens (words), while both the corporate blog collection (667,969 tokens) and the random blogs comparison group (105,253 tokens) are much larger. This has strong implications for the accuracy of the figures, as figures drawn from a larger sample are more reliable. The posts indexed in my database are not the total of posts made in those blogs, but only those which have been recorded since I began indexing a few months ago. Some entries date back several years, which is simply due to the fact that some of the RSS feeds which were used go back that far.

You might be wondering what on earth types are. Don’t worry, it’s really simple: while tokens are all words in a text, types are all unique words. So while the sentence “The cat ate the mouse” has 5 tokens, it only has 4 types because “the” occurs twice. The token-type-ratio for that sentence would be 5:4, or 1.25. As you can imagine, a long text will have a significantly larger number of tokens than types, since function words (pronouns, articles, prepositions etc) are re-used all the time, while lexical words (something like “blog”, “Google” or “greenish”) occur a lot less often.
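The cat-and-mouse example can be checked with a short Python sketch. It assumes the simplest possible tokenizer (lowercase the text, split on whitespace), which is fine for this sentence but too crude for real blog posts with punctuation. The function name token_type_ratio is made up for the illustration.

```python
def token_type_ratio(text):
    # Lowercase so that "The" and "the" count as one type
    tokens = text.lower().split()
    types = set(tokens)
    if not tokens:
        return 0, 0, 0.0
    return len(tokens), len(types), len(tokens) / len(types)


n_tokens, n_types, ratio = token_type_ratio("The cat ate the mouse")
print(n_tokens, n_types, ratio)  # 5 4 1.25
```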

The other statistics are pretty straightforward: the number of total posts in the database, the time span from the first to the last post, the total number of sentences and three averages: average word length (AWL), average sentence length (ASL) and average words per post (AWpP). AWL refers to the number of characters in a word, while ASL in turn refers to the number of words in a sentence. As mentioned above, Scoble’s AWpP value should be ignored, since his RSS feed does not include complete entries but only summaries.
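For readers who want to compute similar averages on their own texts, here is a minimal Python sketch. It is not the tool behind the figures above: the word pattern, the rule that a sentence ends at ".", "!" or "?", and the name post_stats are all assumptions, and a different tokenizer or averaging method will give somewhat different numbers.

```python
import re


def post_stats(posts):
    """Return (AWL, ASL, AWpP) for a non-empty list of post texts."""
    words = [w for p in posts for w in re.findall(r"[A-Za-z0-9']+", p)]
    # Naive sentence splitter: break on runs of ., ! or ? and drop empty pieces
    sentences = [s for p in posts for s in re.split(r"[.!?]+", p) if s.strip()]
    awl = sum(len(w) for w in words) / len(words)  # characters per word
    asl = len(words) / len(sentences)              # words per sentence
    awpp = len(words) / len(posts)                 # words per post
    return awl, asl, awpp


posts = ["The cat ate the mouse. It was good.", "Blogs are fun!"]
awl, asl, awpp = post_stats(posts)
print(round(awl, 1), round(asl, 1), round(awpp, 1))
```

Here all three values are computed over pooled totals; averaging per post first and then across posts would be an equally defensible choice and would give different results.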

A cautious interpretation

The comparison shows that Robert Scoble uses shorter words and sentences than both the blogs in the random comparison group and those in the corporate blogging collection. Words are only slightly shorter (Scoble: 4.9; Corp.blogs: 5.5; Random blogs: 5.1) but it should be noted that variation in this category is normally not very strong, thus the difference between Scoble and the corporate blogs seems notable. The differences in sentence length (10.1; 15.9; 10.8) are even more pronounced: on average, the other corporate blogs have much longer sentences than Scoble, who is again a little below the average value of the random blogs. Finally, it cannot be determined if Scoble’s posts are shorter than those in the other two collections (52.9*; 155.1; 184.5) because his RSS syndicates only summaries, though my personal bet would be that they are. This is also the only category where the random group scores higher than the corporates.

So what does this mean? In one sentence, it means that on average Robert Scoble seems to use shorter sentences than most other corporate bloggers, and that the words he uses are also somewhat shorter. Looking further, it appears that Scoble’s style – only speaking in terms of word and sentence length – is closer to that of non-corporate bloggers. However, these numerical statistics aren’t terribly exciting by themselves, which is why tomorrow I’ll take a peek at a list of the most frequently used words in our three source collections.

(to be continued)

Edit: My claim that Robert’s RSS feed does not contain full texts is bogus - my indexing tool was simply looking in the wrong place. I’ll correct the problem asap. Mea culpa.

What blogging does to your business

I’ve just finished reading two interesting pieces (one, two) by the anonymous author of the Yankee Wombat blog. The writer describes corporate blogging in conjunction with Marshall McLuhan’s theory of media and culture, as outlined in The Gutenberg Galaxy (1962) and Understanding Media (1964). In the second entry, he also discusses Eric Raymond’s The Cathedral and the Bazaar (1999) which compares the open source model of software development with customary closed-source methods used by companies such as Microsoft.

While I don’t want to delve too much into McLuhan’s work (my posts are lengthy enough as it is), it is worth pointing out one interesting observation that Yankee makes vis-à-vis McLuhan:

McLuhan argued that when a new medium emerges people tend to focus on content, not form. […] Innovations that emerge as people come to grips with the implications of a new media environment are difficult to see at first because no one can see the new environment. Indeed, at first, they can only see the innovation in the context of the old environment.

This is what McLuhan meant with his famous observation that “the medium is the message”. We tend to think of communication purely in terms of content (“what”), while largely ignoring how something is medially communicated. The fact that the how shapes our perception of the message, or in some cases can be the message itself, is widely acknowledged today. However, McLuhan to my knowledge stressed this aspect most strongly in conjunction with radio and television, which he characterized as “hot” media, in contrast to books which he regarded as “cool” because they require a higher degree of interaction on the part of the recipient (reading in contrast to listening or watching). Traditional mass media is characterized by its unidirectionality - the fact that there is no feedback - and by its tendency to stimulate our senses through its packaging.

How are blogs different? Firstly it is worth noting that from the vantage point of “visual culture” they are actually a step backwards in terms of presentation. Blogs are almost purely textual, with their “special effects” largely serving the purpose of linking to other texts. They are also highly participatory (in contrast to traditional printed media), whether the author is “speaking” to another blogger’s text or his own, responding to comments, etc. In summary, blogs are mostly content and fairly little form.

What does this mean? One consequence of the relative “coolness” of blogs is that they are less suited to producing a specific effect on their readers in the way that visual media do on their viewers. Words and sentences are always subject to interpretation, while images - though having the potential to be just as ambiguous - usually suggest a clear meaning to us; they “say more than a thousand words”. The fact that blogs are highly participatory further complicates the goal of getting a targeted message across without interference. A continuous dialog produces no final result; it is by definition unresolved and incomplete.

So what’s to like about blogs under these circumstances? I think that precisely because they are so content-centric, because they are participatory and because they can be created by “anyone with a computer and a connection” (Yankee) we perceive them as authentic. And authenticity is precisely what a trans-national, multi-million dollar organization built one hierarchy upon another is usually lacking. Traditional media cannot achieve the same kind of directness, which can be either a blessing or a curse, depending on what your goals are.

As Yankee points out, blogs are certainly an excellent tool for facilitating communication inside a company. To point back to Raymond’s analogy, they have the potential to make the cathedral a little bit more like the bazaar without necessarily upsetting the corporate organizational structure. Their external role in marketing and PR is much more difficult to define. Most products aren’t really the stuff of narratives and passionate debates, and regarding PR there’s always the question of how much public scorn you can endure and at what point it becomes dangerous to your company to allow it (then again, if Dell didn’t have a blog people would again turn to blasting the company in their own blogs, as they did before).

Perhaps in the end the most interesting question is not what companies can do with blogs but what blogs will do to them. The ROI of blogging may be understanding not only your customers and industry better, but also your own business. Hard to measure, I know, but a potentially valuable learning experience nevertheless.

A first attempt at a categorization (II)

The beauty of taxonomies is that they will never fit with the darned data. Since making my first proposal for a categorization of corporate blogs, I have examined my corpus more closely and consequently I’ve had to update my categories. The communicative purpose behind a blog is now the deciding factor. All blogs are assigned a category based on what their main focus is, from products (narrowest) to general/multi-purpose (widest).

Blogs by small and medium-sized businesses (SMBs) are the exception. I felt that a separate category would be needed, since they tend to vary greatly in terms of focus and are created in a different organizational context than blogs belonging to the other categories. Because of these differences, I’m going to save SMB blogs for another post.

So here are the new and improved categories:

A. Product Blogs

Written by: marketing

Target audience: customers

Examples (direct):

Guinness Blog (Guinness & Co)

Nike Basketball (Nike)

Examples (indirect):

bugBlog (RESCUE/Sterling International)

Thomson Holiday Blog (TUI UK)

Real Baking with Rose Levy Beranbaum (General Mills)

Product blogs aim

a) to promote a product directly,

b) to generate a discussion centered on the product and

c) to address issues closely related to the company’s products.

Guinness and Nike both center their blogs on the product itself, whereas RESCUE, TUI and General Mills give issues directly related to the product the main focus. Real Baking [..] stands out among the cited examples because it places blog author and award-winning baker Rose Levy Beranbaum at the center, whereas the other four blogs are either anonymous (RESCUE, Nike, TUI) or written by bloggers only identified by first name (Guinness). Of all blog types examined, product blogs stand out as the only type where anonymous posting is common.

B. Image/Lobbying Blogs

Written by: PR

Target audience: customers/public

Examples:

Life at Wal-Mart (Wal-Mart)

Open for Discussion (McDonald’s)

Digital Straight Talk (Cox Communications)

The Bovine Bugle (Stonyfield Farm)

From Edison’s Desk (General Electric Company)

Image/Lobbying blogs seek

a) to create a positive public perception of a company,

b) to actively shape the public discussion of a company and its products,

c) to advance company interests with regard to policy (lobbying) and

d) to preempt or react to criticism from customers.

Wal-Mart and McDonald’s aim to refute public criticism of their business practices (Wal-Mart) and products (McDonald’s) through their blogs, while Cox is both product-focused and noticeably targets the competition. Stonyfield Farm and GE use their blogs to bring attention to their commitment to the environment (SF) and their research (GE). The common goal is to convince consumers and the general public that the company is dedicated to corporate responsibility. Though having a different focus (on the customer) Dell’s Direct2Dell blog also falls into this category, as I do not see CRM as a distinctly separate type. Blogs may serve to influence the public’s perception, including the sentiments of disgruntled customers, but as a tool for troubleshooting they are not very useful.

C. Industry Blogs

Written by: experts

Target audience: other experts

Examples:

OraBlogs (Oracle)

IBM developerWorks (IBM)

eBay Developers Program (eBay)

IEBlog (Microsoft)

Industry blogs are blogs which are written

a) to inform other experts inside or outside of the company about issues related to specialized fields such as engineering, software development, hardware R&D etc.,

b) to seek information and advice from other experts about such issues and

c) as a mnemonic instrument for the author.

They are most often authored by experts in a subject area which is relevant to the company and manifest a frequent use of jargon. They are also frequently aggregated and topically tagged.

D. Strategy Blogs

Written by: executives

Target audience: shareholders

Examples:

Jonathan Schwartz, CEO Sun Microsystems

Randy Baseler, VP Marketing Boeing Commercial Airplanes

John Mackey, CEO Whole Foods Market

Usually written by executives, strategy blogs are blogs which may

a) discuss the position of the corporation and its products in the market,

b) evaluate competitors and their products,

c) legitimate management decisions such as layoffs, restructuring, expansion etc. and

d) outline future strategic goals.

The subtype is set apart by the fact that a) authors tend to hold senior positions in the corporate hierarchy and b) both the job title and the name of the author are virtually always integrated into the blog’s title. While CIOs and CTOs are usually most strongly concerned with product development (i.e. software), CEOs, COOs and most VPs primarily discuss industry issues.

E. General/Multipurpose Blogs

Written by: multiple

Target audience: varies

Examples:

“Kara R” (Honeywell)

Yahoo! Search Blog

Google Blog

General/multipurpose blogs differ from blogs belonging to the categories described above in that they are

a) written by a plethora of employees belonging to a large variety of departments (human resources, accounting, security) and/or

b) cover subject areas and serve purposes not commonly found in other blogs.

The Honeywell blog falls into that category, as its primary goal seems to be to facilitate recruiting. The Yahoo! and Google blogs are general-purpose sources as outlined in a). Common traits of blogs in this category are the lack of a single dominant focus and the high degree of authorship variation. They may also extend the informative function usually occupied by press releases.


License

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 License.