Zac Echola is muffin but trouble

Leverage data, not just pages

Saturday, March 14th, 2009

Far too often, news sites underleverage their data, or they don’t even bother to store it in a structured, machine-readable way. It’s not about recreating the newspaper experience online with those wacky Web 2.0 features thrown in for the fun of it.

It is the journalist’s job to provide context to facts, to string important bits of related data together in a way humans can quickly understand. We call these stories and they work great–for humans. 

Stories are a terrible way to store information. As much as we like to imagine computers with super-intelligent capabilities, they don’t compare to the human brain. Even the most advanced artificial intelligence is only slightly smarter than a rock rolling down a hill. Computers have great difficulty interpreting complex data. At best, they can merely process data and leave the interpretation to us.

Here’s an example: We can read a story and parse out the who, what, where, when, why and how. We can then take that information and apply it to other information we know about the world. We can read an article about Jim Cramer on The Daily Show with Jon Stewart and place that new information in other contexts; we can near-instantaneously access our knowledge of the financial crisis, journalism ethics, comedy, the personal histories of Jim Cramer and Jon Stewart, and the recent clash between The Daily Show and CNBC, and apply that broader knowledge to this particular story, enhancing not only our understanding of the story but also our broader knowledge of its context. Where we run into new information we can’t put into context, we deduce and interpolate. Building context is an extremely simple process for you and me. Humans are fantastic at finding patterns (we even find them where none exist).

Most software can’t create context without help. To a machine, that story is just a string of characters attached to an ID number that separates this story from others. When you click a link to an article, the application doesn’t think “Oh, this user is interested in The Daily Show,” it thinks “This user requested an article with a unique ID from my database that contains this string of alphanumeric characters.” The application fulfills the request and then moves on to the next task.

If there were two articles in the database about The Daily Show (each with a different ID number), the application wouldn’t have the slightest clue they were related. We need to provide that kind of context.

The simplest way to provide granular context is through tagging and meticulous categorization.

Here’s another example: Most news sites break their content into a few categories. Let’s imagine a site with three categories: news, sports and opinion. Now the computer can “understand” three types of stories. It can’t really understand, but it can differentiate. Story with ID number 11 belongs in News, which is category 1. Story with ID number 22 belongs in Sports, which is category 2. Story number 33 is Opinion, which belongs in category 3. When a user clicks on News, the application gathers all the story ID numbers that are attached to category 1. With the right database structure, one story in the database could be attached to all three categories.
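Here is a minimal sketch of that structure, using Python and an in-memory SQLite database. The table names, column names and headlines are my own placeholders, not anything a particular news site uses, and story 44 is a made-up fourth story added only to show one story attached to all three categories.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, headline TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    -- The join table is what lets one story sit in many categories.
    CREATE TABLE story_categories (
        story_id INTEGER REFERENCES stories(id),
        category_id INTEGER REFERENCES categories(id),
        PRIMARY KEY (story_id, category_id)
    );
""")

conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, "News"), (2, "Sports"), (3, "Opinion")],
)
# Placeholder headlines; story 44 is invented for the example.
conn.executemany(
    "INSERT INTO stories VALUES (?, ?)",
    [(11, "A news story"), (22, "A sports story"),
     (33, "An opinion piece"), (44, "A story that fits everywhere")],
)
conn.executemany(
    "INSERT INTO story_categories VALUES (?, ?)",
    [(11, 1), (22, 2), (33, 3), (44, 1), (44, 2), (44, 3)],
)


def stories_in_category(conn, category_name):
    """Return the IDs of every story attached to the named category."""
    rows = conn.execute(
        """
        SELECT s.id
        FROM stories AS s
        JOIN story_categories AS sc ON sc.story_id = s.id
        JOIN categories AS c ON c.id = sc.category_id
        WHERE c.name = ?
        ORDER BY s.id
        """,
        (category_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(stories_in_category(conn, "News"))    # [11, 44]
print(stories_in_category(conn, "Sports"))  # [22, 44]
```

The application still doesn’t understand anything. It only matches ID numbers across tables, which is exactly the kind of differentiation described above.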

This categorization can get deeper, and a lot of sites do dig deeper. The Star Tribune has categories for all the major sports teams. The Chicago Tribune breaks down its columnists into news, business, etc. But they could do even more still. Each team is made up of people, places and things. Each story contains those people, places and things. The who, what, where, when and why are all meta data that a computer can “read,” if stored in a structured way.

Here’s the key point I’m trying to make: By storing data in this way, you can exponentially increase the number of pages on your site, without actually creating more content. Leverage your data in a more efficient way.

Returning to The Daily Show article, if we stored this type of meta data about what that article was about, we could write an application that searches all our other content for related information. It wouldn’t have to stop at all stories about The Daily Show or all stories about Jim Cramer. You could weight the page a user is already on against all other stories about both The Daily Show and Jim Cramer, or all other stories about the financial crisis and journalism ethics. More context available to the user immediately.
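As a rough sketch of that weighting, here is one very simple way to do it in Python: score every other story by how many tags it shares with the story the user is reading. The story IDs, the tags and the "count the overlap" scoring rule are all assumptions made up for illustration; a real site would likely want something smarter.

```python
# Hypothetical tag data: the IDs and tags are invented for this example.
STORY_TAGS = {
    101: {"The Daily Show", "Jim Cramer", "financial crisis", "journalism ethics"},
    102: {"The Daily Show", "Jon Stewart"},
    103: {"Jim Cramer", "financial crisis", "CNBC"},
    104: {"Minnesota Twins", "Joe Mauer"},
}


def related_stories(current_id, story_tags):
    """Rank other stories by the number of tags shared with the current one.

    Returns (story_id, shared_tag_count) pairs, best match first.
    Stories with no tags in common are left out.
    """
    current_tags = story_tags[current_id]
    scored = []
    for story_id, tags in story_tags.items():
        if story_id == current_id:
            continue
        overlap = len(current_tags & tags)
        if overlap:
            scored.append((story_id, overlap))
    # Highest overlap first; ties broken by story ID for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(related_stories(101, STORY_TAGS))  # [(103, 2), (102, 1)]
```

Story 103 outranks story 102 because it shares two tags with the Cramer story instead of one, and the baseball story never shows up at all.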

If you had enough stories about The Daily Show, you could spin that data into a separate site, using the same tables. If you had several newspapers in different markets writing about the same topics, you could easily leverage that data into an aggregate site. You could create granular feeds for each piece of meta data. And so much more.

And that’s just leveraging the content. News sites are full of other data:  User information, advertising information, the list goes on. 

Let’s assume I’m me and you’re you. I read the Jim Cramer/Daily Show story and also a story about a new bar near my house. You live in the same neighborhood as me and read The Daily Show story. With the right data, an application could be written to suggest the bar story to you, because we share the same location and interest in The Daily Show story. Think Netflix recommendation engine for news.
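A toy version of that suggestion logic might look like the Python below. The reader names, neighborhoods and story slugs are hypothetical, and the rule (same neighborhood plus at least one story in common) is only the one described above, nowhere near what a real recommendation engine does.

```python
# Hypothetical reader data: names, neighborhoods and story slugs are invented.
READERS = {
    "me": {"neighborhood": "north side",
           "read": {"cramer-daily-show", "new-bar"}},
    "you": {"neighborhood": "north side",
            "read": {"cramer-daily-show"}},
    "someone-else": {"neighborhood": "downtown",
                     "read": {"cramer-daily-show", "city-budget"}},
}


def recommend(reader_id, readers):
    """Suggest stories read by neighbors who share at least one story."""
    reader = readers[reader_id]
    suggestions = set()
    for other_id, other in readers.items():
        if other_id == reader_id:
            continue
        same_place = other["neighborhood"] == reader["neighborhood"]
        shared_stories = reader["read"] & other["read"]
        if same_place and shared_stories:
            # Offer whatever the neighbor read that this reader has not.
            suggestions |= other["read"] - reader["read"]
    return sorted(suggestions)


print(recommend("you", READERS))  # ['new-bar']
```

The downtown reader also read the Cramer story, but the city budget story never gets suggested to you because the location doesn’t match.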

From an advertising perspective, this kind of data leveraging is huge. If I’m a sporting goods store, I don’t want to sell my brand, I want to sell my inventory. An article about Twins catcher Joe Mauer could feature an ad pitching Mauer jerseys, while the article about the new bar could feature drink specials. If my user profile says I’m interested in the White Sox, the sporting goods store ad probably wouldn’t be effective in trying to get me to buy the Mauer jersey and would pitch something else, but the bar ad might want to tell me to enjoy the game against the Twins tonight with half-off taps.

Now, instead of selling one ad to the bar and one ad to the sporting goods store, you’ve sold two ads (with presumably lower initial buy-in cost, but higher overall CPM or CPC) with their message tailored to the right people and kept the rest of your advertising inventory available for ads more effective for other businesses. The point is that advertising contains meta data, too. You just have to store it so the machines can better differentiate.

Contextual advertising doesn’t have to be the Google approach, with spiders crawling pages and keyword algorithms weighting context. It can be as simple as a relational table in a database and some elbow grease from editorial, advertising and users to create and maintain the data. The Achilles’ heel of the Google approach is tone: Google’s robots have difficulty understanding it. An article slamming Microsoft might still serve an ad for Microsoft Office, based on keyword density. Computers are stupid. People, presumably, are not.
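To make the jersey and bar example concrete, here is a small Python sketch of ads that carry their own meta data. The Ad fields, the ad names and the tags are all hypothetical, and the matching rule (show the ad if it shares a tag with the article, unless the reader has an interest the advertiser wants to skip) is just one simple assumption about how a person might set it up.

```python
from dataclasses import dataclass, field


@dataclass
class Ad:
    """A hypothetical ad record; the field names are illustrative only."""
    name: str
    article_tags: set  # run next to articles carrying any of these tags
    skip_interests: set = field(default_factory=set)  # readers to leave alone


# Invented inventory, tagged by a person rather than by a keyword crawler.
ADS = [
    Ad("Mauer jersey", {"Joe Mauer", "Minnesota Twins"},
       skip_interests={"Chicago White Sox"}),
    Ad("Half-off taps", {"Minnesota Twins", "Chicago White Sox", "bars"}),
]


def pick_ads(article_tags, reader_interests, ads):
    """Return names of ads that match the article and suit the reader."""
    chosen = []
    for ad in ads:
        matches_article = bool(ad.article_tags & article_tags)
        wrong_reader = bool(ad.skip_interests & reader_interests)
        if matches_article and not wrong_reader:
            chosen.append(ad.name)
    return chosen


mauer_article = {"Joe Mauer", "Minnesota Twins"}
print(pick_ads(mauer_article, {"Chicago White Sox"}, ADS))  # ['Half-off taps']
print(pick_ads(mauer_article, set(), ADS))  # ['Mauer jersey', 'Half-off taps']
```

Nothing here guesses at tone from keyword density. The tags and the skip list are whatever editorial, advertising and users decided to put in the table.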

In my next post in this series, I’ll break out a bunch of flow charts describing behavioral, social and contextual delivery methods. From there, we’ll further discuss ways to scale up.

The revolution will not be twitterized

Thursday, March 12th, 2009

I’m a little shocked, though not surprised, by the cheerleading of these 10 ways newspapers are using social media to save the industry.

Few, if any, of the things listed will save the industry.

Let me start by defining “save the industry.”

Newspapers are a poor distribution method in comparison to digital distribution, though they are not unprofitable. However, they do not support large enough margins to overcome the massive debt most newspaper companies took on before the bottom completely fell out. Let me say that again. Newspapers are still a profitable business, but the profits they generate cannot pay the banks back. Blame the banks for handing out huge loans to newspaper executives who expected to sustain profit margins well over 15 percent. But blame doesn’t save anything.

The Internet will not save newspapers. So, suggesting that the Seattle P-I or Rocky Mountain News or any other former newspaper company go online-only is not helpful. Going online-only doesn’t magically make bankruptcy and the loans that led to bankruptcy go away. If it were that simple, newspapers would make more than a handful of dollars from digital revenue and probably wouldn’t be in this big of a mess to begin with, since margins can run considerably higher for online media.

The newspapers that will eventually die in the coming months and years will do so under the weight of their debt, their bravado and their inability to adapt to rapid changes in the market. But let me say this again: the Internet won’t save the newspaper. It cannot support the cost of distributing the newspaper in its current form.

“Saving the industry” is not a matter of redefining journalism. It is a matter of recapturing lost revenue. Journalism isn’t necessarily broken. It is still an incredibly cheap way to deliver warm bodies to advertisers. But the core business of newspapers is not journalism. It is advertising. What’s killing newspapers isn’t an overabundance of information on the Web; it’s that the huge piles of advertising revenue evaporated from newspapers and they neglected to build a solid business foundation online.

It was not an overnight thing. The bottom didn’t fall out because the recession hit. The bottom fell out because, for years, newspapers didn’t respond to the craigslists, the Monsters and the eBays of the world. Upwards of half their revenue slipped away as they sat by and watched. They focused too heavily on “hyper-local” without understanding what that actually meant. Instead of building huge networks of granular advertising, building on the reach of each Web site, they walled themselves off and spent much of their energy focused on replicating online what worked for print. The display ad mentality, the overzealous protection of the “core” print product and the shameless lack of foresight and innovation, day after day, month after month, year after year, put most newspapers in a horrible position for turning a large and sustainable profit online.

It might be too late to save the industry. However, I think there’s still a glimmer of hope, though it dims every moment wasted on micropayments and absurdly drawn-out conversations about whether or not to allow comments on the site (flip a fucking coin already).

That hope doesn’t come from twitter. It doesn’t come from APIs or live blogging. It doesn’t come from attempts to attract more readers online or false hope in the Amazon Kindle or Google as saviors. Many news sites already have plenty of traffic. Lack of traffic or want of news is not the problem here. The problem is that the industry hasn’t properly figured out how to profit from that traffic. This is not an audience problem. This is not a journalism problem. This is entirely a business problem. Our problem has been that we’re entirely focused on the wrong damn thing.

The newspaper business has never been in the journalism business. Journalism is a means to an end (an end that unfortunately may no longer support the luxury of “wasteful” spending on bureaus and months-long investigations that turn up little to no news). The true core business is not newspaper production and distribution; it’s advertising.

If you want to save the industry, do it by focusing on things that will generate revenue. If people aren’t buying run-of-site display ads online, it’s because they’re ineffectual and expensive. Newspapers need to stop managing dwindling client lists and start actively seeking out new advertisers. To do that, they need to have the products in place to target new advertisers, to increase the reach of their network and the right technology to segment their audience.

They need data. I’m still surprised that so many news sites haven’t bothered to tap into the wealth of user information available through Facebook Connect. Instead they’re spending their time creating pages within Facebook that merely link to their sites full of ineffectual advertising, if they have any advertising at all. Or worse, spending gobs of time and money reinventing the wheel by building hyper-local social networks from scratch.

I have a few posts in the works that deeply address targeting new advertisers, increasing reach and segmenting audience. They’ll all go up over the next few days.