Do it: Leverage data, not just pages
Published on 14/03/09
by Zac Echola
Far too often, news sites underleverage their data, or they don’t even bother to store it in a structured, machine-readable way. It’s not about recreating the newspaper experience online with those wacky Web 2.0 features thrown in for the fun of it.
It is the journalist’s job to provide context to facts, to string important bits of related data together in a way humans can quickly understand. We call these stories and they work great–for humans.
Stories are a terrible way to store information. As much as we like to imagine computers with super-intelligent capabilities, they don’t compare to the human brain. Even the most advanced artificial intelligence is only slightly smarter than a rock rolling down a hill. Computers have great difficulty interpreting complex data. At best, they can merely process data and leave the interpretation to us.
Here’s an example: We can read a story and parse out the who, what, where, when, why and how. We can then take that information and apply it to other information we know about the world. We can read an article about Jim Cramer on The Daily Show with Jon Stewart and place that new information in other contexts. We can near-instantaneously access our knowledge of the financial crisis, journalism ethics, comedy, the personal histories of Jim Cramer and Jon Stewart, and the recent clash between The Daily Show and CNBC. Applying that broader knowledge enhances not only our understanding of this particular story, but also our knowledge of its context. Where we run into new information we can’t put into context, we deduce and interpolate. Building context is an extremely simple process for you and me. Humans are fantastic at finding patterns (we even find them where none exist).
Most software can’t create context without help. To a machine, that story is just a string of characters attached to an ID number that separates this story from others. When you click a link to an article, the application doesn’t think “Oh this user is interested in The Daily Show,” it thinks “This user requested an article with a unique ID from my database that contains this string of alpha-numeric characters.” The application fulfills the request and then moves on to the next task.
If there were two articles in the database about The Daily Show (each with a different ID number), the application wouldn’t have the slightest clue they were related. We need to provide that kind of context.
The simplest way to provide granular context is through tagging and meticulous categorization.
Here’s another example: Most news sites break their content into a few categories. Let’s imagine a site with three categories: news, sports and opinion. Now the computer can “understand” three types of stories. It can’t really understand, but it can differentiate. Story with ID number 11 belongs in News, which is category 1. Story with ID number 22 belongs in Sports, which is category 2. Story number 33 is Opinion, which belongs in category 3. When a user clicks on News, the application organizes all the story ID numbers that are also in category 1. With the right database structure, one story in the database could be attached to all three categories.
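Here’s a minimal sketch of that "right database structure," using Python’s built-in sqlite3 module. The table names, column names and headlines are my own placeholders, not anything a real CMS uses. The point is the join table, which lets one story sit in as many categories as you like.

```python
import sqlite3

# In-memory database; all table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stories (id INTEGER PRIMARY KEY, headline TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
-- The join table is what allows one story to belong to many categories.
CREATE TABLE story_categories (
    story_id INTEGER REFERENCES stories(id),
    category_id INTEGER REFERENCES categories(id),
    PRIMARY KEY (story_id, category_id)
);
""")

conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "News"), (2, "Sports"), (3, "Opinion")])
conn.executemany("INSERT INTO stories VALUES (?, ?)",
                 [(11, "A news story"), (22, "A sports story"),
                  (33, "An opinion piece"), (44, "A stadium funding column")])
# Story 44 is attached to all three categories.
conn.executemany("INSERT INTO story_categories VALUES (?, ?)",
                 [(11, 1), (22, 2), (33, 3), (44, 1), (44, 2), (44, 3)])


def stories_in_category(conn, name):
    """Return the IDs of every story attached to the named category."""
    rows = conn.execute(
        """
        SELECT s.id
        FROM stories AS s
        JOIN story_categories AS sc ON sc.story_id = s.id
        JOIN categories AS c ON c.id = sc.category_id
        WHERE c.name = ?
        ORDER BY s.id
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(stories_in_category(conn, "News"))
```

When a user clicks on News, the application runs something like `stories_in_category(conn, "News")` and gets back stories 11 and 44. The machine still doesn’t understand anything, but it can differentiate.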
This categorization can get deeper, and a lot of sites do dig deeper. The Star Tribune has categories for all the major sports teams. The Chicago Tribune breaks down its columnists into news, business, etc. But they could do even more still. Each team is made up of people, places and things. Each story contains those people, places and things. The who, what, where, when and why are all meta data that a computer can “read,” if stored in a structured way.
Here’s the key point I’m trying to make: By storing data in this way, you can exponentially increase the number of pages on your site, without actually creating more content. Leverage your data in a more efficient way.
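To make that concrete, here’s a rough sketch in Python. The story IDs, field names and tags are all made up for illustration. Three stories, tagged with people and topics, turn into seven listing pages without anyone writing a single new word.

```python
from collections import defaultdict

# Hypothetical stories, each with structured meta data attached.
stories = {
    101: {"people": ["Jim Cramer", "Jon Stewart"],
          "topics": ["The Daily Show", "financial crisis"]},
    102: {"people": ["Jon Stewart"],
          "topics": ["The Daily Show", "comedy"]},
    103: {"people": ["Joe Mauer"],
          "topics": ["Minnesota Twins"]},
}


def build_pages(stories):
    """Map every (field, value) pair to the list of story IDs that carry it.

    Each key is a page the site could serve: "all stories about Jon Stewart",
    "all stories about The Daily Show", and so on.
    """
    pages = defaultdict(list)
    for story_id, meta in sorted(stories.items()):
        for field, values in meta.items():
            for value in values:
                pages[(field, value)].append(story_id)
    return dict(pages)


pages = build_pages(stories)
print(len(stories), "stories ->", len(pages), "listing pages")
for key, ids in pages.items():
    print(key, ids)
```

Every new tag on an existing story is a potential new page (or feed), which is where the leverage comes from.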
Returning to The Daily Show article, if we stored this type of meta data about what that article was about, we could write an application that searches all our other content for related information. Not just all stories about The Daily Show or all stories about Jim Cramer: we could weight the page a user is already on against all other stories about both The Daily Show and Jim Cramer, or all other stories about the financial crisis and journalism ethics. More context is available to the user immediately.
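One simple way to do that weighting, sketched in Python: count how many tags each other story shares with the one the user is reading, and rank by that count. The story IDs and tag sets are invented, and counting shared tags is just the easiest scoring rule I could pick. A real site would probably want something smarter.

```python
# Hypothetical tag sets keyed by story ID.
tags_by_story = {
    1: {"The Daily Show", "Jim Cramer", "financial crisis", "journalism ethics"},
    2: {"The Daily Show", "Jim Cramer"},
    3: {"financial crisis"},
    4: {"Minnesota Twins"},
    5: {"The Daily Show", "Jim Cramer", "journalism ethics"},
}


def related_stories(current_id, tags_by_story):
    """Rank other stories by the number of tags shared with the current one.

    Stories that share no tags are left out. Ties are broken by story ID.
    """
    current_tags = tags_by_story[current_id]
    scored = []
    for story_id, tags in tags_by_story.items():
        if story_id == current_id:
            continue
        shared = current_tags & tags
        if shared:
            scored.append((len(shared), story_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [story_id for _, story_id in scored]


print(related_stories(1, tags_by_story))
```

For a reader on story 1, story 5 comes first (three shared tags), then story 2 (two), then story 3 (one). The Twins story never shows up.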
If you had enough stories about The Daily Show, you could spin that data into a separate site, using the same tables. If you had several newspapers in different markets writing about the same topics, you could easily leverage that data into an aggregate site. You could create granular feeds for each piece of meta data. And so much more.
And that’s just leveraging the content. News sites are full of other data: User information, advertising information, the list goes on.
Let’s assume I’m me and you’re you. I read the Jim Cramer/Daily Show story and also a story about a new bar near my house. You live in the same neighborhood as me and read The Daily Show story. With the right data, an application could be written to suggest the bar story to you, because we share the same location and an interest in The Daily Show story. Think Netflix recommendation engine for news.
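A toy version of that logic in Python might look like this. The reader names, neighborhood names and story slugs are placeholders, and the rule (same neighborhood plus at least one story in common) is a deliberately crude stand-in for what a real recommendation engine does.

```python
# Hypothetical reader profiles: where they live and what they have read.
readers = {
    "me": {"neighborhood": "Northside",
           "read": {"cramer-daily-show", "new-bar"}},
    "you": {"neighborhood": "Northside",
            "read": {"cramer-daily-show"}},
    "stranger": {"neighborhood": "Elsewhere",
                 "read": {"cramer-daily-show", "city-budget"}},
}


def recommend(user, readers):
    """Suggest stories read by neighbors who share at least one story with the user."""
    profile = readers[user]
    suggestions = set()
    for name, other in readers.items():
        if name == user:
            continue
        same_area = other["neighborhood"] == profile["neighborhood"]
        shared_reading = profile["read"] & other["read"]
        if same_area and shared_reading:
            # Offer whatever the neighbor read that the user has not.
            suggestions |= other["read"] - profile["read"]
    return sorted(suggestions)


print(recommend("you", readers))
```

You get the bar story suggested. The stranger across town, who also read The Daily Show story, has no effect on your recommendations because the location doesn’t match.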
From an advertising perspective, this kind of data leveraging is huge. If I’m a sporting goods store, I don’t want to sell my brand, I want to sell my inventory. An article about Twins catcher Joe Mauer could feature an ad pitching Mauer jerseys, while the article about the new bar could feature drink specials. If my user profile says I’m interested in the White Sox, the sporting goods store ad probably wouldn’t be effective in trying to get me to buy the Mauer jersey and would pitch something else, but the bar ad might want to tell me to enjoy the game against the Twins tonight with half-off taps.
Now, instead of selling one ad to the bar and one ad to the sporting goods store, you’ve sold two ads (with presumably lower initial buy-in cost, but higher overall CPM or CPC) with their messages tailored to the right people, and you’ve kept the rest of your advertising inventory available for ads more effective for other businesses. The point is that advertising contains meta data, too. You just have to store it so the machines can better differentiate.
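Here’s a small Python sketch of what storing ad meta data could look like. The field names (`article_tags`, `require_interests`, `exclude_interests`) and the ad copy are my own inventions for illustration, including the fallback White Sox pitch. Each ad declares which article tags it fits and which reader interests it needs or should avoid.

```python
# Hypothetical ad records; every field name and message is illustrative.
ads = [
    {"advertiser": "sporting goods store",
     "message": "Mauer jerseys in stock",
     "article_tags": {"Joe Mauer"},
     "require_interests": set(),
     "exclude_interests": {"White Sox"}},
    {"advertiser": "sporting goods store",
     "message": "White Sox gear in stock",
     "article_tags": {"Joe Mauer"},
     "require_interests": {"White Sox"},
     "exclude_interests": set()},
    {"advertiser": "bar",
     "message": "Drink specials tonight",
     "article_tags": {"new bar"},
     "require_interests": set(),
     "exclude_interests": {"White Sox"}},
    {"advertiser": "bar",
     "message": "Watch tonight's game against the Twins with half-off taps",
     "article_tags": {"new bar"},
     "require_interests": {"White Sox"},
     "exclude_interests": set()},
]


def pick_ads(article_tags, user_interests, ads):
    """Return the messages of ads that fit both the article and the reader."""
    chosen = []
    for ad in ads:
        if not (ad["article_tags"] & article_tags):
            continue  # ad is not relevant to this article
        if ad["exclude_interests"] & user_interests:
            continue  # reader has an interest this ad should avoid
        if not ad["require_interests"] <= user_interests:
            continue  # reader lacks an interest this ad depends on
        chosen.append(ad["message"])
    return chosen


print(pick_ads({"Joe Mauer", "Minnesota Twins"}, set(), ads))
print(pick_ads({"new bar"}, {"White Sox"}, ads))
```

A reader with no stated interests sees the Mauer jersey pitch on the Mauer article. A White Sox fan reading the bar story gets the half-off taps message instead of the generic drink specials.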
Contextual advertising doesn’t have to be the Google approach, with spiders crawling pages and keyword algorithms weighting context. It can be as simple as a relational table in a database and some elbow grease from editorial, advertising and users to create and maintain the data. The Achilles heel of the Google approach is that its robots have difficulty understanding tone. An article slamming Microsoft might still serve an ad for Microsoft Office, based on keyword density. Computers are stupid. People, presumably, are not.
In my next post in this series, I’ll break out a bunch of flow charts describing behavioral, social and contextual delivery methods. From there, we’ll further discuss ways to scale up.