Archive for the ‘Semantic Web’ Category

Closing Remarks

As this project comes to a close, what I’ve learned about organizing and presenting news items using a common language like XML has reinforced what I discovered last semester about online content. Most of these discoveries distill down to this bit of advice:

Media companies, don’t EVER trash your content.

At one point in my career, I contributed an article to the publication I was working for at the time. Nothing fancy, just an interview with an international artist. The publication killed the story, and I never saw it again. I’d publish the story online myself, but the article still legally belongs to the publication. After all, I would not have scored the exclusive interview I needed for it without their help. I don’t think the publication should publish it or post it online to protect my feelings, though that might become a concern with bigger-egoed writers. I don’t care either way. This article, however, took me weeks of research and use of the publication’s resources to write, and because it didn’t fit the format they were looking for (and, frankly, because another, more seasoned writer had a better article on a similar story), the publication scrapped it.


API development sheds light on a new workflow

Making stand-alone scrapers and XML readers seems to be in vogue nowadays. Last week I met Derek Willis, a news API developer for The New York Times, and he made the good point that many online developers working in journalism have had to use APIs and other querying services for their own publications because they are usually not given admin access to the publication’s electronic database.

Like Derek, I want to help journalists eliminate redundancies within their information gathering and distribution models. All of the hard data that many of my hard-working colleagues gather (e.g. names, ages, dates) should be stored at the most granular level within a relational or hierarchical database, and it should be easily reused and accessible to everyone, not just journalists. Practical sorting of individual articles, for example by relevance, relies on an article’s metadata, the most accurate of which is derived from the article’s most minor elements.
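
To make the idea of granular storage a little more concrete, here is a minimal sketch in Python using the standard sqlite3 module. Everything in it is an assumption made for illustration: the table names (article, fact), the column names, the helper articles_mentioning, and the sample rows are all made up and do not describe any publication’s real database.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: each article owns many small "facts"
# (a person's name, an age, a date) stored one per row.
conn.executescript("""
CREATE TABLE article (
    id       INTEGER PRIMARY KEY,
    headline TEXT NOT NULL
);
CREATE TABLE fact (
    id         INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES article(id),
    kind       TEXT NOT NULL,   -- e.g. 'person', 'age', 'date'
    value      TEXT NOT NULL
);
""")

# Made-up sample data.
conn.executemany(
    "INSERT INTO article (id, headline) VALUES (?, ?)",
    [(1, "Artist opens downtown gallery"),
     (2, "Gallery show extended through spring")],
)
conn.executemany(
    "INSERT INTO fact (article_id, kind, value) VALUES (?, ?, ?)",
    [(1, "person", "Jane Doe"),
     (1, "date", "2010-03-01"),
     (2, "person", "Jane Doe"),
     (2, "date", "2010-04-15")],
)


def articles_mentioning(kind, value):
    """Return headlines of every article that carries the given fact."""
    rows = conn.execute(
        "SELECT a.headline FROM article a "
        "JOIN fact f ON f.article_id = a.id "
        "WHERE f.kind = ? AND f.value = ? "
        "ORDER BY a.id",
        (kind, value),
    )
    return [row[0] for row in rows]


print(articles_mentioning("person", "Jane Doe"))
```

Because each name or date lives in its own row rather than inside a block of article text, the same fact can be queried, reused, or exposed through an API without re-reading the article itself.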

Semantic relevance and news web archives

Last semester I contributed to a report for my Applied Research in Content Management class wherein each student identified a problem with TheNewsHouse, a student-run news site, and proposed a solution. For my section of the report, The NewsHouse Optimization (Fall 2009), I focused on the importance of interlinking news items by relevance rather than chronology, offering it as a way to draw readers further into the site’s content.

TheNewsHouse, which runs on a Drupal CMS, uses many tools to interlink stories, such as related tags and bylines. When a reader clicks on one of these, however, the site ranks the stories that match the tag or keyword by date published rather than by strict relevance. Individual stories also link not to news items with similar information but to similar bylines, and according to TheNewsHouse’s analytics, non-bouncing visitors tend to concentrate within these byline-linked stories.
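
The difference between the two orderings can be sketched in a few lines of Python. This is not how Drupal or TheNewsHouse actually implements related links; it is a rough illustration that assumes each story carries a set of tags and uses Jaccard overlap of those tags as a stand-in for "relevance." The Story class, the function names, and the sample stories are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Story:
    title: str
    published: date
    tags: frozenset


def tag_overlap(a, b):
    """Jaccard similarity of two stories' tag sets (0.0 to 1.0)."""
    union = a.tags | b.tags
    if not union:
        return 0.0
    return len(a.tags & b.tags) / len(union)


def _matches(current, stories):
    """Stories other than `current` that share at least one tag with it."""
    return [s for s in stories if s is not current and s.tags & current.tags]


def related_by_date(current, stories):
    """Chronological ordering: any shared tag qualifies, newest first."""
    return sorted(_matches(current, stories),
                  key=lambda s: s.published, reverse=True)


def related_by_relevance(current, stories):
    """Relevance ordering: highest tag overlap first, newest as tie-breaker."""
    return sorted(_matches(current, stories),
                  key=lambda s: (tag_overlap(current, s), s.published),
                  reverse=True)


# Made-up sample stories.
current = Story("Council approves budget", date(2009, 11, 2),
                frozenset({"city council", "budget", "taxes"}))
hearing = Story("Budget hearing draws crowd", date(2009, 10, 1),
                frozenset({"city council", "budget", "taxes", "hearing"}))
clerk = Story("Council honors retiring clerk", date(2009, 11, 20),
              frozenset({"city council", "retirement"}))
sports = Story("Football roundup", date(2009, 11, 25),
               frozenset({"sports"}))
stories = [current, hearing, clerk, sports]

by_date = related_by_date(current, stories)
by_relevance = related_by_relevance(current, stories)
print([s.title for s in by_date])
print([s.title for s in by_relevance])
```

In this made-up sample, the chronological list puts the retiring-clerk story first simply because it is newer, while the relevance list leads with the budget hearing, which shares three of its four tags with the story being read.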

To ensure the maximum potential viewing of their content, news sites such as TheNewsHouse must conform to what users have come to expect. As I mentioned in my previous post, Google and other powerful search engines now retrieve news and information that is more and more relevant to the user’s particular interest. To cater to these particular interests, the news site’s management must optimize the linking of stories by associating news items with other news items that are as semantically relevant as possible.