Operational difficulties of querying semantic information

In order to implement an efficient system of organizing news items, content providers must label information in a common way within each platform, be it RSS, blogs or web sites. Standards do in fact exist for XML tagging on news sites. Several standards bodies (including the W3C and the IPTC, which maintains NewsML) work to ensure that a single format is followed, so that information flows freely between publications and reaches more users.

Perhaps it’s because this principle of “free-flowing information” runs counter to how traditional publications share news items, but inconsistent tagging styles have kept any RDF standard from taking hold across publications. Visual designs can remain idiosyncratic, as they should, but the semantic tagging of information in HTML and XML should not deviate too much from an agreed-upon standard.

For my project, I plan on using RSS to populate my sample XML framework. This weekend I realized that the query module and raw XML call for a comprehensive server-side approach: a script in Python or a similar language that grabs new RSS feeds on a scheduled basis, after which an XSL batch processor populates a doctree according to the XML tags in those files.
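As a rough illustration of the scheduled feed-grabbing step, here is a minimal Python sketch that uses only the standard library. The function names (`fetch_feed`, `parse_items`, `poll`) and the one-hour default interval are my own assumptions, not part of any existing framework, and the parser assumes plain RSS 2.0 `<item>` elements. A real deployment would more likely hand the scheduling to cron than to a sleeping loop.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url, timeout=10):
    """Download one RSS feed and return its raw XML bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_items(rss_xml):
    """Return one dict per <item> found in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "pubDate": item.findtext("pubDate", default="").strip(),
            # An item may carry several <category> tags; keep them all.
            "categories": [c.text.strip() for c in item.findall("category") if c.text],
        })
    return items


def poll(urls, interval_seconds=3600, rounds=1, fetch=fetch_feed):
    """Fetch every feed `rounds` times, sleeping between rounds.

    `fetch` is injectable so the loop can be exercised without a network.
    """
    collected = []
    for round_number in range(rounds):
        for url in urls:
            collected.extend(parse_items(fetch(url)))
        if round_number < rounds - 1:
            time.sleep(interval_seconds)
    return collected
```

Calling `poll` with a list of feed URLs returns the parsed items from all of them, which could then be handed to the batch-processing step.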

I don’t think I can learn Python in a few weeks; otherwise this feature would definitely be included in my framework by the end of this semester. The point of the project is to show how news companies can optimize their archives. The ways to integrate the latest content vary widely with each online publication’s CMS.

I do hope, however, to build a news XML framework versatile enough to work in a number of situations, which may include the relational database behind a news organization’s CMS or a social networking site. Since I can’t set up a populating system quite yet, I must focus on making the XSL batch processing script work with a fixed set of XML files from four different news sites. This script will break up each news item from the RSS files into its own XML file within a semantically labeled folder structure, deleting or overwriting duplicate news items.
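To make the splitting step concrete, here is a minimal Python sketch of the same idea; it is a stand-in for the XSL batch script, not the script itself. The folder layout (`<source>/<category>/<id>.xml`), the helper names, and the choice to derive the file name from a hash of the item's `guid` (falling back to `link`, then `title`) are all assumptions made for illustration.

```python
import hashlib
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def slugify(text, fallback="uncategorized"):
    """Turn a label into a lowercase, hyphenated folder name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or fallback


def split_feed(rss_xml, source_name, out_root):
    """Write each <item> to out_root/<source>/<category>/<id>.xml.

    The file name is a hash of the item's identity, so a duplicate item
    in the same category overwrites the earlier copy.
    """
    root = ET.fromstring(rss_xml)
    written = []
    for item in root.iter("item"):
        identity = (item.findtext("guid") or item.findtext("link")
                    or item.findtext("title") or "")
        file_id = hashlib.sha1(identity.strip().encode("utf-8")).hexdigest()[:16]
        # Only the first <category> decides the folder in this sketch.
        category = slugify(item.findtext("category") or "")
        folder = Path(out_root) / slugify(source_name, "unknown-source") / category
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{file_id}.xml"
        ET.ElementTree(item).write(str(path), encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written
```

Because the overwrite depends on the folder as well as the hash, an item that is republished under a different category would land in a second folder; catching that case would need a lookup across folders, which this sketch leaves out.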

The XSLT will then transform these files into properly tagged HTML code, which I can then style with CSS to show different locations on a map and the linked information in modal boxes. With the XML files from these few publications’ RSS feeds, I hope to show how semantic functions can organize news items and neatly present them by geographic, topical and file-type relevance.
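The transformation itself belongs to XSLT, but since Python's standard library has no XSLT processor, the sketch below imitates the output side in plain Python to show what "properly tagged HTML" might look like. The `data-category` and `data-location` attributes, the `news-item` class, and the `location` parameter are hypothetical names chosen for illustration; RSS 2.0 has no standard location field, so the location is passed in by the caller.

```python
import html
import xml.etree.ElementTree as ET


def item_to_html(item_xml, location=""):
    """Render one <item> document as an HTML fragment with semantic hooks."""
    item = ET.fromstring(item_xml)
    title = item.findtext("title", default="").strip()
    link = item.findtext("link", default="").strip()
    category = item.findtext("category", default="").strip()
    # The data-* attributes give CSS and scripts something to select on,
    # e.g. article[data-location="Austin"].
    return (
        '<article class="news-item" data-category="{cat}" data-location="{loc}">'
        '<h2><a href="{href}">{title}</a></h2>'
        "</article>"
    ).format(
        cat=html.escape(category, quote=True),
        loc=html.escape(location, quote=True),
        href=html.escape(link, quote=True),
        title=html.escape(title),
    )
```

An attribute selector on `data-location` or `data-category` is then enough for a stylesheet to group or position items, which is the role the tagged HTML plays in the map view.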
