Establishing a more usable online news archive

I apologize for getting too technical in my previous posts. I mainly went on about XML, XSLT and other coding languages to outline for myself what I intend to do for the back end of this project. In my last post, I spoke about the limitations of my method: gathering RSS feeds, sorting them into descriptively named folders and displaying the information in a more organized and aesthetically pleasing way. The main limitation is that I have no starting point for sorting this information. There is no data bank that I know of from which I can draw key words from an article’s headline and lede (such as “Nigeria” or “China”) and use them to sort articles into individual folders with those names.
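For readers curious about the headline-and-lede step, here is a minimal sketch of pulling both out of an RSS 2.0 feed with Python's standard `xml.etree.ElementTree` module. It assumes the lede is stored in each item's `<description>` element, which is common but not guaranteed for every publication. `SAMPLE_FEED` and `extract_items` are hypothetical names, and the feed content is made up for illustration. Python is used only because it is short to read; the same idea carries over to the Java and XSLT tools mentioned in this post.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Nigeria holds election</title>
      <description>Voters in Nigeria went to the polls on Saturday.</description>
    </item>
    <item>
      <title>China trade talks resume</title>
      <description>Negotiators from China met again this week.</description>
    </item>
  </channel>
</rss>"""


def extract_items(feed_xml):
    """Return a list of (headline, lede) pairs from an RSS 2.0 feed string.

    Assumption: the lede is the text of each item's <description> element.
    """
    root = ET.fromstring(feed_xml)
    items = []
    # iter() walks every <item> element anywhere under <rss>.
    for item in root.iter("item"):
        headline = item.findtext("title", default="").strip()
        lede = item.findtext("description", default="").strip()
        items.append((headline, lede))
    return items


for headline, lede in extract_items(SAMPLE_FEED):
    print(headline, "|", lede)
```

The (headline, lede) pairs are exactly the text a keyword-sorting step would scan.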

My professors assure me that this mechanism is more or less built into the coding languages I will be using, and that I can create a robust archive with simple conditional (if this, then that) commands. My limitation lies in my programming ability. I know I can’t build a fully automated system, simply because I can’t pick up a complex programming language like Python and learn it within a week or even a month. The same goes for building a self-teaching semantic aggregator that learns associations between words such as “West Bank” and “Palestinians.” That kind of programming is for trained professionals and enthusiasts, not for code newbies like me.

What I’m doing instead will, however, take a lot of work. Since I plan to work with a fixed number of news items, I will hard code the key terms present in these stories into the scripts I will use to parse the news items and sort them into the different folders. While I cut my teeth on Java, XML and XSLT in one of my other classes, I need to start gathering these terms bit by bit now so I can hard code them into the script later. This method may seem crude and needlessly laborious to the average programmer, but it reflects both my level of expertise and the purpose of my project: taking a static set of archived information and making it more interactive. Besides, real programmers could always improve on this model if it were ever implemented for one or more publications, and I want to encourage that.
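The hard-coded approach described above boils down to a hand-built table of terms plus a few if-then checks. The sketch below is one way that could look, assuming stories arrive as (headline, lede) pairs. The folder names, the terms in `KEY_TERMS`, and the helper names `folders_for` and `sort_items` are all hypothetical placeholders, not the actual list for this project, which would be gathered by hand from the archived stories.

```python
import re
from collections import defaultdict

# Hypothetical, hand-built table: folder name -> key terms that send a story
# to that folder. Every entry is illustrative only.
KEY_TERMS = {
    "Nigeria": ["Nigeria", "Abuja", "Lagos"],
    "China": ["China", "Beijing"],
    "Middle East": ["West Bank", "Palestinians", "Gaza"],
}


def folders_for(headline, lede):
    """Return every folder whose key terms appear in the headline or lede."""
    text = f"{headline} {lede}".lower()
    matches = []
    for folder, terms in KEY_TERMS.items():
        for term in terms:
            # Simple conditional: if this whole word or phrase appears, then
            # file the story here. Word boundaries stop "China" matching "Chinatown".
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
                matches.append(folder)
                break  # one hit is enough for this folder
    # Stories that match no hard-coded term go to a catch-all folder.
    return matches or ["Unsorted"]


def sort_items(items):
    """Group (headline, lede) pairs into a dict of folder name -> headlines."""
    archive = defaultdict(list)
    for headline, lede in items:
        for folder in folders_for(headline, lede):
            archive[folder].append(headline)
    return dict(archive)


stories = [
    ("Nigeria holds election", "Voters went to the polls in Lagos."),
    ("Talks resume", "Palestinians in the West Bank await the outcome."),
    ("Local weather", "Rain is expected this weekend."),
]
print(sort_items(stories))
```

Grouping several terms under one folder is a manual stand-in for the automatic word associations mentioned earlier: the association between “West Bank” and “Palestinians” is written down by hand rather than learned. In a real archive, each list in the resulting dictionary could be written out to a directory of the same name.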
