Posts Tagged ‘ XML ’

Closing Remarks

As this project comes to a close, what I’ve learned about organizing and presenting news items using a common language like XML has reinforced what I discovered last semester about online content. Most of these discoveries distill down to this bit of advice:

Media companies, don’t EVER trash your content.

At one point in my career, I contributed an article to the publication I was working for at the time. Nothing fancy, just an interview with an international artist. The publication killed the story, and I never saw it again. I’d publish the story online myself, but the article still legally belongs to the publication. After all, I would not have scored the exclusive interview I needed for it without their help. I don’t think the publication should publish it or post it online to protect my feelings, though that might become a concern with bigger-egoed writers. I don’t care either way. This article, however, took me weeks of research and use of the publication’s resources to write, and because it didn’t fit the format they were looking for (and, frankly, because another, more seasoned writer had a better article on a similar story), the publication scrapped it.

Finding the best way to query news items

The XSLT code for the site is done! Now I am focusing more on different ways the user can search for different news items. As mentioned in a previous post, I’ve been looking at using a multi-level JavaScript drop-down menu, a search box or a combination of the two.

Trying to present every country and city that appears in the scraped RSS XML makes for a drop-down or scrolling menu that is far too long to be practical. Also, instead of using a menu to sort stories by format (text, video, audio, etc.), I’d rather use the simple icon key explained in that same post. Thus, menus for my project are a no-go for now.

Another minor issue I’d like to work out is how to present overlapping file types. For example, I want to be able to present the link to a story that has text with an embedded video as having both video and text. So far I’m thinking of using another so-called “code scraper” such as HtmlCleaner, which tidies messy HTML into well-formed markup that is easier to inspect.
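To make the overlapping-formats idea concrete, here is a rough sketch in Python. It uses the standard library’s `html.parser` as a stand-in for HtmlCleaner, and the `FormatDetector` class, the `detect_formats` function and the tag-to-format mapping are all hypothetical names invented for illustration, not part of the project.

```python
from html.parser import HTMLParser


class FormatDetector(HTMLParser):
    """Collects the formats found in one story's HTML."""

    # Assumed mapping from HTML tag names to format labels.
    MEDIA_TAGS = {"video": "video", "audio": "audio"}

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this method.
        if tag in self.MEDIA_TAGS:
            self.formats.add(self.MEDIA_TAGS[tag])

    def handle_data(self, data):
        # Any non-whitespace character data counts as text.
        if data.strip():
            self.formats.add("text")


def detect_formats(html):
    """Return a sorted list of every format present in the HTML."""
    detector = FormatDetector()
    detector.feed(html)
    detector.close()
    return sorted(detector.formats)


story = '<p>Story text</p><video src="clip.mp4"></video>'
print(detect_formats(story))
```

A story with both a paragraph and a video tag would get both labels, so it could be shown with both icons. Real pages often embed video through other tags, so the mapping would need to grow.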

Making it work

Each format may contain one or more of these icons

I’ve been plugging away at the XSL loop for a solid week now. I think I’ve nearly exhausted all that XSLT can do for this project, which I am mainly using to call information from a massive database when certain conditions are met. For example, I was able to make it so that when “Israel” is selected as the location, all of the stories involving Israel show up with the news item’s headline (wrapped in the story’s permalink) above the publication date, the lede and the icons corresponding to the news item’s format (text, audio, video and commentary).
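The location filter described above can be sketched outside XSLT as well. The Python below is only an illustration: the element names (`item`, `headline`, `link`, `pubDate`, `lede`, `location`, `format`), the sample data and the `stories_for` function are assumptions, not the project’s actual schema or stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real XML file's structure may differ.
SAMPLE = """
<news>
  <item>
    <headline>Talks resume</headline>
    <link>http://example.com/1</link>
    <pubDate>2009-04-01</pubDate>
    <lede>Negotiators met on Tuesday.</lede>
    <location>Israel</location>
    <format>text</format>
    <format>video</format>
  </item>
  <item>
    <headline>Election results delayed</headline>
    <link>http://example.com/2</link>
    <pubDate>2009-04-02</pubDate>
    <lede>Officials cited logistics.</lede>
    <location>Nigeria</location>
    <format>text</format>
  </item>
</news>
"""


def stories_for(root, place):
    """Return display data for every item tagged with the given location."""
    results = []
    for item in root.findall("item"):
        locations = [loc.text for loc in item.findall("location")]
        if place in locations:
            results.append({
                "headline": item.findtext("headline"),
                "link": item.findtext("link"),
                "date": item.findtext("pubDate"),
                "lede": item.findtext("lede"),
                "formats": [f.text for f in item.findall("format")],
            })
    return results


root = ET.fromstring(SAMPLE.strip())
for story in stories_for(root, "Israel"):
    print(story["headline"], story["link"], story["formats"])
```

Because an item may carry several `location` elements, the same story can appear under more than one place.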

Although I include the publication date as part of each item’s presentation, it will have no bearing on the site’s organizing principle. Each item’s story location(s) will also show up as a dot on a world map. Hovering over either a dot or its corresponding news item’s box brings focus to both the news item and the dot (i.e. by fading out all other story items and dots).

Testing…

I’ve been hard at work coding the XML framework, specifically the processor that will generate what the final site will look like, but let me give a quick update on my progress.

Even as an XML newbie, I’ve been able to streamline the workflow I highlighted two posts ago. I’ve opted for a massive, well-constructed XML file instead of a folder structure. The XSLT stylesheet I’m coding now will draw the necessary information from the XML file and place each element in a uniquely labeled div, which I will later style with CSS.

To eliminate any duplicate news items, I’ve turned to the admirable open-source work found at EXSLT.org. This community has created extensions that help simplify overly complicated XSL commands, in this case reducing a complicated template command to the line set:distinct().
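For readers who don’t use XSLT, the deduplication step can be approximated in a few lines of Python. This is a sketch of the general idea only (keep the first occurrence of each value, in order), not a reimplementation of the EXSLT function; the `distinct_items` name and the choice of the permalink as the comparison key are assumptions.

```python
def distinct_items(items, key="link"):
    """Keep the first item seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            unique.append(item)
    return unique


feed = [
    {"headline": "Talks resume", "link": "http://example.com/1"},
    {"headline": "Talks resume (updated)", "link": "http://example.com/1"},
    {"headline": "Election results delayed", "link": "http://example.com/2"},
]
for item in distinct_items(feed):
    print(item["headline"])
```

Comparing on the permalink treats a re-syndicated story as a duplicate even if its headline changed; comparing on the headline instead would make the opposite trade-off.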

I still intend to make a database of locations and file types, but I’m having second thoughts about coding a discrete topic database by hand. The more I think about it, the more I want the topic navigation to be user-defined (i.e. with a search bar) rather than solely defined by me, the developer (i.e. with a drop-down menu with a discrete set of terms).

Useful site for automated semantic tagging

My professor Vin Crosbie, who runs the site Digital Deliverance, turned me on to this open-source semantic solution by the good people at OpenCalais. OpenCalais’s collaborators have created several tools, from a simple standalone API to tagging applications for Drupal and WordPress. The service reads textual and structural information from three different types of files (txt, HTML and XML), renders meta-information on those files and offers possible topics based on keyword matches from OpenCalais’s massive database.

I’m considering OpenCalais as a sorting tool for Parallactic Drift. Believe me, hard coding keywords and conditionals to match specific topics is difficult, but it allows me to create a script that sorts news items down to specific events. While I would lose this level of specificity with OpenCalais, the API seems to have evolved from its initial functions to include more advanced vector calculations for reading word placement within text. The API even diagrams sentences’ grammatical structure to figure out what a sentence says! Just type a word into their demo viewer.

Establishing a more usable online news archive

I apologize for getting too technical in my previous posts. I mainly went on about XML, XSLT and other coding languages to outline for myself what I intend to do for the back-end of this project. In my last post, I spoke about the limitations of my method of gathering RSS feeds, sorting them into descriptively named folders and displaying the information in a more organized and aesthetically pleasing way. The limitations are basically that I have no starting point for sorting this information. There is no data bank that I know of from which I can draw a bunch of keywords found in an article’s headline and lede (such as “Nigeria” or “China”) and use them to file the article in individual folders with those particular names.

My professors assure me that this mechanism is sort of innate to the coding languages I will be using, and that I can create a robust archive with simple conditional (if this, then that) commands. My limitations lie in my programming capabilities. I know I can’t create an automated system simply because I can’t pick up a complex programming language like Python and learn it within a week or even a month. This same principle goes for making a heuristic, or self-teaching, semantic aggregator that makes associations between words such as “West Bank” and “Palestinians.” This kind of programming is for the trained professionals and enthusiasts, not for code newbies like me.
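The “if this, then that” approach the professors describe might look something like the sketch below. Everything in it is hypothetical: the `TOPIC_KEYWORDS` table, its entries and the `assign_topics` function are made up for illustration, and a real archive would need a far longer, hand-maintained keyword list.

```python
# Hand-coded keyword table (an assumption, not the project's actual list).
# Grouping "west bank" and "palestinians" under one topic is the manual
# stand-in for the associations a self-teaching aggregator would learn.
TOPIC_KEYWORDS = {
    "Nigeria": ["nigeria", "lagos"],
    "China": ["china", "beijing"],
    "Israel/Palestine": ["israel", "west bank", "palestinians"],
}


def assign_topics(headline, lede):
    """Return every topic whose keywords appear in the headline or lede."""
    text = f"{headline} {lede}".lower()
    topics = []
    for topic, keywords in TOPIC_KEYWORDS.items():
        # If any keyword is present, then file the story under this topic.
        if any(keyword in text for keyword in keywords):
            topics.append(topic)
    return topics


print(assign_topics("Clashes in the West Bank", "Palestinians protested on Friday."))
```

Plain substring matching is crude (it cannot tell a country from a word that merely contains its name), which is part of why hand-built keyword lists only go so far.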

Semantic news aggregation before it was cool

I’ve been looking for this site for months now. We Feel Fine is a part-art design, part-sociological, part-semantic web experiment by visual artist/developer Jonathan Harris and computational scientist Sep Kamvar.

We Feel Fine scours the ‘net and in seconds provides the most-used terms, phrases and sentences that start with the words “I feel.” The different interactive ways they use to explore this simple concept and present the data are downright beautiful.

Oh and look mah, no Flash! Instead of Flash, the site employs database-querying PHP scripts in a custom-made applet. The database is instantly populated with information that links to the original source. Harris uses a similar applet in another project he made for DayLife called Universe: Revealing our Modern Mythology. This type of development allows for much more semantically labeled and organized information than Flash tools do. Kamvar himself has worked in the field of retrieving information from poorly meta-labeled data, so this practice makes sense to both developers.