Finding the best way to query news items
Listing every country and city that appears in the scraped RSS XML makes for a drop-down or scrolling menu that is far too long to be practical. Also, instead of using a menu to sort stories by format (text, video, audio, etc.), I’d rather use the simple icon key explained in that same post. Thus, menus are a no-go for my project for now.
Another minor issue I’d like to work out is how to present stories that combine formats. For example, I want the link to a story that has text with an embedded video to be marked as having both video and text. So far I’m thinking of using another so-called “code scraper” such as HtmlCleaner, which turns HTML code into plain text.
The advantage of this kind of service is HUGE, since HTML is often neither coded consistently nor well-formed enough to search for metadata structurally (unless you’re a Google crawler). Reducing code to text removes that problem and makes literal expressions much easier to find. For the video-and-text example, I can just look for the beginning of the literal embed code for a YouTube video (i.e. 'embed src="http://www.youtube.com/v').
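To make the idea concrete, here is a minimal sketch of that literal search in Python. It is not what HtmlCleaner does internally. It only illustrates looking for the start of the YouTube embed code in a page's source. The function names (`has_youtube_embed`, `story_formats`) are hypothetical, and the assumption that every scraped story has at least some text is mine.

```python
# Start of the literal YouTube embed code mentioned above.
YOUTUBE_EMBED_PREFIX = 'embed src="http://www.youtube.com/v'


def has_youtube_embed(source):
    """Return True if the page source contains the YouTube embed literal."""
    # The quote may arrive HTML-escaped as &quot;, so normalize it first,
    # and lowercase everything so that EMBED SRC= also matches.
    normalized = source.replace("&quot;", '"').lower()
    return YOUTUBE_EMBED_PREFIX in normalized


def story_formats(source):
    """Return the format labels for a story, e.g. ['video', 'text']."""
    # Assumption: every scraped story carries some text.
    formats = ["text"]
    if has_youtube_embed(source):
        formats.insert(0, "video")
    return formats


if __name__ == "__main__":
    page = '<p>Some story text.</p><embed src="http://www.youtube.com/v/abc123">'
    print(story_formats(page))
```

The labels returned by `story_formats` could then be mapped to the icons in the icon key, so a story with an embedded video gets both the video and the text icon.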