The Problem
Given an HTML file, extract all hyperlinks to mp3 files.
In TagSoup
    [mp3 | TagOpen "a" atts <- parseTags txt
         , ("href",mp3) <- atts
         , takeExtension mp3 == ".mp3"]
The code is a list comprehension. The first line uses TagSoup to parse the text and picks every opening "a" tag. The second line picks the "href" attribute from each tag that matched. The final line uses the FilePath library to check that the extension is .mp3.
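The comprehension relies on a property of Haskell generators: when the pattern on the left of `<-` does not match an element, that element is skipped rather than raising an error. The sketch below shows this without needing TagSoup installed. The `Tag` type is a hypothetical stand-in defined only for this example (it is not TagSoup's real type), and `mp3Links` and `sampleTags` are illustrative names.

```haskell
import System.FilePath (takeExtension)

-- Hypothetical stand-in for a parsed tag, just enough for this demo.
-- This is NOT the type exported by Text.HTML.TagSoup.
data Tag
  = TagOpen String [(String, String)]  -- tag name and its attributes
  | TagClose String
  | TagText String

-- Same shape as the comprehension above, but over a hand-built tag list.
mp3Links :: [Tag] -> [String]
mp3Links tags =
  [ mp3
  | TagOpen "a" atts <- tags       -- anything that is not an opening "a" tag is skipped
  , ("href", mp3) <- atts          -- attributes other than "href" are skipped
  , takeExtension mp3 == ".mp3"    -- guard: keep only targets ending in .mp3
  ]

-- Made-up input covering the cases that should and should not match.
sampleTags :: [Tag]
sampleTags =
  [ TagOpen "a" [("class", "dl"), ("href", "song.mp3")]
  , TagText "A song"
  , TagClose "a"
  , TagOpen "img" [("src", "cover.mp3")]
  , TagOpen "a" [("href", "notes.txt")]
  , TagOpen "a" [("href", "music/track.mp3")]
  ]

main :: IO ()
main = mapM_ putStrLn (mp3Links sampleTags)
```

Run against the sample list, this prints song.mp3 and music/track.mp3. The "img" tag is dropped by the first pattern, the "class" attribute by the second, and notes.txt by the extension guard.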
A Complete Program
The above fragment is all the TagSoup logic, but to match exactly the interface of the original code, we can wrap it up like so:
    import System.FilePath
    import System.Environment
    import Text.HTML.TagSoup

    main = do
        -- expects exactly one argument: the path to the HTML file
        [src] <- getArgs
        txt <- readFile src
        -- print each mp3 link on its own line
        mapM_ putStrLn [mp3 | TagOpen "a" atts <- parseTags txt
                            , ("href",mp3) <- atts
                            , takeExtension mp3 == ".mp3"]
Summary
If you want to quickly get a bit of information out of an XML/HTML page, TagSoup may be the answer. It isn't intended to be a complete HTML framework, but it does nicely optimise fairly common patterns of use.
Very nice indeed. I have one more little HTML/XML-related problem to solve. I'll take the tagsoup-route first, I think it offers enough for that problem as well.