Toronto Library New Holdable DVDs With Haskell
The Toronto library doesn’t let you put a hold on new DVDs until several months after they are added. A new batch becomes holdable on the 15th of every month, and those DVDs show up on this list. This weekend I was reminded of an idea: write a program that automatically grabs the list and checks Rotten Tomatoes for ratings, to help decide which movies to put on hold. I started writing the program in Haskell, and my progress so far is on my GitHub. What’s missing right now is actually logging in and placing a hold.
Use HTTP.Conduit to fetch web pages
I used http-conduit to grab the HTML source from the Toronto library website. It’s pretty straightforward: install it with cabal install http-conduit, then use the simpleHttp function.
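import Network.HTTP.Conduit (simpleHttp)
import qualified Data.ByteString.Lazy as L

main :: IO ()
main = do
  -- newMoviesURL is a String holding the address of the list page, defined elsewhere
  content <- simpleHttp newMoviesURL
  -- content is a lazy ByteString containing the response body
  L.putStr content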
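simpleHttp throws an HttpException when the URL is malformed or the request fails, so a scheduled job may want to catch that instead of crashing. The following is only a sketch of one way to do it: fetchPage is a hypothetical helper name, and the URL is a placeholder rather than the real list address.

```haskell
import Control.Exception (try)
import qualified Data.ByteString.Lazy as L
import Network.HTTP.Conduit (HttpException, simpleHttp)

-- Hypothetical helper: fetch a page and return the exception as a value
-- (Left) instead of letting it terminate the program.
fetchPage :: String -> IO (Either HttpException L.ByteString)
fetchPage url = try (simpleHttp url)

main :: IO ()
main = do
  -- Placeholder URL (an assumption); substitute the real list address.
  result <- fetchPage "http://example.com/"
  case result of
    Left err   -> putStrLn ("Fetch failed: " ++ show err)
    Right body -> putStrLn ("Fetched " ++ show (L.length body) ++ " bytes")
```

Returning Either keeps the decision about retrying or giving up with the caller.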
Use regex-tdfa for regular expressions
Whenever I needed a regex to extract data from the HTML source, I used regex-tdfa’s =~ function.
Example:
import qualified Data.ByteString.Lazy as L
import Text.Regex.TDFA ((=~))

updated :: L.ByteString -> L.ByteString
updated s = if length matches > 0
              then last $ head matches
              else L.empty
  where matches = s =~ "<h3[^>]*>Updated (.*)</h3>"
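As a small variation on the function above, the capture group can be returned as a Maybe, so that "no match" is distinguishable from an empty string. This is a sketch, not the code from the repository: firstGroup is a hypothetical name and the sample HTML is made up. When the result type is [[ByteString]], each inner list holds the whole match followed by its capture groups.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L8
import Text.Regex.TDFA ((=~))

-- Hypothetical helper: the text captured by the first group of the first
-- match, or Nothing when the pattern does not match at all.
firstGroup :: L8.ByteString -> String -> Maybe L8.ByteString
firstGroup s pat =
  case (s =~ pat :: [[L8.ByteString]]) of
    ((_whole : g : _) : _) -> Just g
    _                      -> Nothing

main :: IO ()
main = do
  -- Made-up sample input, not taken from the library site.
  let html = L8.pack "<h3 class=\"date\">Updated July 15</h3>"
  print (firstGroup html "<h3[^>]*>Updated (.*)</h3>")
```

Pattern matching on the result also avoids the partial functions head and last.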
Use Data.Aeson for parsing JSON
To parse the Rotten Tomatoes JSON API data, I used the aeson package. Install it with cabal install aeson. Below is how I mapped the JSON result to what I need.
First the declarations:
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (FromJSON, ToJSON, decode, encode)
import qualified Data.ByteString.Lazy as L
import GHC.Generics (Generic)

-- Field names match the JSON keys, so the Generic-derived instances line up.
data RTCast = RTCast {
    name :: L.ByteString
  } deriving (Show, Generic)

data RTMovie = RTMovie {
    year :: Int
  , ratings :: RTRatings
  , abridged_cast :: [RTCast]
  } deriving (Show, Generic)

data RTInfo = RTInfo {
    movies :: [RTMovie]
  } deriving (Show, Generic)

-- RTRatings is declared the same way (a record deriving Generic); its declaration is omitted here.
instance FromJSON RTInfo
instance FromJSON RTMovie
instance FromJSON RTCast
instance FromJSON RTRatings
Then, to actually decode (inside a do block, where content is the response body):
let rt = decode content :: Maybe RTInfo
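decode returns Nothing when the JSON does not fit the declared types, so the result has to be unwrapped before use. Here is a self-contained sketch of that step with a made-up response. The types are simplified stand-ins, and the title and critics_score fields and the worthHolding helper are assumptions for illustration. It uses Text for strings because, as far as I know, recent aeson releases do not provide a FromJSON instance for ByteString.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (FromJSON, decode)
import qualified Data.ByteString.Lazy.Char8 as L8
import Data.Text (Text)
import GHC.Generics (Generic)

-- Simplified stand-in types; field names are assumed to match the JSON keys.
data Ratings = Ratings { critics_score :: Int } deriving (Show, Generic)
data Movie = Movie { title :: Text, year :: Int, ratings :: Ratings }
  deriving (Show, Generic)
data Info = Info { movies :: [Movie] } deriving (Show, Generic)

instance FromJSON Ratings
instance FromJSON Movie
instance FromJSON Info

-- Hypothetical helper: titles of movies whose score meets the threshold.
worthHolding :: Int -> Info -> [Text]
worthHolding minScore info =
  [ title m | m <- movies info, critics_score (ratings m) >= minScore ]

main :: IO ()
main = do
  -- Made-up sample response, not real API output.
  let sample = L8.pack
        "{\"movies\":[{\"title\":\"A\",\"year\":2012,\"ratings\":{\"critics_score\":91}},\
        \{\"title\":\"B\",\"year\":2011,\"ratings\":{\"critics_score\":40}}]}"
  case (decode sample :: Maybe Info) of
    Nothing   -> putStrLn "Could not parse the response"
    Just info -> print (worthHolding 70 info)
```

Keys in the JSON that have no matching record field are ignored by the derived instances, so the records only need the fields the program actually uses.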