Shepherd

Shepherd is an attempt to reconcile the output of many different tv_grab_au scripts into one cohesive, reliable data set.

It works by calling a series of scripts that grab data from a wide variety of sources, analysing the resulting XML data sets, and determining which of them is the most reliable. Postprocessors are then used to augment the data with additional information (e.g. movie information from http://www.imdb.com, HDTV programming from http://www.dba.org.au, etc.).

When switching between data sources, Shepherd's reconciler also tries to keep programme names consistent. For example, if you're used to recording a programme called "House" but a different data source names it "House, M.D.", Shepherd remembers the original name and substitutes it. No configuration is necessary to enable this; it happens automatically.
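
The name substitution described above can be illustrated with a minimal sketch. This is not Shepherd's actual reconciler code: the `TitleMemory` class, the `learn_from_overlap` helper and the idea of learning aliases from two guides that cover the same channel and start time are all assumptions made for illustration.

```python
class TitleMemory:
    """Hypothetical store mapping title variants back to the first name seen."""

    def __init__(self):
        # lower-cased title variant -> preferred (original) title
        self.preferred = {}

    def learn(self, known_title, other_title):
        """Record that other_title is another name for known_title."""
        preferred = self.preferred.get(known_title.lower(), known_title)
        self.preferred.setdefault(known_title.lower(), preferred)
        self.preferred.setdefault(other_title.lower(), preferred)

    def translate(self, title):
        """Return the remembered name, or the title unchanged if it is unknown."""
        return self.preferred.get(title.lower(), title)


def learn_from_overlap(memory, old_guide, new_guide):
    """Assumption: each guide maps (channel, start) -> title.

    When both guides describe the same slot with different titles,
    treat the new title as an alias of the old one.
    """
    for slot, old_title in old_guide.items():
        new_title = new_guide.get(slot)
        if new_title and new_title != old_title:
            memory.learn(old_title, new_title)


memory = TitleMemory()
learn_from_overlap(
    memory,
    {("TEN", "2008-01-01T20:30"): "House"},
    {("TEN", "2008-01-01T20:30"): "House, M.D."},
)
print(memory.translate("House, M.D."))
```

With a mapping like this, a recording rule set up for "House" keeps matching even after the guide data switches to a source that uses the longer title.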

Shepherd is designed to be future-proof: once installed and configured, it should never require manual intervention. Shepherd automatically updates itself with fixes, enhancements and additional plugin components as they become available.

How it works

Shepherd is made up of multiple scripts, each with its own function:

  • shepherd (tv_grab_au) (source:trunk/applications/shepherd) manages the process of checking for updates, parsing options, maintaining centralized logging and configuration, and calling individual grabbers, reconcilers and postprocessors:
    • it decides which grabbers to call (and in what order),
    • it decides whether additional grabbers need to be called to fill in any missing data,
    • it remembers which grabbers it has used in the past, and whether they can fulfill data requirements with minimal use of resources (it knows if a grabber uses cached data and can therefore verify data at relatively minimal cost),
    • once there is sufficient guide data, it calls a reconciler to distill any overlapping data down to a consistent form (e.g. multiple grabbers may have provided data for the same channel on the same day),
    • postprocessors are then called to augment the guide data in some manner.
    • At each stage (grabbing, reconciling, postprocessing), the XMLTV data is inspected to ensure it is still valid (and complete). Any corrupt or bogus data is discarded, so that the failure of an individual component won't result in bad data.
    • Analysis and processing of data in the grabbing and reconciling stages can be tuned via various policy directives.
  • multiple reconciler scripts (source:trunk/reconcilers/) are used to reconcile overlapping data. Typically only one reconciler is used at a time, although multiple reconcilers are supported so that another can take over if one fails. If no reconciler works, shepherd falls back to basic concatenation of grabber data.
  • multiple postprocessor scripts (source:trunk/postprocessors/) are used to postprocess the output data. Postprocessors typically augment data in some way, e.g. an Internet Movie Database postprocessor may augment movie data by adding plot / cast / credits / ratings / trivia data.
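
The overall flow described in the list above (grab until coverage is sufficient, reconcile with a fallback to concatenation, then postprocess, validating after every stage) can be sketched roughly as follows. This is an illustrative outline only, not Shepherd's implementation: the function names, the list-of-dicts guide format and the simple `is_valid` check are all assumptions, and the real scripts exchange XMLTV files rather than Python objects.

```python
def is_valid(programmes):
    """Hypothetical check: data is non-empty and every entry has the basic fields."""
    required = {"channel", "start", "title"}
    return bool(programmes) and all(required.issubset(p) for p in programmes)


def missing_slots(wanted, datasets):
    """Return the (channel, day) pairs in `wanted` that no data set covers yet."""
    covered = {(p["channel"], p["start"][:10]) for data in datasets for p in data}
    return wanted - covered


def concatenate(datasets):
    """Fallback when no reconciler works: join the grabber data as-is."""
    return [p for data in datasets for p in data]


def run_pipeline(wanted, grabbers, reconcilers, postprocessors):
    # Stage 1: call grabbers in order until nothing is missing.
    datasets = []
    for grab in grabbers:
        still_missing = missing_slots(wanted, datasets)
        if not still_missing:
            break
        try:
            data = grab(still_missing)
        except Exception:
            continue  # a failing grabber is skipped, not fatal
        if is_valid(data):
            datasets.append(data)

    # Stage 2: use the first reconciler that produces valid output.
    guide = None
    for reconcile in reconcilers:
        try:
            candidate = reconcile(datasets)
        except Exception:
            continue
        if is_valid(candidate):
            guide = candidate
            break
    if guide is None:
        guide = concatenate(datasets)

    # Stage 3: keep a postprocessor's output only if it is still valid.
    for postprocess in postprocessors:
        try:
            candidate = postprocess(guide)
        except Exception:
            continue
        if is_valid(candidate):
            guide = candidate
    return guide
```

The key design point this sketch tries to capture is that every component is treated as untrusted: output that fails validation is dropped and the previous good data is carried forward.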

While shepherd and its associated scripts and their interactions may be somewhat complex, running shepherd should not be difficult. Initial installation requires little more than downloading one script, installing a handful of dependencies and running the script. All scripts are then installed in one common tree, which Shepherd maintains automatically. Configuration of global items such as channels and XMLTV IDs should only be necessary once.
