Despite the long overdue ejection of Alan Hansen and Mark Lawrenson from the Match of the Day couch, I still find myself thinking the same thing every time I tune in. Why do they bother with the punditry?
The current state of Match of the Day punditry places the show uncomfortably between the enlightening discussions of Sky’s Monday Night Football and a pure highlights show. As a result, viewers are deprived of the full benefits of either and instead tune into a bloated mess. I seriously doubt there’s anyone watching the highlights at home thinking ‘Gee, I sure can’t wait to hear Alan Shearer’s opinion on that penalty call!’ (and not just because the usage of ‘Gee’ hasn’t really recovered since the heady days of 1939).
So instead of continuing to complain, I wrote a small Python module that trims the unwanted fluff from a video of Match of the Day, leaving only the highlights of actual football.
After some initial (and unsuccessful) experiments, I settled on a method for identifying the portions of the video in which highlights are playing. Whenever highlights are on screen, the top left corner displays a scoreboard:
This scoreboard remains in place even during close-ups of players, coaches and members of the crowd (something an alternative method based on frame ‘greenness’ had difficulty with). Meanwhile, shots of the post-match discussion in the studio (including analysis of highlights) do not show the scoreboard (another thing the aforementioned naive approach failed at).
As a result, we can use this to identify which parts of Match of the Day we want to keep and actually watch. But how do we identify whether the scoreboard is showing?
Thankfully, if you’re lazy like me, the scikit-image documentation has a helpful example on corner detection that you can steal (I’ve also put a demo on GitHub). By taking a rolling average of the number of corner peaks detected in the top left corner of the screen (where the scoreboard sits during highlights), we can see a clear distinction between the match highlights and the analysis segments:
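A minimal sketch of the per-frame count and the smoothing, assuming frames arrive as RGB NumPy arrays (for example from moviepy’s `VideoFileClip.iter_frames`). It uses scikit-image’s `corner_harris` and `corner_peaks`, as in the library’s corner detection example; the function names, the crop fractions, `min_distance`, `threshold_rel` and the rolling window size are illustrative guesses, not the values from the actual module.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import corner_harris, corner_peaks

# Hypothetical crop: the fraction of the frame assumed to hold the scoreboard.
# These numbers are guesses and would need tuning to the real broadcast layout.
CROP_HEIGHT = 0.15
CROP_WIDTH = 0.30


def count_scoreboard_corners(frame, min_distance=5, threshold_rel=0.1):
    """Count Harris corner peaks in the top-left region of a frame.

    frame: H x W x 3 RGB array (or H x W greyscale array).
    min_distance, threshold_rel: illustrative values passed to corner_peaks.
    """
    height, width = frame.shape[:2]
    region = frame[: int(height * CROP_HEIGHT), : int(width * CROP_WIDTH)]
    # Harris works on a single channel, so convert colour frames to greyscale.
    gray = rgb2gray(region) if region.ndim == 3 else region.astype(float)
    response = corner_harris(gray)
    # corner_peaks returns one (row, col) coordinate per detected corner.
    peaks = corner_peaks(response, min_distance=min_distance,
                         threshold_rel=threshold_rel)
    return len(peaks)


def rolling_mean(counts, window=25):
    """Centred moving average of per-frame corner counts.

    window is measured in sampled frames (25 is an illustrative guess) and
    assumes the series is at least that long. np.convolve pads with zeros,
    so values near both ends of the show are pulled down slightly.
    """
    counts = np.asarray(counts, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(counts, kernel, mode="same")
```

A blank crop yields no corners while a bright rectangle (a stand-in for the scoreboard) yields several, which is the contrast the rolling average then makes visible across the whole show.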
We can then identify the start and end points of each match’s highlights by looking at when the number of peaks rises above or falls below the show-long average*. All that’s left to do then is add some finishing touches like fading in and out, made easy by the moviepy Python package.
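A minimal sketch of that last step, assuming the smoothed counts were sampled at a known rate. `find_highlight_segments` and `trim_video` are hypothetical names, not functions from the actual module, and the strict “above the mean” comparison, the optional minimum length and the one-second fade are illustrative choices.

```python
def find_highlight_segments(smoothed, fps, min_length=0.0):
    """Return (start, end) times in seconds for runs where the smoothed
    corner count sits above its show-long mean.

    smoothed: smoothed corner counts, one per sampled frame.
    fps: how many samples were taken per second of video.
    min_length: optional, hypothetical filter (in seconds) for dropping
        short blips; 0.0 keeps every run, matching the plain mean split.
    """
    mean = sum(smoothed) / len(smoothed)
    runs = []
    start = None
    for i, value in enumerate(smoothed):
        if value > mean and start is None:
            start = i  # rose above the mean: highlights begin
        elif value <= mean and start is not None:
            runs.append((start, i))  # fell back below: highlights end
            start = None
    if start is not None:
        runs.append((start, len(smoothed)))  # the show ended mid-run
    return [(s / fps, e / fps) for s, e in runs if (e - s) / fps >= min_length]


def trim_video(in_path, out_path, segments, fade=1.0):
    """Keep only the given (start, end) segments, fading each in and out.

    Written against the moviepy 1.x API (moviepy.editor, subclip,
    fadein/fadeout); moviepy 2.x renamed these methods. The import is kept
    inside the function so the segmentation code above runs without moviepy.
    """
    from moviepy.editor import VideoFileClip, concatenate_videoclips

    clip = VideoFileClip(in_path)
    # fadein/fadeout act on the picture only; the audio is left unfaded.
    parts = [clip.subclip(start, end).fadein(fade).fadeout(fade)
             for start, end in segments]
    concatenate_videoclips(parts).write_videofile(out_path)
    clip.close()
```

With counts sampled once per second, usage would look like `trim_video("motd.mp4", "motd_trimmed.mp4", find_highlight_segments(smoothed, fps=1))`, where both file names are placeholders.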
Somehow, this pretty crude solution works really well. The module was developed and ‘trained’ using the last episode of Match of the Day from the 2014/15 season, and I’ve since successfully tested it on the last two weeks’ episodes. Despite starting out as a silly experiment, it’s something I’m actually considering using for the foreseeable future. There’s also the side benefit of time saved: this week’s episode goes from 1 hour 30 minutes down to 50 minutes after trimming. So long as Match of the Day don’t remove the scoreboard from the top left corner, it should continue to work, and you can go from this (plus another few minutes of Shearer & co):
… to this:
(I should also apologise for the horrible quality of the GIFs; the actual output video quality is much higher.)
If you’re interested in trying this out yourself or looking at the method in more detail, I’ve put it up on GitHub here.
* There are more sophisticated ways to cluster 1D series, but none of the ones I tried performed significantly better than this simple method. If you know of a method you think is particularly suited to this kind of thing, I’d be really interested to hear about it, so tweet at me or something.
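For illustration, one of the more sophisticated options is a two-cluster split: 1D k-means with k = 2, the same idea as the iterative isodata (Ridler-Calvard) threshold. The show-long mean drifts towards whichever level takes up more of the running time, whereas this places the cut-off midway between the typical ‘highlights’ and ‘studio’ levels. It is a hypothetical sketch, not something the module uses, and it makes no claim to beat the mean on real episodes.

```python
def two_means_threshold(values, max_iter=100):
    """Threshold a 1D series by clustering it into two groups (k-means, k=2).

    The two centroids start at the series minimum and maximum; each pass
    splits the values at the midpoint between the centroids and moves each
    centroid to the mean of its group, until the centroids stop changing.
    """
    low_c, high_c = min(values), max(values)
    for _ in range(max_iter):
        midpoint = (low_c + high_c) / 2
        low = [v for v in values if v <= midpoint]
        high = [v for v in values if v > midpoint]
        if not low or not high:
            break  # constant series: nothing to split
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (low_c, high_c):
            break  # converged
        low_c, high_c = new_low, new_high
    return (low_c + high_c) / 2
```

For a series that is eight parts studio (0) to two parts highlights (10), the mean is 2 while this returns 5; its return value could stand in for the show-long average as the cut-off.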