The Wayback Machine - https://web.archive.org/web/20141229185553/http://www.bbc.co.uk:80/blogs/aboutthebbc/entries/108fa5e5-cc28-3ea8-b4a0-129912a74efc

Genome – Radio Times archive now live

The first edition of the Radio Times

Genome, the BBC project to digitise the Radio Times magazines published between 1923 and 2009, is now live. On the site you can find BBC broadcast information (‘listings’) extracted from those editions. You can also search by individual programme title, contributor and synopsis.

Our aim on this project is to curate a comprehensive history of every radio and TV programme ever broadcast by the corporation, and make that available to the public. Our first step has been this digitisation of the BBC radio and TV programme schedules from the Radio Times magazine; the next phase of the project is to incorporate what was actually broadcast, as well as the regional and national variations. It’s one of the most important steps we’re taking to begin unlocking the BBC’s archive, as Genome is the closest we currently have to a comprehensive broadcast history of the BBC.

We’re really pleased to get the site live, not least because so many of you have been asking “when”, “how soon” and telling us “how useful it would be”. The challenges in making available the 4.42 million programme records so far have been significant - you can read about some of the recent ones on the Internet blog.

We need your help too though. We’re looking to you to help us to clean up the data. The scanning process - known as ‘Optical Character Recognition’ - has produced plenty of errors: punctuation in the wrong places, spaces where there shouldn’t be any or no spaces where there should, as well as fundamental misunderstandings about who did what.
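To give a flavour of the simpler spacing and punctuation errors, here is a minimal Python sketch of the kind of mechanical tidy-up that could be run before a human looks at a listing. It is an illustration only, not the process Genome uses: the function name `tidy_ocr_text` and the three rules are assumptions, and no rule of this kind can fix a misunderstanding about who did what, which is where your edits matter most.

```python
import re


def tidy_ocr_text(text: str) -> str:
    """Apply a few conservative spacing fixes to OCR output (hypothetical helper)."""
    # Collapse runs of spaces, tabs and newlines into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove stray spaces before punctuation: "News ,read" -> "News,read".
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Add a missing space after punctuation that is directly followed by a letter.
    # Full stops are left alone so abbreviations such as "B.B.C." survive.
    text = re.sub(r"([,;:!?])(?=[A-Za-z])", r"\1 ", text)
    return text


print(tidy_ocr_text("The  News ,read by John Smith ."))
```

The rules are deliberately cautious: times such as "7:30" are untouched because the third rule only fires when a letter follows the punctuation.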

We’ve made it possible for you to submit an edit to us, as you use the site. We’ll validate your suggested changes and publish the ones which are approved.

We’ve also included a ‘Tell Us More’ form at the bottom of each programme listing, so we can tap into the collective memory, insight and knowledge of our users. There is a wealth of experience out there about our programmes, and we’d like to capture it.

We also know that the schedule changed considerably on occasion because of events in the real world, and we need that information too.

Additionally, while building Genome, we’ve identified a few ‘chunks’ of data that are missing from the database because, due to the way in which OCR works, they didn’t get picked up in the original scans. We will be adding these in.

The Radio Times has been published with regional variations since 1926. The magazines we scanned and the data sets which have been included in Genome are not exhaustive; rather, they represent the ones which we could access and which covered the greatest areas and variations. In the future, we will look into the implications of attempting to provide a more complete set of regional data.

We won’t be able to reflect what you send us straight away, but as we build on BBC’s Genome, it will come into its own.

Now that we have published the planned broadcast schedule, our next step is to match the records in our archive catalogue (the programmes that we have a copy of in our physical archives) with the Genome programme listings. This helps us identify what proportion of the broadcasts exist in a potentially ‘playable’ form, and highlights the gaps in our archive.
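As a rough illustration of what that matching step involves, the Python sketch below pairs listings with catalogue records and works out what proportion of listings have a held copy. It is a sketch under stated assumptions, not a description of our actual systems: the `Record` fields, the function names, and the rule of matching on a normalised title plus broadcast date are all hypothetical, and real records would need far looser matching.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """A hypothetical minimal record shared by listings and catalogue entries."""
    title: str
    broadcast_date: date


def match_key(record: Record) -> tuple:
    # Assumed matching rule: case-insensitive title with collapsed spaces, plus date.
    return (" ".join(record.title.lower().split()), record.broadcast_date)


def split_by_holdings(listings, catalogue):
    """Return (matched, missing) listings given the catalogue of held copies."""
    held = {match_key(r) for r in catalogue}
    matched = [r for r in listings if match_key(r) in held]
    missing = [r for r in listings if match_key(r) not in held]
    return matched, missing


def held_proportion(listings, catalogue) -> float:
    """Proportion of listings with a catalogue match (0.0 when there are no listings)."""
    if not listings:
        return 0.0
    matched, _ = split_by_holdings(listings, catalogue)
    return len(matched) / len(listings)
```

The `missing` list is the interesting output here: it is the set of broadcasts for which no copy is recorded, which is exactly the gap the next paragraph is about.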

It is highly likely that somewhere out there, in lofts, sheds and basements across the world, many of these ‘missing’ programmes will have been recorded and kept by generations of TV and radio fans. So we’re hoping to use Genome as a way of bringing copies of those lost programmes back into the BBC archives too.

But even if we don’t have an actual copy of the programme, we’ll also look to publish related items in our archives, such as scripts, photographs and associated paperwork. We’re looking into the logistics of making some of these items available via Genome. Clearly, this will in some cases be a long and painstaking task. The BBC’s various archives contain millions of items spread over 23 archive centres across the UK, most of them in analogue form. It’s a big job, one we’re looking forward to reporting back on in the future.

What happens after 2009 when the Genome data “stops”? Well, the information held at www.bbc.co.uk/programmes starts in 2007 (the birth of the iPlayer) and as the Genome data is improved and corrected (by you!), we expect to start ‘backfilling’ the bbc.co.uk/programmes pages with the Genome data.

Hilary Bishop is Editor, Archive Development, BBC Archive, and Jake Berger is Programme Manager, Digital Public Space

Comments

This entry is now closed for comments.

  • Comment number 2. Posted by John

    on 31 Oct 2014 11:23

    Are there any plans to use the Corporation's massive internal archive database, INFAX?
    The level of detail available on that system would complement and augment the RT Genome
    project. As someone who was closely involved in the making and structure of INFAX - now retired -
    I must confess to having an interest!

  • Comment number 1. Posted by Titipounamu

    on 17 Oct 2014 20:14

    Can you tell me whether the data I clean will be available for open reuse by everyone? I contribute to a few crowdsourcing projects and I don't want to spend my precious time on those that stay behind a paywall or aren't going to be open. The BBC doesn't have to have the data openly available now - I'd be happy with just a public commitment to openness and an estimated timeframe.

