Wednesday 9 January 2019

FindMyPast trials new OCR method for newspapers

In the late afternoon of 21 December, a day when many, if not most, of us cleared our desks for the Christmas and New Year break, FindMyPast issued a press release announcing the trial of a 'revolutionary' new method for searching OCR (optical character recognition) text. Why they chose such a date and time to issue it, I've no idea, but it comes as no surprise that the news made so little splash.
Nearly three weeks late, then, here is the news:

FindMyPast has tested a new method of extracting first and last names from OCR text on part of its holdings of The Essex Newsman, a paper first published in Chelmsford in 1870. The company says that 1.2 million names published in editions dating from 31 October 1881 to 6 November 1943 can now be searched with greater accuracy and efficiency than under the standard OCR method.

Researchers who keep a family tree on FindMyPast will also start to receive hints pointing to The Essex Newsman articles that contain names matching those in their stored tree.

While this first iteration of the new technology is focused on accurately identifying names, FindMyPast plans to extract other details such as locations, events and relationships in 2019. This is expected to further improve the search experience and increase the number of hints generated by the collection.

The Essex Newsman is already available to search in FindMyPast's regular British Newspaper Archive (BNA) collection, but the editions subjected to the new search method are listed as a distinct record set in the general A–Z list of the site's holdings (see image).