First Look: Nineteenth Century Collections Online
It’s been nearly ten years since the launch of Eighteenth Century Collections Online [ECCO]. This ambitious project aimed to digitise “every significant English-language and foreign-language title printed in Great Britain during the eighteenth century, along with thousands of important works from the Americas.” The definition of a ‘significant’ text remains open to interpretation, but the contents of the archive are undeniably impressive – in its present form it contains more than 180,000 titles. The unparalleled breadth of its coverage – along with the number of university libraries that took up subscriptions – quickly established it as a key focal point for the researching and teaching of eighteenth-century history. In other words, it’s a tough act to follow.
Enter Nineteenth Century Collections Online [NCCO]. This recently launched project follows in the footsteps of its eighteenth-century predecessor and, in the words of its publisher Gale Cengage, aims to be “the most ambitious scholarly digitisation and publication program ever undertaken.” The archive will contain millions of pages of nineteenth-century books, periodicals, diaries, letters, manuscripts, photographs, government records, pamphlets, and maps. More interestingly, it promises researchers the opportunity to subject these sources to new forms of qualitative and quantitative analysis. I’ve spent the last few days playing around with a trial version and, whilst it’s too soon to write a full review, I have a few preliminary thoughts on how it’s shaping up.
NCCO contains such an eclectic range of sources that it’s difficult at first to get a handle on all of its contents. In fact, it makes more sense to think of NCCO as a customisable research platform that houses a series of themed archives. By the looks of things, it’ll be possible for libraries to select which archives they want to subscribe to. At present, three archives are available, each of which contains a series of sub-collections:
- Asia and the West: Diplomacy and Cultural Exchange
British Foreign Office correspondence on Japan; dispatches and records from U.S. consuls in various Asian territories; missionary correspondence and journals; periodicals on Asian culture and society.
- British Politics and Society
British Cabinet Papers, 1880-1916; British Labour History Ephemera; British Trade Union History Collection; Civil Disturbance, Chartism and Riots in Nineteenth Century England; Colonial Defence Commission under Lord Carnarvon; Diaries of Sir Frederick Madden; Discontent and Authority, 1820-1840; Transactions of the Manchester Statistical Society I & II; Home Office papers, records, and correspondence; the Police Gazette, 1828-1845; Ordnance Survey Drawings, 1789-1840; Papers and Correspondence of Charles James Fox, 1749-1806; Papers of Sir Robert Peel; Working Class Autobiographies; papers relating to Radicalism, Anti-Radicalism and Reform, 1769-1861; ephemera relating to British social and working conditions, politics, and economics, 1770s-1850s; the papers of John Cam Hobhouse, 1809-1869; rare freethought militant 19th century books; rare radical and labour periodicals; letters relating to the Jack the Ripper killings; books, pamphlets, and periodicals relating to working-class politics.
- European Literature, 1790-1840: The Corvey Collection
A collection of rare English (3,250 works), French (3,658) and German (2,653) Romantic-era writing.
A fourth collection, ‘British Theatre, Music, and Literature: High and Popular Culture’, will be released soon. It promises to contain a range of playbills, scripts, scores, and other pieces of theatrical ephemera. Presumably, if the product is successful, a steady stream of new archives will be announced over the coming years.
It’s hard to review such a disparate collection of items – historians of the period will each find different elements of the archives interesting. In general terms, the main thing to note is that these collections are more curated than many previous archives. Rather than digitise millions of pages of books and newspapers and then throw them together, the collections in NCCO are carefully compiled and well presented. There’s an impressive amount of background information provided for each archive, and brief summaries for most of the sub-collections too. Here’s what you’ll find if you access the Jack the Ripper letters from within the British Politics and Society collection:
There are pros and cons to using such a carefully curated archive. On the plus side, browsing through its contents is more user-friendly – it’s much easier to casually meander through the archive when everything is clearly subdivided and signposted. The sub-collections should also make it easier for teachers to set more focused and manageable research tasks for undergraduate students. However, there are downsides to an archive in which all of the documents have been carefully picked out for their historical ‘significance’ and thematic relevance. Namely, the opportunity for new discoveries feels more limited. I’m sure that there are plenty of secrets still to be uncovered in NCCO’s collections, but browsing through its contents isn’t quite as exciting as exploring the ‘vast terra incognita of print’ that has been opened up in recent years by large-scale newspaper digitisation projects. Each visit to the British Library Newspaper Archive [BLNA] brings with it the promise of exploring virgin territory; it’s likely that many of the articles you’ll encounter haven’t been read since the day they were published. By comparison, NCCO’s collections feel like well-trodden ground. Of course, the ability to search these documents by keyword should lead to new discoveries, connections, and perspectives that weren’t available using conventional archives.
The methodological possibilities of any digital archive are determined in large part by its interface. I’ve always been a fan of Gale’s work in this area – compared to their competitors, their interfaces and search tools are usually faster and more user-friendly. The British Library Newspaper Archive isn’t without its design faults, but its interface is quicker than similar databases by ProQuest and far more user-friendly than the disastrous efforts of UK Press Online. The BLNA, like the Times Digital Archive before it, was based on a relatively straightforward HTML interface which displayed its images as JPEGs. This format allowed newspaper articles to load quickly and users to save or copy them with a quick right-click of the mouse. It was simple, but it worked. In recent years, however, Gale has introduced a more high-tech, Flash-based interface. Users of NewsVault and the Illustrated London News Digital Archive will already be familiar with the basic components of this new interface. Here’s how it looks:
It has some nifty new features – you can zoom in and out of an article more quickly (though not as smoothly as in the new British Newspaper Archive), alter brightness and contrast levels, rotate the image, view it in full-screen, and view separate sections of the source simultaneously by using the ‘split-screen’ feature. Newspaper articles are also displayed in their true context, with the rest of the page faded out slightly. The new interface lets you tag items (with both public and private keywords), create personal annotations and bookmarks, and export references to leading citation managers. The site is also compatible with Zotero – a welcome new feature that promises to make the organisation of primary research materials much easier. Unfortunately, the plugin just downloads the metadata for your chosen document and not the document itself.
Which leads us on to one of the problems with NCCO’s new interface. It’s no longer possible to right-click an image and save it as a JPEG. Instead, you have to use the archive’s own ‘download’ button – a feature that only allows you to save the document in PDF format. If you want to copy it over to a PowerPoint presentation, you’ll have to convert this PDF into an image file yourself or, alternatively, capture it as a screenshot. It’s perfectly possible, but it’s a nuisance and represents a regrettable step backwards in terms of speed and efficiency. Fortunately, the quality of the downloads is good – far better than the near-unreadable articles provided via the download feature of the British Newspaper Archive. It’s also possible to download the raw OCR data as a txt file. Gale’s decision to reveal this information is very welcome, but in this instance the BNA’s solution is more elegant and its user-correction tool is more ambitious.
The other drawback of the Flash interface is the space devoted to viewing documents. Put simply, the interface gets in the way. Here’s another screenshot. This time, I’ve shaded the interface red and left the area devoted to the document itself unshaded:
The first thing to note is the enormous amount of unused white space on either side of the archive’s main interface. I appreciate that not everybody has the luxury of using a 24″ widescreen monitor, but it’s a shame for this space to go unused when (as you can see) it’s not possible to see the entirety of the article in the small amount of space allotted to it. Contrast this interface with the old one used by the British Library Newspaper Archive:
Here, the whole screen is used and it’s possible (with a quick flick of my mouse’s scroll wheel) to view an entire newspaper page at once. The new interface certainly looks cleaner and more elegant, but this elegance comes at a cost. The most important thing about the database is the experience of browsing through its documents, but it currently feels like I’m looking at the world through a letterbox. This is particularly irritating when viewing newspapers. The full-screen feature provides a partial solution to this problem, but it’s a nuisance having to fire this up each time you want to view a document. For all of the powerful new search tools at our disposal, digital archives still require us to slog through hundreds (sometimes thousands) of potentially relevant sources before finding the ones that we need. In order to do this kind of research, it’s absolutely essential to be able to examine and rule out irrelevant documents quickly. If you’ve got to enter full-screen, tweak the zoom level, and scroll around a bit before making these decisions, it eats up time – an extra five seconds fiddling with each document soon mounts up over the course of a day’s research. Fortunately, the search interface includes a ‘Keywords in Context’ feature that allows you to preview the appearance of your search terms before loading an item in full – again, however, the BNA’s solution of providing this contextual information by default (rather than after a mouse-click) is more elegant.
It’s hard to offer constructive solutions to these problems – Flash interfaces provide us with some useful new tools, but I’ve yet to be convinced that the loss of speed and the cramped screen are worth it. A larger viewing area and a more fluid browsing experience would help to address some of the drawbacks.
NCCO’s search interface is typically powerful. As usual, it’s possible to select a number of different search types (keyword, document title, entire document, etc.) and limit searches by a range of additional properties. Gale’s peculiar decision to draw a distinction between ‘keyword’ and ‘entire document’ searches remains a problem – I’ve lost count of the number of experienced researchers who mistakenly thought that they were searching the entire British Library Newspaper Database only for me to point out that they’d only been searching for ‘keywords’ (the title of the article plus the first few sentences). Gale are alone in this idiosyncratic use of the term ‘keyword’ and their decision to persist with it presents a frustrating obstacle to new users. Aside from this, however, the number of options available through the advanced search interface is excellent.
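To make the ‘keyword’ versus ‘entire document’ distinction concrete, here’s a minimal sketch in Python. It is not Gale’s code: the function names, the toy items, and the cut-off of three sentences are all my own assumptions, based only on the description above of a ‘keyword’ search covering the title plus the first few sentences.

```python
import re

# Hypothetical toy items: (title, body). Not real archive content.
ITEMS = [
    ("Police Intelligence",
     "A man was charged on Monday. He denied it. The court adjourned. "
     "Witnesses mentioned a riot in Manchester."),
    ("Riot in Leeds", "Crowds gathered early."),
]

def keyword_field(title, body, sentences=3):
    """Title plus the first few sentences (3 is an assumed cut-off)."""
    parts = re.split(r"(?<=[.!?])\s+", body.strip())
    return " ".join([title] + parts[:sentences])

def search(items, term, scope="keyword"):
    """Return titles of items whose searchable text contains the term."""
    term = term.lower()
    hits = []
    for title, body in items:
        if scope == "keyword":
            text = keyword_field(title, body)
        else:  # "entire" scope: title plus the full body
            text = f"{title} {body}"
        if term in text.lower():
            hits.append(title)
    return hits

keyword_hits = search(ITEMS, "riot", scope="keyword")
entire_hits = search(ITEMS, "riot", scope="entire")
```

In this toy example the ‘keyword’ search misses the first item, because its only mention of a riot comes in the fourth sentence, whereas the ‘entire document’ search finds both.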
For digital humanities enthusiasts like me, perhaps the most exciting thing about NCCO is its two new search tools. First up is the Graphing Tool. Put simply, this new tool allows you to enter a keyword, specify a date range, and then track how often it appears in the archive using a line graph. A search for the term ‘America’ is displayed below:
An image like this should be familiar to fans of Google’s ngram viewer – a freely accessible search tool that lets you track the changing frequency of word usage in the Google Books archive. Tracking this kind of information is an imprecise way to map cultural change, but a carefully constructed search can identify broad trends and help researchers to view topics from a new perspective – I make occasional use of them in my PhD thesis and discuss their methodological potential in a forthcoming article for the Journal of Victorian Culture. So, I was undeniably excited when I learned that Gale was introducing a similar tool. Unfortunately, the results are disappointing. The tool is undermined by a fundamental methodological flaw. Put simply, it doesn’t take account of the fluctuating number of documents in the archive. If there are 1 million pages available for one year, but 10 million pages available for the next, it doesn’t take a genius to recognise that most graphs will have an upwards trajectory. Google solves this problem by measuring results as a percentage of the total number of words – that way, it doesn’t matter whether the archive expands or contracts. Unfortunately, NCCO’s graphing tool just displays the raw number of articles and makes no attempt to normalise the data.
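The difference between raw counts and normalised frequencies is easy to show with a rough sketch in Python. The corpus below is invented for illustration, and the function names are my own; this is one simple way to normalise (occurrences as a share of all words in each year), not a description of how Google or Gale actually compute their graphs.

```python
from collections import Counter

# Hypothetical toy corpus: (year, text). One item for 1850, three for 1851.
CORPUS = [
    (1850, "america is far away"),
    (1851, "america grows"),
    (1851, "news from america today"),
    (1851, "the weather in london was fine"),
]

def raw_counts(documents, term):
    """Number of documents per year that mention the term (unnormalised)."""
    term = term.lower()
    counts = Counter()
    for year, text in documents:
        if term in text.lower().split():
            counts[year] += 1
    return counts

def relative_frequency(documents, term):
    """Occurrences of the term per year as a share of all words that year."""
    term = term.lower()
    hits, totals = Counter(), Counter()
    for year, text in documents:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(term)
    return {year: hits[year] / totals[year] for year in sorted(totals)}

raw = raw_counts(CORPUS, "america")          # 1850: 1, 1851: 2
rel = relative_frequency(CORPUS, "america")  # 1850: 1/4, 1851: 2/12
```

Here the raw count doubles between the two years simply because there is more material for 1851, while the normalised figure actually falls. That is exactly the kind of distortion a raw-count graph can hide.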
Fluctuations in genre are also a problem. If coverage for the 1850s is mostly made up of newspapers, but the 1860s is dominated by political pamphlets, it’s impossible to make valid comparisons. The obvious solution to this problem is to allow users to select their own documents to search. Unfortunately, the graphing tool has been detached from the advanced search interface and has far less flexibility when it comes to constructing a query. It’s possible to restrict searches to four broad content types (manuscripts, maps, monographs, and newspapers), but this isn’t subtle enough to create methodologically sound searches. In sum, the tool is an interesting way to visualise search results but isn’t particularly useful for serious quantitative research. It’s a missed opportunity but, if it could be fixed, NCCO would represent an interesting step forward for digital research methodologies.
The second new feature is the Term Clusters tool. This text-mining tool identifies linguistic patterns and connections between documents. The graph below shows a search for the term ‘humour’:
The inner ring shows the terms that frequently appear within the first 100 words of each item in the search results – so, articles about humour frequently feature the words ‘novel’, ‘good’, and the term ‘Yankee Humour’. The outer ring performs the search again (this time on an inner-ring term) and reveals a new set of connections – so, articles featuring the term ‘Yankee Humour’ are also likely to include the words ‘miss’, ‘doctor’, and ‘heir’. Excerpts from these articles are displayed to the right. I confess that I was a bit confused by this tool at first, but the more I play with it the more impressed I’ve become. It’s a great way to identify previously unseen patterns and connections between material. I’d love to apply this tool (and a modified version of the graphing tool) to the British Library Newspaper Archive – with any luck, we’ll see them integrated into NewsVault sooner rather than later.
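For anyone as puzzled as I was at first, the two-ring idea can be approximated in a few lines of Python. This is only a sketch of the behaviour described above, not Gale’s actual algorithm, which I haven’t seen: the stopword list, the toy documents, and the function names are all assumptions, and it only handles single words (so it wouldn’t find a phrase like ‘Yankee Humour’).

```python
from collections import Counter

# Assumed, deliberately tiny stopword list for the toy example.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for", "was"}

# Hypothetical toy documents, not real archive content.
DOCS = [
    "yankee humour in a good novel",
    "a novel of good humour",
    "the doctor read a novel",
    "good weather for the doctor",
]

def top_terms(documents, query, n=3, window=100, exclude=frozenset()):
    """Most common terms in the first `window` words of items matching query."""
    query = query.lower()
    counts = Counter()
    for text in documents:
        words = text.lower().split()
        if query not in words:
            continue  # item is not in the search results
        # Count each term once per item, ignoring stopwords and the query.
        counts.update(set(words[:window]) - STOPWORDS - {query} - set(exclude))
    # Sort by frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]

def term_clusters(documents, query, n=3):
    """Inner ring: terms linked to the query. Outer ring: terms linked to each."""
    inner = top_terms(documents, query, n)
    return {term: top_terms(documents, term, n, exclude={query.lower()})
            for term in inner}

clusters = term_clusters(DOCS, "humour", n=2)
```

With these four toy documents, the inner ring for ‘humour’ is ‘good’ and ‘novel’, and each of those leads out to its own pair of associated terms.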
So, all in all, there’s some good news and bad news here. The contents of the archive are interesting, eclectic and well curated. There’s plenty here for researchers to get stuck into and the sub-collections will provide some interesting teaching opportunities. The interface has a lot of useful new features, but the move from HTML to Flash continues to result in a clunky and cramped browsing experience. The core search interface is excellent and the introduction of innovative new search tools is exciting. Term clusters are a particularly intriguing new addition to our armoury, but the graphing tool needs a bit more work before its full potential is fulfilled. It’s too early to tell whether NCCO will have the same impact as its eighteenth-century predecessor. It’s entering a far more crowded marketplace (the sheer volume of nineteenth-century material available in digital archives is already staggering) and doing so at a time when library budgets are contracting. However, there’s enough here to suggest that NCCO may well become the next leading digital platform for nineteenth-century research – if they iron out a few of the problems this wouldn’t be a bad thing.