Jun 6 2012

First Look: Nineteenth Century Collections Online

It’s been nearly ten years since the launch of Eighteenth Century Collections Online [ECCO]. This ambitious project aimed to digitise “every significant English-language and foreign-language title printed in Great Britain during the eighteenth century, along with thousands of important works from the Americas.” The definition of a ‘significant’ text remains open to interpretation, but the contents of the archive are undeniably impressive – in its present form it contains more than 180,000 titles. The unparalleled breadth of its coverage – along with the number of university libraries that took up subscriptions – quickly established it as a key focal point for the researching and teaching of eighteenth-century history. In other words, it’s a tough act to follow.

Enter Nineteenth Century Collections Online [NCCO]. This recently launched project follows in the footsteps of its eighteenth-century predecessor and, in the words of its publisher Gale Cengage, aims to be “the most ambitious scholarly digitisation and publication program ever undertaken.” The archive will contain millions of pages of nineteenth-century books, periodicals, diaries, letters, manuscripts, photographs, government records, pamphlets, and maps. More interestingly, it promises researchers the opportunity to subject these sources to some interesting new forms of qualitative and quantitative analysis. I’ve spent the last few days playing around with a trial version and, whilst it’s too soon to write a full review, I have a few preliminary thoughts on how it’s shaping up.

Content

 

NCCO contains such an eclectic range of sources that it’s difficult at first to get a handle on all of its contents. In fact, it makes more sense to think of NCCO as a customisable research platform that houses a series of themed archives. By the looks of things, it’ll be possible for libraries to select which archives they want to subscribe to. At present, three archives are available, each of which contains a series of sub-collections:

  1. Asia and the West: Diplomacy and Cultural Exchange
    British Foreign Office correspondence on Japan; dispatches and records from U.S. consuls in various Asian territories; missionary correspondence and journals; periodicals on Asian culture and society.
  2. British Politics and Society
    British Cabinet Papers, 1880-1916; British Labour History Ephemera; British Trade Union History Collection; Civil Disturbance, Chartism and Riots in Nineteenth Century England; Colonial Defence Commission under Lord Carnarvon; Diaries of Sir Frederic Madden; Discontent and Authority, 1820-1840; Transactions of the Manchester Statistical Society I & II; Home Office papers, records, and correspondence; the Police Gazette, 1828-1845; Ordnance Survey Drawings, 1789-1840; Papers and Correspondence of Charles James Fox, 1749-1806; Papers of Sir Robert Peel; Working Class Autobiographies; papers relating to Radicalism, Anti-Radicalism and Reform, 1769-1861; ephemera relating to British social and working conditions, politics, and economics, 1770s-1850s; the papers of John Cam Hobhouse, 1809-1869; rare freethought militant 19th century books; rare radical and labour periodicals; letters relating to the Jack the Ripper killings; books, pamphlets, and periodicals relating to working-class politics.
  3. European Literature, 1790-1840: The Corvey Collection
    A collection of rare English (3,250 works), French (3,658) and German (2,653) Romantic-era writing.

A fourth collection, ‘British Theatre, Music, and Literature: High and Popular Culture’, will be released soon. It promises to contain a range of playbills, scripts, scores, and other pieces of theatrical ephemera. Presumably, if the product is successful, a steady stream of new archives will be announced over the coming years.

It’s hard to review such a disparate collection of items – historians of the period will each find different elements of the archives interesting. In general terms, the main thing to note is that these collections are more curated than many previous archives. Rather than digitise millions of pages of books and newspapers and then throw them together, the collections in NCCO are carefully compiled and well presented. There’s an impressive amount of background information provided for each archive, and brief summaries for most of the sub-collections too. Here’s what you’ll find if you access the Jack the Ripper letters from within the British Politics and Society collection:

There are pros and cons to using such a carefully curated archive. On the plus side, browsing through its contents is more user friendly – it’s much easier to casually meander through the archive when everything is clearly subdivided and signposted. The sub-collections should also make it easier for teachers to set more focused and manageable research tasks for undergraduate students. However, there are downsides to an archive in which all of the documents have been carefully picked out for their historical ‘significance’ and thematic relevance. Namely, the opportunity for new discoveries feels more limited. I’m sure that there are plenty of secrets still to be uncovered in NCCO’s collections, but browsing through its contents isn’t quite as exciting as exploring the ‘vast terra incognita of print’ that has been opened up in recent years by large-scale newspaper digitisation projects. Each visit to the British Library Newspaper Archive [BLNA] brings with it the promise of exploring virgin territory; it’s likely that many of the articles you’ll encounter haven’t been read since the day they were published. By comparison, NCCO’s collections feel like well-trodden ground. Of course, the ability to search these documents by keyword should lead to new discoveries, connections, and perspectives that weren’t available using conventional archives.

Interface

The methodological possibilities of any digital archive are determined in large part by its interface. I’ve always been a fan of Gale’s work in this area – compared to their competitors, their interfaces and search tools are usually faster and more user friendly. The British Library Newspaper Archive isn’t without its design faults, but its interface is quicker than similar databases by ProQuest and far more user-friendly than the disastrous efforts of UK Press Online. The BLNA, like the Times Digital Archive before it, was based on a relatively straightforward html interface which displayed its images as jpegs. This format allowed newspaper articles to load quickly and for users to save or copy them with a quick right-click of the mouse. It was simple, but it worked. In recent years, however, Gale has introduced a more high-tech, flash-based interface. Users of NewsVault and the Illustrated London News Digital Archive will already be familiar with the basic components of this new interface. Here’s how it looks:

 

It has some nifty new features – you can zoom in and out of an article more quickly (though not as smoothly as in the new British Newspaper Archive), alter brightness and contrast levels, rotate the image, view it in full-screen, and view separate sections of the source simultaneously by using the ‘split-screen’ feature. Newspaper articles are also displayed in their true context, with the rest of the page faded out slightly. The new interface lets you tag items (with both public and private keywords), create personal annotations and bookmarks, and export references to leading citation managers. The site is also compatible with Zotero – a welcome new feature that promises to make the organisation of primary research materials much easier. Unfortunately, the plugin just downloads the metadata for your chosen document and not the document itself.

Which leads us on to one of the problems with NCCO’s new interface. It’s no longer possible to right-click an image and save it as a jpeg. Instead, you have to use the archive’s own ‘download’ button – a feature that only allows you to save the document in pdf format. If you want to copy it over to a PowerPoint presentation, you’ll have to convert this pdf into an image file yourself or, alternatively, capture it as a screenshot. It’s perfectly possible, but it’s a nuisance and represents a regrettable step backwards in terms of speed and efficiency. Fortunately, the quality of the downloads is good – far better than the near-unreadable articles provided via the download feature of the British Newspaper Archive. It’s also possible to download the raw OCR data as a txt file. Gale’s decision to reveal this information is very welcome, but in this instance the BNA’s solution is more elegant and its user-correction tool is more ambitious.

The other drawback of the flash interface is the space devoted to viewing documents. Put simply, the interface gets in the way. Here’s another screenshot. This time, I’ve shaded the interface red and left the area devoted to the document itself unshaded:

The first thing to note is the enormous amount of unused white space on either side of the archive’s main interface. I appreciate that not everybody has the luxury of using a 24″ widescreen monitor, but it’s a shame for this space to go unused when (as you can see) it’s not possible to see the entirety of the article in the small amount of space allotted to it. Contrast this interface with the old one used by the British Library Newspaper Archive:

Here, the whole screen is used and it’s possible (with a quick flick of my mouse’s scroll wheel) to view an entire newspaper page at once. The new interface certainly looks cleaner and more elegant, but this elegance comes at a cost. The most important thing about the database is the experience of browsing through its documents, but it currently feels like I’m looking at the world through a letterbox. This is particularly irritating when viewing newspapers. The full-screen feature provides a partial solution to this problem, but it’s a nuisance having to fire this up each time you want to view a document. For all of the powerful new search tools at our disposal, digital archives still require us to slog through hundreds (sometimes thousands) of potentially relevant sources before finding the ones that we need. In order to do this kind of research, it’s absolutely essential to be able to examine and rule out irrelevant documents quickly. If you’ve got to enter full-screen, tweak the zoom level, and scroll around a bit before making these decisions, it eats up time – an extra five seconds fiddling with each document soon mounts up over the course of a day’s research. Fortunately, the search interface includes a ‘Keywords in Context’ feature that allows you to preview the appearance of your search terms before loading an item in full – again, however, the BNA’s solution of providing this contextual information by default (rather than after a mouse-click) is more elegant.

It’s hard to offer constructive solutions to these problems – flash interfaces provide us with some useful new tools, but I’ve yet to be convinced that the loss of speed and the cramped screen is worth it. A larger viewing area and a more fluid browsing experience would help to address some of the drawbacks.

 

Search Tools

NCCO’s search interface is typically powerful. As usual, it’s possible to select a number of different search types (keyword, document title, entire document, etc.) and limit searches by a range of additional properties. Gale’s peculiar decision to draw a distinction between ‘keyword’ and ‘entire document’ searches remains a problem – I’ve lost count of the number of experienced researchers who mistakenly thought that they were searching the entire British Library Newspaper Archive only for me to point out that they’d only been searching for ‘keywords’ (the title of the article plus the first few sentences). Gale are alone in this idiosyncratic use of the term ‘keyword’ and their decision to persist with it presents a frustrating obstacle to new users. Aside from this, however, the number of options available through the advanced search interface is excellent.

For digital humanities enthusiasts like me, perhaps the most exciting thing about NCCO is its two new search tools. First up is the Graphing Tool. Put simply, this new tool allows you to enter a keyword, specify a date range, and then track how often it appears in the archive using a line graph. A search for the term ‘America’ is displayed below:

An image like this should be familiar to fans of Google’s ngram viewer – a freely accessible search tool that lets you track the changing frequency of word usage in the Google Books archive. Tracking this kind of information is an imprecise way to map cultural change, but a carefully constructed search can identify broad trends and help researchers to view topics from a new perspective – I make occasional use of them in my PhD thesis and discuss their methodological potential in a forthcoming article for the Journal of Victorian Culture. So, I was undeniably excited when I learned that Gale was introducing a similar tool. Unfortunately, the results are disappointing. The tool is undermined by a fundamental methodological flaw. Put simply, it doesn’t take account of the fluctuating number of documents in the archive. If there are 1 million pages available for one year, but 10 million pages available for the next, it doesn’t take a genius to recognise that most graphs will have an upwards trajectory. Google solves this problem by measuring results as a percentage of the total number of words – that way, it doesn’t matter whether the archive expands or contracts. Unfortunately, NCCO’s graphing tool just displays the raw number of articles and makes no attempt to normalise the data.
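To make the normalisation point concrete, here is a minimal Python sketch of the kind of adjustment described above. It is only an illustration: the function name and the hit counts are made up, the page totals simply reuse the 1 million versus 10 million example, and it says nothing about how Gale or Google actually implement their tools.

```python
def hits_per_million_pages(hits_by_year, pages_by_year):
    """Convert raw yearly hit counts into hits per million pages.

    Years with no known (or zero) page total are skipped, because a
    relative frequency cannot be calculated for them.
    """
    rates = {}
    for year, hits in sorted(hits_by_year.items()):
        pages = pages_by_year.get(year)
        if not pages:
            continue
        rates[year] = hits * 1_000_000 / pages
    return rates


# Hypothetical figures: the archive is ten times larger in the second year.
pages_by_year = {1850: 1_000_000, 1851: 10_000_000}
hits_by_year = {1850: 500, 1851: 2_000}

rates = hits_per_million_pages(hits_by_year, pages_by_year)
for year, rate in rates.items():
    print(year, hits_by_year[year], rate)
```

With these invented numbers the raw count quadruples, which a raw-count graph would show as a steep rise, while the rate per million pages actually falls. That gap is exactly what an un-normalised graph hides.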

Fluctuations in genre are also a problem. If coverage for the 1850s is mostly made up of newspapers, but the 1860s is dominated by political pamphlets, it’s impossible to make valid comparisons. The obvious solution to this problem is to allow users to select their own documents to search. Unfortunately, the graphing tool has been detached from the advanced search interface and has far less flexibility when it comes to constructing a query. It’s possible to restrict searches to four broad content types (manuscripts, maps, monographs, and newspapers), but this isn’t subtle enough to create methodologically sound searches. In sum, the tool is an interesting way to visualise search results but isn’t particularly useful for serious quantitative research. It’s a missed opportunity but, if it could be fixed, NCCO would represent an interesting step forward for digital research methodologies.

The second new feature is the Term Clusters tool. This text-mining tool identifies linguistic patterns and connections between documents. The graph below shows a search for the term ‘humour’:

The inner ring shows the terms that frequently appear within the first 100 words of each item in the search results – so, articles about humour frequently feature the words ‘novel’, ‘good’, and the term ‘Yankee Humour’. The outer ring performs the search again (this time on an inner-ring term) and reveals a new set of connections – so, articles featuring the term ‘Yankee Humour’ are also likely to include the words ‘miss’, ‘doctor’, and ‘heir’. Excerpts from these articles are displayed to the right. I confess that I was a bit confused by this tool at first, but the more I play with it the more impressed I’ve become. It’s a great way to identify previously unseen patterns and connections between material. I’d love to apply this tool (and a modified version of the graphing tool) to the British Library Newspaper Archive – with any luck, we’ll see them integrated into NewsVault sooner rather than later.
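For readers who want a feel for the two-ring idea, the sketch below is a toy Python version based only on the description above. It is an assumption-laden illustration rather than Gale's method: the function names, the stopword list, the simple per-document counting, and the four sample 'documents' are all invented, and the real tool may well use a more sophisticated clustering technique.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "about"}


def opening_words(text, window=100):
    """Lower-case the text and return its first `window` words."""
    return re.findall(r"[a-z']+", text.lower())[:window]


def ring_terms(documents, query, top_n=2, window=100):
    """Terms that most often share an opening passage with `query`."""
    counts = Counter()
    for text in documents:
        words = set(opening_words(text, window))
        if query in words:
            # Count each term once per matching document.
            counts.update(words - STOPWORDS - {query})
    # Sort by frequency, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


def term_clusters(documents, query, top_n=2):
    """Inner ring for `query`, plus an outer ring for each inner term."""
    inner = ring_terms(documents, query, top_n)
    return {term: ring_terms(documents, term, top_n) for term in inner}


# Invented sample texts, standing in for search results.
documents = [
    "Yankee humour in a new novel",
    "Good humour and a good novel",
    "A novel about Yankee humour",
    "The doctor read a novel",
]

clusters = term_clusters(documents, "humour")
for inner_term, outer_terms in clusters.items():
    print(inner_term, "->", outer_terms)
```

The keys of the returned dictionary play the role of the inner ring, and each list of values plays the role of the outer ring for that term.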

 

Conclusions

So, all in all, there’s some good news and bad news here. The contents of the archive are interesting, eclectic and well curated. There’s plenty here for researchers to get stuck into and the sub-collections will provide some interesting teaching opportunities. The interface has a lot of useful new features, but the move from html to flash continues to result in a clunky and cramped browsing experience. The core search interface is excellent and the introduction of innovative new search tools is exciting. Term clusters are a particularly intriguing new addition to our armoury, but the graphing tool needs a bit more work before its full potential is fulfilled. It’s too early to tell whether NCCO will have the same impact as its eighteenth-century predecessor. It’s entering a far more crowded marketplace (the sheer volume of nineteenth-century material available in digital archives is already staggering) and doing so at a time when library budgets are contracting. However, there’s enough here to suggest that NCCO may well become the next leading digital platform for nineteenth-century research – if they iron out a few of the problems, this wouldn’t be a bad thing.


Jun 2 2012

It’s alive!

In Mary Shelley’s version of the story, Victor Frankenstein locks himself in a laboratory for two years in order to pursue his scientific research. He is driven by an insatiable appetite for discovery, but when he finally witnesses the results of his labours he is filled with an overpowering sense of dread:

“I had desired it with an ardour that far exceeded moderation; but now that I had finished, the beauty of the dream vanished, and breathless horror and disgust filled my heart. Unable to endure the aspect of the being I had created, I rushed out of the room…”

I was reminded of this passage a few weeks ago on the morning of my PhD viva. It had been more than a month since I had last read my thesis, but in preparation for the big event I plucked up the courage to have a final look. It was a mistake. Every page seemed to bring a fresh disaster; a grammatical error here, a missing footnote there, and so many sentences that I longed to rewrite. Three and a half years earlier I had set out to create something beautiful. Now, as I looked upon it with fresh eyes, I saw only a monster; a hideous mess of typos, disjointed ideas, gaping holes, and embarrassing errors. I wanted to destroy it; to hide my shame from family, friends and colleagues. But it was too late. I had already branded the monster with my name and released it into the world. Soon, I thought, the villagers would come with their pitchforks and torches and drive me out of academia for good.

As it turns out, things went a bit better than I expected. My examiners were extremely positive about the thesis and only identified a few minor typographical errors that needed to be fixed. We had some stimulating conversations about how the project might be developed into a monograph and, before I knew it, it was all over. I polished off the corrections in a few hours and, last Tuesday, I submitted the final bound version of the thesis. I’m done. I’d ask you to call me Dr. Bob, but it makes me sound like a talk show host with a degree in ‘Relationship Science’ from an online university.

When I collected the final, hardbound version of the thesis from the printers, I had a more positive Frankenstein moment. This time, I felt more like Colin Clive’s demented scientist from the 1931 film who greets the success of his experiment in a slightly different fashion:

Swap the noise of crashing thunder for the sound of a laser printer, and you’ve pretty much got the scenario that was playing out in my head. Unfortunately, no friends were on hand to hold me back as I proclaimed myself a god, so I thanked the man behind the desk and quietly shuffled out.

The euphoria of that moment – of seeing my creation materialise – lasted for a few giddy days, but has now passed. It was all a useful lesson in the importance of perspective. I had spent hours agonising over tiny, insignificant defects, whilst remaining blind to the bigger picture. Like a lot of writers, I had a distorted image of my own work and found it difficult to see the positives without somebody else pointing them out. I still feel anxious about releasing my creation into the world, but now as I look upon it with less anxious eyes I suspect that it’s more likely to be met with indifference than abject horror.

If you want to test this theory yourself, copies of the thesis should be in Manchester University library and on EThOS soon. If you’d like to read a digital version, send me a tweet/email and I’ll forward a pdf. For those of you who aren’t familiar with the project, here’s the abstract:

 

Bob Nicholson, ‘Looming Large: America and the Victorian Press, 1865-1902’, (2012).

Widespread popular fascination with America, and an appreciation of American culture, was not introduced by Hollywood cinema during the early decades of the 20th century, but emerged during the late-Victorian period and was driven by the popular press. By the 1880s, newspaper audiences throughout the country were consuming fragments of American life and culture on an almost daily basis. Under the impulses of the so-called ‘new journalism’, representations of America appeared regularly within an eclectic range of journalistic genres, including serialised fiction, news reports, editorials, humour columns, tit-bits, and travelogues. Forms of American popular culture – such as newspaper gags – circulated throughout Britain and enjoyed a sustained presence in bestselling papers. These imported texts also acted as vessels for the importation of other elements of American culture such as the country’s distinctive slang and dialects.

This thesis argues that the late-Victorian popular press acted as the first major ‘contact zone’ between America and the British public. Chapter One tracks the growing presence of America in the Victorian press. In particular, it highlights how the expansion of the popular press, the widespread adoption of ‘scissors-and-paste’ journalism, the development of transatlantic communications networks and technologies, and a growing curiosity about life in America combined to facilitate new forms of Anglo-American cultural exchange. Chapter Two explores how the press shaped British encounters with American modernity and created a pervasive sense of a coming ‘American future’. Chapter Three focuses on the importation, circulation, and reception of American newspaper humour. Finally, Chapter Four unpacks the role played by the press in the importation, circulation, and assimilation of American slang.

It makes an original contribution to a number of academic disciplines and debates. Firstly, it challenges the established chronology of Anglo-American history; America gained a significant foothold in British popular culture long before the twentieth century. Moreover, this was not a result of a forcible American ‘invasion’ but a form of voluntary transatlantic exchange driven by the tastes and desires of British newspaper readers. Secondly, it argues that America’s presence in late-Victorian popular culture has been underestimated by historians who have focused instead on domestically produced culture, engagements with Western Europe, and the cultural dimensions of Empire. Whilst the full extent of America’s significance cannot be mapped out in one study, this thesis establishes the extent of America’s cultural presence and makes the case for its insertion into future Victorian Studies scholarship. Thirdly, this thesis contributes to the growing field of press history. It maps out connections between British and American newspapers, exploring how the press served to move information between the old world and the new. Finally, this project acts as an early example of born-digital scholarship; a study conceived in response to the development of digital archives. As such, it contributes to discussions on digital methodologies and debates within the field of Digital Humanities. In particular, it demonstrates that digitisation allows researchers to do new kinds of history; to ask new questions, make new connections, and develop new projects – to do things that we couldn’t do before.

Or, if you’d prefer, here it is in image form: