
From Dickens to Data Science

Posted September 8, 2017

What if we could browse through all the literary works of an author and quickly get ideas for similarities or differences in the underlying narrative structures? Researchers at Victoria University of Wellington in New Zealand are approaching this problem space by applying novel data analytics and network science.

Data-driven analysis has emerged as a growing methodology, if not a sub-discipline, within literary studies. This approach, broadly described as “distant reading”, has harnessed available technology to open new avenues for understanding literary texts, both individually and in the aggregate. Whereas traditional literary scholarship is generally grounded in the interpretation of the specific language of a text or body of texts, macroanalytic approaches offer new ways of seeing texts.

The interdisciplinary research project at Victoria University of Wellington attempts to theorise the relationship between macroanalytic and microanalytic (distant and close) readings of individual works. The researchers apply the Transcendental Information Cascades (TIC) approach (Luczak-Roesch et al., 2015) to understand how emergent structures of information are generated as a text unfolds. The approach treats the text as a diachronically evolving information system and uses TIC to isolate the structural properties of that system. The resulting network visualises the occurrence of characters and models the information structures they generate.
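As an illustration only, the idea of a network built from character occurrences in an unfolding text can be sketched in a few lines of Python. This is not the researchers' implementation: the function name `build_cascade`, the sentence-sized segments, the made-up toy text and the rule that an edge links a segment to the next segment mentioning the same character are all assumptions made for this example.

```python
import re


def build_cascade(segments, characters):
    """Link each segment to the next segment that mentions the same character.

    Hypothetical helper for illustration; returns a list of
    (from_segment_index, to_segment_index, character) edges.
    """
    last_seen = {}  # character -> index of the latest segment mentioning it
    edges = []
    for idx, text in enumerate(segments):
        for name in characters:
            # Whole-word match so that "Pip" does not match "Pipes".
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                if name in last_seen:
                    edges.append((last_seen[name], idx, name))
                last_seen[name] = idx
    return edges


# Toy segments invented for this sketch (not quotations from any novel).
segments = [
    "Pip meets Magwitch on the marshes.",
    "Pip visits Miss Havisham and Estella.",
    "Estella is cold towards Pip.",
    "Magwitch returns to London.",
]
characters = ["Pip", "Magwitch", "Estella"]

edges = build_cascade(segments, characters)
for source, target, name in edges:
    print(f"segment {source} -> segment {target} via {name}")
```

In this sketch a character who disappears for a long stretch produces a long edge (here, Magwitch from segment 0 to segment 3), which is one simple way a network of this kind could expose narrative structure.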

The novels of Charles Dickens (1812-1870) are a particularly interesting object of investigation within this field of research. Not only was Dickens a central figure in the development of the nineteenth-century novel (the literary form that has been a primary object of computational analyses), but his novels also construct vast and elaborate character networks as they represent the rapidly changing Victorian world.

Dickens’s character networks are important both because of their density and because of the complex social world they represent; the way in which those networks were generated also warrants attention. Dickens was a pioneer of the serial novel form, writing monthly (or weekly) installments of his novels over the course of up to eighteen months. Thus, his novels not only create character networks in the process of their unfolding, but also dramatize the creation and management of those networks in the very act of composition. They offer the opportunity to analyse both how a novel, taken as a completed aesthetic object, maps character connections and how those networks are imagined and managed in their production.

Furthermore, the evolution of Dickens’s career provides another avenue for analysis, as his early novels present episodic structure before he self-consciously announces an intention “to keep a steadier eye on the general purpose and design” of his works. Thus, Dickens’s mode of production, the arc of his career, the substantial but manageable corpus of fourteen completed novels, and the very substance of the world he represents present variables for analysing his character networks through a computational approach.

The approach has so far been tested on nineteen novels: all fifteen novels by Charles Dickens, and four by other Victorian novelists for comparative purposes. An initial user study involving humanities scholars and university students of English literature was performed to evaluate the tool.

This study demonstrated that the resulting networks reflect properties of the narrative structure of the analysed works, and make accessible quantitative features of novels that can reveal areas for further investigation. The user study also informs user interface (UI) as well as user experience (UX) designers about how domain experts in the digital humanities collaboratively interact with tools that make use of network science and data visualisations.

References for further reading

  • Adam Grener, Markus Luczak-Roesch, Emma Fenton, & Tom Goldfinch. (2017). Towards a Computational Literary Science: A Computational Approach to Dickens’ Dynamic Character Networks. Zenodo,
  • Luczak-Roesch M, Tinati R, O’Hara K. (2017) What an entangled Web we weave: An information-centric approach to socio-technical systems. PeerJ Preprints 5:e2789v1
  • Markus Luczak-Roesch, Ramine Tinati, Max Van Kleek, and Nigel Shadbolt. 2015. From Coincidence to Purposeful Flow? Properties of Transcendental Information Cascades. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (ASONAM ’15), Jian Pei, Fabrizio Silvestri, and Jie Tang (Eds.). ACM, New York, NY, USA, 633-638. DOI:
  • Markus Luczak-Roesch, Ramine Tinati, and Nigel Shadbolt. 2015. When Resources Collide: Towards a Theory of Coincidence in Information Spaces. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 1137-1142. DOI:

Source: Towards a Computational Literary Science
