Editor’s note: This post for the April ‘Ethnomining‘ edition comes from Fabien Girardin (@fabiengirardin), who describes his work with networked/sensor data at the Louvre Museum in Paris. Based on this inspiring case study, he discusses the overall process, how mixed methods are relevant in his work, and what kinds of lessons he learned along the way.
Fabien Girardin is Partner at the Near Future Laboratory, a research agency. He is active in the domains of user experience, data science and urban informatics.
At the Near Future Laboratory we like to experiment and to go in different directions from the typical technology consultancy. We thrive on the involvement of multiple practices, and bet on the unordinary when it comes to question formulation, data collection and solution creation. After completing my PhD in Computer Science, I left the bounded disciplines of academia to embrace learning and connecting to the other “fields”, the other ways of knowing and seeing the world. Along with partners Julian Bleecker, Nicolas Nova and a network of tactical scouts, we formed a technology-based practice that combines insight and analysis, design and research, and rapid prototyping to transform ideas into material form.
Over the past 5 years, I have led investigations that aim to extract knowledge from the byproducts of people’s digital activities (i.e. network data, also often called digital shadows or digital footprints). That intangible material can take the form of logs of cellular network activity, aggregated credit card transactions, real-time traffic information, user-generated content or social network updates. Over time my contributions have evolved into helping transform this type of big data into insights, products and services. Whether applied for a client or as part of our self-started initiatives, this practice requires the basic skills of a “data scientist” (data analysis, information architecture, software engineering and creativity) along with a capacity to engage at the intersections with a wide variety of professionals, from physicists and engineers to lawyers, strategists and designers. The transversal nature of investigations on network data requires understanding the different languages that shape technologies, reporting on the context of their use, and describing people’s practices. The model of inquiry blends qualitative field observations with quantitative evidence often extracted from logs.
Past projects have led us to exploit untapped data sources, uncover opportunities to transform data into insights, and materialize new services or products. Our method starts by considering which datasets and techniques can serve our objectives. Then we develop tangible solutions that engage the project stakeholders in exploring different scenarios and solutions. It is through the experiences of people with knowledge of the project domain that we are able to extract possible near-future changes and opportunities.
As a practical illustration, I like to showcase our project for the Louvre Museum from a couple of years back. Not only because I have fantastic memories of the breathtaking environment, but mainly because we learned a lot from the use of network data to provoke qualitative knowledge.
The Louvre is by far the most visited museum in the world, with 8.5 million visitors a year and more than 40,000 visitors on peak days [1]. In Paris, it is one of the main drivers of the “cultural enthusiasm” that is an inherent feature of the city. In consequence, the museum witnesses levels of crowding that, beyond a certain threshold, can be described as “hyper-congestion”. This phenomenon has some direct negative consequences on the quality of the visitor experience as well as on the organization and management of the Museum (e.g. increased stress levels among the security staff).
The Study, Evaluation and Foresight Department of the Museum performs extensive surveys, audience analysis and on-site observations to ensure a good visiting experience. However, the information they collect only partially feeds the visitor flow models necessary to set up and evaluate some of the museum’s strategies. That was the reason they approached us. They wanted to investigate new solutions to their concerns with “hyper-congestion”. In response, we (1) investigated the collection of new empirical data on the flows and occupancy levels of visitors in key areas of the Louvre, and (2) developed diagnostic indicators to capture changes in visitor behaviors relative to the congestion in the museum.
In collaboration with our friends at the real-time traffic information provider BitCarrier, we designed sensors that audited the presence of Bluetooth-enabled mobile phones on a key trail that leads to the Venus de Milo. The analysis of the collected longitudinal measures of presence and flows of visitors led to the development of an indicator that unveiled areas in which the congestion of a room changes the presence and flow of visitors. While unprecedented in the history of the Louvre, some results produced more questions than answers. We faced a new set of inquiries that quantitative evidence from sensors could not answer, but that field observations could address. For instance, what events provoked the congestion, what aspects of the visiting experience were affected, and why did some rooms show no symptoms of hyper-congestion?
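To give a flavor of what such an indicator can look like, here is a minimal sketch of turning Bluetooth presence logs into a travel-time measure along a trail. The device IDs, room names, timestamps and the two-sensor trail are invented for illustration; this is not the museum’s actual data or BitCarrier’s pipeline.

```python
# Sketch: deriving a congestion indicator from Bluetooth presence logs.
# All identifiers and values below are hypothetical.
from collections import defaultdict
from datetime import datetime

# Each record: (hashed device id, sensor/room, detection time)
sightings = [
    ("a1", "Salle des Caryatides", datetime(2009, 6, 2, 10, 0)),
    ("a1", "Venus de Milo",        datetime(2009, 6, 2, 10, 20)),
    ("b2", "Salle des Caryatides", datetime(2009, 6, 2, 10, 5)),
    ("b2", "Venus de Milo",        datetime(2009, 6, 2, 10, 45)),
]

# Group detections per device, in chronological order
trips = defaultdict(list)
for device, room, ts in sightings:
    trips[device].append((ts, room))

# Travel time between two consecutive sensors on the trail, per device
travel_times = []
for device, path in trips.items():
    path.sort()
    for (t0, r0), (t1, r1) in zip(path, path[1:]):
        if r0 == "Salle des Caryatides" and r1 == "Venus de Milo":
            travel_times.append((t1 - t0).total_seconds() / 60)

# A rising median travel time on the same trail over successive time
# windows is one symptom of congestion.
travel_times.sort()
median_minutes = travel_times[len(travel_times) // 2]
print(f"median trail time: {median_minutes:.0f} min")
```

In practice, comparing such medians across days and time windows is what lets longitudinal measures reveal where and when a room’s congestion alters visitor flows.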
We returned to the visualizations produced as part of our data analysis. Yet this time, we did not prepare them for the decision makers but for the security staff. They were a unique source of on-site information on visitor practices and flow management strategies — an unstructured contextual knowledge that only specific questions help expose. So we set up meetings at the museum and used our data visualizations to have the staff qualify the results of the audits. Their evidence explained some irregularities and completed the understanding of visitor behaviors. For instance, a door that was periodically closed turned out to be a source of radical changes in visitor flows.
In that experience, we learned the types of questions that analysis of network data can answer. For instance, “how many observations can we produce”, “what do the data tell us about a population”, “what kinds of evolution can we measure over time”, “can we categorize these evolutions”, “what are the trends and the outliers” or “what are the flows that connect different places”. In the context of network data, however, it is a big assumption to see the world as consisting of bits of data that can be processed into information that will then naturally yield some value to people. Quantitative data analysis and visualization techniques will answer some questions but prompt many more. Indeed, the understanding of an environment such as the Louvre goes beyond logging machine states and events. In consequence, my work takes a critical perspective on the limitless capabilities of Big Data that some assume [2]. At this stage we are still often trying to figure out: 1) what parts of reality quantitative data can reveal, and 2) what we can do with this limited view of reality.
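The kinds of questions listed above (counts, trends, outliers) can be answered with very simple aggregation. The toy example below, with invented hourly detection counts, flags an anomalous hour — exactly the kind of irregularity that quantitative analysis surfaces but cannot explain, and that staff interviews later could.

```python
# Toy illustration: counting observations per hour and flagging outliers.
# The hourly counts are invented for illustration.
hourly_counts = {9: 120, 10: 310, 11: 580, 12: 560, 13: 590, 14: 1900, 15: 600}

n = len(hourly_counts)
mean = sum(hourly_counts.values()) / n
var = sum((c - mean) ** 2 for c in hourly_counts.values()) / n
std = var ** 0.5

# "What are the trends and the outliers?" -- flag hours whose count
# deviates from the mean by more than two standard deviations.
outliers = [h for h, c in hourly_counts.items() if abs(c - mean) > 2 * std]
print(f"outlier hours: {outliers}")
```

The analysis can say *that* 2 p.m. is anomalous; only contextual knowledge (a tour group, a closed door, an incident) can say *why*.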
The qualitative view from the staff at the Louvre reinforced the quantitative measures and consolidated our overall knowledge of hyper-congestion at the museum. In other words, the articulation between qualitative insights and sensor measures enabled us to refine our understanding of the phenomenon. However, I have not yet seen a set of good practices for using quantitative data mining to inform qualitative inquiries, or for using qualitative observations to inform the definition of quantitative queries.
These questions suggest that researchers and practitioners need to develop a coherent dialogue between the techniques used to collect information about people’s activities: both qualitative (e.g. audio and video recordings of action and interviews) and quantitative (e.g. network data). Beyond the work at the Louvre, I often find it necessary to be able to visualize preliminary results very quickly and communicate them to project stakeholders. In the industry, this kind of exploratory investigation needs to maintain a certain momentum; it needs to fail, fork or succeed early. As a consequence, more and more of the results of our investigations have become interfaces or objects with a means of input and control rather than only static reports. This practice calls for its own set of tools to manipulate and visualize data.
At Near Future Laboratory we were struggling to combine tools that would allow our clients, who have knowledge and data but not technical skills, to prototype their own solutions and scenarios. That is, until our friends at Bestiario showed us a visual programming environment they were developing. We naturally contributed to what has now become Quadrigram, a tool specifically designed for iterative data exploration and explanation. Each iteration or “sketch” is an opportunity to find new questions and provide answers with data. Data mutate, taking different structures in order to unveil their multiple perspectives. For us, in our research process Quadrigram offers the opportunity to manipulate data as a living material that can be shaped in real time. This capacity concerns not only ‘data scientists’ but everybody with knowledge and ideas in a project that involves data.
It might be obvious to a community of ethnographers, but quantitative data are not sufficient to give full answers about people, their behaviors and their usage of technology. Yet the world of ‘data science’ and computer science still lacks sensitivity to the limitations of quantitative evidence and the models we can build from it. I have often been confronted by these limitations. Several of our projects with network data taught me that there are insights that only the articulation of sensor data and in-situ observations can provide.
On a more general level, the Louvre project confirmed our use of an approach that is closer to the academic researcher’s than to the consultant’s. It implies staying humble, not starting an investigation with a priori assumptions, and not being afraid to express doubts. When it comes to mixing techniques and methods, this posture of the researcher driven by doubt, but also confident in his/her methods, is what drives relevant insights.
[1] Figures from 2009, when we performed the project.
[2] See, for instance, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” by Chris Anderson.