Data stories: from facts to fiction

The image above is taken from „Marx Engels Werke“ (MEW): Marxism is the most prominent example of what postmodernism calls a ‚Grand Narrative‘. Marx and Engels took all kinds of data, drew their conclusions, and told the one story that made sense from what they found.

A poem is never anything but an alphabet in disorder. (Jean Cocteau)

Our time is perhaps the time of an epidemic of things. (Tristan Garcia)

I remember the elderly complaining about „information over saturation“ or even „overload“ when I was a child, in the early 1980s. Thirty years later, the changing of the guard has reached my generation. „But once a sponge is at capacity, new information can only replace old information.“ We read things like that in random articles every day. But what is this information that people are so afraid of no longer getting once the deluge of data has taken over?

What is data? Data is the raw content of our experience – primarily the sensory readings that get conveyed into our minds, secondarily the things we measure when we try to make experiences. I don’t want to get too philosophical here, but there are quite a few thinkers who share my discomfort with connecting data directly with facts. The whole of postmodernism is about deconstructing false confidence in empirical truths. A century ago, Husserl already warned us that the sciences might thus give us mediated theories rather than direct evidence. Quantitative social science, be it empirical sociology or experimental psychology, is particularly prone to the positivist fallacy. While throwing a die might be correctly abstracted into a series of stochastically independent occurrences of one and the same experiment, this is almost never true for human behavior.

Let us stop taking data as facts. Let us take data as fiction instead. Let us, just for the moment, think of data as the line of a story by which we tell about our experience. There might very well be no such thing as information in the data – just the scaffolding for different narratives that reduce the randomness and complexity. Take for example how our eyes abstract the shadow in a room’s corners to straight lines that make the edges. In fact there is no such thing as a line; if you get closer and closer to the edge, you see a rather round or uneven surface spanning between one wall and the adjoining wall. The edge impression is just our way to reduce our visual sensory input to a meaningful aggregate; a story.

Data as such is mostly incomprehensible. To comprehend, we have to find structure, construct causalities, reduce complexity. Data visualization fulfills the same task: infographics tell a plausible story from data and make it digestible for our minds.

The link between data and our comprehension of reality from the data is built via metaphors. A metaphor connects different things in such a way that we can identify one with the other. If we summarize objects under one category, this category in fact becomes the metaphor. „‚Table‘ is a word with five letters“, as Rudolf Carnap put it. The concept of a ‚table‘, however, is the metaphor, the image, the ideal of an arbitrary set of objects. To speak of a ‚table‘ is our way to evoke an image of the concrete object we have in mind in the consciousness of our audience.

There is no law that forces us to see data as necessary, as caused, and as having effects. If data were positive, scientific progress would just consist of correcting the errors of predecessors. But certainly this is not the case. Even what is called ‚hard science‘ changes direction according to the narrative. Quantum physics was not necessary. Heisenberg’s operators are not reality in the sense that there is a factual object that changes one quantum state to the next. They are a meaningful abstraction from a reality that we cannot directly comprehend. In the same way, we may use data from social interaction, behavioral data, or economic data, and try to find a meaningful narrative to share our model of the world with others.

The narrative that we derive from data is of course by no means totally random. Not every narrative fits our data points. But within our measurements, any model that does not contradict the data could be possible, and might – depending on the context – make an appropriate metaphor of our reality.
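To make this point tangible, here is a minimal sketch in Python. The data points, the assumed measurement tolerance and all three candidate models are invented for illustration; the only claim is that more than one model can stay within the measurements while others clearly do not.

```python
# Hypothetical measurements: (x, y) pairs, assumed accurate to +/- 0.5.
points = [(0, 0.1), (1, 0.9), (2, 2.2), (3, 2.8)]
TOLERANCE = 0.5  # assumed measurement uncertainty


def consistent(model, data, tolerance):
    """Return True if the model stays within tolerance of every data point."""
    return all(abs(model(x) - y) <= tolerance for x, y in data)


# Candidate narratives, all invented for this example.
models = {
    "linear": lambda x: x,
    "slightly curved": lambda x: 1.1 * x - 0.03 * x * x,
    "constant": lambda x: 1.5,
}

verdicts = {name: consistent(m, points, TOLERANCE) for name, m in models.items()}
for name, fits in verdicts.items():
    print(f"{name}: {'fits' if fits else 'contradicts'} the data")
```

Both the linear and the slightly curved model survive, so choosing between them is a matter of context and narrative rather than of the data alone.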

Since many narratives are possible, and a broad range of parameters can fit our data, we should be humble when it comes to value judgements. Whether a decision can be justified from our data depends on the model we choose. We should be clear that we have a choice, and that with this choice comes responsibility. We should be clear about our ethics, about the policies that guide how we set the models‘ parameters. We should be aware of algorithm ethics.

We should also recognize that our data story is not free from hierarchy. It is very well possible that we impose something onto others with what is just one possible narrative; no story can be told independently from social context.

When we accept data not as the facts that have to add up to some information, but as the hints of our story, we will be liberated from the pressure of sucking every bit into our brains. We might miss something, but that will hardly be more dramatic than in past times. Our model might not be perfect, but nevertheless, in hearing the data narrative we might catch a glimpse of what is missing. We should just let go of our dogma of data as facts.

As big data becomes the ruling paradigm of empirical sciences, I hope we will see lots of inspiring data stories. I hope that data will transcend from facts to fiction. And I want to hear and tell the fairytale where we wake the sleeping beauty in data.

This is a summary of the talk „Data story telling: from facts to fiction“ that I gave at the Content Strategy Forum 2014 in Frankfurt.

A Bit of Data Science – What Your Battery Status Tells About You

When working with lots of data, the biggest challenge is not to store or handle this data – these jobs are far from trivial, but there are solutions for nearly any kind of problem in this space. The real work with data starts when you ask yourself: what’s behind the data? How could you interpret this data? What story can you tell with this data? That’s what we do, and we want to share some of our findings with you and motivate you to join our discussion about the meaning of the data. We want to create Data Fiction.

Today, we start with some sensor data collected by our explore app – the smartphone’s battery status including the charging process. Below you see sample data for our users’ behavior during the week (Feature Visual) and at the weekend (Figure 1).

Figure 1: Smartphone Battery Status (weekend) (Datarella)

In Figure 1 you see that most users charge their smartphones around 7 a.m. and (again) around 5 p.m. What does that tell us? First, we know when most users wake up in the morning: around 7 a.m. Most probably they have used their smartphones‘ alarm functions and then connect their devices to the power supply. In the late afternoon, they charge their devices a second time – probably at their office desks – before they leave their workplaces. During weekends, the charging behavior is different: people get up later, and maybe use their devices for reading, social networking or gaming, before they reconnect them to their power supplies.

Late rising leads to an average minimum battery status of 60% during weekends, whereas during the week, users let their smartphone batteries go down to 50%. This difference of 10 percentage points is interesting, but the real surprise is the absolute minimum battery status of 50% or 60%, respectively. It seems that the days of „zero battery“ and hazardous action to get your device „refilled“ are completely over.
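For readers who want to try a comparison like this on their own data, here is a minimal sketch in Python. The sample rows, the field layout and the function name are assumptions made up for this example (they are not the explore data format), and the values were chosen only so that the result mirrors the figures quoted above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical samples: (user id, ISO timestamp, battery level in percent).
samples = [
    ("u1", "2014-06-02T07:00", 55), ("u1", "2014-06-02T16:30", 48),
    ("u1", "2014-06-07T10:00", 62), ("u1", "2014-06-07T20:00", 75),
    ("u2", "2014-06-03T06:45", 52), ("u2", "2014-06-03T17:00", 60),
    ("u2", "2014-06-08T11:15", 58), ("u2", "2014-06-08T22:00", 80),
]


def average_daily_minimum(rows):
    """Average, per day type, of each user's lowest battery level per day."""
    lowest = {}  # (user, date) -> lowest level seen on that day
    for user, stamp, level in rows:
        key = (user, datetime.fromisoformat(stamp).date())
        lowest[key] = min(level, lowest.get(key, level))
    by_type = defaultdict(list)
    for (_, day), level in lowest.items():
        # weekday() returns 5 for Saturday and 6 for Sunday
        by_type["weekend" if day.weekday() >= 5 else "weekday"].append(level)
    return {day_type: mean(levels) for day_type, levels in by_type.items()}


result = average_daily_minimum(samples)
print(result)
```

Taking the minimum per user and day first, and only then averaging, keeps a few heavy users with many readings from dominating the result.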

For some, data is art. And often, it’s possible to create data visualizations resembling modern art. What do you think of this piece?

Figure 2: Battery Loading Matrix (Datarella)

This matrix shows the daily smartphone charging behavior of explore users per time of day. Each color value represents a battery status (red = empty, green = full). So you can either print it and use it as data art on your office wall, or you can think about the different charging types: some people seem to „live on the edge“, while others do everything (i.e. charge) to stay on the safe side of smartphone battery status.
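A color matrix of this kind can be sketched in a few lines of Python. The readings and the linear red-to-green mapping below are assumptions for illustration; we do not claim this is how the figure was actually rendered.

```python
def level_to_rgb(level):
    """Map a battery level (0-100) to a color between red (empty) and green (full)."""
    share = max(0, min(100, level)) / 100
    return (round(255 * (1 - share)), round(255 * share), 0)


# Hypothetical readings: one row per user and day, one level per time slot.
readings = [
    [100, 80, 40, 10],  # someone who "lives on the edge"
    [100, 90, 85, 95],  # someone who stays on the safe side
]

matrix = [[level_to_rgb(level) for level in row] for row in readings]
for row in matrix:
    print(row)
```

Each RGB triple can then be handed to whatever plotting or image library you prefer to draw one cell of the matrix.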

What are your thoughts on this? When and how often do you charge your mobile device? Would you describe your charging behavior as „charging on the edge“ or „safe“? We would love to read your thoughts! Come on – let’s create Data Fiction!

The design of the explore app – The Datarella Interview

Today, we speak with Kira Nezu (KN), Co-founder of Datarella, about the design of the explore app.

Q
The explore app is available for Android smartphones only. What is the reason not to launch an iPhone version, too?

KN
We started to develop explore as a so-called MVP, a Minimum Viable Product. We chose Android to start with since it offers more variety regarding sensor and phone data. So we only test and make mistakes on one platform. At some point, we will also launch an iPhone version.

Q
explore consists of two different elements: the sensor tracking and the interaction area with surveys, tasks and recommendations. Could you tell us more about the structure and the functionalities of the app?

KN
With the MVP we are trying to stay as flexible as possible to enable fast changes and bug fixing. So we decided to create a hybrid app which incorporates native and web elements. The native part basically is the container with most of the graphics. The content is dynamically fetched from our backend, whereas the result area is fully created with web views. This brings great flexibility: we can update our content within minutes.

Regarding the structure there are 3 areas:
– main content area – divided into the survey area and recommendations,
– menu area,
– result area.

Q
Regarding surveys: there are already mobile survey apps on the market. How does the explore app differ from those?

KN
Before designing the app, we did a lot of research on existing survey apps. We found that some apps had a very technical design that reminded us of Windows 95. Other apps were very playful but simplistic, e.g. one app would show two images and the user could tap one of them to make a choice.

We want the user to have a playful experience while keeping the flexibility of different interaction formats.

Q
You call explore a Quantified Self app. Can you elaborate on that?

KN
The Quantified Self aspect of explore relies on regular interactions which ask the user for the same information. In the result area we show the user her personal mood chart with her own results compared to other explore users. Currently we are working on a location heat map in which the user can see her personal location history of the past days – and also that of other users. We had some surprise moments in internal tests: it took quite a while to recall why we had been at certain locations. You could compare that with boiling water for tea five times before finally remembering to brew your tea.

Q
So what are the next steps for explore?

KN
We will focus on adding more Quantified Self elements to the results and on offering an API for users to play with their own data. We are really looking forward to seeing what our users will come up with! If you are interested in playing with your data now, you are welcome to participate in our Call for Data Fiction.

Q
Thank you very much.

User interaction with the explore app – The Datarella Interview

Today, we speak with Yukitaka Nezu (YN), Co-founder of Datarella, about user interaction with the explore app.

Q
The explore app provides two key elements: sensor tracking and social interaction. You are responsible for the social interaction part. Could you tell us more about it?

YN
There are three different kinds of interactions between the editorial team and our users:

– Surveys
– Tasks
– Recommendations

With the surveys we ask our users about common trends and their everyday behavior. Answers are collected, analyzed and instantly presented in the feedback area. Based on the Quantified Self approach every single user sees her own results compared with other users.

Then, we run different programs that help people to simply feel better. One of our popular programs, SMILE!, motivates the user to start smiling herself and to encourage others to smile in return. On a daily basis, SMILE! participants receive tasks they have to fulfill. SMILE! participants felt better after having finished the program and were happier compared with non-participants.

Last but not least, we provide two kinds of recommendations:
– General recommendations regarding health, fitness, nutrition, etc.
– Personalized recommendations, based on the individually collected sensor data as well as the answers to the surveys, which help users to increase their wellbeing and happiness

Q
Using explore for quite a while, I have seen many different interesting topics. How do you and your editorial team find these?

YN
We don’t invent things. We listen to the people. We read what they write and talk about. Then, there are seasonal topics of interest, such as national elections, or topics which are somehow linked to special dates. These days, one of the hottest topics is the Soccer World Cup in Brazil.

Q
Being provided with individualized recommendations seems very promising for the user. On the other hand – it sounds like a lot of work for your team. How is the user feedback on that? Do they like it?

YN
Yes, indeed, working on the interaction side of the explore app is the opposite of a part-time job: handling this huge amount of data and interacting with our users individually is a lot of work. However, we have developed tools that support us in our analytical work. For example, there is our core instrument, the Complex Event Processing Engine (CEPE). This engine automatically triggers certain interactions based on specific events; e.g. if a user enters a shopping mall, she will be provided with a coupon from a shop nearby.
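The principle of an event-triggered interaction can be illustrated with a small Python sketch. This is not the CEPE itself: the `Event` fields, the rule list and the coupon text are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user: str
    kind: str   # hypothetical event type, e.g. "enter_place"
    place: str  # hypothetical place category, e.g. "shopping_mall"


# Each rule pairs a condition on an event with the interaction it triggers.
RULES = [
    (
        lambda e: e.kind == "enter_place" and e.place == "shopping_mall",
        lambda e: f"coupon for {e.user}",
    ),
]


def process(events):
    """Return the interactions triggered by a stream of events."""
    triggered = []
    for event in events:
        for condition, action in RULES:
            if condition(event):
                triggered.append(action(event))
    return triggered


stream = [
    Event("anna", "enter_place", "office"),
    Event("ben", "enter_place", "shopping_mall"),
]
interactions = process(stream)
print(interactions)
```

Keeping conditions and actions as separate entries in a rule list means new interactions can be added without touching the processing loop.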

The feedback we receive from people participating in our programs is very positive. Our users like the daily tasks – they are regarded as a welcome distraction from their everyday routines. And most of them confirmed that they have changed their behavior in a positive way. Above all, this behavior change aspect is the most important one for us: if you realize that your users really appreciate your work on the one hand and that they are successful in changing their behavior for the better on the other – then you know that you’re doing a meaningful job. It’s about creating meaning behind the data – and social relevance, after all.

Q
Thank you very much!

Boost your wellbeing and happiness with the explore app program SMILE!

Too much workload, stress and ultimately burnout – that’s how many people see their everyday life. One way to handle the negative aspects of daily routines is to make it to the weekend (TGIF), another is to go on vacation. Whereas the first tactic is easy to realize but only helpful to a certain degree, the latter is possible only once or twice a year for most of us. But there is another, easier way to calm down and to boost your wellbeing and happiness: create and repeat small positive experiences – and you will see an immediate effect on your overall awareness of life.

As Sonja Lyubomirsky and Kristin Layous show in their paper, based on research by Ed Diener and others, it’s the small and regularly repeated positive experiences which influence your wellbeing and happiness to a great extent. According to the Positive-Activity Model, features of positive activities, including their dosage, variety, sequence, and built-in social support, all influence their success in that process.

Positive-Activity Model

For our editorial team at Datarella, this model was a challenge: how could we use the explore app to make this model work in an optimal way? As always, the team decided not to head for the optimal but for a good solution, to invite volunteers to participate in a special program and ultimately to optimize the program together with the explore users. This program, SMILE!, was to be designed very lean, with just a minimum number of interactions, and with active participation for just a few days, in order not to interfere with the model’s cause-and-effect relationships.

After the 5-day program, our data team analyzed the results. In short: our findings fully support the findings of Ed Diener et al.:

  1. participants of the SMILE! program experienced a significant increase of their happiness with each additional day during the program
  2. participants of the SMILE! program experienced an increase of their happiness compared with a test group of non-participants whose happiness level remained constant
  3. small, regular and well-portioned challenges triggered a change of the participants’ behavior resulting in an increased happiness level
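A comparison of this kind can be sketched as follows. The ratings below are invented for illustration and are not the SMILE! results; the sketch only shows how a day-by-day trend for participants and for a comparison group could be computed, and it does not test statistical significance.

```python
from statistics import mean

# Hypothetical mean happiness ratings (scale 1-5) for each of the five days.
participants = [3.0, 3.2, 3.4, 3.6, 3.8]
non_participants = [3.0, 3.0, 3.0, 3.0, 3.0]


def daily_trend(scores):
    """Least-squares slope of the scores over consecutive days."""
    days = range(len(scores))
    day_mean, score_mean = mean(days), mean(scores)
    covariance = sum((d - day_mean) * (s - score_mean) for d, s in zip(days, scores))
    variance = sum((d - day_mean) ** 2 for d in days)
    return covariance / variance


trend_participants = daily_trend(participants)
trend_control = daily_trend(non_participants)
print(round(trend_participants, 2), round(trend_control, 2))
```

A positive slope for participants next to a flat slope for the comparison group is the pattern described in points 1 and 2 above.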

The two charts below demonstrate the SMILE! effect:

(The charts are labeled in German. Chart 1: „Did you laugh a lot today?“ Chart 2: „Do you think that you laughed more often at the end of the program?“ Feature Visual: „How do you feel at the moment?“)

The Datarella team itself participated in SMILE!, too. For me personally, it was a great experience. I consider myself an optimistic guy who smiles often, but the SMILE! challenges opened my eyes: in reality I had been smiling much less than I had thought. And triggered by the SMILE! challenges, I was pushed to become much friendlier.

For Datarella, the SMILE! program was a first test. We are planning to roll out several programs of this kind, all of them aiming to boost personal wellbeing and happiness. Since our editorial team is still in the process of creating these programs we’d like to invite you to participate and add your ideas, proposals and thoughts! We’d love to hear from you!

Big Data Product Development – The Datarella Interview

Today, we speak with Joerg Blumtritt (JB), CEO and Co-founder of Datarella, about product development based on Big Data.

Q
What is so special about product development based on Big Data?

JB
Big Data is not so much about technology; it’s more about letting go of your traditional business practices: where you used to differentiate between data and meta data, or between master data and transactional data, you now just see … data. Take Social Media data, for example: the old way of analyzing things would have been to treat the texts of postings as data, and time stamps, geolocation, the profile of the author, etc. as meta data. However, for most contexts, it’s far more valuable to analyze the connections between different authors, or it might be even more telling to include the geolocations to reveal the true meaning of a posting without understanding a single word of the language it is written in (BTW: this is how the NSA does Social Media monitoring).

The second aspect of this is not to work hypothesis-driven, but in an explorative way: don’t restrict yourself by narrowing the scope – instead analyze all given variables.

Q
You mentioned Social Media monitoring. In times of everybody being an author and adding content to the internet on a daily basis – is there still a need to know even more about „the user“?

JB
Data is made of people. Mass customization, for the first time, is not a buzzword disguising half-baked services or products. Now we have the means to stop aggregating people and to deal with each person individually.

Q
Regarding the paradigm shift of not building hypotheses first but collecting and analyzing all data: isn’t former US Secretary of Defense Donald Rumsfeld then the real inventor of Big Data analysis, by pointing to known knowns, known unknowns, unknown unknowns, etc.?

JB
The known knowns in Rumsfeld’s narrative are classic dashboards, which only display information from your data warehouse that is already known. Big Data really is about the unknown unknowns. However, there are also unknown knowns – things like social conventions, moral restraints, etc. that bias our perspective.

Q
Product development typically requires stable company structures and efficient processes. Now we have this paradigm shift and with it – instability and untested processes. What is the most important aspect of product development based on Big Data?

JB
Keep your product in permanent beta. Publish fast, collect all information on how your product is used by its users, and constantly update with what you have learned.

Good product development with Big Data means being agile, iterative and lean, and focusing on the Minimum Viable Product (MVP) instead of feature-laden products. Always A/B test: present small variations of your product to random samples of users and check whether the variations increase the value of your product.
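As an illustration of the A/B testing idea, here is a minimal Python sketch. The function names, the seed and the conversion counts are assumptions made up for this example, and the two-proportion z statistic is just one common way to check whether a variation makes a difference.

```python
import math
import random


def assign_variant(user_id, seed=42):
    """Assign a user to variant A or B at random, reproducibly per user."""
    return random.Random(f"{seed}:{user_id}").choice(["A", "B"])


def z_score(conversions_a, users_a, conversions_b, users_b):
    """Two-proportion z statistic for the difference in conversion rates."""
    rate_a, rate_b = conversions_a / users_a, conversions_b / users_b
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    return (rate_b - rate_a) / std_error


# Hypothetical outcome with 1,000 users per variant.
z = z_score(conversions_a=100, users_a=1000, conversions_b=130, users_b=1000)
# An absolute value above roughly 1.96 suggests a real difference at the 5% level.
print(f"z = {z:.2f}")
```

Seeding the random generator with the user id keeps each user in the same variant across sessions, which matters when you update the product constantly.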

Q
Ok – so we have a lean startup process here. But – where to get ideas from in the first instance?

JB
My favorite quotation for this is:

Never look into the past – it only distracts from the future!

by the Incredible Edna ‚E‘ Mode.

Business intelligence only tells you what is already there. So look to other industries to get inspired: take Google or Tesla entering the car manufacturing industry. Both are software and services companies disrupting a classic industry. The full product development cycle, from converting an old, used plant into a robotic car factory to rolling out a compelling electric sports car, took Tesla less than two years. Compare that with the literally decades it takes to develop one of the classic road dinosaurs.

Q
To sum it up: your recommendation for an optimized Big Data product development process is to never underestimate attacks from completely different industries, to collect and analyze all available data in a very lean way, and to get your product out early to learn from your customers.

JB
Exactly.

Q
Thank you very much!

Call for Data Fiction

DATA FICTION – THE STORIES BEHIND THE DATA

Do you read science fiction? Can you make data interesting? Can you tell the story behind a pool of data? Are you a data fictionista? Submit your data fiction.

People, animals, plants and things produce data – a lot of data. The data itself is the basic resource, just as words are the basis for language. If you put words together into sentences, combine sentences into chapters and aggregate several chapters, you write a story, you create fiction. It is the same with data: if you combine different data sources into data pools and aggregate them, you write the story behind the data, you create data fiction.

[Strong narrative] augments the available data by way of context, and extends the patience of the audience by sustaining their interest as well.

Does that sound like you?

We’d love to see and discuss your applications, analyses, case studies and models with you and help you make your data fiction become reality.

DATA, APP & COMPLEX EVENT PROCESSING ENGINE
The Data
We will provide you with sample data resulting from the usage of our explore app.

The App
The data has been created by users of the explore app. In explore, the user interacts by answering surveys, completing tasks and heeding valuable recommendations based on her behavior. She immediately sees the results of her interactions in the feedback area. In addition, explore tracks several sensors of the user’s phone, which can be switched on and off by the user herself (see full list of sensors below). explore connects both areas, the interactions and the sensor tracking, with the integrated Complex Event Processing Engine (CEPE).

datarella explore app

The Complex Event Processing Engine (CEPE)
The CEPE is a mechanism for efficiently processing continuous event streams in sensor networks. It enables rapid development of applications that process large volumes of incoming messages or events, regardless of whether incoming messages are historical or real-time in nature.
Our CEPE is based on Esper and its Event Processing Language (EPL).
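To give a feeling for what processing a continuous event stream means, here is a small sketch of a sliding window in plain Python. It is a stand-in for illustration only and is neither Esper nor EPL code; the function name and the sample values are hypothetical.

```python
from collections import deque


def windowed_average(stream, size):
    """Yield the average of the most recent `size` values after each event."""
    window = deque(maxlen=size)  # old values drop out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


battery_levels = [80, 70, 60, 50]  # hypothetical stream of battery events
averages = list(windowed_average(battery_levels, size=2))
print(averages)
```

Because the function is a generator, it works the same way on a finished list of historical events and on a source that keeps delivering new events.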

List of Sensors
– GPS location data
– Network location data
– Accelerometer
– Gyroscope
– Wi-Fi
– Magnetic field
– Battery status
– Mobile Network

REQUIRED
– Overview and extended description or representation of your main idea, any subtopics and a conclusion
– Use or integration of at least 1 (one) category of sensor data (e.g. Gyroscope). If you use GPS location, you should use or integrate at least 1 (one) additional category of sensor data beside GPS location data.

DATA FICTION TYPES
– Presentation
– Video
– Installation

RESULTS
We will reward fascinating data fiction with preferred access to our data, a post on the QS Blog and the possibility of making data fiction come true.

Yes, I am a data fictionista and want to submit my data fiction!