In reply to our “Call for Data Fiction”, Benedikt Koehler has applied the BreakoutDetection package for R to our sample data set. Here you see his findings with the code:
So many things about wearable tech, self tracking, and the Quantified Self! So we started to put everything we read into a Flipboard Magazine to share with everyone.
Looking forward to your comments!
From February 18th to 20th, O’Reilly’s Strata+Hadoop World Conference will once again be the most important Big Data event of the year. More than 6,000 visitors are expected at the San Jose Convention Center. I have attended every Strata from the beginning. Thus I regard it as a special privilege not only to be there and listen, but also to give a talk myself.
Here is the link to our session:
“Smartphone Data: Tell the Story of People’s Lives.”
Come and visit us there!
“Facebook would never change their advertising relying on a sample size as small as we do medical research on.”
People want to learn about themselves and get their lives soundly supported by data. Parents record the height of their children. When we feel ill, we measure our temperature. And many people own a bathroom scale. But without context, data has little meaning. Thus we try to compare our measurements with those of other people.
Data that we track just for us alone
Self-tracking has been trending for years. Fitness trackers like Fitbit count our steps; training apps like Runtastic deliver analyses and benchmark us against others. Since 2008, a movement has been around that puts self-tracking at its center: The Quantified Self.
However, it is not just self-optimizers and fitness junkies who measure themselves. An essential impetus for self-tracking came from chronically ill people caring for themselves.
Data for the physician, for family members, and for nursing staff
In the US, as in many countries lacking strong public health care, it is becoming increasingly common to bring self-measured data to the physician. For many examinations, this saves significant costs and speeds up treatment. With Quantified Self, many people have been able to get good laboratory analytics about their health for the first time ever. One example is kits for blood analysis that send the measurement via mobile phone to the lab and then display the results. Such kits are, for example, widely used in India.
Self-tracked patient data is also useful for family members and nursing staff. It gives those who care for us a realistic picture of our condition. Even automatic emergency calls based on data measured on site are possible today.
The image at the top is taken from the blog of Sara Riggare, who suffers from Parkinson’s disease. Sara tracks her medication and the symptoms of her Parkinson’s disease with her smartphone. Her story is well worth reading, and it shows all the facets that make the topic “own data” so fascinating:
Data for research
Self-recorded data for the first time maps people’s actions and condition into an uninterrupted image. For research, these data are significantly richer than the snapshots made by classic clinical research – both in case numbers and because they make it possible for the first time to include the multivariate influences of all kinds of behavior and environment. Even if only a small fraction of self-trackers is willing to share their data with researchers, it is hard to imagine the huge value the resulting findings will have for medicine.
The difficulty with these data: they are so rich and so personal that it is always possible to trace them back to a single individual. Anonymization, e.g. by deleting the user ID or the IP address, does not work. Like fingerprints, the trace we leave in the data can always identify us. This problem cannot be solved by even more privacy regulation. Already today, the mandatory commitment to informed consent and to data avoidance impedes research with medical data to such an extent that it is hardly worthwhile to work with it at all. The only remedy would be comprehensive legal protection. Every person sharing their data with researchers has to be sure that no disadvantages will come from their cooperation. Insurance companies and employers must not take advantage of people’s openness. This could be shaped similarly to anti-discrimination laws. Today, for example, insurance companies are not allowed to differentiate their rates by the insured person’s gender.
Another issue lies within the data itself. First, arbitrary technical differences like hardware defects, compression algorithms, or sampling rates make the data hard to match. Second, it is hardly ever the raw data itself that gets processed further, but rather mathematical abstractions derived from it. Fitbit or Jawbone UP don’t store the three-dimensional measurements of the gyroscope, but the steps calculated from them. However, what is regarded as a step, and what counts as another kind of movement, is an arbitrary decision by the author of the algorithm programmed for this task. Here it is important to open the black boxes of the algorithms. Just as the EU Commission demands that Google open its search algorithms, because it suspects (probably with good reason) that Google discriminates against unwelcome content in a clandestine way, we have to demand that the makers of tracking devices let us look inside them.
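How much such a design decision matters can be shown with a toy example. The following is only a sketch under stated assumptions: the function `count_steps`, the synthetic signal, and both thresholds are invented for illustration and do not represent the algorithm of Fitbit, Jawbone, or any other vendor. It counts a "step" whenever the movement magnitude rises above a threshold.

```python
def count_steps(magnitudes, threshold):
    """Count upward crossings of `threshold` in a series of movement magnitudes.

    Hypothetical illustration only; real trackers use undisclosed algorithms.
    """
    steps = 0
    above = False  # are we currently above the threshold?
    for m in magnitudes:
        if m > threshold and not above:
            steps += 1      # a new peak begins: count one step
            above = True
        elif m <= threshold:
            above = False   # peak is over, wait for the next one
    return steps


# Synthetic magnitudes in arbitrary units: three strong peaks, two weak ones
signal = [1.0, 1.8, 1.0, 1.3, 1.0, 1.9, 1.0, 1.25, 1.0, 1.7, 1.0]

strict = count_steps(signal, threshold=1.5)   # only strong peaks count
lenient = count_steps(signal, threshold=1.2)  # weak peaks count as well
print(strict, lenient)
```

The same raw signal yields a different step count depending on a threshold the user never gets to see.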
Data is generated by the users. The users have to be heard on what is made of it.
Our phones register in radio cells to route calls to the phone network. When we move around, we occasionally leave one cell and enter another. So our movements leave a trace through the cells we have passed over the course of the day. Yves-Alexandre de Montjoye and his co-authors from MIT explored how many observations we need to identify a specific user. Based on actual data provided by telephone companies, they calculated that just four observations are sufficient to identify 95% of all mobile users. We need so little evidence because people’s movement patterns are surprisingly unique. Just like our fingerprints, they are more or less reliable identifiers.
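The idea behind such a uniqueness measure can be sketched in a few lines. This is a simplified illustration, not the authors' actual method or data: the three traces are made up, and the function names `matching_users` and `unicity` are assumptions chosen for this example. A trace is modeled as a set of (cell, hour) observations, and a user counts as identified if no one else's trace contains the same observed points.

```python
import random


def matching_users(traces, points):
    """Return the users whose trace contains every observed (cell, hour) point."""
    return [user for user, trace in traces.items() if set(points) <= trace]


def unicity(traces, p, rng):
    """Fraction of users singled out by p points drawn at random from their own trace."""
    unique = 0
    for user, trace in traces.items():
        sample = rng.sample(sorted(trace), p)
        if matching_users(traces, sample) == [user]:
            unique += 1
    return unique / len(traces)


# Invented toy traces: (radio cell, hour of day)
traces = {
    "anna": {("A", 8), ("B", 9), ("C", 18)},
    "ben": {("A", 8), ("B", 9), ("D", 18)},
    "cleo": {("A", 8), ("E", 9), ("C", 18)},
}

print(matching_users(traces, [("A", 8)]))             # one point: all three match
print(matching_users(traces, [("B", 9), ("C", 18)]))  # two points: only one user left
print(unicity(traces, p=3, rng=random.Random(0)))
```

Even in this tiny example, a single observation matches everyone, while two well-chosen observations already single out one person.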
When we analyze the raw data that we collect through our mobile sensor framework ‘explore’, we find several other fingerprint-like traces that all of us continuously leave behind by using our smartphones. Obviously, we can reproduce de Montjoye’s experiment at a much more granular resolution when we use the phone’s own location tracking data instead of the rather coarse grid of the cells. GPS and mobile positioning locate us with high precision.
Inside buildings, we have the Wi-Fi networks in range. Each has a unique identifier, the BSSID, and provides lots of other useful information.
To provide compass functionality, most smartphones carry a magnetic flux sensor. This probe monitors the surrounding magnetic fields in all three dimensions.
The way we use the phone affects its power consumption. This can be monitored via the battery charge probe:
All the sensors in our phones have typical and highly individual inaccuracies. In the gyroscope data shown at the top of the page, you see spikes that shoot out from the average pattern quite regularly. Such artifacts, caused by small hardware defects, are specific to a single phone and can easily be used to re-identify a phone.
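Finding such spikes does not take much. The following sketch uses invented readings and a hypothetical helper, `spike_indices`; the median-based outlier rule and the factor of 5 are assumptions chosen for illustration, not the analysis we actually ran on the ‘explore’ data.

```python
from statistics import median


def spike_indices(samples, factor=5.0):
    """Return positions whose distance from the median exceeds
    `factor` times the median absolute deviation (MAD)."""
    center = median(samples)
    mad = median(abs(x - center) for x in samples)
    return [i for i, x in enumerate(samples) if abs(x - center) > factor * mad]


# Invented gyroscope readings: small noise plus two large artifacts
readings = [0.01, -0.02, 0.00, 0.02, 0.90, -0.01,
            0.01, 0.00, -0.02, 0.85, 0.01, -0.01]

spikes = spike_indices(readings)
print(spikes)
```

If the positions and spacing of such spikes repeat from one recording to the next, they act as a signature of the individual device.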
No technical security
“We no longer live in a world where technology allows us to separate communications we want to protect from communications we want to exploit. Assume that anything we learn about what the NSA does today is a preview of what cybercriminals are going to do in six months to two years.”
Bruce Schneier, “NSA Hacking of Cell Phone Networks”
As Bruce Schneier points out in his post: there are more than enough hints that we should not regard our phones as private. Not only have we learned how corrosive governmental surveillance has been for a long time, there are lots of commercial offerings to breach the privacy of our communication and also tap into the other, even more telling data.
But what to do? We can’t just opt out. For most people, not using mobile phones is not an option. And frankly: I don’t want to give up my mobile. So how should we deal with it? Well, for people like me – white, privileged, supported by a legal system providing me civil rights protection – that is more a discomfort than a real threat. But for everyone else, people who cannot be confident that the system will protect them, the situation is truly grim.
First, we have to show people what the data does tell about them. We have to make people understand what is happening; because most people don’t. I am frequently baffled how naive even data experts often are.
Second, as Bruce Schneier argues, we have to get the NSA and other governmental agencies to use their knowledge to protect us, to patch security holes, rather than exploit them for spying.
Third, it is more important than ever to work and fight for a just society with very general protection of not only civil but also human rights. Adelante!
Datarella now provides an API for our app ‘explore’, which allows every user to access the data collected and stored by the app.
An Application Programming Interface, API for short, is an interface for accessing software or databases externally. Web APIs, which give us access via the internet, have become the principal precondition for most businesses on the web. Whenever we pay for something online with our credit card, the shop system accesses our account via the API of the card-issuing company. eBay, Amazon, PayPal: they all provide APIs that let us automate their whole functionality and include it in our own website’s services. Most social networks offer APIs, too. Through these we can post automatic messages, analyze data about usage and reach, or control ad campaigns.
The ‘explore’ app was developed by Datarella to access the smartphone’s internal sensors (or probes) and to store the data. It is, however, not just about standard data like location, widely known because of Google Maps. ‘explore’ reads all movements in three dimensions via the gyroscope, as well as acceleration and magnetic fields in the environment. Mobile network providers and Wi-Fi networks in range are also tracked. From these data we can learn many interesting things about ourselves, our surroundings and environment, and about our behavior. To set the data in context, the API also gives out data from other users. For the sake of privacy and informational self-determination, this is aggregated and averaged over several users, so that identification of a specific person is not possible.
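A common way to implement this kind of aggregation is to release an average only when enough users contribute to it. The sketch below illustrates that general idea; the function `aggregate`, the threshold `MIN_USERS`, and the sample values are assumptions made for this example and are not taken from the ‘explore’ API.

```python
from statistics import mean

MIN_USERS = 5  # hypothetical minimum group size, not a documented API value


def aggregate(readings_by_user, min_users=MIN_USERS):
    """Average the per-user means; return None if too few users contribute."""
    per_user = [mean(values) for values in readings_by_user.values() if values]
    if len(per_user) < min_users:
        return None  # group too small: release nothing rather than expose individuals
    return mean(per_user)


# Invented sample values, e.g. minutes of movement per day
group = {"u1": [10, 20], "u2": [30], "u3": [20, 20], "u4": [40], "u5": [10]}
too_small = {"u1": [10, 20], "u2": [30]}

print(aggregate(group))
print(aggregate(too_small))
```

Averaging per user first keeps a single very active user from dominating the group value.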
With our API, Datarella commits to open data: We are convinced that data has to be available to users.
➜ Here is our API’s documentation: explore.datarella.com/data_1.0.html
➜ Here the download-link for ‘explore’: play.google.com
We are excited to learn what you will make of the data.
At Datarella, you offer different programs your users can participate in. Can you elaborate on the meaning behind these programs?
With our explore app, we provide a useful free tool for smartphone users to optimize their lives. There is a broad range of specific life situations in which the explore programs provide valuable and sustainable benefits. From lifestyle-oriented programs such as SMILE!, our guide to learning how to smile in 5 days, to specific health programs such as our OsteoGuide, which supports users suffering from Osteoporosis – we provide a broad range of programs. The most important aspect for Datarella is to always provide real benefits to our users: it’s not about technology, it’s about the social relevance of technology, its immediate impact on the user.
Could you describe one of those programs and its impacts on your users in more detail?
Sure! Let’s take the OsteoGuide: in countries with populations with median ages of 45 and older, Osteoporosis has become a widespread disease. People suffer from Vitamin D shortage, move less and less during the day and, as a result, their bone structure becomes more fragile. If Osteoporosis is diagnosed at an early stage, it’s curable in most cases. To cure a patient of Osteoporosis you have to help her to regulate her Vitamin D level and to move more; i.e. to change her behavior: the patient should use the staircase instead of the escalator, or walk or go by bike instead of using the car or a taxi.
A change of human behavior is one of the toughest challenges you can think of. Ask yourself: how easy is it for you to quit smoking, to stop taking the extra bar of chocolate, etc.? The best method to support people in changing their behavior is to provide them with instant feedback on their behavior and to give regular counsel in the form of notifications and recommendations. With the explore app and our programs, we cover these aspects perfectly. We accompany our users for a certain period of time and help them to change their behavior for the better, step by step, day by day. In the case of the OsteoGuide, we cooperate with Prof. Dr. med. Reiner Bartl of the Bayerisches Osteoporosezentrum, an acknowledged expert in the field of Osteoporosis.
That sounds fascinating: you say that people in need of medical care can get rid of their diseases by using the explore app?
To be very clear: the explore app cannot replace medical treatment. And Datarella is not a team of health professionals. We have to join forces with experts like Prof. Bartl to provide our share of a solution for a patient. But, in many cases, medication can only be applied successfully if the patient herself contributes to her well-being. And, in most cases, this means that she has to change her behavior. We have strong evidence that the explore app programs are perfect tools to achieve this goal.
You mentioned that the explore programs are free. Where is your business model?
Yes, every smartphone user can download the explore app and apply for any of the explore programs. It’s free to participate at the basic program level, which includes tasks, notifications and recommendations during the complete program. If a user wants more, e.g. if she is looking for personalized individual coaching, she would have to subscribe to the premium version of the corresponding program. With the premium version she would also get tasks, notifications and recommendations, but on a personalized level, customized to her individual needs. This coaching approach is mostly sought after by users who must change their behavior in order to achieve a satisfying level of personal well-being. And if behavior change is a must, then you’ll look for the easiest way to reach your goal. The explore app programs fit that requirement very well, since the user will be coached in a soft, but equally demanding and rewarding way.
Thank you very much for these insights!
The image above is taken from “Marx Engels Werke” (MEW): Marxism is the most prominent example of what postmodernism calls a ‘Grand Narrative’. Marx and Engels took all kinds of data, drew their conclusions, and told the one story that made sense from what they found.
Un poème n’est jamais qu’un alphabet en désordre. (Jean Cocteau)
Our time is perhaps the time of an epidemic of things. (Tristan Garcia)
I remember the elderly complaining about “information over saturation” or even “overload” when I was a child, in the early 1980s. 30 years later, the changing of the guard comes to my generation. “But once a sponge is at capacity, new information can only replace old information.” We read things like that in random articles every day. But what is this information that people are so afraid of no longer getting once the deluge of data has taken over?
What is data? Data is the raw content of our experience – primarily the sensory readings that get conveyed into our minds, secondarily the things we measure when we try to make experiences. I don’t want to get too philosophical here, but there are quite a few thinkers who share my discomfort with connecting data with facts directly. The whole of postmodernism is about deconstructing false confidence in empirical truths. A century ago, Husserl already warned us that the sciences might thus give us mediated theories rather than direct evidence. Quantitative social science, be it empirical sociology or experimental psychology, is particularly prone to the positivist fallacy. While throwing a die might be correctly abstracted into a series of stochastically independent occurrences of one and the same experiment, this is almost never true for human behavior.
Let us stop taking data as facts. Let us take data as fiction instead. Let us, just for the moment, think of data as the line of a story by which we tell about our experience. There might very well be no such thing as information in the data – just the scaffolding for different narratives that reduce the randomness and complexity. Take for example how our eyes abstract the shadow in the room’s corners to straight lines that make the edges. In fact there is no such thing as a line; if you get closer and closer to the edge, you see a rather round or uneven surface spanning between one wall and the adjoining wall. The edge impression is just our way to reduce our visual sensory input to a meaningful aggregate; a story.
Data as such is mostly incomprehensible. To comprehend, we have to find structure, construct causalities, reduce complexity. Data visualization is fulfilling the same task: Info graphics tell a plausible story from data, make it digestible for our mind.
The link between data and our comprehension of reality from the data is built via metaphors. A metaphor connects different things in a way that lets us identify one with the other. If we summarize objects under one category, this category in fact becomes the metaphor. “‘Table’ is a word with five letters”, as Rudolf Carnap put it. The concept of a ‘table’, however, is the metaphor, the image, the ideal of an arbitrary set of objects. To speak of a ‘table’ is our way to evoke an image of the concrete object we have in mind in the consciousness of our audience.
There is no law that forces us to see data as necessary, as caused, and as causing effects. If data were positive, scientific progress would just be correcting the errors of predecessors. But certainly this is not the case. Even what is called ‘hard science’ changes direction according to the narrative. Quantum physics was not necessary. Heisenberg’s operators are not reality in the sense that there is a factual object that changes one quantum state to the next. It is a meaningful abstraction from a reality that we cannot directly comprehend. In the same way, we may use data from social interaction, behavioral data, or economic data, and try to find a meaningful narrative to share our model of the world with others.
The narrative that we derive from data is of course by no means totally random. Of course not every narrative does fit our data points. But within our measurements, any model that does not contradict the data could be possible, and might – depending on the context – make an appropriate metaphor of our reality.
Since many narratives are possible, and a broad range of parameters can fit our data, we should be humble when it comes to value judgements. Whether a decision can be justified from our data depends on the model we choose. We should be clear that we have a choice, and that with this choice comes responsibility. We should be clear about our ethics, about the policies that guide how we set the models’ parameters. We should be aware of algorithm ethics.
We should also recognize that our data story is not free from hierarchy. It is very well possible that we impose something onto others with what is just one possible narrative; no story can be told independently from social context.
When we accept data not as just the facts that have to add up to some information, but as the hints of our story, we will be liberated from the pressure of sucking every bit into our brains. We might miss something, but that will hardly be more dramatic than in past times. Our model might not be perfect, but nevertheless, in hearing the data narrative we might catch a glimpse of what is missing. We should just let go of our dogma of data as facts.
As big data becomes the ruling paradigm of empirical sciences, I hope we will see lots of inspiring data stories. I hope that data will transcend from facts to fiction. And I want to hear and tell the fairytale where we wake the sleeping beauty in data.
This is the summary of my talk “Data story telling: from facts to fiction” I gave at the Content Strategy Forum 2014 in Frankfurt: