If There Was Already an Ocean of Data in 2007, How Much Is There Now?


I’ve been trying to figure out how to convey the scale of the ‘Big Data’ phenomenon: the recent worldwide explosion in the volume of data encoded in digital form. Inspiration came from Randall Munroe’s fantastic “What if?” comics, which provide “Serious Scientific Answers to Absurd Hypothetical Questions.” (Check out his 2014 TED talk and pre-order the “What if?” book.)

So I decided to (poorly) imitate his methodology and try to seriously answer the question posed in the title of this post — if there was already an ocean of data in the world in 2007, how much more ‘datawater’ was there in 2013? I chose this metaphor of data-as-water because it’s a familiar one; we often use the imagery of having ‘oceans of data’ that we’re ‘drowning’ in.

I chose 2007 for no particularly good reason. It’s pretty recent, which makes any large increase in datawater more surprising. It wouldn’t surprise many people to learn that the amount of datawater has gone up a lot over the past thirty years, for example, but how much has it increased just over the past seven? 2007 is also when the phrase “big data” was just starting to be used.

Finally, it’s a year for which we have a couple of estimates of how much digital data existed worldwide. The first is from the 2011 Science paper “The World’s Technological Capacity to Store, Communicate, and Compute Information” by Martin Hilbert and Priscila López; 2007 is the last year considered in this research. The second is from a series of whitepapers EMC has published each year since 2007 that estimate the amount of digitally encoded data worldwide.

These two estimates line up pretty well for 2007 (the one year they overlap), so I’ll use the Science estimate for 2007 and the EMC one for 2013 (the most recent year in the series). They tell us that there were 295 exabytes of digital data in 2007, and 4.4 zettabytes in 2013, giving an annual growth rate of 57.5% over the period.
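The growth rate is a compound annual rate over the six years from 2007 to 2013. Below is a minimal Python sketch of that calculation; `annual_growth_rate` is a hypothetical helper name. The post doesn’t say whether a zettabyte is counted as 1,000 or 1,024 exabytes, so the sketch tries both. The binary convention reproduces the 57.5% figure, while the decimal one gives about 56.9%.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Hypothetical helper: the constant yearly rate that grows `start` into `end`."""
    return (end / start) ** (1 / years) - 1


EB_2007 = 295.0      # exabytes of digital data in 2007 (Hilbert and López)
ZB_2013 = 4.4        # zettabytes of digital data in 2013 (EMC)
YEARS = 2013 - 2007  # six compounding periods

# Assumption: the EB-to-ZB convention is not stated, so compute the rate both ways.
for label, eb_per_zb in [("decimal, 1 ZB = 1,000 EB", 1000), ("binary, 1 ZB = 1,024 EB", 1024)]:
    rate = annual_growth_rate(EB_2007, ZB_2013 * eb_per_zb, YEARS)
    print(f"{label}: {rate:.1%} per year")
```

Either way the amount of data grew by roughly a factor of 15 in six years, so the choice of convention barely matters for what follows.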

I’ll associate the volume of digital data in the world in 2007 with the volume of the Atlantic Ocean, just because it’s the one closest to home. The Atlantic Ocean is big in absolute terms — it contains over 300,000,000 cubic kilometers of water — but it’s also relatively small; it makes up less than 0.03% of the volume of the Earth. So as I started my calculations I found that I didn’t have good intuition about how much datawater there would be in 2013. Would it be a thin film covering the Earth’s surface? Would it come up to our waists? Be deep enough to swim in?
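One quick way to check the 0.03% figure is to divide the Atlantic’s volume by the volume of a sphere the size of the Earth. This sketch assumes an Atlantic volume of about 310.4 million cubic kilometers (a commonly cited estimate; the text above only says “over 300,000,000”) and a mean Earth radius of 6,371 km. With those inputs it prints a share just under 0.029%.

```python
import math

ATLANTIC_KM3 = 310_410_900  # assumption: commonly cited Atlantic volume, km^3
EARTH_RADIUS_KM = 6_371.0   # mean Earth radius, km

# Treat the Earth as a perfect sphere.
earth_km3 = 4 / 3 * math.pi * EARTH_RADIUS_KM ** 3
share = ATLANTIC_KM3 / earth_km3
print(f"Atlantic share of Earth's volume: {share:.4%}")
```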

It turns out that it would submerge nearly all of our creations and almost everything else on the planet, to the point that we could happily boat around on the surface of the datawater ocean with only the very tallest mountaintops of the former world anywhere near the waterline. Spread evenly over the Earth’s surface, the volume of datawater created between 2007 and 2013 would form a layer roughly 8.7 kilometers (about 28,500 feet) deep, nearly the height of Mt. Everest.
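To make the depth figure easy to reproduce, here is a minimal Python sketch. It assumes an Atlantic volume of about 310.4 million km³ (a commonly cited figure), a mean Earth radius of 6,371 km, the binary zettabyte that matches the 57.5% growth rate, and a smooth sphere with no land or existing oceans to account for. `shell_depth_km` is a hypothetical helper that solves for the thickness of a uniform spherical shell with a given volume. With these inputs it prints a depth of roughly 8,700 m.

```python
import math

ATLANTIC_KM3 = 310_410_900  # assumption: commonly cited Atlantic volume, km^3
EARTH_RADIUS_KM = 6_371.0   # mean Earth radius, km
EB_2007 = 295.0             # exabytes in 2007, mapped to one Atlantic of datawater
EB_2013 = 4.4 * 1024        # assumption: 4.4 ZB with 1 ZB = 1,024 EB


def shell_depth_km(volume_km3: float, radius_km: float = EARTH_RADIUS_KM) -> float:
    """Hypothetical helper: thickness of a uniform shell of `volume_km3` around a smooth sphere.

    Solves (r + h)^3 - r^3 = 3V / (4*pi) for h.
    """
    outer_radius = (radius_km ** 3 + 3 * volume_km3 / (4 * math.pi)) ** (1 / 3)
    return outer_radius - radius_km


# Only the datawater added after 2007 is spread over the planet.
new_datawater_km3 = (EB_2013 / EB_2007 - 1) * ATLANTIC_KM3
depth_km = shell_depth_km(new_datawater_km3)
print(f"{new_datawater_km3:.3g} km^3 of new datawater -> {depth_km * 1000:,.0f} m deep")
```

Treating the layer as a flat slab (volume divided by surface area) gives nearly the same answer, because a few kilometers is tiny next to the Earth’s radius.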

The addition of this amount of water would certainly be the biggest change in the history of the world. That makes it a pretty good analogy for the advent of Big Data.