Why Data Visualization Is An Essential Part of Big Data Strategy
By Tim Scargill
There was a time when, with a spreadsheet and perhaps an automatically generated bar chart in hand, you could present your information and no-one would bat an eyelid. That was data visualization. But with the advent of big data, that just doesn't cut it anymore. The very nature of big data can make it ill-suited to traditional tools and methods, and even when those tools are suitable, a visualization has all too often been an afterthought - something we knew we should probably have, but really didn't think too much about.
The practicalities of big data have meant that, perhaps understandably, enterprise has been more focused on data storage than anything else. But as rapidly changing technology transforms both the type of information we want to communicate, and how we can communicate it, we need to refresh our approach and remind ourselves of why the way we process and present that data is so important. Data visualization should be something we are all investing in - and here's why.
Bigger and Bigger Data
The amount of data we create continues to grow exponentially - it's predicted to be a staggering 44 zettabytes (44 trillion gigabytes) annually by 2020. Due to technologies like social media and the IoT, businesses are receiving more information from more sources than ever before, which can and should be leveraged to improve operations. In many cases that data may be displayed in a spreadsheet, so it can be accessed easily by employees. But clearly, as the amount of data increases, it becomes less and less readable in a numerical format, and more important to produce a graphical representation. By doing that, we simplify and condense the data into something our brains can actually process. A range of spreadsheet applications are available for this approach, each with its own strong points - cloud sharing, scripting, chart designs and formatting options are all things to consider.
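To make the idea of condensing data concrete, here is a minimal sketch in Python of reducing a long column of raw numbers to a handful of binned counts, the kind of summary a bar chart or histogram is built from. The function names, the order values and the bin width are all hypothetical, chosen purely for illustration; a real project would more likely hand the binned counts to a charting tool.

```python
import random
from collections import Counter


def bin_counts(values, bin_width):
    """Condense raw numeric values into counts per fixed-width bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def text_bars(counts, max_width=40):
    """Render binned counts as a crude text bar chart, one line per bin."""
    peak = max(counts.values())
    lines = []
    for start, n in counts.items():
        # Scale each bar relative to the largest bin; always show at least one mark.
        bar = "#" * max(1, round(n / peak * max_width))
        lines.append(f"{start:>6} | {bar} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: 100,000 order values, far too many rows to read by eye.
    orders = [int(random.gauss(500, 150)) for _ in range(100_000)]
    print(text_bars(bin_counts(orders, bin_width=100)))
```

A hundred thousand rows collapse into roughly a dozen lines, which is the point: the shape of the data becomes visible at a glance, even in this deliberately crude form.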
However, what they all have in common is that they are not designed for extremely large datasets. Even the most powerful, Excel (whose worksheets are capped at 1,048,576 rows), can struggle. One solution from Microsoft is 'Power Pivot', an Excel feature which makes it much easier to crunch big data - so if you need to do that, and spreadsheets work for you, that could be a great option. But it still has its limits, and you may find you need to invest in specialist data analysis software. Again, it depends on your needs; MATLAB and R are known for great visualization options, while SAS is capable of handling the largest datasets.
Perhaps even more of an issue is how complex this data is. From social media statistics to data streaming from IoT sensors, the variety of information to be analyzed is huge. That produces complexity not only in preparing those different formats for comparison, but in discovering how they may be related. But by understanding how disparate elements such as manufacturing setup, supply chain management and customer e-commerce interactions are inextricably linked, it is possible to refine and improve every part of our businesses.
It is the role of a data scientist to go beyond basic assumptions to find those deeper insights, those hidden relationships in the data. And to do that they need an array of analysis and visualization options at their disposal - a correlation that is not obvious in one type of chart may be revealed in another. In the near future, augmented reality (AR) and virtual reality (VR) features will even enable 3D visualizations, which could greatly speed up the analysis of large, complex datasets. Finally, software that provides built-in scripting or programming facilities may also be very beneficial when dealing with different types of data.
However, data scientists are not the end users of these visualizations. Therefore, as well as finding those insights, we have to present them in a way that generates interest, that captures the attention of employees and clients alike. Modern software provides a variety of eye-catching options for charts and visual effects, which can go a long way to forming a lasting impression of a presentation. How those visualizations can be distributed across a shared platform is also important - cumbersome attachments are easily misplaced or even ignored altogether.
But here is the most common mistake: just because we have a great-looking graphic to show people, that's not business analytics. Data visualizations are not the endgame, but should aid interpretation and inspire action, and that 'data storytelling' skill is an undervalued one. For example, a beautifully designed scatter graph might be incredibly pleasing and make perfect sense to the person who produced it, but if it isn't obvious to the decision makers what needs to be done next, it's useless. We have to invest in training data scientists to recognize this - at all points of the visualization process, it is critical to keep in mind the business problem to be solved, the audience (people and processes), and the actions we are recommending as a result of these insights.
Another factor when considering how we present information is efficiency. As visualization programs provide more and more flashy effects and configuration options, data scientists need to know how to use those features sparingly; to get the desired message across quickly and enable faster decision making, noise and extraneous detail have to be minimized. There is a large amount of research available on human perception and how this affects the interpretation of visualizations, and those producing them should be aware of how not only the choice of chart, but also any effects added, will change how the graphic is read. For example, one significant finding is that 3D bar charts, while they might look great, are much worse than 2D bar charts when it comes to accurately portraying data comparisons.
Having said that, where advanced software tools definitely are required is in the realm of real-time data visualization. In order to monitor and communicate rapidly changing information, such as the data from IoT sensors, we have to be able to constantly update our graphics. Seeing the data change in real-time allows us to react faster, and even anticipate problems and trends. In a recent project, a team from Datatonic (a Europe-based analytics consultancy) analyzed London transport data with Google Cloud Platform, combining real-time visualization with machine learning to predict areas of congestion.
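As a rough illustration of what "constantly updating" involves, the Python sketch below keeps only the most recent sensor readings in a fixed-size window and flags a new reading that jumps well above the recent average. It is not the Datatonic approach, which the article does not detail; the class name, window size and spike factor are hypothetical, and a real dashboard would redraw its chart from the window each time a reading arrives.

```python
from collections import deque


class RollingWindow:
    """Hold the latest `size` readings so a live chart can be redrawn from them."""

    def __init__(self, size):
        # A deque with maxlen silently discards the oldest reading when full.
        self.readings = deque(maxlen=size)

    def add(self, value):
        self.readings.append(value)

    def mean(self):
        return sum(self.readings) / len(self.readings)

    def is_spike(self, value, factor=1.5):
        """True if `value` exceeds the recent average by the given factor."""
        return bool(self.readings) and value > factor * self.mean()


if __name__ == "__main__":
    window = RollingWindow(size=5)
    # Hypothetical stream of readings, e.g. vehicles counted per minute.
    for reading in [20, 22, 21, 19, 23, 48, 24]:
        if window.is_spike(reading):
            print(f"spike: {reading} vs recent mean {window.mean():.1f}")
        window.add(reading)
```

Checking each reading against the window before adding it means the alert compares new data with what came before, which is what lets a viewer react to a change as it happens rather than after the fact.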
As big data plays a key role in more and more industries and sectors (for example in crime analysis), the use of graphics to communicate that data will also increase. However, if those visualizations are to provide value, it is essential that they are carefully considered, that they are neither an afterthought nor a final outcome. Therefore, a successful big data strategy has to include investment in both the appropriate software and the proper training of data scientists to produce those effective visualizations.
With the IoT still in its infancy, with wearables and other related technologies on the horizon, the volume and complexity of big data are only going to grow; and with faster channels of communication, faster decision-making could increasingly provide a competitive advantage. New technologies such as AR and VR have the potential to revolutionize visualization techniques, and machine learning will be used alongside them to provide valuable new insights. With all of that in mind, it is vital that businesses are prepared with the right tools to take advantage - including trained, experienced staff, and best practices ingrained in their organization.