Tag Archives: data viz

Use the force of tables, but choose wisely.

“You must unlearn what you have learned,” said Master Yoda. “Tables are not visuals!” True? Have you ever heard that?

Nothing could be more wrong. Tables are a very powerful tool for visualizing data if you use them wisely. The main advantage of a table is its ability to present several measures for the same category in one row. This allows your audience to make quicker decisions because all the important information is “on the table”.

However, the human brain READS a table. It goes back and forth many times to understand the table’s content. So, to make understanding easier, some additional elements should be introduced into tables. After all, we don’t want to overload the lazy brains of our audience. Let’s see how we can improve tables to make them more accessible to people.

What makes the bottom table better than the one at the top? There are several points, which I’m going to address. You have probably already noticed the titles. Titles by themselves make a huge difference.

Flat table

This table is simply flat. All information is at the same level, which means that every element attracts your attention equally. Nothing is highlighted, except for every second row… which is unnecessary. Well, it’s hard to read, right? There are more sins: small fonts, cluttering elements such as lines and grey backgrounds, and unformatted values.

Meaningful table

In this table, I’ve introduced an information hierarchy by using different font colours. Row and column headers recede into the background. Values have a darker, bold font. What is more, visual elements are added: bars differentiate revenue volume, RAG (red, amber, green) icons convey the message about target realization, and arrows indicate the direction of the year-over-year change. Column headers describe the column content well, and the column order leads the reader through the information by importance.

Spaghetti Monster. Visualising multicategories.

When I say “multicategories”, I mean more than 4 categories. Sometimes, the challenge of visualizing multicategories is like the old Polish proverb “eat a cookie and have a cookie”, which is hard to put into practice. I often observe how data analysts approach this challenge. Common scenarios involve products, countries, businesses, departments, teams, agents or cost centres. For all these data, they try to find meaningful insights by depicting patterns and highlighting interesting points… mostly on one chart. That visual decision creates a beautiful piece of abstract art, rich in colours, shapes, object sizes, patterns and crossing lines.

Can you imagine how determined and persistent someone must be to look at, and try to understand, a column chart with 15 categories presented over a 5-year horizon? Which categories have an upward or a downward trend? Which one is leading? And, foremost, which one is which?

When I think about “multicategories”, the first association that pops into my head is “clutter”. Clutter is one of the greatest causes of cognitive overload. To understand the impact of clutter, imagine that you are trying to talk to a friend in a crowded space like a bar. You are all ears trying to follow her or him, and even then you do not succeed. Your brain makes the same effort when it is exposed to a flashy visualization.

So how do we overcome this challenge?

Both visualisations present the same information:

  • trends over time,
  • product comparison,
  • the best and the worst-selling products.
Spaghetti Monster
Clear view

Doesn’t it look like a Spaghetti Monster? You rummage with a fork to find a juicy bit of meat. Decoding information from this visualisation (a line chart) is similar: it costs a lot of effort and time. Our brain decodes one piece of information, e.g. the line colours, stores it in memory, compares the positions of the lines on the chart, looks for the trend of each line, and goes back and forth through the chart to make sense of it.

In the second approach, I’m showing a different strategy for presenting the same data set. Splitting the information into two visualisations gives clarity and the ability to draw a conclusion. The left chart enables the receiver to compare sales amounts between products and memorise them easily. The right chart shows the trend of a particular product in comparison to average sales. In this approach, we guide the audience through the data. We draw their attention to the important points. We don’t leave them alone, hoping that they will draw a conclusion on their own.

We are the data storytellers.

Let’s start from 0

I came up with the idea for this article during the last webinar, which I had the pleasure of conducting with my coworkers. One of the participants paid attention to the starting point of the line chart I presented. He noticed that the axis did not start at “0”. He referred to the well-known book by Alberto Cairo, “How Charts Lie”, and commented that the line chart should have started at 0.

There is no doubt that a bar chart should ALWAYS start at 0. Bar charts encode data by length. Over thousands of years of evolution, people have developed the ability to compare objects by length. Thanks to that, they were able to estimate how high food hangs on a tree branch, or to size themselves up against an enemy and decide whether to fight or escape. Placing the starting point at a non-zero value skews the data and misleads our audience because, first of all and unconsciously, they will start comparing the lengths of the bars.

Of course, we can label bars and axes properly. The real crime would be to switch off the Y-axis in such a case, which I observe from time to time. But even then, there is cognitive dissonance in our brain: the numbers don’t reflect the lengths and proportions. Lengths and proportions are what our brain will remember, because numbers are quite a recent phenomenon for it.

Let’s compare the examples below for the bar and line chart with a zero and a non-zero starting point, and check what consequences each might have for the interpretation of data.

Skewed Y-axis & Bar Chart

To avoid a heart attack in the near future and stay fit, the WHO (World Health Organization) recommends taking 10 000 steps per day. There are plenty of apps that can track your daily physical activity. The charts above present my recent results for the same range of dates. On the left chart, a proper baseline is applied at 0. All daily results fluctuate around the daily goal. Within a second, my dopamine level shoots up as I look at the bars reaching the daily target.

The right chart doesn’t give me a reason to be proud of myself at first glance. First, my brain notices the gaps between the bars and the target line. And OMG, twice I took almost no steps! If you don’t notice the Y-axis labels, you can interpret this chart that dramatically. Worse, if you only had a chance to see it for a few seconds, you would probably draw exactly that conclusion. Your brain wouldn’t have time to notice the Y-axis labelling. But on two days I seemingly more than doubled the target. Awesome! Everything WRONG.
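To put a number on this distortion: the drawn length of a bar is its value minus the axis baseline, so the ratio the eye compares changes as soon as the baseline moves away from zero. The sketch below is only an illustration. The step counts are made up and the helper `drawn_ratio` is a hypothetical name, not something taken from the charts in this article.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b
    when the value axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical daily step counts (assumption, not the author's data).
good_day, weak_day = 10_500, 9_500

# Axis starting at 0: the bars differ by roughly 10%, like the numbers.
print(round(drawn_ratio(good_day, weak_day), 2))

# Axis starting at 9 000: one bar is drawn three times longer than the other.
print(drawn_ratio(good_day, weak_day, baseline=9_000))
```

With a zero baseline the drawn ratio equals the ratio of the numbers themselves, which is why the labels and the lengths agree only in that case.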

Skewed Y-axis & Line Chart

The situation is different with line charts. There is no length to compare. There are only slopes and positions. In this case, context and narration play first fiddle.

These line charts present the same data set. From the chart on the left, we can take a similar story: the performance is almost aligned with the target. However, looking at the right chart, our brain doesn’t make automatic assumptions about lengths because there are no lengths. We see connected dots.

And now the question: does a non-zero axis skew the data on a line chart or not?

There is an ongoing discussion about it. A non-zero baseline, even though there are no bars to compare, can still mislead the audience by presenting tiny hills as steep mountains. However, in some cases, with a particular purpose in mind, it can be the best option. A non-zero axis on a line chart is good for presenting minor fluctuations or changes in phenomena such as stock exchange rates or product quality tracking (production series), and especially for tracking performance within companies, where even small changes can have a huge impact.

In our scenario? Well, to pat myself on the back, I would choose the bar chart with a “0” baseline, but to be able to monitor my daily results in detail, I would definitely choose the line chart with a non-zero baseline.

Key Ingredient of Compelling Stories. The Power of Context.

Seneca said, “We are more often frightened than hurt, and we suffer more from imagination than from reality.” Imagination is a powerful weapon. Compelling data visualizations designed to sell stories can put the human imagination to work.

To make it happen, context is a key player. Without context, it is hard to understand the presented numbers or outcomes. The human brain always seeks comparisons to create a meaningful picture of the world. In this article, I would like to talk about how we can add context when presenting the behaviour of a phenomenon over time.

In my experience, I often see a single line of, e.g., revenue, sales, costs or number of claims presented on a line chart. However, without a properly highlighted background, it’s hard to say whether what we see is positive or negative. Is this change for better or worse? Additional information strengthens the message and helps tell a thoughtfully crafted story.

This approach is especially important when the report supports the decision-making process. Quick business insights are revealed more easily when the decision-maker can benchmark the presented data against thresholds.

Let’s check how different stories can be told. On this chart, we can see a single line representing the revenue of company X. Analytical eyes will see the downward trend over time. However, this observation may not be so clear to people whose skills are other than analytical.

The first story can be about a decline in revenue over the last two years. The decline in revenue can be depicted with an added trend line. In a real scenario, it would be good to highlight the specific points in time that caused this change.
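One common way to obtain such a trend line is an ordinary least-squares fit through the data points. The article does not say how its trend line was computed, so treat this as a sketch under that assumption. The quarterly revenue figures and the function name `trend_line` are invented for the example.

```python
def trend_line(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical revenue for eight quarters (assumption, not company X's data).
quarterly_revenue = [120, 118, 121, 112, 110, 111, 104, 101]

slope, intercept = trend_line(quarterly_revenue)
print(f"Revenue changes by {slope:+.2f} per quarter on average")
```

A negative slope is what turns the analyst’s impression of a “downward trend” into something the rest of the audience can see at once.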

The second one can focus on now and then. Comparing two time periods, the current and the last year, helps to see the magnitude of change. However, it’s good to remember that in such a visualization the trend over the longer period is lost.

The last one doesn’t emphasize changes over a longer time at all. It just presents performance vs. budget and directs the audience’s attention to the “here and now”.

In conclusion, there are three different contexts for the same dataset, each changing the perspective on the data. Frankly speaking, combining these three perspectives gives an insightful story of the revenue condition.

Numbers with Human Face

Recently, I took part in a discussion about how to present numbers to convey a message about real people’s stories.

We often forget that these are not only numbers. Each number represents a human being, his or her tragedy, and the tragedy of his or her relatives.

Statistics often show numbers, percentages of populations, rises, falls and trends. Depicting the context and telling the story behind datasets is a huge challenge and effort, especially when we try to depict in numbers a phenomenon such as #COVID-19. We have to remember that “confirmed cases” are real people who have been diagnosed with coronavirus. The number of deaths is the number of people who lost their lives because of this disease.

Every day, we are exposed to numerous statistics in the media, workplaces and schools. They describe current situations and local and global events, such as car accidents, infant mortality or unemployment. Most of them are expressed as a ratio or percentage. These formats are not intuitive and are hard for most people to interpret. However, there are some methods that connect numbers with people. Maybe not with individuals, but with countable human beings with whom we can empathize.

KPI approach

A good example is the unemployment rate, which is one of the most important economic indicators. In governmental statistics, unemployment is presented as the ratio of unemployed people to all people who can work.

An unemployment rate expressed as a percentage does not evoke any emotions in most of us. Most of us understand what we see, but… it is nothing personal. Percentages are abstract objects. The figure refers to some indefinite part of the population somewhere in the country.

As studies show, we can transform this message to evoke people’s feelings and make them take a more human perspective. Instead of an abstract 20%, we can say that 1 out of 5 people is unemployed. Each of us can count to five. Each of us can easily list five people. Behind this number, people’s faces may appear. In such a small group of people, our neighbour or a family member may be out of work. This is no longer an abstraction but a very real threat.
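The conversion from a percentage to a “1 out of N” statement is simple arithmetic: N is 100 divided by the percentage, rounded to a whole person. A minimal sketch follows. The function name `as_one_in_n` is hypothetical, and for percentages that do not divide 100 evenly the result is only an approximation.

```python
def as_one_in_n(percentage):
    """Express a percentage as '1 out of N', with N rounded to a whole person."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be greater than 0 and at most 100")
    n = round(100 / percentage)
    return f"1 out of {n}"


print(as_one_in_n(20))  # 1 out of 5
print(as_one_in_n(25))  # 1 out of 4
```

For a rate such as 6.5% the rounding gives “1 out of 15”, which is close enough for a headline but should not replace the exact figure in the detailed report.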

Human approach

The Nature of the Phenomenon. Linear vs. logarithmic scale

One dataset, two charts, two opposite stories.

The chosen scale has a huge impact on how we digest and interpret the presented data. The linear scale places numbers at equal distances, so we can compare them easily. The logarithmic scale is not intuitive for us. It’s a mathematical concept that we can use when we want to describe multiplicative factors or when the data are heavily skewed towards large numbers. We need to use brainpower to understand it. What is more, we are so used to the linear scale that we can easily overlook that a visual uses a logarithmic one. We should inform our audience that a logarithmic scale is used… and make sure that they understand how to read it.

Because of COVID-19, a huge number of statistics are generated and published across the internet. Those statistics try to tell a story about the COVID-19 phenomenon. Most of them focus on the number of confirmed cases and deaths. I have noticed two data visualisation trends in presenting data about this virus. The first concentrates on the growth of the total number of confirmed cases, and the second on the pace at which the disease spreads.

Let’s feel the difference.

“PANIC chart”: I saw this good name for such a linear chart somewhere. I couldn’t agree more. Tell me, what feelings does this chart evoke in you?

This is an exponential chart (another mathematical concept), which depicts the growth of the phenomenon. Very rapid growth, to be specific.

Below we can see the same data, but plotted on a different scale. Please look carefully. Each gridline represents a power of 10. Don’t you think that the chart below is less scary?

What stories do these two charts tell us?

Let’s anchor them on the 18th of March and the 4th of April. The linear chart tells us that until the 18th of March nothing spectacular happened. This is the total opposite of the logarithmic one, where we can see the fastest growth of confirmed cases in that period. Between the 18th of March and the 4th of April, the linear chart shows huge growth. On the logarithmic one, the pace of growth decelerates. After the 4th of April, the linear chart continues to present the same pace of growth (a steep hill), but on the logarithmic one it’s plain to see that the curve flattens.
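The two readings can be reproduced with a few lines of arithmetic. On a linear axis the eye follows absolute differences between consecutive points, while on a log10 axis it follows differences of logarithms, which reflect the growth factor. The series below is invented for illustration (doubling at first, then growing by 20% per step) and is not the data behind the charts discussed here.

```python
import math

# Hypothetical cumulative counts (assumption): four doublings,
# followed by two steps of 20% growth.
cases = [100, 200, 400, 800, 1600, 1920, 2304]

# Vertical distance between neighbouring points on a linear axis.
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# Vertical distance between the same points on a log10 axis.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

for step, (lin, lg) in enumerate(zip(linear_steps, log_steps), start=1):
    print(f"step {step}: +{lin:>4} on the linear axis, +{lg:.3f} on the log10 axis")
```

During the doubling phase the linear steps keep growing while the logarithmic steps stay constant. Once growth slows to 20%, the linear steps are still larger than most of the early ones, whereas the logarithmic steps shrink visibly, which is the “flattening” the logarithmic chart reveals.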

PIES ARE FOR EATING NOT FOR DISPLAYING DATA.

Recently, I had an interesting discussion about pie charts. My interlocutor claimed that pie charts should stay in use because they are intuitive and people decode information from them very quickly. Well, research shows exactly the opposite. The human brain cannot quickly and accurately compare several angles. What is more, a pie chart contains not only angles but also areas and colours, which confuse the human brain as well. As proof, check out the paper by W. Cleveland & R. McGill on the visual decoding of quantitative information. So, how do we cope with part-to-whole cases? What should we use instead of pie charts? How do we convey information effectively?

Paulina, what is wrong with you? Why do you hate those pie charts so much? Look. They are based on the ideal shape, and you can use all your favourite colours at once!

Unfortunately, those colours and this ideal shape are a true curse for the clear interpretation of data. Research shows that the human brain has problems decoding quantitative information presented through these three attributes:

1. angles,

2. areas,

3. colours.

Let’s see an example. Imagine that the parts of the wheel are the product categories you offer in your store. You would like to find out which category is the most profitable one. What kind of questions would you ask?

Which category sells the best? Which one sells the worst?

Let’s see how visual decisions can affect the process and speed of getting valuable insights.

Note: I’m not adding data labels on purpose. I want to focus only on visual decoding (without text support).

PIE CHART

I would never choose a pie chart to compare more than two categories. Are you able to quickly and accurately answer the questions stated in the example? I’m not. NOT AT ALL.

100 % STACKED BAR CHART

Instead of a pie chart, we can use a 100% stacked bar chart, up to a point. Research shows that people are quite good at comparing lengths. However, colours can distort lengths: the more saturated the colour, the larger the object seems to be. The second issue is, again, the number of categories: the more of them there are, the greater the effort of comparing elements.

STACKED BAR CHART

This is my first choice. There is no room to mislead anybody with a stacked bar chart.

As I mentioned, people are quite good at comparing lengths. When we use a single colour for the bars, we can be 100% sure that no one will have trouble recognizing the longest and the shortest bar. One blink of an eye and you understand what you see. For your lazy brain, it is pure magic.

The Power of Alerts.

Is all information equally important for running your business smoothly? Or is some of it more desirable than the rest? Do you need to see detailed data on the first page of your report and devote precious time to analyzing and interpreting it? Or maybe it’s better to look at carefully selected information with additional colour that guides you towards electrifying insights?

Often, when I talk to clients, I find that most of them still live in a world of long charts and tables, as if a true report had to be very detailed, very extensive and cover large areas of the subject.

The approach I offer is slightly different. If we would like to craft an insightful dashboard, it’s good to follow several rules:

1. Select information that is relevant and depicts the condition of the business.

2. Design indicators that change daily and to which alarm colour coding can easily be applied.

3. Keep it simple. Don’t overload it.

4. Save the rest of the information for sub-pages.

We aim to design a tool that helps business decision-makers make data-driven decisions. Very often, they have just the blink of an eye to grasp a complex situation.

Let’s check an example. Imagine that you are the Sales Director. What information would you like to check while sipping your morning coffee?

In the two approaches below, we present exactly the same information:

– current sales,

– change over time,

– budget,

– trend.

However, the difference is huge when we take into account the speed of digesting the information, making sense of it, and perhaps the actions we are going to take.

Analytical approach

The analytical approach doesn’t provide quick insights. It takes some time to understand what the graph presents, then to find the number for current-year sales, then to compare the bars, and then to estimate how large the gap is between current sales, last year’s sales and the budget.

KPI approach

In the KPI (Key Performance Indicators) approach, the desired information is presented at a glance. KPIs are selected to tell a story and make the situation easier to understand. No additional effort is needed. You, as the Sales Director, can easily draw conclusions and act. You just see it.
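What a KPI card does for the reader can be sketched as a small calculation: it performs in advance the comparisons that the analytical chart leaves to the viewer. Everything in the example is an assumption made for illustration: the sales figures, the function name `kpi_summary` and the 95% amber threshold. Real thresholds would be agreed with the business.

```python
def kpi_summary(current, last_year, budget, amber_ratio=0.95):
    """Condense sales figures into the few values a KPI card would show."""
    yoy_change = (current - last_year) / last_year
    budget_gap = (current - budget) / budget

    # Alert colour: green at or above budget, amber just below, red otherwise.
    if current >= budget:
        status = "GREEN"
    elif current >= amber_ratio * budget:
        status = "AMBER"
    else:
        status = "RED"

    # Direction of the year-over-year change.
    if yoy_change > 0:
        direction = "up"
    elif yoy_change < 0:
        direction = "down"
    else:
        direction = "flat"

    return {
        "current": current,
        "yoy_change": yoy_change,
        "direction": direction,
        "budget_gap": budget_gap,
        "status": status,
    }


# Hypothetical figures (assumption).
card = kpi_summary(current=1_150_000, last_year=1_000_000, budget=1_200_000)
print(
    f"Sales {card['current']:,} | vs last year {card['yoy_change']:+.1%} "
    f"({card['direction']}) | vs budget {card['budget_gap']:+.1%} | {card['status']}"
)
```

The point of the design is that the reader never has to subtract one bar from another: the gaps and the alert colour arrive already computed.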

Time flies! Categorical data embedded in time.

Presenting time-related categorical data can be tricky. Fortunately, there are some good practices that guide us on how to approach the topic. In this article, you can find a summary of DOs and DON’Ts on the subject.

The first well-documented calendar systems, which portray the linear nature of time, appeared in the archaeological record around 5,000 BC. Most of us feel the pressure of time. To express it, we use common sayings like “Time is money” or “Time waits for no one”.

Culture has shaped the human perception of time. We think about time as an arrow shot into space. That, of course, has an impact on data visualization. There are three basic rules for presenting data in the context of time. When we follow them, people easily digest information and form conclusions.

1. Use left-right direction.

2. Keep chronological order.

3. Use typical time units.

Another thing to consider is the proper visualization. The decision to use one visualization rather than another depends on whether the data type is continuous or categorical. Continuous data have only one option: the line graph. However, categorical data, which are clustered in periods or bins, can be trickier.

In today’s scenario, let’s imagine that we present the outcomes (level of satisfaction) of a survey grouped by respondents’ age. What would be the best choice when the data dimension is embedded in time but not expressed in typical time units?

Stacked bar chart

The general rule is to present categorical data on a stacked bar chart (with categories on the Y-axis), sorted in descending order. Nevertheless, the human brain decodes categories linked with time, for instance age bins, archaeological periods or process phases, much more easily on the X-axis, starting on the left and ending on the right.

Stacked column chart

When presenting time-related categories on the X-axis, it is good to remember to keep chronological order. As in this example, sorting categories in descending order by value is not very effective. The audience must use cognitive power to understand the meaning of the X-axis and figure out the order of the age bins in the survey population.

Stacked column chart (chronological order)

Then, when we place the categories on the X-axis in chronological order, as in this example where the order runs from younger to older, we tell the story of how the survey outcomes are distributed among the respondent population in the context of age groups. The combination of a column chart and the X-axis embeds us in the context of the population distribution and helps us remember the results.
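The difference between the two orderings comes down to the sort key: the measured value versus the position of the bin in time. A minimal sketch follows, assuming age-bin labels that begin with their lower bound, such as “18-24” or “65+”. The survey shares and the helper `lower_bound` are invented for the example.

```python
import re


def lower_bound(age_bin):
    """Return the first number in a label such as '18-24' or '65+'."""
    match = re.search(r"\d+", age_bin)
    if match is None:
        raise ValueError(f"no number found in age bin label: {age_bin!r}")
    return int(match.group())


# Hypothetical share of satisfied respondents per age bin (assumption).
satisfaction = {
    "45-54": 0.71,
    "18-24": 0.64,
    "65+": 0.80,
    "25-34": 0.58,
    "55-64": 0.75,
    "35-44": 0.62,
}

# Descending by value: fine for unordered categories, confusing for age bins.
by_value = sorted(satisfaction, key=satisfaction.get, reverse=True)

# Chronological: from the youngest to the oldest group, left to right.
chronological = sorted(satisfaction, key=lower_bound)

print(by_value)
print(chronological)
```

Sorting on the numeric lower bound rather than on the label text matters as soon as a bin such as “5-17” is present, because a plain text sort would place it after “45-54”.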