THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

John Paul Cook

BI Beginner: Using Power BI Desktop Instead of Excel for Visualizing Big Data

Excel is a great tool but Power BI offers more in the realm of visualization. The 64-bit version of Excel is better at processing large files than the 32-bit version, but even it has limitations. Power BI Desktop (BI as in Business Intelligence) is a free alternative to Excel for working with large files and Big Data. It is possible to enable the Power Pivot add-in for Excel and overcome the 1,048,576 row limit. You can extend Excel’s visualization capabilities with Power Map and Power Query, but Power BI can do more. Power BI Desktop has very powerful and easy to use visualization features of interest to anyone who needs to analyze and visualize data. Like with many software packages, it takes longer to read the instructions than it does to actually do the work. Once you know how, this exercise can be completed in just a few minutes. You don’t have to be a geek or a power user to succeed with Power BI. When you’re done creating your visualizations, you can share them with your colleagues over the web.

image

Figure 1. File is too large for Excel.

Use Get Data to load Excel and csv files into Power BI Desktop.

image

Figure 2. Use Get Data to load an Excel or csv file into Power BI Desktop. I selected CSV because that is the type of file I needed to analyze.

The file that had too many rows for Excel was obtained from Kaggle, which is a great source of large, real world, free datasets. I downloaded the Center for Disease Control’s (CDC) mortality dataset from Kaggle. The DeathRecords.csv file is 2,631,171 rows and took about a minute to load on my system that has a 4 GHz processor, 32 GB of ram, and an SSD.

image

Figure 3. Select the file of interest.

image

Figure 4. Click Load to load the selected file.

image

Figure 5. DeathRecords.csv loaded into Power BI Desktop.

Now it is time to do some visualizations. We’ll start with a pie chart showing what ages people died. There is more than one way to accomplish this task. Since we know we want a pie chart, go to Visualizations and select the pie chart icon. Next, select Age as shown. By default, it isn’t what we want. It is a sum and we need a count. After changing to a count, we need to set the Legend to Age.

image

Figure 6. Select pie chart under Visualizations.

image

Figure 7. Select Age.

image

Figure 8. Go to the dropdown to the right of Age under Values and select Count.

image

Figure 9. Under Fields, place the mouse over Age and hold the left mouse button down to drag it under Legend.

image

Figure 10. Release the left mouse button to drop Age under Legend.

image

Figure 11. Completed pie chart.

The next step is to add a stacked column chart visualization.

image

Figure 12. Adding a stacked column chart.

image

Figure 13. Underneath Fields, check MaritalStatus to add it to the stacked column chart.

image

Figure 14. Drag MaritalStatus underneath Value. It correctly defaults to a count.

image

Figure 15. You can collapse Visualizations and Fields. You can also resize your two visualizations and reposition them.

At first it appears that these two visualizations are independent of each other. They are interconnected. Move the mouse and place it over the far right column, the column for the number of widowed people who died. Click once. You will see that what you highlighted, the widowed data, is superimposed on the pie chart. The result is similar to the pie chart popularized by nurse and statistician Florence Nightingale. To undo the selection, click on the W columns again.

image

Figure 16. Completed pie chart and stacked column chart.

image

Figure 17. Click the W for widowed column.

image

Figure 18. After clicking the widowed column, the pie chart is updated to show how as people age, they are more likely to be widowed at the time of death.

You can save your model and come back and work with it later. It is saved with a file extension of pbix. You can also publish the model to the Microsoft cloud and grant access to the model to your colleagues. It’s important to understand that you’re not sharing a static view with your colleagues. You are sharing the data and the dynamic visualization features. If the underlying data is updated, your colleagues will be able to see the visualization change after doing a refresh to get the latest data.

image

Figure 19. Using the published version of the Power BI model from a browser. The M for married column is selected. The pie chart now shows that people who die as younger adults are more likely to be married than those who die as elderly adults.

Power BI can read all types of data sources, not just files. It natively supports reading files, folders of files, relational databases, Big Data stores, and more as you can see in the screen capture below.

image

Figure 20. Datasources supported by Power BI Desktop. If you have a csv file, select csv.

NOTE: Long time readers of my blog have seen numerous references in my posts to making accessibility accommodations. Since my last post, I found this research titled Good Fonts for Dyslexia by Luz Rello and Ricardo Baeza-Yates indicating that italics decrease the readability of text for people with dyslexia. In the past, the captions underneath my screen captures were italicized. I will avoid the use of italics going forward. Also notice that I provided a hyperlink instead of an APA formatted citation. I think MLA and APA formatting are not up with the times. Strict adherence to these legacy formatting rules with origins long before the digital age causes people to waste time on pedantic matters. Use of hyperlinks to primary sources placed directly in the text are more in keeping with the digital millennium providing both attribution and convenience. As always, my opinions are my own and not those of my employer or graduate school.

Published Thursday, June 02, 2016 5:07 PM by John Paul Cook

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(required) 
Submit

About John Paul Cook

John Paul Cook is a Data Platform Solution Architect working out of Microsoft's Houston office. Prior to joining Microsoft, he was a SQL Server MVP. He is experienced in SQL Server and Oracle database application design, development, and implementation. He has spoken at many conferences including Microsoft TechEd and the SQL PASS Summit. He has worked in oil and gas, financial, manufacturing, and healthcare industries. John is also a Registered Nurse who graduated from Vanderbilt University with a Master of Science in Nursing Informatics and is an active member of the Sigma Theta Tau nursing honor society. He volunteers as a nurse at a clinic that treats low income and uninsured patients. Contributing author to SQL Server MVP Deep Dives and SQL Server MVP Deep Dives Volume 2. Opinions expressed in John's blog are strictly his own and do not represent Microsoft in any way.

This Blog

Syndication

Privacy Statement