Excel is a great tool but Power BI offers more in the realm of visualization. The 64-bit version of Excel is better at processing large files than the 32-bit version, but even it has limitations. Power BI Desktop (BI as in Business Intelligence) is a free alternative to Excel for working with large files and Big Data. It is possible to enable the Power Pivot add-in for Excel and overcome the 1,048,576 row limit. You can extend Excel’s visualization capabilities with Power Map and Power Query, but Power BI can do more. Power BI Desktop has very powerful and easy to use visualization features of interest to anyone who needs to analyze and visualize data. Like with many software packages, it takes longer to read the instructions than it does to actually do the work. Once you know how, this exercise can be completed in just a few minutes. You don’t have to be a geek or a power user to succeed with Power BI. When you’re done creating your visualizations, you can share them with your colleagues over the web.
Figure 1. File is too large for Excel.
Use Get Data to load Excel and csv files into Power BI Desktop.
Figure 2. Use Get Data to load an Excel or csv file into Power BI Desktop. I selected CSV because that is the type of file I needed to analyze.
The file that had too many rows for Excel was obtained from Kaggle, which is a great source of large, real world, free datasets. I downloaded the Center for Disease Control’s (CDC) mortality dataset from Kaggle. The DeathRecords.csv file is 2,631,171 rows and took about a minute to load on my system that has a 4 GHz processor, 32 GB of ram, and an SSD.
Figure 3. Select the file of interest.
Figure 4. Click Load to load the selected file.
Figure 5. DeathRecords.csv loaded into Power BI Desktop.
Now it is time to do some visualizations. We’ll start with a pie chart showing what ages people died. There is more than one way to accomplish this task. Since we know we want a pie chart, go to Visualizations and select the pie chart icon. Next, select Age as shown. By default, it isn’t what we want. It is a sum and we need a count. After changing to a count, we need to set the Legend to Age.
Figure 6. Select pie chart under Visualizations.
Figure 7. Select Age.
Figure 8. Go to the dropdown to the right of Age under Values and select Count.
Figure 9. Under Fields, place the mouse over Age and hold the left mouse button down to drag it under Legend.
Figure 10. Release the left mouse button to drop Age under Legend.
Figure 11. Completed pie chart.
The next step is to add a stacked column chart visualization.
Figure 12. Adding a stacked column chart.
Figure 13. Underneath Fields, check MaritalStatus to add it to the stacked column chart.
Figure 14. Drag MaritalStatus underneath Value. It correctly defaults to a count.
Figure 15. You can collapse Visualizations and Fields. You can also resize your two visualizations and reposition them.
At first it appears that these two visualizations are independent of each other. They are interconnected. Move the mouse and place it over the far right column, the column for the number of widowed people who died. Click once. You will see that what you highlighted, the widowed data, is superimposed on the pie chart. The result is similar to the pie chart popularized by nurse and statistician Florence Nightingale. To undo the selection, click on the W columns again.
Figure 16. Completed pie chart and stacked column chart.
Figure 17. Click the W for widowed column.
Figure 18. After clicking the widowed column, the pie chart is updated to show how as people age, they are more likely to be widowed at the time of death.
You can save your model and come back and work with it later. It is saved with a file extension of pbix. You can also publish the model to the Microsoft cloud and grant access to the model to your colleagues. It’s important to understand that you’re not sharing a static view with your colleagues. You are sharing the data and the dynamic visualization features. If the underlying data is updated, your colleagues will be able to see the visualization change after doing a refresh to get the latest data.
Figure 19. Using the published version of the Power BI model from a browser. The M for married column is selected. The pie chart now shows that people who die as younger adults are more likely to be married than those who die as elderly adults.
Power BI can read all types of data sources, not just files. It natively supports reading files, folders of files, relational databases, Big Data stores, and more as you can see in the screen capture below.
Figure 20. Datasources supported by Power BI Desktop. If you have a csv file, select csv.
NOTE: Long time readers of my blog have seen numerous references in my posts to making accessibility accommodations. Since my last post, I found this research titled Good Fonts for Dyslexia by Luz Rello and Ricardo Baeza-Yates indicating that italics decrease the readability of text for people with dyslexia. In the past, the captions underneath my screen captures were italicized. I will avoid the use of italics going forward. Also notice that I provided a hyperlink instead of an APA formatted citation. I think MLA and APA formatting are not up with the times. Strict adherence to these legacy formatting rules with origins long before the digital age causes people to waste time on pedantic matters. Use of hyperlinks to primary sources placed directly in the text are more in keeping with the digital millennium providing both attribution and convenience. As always, my opinions are my own and not those of my employer or graduate school.