In this blog, we will look into different methods of data transformation, data exploration and data visualization using Power BI.
Power BI transforms your data into visually immersive and interactive insights. It connects your multiple sources of data with the help of apps, software services, and connectors.
Whether you save your data in an Excel spreadsheet, in the cloud, or in an on-premises data warehouse, Power BI gathers your data and lets you share it easily with anyone whenever you want.
How you use it varies with the purpose you need to fulfill. Mostly, the software is used for presenting reports and viewing data dashboards and presentations. If you are responsible for creating reports, presenting weekly datasheets, or performing data analysis, you will probably make extensive use of Power BI Desktop or Report Builder to create reports. You can also publish your report to the Power BI service, where you can view and share it later.
Whereas developers use Power BI APIs to push data into datasets or to embed dashboards and reports into their own custom applications.
Let’s learn how Power BI works step by step:
On the dashboard, there are a number of options for uploading or importing your dataset, so the first step is to import your dataset. The software supports a number of data formats, as discussed earlier. Let's say you want to add an Excel sheet to Power BI: click Excel Workbook on the main screen and simply select the file you want to upload.
Now that your data is visible, you first need to perform data pre-processing, which involves cleaning up your data and then transforming it. When you click Transform Data, you will be taken to the Power Query Editor.
Power Query is the engine behind Power BI. All the data pre-processing is done in this window. It cleans and imports millions of rows into the data model so that you can perform data analysis afterward.
The tool is simple to use and requires no code for any task. With the help of Power Query, it is possible to Extract, Transform, and Load (ETL) the data, and it simplifies the tasks you perform regularly.
You can check out a number of Power BI visualizations that you can choose from the visualization pane. Simply choose from the range of visuals available in the panel.
You can create custom data visualizations if you can't find the visual you want in AppSource. To differentiate your organization and build something distinctive, personalize your data visualizations. When they're ready, you can share what you've created with your team or publish it to the Power BI community.
Working with eye-catching visuals increases comprehension, retention, and appeal, helping you interact with your data and make informed decisions quickly.
Watch this video to learn each step of developing visuals for your specific industry and business:
It is a data visualization and analysis tool that offers different types of visualizations. The most popular and useful ones are Charts, Maps, Tables, and Data Bars.
Charts are a simple way to present data in an easy-to-understand format. They can be used to show trends, comparisons, or changes over time. A map is a great way to show the geographical location of certain events or how they relate to each other. A table provides detailed information that can be sorted by columns and rows, making it easier to analyze. Data bars show progress towards goals or targets, with their height representing the amount of progress made.
Senior Business Intelligence Analyst
Senior Software Engineer
Recently, the use of this tool has increased, and it has been adopted widely in multiple industries, including IT, healthcare, financial services, insurance, staffing & recruiting, and computer software. Some of the major companies that use the tool include:
Conde Nast (USA)
Hospital Montfort (Canada)
Kraft Heinz Co (USA)
Rolls-Royce Holdings PLC (UK)
The average annual salary of a Power BI professional in the United States is $100,726.
The advantage of this visualization tool is its ease of use, even by people who don’t consider themselves to be very technologically proficient. As long as you have access to the data sources, the dashboard, and a working network connection, you can use it to process the information, create the necessary reports, and send them off to the right teams or individuals.
Start learning Power BI today with Data Science Dojo and accelerate your career.
In this blog, we will discuss the key ingredients for a great chart. We will highlight the Data Science Dojo session held by Nick Desbarats.
The modern world relies on data visualization for things to run smoothly. Multiple research projects on nonverbal communication have reached comparable results: roughly 93% of all communication is nonverbal. Whether you are scrolling through social media or watching television, you are consuming data. Data scientists strongly believe that data can make or break your business brand.
The concept of content marketing strategy requires you to have a unique operating model to attain your business objective. Remember that everybody is busy, and no one has time to read dull content on the internet.
This is where the art of data visualization comes in to help the dreams of many digital marketers come true. Below are some practical data visualization techniques that you can use to supercharge your content strategy!
Everybody loves to read information they can rely on and use in decision-making. When you present data to your audience in the form of a visualization, make sure the data is accurate, and mention its source to gain your audience's trust.
If your business brand presents inaccurate data, you are likely to lose many potential clients who depend on your company. Customers may still come and view your visual content, but they won't be happy if your data is inaccurate. Remember that there is no harm in gathering information from a third-party source; you only need to ensure that the information is accurate.
According to ERP Information, data can never be 100% accurate, but it can be more or less accurate depending on how closely it adheres to reality. The closer data sticks to reality, the higher its accuracy.
Posting real-time data is an excellent way of attracting a significant number of potential customers. Many people opt for brands that present data on time, depending on the market situation. This strategy proved efficient during the Black Friday season, when companies recorded a significant number of sales within a very short time.
In addition, real-time data plays a critical role in building trust between a brand and its customers. When customers realize that you are posting things as they happen, their level of trust skyrockets.
Once you have decided to include visual content in your content strategy, you also need to find an exciting story for the visual to present to the audience. Before you start authoring the story, think through the ins and outs of your content to ensure that you have everything nailed down.
You can check out the types of visual content that have been created by some of the big brands on the internet. Try to mimic how these brands present their stories to the audience.
Promoting visual content does not mean that you need to spend the whole day working on a single visual. Create simple, interactive Excel charts (bar chart, line chart, Sankey diagram, box-and-whisker plot, etc.) to engage your audience. Promoting means communicating with your audience directly through different social media platforms.
Also, you can opt to send direct emails, given the fact that you have their contact details. The ultimate goal of this campaign is to make your visual go viral across the internet and reach as many people as possible. Ensure that you know your target audience to make your efforts yield profit.
Representation of data plays a fundamental role when developing a unique identity for your brand. You have the power to use visuals to make your brand stand out from your competitors. Collecting and presenting unique data gives you an added advantage in business that makes you unique.
To find such unique data, you need to conduct in-depth research and dig down across different variables. Even though it may sound simple, it is not. Collecting big data is simple; the complexity comes with selecting the most appropriate data points.
Getting to know your audience is a fundamental aspect that you should always consider. It gives you detailed insights not only into the nature of your content but also into how to promote your visualization. To promote your visualization effectively, you need to understand your audience.
When designing different visualization types, you should also pay close attention to the platform you are targeting. Decide on the media where you will share various types of content depending on the nature of the audience available on the respective platforms.
Conduct in-depth research to understand what works for you and what doesn’t work. For instance, one of the benefits of data visualization is that it reduces the time it takes to read through loads of content. If you are mainly writing content for your readers to share across the market audience, a maximum of two hundred and thirty words is enough.
It is an art and science that requires you to conduct remarkable research to uncover essential information. Once you uncover the necessary information, you will definitely get to know your craft.
The digital marketing world involves continuous learning to remain at the top of the game. The best way to learn in business is to monitor what established brands are doing to succeed. You can study the content strategy used by international companies such as Netflix to get a taste of what it means to promote your brand across its target market.
After conducting your research and settling on a story that resonates with your brand, you have to gather the respective tools necessary to generate the story you need. You should acquire creative tools with a successful track record of producing quality output.
There are multiple data visualization tools on the web that you can choose and use. However, some people recommend starting from scratch, depending on the nature of the output they want. Some famous data visualization tools are Tableau, Microsoft Excel, Power BI, ChartExpo, and Plotly.
Do not forget about the power of research and testing. Acquire different tools to help you conduct research and test different elements to check if they can work and generate the desired results. You should be keen to analyze what can work for your business and what cannot.
The business world is in dire need of representing data to enhance competitive content strategies. A study done by the Wharton School of Business has revealed that appealing visuals of complex data can shorten a business meeting by 24% since all the essential elements are outlined clearly. However, to grab the attention of your target market, you need to come up with something unique to be successful.
Data visualization tools are used to gain meaningful insights from data. Learn how to build visualization tools with examples.
The content of this blog is based on examples/notes/experiments related to the material presented in the “Building Data Visualization Tools” module of the “Mastering Software Development in R” Specialization (Coursera) created by Johns Hopkins University.
ggplot2, a system for “declaratively” creating graphics, based on “The Grammar of Graphics.”
gridExtra, provides a number of user-level functions to work with “grid” graphics.
dplyr, a tool for working with data frame-like objects, both in and out of memory.
viridis, the Viridis color palette.
ggmap, a collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g. Google Maps)
# If necessary to install a package run
# install.packages("packageName")

# Load packages
library(ggplot2)
library(gridExtra)
library(dplyr)
library(viridis)
library(ggmap)
The ggplot2 package includes some datasets with geographic information. The ggplot2::map_data() function allows you to get map data from the maps package (use ?map_data for more information).

The italy dataset is used for some of the examples below. Please note that this dataset was prepared around 1989, so it is out of date, especially the information pertaining to provinces.
# Get the italy dataset from ggplot2
# Consider only the following provinces: "Bergamo", "Como", "Lecco", "Milano", "Varese"
# and arrange by group and order (ascending order)
italy_map <- ggplot2::map_data(map = "italy")
italy_map_subset <- italy_map %>%
  filter(region %in% c("Bergamo", "Como", "Lecco", "Milano", "Varese")) %>%
  arrange(group, order)
Each observation in the dataframe defines a geographical point with some extra information:
long and lat, the longitude and latitude of the geographical point
group, an identifier connected with the specific polygon the points are part of – a map can be made of different polygons (e.g. one polygon for the mainland and one for each island, one polygon for each state, …)
order, the order of the point within the specific group – how all of the points that are part of the same group should be connected in order to create the polygon
region, the name of the province (Italy) or state (USA)
head(italy_map, 3)
##       long      lat group order        region subregion
## 1 11.83295 46.50011     1     1 Bolzano-Bozen
## 2 11.81089 46.52784     1     2 Bolzano-Bozen
## 3 11.73068 46.51890     1     3 Bolzano-Bozen
Having spatial information in the data gives the opportunity to map the data or, in other words, to visualize the information contained in the data in a geographical context. R offers different possibilities for mapping data, from normal plots using longitude/latitude as x/y to more complex spatial data objects (e.g. shapefiles).
The most basic way to create maps with your data is to use ggplot2: create a ggplot object, then add a specific geom mapping longitude to the x aesthetic and latitude to the y aesthetic. This simple approach can be used to:
Create a map showing the “Bergamo,” “Como,” “Varese,” and “Milano” provinces in Italy using simple points…
When plotting simple points, the geom_point function is used. In this case the polygon and the order of the points are not important for plotting.
italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_point(aes(color = region))
Create a map showing the “Bergamo,” “Como,” “Varese,” and “Milano” provinces in Italy using lines…
The geom_path function is used to create such plots. From the R documentation, geom_path “… connects the observations in the order in which they appear in the data.” When plotting with geom_path, it is important to consider the polygon and the order within the polygon for each point in the map.
The points in the dataset are grouped by region and ordered by order. If grouping information is not provided, the sequential order of the observations is used to connect the points and, for this reason, “unexpected” lines are drawn when moving from one region to the other. On the other hand, if grouping information is provided, either via the group aesthetic or via the color aesthetic mapped to region, the “unexpected” lines are removed (see example below).
plot_1 <- italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_path() +
  ggtitle("No mapping with 'region', unexpected lines")
plot_2 <- italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_path(aes(group = region)) +
  ggtitle("With 'group' mapping")
plot_3 <- italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_path(aes(color = region)) +
  ggtitle("With 'color' mapping")
grid.arrange(plot_1, plot_2, plot_3,
             ncol = 2,
             layout_matrix = rbind(c(1, 1), c(2, 3)))
With ggplot2 it is possible to create more sophisticated maps like choropleth maps. The example below shows how to visualize the percentage of Republican votes in the 1976 election by state.
# Get the USA state map from ggplot2
us_map <- ggplot2::map_data("state")

# Use the 'votes.repub' dataset (maps package), containing the percentage of
# Republican votes by state (the `1976` column is plotted below). Note
# - the dataset is a matrix so it needs to be converted to a dataframe
# - the row name defines the relevant state
votes.repub %>%
  tbl_df() %>%
  mutate(state = rownames(votes.repub),
         state = tolower(state)) %>%
  right_join(us_map, by = c("state" = "region")) %>%
  ggplot(mapping = aes(x = long, y = lat, group = group, fill = `1976`)) +
  geom_polygon(color = "black") +
  theme_void() +
  scale_fill_viridis(name = "Republican\nVotes (%)")
The ggmap package, Google Maps API and others
Another way to create maps is to use the ggmap package (see the Google Maps API Terms of Service). As stated in the package description…
“A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g Google Maps). It includes tools common to those tasks, including functions for geolocation and routing.” R Documentation
The package allows you to create/plot maps using Google Maps and a few other service providers, and to perform other interesting tasks like geocoding, routing, distance calculation, etc. The maps are actually ggplot objects, making it possible to reuse ggplot2 functionality like adding layers, modifying the theme, etc.
“The basic idea driving ggmap is to take a downloaded map image, plot it as a context layer using ggplot2, and then plot additional content layers of data, statistics, or models on top of the map. In ggmap this process is broken into two pieces – (1) downloading the images and formatting them for plotting, done with get_map, and (2) making the plot, done with ggmap. qmap marries these two functions for quick map plotting (c.f. ggplot2’s ggplot), and qmplot attempts to wrap up the entire plotting process into one simple command (c.f. ggplot2’s qplot).” 
How to create and plot a map…
The ggmap::get_map function is used to get a base map (a ggmap object, a raster object) from different service providers like Google Maps, OpenStreetMap, Stamen Maps or Naver Maps (the default setting is Google Maps). Once the base map is available, it can be plotted using the ggmap::ggmap function. Alternatively, the ggmap::qmap function (quick map plot) can be used.
# When querying for a base map the location must be provided as a
# - name/address (geocoding)
# - longitude/latitude pair
base_map <- get_map(location = "Varese")
ggmap(base_map) + ggtitle("Varese")

# qmap is a wrapper for the
# `ggmap::get_map` and `ggmap::ggmap` functions
qmap("Varese") + ggtitle("Varese - qmap")
How to change the zoom in the map…
The zoom argument of the ggmap::get_map function can be used to control the zoom of the returned base map (see ?get_map for more information). Please note that the possible values/range of the zoom argument change with the different sources.
# An example using Google Maps as a source
# Zoom is an integer between 3 - 21 where
#   zoom = 3  (continent)
#   zoom = 10 (city)
#   zoom = 21 (building)
base_map_10 <- get_map(location = "Varese", zoom = 10)
base_map_18 <- get_map(location = "Varese", zoom = 18)
grid.arrange(ggmap(base_map_10) + ggtitle("Varese, zoom 10"),
             ggmap(base_map_18) + ggtitle("Varese, zoom 18"),
             nrow = 1)
How to change the type of map…
The maptype argument of the ggmap::get_map function can be used to change the type of map, a.k.a. the map theme. Based on the R documentation (see ?get_map for more information):
‘[maptype]… options available are “terrain”, “terrain-background”, “satellite”, “roadmap”, and “hybrid” (google maps), “terrain”, “watercolor”, and “toner” (stamen maps)…’.
# An example using Google Maps as a source
# and different map types
base_map_ter <- get_map(location = "Varese", maptype = "terrain")
base_map_sat <- get_map(location = "Varese", maptype = "satellite")
base_map_roa <- get_map(location = "Varese", maptype = "roadmap")
grid.arrange(ggmap(base_map_ter) + ggtitle("Terrain"),
             ggmap(base_map_sat) + ggtitle("Satellite"),
             ggmap(base_map_roa) + ggtitle("Road"),
             nrow = 1)
How to change the source for maps…
While the default source for maps with ggmap::get_map is Google Maps, it is possible to change the map service using the source argument. The supported map services/sources are Google Maps, OpenStreetMap, Stamen Maps, and CloudMade Maps (see ?get_map for more information).
# An example using different map services as a source
base_map_google <- get_map(location = "Varese", source = "google", maptype = "terrain")
base_map_stamen <- get_map(location = "Varese", source = "stamen", maptype = "terrain")
grid.arrange(ggmap(base_map_google) + ggtitle("Google Maps"),
             ggmap(base_map_stamen) + ggtitle("Stamen Maps"),
             nrow = 1)
How to geocode a location…
The ggmap::geocode function can be used to find the latitude and longitude of a location based on its name (see ?geocode for more information). Note that the Google Maps API limits the possible number of queries per day; geocodeQueryCheck can be used to determine how many queries are left.
# Geocode a city
geocode("Sesto Calende")
##        lon     lat
## 1 8.636597 45.7307

# Geocode a set of cities
geocode(c("Varese", "Milano"))
##        lon     lat
## 1 8.825058 45.8206
## 2 9.189982 45.4642

# Geocode a location
geocode(c("Milano", "Duomo di Milano"))
##        lon     lat
## 1 9.189982 45.4642
## 2 9.191926 45.4641

geocode(c("Roma", "Colosseo"))
##        lon      lat
## 1 12.49637 41.90278
## 2 12.49223 41.89021
How to find a route between two locations…
The ggmap::route function can be used to find a route from Google using different possible modes, e.g. walking, driving, etc. (see ?ggmap::route for more information).
“The route function provides the map distances for the sequence of “legs” which constitute a route between two locations. Each leg has a beginning and ending longitude/latitude coordinate along with a distance and duration in the same units as reported by mapdist. The collection of legs in sequence constitutes a single route (path) most easily plotted with geom_leg, a new exported ggplot2 geom…” 
route_df <- route(from = "Somma Lombardo", to = "Sesto Calende", mode = "driving")
head(route_df)
##      m    km     miles seconds   minutes       hours startLon startLat
## 1  198 0.198 0.1230372      52 0.8666667 0.014444444 8.706770 45.68277
## 2  915 0.915 0.5685810     116 1.9333333 0.032222222 8.705170 45.68141
## 3  900 0.900 0.5592600      84 1.4000000 0.023333333 8.702070 45.68835
## 4 5494 5.494 3.4139716     390 6.5000000 0.108333333 8.691054 45.69019
## 5  205 0.205 0.1273870      35 0.5833333 0.009722222 8.648636 45.72250
## 6  207 0.207 0.1286298      25 0.4166667 0.006944444 8.649884 45.72396
##     endLon   endLat leg
## 1 8.705170 45.68141   1
## 2 8.702070 45.68835   2
## 3 8.691054 45.69019   3
## 4 8.648636 45.72250   4
## 5 8.649884 45.72396   5
## 6 8.652509 45.72367   6

route_df <- route(from = "Via Gerolamo Fontana 32, Somma Lombardo",
                  to = "Town Hall, Somma Lombardo",
                  mode = "walking")
qmap("Somma Lombardo", zoom = 16) +
  geom_leg(aes(x = startLon, xend = endLon, y = startLat, yend = endLat),
           colour = "red", size = 1.5, alpha = .5, data = route_df) +
  geom_point(aes(x = startLon, y = startLat), data = route_df) +
  geom_point(aes(x = endLon, y = endLat), data = route_df)
How to find the distance between two locations…
The ggmap::mapdist function can be used to compute the distance between two locations using different possible modes, e.g. walking, driving, etc. (see ?ggmap::mapdist for more information).
Pro tip: Learn to use data to drive decision making
For the choroplethrMaps packages, see the “Mapping US counties and states” section in the references below.
Peng, R. D., Kross, S., & Anderson, B. (2016). Mastering Software Development in R. Lean Publishing.
 Unesco. (1987). [Italy Map]. Unpublished raw data.
 Choropleth map. (2017, October 17).
Kahle, D., & Wickham, H. (2013). ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), 144-161.
 Agafonkin, V. (2010). RStudio, Inc. Leaflet for R. Retrieved from https://rstudio.github.io/leaflet/
 Paracchini, P. L. (2017, July 05). Building Data Visualization Tools: basic plotting with R and ggplot2.
 Paracchini, P. L. (2017, July 14). Building Data Visualization Tools: ‘ggplot2’, essential concepts.
 Paracchini, P. L. (2017, July 18). Building Data Visualization Tools: guidelines for good plots.
Data Science Dojo has launched the Jupyter Hub for Data Visualization using Python offering on the Azure Marketplace, with pre-installed data visualization libraries and pre-cloned GitHub repositories of famous books, courses, and workshops, enabling the learner to run the example code provided.
What is data visualization?
Data visualization is a technique utilized in all areas of science and research. Because the business sector now collects so much information through data analysis, we need a mechanism to visualize the data so we can analyze it. By giving data a visual context through maps or graphs, visualization helps us understand what the information means. As a result, it is simpler to see trends, patterns, and outliers within huge datasets, because the data becomes easier for the human mind to understand and pull insights from.
It can help by conveying data in the most effective manner, regardless of your industry or profession. It is one of the crucial steps in the business intelligence process: it takes the raw data, models it, and then presents the data so that conclusions may be drawn. In advanced analytics, data scientists are developing machine learning algorithms to better combine crucial data into representations that are simpler to comprehend and interpret.
Given its simplicity and ease of use, Python has grown to be one of the most popular languages in the field of data science over the years. Python has several excellent visualization packages with a wide range of functionality for you whether you want to make interactive or fully customized plots.
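As a tiny, self-contained illustration of how approachable these packages are, the sketch below draws a simple line chart with matplotlib, one of the Python visualization libraries mentioned above. The monthly figures are made-up sample data, purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, purely for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 160]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")   # line chart with point markers
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
fig.savefig("monthly_sales.png")     # write the chart to a PNG file
```

The same few lines adapt easily to a bar chart (ax.bar) or a scatter plot (ax.scatter), which is part of why Python is so popular for quick visual exploration.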
PRO TIP: Join our 5-day instructor-led Python for Data Science training to enhance your visualization skills.
Individuals who want to start visualizing their data with a programming language usually lack the resources to gain hands-on experience with it. Beginners also face compatibility issues while installing libraries.
Our offer, Jupyter Hub for Data Visualization using Python, solves these challenges by providing an effortless coding environment in the cloud with pre-installed Python data visualization libraries, which reduces the burden of installation and maintenance and thus solves compatibility issues for the individual.
Additionally, our offer gives the user access to repositories of well-known books, courses, and workshops on data visualization that include useful notebooks, a helpful resource for getting practical experience with data visualization using Python. The heavy computations required to visualize data are not performed on the user's local machine; instead, they are performed in the Azure cloud, which increases responsiveness and processing speed.
Listed below are the pre-installed Python data visualization libraries and the repositories of a book, a course, and a workshop on data visualization provided by this offer:
Because the human brain is not designed to process such a large amount of unstructured, raw data and turn it into a usable and understandable form, we require techniques to visualize data. We need graphs and charts to communicate data findings so that we can identify patterns and trends, gain insight, and make better decisions faster. Jupyter Hub for Data Visualization using Python provides an in-browser coding environment with just a single click, hence providing ease of installation. Through our offer, a user can explore various application domains of data visualization without worrying about configuration and computation.
At Data Science Dojo, we deliver data science education, consulting, and technical services to increase the power of data. We are therefore adding a free Jupyter Notebook Environment dedicated specifically to Data Visualization using Python. The offering leverages the power of Microsoft Azure services to run effortlessly with outstanding responsiveness. Make your complex data understandable and insightful with us, and install the Jupyter Hub offer now from the Azure Marketplace by Data Science Dojo, your ideal companion in your journey to learn data science!
There is so much to explore when it comes to spatial visualization using Python's Folium library.
For problems related to crime mapping, housing prices or travel route optimization, spatial visualization could be the most resourceful tool in getting a glimpse of how the instances are geographically located. This is beneficial as we are getting massive amounts of data from several sources such as cellphones, smartwatches, trackers, etc. In this case, patterns and correlations, which otherwise might go unrecognized, can be extracted visually.
This blog will attempt to show you the potential of spatial visualization using the
Folium library with Python. This tutorial will give you insights into the most important visualization tools that are extremely useful while analyzing spatial data.
Folium is an incredible library that allows you to build Leaflet maps. Using latitude and longitude points,
Folium can allow you to create a map of any location in the world. Furthermore,
Folium creates interactive maps that may allow you to zoom in and out after the map is rendered.
We’ll get some hands-on practice with building a few maps using the Seattle Real-time Fire 911 calls dataset. This dataset provides Seattle Fire Department 911 dispatches, and every instance of this dataset provides information about the address, location, date/time and type of emergency of a particular incident. It’s extensive and we’ll limit the dataset to a few emergency types for the purpose of explanation.
Folium can be installed using either of the following commands:
$ pip install folium
$ conda install -c conda-forge folium
Start by importing the required libraries.
import pandas as pd import numpy as np import folium
Let us now create an object named seattle_map, defined as a folium.Map object. We can add other folium objects on top of the folium.Map to improve the rendered map. The map has been centered on the longitude and latitude points given in the location parameter. The zoom_start parameter sets the magnification level for the map that is going to be rendered. Moreover, we have also set the tiles parameter to 'OpenStreetMap', which is the default tile for this parameter. You can explore more tiles, such as Stamen Terrain or Mapbox Control Room, in the Folium documentation.
seattle_map = folium.Map(
    location = [47.6062, -122.3321],
    tiles = 'OpenStreetMap',
    zoom_start = 11
)
seattle_map
We can observe the map rendered above. Let’s create another map object with a different tile and zoom_level. Through ‘Stamen Terrain’ tile, we can visualize the terrain data which can be used for several important applications.
We've also added a folium.Marker to our seattle_map2 map object below. The marker can be placed at any location specified in the square brackets. The string given in the popup parameter will be displayed once the marker is clicked, as shown below.
seattle_map2 = folium.Map(
    location = [47.6062, -122.3321],
    tiles = 'Stamen Terrain',
    zoom_start = 10
)
# inserting marker
folium.Marker(
    [47.6740, -122.1215],
    popup = 'Redmond'
).add_to(seattle_map2)
seattle_map2
We will use the Seattle 911 calls dataset to visualize the 911 calls made in 2019 only. We will also limit the emergency types to three specific emergencies that took place during this time.
We will now import our dataset, which is available through this link (in CSV format). The dataset is huge, so we’ll only import the first 10,000 rows using pandas’ read_csv method. We’ll then use the head method to display the first 5 rows.
(This process will take some time because the dataset is huge. Alternatively, you can download it to your local machine and then insert the file path below.)
path = "https://data.seattle.gov/api/views/kzjm-xkqj/rows.csv?accessType=DOWNLOAD"
seattle911 = pd.read_csv(path, nrows=10000)
seattle911.head()
Using the code below, we’ll convert the Datetime variable to a date-time format and extract the year, keeping only the instances that occurred in 2019.
seattle911['Datetime'] = pd.to_datetime(seattle911['Datetime'], format='%m/%d/%Y %H:%M', utc=True)
seattle911['Year'] = pd.DatetimeIndex(seattle911['Datetime']).year
seattle911 = seattle911[seattle911.Year == 2019]
We’ll now limit the emergency type to ‘Aid Response Yellow’, ‘Auto Fire Alarm’, and ‘MVI - Motor Vehicle Incident’. The remaining instances will be removed from the ‘seattle911’ dataframe.
seattle911 = seattle911[seattle911.Type.isin(['Aid Response Yellow', 'Auto Fire Alarm', 'MVI - Motor Vehicle Incident'])]
We’ll remove any instance that has a missing longitude or latitude coordinate. Without these values, the particular instance cannot be visualized and will cause an error while rendering.
# drop rows with missing latitude/longitude values
seattle911.dropna(subset=['Longitude', 'Latitude'], inplace=True)
seattle911.head()
Now let’s step towards the most interesting part. We’ll map all the instances onto the map object we created above, ‘seattle_map’. Using the code below, we’ll loop over all the rows of the dataframe and create a folium.CircleMarker for each (similar to the folium.Marker we added above). We’ll assign each instance’s latitude and longitude coordinates to the location parameter. The radius of each circle is set to 3, while the popup displays the address of the particular incident.
As you’ll notice, the color of each circle depends on the emergency type. We will now render our map.
for i in range(len(seattle911)):
    folium.CircleMarker(
        location=[seattle911.Latitude.iloc[i], seattle911.Longitude.iloc[i]],
        radius=3,
        popup=seattle911.Address.iloc[i],
        color='#3186cc' if seattle911.Type.iloc[i] == 'Aid Response Yellow'
              else '#6ccc31' if seattle911.Type.iloc[i] == 'Auto Fire Alarm'
              else '#ac31cc',
    ).add_to(seattle_map)

seattle_map
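The chained conditional above works, but it gets unwieldy as the number of emergency types grows. A minimal alternative sketch using a lookup dictionary with a default color (the hex values are the ones used above; the fallback color is an assumption of ours, not from the original code):

```python
# Map each emergency type to a marker color; unknown types fall back to a default.
TYPE_COLORS = {
    'Aid Response Yellow': '#3186cc',
    'Auto Fire Alarm': '#6ccc31',
    'MVI - Motor Vehicle Incident': '#ac31cc',
}
DEFAULT_COLOR = '#777777'  # hypothetical fallback for any other type

def marker_color(emergency_type):
    """Return the marker color for a given emergency type."""
    return TYPE_COLORS.get(emergency_type, DEFAULT_COLOR)

# Inside the plotting loop, you would then pass:
#   color = marker_color(seattle911.Type.iloc[i])
```

This keeps the folium.CircleMarker call short, and adding a new emergency type becomes a one-line change.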
Let us now move towards some slightly advanced features provided by Folium. For this, we will use the National Obesity by State dataset, which is also hosted on data.gov. We’ll be using 2 types of files: a CSV file containing the list of all states and the obesity percentage in each state, and a GeoJSON file (based on JSON) that contains geographical features in the form of polygons.

Before using our dataset, we’ll create a new folium.Map object, with the location parameter set to coordinates that center the US on the map, and ‘zoom_start’ set to 4 so that all the states are visible.

usa_map = folium.Map(location=[37.0902, -95.7129], tiles='Mapbox Bright', zoom_start=4)
usa_map
We will assign the URLs of our datasets to ‘obesity_link’ and ‘state_boundaries’ variables, respectively.
obesity_link = 'http://data-lakecountyil.opendata.arcgis.com/datasets/3e0c1eb04e5c48b3be9040b0589d3ccf_8.csv'
state_boundaries = 'http://data-lakecountyil.opendata.arcgis.com/datasets/3e0c1eb04e5c48b3be9040b0589d3ccf_8.geojson'
We will use the ‘state_boundaries’ file to visualize the boundaries and areas covered by each state on our folium.Map object. This is an overlay on our original map; in the same way, we can visualize multiple layers on the same map. This overlay will assist us in creating the choropleth map discussed ahead.
The ‘obesity_data’ dataframe can be viewed below. It contains 5 variables. However, for the purpose of this demonstration, we are only concerned with the ‘NAME’ and ‘Obesity’ attributes.
obesity_data = pd.read_csv(obesity_link)
obesity_data.head()
Now comes the most interesting part: creating a choropleth map. We’ll bind the ‘obesity_data’ dataframe with our ‘state_boundaries’ GeoJSON file, assigned to the data and geo_data parameters respectively. The columns parameter indicates which dataframe columns to use, whereas the key_on parameter indicates the layer in the GeoJSON on which to key the data.
We have additionally specified several other parameters that will define the color scheme we’re going to use. Colors are generated from Color Brewer’s sequential palettes.
By default, linear binning is used between the min and the max of the values. Custom binning can be achieved with the bins parameter.
folium.Choropleth(
    geo_data=state_boundaries,
    name='choropleth',
    data=obesity_data,
    columns=['NAME', 'Obesity'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    fill_opacity=0.9,
    line_opacity=0.5,
    legend_name='Obesity Percentage'
).add_to(usa_map)

folium.LayerControl().add_to(usa_map)
usa_map
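Since linear binning between the min and max is the default, custom bins have to be precomputed and passed via the bins parameter. A minimal sketch that builds quantile-based bin edges in plain Python (the obesity percentages below are made-up sample values; in the real notebook you would use obesity_data['Obesity']):

```python
def quantile_bins(values, n_bins=4):
    """Return n_bins + 1 bin edges placed at evenly spaced quantiles of the data."""
    ordered = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        # index of the i-th quantile point in the sorted list
        idx = round(i * (len(ordered) - 1) / n_bins)
        edges.append(ordered[idx])
    return edges

# Made-up obesity percentages, for illustration only
sample = [22.3, 25.1, 28.7, 30.2, 31.5, 33.0, 35.4, 37.9]
bins = quantile_bins(sample, n_bins=4)
# These edges could then be passed as folium.Choropleth(..., bins=bins)
```

Quantile bins spread the states evenly across colors, which is often more readable than linear bins when a few states are outliers.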
This brings us to the end of our tour of Folium. We can visualize the obesity pattern geographically and uncover patterns not visible before. Folium also helped us gain clarity about the data, beyond merely simplifying it.
You might now feel powerful enough after attaining the skill to visualize spatial data effectively. Go ahead and explore Folium’s documentation to discover the incredible capabilities that this open-source library has to offer.
Thanks for reading! If you want more datasets to play with, check out this blog post. It consists of 30 free datasets with questions for you to solve.
Power BI and R can be used together to achieve analyses that are difficult or impossible with Power BI’s out-of-the-box features alone.
Power BI is a powerful technology for quickly creating rich visualizations. It has many practical uses for the modern data professional, including executive dashboards, operational dashboards, and visualizations for data exploration and analysis.
Microsoft has also extended Power BI with support for incorporating R visualizations into its projects, enabling a myriad of data visualization use cases across all industries and circumstances. As such, it is an extremely valuable tool for any Data Analyst, Product/Program Manager, or Data Scientist to have in their tool belt.
At the meetup for this topic, presenter David Langer showed how R visualizations can be used to achieve analyses that are difficult, or not possible, to achieve with out-of-the-box features.
A primary focus of the talk was a number of “gotchas” to be aware of when using R visualizations within Power BI projects.
David also covered best practices for using R visualizations within Power BI projects, including using R tools like RStudio or Visual Studio R Tools to make R visualization development faster. A particularly interesting aspect of the talk was how to engineer R code to allow for copy-and-paste from RStudio into Power BI.
The talk concluded with examples of how R visualizations can be incorporated into a project to allow for robust, statistically valid analyses of aggregated business data. The following visualization is an example from the talk:
Learn more about Power BI with Data Science Dojo
Designers don’t need to use data-driven decision-making, right? Here are 5 common design problems you can solve with the data science basics.
Design is a busy job. You have to balance both artistic and technical skills and meet the needs of bosses and clients who might not know what they want until they ask you to change it. You have to think about the big picture, the story, and the brand, while also being the person who spots when something is misaligned by a hair’s width.
The ‘real’ artists think you sold out, and your parents wish you had just majored in business. When you’re juggling all of this, you might think to yourself, “at least I don’t have to be a numbers person,” and you avoid complicated topics like data analytics at all costs.
If you find yourself thinking along these lines, this article is for you. Here are a few common problems you might encounter as a designer, and how some of the basic approaches of data science can be used to solve them. It might actually take a few things off your plate.
If you have any experience with designing for other people, you know exactly what this really means. You might be asked to make something vague such as “a flyer that says who we are to potential customers and has a lot of photos in it.” A dozen or so drafts later, you have figured out plenty of things they don’t like and are no closer to a final product.
What you need to look for are the company’s needs. Not just the needs they say they have; ask them for the data. The company might already be keeping its own metrics, so ask which numbers concern them most, and what goals they have for improvement. If they say they don’t have any data like that – FALSE!
Every organization has some kind of data, even if you have to be the one to put it together. It might not even be in the most obvious of places like an Excel file. Go through the customer emails, conversations, chats, and your CRM, and make a note of what the most usual questions are, who asks them, and when they get sent in. You just made your own metrics, buddy!
Now that you have the data, gear your design solutions to improve those key metrics. This time when you design the flyer, put the answers to the most frequent questions at the top of the visual hierarchy. Maybe you don’t need a ton of photos but select one great photo that had the highest engagement on their Instagram. No matter how picky a client is, there’s no disagreeing with good data.
This problem is especially popular in digital design. Whether it’s an app, an email, or an entire website, you have a lot of elements to deal with, and need to figure out how to navigate the audience through all of it. For those of you who are unaware, this is the basic concept of UX, short for ‘User Experience.’
The dangerous trap people fall into is asking for opinions about UX. You can ask 5 people or 500 and you’re always going to end up with the same conclusion: people want to see everything, all at once, but they want it to be simple, easy to navigate and uncrowded.
The perfect UX is basically impossible, which is why you instead need to focus on getting the most important aspects and prioritizing them. While people’s opinions claim to prioritize everything, their actual behavior when searching for what they want is much more telling.
Capturing this behavior is easy with web analytics tools. There are plenty of apps like Google Analytics to track the big picture parts of your website, but for the finer details of a single web page design, there are tools like Hotjar. You can track how each user (with cookies enabled) travels through your site, such as how far they scroll and what elements they click on.
If users keep leaving the page without getting to the checkout, you can find out where they are when they decide to leave, and what calls to action are being overlooked.
When you really get the hang of it, UX will transform from a guessing game about making buttons “obvious” and instead you will understand your site as a series of pathways through hierarchies of story elements. As an added bonus, you can apply this same knowledge to your print media and make uncrowded brochures and advertisements too!
Should the dress be pink, or blue? Unfortunately, not all of us can be Disney princesses with magic wands to change constantly back and forth between colors. Unless, of course, you are a web designer from the 90’s, and in that case, those rainbow shifting gifs on your website are wicked gnarly, dude.
For the rest of us, we have to make some tough calls about design elements. Even if you’re used to making these decisions, you might be working with other people who are divided over their own ideas and have no clue who to side with. (Little known fact about designers: we don’t have opinions on absolutely everything.)
This is where a simple concept called “A/B testing” comes in handy. It requires some coding knowledge to pull it off yourself or you can ask your web developer to install the tracking pixel, but some digital marketing tools have built-in A/B testing features. (You can learn more about A/B testing in Data Science Dojo’s comprehensive bootcamps cough cough)
Other than the technical aspect, it’s beautifully simple. You take a single design element, and narrow it down to two options, with a shared ultimate goal you want that element to contribute to. Half your audience will see the pink dress, and half will see the blue, and the data will show you not only which dress was liked by the princesses, but exactly how much more they liked it. Just like magic.
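To make the A/B comparison concrete, here is a minimal sketch of a two-proportion z-test in plain Python; the visitor and click counts are invented numbers for illustration, not from any real campaign:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Pink dress: 1000 visitors, 110 clicked; blue dress: 1000 visitors, 150 clicked
z = two_proportion_z(110, 1000, 150, 1000)
significant = p_value(z) < 0.05  # conventional 5% threshold
```

If `significant` comes out True, the blue dress didn’t just do better in this sample; the difference is unlikely to be random noise.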
This is such a common problem, and so well understood, that the inside jokes about it between designers risk flipping all the way around the scale into a genuine appreciation of bad design elements. But what do you do when you have a person who sincerely asks you what’s wrong with using the same font Avatar used in their logo?
The solution to this is kind of dirty and cheap from the data science perspective, but I’m including it because it follows the basic principle of evidence > intuition. There is no way to really explain a design faux-pas because it comes from experience. However, sometimes when experience can’t be described, it can be quantified.
Ask this person to look up the top competitors in their sector. Then ask them to find similar businesses using this design element you’re concerned about. How do these organizations compare? How many followers do they have on social media? When was the last time they updated something? How many reviews do they have?
If the results genuinely show that Papyrus is the secret ingredient to a successful brand, then wow, time to rethink that style guide.
Unless you have skipped to the end of this article, you already know the solution to this one. No matter what kind of design you do, it’s meant to fulfill a goal. And where do data scientists get goals? Metrics! Some good metrics for UX that you might want to consider when designing a website, email, or ad campaign are click-through-rate (CTR), session time, page views, page load, bounce rate, conversions, and return visits.
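To show how simple these metrics are to compute, here’s a minimal sketch in plain Python; the traffic numbers are invented for the example:

```python
def click_through_rate(clicks, impressions):
    """CTR: share of impressions that led to a click."""
    return clicks / impressions

def bounce_rate(single_page_sessions, total_sessions):
    """Share of sessions that left after viewing only one page."""
    return single_page_sessions / total_sessions

def conversion_rate(conversions, visitors):
    """Share of visitors who completed the goal action."""
    return conversions / visitors

# Invented numbers for a hypothetical email campaign
ctr = click_through_rate(clicks=240, impressions=8000)                 # 3%
bounces = bounce_rate(single_page_sessions=130, total_sessions=400)    # 32.5%
```

Comparing these numbers before and after a redesign is often all the evidence a client needs.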
This article has already covered a few basic strategies to get design related metrics. Even if the person you’re working for doesn’t have the issues described above (or maybe you’re working for yourself) it’s a great idea to look at metrics before and after your design hits the presses.
If the data doesn’t shift how you want it to, that’s a learning experience. You might even do some more digging to find data that can tell you where the problem came from, whether it was a detail in your design or a flaw in getting it delivered to the audience.
When you do see positive trends, congrats! You helped further your organization’s goals and validated your design skills. Attaching tangible metrics to your work is a great support to getting more jobs and pay raises, so you don’t have to eat ramen noodles forever.
If nothing else, it’s a great way to prove that you didn’t need to major in accounting to work with fancy numbers, dad.
When it comes to using data for social responsibility, one of the most effective ways of dispensing information is through data visualization.
It’s getting harder and harder to ignore big data. Over the past couple of years, we’ve all seen a spike in the way businesses and organizations have ramped up harvesting pertinent information from users and using them to make smarter business decisions. But big data isn’t just for capitalistic purposes — it can also be utilized for social good.
Nathan Piccini discussed in a previous blog post how data scientists could use AI to tackle some of the world’s most pressing issues, including poverty, social and environmental sustainability, and access to healthcare and basic needs. He reiterated how data scientists don’t always have to work with commercial applications and that we all have a social responsibility to put together models that don’t hurt society and its people.
When it comes to using data for social responsibility, one of the most effective ways of dispensing information is through data visualization. The process involves putting together data and presenting it in a form that would be more easily comprehensible for the viewer.
No matter how complex the problem is, visualization converts data and displays it in a more digestible format, as well as laying out not just plain information, but also the patterns that emerge from data sets. Maryville University explains how data visualization has the power to affect and inform business decision-making, leading to positive change.
With regards to the concept of income inequality, data visualization can clearly show the disparities among varying income groups. Sociology professor Mike Savage also reiterated this in the World Social Science Report, where he revealed that social science has a history of being dismissive of the impact of visualizations and preferred textual and numerical formats. Yet time and time again, visualizations proved to be more powerful in telling a story, as it reduces the complexity of data and depicts it graphically in a more concise way.
Take this case study by computational scientist Javier GB, for example. Through tables and charts, he was able to effectively convey how the gap between the rich, the middle class, and the poor has grown over time. In 1984, a time when the economy was booming and the unemployment rate was being reduced, the poorest 50% of the US population had a collective wealth of $600 billion, the middle class had $1.5 trillion, and the top 0.001% owned $358 billion.
Three decades later, the gap has stretched exponentially wider: the poorest 50% of the population had negative wealth that equaled $124 billion, the middle class owned wealth valued $3.3 trillion, while the 0.001% had a combined wealth of $4.8 trillion. By having a graphical representation of income inequality, more people can become aware of class struggles than when they only had access to numerical and text-based data.
The New York Times also showed how powerful data visualization could be in their study of a pool of black boys raised in America and how they earned less than their white peers despite having similar backgrounds. The outlet displayed data in a more interactive manner to keep the reader engaged and retain the information better.
The study followed the lives of boys who grew up in wealthy families, revealing that even though the black boys grew up in well-to-do neighborhoods, they are more likely to remain poor in adulthood than to stay wealthy. Factors like the same income, similar family structures, similar education levels, and similar levels of accumulated wealth don’t seem to matter, either. Black boys were still found to fare worse than white boys in 99 percent of America come adulthood, a stark contrast from previous findings.
Vox also curated different charts collected from various sources to highlight the fact that income inequality is an inescapable problem in the United States. The richest demographic yielded a disproportional amount of economic growth, while wages for the middle class remained stagnant. In one of the charts, it was revealed that in a span of almost four decades, the poorest half of the population has seen its income plummet steadily, while the top 1 percent have only earned more. Painting data in these formats adds more clarity to the issue compared to texts and numbers.
There’s no doubt about it, data visualization’s ability to summarize highly complex information into more comprehensible displays can help with the detection of patterns, trends, and outliers in various data sets. It makes large numbers more relatable, allowing everyone to understand the issue at hand more clearly. And when there’s a better understanding of data, the more people will be inclined to take action.
Instead of loading clients up with bullet points and long-winded analysis, firms should use data visualization tools to illustrate their message.
Every business is always looking for a great way to talk to their customers. Communication between the company’s management team and customers plays an important role. However, the hardest part is finding the best way to communicate with users.
Although it is visible in many companies, many people do not understand the power of visualization in the customer communication industry. This article sheds light on several aspects of how data visualization plays an important role in interacting with clients.
Any interaction between businesses and consumers indicates signs of success between the two parties. Communicating with the customer through visualization is one of the best communication channels that strengthens the relationship between buyers and sellers.
While data visualization is the best way to communicate, many industry players still don’t understand the power of this aspect. The display helps the commercial teams improve the operating mode of your customer and create an exceptional business environment. Additionally, visualization saves 78% of the time spent capturing customer information to improve services within the enterprise environment.
Any business that intends to succeed in the industry needs to have a compelling communication strategy for customers.
Currently, big data visualization in business has dramatically changed how business talks to clients. The most exciting aspect is that you can use different kinds of visualization.
While using visualization to enhance communication and the entire customer experience, you need to maintain the brand’s image. Also, you can use visualization in marketing your products and services.
To enhance customer interaction, data visualization (Sankey Chart, Radial Bar Chart, Pareto Chart, and Survey Chart, etc.) is used to create dashboards and live sessions that improve the interaction between customers and the business team members. The team members can easily track when customers make changes by using live sessions.
This helps the business management team make the required changes depending on the customer suggestions regarding the business operations. Communication between the two parties continues to create an excellent customer experience by making changes.
By creating a good client communication channel, you can easily identify some of the customers who are experiencing problems from time to time. This makes it easier for the technical team to separate customers with recurring issues.
The technical support team can opt to attach specific codes to the clients with issues to monitor their performance and any other problem. Data visualization helps in separating this kind of data from the rest to enhance clients’ well-being.
It helps when the technical staff communicates with clients individually to identify any problem or if they are experiencing any technical issue. This promotes personalized services and makes customers more comfortable.
Through regular communication between clients and the business management team, the brand gains loyalty making it easier for the business to secure a respectable number of potential customers overall.
Once you have implemented visualization in your business operations, you can solve various problems facing clients using the data you have collected from diverse sources. As the business industry grows, data visualization becomes an integral part in business operations.
This makes the process of solving customer complaints easier and creates a continued communication channel. The data needs to be available in real-time to ensure that the technical support team has everything required to solve any customer problem.
The most exciting data visualization application is integrating a dashboard on a website with a mobile fast communication design. This is an exciting innovation that makes it easier for the business to interact with clients from time to time.
A good number of companies and organizations are slowly catching up with this innovative trend powered by data visualization. A business can easily showcase its stats to its customers on the dashboard to help them understand the milestones attained by the business.
Note that the stats are displayed on the dashboard depending on the customer feedback generated from the business operations. The dashboards have a fast mobile technique that makes communication more convenient.
This aspect is made to help clients access the business website using their mobile phones. An excellent operating mechanism creates a creative and adaptive design that enables mobile phone users to communicate efficiently.
This technique helps showcase information to mobile users, and clients can easily reach out to the business management team and get all their concerns sorted.
Data visualization is a wonderful way of enhancing the customer experience. Visualization collects data from customers after purchasing products and services to take note of the customer reviews regarding the products and services.
By collecting customer reviews, the business management team can easily evaluate the performance of their products and make the desired changes if the need arises. The data helps the team understand customer behavior and enhance the performance of every product.
The data points recorded from customers are converted into insights vital for the business’s general success.
Customer communication and experience are major points of consideration for business success. By enhancing customer interaction through charts and other forms of communication, a business makes it easy to flourish and attain its mission in the industry.