

Data Science Dojo
Srishti Puri
| August 3

How does Expedia determine the hotel price to quote to site users? How come Mac users end up spending as much as 30 percent more per night on hotels? Digital marketing analytics, a torrent of data flowing into every corner of the global economy, has revolutionized marketing efforts so thoroughly that it has reset them altogether. It is safe to say that marketing analytics is the science behind persuasion.

Marketers can learn a great deal about users: their likes, dislikes, goals, inspirations, drop-off points, needs, and demands. This wealth of information is a gold mine, but only for those who know how to use it. In fact, one of the top questions that marketing managers struggle with is:


“Which metrics to track?” 


Furthermore, there are several platforms that report on marketing, such as email marketing software, paid search advertising platforms, social media monitoring tools, blogging platforms, and web analytics packages. It is a marketer’s nightmare to be buried under sets of reports from different platforms while tracking a campaign all the way to conversion.

There are certainly smarter ways to track. But before we take a deep dive into how to track smartly, let me explain why you should spend as much time measuring as you spend doing:

  • To identify what’s working
  • To identify what’s not working
  • To identify strategies to improve
  • To do more of what works

To get a trustworthy answer to each of these, you must measure everything. As you do, arm yourself with the lexicon of marketing analytics so you can form statements that communicate results, for example:


“Twitter mobile drove 40% of all clicks this week on the corporate website” 

Every statement that you form to communicate analytics must state the source, the segment, value, metric, and range. Let us break down the above example:

  • Source: Twitter
  • Segment: Mobile
  • Value: 40%
  • Metric: Clicks
  • Range: This week
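Because every reportable statement has the same five parts, it can help to capture them as a small record and render the sentence from a template. The sketch below is only an illustration: the `AnalyticsStatement` class, its field names, and the sentence template are hypothetical choices, not part of any reporting tool.

```python
from dataclasses import dataclass


@dataclass
class AnalyticsStatement:
    """Hypothetical record holding the five parts of an analytics statement."""
    source: str      # where the activity came from, e.g. "Twitter"
    segment: str     # the audience slice, e.g. "mobile"
    value: str       # the measured value, e.g. "40%"
    metric: str      # what was measured, e.g. "clicks"
    time_range: str  # the reporting window, e.g. "this week"

    def render(self, target: str) -> str:
        # Illustrative template; adapt the wording to your own reports.
        return (f"{self.source} {self.segment} drove {self.value} of all "
                f"{self.metric} {self.time_range} on {target}")


stmt = AnalyticsStatement("Twitter", "mobile", "40%", "clicks", "this week")
print(stmt.render("the corporate website"))
# Twitter mobile drove 40% of all clicks this week on the corporate website
```

Keeping the parts separate makes it harder to forget one of them, for example the time range, when a report is written in a hurry.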

To be able to report such glossy statements, you will need to get your hands dirty. You can either take a campaign-based approach or a goals-based approach.


Campaign-based approach to marketing analytics


In a campaign-based approach, you measure the impact of every campaign. For example, if you use social media platforms, blogs, and emails to get users to sign up for an e-learning course, this approach gives you insight into each of those channels.

In this approach we will discuss the following in detail:

  1. Measure the impact on the website
  2. Measure the impact of SEO
  3. Measure the impact of paid search advertising
  4. Measure the impact of blogging efforts
  5. Measure the impact of social media marketing
  6. Measure the impact of e-mail marketing

Measure the impact on the website


  • Unique visitors

How to use: Unique visitors account for a fresh set of eyes on your site.  If the number of unique visitors is not rising, then it is a clear indication to reassess marketing tactics.


  • Repeat visitors

How to use: If visitors are revisiting your site or a landing page, it is a clear indication that your site is sticky or offers content people want to return to. However, if your repeat visitor rate is high relative to new visitors, it may indicate that your content is not reaching new audiences.


  • Sources

How to use: Sources are of three types: organic, direct, and referrals. Learning about your traffic sources will give you clarity on your SEO performance. It can also help you answer questions such as: what percentage of total traffic is organic?


  • Referrals

How to use: This is when the traffic arriving on your site is from another website. Aim for referrals to deliver 20-30% of your total traffic. Referrals can help you identify the types of sites or bloggers that are linking to your site and the type of content they tend to share. This information can be fed back into your SEO strategy, and help you produce relevant content that generates inbound links.
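Both questions above, the organic share of traffic and whether referrals fall in the 20-30% range, reduce to counting visits per source. The sketch below assumes you can export one source label per visit; the sample data is made up for illustration.

```python
from collections import Counter


def traffic_share(visit_sources):
    """Return each source's share of total visits as a percentage."""
    counts = Counter(visit_sources)
    total = sum(counts.values())
    return {source: 100 * n / total for source, n in counts.items()}


# Hypothetical export: one entry per visit, labelled with its traffic source.
visits = ["organic"] * 50 + ["direct"] * 25 + ["referral"] * 25
shares = traffic_share(visits)
print(shares)  # {'organic': 50.0, 'direct': 25.0, 'referral': 25.0}

# Compare the referral share against the 20-30% target mentioned above.
referral = shares.get("referral", 0.0)
print("referral share within 20-30% target:", 20 <= referral <= 30)  # True
```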


  • Bounce rate

How to use: High bounce rate indicates trouble. Maybe the content is not relevant, or the pages are not compelling enough. Perhaps the experience is not user-friendly. Or the call-to-action buttons are too confusing? A high bounce rate reflects problems, and the reasons can be many.
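One common definition of bounce rate is the share of sessions in which the visitor viewed only a single page. (Analytics tools differ here; newer versions of Google Analytics, for instance, define it in terms of engaged sessions instead.) A minimal sketch under the single-page definition, using hypothetical session data:

```python
def bounce_rate(pages_per_session):
    """Percentage of sessions that viewed exactly one page."""
    if not pages_per_session:
        return 0.0
    single_page = sum(1 for pages in pages_per_session if pages == 1)
    return 100 * single_page / len(pages_per_session)


# Hypothetical data: number of pages viewed in each of eight sessions.
sessions = [1, 3, 1, 2, 5, 1, 1, 4]
print(bounce_rate(sessions))  # 50.0
```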


Measure the impact of SEO 

Similarly, you can measure the impact of SEO using the following metrics:


  • Keyword performance and rankings:

How to use: You can use tools such as Google AdWords (now Google Ads) to identify keywords worth optimizing your website for. Then check whether the chosen keywords are driving traffic to your site and improving its search rankings.


  • Total traffic from organic search:

How to use: This metric is a mirror of how relevant your content is. Low traffic from the organic search may mean it is time to ramp up content creation – videos, blogs, webinars or expand into newer areas, such as e-books and podcasts that can be ranked higher by search engines.

Measure the impact of paid search advertising

Likewise, it is equally important to measure the impact of your paid search, also known as pay per click (PPC), in which you pay for every click that is generated by paid search advertising. How much are you spending in total? Are those clicks turning into leads? How much profit are you generating from this spend? Some of the following metrics can help you clarify:


  • Click through rate:

How to use: This metric helps you determine the quality of your ad. Is it effective enough to prompt a click? Test different copy treatments, headlines, and URLs to figure out the combination that boosts the CTR for a specific term.


  • Average cost per click:

How to use: Cost per click is the amount you spend for each click on a paid search ad. Combine it with the conversion rate and the revenue earned from those clicks to judge whether the spend is profitable.


  • Conversion rate:

How to use: Is a conversion always a purchase? No! Each time a user takes the action you want them to take on your site, such as clicking a button, submitting a sign-up form, or subscribing, it counts as a conversion.
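The three paid-search metrics above combine naturally with total spend and revenue to answer "are these clicks profitable?". The sketch below uses made-up campaign totals. It measures conversion rate per ad click and treats profit as revenue minus ad spend only; both are simplifying assumptions.

```python
def ppc_summary(impressions, clicks, spend, conversions, revenue):
    """Compute common paid-search metrics from campaign totals."""
    return {
        "ctr_pct": 100 * clicks / impressions,               # click-through rate
        "avg_cpc": spend / clicks,                            # average cost per click
        "conversion_rate_pct": 100 * conversions / clicks,    # conversions per click
        "profit": revenue - spend,                            # ignores costs other than ads
    }


# Hypothetical campaign totals.
print(ppc_summary(impressions=20_000, clicks=400, spend=600.0,
                  conversions=20, revenue=1_500.0))
# {'ctr_pct': 2.0, 'avg_cpc': 1.5, 'conversion_rate_pct': 5.0, 'profit': 900.0}
```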


Measure the impact of blogging efforts 

Going beyond website and SEO metrics, you can also measure the impact of your blogging efforts. This matters because organizations invest considerable resources in creating blog content that can earn backlinks to the website. Some metrics that can tell you whether you are generating relevant content are:

  • Post Views
  • Call to action performance
  • Blog leads

Measure the impact of social media marketing

Strategies for measuring social media marketing are well known and widely implemented. Especially now, as the e-commerce industry expands, social media can make or break your image online. Some of the commonly measured metrics are:

  • Reach
  • Engagement
  • Mentions to assess the brand perception
  • Traffic
  • Conversion rate


Measure the impact of e-mail marketing

Quite often, a marketing strategy leans heavily on e-mail. E-mails are a good place to start visibility efforts and can be very important in maintaining a sustainable relationship with your existing customer base. Some metrics that can tell you whether your emails are working their magic are:

  • Bounce rate
  • Delivery rate
  • Click through rate
  • Share/forwarding rate
  • Unsubscribe rate
  • Frequency of emails sent
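Most of these email rates are simple ratios, but the denominators matter. The sketch below follows one common convention: bounce and delivery rates relative to emails sent, and click, forward, and unsubscribe rates relative to emails delivered. Your email platform may define them differently, and the send figures are hypothetical.

```python
def email_metrics(sent, bounced, clicks, forwards, unsubscribes):
    """Rates for one email send; delivered = sent minus bounced."""
    delivered = sent - bounced
    return {
        "bounce_rate_pct": 100 * bounced / sent,
        "delivery_rate_pct": 100 * delivered / sent,
        "click_through_rate_pct": 100 * clicks / delivered,
        "forward_rate_pct": 100 * forwards / delivered,
        "unsubscribe_rate_pct": 100 * unsubscribes / delivered,
    }


# Hypothetical results for a single campaign send.
print(email_metrics(sent=10_000, bounced=200, clicks=490,
                    forwards=98, unsubscribes=49))
# {'bounce_rate_pct': 2.0, 'delivery_rate_pct': 98.0, 'click_through_rate_pct': 5.0,
#  'forward_rate_pct': 1.0, 'unsubscribe_rate_pct': 0.5}
```

Email frequency is better tracked over time than per send, for example by comparing unsubscribe rates across weeks with different send counts.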

Goals-based approach

A goals-based approach is defined by what you are trying to achieve with a particular campaign. Are you trying to acquire new customers? Or build a loyal customer base, increase engagement, and improve conversion rates?

In this approach we will discuss the following in detail:

  • Audience analysis
  • Acquisition analysis
  • Behavioral analysis
  • Conversion analysis
  • A/B testing

 Audience analysis:

The goal is to know:


“Who are your customers?” 


Audience analysis is a measure that helps you gain clarity on who your customers are. The information can include demographics, location, income, age, and so forth. The following set of metrics can help you know your customers better.


  • Unique visitors
  • Lead score

  • Cookies

  • Segment

  • Label

  • Personally Identifiable Information (PII)
  • Properties

  • Taxonomy

Acquisition analysis:


The goal is to know:


“How do customers get to your website?” 


Acquisition analysis helps you understand which channel delivers the most traffic to your site or application. Comparing incoming visitors from different channels helps you determine how effective your SEO efforts are at driving organic search traffic and how well your email campaigns are running. Some of the metrics that can help you are:


  • Omnichannel

  • Funnel

  • Impressions

  • Sources

  • UTM parameters 

  • Tracking URL

  • Direct traffic

  • Referrers

  • Retargeting

  • Attribution

  • Behavioral targeting
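UTM parameters and tracking URLs are what make acquisition sources visible in the first place. A tracking URL is the landing-page URL with `utm_source`, `utm_medium`, and `utm_campaign` query parameters appended, and analytics tools such as Google Analytics read these to attribute the visit. The sketch below builds such a URL with Python's standard `urllib.parse`. The domain and campaign names are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_utm(url, source, medium, campaign):
    """Append utm_source / utm_medium / utm_campaign to a landing-page URL."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query.update({"utm_source": source, "utm_medium": medium,
                  "utm_campaign": campaign})
    return urlunparse(parts._replace(query=urlencode(query)))


print(add_utm("https://example.com/course", "newsletter", "email", "spring_launch"))
# https://example.com/course?utm_source=newsletter&utm_medium=email&utm_campaign=spring_launch
```

Generating links this way, rather than typing them by hand, keeps source and medium labels consistent, which matters when you later compare channels.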

Behavioral analysis:

 The goal is to know:


“What do the users do on your website?” 


Behavior analytics explains what customers do on your website. What pages do they visit? Which device do they use? Where do they enter the site? What makes them stay? How long do they stay? Where on the site do they drop off? Some of the metrics that can help you gain clarity are:

  • Actions

  • Sessions

  • Engagement rate

  • Events

  • Churn

  • Bounce rate

Conversion analysis

The goal is to know:


“Do customers take the actions that you want them to take?” 


Conversions track whether customers take actions that you want them to take. This typically involves defining funnels for important actions — such as purchases — to see how well the site encourages these actions over time. Metrics that can help you gain more clarity are:

  • Conversion rate

  • Revenue report
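Defining a funnel means listing the steps toward an important action and checking how many users survive each step. The sketch below computes step-to-step and overall conversion percentages. The step names and user counts are hypothetical.

```python
def funnel_report(steps):
    """steps: list of (name, users) in funnel order.

    Returns (transition, step conversion %, overall conversion %) tuples.
    """
    first = steps[0][1]
    report = []
    for (prev_name, prev_users), (name, users) in zip(steps, steps[1:]):
        report.append((f"{prev_name} -> {name}",
                       round(100 * users / prev_users, 1),  # vs. previous step
                       round(100 * users / first, 1)))      # vs. funnel entry
    return report


# Hypothetical purchase funnel.
funnel = [("product page", 1000), ("add to cart", 300),
          ("checkout", 150), ("purchase", 90)]
for row in funnel_report(funnel):
    print(row)
```

The step with the lowest step-to-step percentage is usually where optimization should start.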

A/B testing:

The goal is to know:


“What digital assets are likely to be the most effective for higher conversion?” 


A/B testing enables marketers to experiment with different digital options to identify which ones are likely to be most effective. For example, they can compare a control version (A) with a variant (B). Companies run A/B experiments regularly to learn what works best.
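To decide whether B's higher conversion rate is a real effect or noise, one standard choice (not something prescribed above) is a two-proportion z-test. The sketch assumes independent visitors, large enough samples, and hypothetical conversion counts. It uses only the standard library: the two-sided p-value comes from `math.erfc`.

```python
import math


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test comparing the conversion rates of A and B.

    Returns (rate_a, rate_b, z, two_sided_p).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - normal CDF(|z|))
    return p_a, p_b, z, p_value


# Hypothetical experiment: 2,000 visitors per variant.
rate_a, rate_b, z, p = ab_test(conv_a=100, n_a=2000, conv_b=130, n_b=2000)
print(f"A={rate_a:.1%} B={rate_b:.1%} z={z:.2f} p={p:.3f}")
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be chance alone. Fix the sample size before the test starts instead of stopping as soon as the result looks good.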

In this article, we discussed what marketing analytics is, why it matters, two approaches that marketers can take to report metrics, and the marketing lingo they can use while reporting results. Pick the approach that addresses your business needs and gives you clarity on your marketing efforts. This is not an exhaustive list of every metric that can be used to measure marketing performance.

Of course, there are more! But this can be a good starting point until the marketing efforts expand into a larger effort that has additional areas that need to be tracked.



Data Science Dojo
Gibran Saleem
| September 23

Marketing analytics tells you about the most profitable marketing activities of your business. The more effectively you target the right people with the right approach, the greater value you generate for your business.

However, it is not always clear which of your marketing activities are effective at bringing value to your business. This is where marketing analytics comes in.

It guides you to use the data to evaluate your marketing campaign. It helps you identify which of your activities are effective in engaging with your audience, improving user experience, and driving conversions. 



[Figure: 6 marketing analytics features by Data Science Dojo]

Data-driven marketing is imperative for optimizing your campaigns to generate net positive value from all your marketing activities in real time. Without analyzing your marketing data and customer journey, you cannot identify what you are doing right and what you are doing wrong when engaging with potential customers. The 6 features listed below can give you the start you need to analyze and optimize your marketing strategy using marketing analytics.


1. Impressions 

In digital marketing, impressions are the number of times any piece of your content has been shown on a person’s screen. It can be an ad, a social media post, a video, and so on. However, impressions are not views. A view is an engagement: every time somebody watches your video, that is a view. An impression, by contrast, also counts every time your video appears in someone’s recommended videos on YouTube or in their Facebook newsfeed, regardless of whether they watch it.



It is also important to distinguish between impressions and reach. Reach is the number of unique viewers, so for example if the same person views your ad three times, you will have three impressions but a reach of one.  
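The difference comes down to counting every display versus counting distinct people. A minimal sketch, assuming a hypothetical log with one user ID per time the content was displayed:

```python
def impressions_and_reach(view_log):
    """view_log: one user ID per time the content was displayed."""
    impressions = len(view_log)   # every display counts
    reach = len(set(view_log))    # unique viewers only
    return impressions, reach


# The same person seeing the ad three times: 3 impressions, reach of 1.
print(impressions_and_reach(["user_1", "user_1", "user_1"]))  # (3, 1)
print(impressions_and_reach(["u1", "u2", "u1", "u3"]))        # (4, 3)
```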

Impressions and reach are important for understanding how effective your content was at gaining traction. However, these metrics alone are not enough to gauge the effectiveness of your digital marketing efforts, because neither impressions nor reach tells you how many people engaged with your content. So, tracking impressions is important, but it does not tell you whether you are reaching the right audience.  


2. Engagement rate 

In social media marketing, engagement rate is an important metric. Engagement is when a user comments, likes, clicks, or otherwise interacts with any of your content. Engagement rate is a metric that measures the amount of engagement of your marketing campaign relative to each of the following: 

  • Reach 
  • Post 
  • Impressions  
  • Days
  • Views 

Engagement rate by reach is the percentage of people who chose to interact with the content after seeing it. It is calculated as: engagement rate by reach = (total engagements / reach) × 100. Reach is a more accurate denominator than follower count, because not all of your brand’s followers may see the content, while people who do not follow your brand may still be exposed to it. 

Engagement rate by post is the rate at which followers engage with a post, measured against your follower count. This metric shows how engaged your followers are with your content. However, it does not account for organic reach, and as your follower count grows, your engagement rate by post tends to fall. 

Engagement rate by impressions is the rate of engagement relative to the number of impressions. If you are running paid ads for your brand, engagement rate by impressions can be used to gauge your ads’ effectiveness.  

Average daily engagement rate tells you how much your followers engage with your content each day. It suits specific use cases, for instance when you want to know how often your followers comment on your posts or other content. 

Engagement rate by views gives the percentage of people who chose to engage with your video after watching it. However, this metric does not use unique views, so it may double- or triple-count views from a single user. 
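All five variants share the same numerator (engagements) and differ only in the denominator. The sketch below follows one common set of formulas. The exact definitions vary between social media tools, and all the figures are hypothetical.

```python
def pct(part, whole):
    """part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


def engagement_rates(engagements, reach, impressions, views, followers):
    """Engagement rates for a single post against different denominators."""
    return {
        "by_reach": pct(engagements, reach),
        "by_post": pct(engagements, followers),
        "by_impressions": pct(engagements, impressions),
        "by_views": pct(engagements, views),
    }


def avg_daily_engagement_rate(total_engagements, days, followers):
    """Average engagements per day over a period, as a percentage of followers."""
    return pct(total_engagements / days, followers)


# Hypothetical figures for one post and for a 30-day period.
print(engagement_rates(engagements=120, reach=4_000, impressions=6_000,
                       views=2_400, followers=10_000))
# {'by_reach': 3.0, 'by_post': 1.2, 'by_impressions': 2.0, 'by_views': 5.0}
print(avg_daily_engagement_rate(total_engagements=1_800, days=30, followers=10_000))
# 0.6
```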



3. Sessions 

Sessions are another especially important metric in marketing campaigns that help you analyze engagement on your website. A session is a set of activities by a user within a certain period. For example, a user spent 10 minutes on your website, loading pages, interacting with your content and completed an interaction. All these activities will be recorded in the same 10-minute session.  

In Google Analytics, you can use sessions to check how much time a user spent on your website (session length), how many times they returned to your website (number of sessions), and what interactions users had with your website. Tracking sessions can help you determine how effective your campaigns were in directing traffic towards your website. 
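Analytics tools build sessions by grouping a user's activity and starting a new session after a period of inactivity. The sketch below assumes a 30-minute timeout (Google Analytics' default, though it is configurable) and hypothetical hit timestamps for one user. It derives the session count and session lengths described above.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity limit


def split_sessions(timestamps):
    """Group one user's hit timestamps into sessions."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(ts)  # continue the current session
        else:
            sessions.append([ts])    # gap too long: start a new session
    return sessions


hits = [datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 4),
        datetime(2024, 5, 1, 9, 10), datetime(2024, 5, 1, 14, 0),
        datetime(2024, 5, 1, 14, 7)]
sessions = split_sessions(hits)
print(len(sessions))  # 2
for s in sessions:
    print(s[0].time(), "session length:", s[-1] - s[0])
```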

If you have an e-commerce website, another very helpful tool in Google Analytics is behavioral analytics, which shows you which key actions are driving purchases on your website. The sessions report can be accessed under the Conversions tab in Google Analytics. This report can help you understand user behaviors such as abandoned carts. You can then reach these users with retargeted ads or offer incentives to complete their purchase. 



4. Conversion rate 

Once you have engaged your audience the next step in the customers’ journey is conversion. A conversion is when you make the customer or user complete a specific action. This desired action can be anything from a form submission, purchasing a product or subscribing to a service. The conversion rate is the percentage of visitors who completed the desired action.

So, suppose you have a form on your website and want to find its conversion rate. You would simply divide the number of form submissions by the number of visitors to that form’s page (total conversions / total visitors). 


Conversion rate is a very important metric that helps you assess the quality of your leads. While you may generate a large number of leads or visitors, if you cannot get them to perform the desired action, you may be targeting the wrong audience. Conversion rate can also help you gauge how effective your conversion strategy is: if you aren’t converting visitors, your campaign might need optimization. 


5. Attribution  

Attribution is a sophisticated model that helps you measure which channels are generating the most sales opportunities or conversions. It assigns credit to specific touchpoints on the customer’s journey so you can see which touchpoints drive conversions the most. But how do you know which touchpoint to credit for a specific conversion? That depends on which attribution model you use. There are four common attribution models. 

First touch attribution models assign all the credit to the first touchpoint that drove the prospect to your website. They focus on the top of the marketing funnel and tell you what is attracting people to your brand.

Last touch attribution models assign all the credit to the last touchpoint the visitor interacted with before converting. 

Linear attribution model assigns an equal weight to all the touchpoints in the buyer’s journey. 

Time decay attribution assigns credit based on how close each touchpoint is to the conversion, giving more weight to the most recent touchpoints. This can be used when the buying cycle is relatively short. 
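The four models differ only in how they weight the touchpoints of a single conversion path. The sketch below implements all four for one conversion. For time decay, it assumes credit halves for every 7 days before the conversion; that half-life is an illustrative parameter, not a universal standard. The journey data is hypothetical.

```python
def attribute(touchpoints, model, half_life_days=7.0):
    """Split one conversion's credit across touchpoints.

    touchpoints: list of (channel, days_before_conversion), oldest first.
    Returns {channel: credit}; credits sum to 1.0.
    """
    n = len(touchpoints)
    if model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0] * n
    elif model == "time_decay":
        # Weight halves for every `half_life_days` before the conversion.
        weights = [0.5 ** (days / half_life_days) for _, days in touchpoints]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    credit = {}
    for (channel, _), w in zip(touchpoints, weights):
        credit[channel] = credit.get(channel, 0.0) + w / total
    return credit


journey = [("social", 14), ("email", 7), ("paid_search", 0)]
for model in ("first_touch", "last_touch", "linear", "time_decay"):
    print(model, {ch: round(c, 2) for ch, c in attribute(journey, model).items()})
```

Summing these per-conversion credits across all conversions gives each channel's attributed total under the chosen model.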

Which model you use depends on what product or subscription you are selling and how long your buying cycle is. While attribution is very important in identifying the effectiveness of your channels, to get the complete picture you need to look at how each touchpoint drives conversion. 



6. Customer lifetime value 

Businesses prefer retaining customers over acquiring new ones, and one of the main reasons is that attracting new customers has a cost. The customer acquisition cost is the total cost that you incur as a business acquiring a customer. The customer acquisition cost is calculated by dividing the marketing and sales cost by the number of new customers. 



So, as a business, you must weigh the value of each customer with the associated acquisition cost. This is where the customer lifetime value or CLV comes in. The Customer lifetime value is the total value of your customer to your business during the period of your relationship.

The CLV also helps you forecast revenue: the larger your average CLV, the better your forecasted revenue. In its simplest form, CLV is calculated by multiplying the average annual revenue generated per customer by the average retention period (in years). If your CAC is higher than your CLV, you are on average losing money on every customer you acquire.
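Both figures are simple to compute once the inputs are agreed on. The sketch below uses the simplified, revenue-based CLV (average annual revenue per customer times average retention in years). It ignores margins, discounting, and churn curves, and all the figures are hypothetical.

```python
def cac(marketing_and_sales_cost, new_customers):
    """Customer acquisition cost: total acquisition spend per new customer."""
    return marketing_and_sales_cost / new_customers


def simple_clv(avg_annual_revenue_per_customer, avg_retention_years):
    """Simplified CLV: yearly revenue per customer times years retained."""
    return avg_annual_revenue_per_customer * avg_retention_years


acquisition_cost = cac(50_000, 200)  # 250.0 per customer
lifetime_value = simple_clv(120, 3)  # 360 per customer
print(acquisition_cost, lifetime_value, lifetime_value > acquisition_cost)
# 250.0 360 True
```

If the final comparison printed `False`, each new customer would, on average, cost more to acquire than they bring in.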

This presents a huge problem. Metrics like CAC and CLV are very important for driving revenue. They help you identify high-value customers and identify low value customers so you can understand how to serve these customers better. They help you make more informed decisions regarding your marketing effort and build a healthy customer base. 


 Integrate marketing analytics into your business 

Marketing analytics is a vast field. There is no one method that suits the needs of all businesses. Using data to analyze and drive your marketing and sales effort is a continuous effort that you will find yourself constantly improving upon. Furthermore, finding the right metrics to track that have a genuine impact on your business activities is a difficult task.

So, this list is by no means exhaustive. However, the features listed here can give you the start you need to analyze and understand which actions drive engagement, conversions, and ultimately value for your business.  


Data Science Dojo
Phuc Duong
| March 28

Develop an understanding of text analytics, text conforming, and special character cleaning. Learn how to make text machine-readable.

Text analytics for machine learning: Part 2

Last week, in part 1 of our text analytics series, we talked about text processing for machine learning. We wrote about how we must transform text into a numeric table, called a term frequency matrix, so that our machine learning algorithms can apply mathematical computations to the text. However, we found that our textual data requires some data cleaning.

In this blog, we will cover the text conforming and special character cleaning parts of text analytics.

Understand how computers read text

The computer sees text differently from humans. Computers cannot see anything other than numbers. Every character (letter) that we see on a computer is actually a numeric representation to the computer, with the mapping between numbers and characters determined by an “encoding table.” The simplest, and one of the most common in text analytics, is ASCII encoding. A small sample ASCII table is shown to the right.
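Python exposes this character-to-number mapping directly through `ord()` and `chr()`, which return Unicode code points. These values coincide with ASCII codes for the first 128 characters. The short example below shows why two spellings that look alike to a human are different data to the computer.

```python
# Each character is stored as a number (its code point).
word = "CAFE"
print([ord(ch) for ch in word])  # [67, 65, 70, 69]
print(ord("É"))                  # 201, outside ASCII's 0-127 range
print(chr(67))                   # C
print("café" == "CAFÉ")          # False: different numbers, different "words"
```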


To the left is a look at six different ways the word “CAFÉ” might be encoded in ASCII. The word on the left is what the human sees and its ASCII representation (what the computer sees) is on the right.

Any human would know that these are just six different spellings of the same word, but to a computer they are six different words. They would spawn six different columns in our term-frequency matrix, bloating our already enormous matrix and complicating or even preventing useful analysis.


[Figure: ASCII representation]

Unify words with the same spelling

To unify the six different “CAFÉ’s”, we can perform two simple global transformations.

Casing: First, we convert all characters to the same casing, uppercase or lowercase. This is a common operation: most programming languages have a built-in function that converts all characters in a string to either lowercase or uppercase. We can choose either global lowercasing or global uppercasing; it does not matter as long as it is applied globally.

String normalization: Second, we must convert all accented characters to their unaccented variants. This is often called Unicode normalization, since accented and other special characters are usually encoded using the Unicode standard rather than the ASCII standard. Not all programming languages have this feature out of the box, but most have at least one package which will perform this function.

Note that implementations vary, so you should not mix and match Unicode normalization packages. What kind of normalization you do is highly language dependent, as characters which are interchangeable in English may not be in other languages (such as Italian, French, or Vietnamese).
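In Python, both transformations are available in the standard library. The sketch below casefolds the string, decomposes accented characters with `unicodedata.normalize("NFKD", ...)`, and then drops the combining accent marks. Strictly speaking, the accent stripping is a step built on top of Unicode normalization, not normalization itself. The six variants are hypothetical stand-ins for the ones in the figure. As noted above, stripping accents is only safe for languages where they are interchangeable.

```python
import unicodedata


def conform(word):
    """Casefold, then strip accents: decompose with NFKD, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


variants = ["CAFÉ", "Café", "café", "cafe", "CAFE", "Cafe"]
print({conform(v) for v in variants})  # {'cafe'}

# Also handles an "é" stored as "e" plus a separate combining accent (U+0301).
print(conform("cafe\u0301"))  # cafe
```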

Remove special characters and numbers

The next thing we have to do is remove special characters and numbers. Numbers rarely contain useful meaning. Examples of such irrelevant numbers include footnote numbering and page numbering. Special characters, as discussed in the string normalization section, have a habit of bloating our term-frequency matrix. For instance, representing a quotation mark has been a pain-point since the beginning of computer science.

Unlike a letter, which may only be capital or not capital, quotation marks have many popular representations. A quotation character has three main properties: curly, straight, or angled; left or right; single, double, or triple. Depending on the encoding used, not all of these may exist.

[Figure: ASCII quotations, properties of quotation characters]

The table below shows how quoting the word “café” in both straight quote and left-right quotes would look in a UTF-8 table in Arial font.

[Table: UTF-8 form]
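The same comparison can be reproduced by encoding the strings to UTF-8 bytes. Straight quotes are a single ASCII byte, while curly (left/right) quotes each take three bytes. Mapping the curly variants back to straight quotes, as in the `QUOTE_MAP` below (an illustrative choice covering only four characters), is one way to conform them before building the term-frequency matrix.

```python
# Straight and curly ("left-right") double quotes around "café", as UTF-8 bytes.
straight = '"café"'
curly = "\u201ccafé\u201d"  # LEFT / RIGHT DOUBLE QUOTATION MARK

print(straight.encode("utf-8"))  # b'"caf\xc3\xa9"'
print(curly.encode("utf-8"))     # b'\xe2\x80\x9ccaf\xc3\xa9\xe2\x80\x9d'

# Map curly single and double quotes to their straight equivalents.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"',
                           "\u2018": "'", "\u2019": "'"})
print(curly.translate(QUOTE_MAP) == straight)  # True
```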

Avoid over-cleaning

The problem is further complicated by each individual font, operating system, and programming language since implementation of the various encoding standards is not always consistent. A common solution is to simply remove all special characters and numeric digits from the text. However, removing all special characters and numbers can have negative consequences.

There is such a thing as too much data cleaning in text analytics. The more we clean and remove, the more the textual message may get “lost in translation.” We may inadvertently strip information or meaning from our messages, so that by the time our machine learning algorithm sees the textual data, much or all of the relevant information is gone.

For each type of cleaning above, there are situations in which you will want to either skip it altogether or selectively apply it. As in all data science situations, experimentation and good domain knowledge are required to achieve the best results.

When should we avoid over-cleaning in text analytics?

Special characters: The advent of email, social media, and text messaging has given rise to text-based emoticons represented by ASCII special characters.

For example, if you were building a sentiment predictor for text, text-based emoticons like “=)” or “>:(” are very indicative of sentiment because they directly reference happy or sad. Stripping our messages of these emoticons by removing special characters will also strip meaning from our message.

Numbers: Consider the infinitely gridlocked freeway in Washington state, “I-405.” In a sentiment predictor model, anytime someone talks about “I-405,” more likely than not the document should be classified as “negative.” However, by removing numbers and special characters, the word now becomes “I”. Our models will be unable to use this information, which, based on domain knowledge, we would expect to be a strong predictor.

Casing: Even cases can carry useful information sometimes. For instance, the word “trump” may carry a different sentiment than “Trump” with a capital T, representing someone’s last name.
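The sketch below contrasts a blanket cleaner with a selective one. `EMOTICONS` is a hypothetical whitelist, and `PROTECTED` is a hypothetical pattern for letter-hyphen-digit tokens such as “I-405”; in practice both would come from domain knowledge about your corpus. The selective cleaner still lowercases everything, so it does not address the casing case, which needs something like the entity recognition described next.

```python
import re

# Hypothetical whitelist of emoticons worth keeping for sentiment analysis.
EMOTICONS = {"=)", ":)", ":(", ">:(", ":D"}
# Hypothetical pattern for tokens like highway names ("I-405") to keep intact.
PROTECTED = re.compile(r"^[A-Za-z]+-\d+$")


def naive_clean(text):
    """Lowercase and strip everything except letters and whitespace."""
    return re.sub(r"[^a-z\s]", "", text.lower()).split()


def selective_clean(text):
    """Like naive_clean, but keep whitelisted emoticons and protected tokens."""
    tokens = []
    for tok in text.split():
        if tok in EMOTICONS:
            tokens.append(tok)           # keep the emoticon untouched
        elif PROTECTED.match(tok):
            tokens.append(tok.lower())   # keep letters, hyphen, and digits
        else:
            cleaned = re.sub(r"[^a-z]", "", tok.lower())
            if cleaned:
                tokens.append(cleaned)
    return tokens


msg = "Stuck on I-405 again >:("
print(naive_clean(msg))      # ['stuck', 'on', 'i', 'again']
print(selective_clean(msg))  # ['stuck', 'on', 'i-405', 'again', '>:(']
```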

One way to identify proper nouns that may carry information is named entity recognition (NER), which combines predefined dictionaries with scanning of the surrounding syntax (sometimes called “lexical analysis”). Using it, we can identify people, organizations, and locations.

Next, we’ll talk about stemming and lemmatization as ways to help computers understand that different forms of a word can have the same meaning (e.g., run, running, runs).


Data Science Dojo
Anna Kayfitz
| February 6

There are two key schools of thought on good practices for database management: data normalization and standardization. Below, we look at why each matters.

Organizations are investing heavily in technology as artificial intelligence techniques, such as machine learning, continue to gain traction across several industries.

  • A PricewaterhouseCoopers (PwC) survey pointed out that in 2018, 40% of business executives made major decisions using data at least once every 30 days, and this share is constantly increasing
  • A Gartner study states that 40% of enterprise data is either incomplete, inaccurate, or unavailable

As the speed of data entering the business increases with the Internet of Things becoming more mature, the risk of disconnected and siloed data grows if it is poorly managed within the organization. Gartner has suggested that a lack of data quality control costs average businesses up to $14 million per year.

The adage of “garbage in, garbage out” still plagues analytics and decision making and it is fundamental that businesses realize the importance of clean and normalized data before embarking on any such data-driven projects.

When most people talk about organizing data, they think it means getting rid of duplicates in their system. Although important, this is only the first step in quality control; there are more advanced methods to truly optimize and streamline your data.

There are two key schools of thought on good practice: data normalization and standardization. Both have their place in data governance and/or preparation strategy.

Why data normalization?

A data normalization strategy organizes a database into specific tables and columns to reduce duplication, avoid data modification issues, and simplify queries. All information is stored logically in one central location, reducing the propensity for inconsistent data (sometimes known as a “single source of truth”). In simple terms, it ensures your data looks and reads the same across all records.

In the context of machine learning and data science, it takes the values from the database and where they are numeric columns, changes them into a common scale. For example, imagine you have a table with two columns; one contains values between 0 and 1 and the other contains between 10,000 and 100,000.

The huge differences in scale might cause problems if you attempt any analytics or modeling. This strategy rescales both columns to a common range while preserving their distributions; e.g., 10,000 might become 0 and 100,000 might become 1, with the values in between scaled proportionally.
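The rescaling described here is usually called min-max scaling: subtract the column minimum and divide by the range. A minimal sketch follows. The sample incomes are hypothetical, and the constant-column guard is an added design choice to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]; relative spacing is preserved."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


incomes = [10_000, 55_000, 100_000]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
```

Note that a single extreme value stretches the range and squeezes every other value toward 0, so outliers are worth checking before scaling.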

In real-world terms, consider a dataset of credit card information that has two variables, one for the number of credit cards and the second for income. Using these attributes, you might want to create a cluster and find similar applicants.

Both of these variables will be on completely different types of scale (income being much higher) and would therefore likely have a far greater influence on any results or analytics. Normalization removes the risk of this kind of bias.

The main benefits of this strategy in analytical terms are faster searching and sorting, because indexes are easier to build over smaller, logical tables. Also, having more tables allows better use of segments to control the physical placement of stored data.

There will be fewer nulls and redundant data after modeling any necessary columns and bias/issues with anomalies are greatly reduced by removing the differences in scale.

This concept should not be confused with data standardization, and it is important that both are considered within any strategy.

What is data standardization?

Data standardization takes disparate datasets and puts them on the same scale to allow easy comparison between different types of variables. It uses the average (mean) and the standard deviation of a dataset to achieve a standardized value of a column.

For example, let’s say a store sells $520 worth of chocolate in a day. We know that on average, the store sells $420 per day and has a standard deviation of $50. To standardize the $520 we would do a calculation as follows:

(520 - 420) / 50 = 100 / 50 = 2, so our standardized value for this day is 2. If the sales were $600, we would scale them the same way: (600 - 420) / 50 = 180 / 50 = 3.6.
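The same calculation in code, first with the known mean and standard deviation from the chocolate example, then for a whole column using its own mean and population standard deviation from Python's `statistics` module. Whether to use the population or sample standard deviation is a choice to make consistently across columns. The sample column is hypothetical.

```python
import statistics


def standardize(x, mean, std):
    """z-score: how many standard deviations x lies from the mean."""
    return (x - mean) / std


print(standardize(520, 420, 50))  # 2.0
print(standardize(600, 420, 50))  # 3.6


def standardize_column(values):
    """Standardize a column using its own mean and population std."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


print(standardize_column([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```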

If all columns are done on a similar basis, we quickly have a great base for analytics that is consistent and allows us to quickly spot correlations.

In summary, data normalization processes ensure that our data is structured logically and scaled proportionally where required, generally on a scale of 0 to 1. It tends to be used where you have predefined assumptions of your model. Data standardization can be used where you are dealing with multiple variables together and need to find correlations and trends via a weighted ratio.

By ensuring you have normalized data, the likelihood of success in your machine learning and data science projects vastly improves. It is vital that organizations invest as much in ensuring the quality of their data as they do in the analytical and scientific models that are created by it. Preparation is everything in a successful data strategy and that’s what we mainly teach in our data science bootcamp courses.
