For a hands-on learning experience to develop LLM applications, join our LLM Bootcamp today.
First 6 seats get an early bird discount of 30%! So hurry up!

a/b testing

Every cook knows how to avoid Type I Error: just remove the batteries. Let’s also learn how to reduce the chances of Type II errors. 

Why type I and type II errors matter

A/B testing is an essential component of large-scale online services today. So essential, that every online business worth mentioning has been doing it for the last 10 years.

A/B testing is also used in email marketing by all major online retailers. The Obama for America data science team received a lot of press coverage for leveraging data science, especially A/B testing during the presidential campaign.

Hypothesis Testing Outcomes - type I and Type II errors
Hypothesis testing outcome – Data Science Dojo

Here is an interesting article on this topic along with a data science bootcamp that teaches a/b testing and statistical analysis.

If you have been involved in anything related to A/B testing (online experimentation) on UI, relevance or email marketing, chances are that you have heard of Type i and Type ii error. The usage of these terms is common but a good understanding of them is not.

I have seen illustrations as simple as this.

Examples of type I and type II errors

I intend to share two great examples I recently read that will help you remember this especially important concept in hypothesis testing.

Type I error: An alarm without a fire.

Type II error: A fire without an alarm.

Every cook knows how to avoid Type I Error – just remove the batteries. Unfortunately, this increases the incidences of Type II error.

Reducing the chances of Type II error would mean making the alarm hypersensitive, which in turn would increase the chances of Type I error.

Another way to remember this is by recalling the story of the Boy Who Cried Wolf.

Boy Who Cried Wolf

 

Null hypothesis testing: There is no wolf.

Alternative hypothesis testing: There is a wolf.

Villagers believing the boy when there was no wolf (Reject the null hypothesis incorrectly): Type 1 Error. Villagers not believing the boy when there was a wolf (Rejecting alternative hypothesis incorrectly): Type 2 Error

Tailpiece

The purpose of the post is not to explain type 1 and type 2 error. If this is the first time you are hearing about these terms, here is the Wikipedia entry: Type I and Type II Error.

June 15, 2022

Ethics in research and A/B testing is essential. A/B testing might not be as simple and harmless as it looks. Learn how to take care of ethical concerns in A/B tests.

The ethical way to A/B testing

We have come a long way since the days of horrific human experiments during World Wars, the Stanford prison experiment, the Guatemalan STD Study, and many more where inhumane treatments were all in the name of science.

However, we still have much to learn, with incidents like the clinical trial disaster in France and Facebook’s emotional and psychological experiments of recent years violating the rights of persons and serving as a clear reminder to constantly keep our ethics in research sharply upfront.

As data scientists, we are always experimenting – not only with our models or formulas but also with the responses from our customers. A/B tests or randomized experiments may require human subjects, who are willing to undertake a trial or treatment such as seeing certain content when using a Web app, or undergoing a certain exercise regime.

A/B Testing
Performing A/B test on a website

Facebook example

What may initially seem like a harmless experiment, might cause harm or distress. For example, Facebook’s experiment of provoking negative emotions from some users and positive emotions from others could have grave consequences. If a user, who was experiencing emotional distress happened to have seen content that provoked negative feelings, it could spur on a tragic event such as physical harm.

Careful understanding of our experiments and our test subjects may prevent inappropriate testing required prior to implementing our research, or products and services. Consent is the best tool to assist data scientists working with data generated by people. Similarly, to guidelines for clinical trials, it’s informed consent specifically that is needed to avoid potential unintended consequences of experiments.

If an organization specializing in exercise science accepted participation from a person who has a high risk of heart failure and did not ask for a medical examination before experimenting, then the organization is potentially liable for the consequences.

Often a simple, harmless A/B test might not be as simple and harmless as it looks. So how do we ensure we are not putting our human subjects’ well-being and safety in danger when we conduct our research and experiments?

First steps in research

The first port of call is using informed user consent. This doesn’t mean pages and pages of legal jargon on sign-up or being vague in an email when reaching out for volunteers for your study. This could rather be a popup window or email that is clear on the purpose of the experiment and any warnings or potential risks the person needs to be aware of.

Depending on how intense the treatment is, a medical or psychological examination is a good idea to ensure that the participant can cope with the given treatment. Being unaware of people’s vulnerabilities can lead to unintended consequences. This can be avoided through clearer warnings or the next level up which may be online assessments or even expert examinations.

The next step in ensuring your A/B test or experiment runs smoothly and ethically is making sure you understand local and federal regulations around conducting research experiments on humans. In the US, these regulations have been outlined above. The regulations mainly look at:

 

  • Informed consent, with a full explanation of any potential risks to the subject.
  • Providing additional safeguards for vulnerable populations such as children, mentally disabled people, mentally ill people, economically disadvantaged people, pregnant women, and so on.
  • Government-funded experiments need the approval of an Institutional Review Board or an independent ethics committee before conducting experiments.

During the A/B test or experiment, it’s also a good idea to regularly check in and see how your subjects are responding to the treatments, not only for scientific research but also to quickly solve any health or well-being issues.

This could be in the form of a short popup survey or email to check if the user is safe and well, or face-to-face consulting. Also, having an opt-out option allows the subject to take control if they feel their health or well-being is at risk. Having some people opt out might seem inconvenient for your study, but a serious or tragic incident as a result of a participant having to go through the full course of the treatment is a far worse outcome.

Observational studies might be a good alternative if the above steps are in no way feasible for your experiment. Observational studies are limited when making conclusions, and only real experiments allow you to make confident conclusions from the data. However, in some situations, it is not possible nor ethical to force treatments onto subjects.

For example, it’s not ethical to inject cancer cells into random subjects, but you can study cancer patients with the inherited attributes you are looking for to help with your research.

The ethical takeaway

It is understood that there can be some overhead in carefully preparing, setting up, and following ethical guidelines for an experiment or A/B test. However, the serious consequences of not doing it properly, as well as public distrust, will only lead to a reluctance to share data, hindering our ability to effectively do our work.

If you’re curious to learn more about A/B testing, watch the short video below.

June 14, 2022

Related Topics

Statistics
Resources
rag
Programming
Machine Learning
LLM
Generative AI
Data Visualization
Data Security
Data Science
Data Engineering
Data Analytics
Computer Vision
Career
AI