

Ever wonder what happens to your data after you chat with an AI like ChatGPT? Do you wonder who else can see this data? Where does it go? Can it be traced back to you?

These concerns aren’t just hypothetical.  

In the digital age, data is power. But with great power comes great responsibility, especially when it comes to protecting people’s personal information. One of the ways to make sure that data is used responsibly is through data anonymization.

It is a powerful technique that allows AI to learn and improve without compromising user privacy. But how does it actually work? How do tech giants like Google, Apple, and OpenAI anonymize data to train AI models without violating user trust? Let’s dive into the world of data anonymization to understand how it works.

 


 

What is Data Anonymization? 

It is the process of removing or altering any information that can be traced back to an individual. It means stripping away the personal identifiers that could tie data back to a specific person, enabling you to use the data for analysis or research while ensuring privacy. 

Anonymization ensures that the words you type, the questions you ask, and the information you share remain untraceable and secure.

The Origins of Data Anonymization 

Data anonymization has been around for decades, ever since governments and organizations began collecting vast amounts of personal data. With the rise of digital technologies, however, concerns about privacy breaches and data misuse grew, creating the need for ways to protect sensitive information. 

Thus, the origins of data anonymization can be traced back to early data protection laws, such as the Privacy Act of 1974 in the United States and the European Data Protection Directive in 1995. These laws laid the groundwork for modern anonymization techniques that are now a critical part of data security and privacy practices. 

As data-driven technologies continue to evolve, data anonymization has become a crucial tool in the fight to protect individual privacy while still enabling organizations to benefit from the insights data offers.

 

You can also learn about the ethical challenges of LLMs

 

Key Benefits of Data Anonymization 

Data anonymization has a wide range of benefits for businesses, researchers, and individuals alike. Some key advantages can be listed as follows: 

  • Protects Privacy: The most obvious benefit is that anonymization ensures personal data is kept private. This helps protect individuals from identity theft, fraud, and other privacy risks. 
  • Ensures Compliance with Regulations: With the introduction of strict regulations like GDPR and CCPA, anonymization is crucial for businesses to remain compliant and avoid heavy penalties. 
  • Enables Safe Data Sharing: Anonymized data can be shared between organizations and researchers without the risk of exposing sensitive personal information, fostering collaborations and innovations. 
  • Supports Ethical AI & Research: By anonymizing data, researchers and AI developers can train models and conduct studies without violating privacy, enabling the development of new technologies in an ethical way. 
  • Reduces Data Breach Risks: Even if anonymized data is breached, it’s much less likely to harm individuals since it can’t be traced back to them. 
  • Boosts Consumer Trust: In an age where privacy concerns are top of mind, organizations that practice data anonymization are seen as more trustworthy by their users and customers. 
  • Improves Data Security: Anonymization reduces the risk of exposing personally identifiable information (PII) in case of a cyberattack, helping to keep data safe from malicious actors. 

In a world where privacy is becoming more precious, data anonymization plays a key role in ensuring that organizations can still leverage valuable insights from data without compromising individual privacy. So, whether you’re a business leader, a researcher, or simply a concerned individual, understanding data anonymization is essential in today’s data-driven world.

Let’s explore some important data anonymization techniques that you must know about. 

Key Techniques of Data Anonymization 

Data anonymization is not a one-size-fits-all approach. Different scenarios require different techniques to ensure privacy while maintaining data utility. Organizations, researchers, and AI developers must carefully choose methods that provide strong privacy protection without rendering data useless.

 

Key Data Anonymization Techniques

 

Let’s dive into understanding some of the most effective anonymization techniques. 

  1. Differential Privacy: Anonymization with Mathematical Confidence

Differential privacy is a data anonymization technique that adds a layer of mathematically calibrated “noise” to a dataset or its outputs. This noise masks the contributions of individual records, making it virtually impossible to trace a specific data point back to a person.

The mechanism is noise injection. For instance, instead of reporting an app's exact user count (say, 12,387), the system adds a small random number and might return 12,390 or 12,375 instead. The result stays close enough to the truth to yield useful insights, while the contribution of any single individual remains hidden.

This approach ensures mathematical privacy, setting differential privacy apart from traditional anonymization techniques. The randomness is carefully calibrated based on something called a privacy budget (or epsilon, ε). This value balances privacy vs. data utility. A lower epsilon means stronger privacy but less accuracy, and vice versa.
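As a rough sketch of the idea (the function name and the single counting query are illustrative, not a real differential-privacy library API), Laplace noise for a count can be drawn as the difference of two exponential samples:

```python
import random

def dp_count(true_count: int, epsilon: float) -> int:
    """Counting query with Laplace noise calibrated to epsilon.

    A counting query has sensitivity 1, so the noise scale is 1/epsilon.
    The difference of two Exponential(epsilon) draws follows a
    Laplace(0, 1/epsilon) distribution.
    """
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return round(true_count + noise)

# Lower epsilon => more noise => stronger privacy, lower accuracy.
print(dp_count(12387, epsilon=0.5))   # close to 12,387, e.g. 12384
```

Each call returns a slightly different answer, which is exactly what prevents an observer from pinning down any one individual's contribution.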

  2. Data Aggregation: Zooming Out to Protect Privacy

Data aggregation is one of the most straightforward ways to anonymize data. Instead of collecting and sharing data at the individual level, this method summarizes it into groups and averages. The idea is to combine data points into larger buckets, removing direct links to any one person. 

For instance, instead of reporting every person’s salary in a company, you might share the average salary in each department. This data aggregation transforms granular, potentially identifiable data into generalized insights. It is done through:

  • Averages: Like the average number of steps walked per day in a region. 
  • Counts or totals: Such as total website visits from a country instead of by each user. 
  • Ranges or categories: Instead of exact ages, you report how many users fall into age brackets.
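The aggregation styles above can be sketched in a few lines of Python (the salary and age figures are made up for illustration):

```python
from collections import Counter
from statistics import mean

# Individual-level data we do NOT want to publish directly
salaries = {"Engineering": [95000, 105000, 99000], "Sales": [60000, 72000]}
ages = [23, 37, 41, 29, 55, 34]

# Averages: one number per department instead of one per person
avg_salary = {dept: round(mean(vals)) for dept, vals in salaries.items()}

# Ranges/categories: counts per age bracket instead of exact ages
brackets = Counter(f"{(a // 10) * 10}s" for a in ages)

print(avg_salary)   # {'Engineering': 99667, 'Sales': 66000}
print(brackets)     # counts per decade bracket, no exact ages left
```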

  3. IP Address Anonymization: Hiding Digital Footprints

Every time you visit a website, your device leaves a digital breadcrumb called an IP address. It is like your home address that can reveal where you are and who you might be. IP addresses are classified as personally identifiable information (PII) under laws like the GDPR.

This means that collecting, storing, or processing full IP addresses without consent could land a company in trouble. Hence, IP anonymization has become an important strategy for organizations to protect user privacy. Below is an explanation of how it works: 

  • For IPv4 addresses (the most common type, like 192.168.45.231), anonymization involves removing or replacing the last segment, turning it into something like 192.168.45.0. This reduces the precision of the location, masking the individual device but still giving you useful data like the general area or city. 
  • For IPv6 addresses (a newer, longer format), anonymization removes more segments because they can pinpoint devices even more accurately.  

This masking happens before the IP address is logged or stored, ensuring that even the raw data never contains personal information. For example, Google Analytics has a built-in feature that anonymizes IP addresses, helping businesses stay compliant with privacy laws while analyzing traffic patterns.
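A minimal sketch of this masking with Python's standard `ipaddress` module, keeping a /24 prefix for IPv4 (drop the last octet) and a /48 prefix for IPv6, roughly the granularity described above:

```python
import ipaddress

def anonymize_ip(ip: str) -> str:
    """Zero out the host portion of an address: the last octet for IPv4,
    everything past the first 48 bits for IPv6."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

print(anonymize_ip("192.168.45.231"))                # 192.168.45.0
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8:85a3::
```

The masking function would run before the address is ever written to a log, so the stored data never contains the full IP.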

  4. K-Anonymity (Crowd-Based Privacy): Blending into the Data Crowd

K-anonymity is like the invisibility cloak of the data privacy world. It ensures that any person’s record in a dataset is indistinguishable from the records of at least K–1 other people, meaning your data looks just like a bunch of others. 

For instance, details like birthday, ZIP code, and gender do not seem revealing, but when combined, they can uniquely identify someone. K-anonymity solves that by making sure each combination of these quasi-identifiers (like age, ZIP, or job title) is shared by at least K people. 

It mainly relies on two techniques:

  • Generalization: replacing specific values with broader ones 
  • Suppression: removing certain values altogether when generalization is not enough 
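A toy check for K-anonymity after generalization might look like this (the records and the bracketing rules are invented for illustration):

```python
from collections import Counter

records = [
    {"age": 34, "zip": "02139", "job": "nurse"},
    {"age": 36, "zip": "02141", "job": "nurse"},
    {"age": 35, "zip": "02143", "job": "teacher"},
    {"age": 38, "zip": "02144", "job": "teacher"},
]

def generalize(rec):
    # Generalization: exact age -> decade bracket, full ZIP -> 3-digit prefix
    return (f"{(rec['age'] // 10) * 10}s", rec["zip"][:3])

def is_k_anonymous(recs, k):
    # Every quasi-identifier combination must be shared by at least k records
    groups = Counter(generalize(r) for r in recs)
    return all(size >= k for size in groups.values())

print(is_k_anonymous(records, k=2))   # True: all four fall into ('30s', '021')
```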

  5. Data Masking

Data masking is a popular technique for protecting confidential information by replacing it with fake, but realistic-looking values. This approach is useful when you need to use real-looking data, like in testing environments or training sessions, without exposing the actual information. 

The goal is to preserve the format of the original data while removing the risk of exposing PII. Here are some common data masking methods:

  • Character Shuffling: Rearranging characters so the structure stays the same, but the value changes 
  • Substitution: Replacing real data with believable alternatives 
  • Nulling Out: Replacing values with blanks or null entries when the data is not needed at all 
  • Encryption: Encrypting the data so it is unreadable without a decryption key 
  • Date Shifting: Slightly changing dates while keeping patterns intact 
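Two of the methods above, substitution-style masking and character shuffling, can be sketched as follows (the card number is fake):

```python
import random

def mask_card(number: str) -> str:
    # Substitution-style masking: keep the format (dashes, length),
    # hide every digit except the last four
    return "".join("X" if ch.isdigit() else ch for ch in number[:-4]) + number[-4:]

def shuffle_chars(value: str) -> str:
    # Character shuffling: same characters, different order
    chars = list(value)
    random.shuffle(chars)
    return "".join(chars)

print(mask_card("4111-1111-1111-1234"))   # XXXX-XXXX-XXXX-1234
```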

 

Explore the strategies for data security in data warehousing

 

  6. Data Swapping (Shuffling): Mixing Things Up to Protect Privacy

This method randomly rearranges specific data points, like birthdates, ZIP codes, or income levels, within the same column so that they no longer line up with the original individuals. 

In practice, data swapping is used on quasi-identifiers – pieces of information that, while not directly identifying, can become identifying when combined (like age, gender, or ZIP code). Here’s how it works step-by-step: 

  1. Identify the quasi-identifiers in your dataset (e.g., ZIP code, age). 
  2. Randomly shuffle the values of these attributes between rows. 
  3. Keep the overall data format and distribution intact, so it still looks and feels like real data. 

For example, imagine students in a class writing their birthdays on sticky notes, after which the teacher mixes them up and hands them out at random. Everyone still holds a birthday, but no one can tell whose birthday is whose. 
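The shuffle step can be sketched like this (the rows are made up, and only one quasi-identifier column is swapped for brevity):

```python
import random

rows = [
    {"id": 1, "zip": "02139", "age": 34},
    {"id": 2, "zip": "10001", "age": 52},
    {"id": 3, "zip": "94103", "age": 27},
]

def swap_column(rows, column):
    """Randomly reassign one quasi-identifier column across rows.
    The column's values (and hence its distribution) are unchanged;
    only the link to each individual row is broken."""
    values = [r[column] for r in rows]
    random.shuffle(values)
    for row, value in zip(rows, values):
        row[column] = value
    return rows

swap_column(rows, "zip")   # same ZIPs, possibly attached to different rows
```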

  7. Tokenization: Giving Your Data a Secret Identity

Tokenization is a technique where actual data elements (like names, credit card numbers, or Social Security numbers) are replaced with non-sensitive, randomly generated values called tokens. These tokens look like the real thing and preserve the data’s format, but they’re completely meaningless on their own. 

For instance, when managing a VIP guest list, you avoid revealing the names by assigning them labels like “Guest 001,” “Guest 002,” and so on. This tokenization follows a simple but highly secure process: 

  1. Identify sensitive data
  2. Replace each data element with a token 
  3. Store the original data in a secure token vault 
  4. Use the token in place of the real data 
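The four steps above can be sketched with a tiny in-memory vault (a real deployment would keep the vault in hardened, access-controlled storage, not a Python dict):

```python
import secrets

class Tokenizer:
    """Minimal token vault sketch: random tokens map back to the originals."""

    def __init__(self):
        self._vault = {}   # step 3: the token vault

    def tokenize(self, value: str) -> str:
        token = f"tok_{secrets.token_hex(8)}"   # step 2: random, meaningless token
        self._vault[token] = value
        return token                            # step 4: used in place of real data

    def detokenize(self, token: str) -> str:
        return self._vault[token]

vault = Tokenizer()
token = vault.tokenize("4111-1111-1111-1234")   # step 1: sensitive data identified
assert vault.detokenize(token) == "4111-1111-1111-1234"
```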

 

 

  8. Homomorphic Encryption: Privacy Without Compromise

It is a method of performing computations on encrypted data. Once the results are decrypted, it is as if the operations were performed directly on the original, unencrypted data. This means you can keep data completely private and still derive value from it without ever exposing the raw information. 

These are the steps to homomorphic encryption: 

  • Sensitive data is encrypted using a special homomorphic encryption algorithm. 
  • The encrypted data is handed off to a third party (cloud service or analytics team). 
  • This party performs analysis or computations directly on the encrypted data. 
  • The encrypted results are returned to the original data owner. 
  • The owner decrypts the result and gets the final output – accurate, insightful, and 100% private. 
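Production systems use fully homomorphic schemes through dedicated libraries, but the core idea can be demonstrated with textbook RSA, which happens to be multiplicatively homomorphic. This is a toy (tiny primes, no padding, utterly insecure) purely to show computation on ciphertexts:

```python
# Toy textbook-RSA demo of a homomorphic property: multiplying two
# ciphertexts yields the ciphertext of the product of the plaintexts.
p, q, e = 61, 53, 17
n = p * q                           # public modulus (3233)
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

# A third party multiplies the ciphertexts without ever seeing 7 or 6
product_cipher = (encrypt(7) * encrypt(6)) % n
print(decrypt(product_cipher))      # 42, as if computed on the plaintexts
```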

  9. Synthetic Data Generation

Synthetic data generation fabricates new, fictional records that look and act like real data. That means you get all the value of your original dataset (structure, patterns, relationships), without exposing anyone’s private details. 

Think of it like designing a CGI character for a movie. The character walks, talks, and emotes like a real person, but no actual actor was filmed. Similarly, synthetic data keeps the realism of your dataset intact while ensuring that no real individual can be traced. 

Here’s a simplified look at how synthetic data is created and used to anonymize information:

  • Data Modeling: The system studies the original dataset using machine learning (often GANs) to learn its structure, patterns, and relationships between fields.

  • Data Generation: Based on what it learned, the system creates entirely new, fake records that mimic the original data without representing real individuals.

  • Validation: The synthetic data is tested to ensure it reflects real-world patterns without duplicating or revealing any actual personal information.
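Real systems typically learn the joint distribution with GANs or similar models. As a bare-bones illustration of the model-then-sample-then-validate idea, here is a single numeric column modeled as a Gaussian (the ages are invented):

```python
import random
from statistics import mean, stdev

real_ages = [23, 37, 41, 29, 55, 34, 46, 31]

# Data modeling: learn the column's distribution parameters
mu, sigma = mean(real_ages), stdev(real_ages)

# Data generation: fabricate new records from the learned model;
# none of these values belongs to a real individual
synthetic_ages = [max(0, round(random.gauss(mu, sigma))) for _ in range(8)]

# Validation: the synthetic column should mirror real-world patterns
print(synthetic_ages)
```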

Data anonymization is undoubtedly a powerful tool for protecting privacy, but it is not without its challenges. Businesses must tread carefully and strike the right balance. 

 

comparing data anonymization techniques

 

Challenges and Limitations of Data Anonymization 

While data anonymization techniques offer impressive privacy protection, they come with their own set of challenges and limitations. These hurdles are important to consider when implementing anonymization strategies, as they can impact the effectiveness of the process and its practical application in real-world scenarios. 

 

Here’s a list of controversial experiments in big data ethics

 

Let’s dive into some of the major challenges that businesses and organizations face when anonymizing data. 

Risk of Re-Identification (Attackers Combining External Datasets) 

One of the biggest challenges with data anonymization is the risk of re-identification. Even if data is anonymized, attackers can sometimes combine it with other publicly available datasets to piece together someone’s identity. This makes re-identification a real concern for organizations dealing with sensitive information.

To reduce this risk, it’s important to layer different anonymization techniques, such as pairing K-anonymity with data masking or using differential privacy to introduce noise. Regular audits can help spot weak points in data, and reducing data granularity can assist in keeping individuals anonymous.

Trade-off Between Privacy & Data Utility

One of the biggest hurdles in data anonymization is balancing privacy with usefulness. The more you anonymize data, the safer it becomes, but it also loses important details needed for analysis or training AI models. For example, data masking protects identities, but it can limit how much insight you can extract from the data.

To overcome this, businesses can tailor anonymization levels based on the sensitivity of each dataset, anonymizing the most sensitive fields while keeping the rest intact for meaningful analysis where possible. Techniques like synthetic data generation can also help by creating realistic datasets that protect privacy without compromising on value.

Compliance Complexity (Navigating Regulations like GDPR, CCPA, HIPAA) 

For organizations working with sensitive data, staying compliant with privacy laws is a must. However, it is a challenge when different countries and industries have their own rules. Businesses operating across borders must navigate these regulations to avoid hefty fines and damage to their reputation.

Organizations should work closely with legal experts and adopt a compliance-by-design approach, ensuring privacy in every stage of the data lifecycle. Regular audits, legal check-ins, and reviewing anonymization techniques can help ensure everything stays within legal boundaries.

 

 

Thus, as data continues to be an asset for many organizations, finding effective anonymization strategies will be essential for preserving both privacy and analytical value. 

Real-World Use Cases of Data Anonymization 

Whether it’s training AI models, fighting fraud, or building smarter tech, anonymization is working behind the scenes. Let’s take a look at how it’s making an impact in the real world. 

Healthcare – Protecting Patient Data in Research & AI 

Healthcare is one of the most sensitive domains when it comes to personal data. Patient records, diagnoses, and medical histories are highly private, yet incredibly valuable for research and innovation. This is where data anonymization becomes a critical tool. 

Hospitals and medical researchers use anonymized datasets to train AI models for diagnostics, drug development, disease tracking, and more while maintaining patient confidentiality. By removing or masking identifiable information, researchers can still uncover insights while staying HIPAA and GDPR compliant. 

One prominent use case within this domain is the partnership between Google’s DeepMind and Moorfields Eye Hospital in the UK. They used anonymized medical data to train an AI system that can detect early signs of eye disease with high accuracy. 

 

Read more about AI in healthcare

 

Financial Services – Secure Transactions & Fraud Prevention 

A financial data leak could lead to identity theft, fraud, or regulatory violations. Hence, banks and fintech companies rely heavily on anonymization techniques to monitor transactions, detect fraud, and calculate credit scores while protecting sensitive customer information.

Companies like Visa and Mastercard use tokenization to anonymize payment data. Instead of the real card number, they use a token that represents the card in a transaction. Even if the token is stolen, it is useless without access to the original data stored securely elsewhere.

This boosts customer trust, strengthens security, and makes it possible to safely analyze transaction patterns and detect fraud in real time.

 

Explore the real-world applications of AI tools in finance

 

Big Tech & AI – Privacy-Preserving Machine Learning 

Tech companies collect huge amounts of data to power everything from recommendation engines to voice assistants. A useful approach for these companies to ensure user privacy is federated learning (FL), which allows AI models to be trained directly on users’ devices.

Combined with differential privacy, it adds statistical “noise” to individual data points, ensuring sensitive user data never leaves the device or gets stored in a central database.

For example, Google’s Gboard, the Android keyboard app, uses FL to improve word predictions and autocorrect. It learns from how users type, but the data stays on the phone. This protects user privacy while making the app smarter over time.

 


 

Despite these applications, it is important to know that each industry faces its own challenges. However, with the right techniques such as tokenization, federated learning, and differential privacy, organizations can find the perfect balance between utility and confidentiality.

Privacy Isn’t Optional: It’s the Future 

Data anonymization is essential in today’s data-driven world. It helps businesses innovate safely, supports governments in protecting citizens, and ensures individuals’ privacy stays intact. 

With real-world strategies from companies like Google and Visa, it is clear that protecting data does not mean sacrificing insights. Techniques like tokenization, federated learning, and differential privacy prove that security and utility can go hand-in-hand.

 

Learn more about AI ethics for today’s world

 

If you’re ready to make privacy a priority, here’s how to start:

  • Start small: Identify which types of sensitive data you collect and where it’s stored.
  • Choose the right tools: Use anonymization methods that suit your industry and compliance needs.
  • Make it a mindset: Build privacy into your processes, not just your policies.
April 7, 2025

A new era in AI: introducing ChatGPT Enterprise for businesses! Explore its cutting-edge features and pricing now.

To leverage the widespread popularity of ChatGPT, OpenAI has officially launched ChatGPT Enterprise, a tailored version of their AI-powered chatbot application, designed for business use.

Introducing ChatGPT Enterprise

ChatGPT Enterprise, which was initially hinted at in a previous blog post earlier this year, offers the same functionalities as ChatGPT, enabling tasks such as composing emails, generating essays, and troubleshooting code. However, this enterprise-oriented iteration comes with added features like robust privacy measures and advanced data analysis capabilities, elevating it above the standard ChatGPT. Additionally, it offers improved performance and customization options.

These enhancements bring ChatGPT Enterprise to feature parity with Bing Chat Enterprise, Microsoft’s recently released enterprise-focused chatbot service.

 

Introducing ChatGPT Enterprise

Privacy, Customization, and Enterprise Optimization

“Today marks another step towards an AI assistant for work that helps with any task, protects your company data, and is customized for your organization. Businesses interested in ChatGPT Enterprise should get in contact with us. While we aren’t disclosing pricing, it’ll be dependent on each company’s usage and use cases.” – OpenAI 

Streamlining Business Operations: The Administrative Console

ChatGPT Enterprise introduces a new administrative console equipped with tools for managing how employees in an organization utilize ChatGPT. This includes integrations for single sign-on, domain verification, and a dashboard offering usage statistics. Shareable conversation templates enable employees to create internal workflows utilizing ChatGPT, while OpenAI’s API platform provides credits for creating fully customized solutions powered by ChatGPT.

Notably, ChatGPT Enterprise grants unlimited access to Advanced Data Analysis, a feature previously known as Code Interpreter in ChatGPT. This feature empowers ChatGPT to analyze data, create charts, solve mathematical problems, and more, even with uploaded files. For instance, when given a prompt like “Tell me what’s interesting about this data,” ChatGPT’s Advanced Data Analysis feature can delve into data, such as financial, health, or location data, to generate insightful information.


 

Priority Access to GPT-4: Enhancing Performance

Advanced Data Analysis was previously exclusive to ChatGPT Plus subscribers, the premium $20-per-month tier for the consumer ChatGPT web and mobile applications. OpenAI intends for ChatGPT Plus to coexist with ChatGPT Enterprise, emphasizing their complementary nature.

ChatGPT Enterprise operates on GPT-4, OpenAI’s flagship AI model, just like ChatGPT Plus. However, ChatGPT Enterprise customers receive priority access to GPT-4, resulting in performance that is twice as fast as the standard GPT-4 and offering an extended context window of approximately 32,000 tokens (around 25,000 words).

The context window denotes the text the model considers before generating additional text, while tokens represent individual units of text (e.g., the word “fantastic” might be split into the tokens “fan,” “tas,” and “tic”). Larger context windows in models reduce the likelihood of “forgetting” recent conversation content.

Data Security: A Paramount Concern Addressed

OpenAI is actively addressing business concerns by affirming that it will not use business data sent to ChatGPT Enterprise or any usage data for model training. Additionally, all interactions with ChatGPT Enterprise are encrypted during transmission and while stored.

OpenAI’s Announcement on LinkedIn of ChatGPT Enterprise

 

ChatGPT’s Impact on Businesses

OpenAI asserts strong interest from businesses in a business-focused ChatGPT, noting that ChatGPT, one of the fastest-growing consumer applications in history, has been embraced by teams in over 80% of Fortune 500 companies.

Monetizing the Innovation: Financial Considerations

However, the sustainability of ChatGPT remains uncertain. According to Similarweb, global ChatGPT traffic decreased by 9.7% from May to June, with an 8.5% reduction in average time spent on the web application. Possible explanations include the launch of OpenAI’s ChatGPT app for iOS and Android and the summer vacation period, during which fewer students use ChatGPT for academic assistance. Increased competition may also be contributing to this decline.

OpenAI faces pressure to monetize the tool, considering the company’s reported expenditure of over $540 million in the previous year on ChatGPT development and talent acquisition from companies like Google, as mentioned in The Information. Some estimates suggest that ChatGPT costs OpenAI $700,000 daily to operate.

Nonetheless, in fiscal year 2022, OpenAI generated only $30 million in revenue. CEO Sam Altman has reportedly set ambitious goals, aiming to increase this figure to $200 million this year and $1 billion in the next, with ChatGPT Enterprise likely playing a crucial role in these plans.

 

Read more –> Boost your business with ChatGPT: 10 innovative ways to monetize using AI

ChatGPT Enterprise Pricing Details

Positioned as the highest tier within OpenAI’s range of services, ChatGPT Enterprise serves as an extension to the existing free basic service and the $20-per-month Plus plan. Notably, OpenAI has chosen a flexible pricing strategy for this enterprise-level service. Rather than adhering to a fixed price, the company’s intention is to personalize the pricing structure according to the distinct needs and scope of each business.

According to COO Brad Lightcap’s statement to Bloomberg, OpenAI aims to collaborate with each client to determine the most suitable pricing arrangement.

 

ChatGPT Pricing

 

OpenAI’s official statement reads, “We hold the belief that AI has the potential to enhance and uplift all facets of our professional lives, fostering increased creativity and productivity within teams. Today signifies another stride towards an AI assistant designed for the workplace, capable of aiding with diverse tasks, tailored to an organization’s specific requirements, and dedicated to upholding the security of company data.”

This individualized approach aims to make ChatGPT Enterprise adaptable to a wide range of corporate requirements, offering a more tailored experience than its one-size-fits-all predecessors.

Is ChatGPT Enterprise Pricing Justified?

ChatGPT Enterprise operates on the GPT-4 model, OpenAI’s most advanced AI model to date, a feature shared with the more affordable ChatGPT Plus. However, there are notable advantages for Enterprise subscribers. These include privileged access to an enhanced GPT-4 version that functions at double the speed and provides a more extensive context window, encompassing approximately 32,000 tokens, equivalent to around 25,000 words.

Understanding the significance of the context window is essential. Put simply, it represents the amount of text the model can consider before generating new content. Tokens are the discrete text components the model processes; envision breaking down the word “fantastic” into segments like “fan,” “tas,” and “tic.” A model with an extensive context window is less prone to losing track of the conversation, leading to a smoother and more coherent user experience.

Regarding concerns about data privacy, a significant issue for businesses that have previously restricted employee access to consumer-oriented ChatGPT versions, OpenAI assures that ChatGPT Enterprise models will not be trained using any business-specific or user-specific data. Furthermore, the company has implemented encryption for all conversations, ensuring data security during transmission and storage.

Taken together, these enhancements suggest that ChatGPT Enterprise could offer substantial value, particularly for organizations seeking high-speed, secure, and sophisticated language model applications.

 


August 29, 2023

There’s more to data security and access control than granting teams within a company different access levels and issuing user passwords.

As data scientists, our job is not to run our organization’s entire security operation to prevent a breach. However, because we work so closely with data, we must understand the importance of having robust mechanisms in place to keep sensitive and personally identifiable information from falling into the wrong hands or being exposed by a cyber attack. Hence the need for data security.

Strong passwords? Not enough

Setting ourselves up with a strong password might not cut it in today’s world. Some of the world’s biggest banks, which employ armies of highly skilled security professionals, have suffered ever-smarter cyber attacks. Today, users are logging into work systems and databases through biometrics such as fingerprint scanning on smartphones, laptops, and other devices.

Two-factor authentication is another popular data security mechanism, going beyond identifying and authenticating a user through a password alone. Users now log into systems with a one-time password, sent to their work email or device as a second form of login, in combination with their primary password or fingerprint. Generating a random number or token string at each login reduces the risk posed by a single password being decrypted or otherwise obtained.

Finishing the equation

User identity and authentication are only half of the equation, however. The other half is using anomaly detection algorithms or machine learning to pick up on unusual user activity and behavior once a user has logged on. This is something we as data scientists can bring to the table in helping our organizations better secure our customer or business data. Some of the key features of anomaly detection models include the time of access, location of access, type of activity or use of the data, device type, and how frequently a user accesses the database.

The model collects these data security points every time a user logs into the database and continuously monitors and calculates a risk score based on these data security points and how much they deviate from the user’s past logins. If the user reaches a high enough score, an automated mobile alert can be sent to the security team to further investigate or to take action.
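As a simplified sketch of such a risk score, here is a single feature (login hour) scored by its deviation from the user's own history; real systems combine many features and learned thresholds:

```python
from statistics import mean, stdev

# One user's past login hours (24-hour clock), collected at each login
past_login_hours = [9, 10, 9, 11, 10, 9, 10, 11, 9, 10]

def risk_score(login_hour: int, history: list) -> float:
    """How many standard deviations this login deviates from past behavior."""
    mu, sigma = mean(history), stdev(history)
    return abs(login_hour - mu) / sigma

# A 3 a.m. login scores far higher than a typical 10 a.m. one,
# which could trigger an automated alert to the security team
print(risk_score(3, past_login_hours) > risk_score(10, past_login_hours))   # True
```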

Data security examples

Some obvious data security examples include a user based in Boston who logged out of the database 10 minutes ago but is now accessing it from Berlin, or a user who normally logs in during work hours suddenly logging in at 3 am.

Other examples include an executive assistant who rarely logs into the database now logging in every 10 minutes, or a data scientist who usually aggregates thousands of rows of data now retrieving a single row.

Another is a marketer who usually searches the database for contact numbers now attempting to access credit card information, even though the marketer already knows they do not have access to it.

Another way data scientists can safeguard their customer or business data is to keep the data inside the database rather than exporting a subset or local copy of the data onto their computer or device. Nowadays, there are many tools to connect different database providers to R or Python, such as the odbcConnect() function as part of the RODBC library in R, which reads and queries data from a database using an ID and password rather than importing data from a local computer.

The ID and password can be removed from the R or Python file once the user has finished working with the data, so an attacker cannot run the script to get the data without a login. Also, if an attacker were to crack open a user’s laptop, he or she would not find a local copy of the data on that device.
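The same pattern looks like this in Python. This is a sketch: in practice you would connect to a real database server (for example with pyodbc, the Python analogue of R’s odbcConnect), and the `DB_USER`/`DB_PASSWORD` environment variable names are assumptions; here an in-memory SQLite database stands in for the server so the example is self-contained.

```python
import os
import sqlite3

# Read credentials from the environment rather than hard-coding them in the
# script; an attacker who obtains the file cannot run it without a login.
user = os.environ.get("DB_USER", "analyst")    # hypothetical variable names
password = os.environ.get("DB_PASSWORD", "")   # never stored in the script

# In-memory SQLite stands in for a remote database server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Boston"), (2, "Berlin"), (3, "Boston")])

# Aggregate inside the database; only the summary leaves it, never a raw copy.
count = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE region = ?", ("Boston",)
).fetchone()[0]
print(count)  # -> 2
conn.close()
```

Because the query runs inside the database and only the aggregate result comes back, no local copy of the underlying rows ever lands on the analyst’s machine.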

Row- and column-level access is another example of data security through fine-grained access controls. This mechanism masks certain columns or rows for different users; the masked columns or rows in tabular data usually contain sensitive or personally identifiable information. For example, columns containing financial information might be masked for the data science team but not for the finance/payments-processing team.
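Column masking per role can be illustrated with a short sketch. In a real deployment the database itself enforces this (e.g. via views or masking policies); the role names and masked columns below are purely illustrative.

```python
# A minimal sketch of role-based column masking: sensitive columns are
# redacted for teams that should not see them. Roles/columns are illustrative.
MASKED_COLUMNS = {
    "data_science": {"credit_card", "ssn"},
    "finance": set(),  # the finance/payments team sees financial columns unmasked
}

def mask_row(row: dict, role: str) -> dict:
    """Return a copy of the row with this role's masked columns redacted."""
    hidden = MASKED_COLUMNS.get(role, set())
    return {col: ("***" if col in hidden else val) for col, val in row.items()}

row = {"name": "Ada", "region": "Boston", "credit_card": "4111-1111-1111-1111"}
print(mask_row(row, "data_science"))  # credit_card appears as "***"
print(mask_row(row, "finance"))       # full row is visible
```

The masking happens before the row ever reaches the user, so a data scientist querying the table simply never receives the sensitive values.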

Conclusion & other tips

Other ways to safely deal with sensitive and personally identifiable information include differential privacy and k-anonymity. To learn about these techniques, please read Dealing with data privacy – anonymization techniques.


Written by Rebecca Merrett

August 18, 2022

Learn how to configure the security settings of your Windows 10 account and be the true owner of your personal data and your privacy.

Technology is wonderful: for centuries it has improved people’s lives and made day-to-day tasks easier, allowing us to do things that were previously impossible. It is undeniable that the internet has changed our lives and connected us with people all over the globe, but it also has its drawbacks.

The latest advances in technology have eroded our privacy to a level we have never seen before. You no longer need to be a famous star for your privacy to be worth a lot of money. Many of the services and tools we use every day collect our personal data without our awareness, in exchange for making our lives easier.

Whether it is to protect yourself from hackers who want to steal your banking information, or because you do not feel comfortable sharing your location with every application you use, you must know how to protect yourself from these dangers.

Here are a few essential steps in Windows 10 to take ownership of your data, to know who wants to learn where you are or which pages you visit, and even to avoid being spied on through your computer’s camera.

1. Say no to the fast installation of Windows 10

When it comes to Windows 10, or any application, we often look for the fastest option that requires as little effort as possible, and companies increasingly offer ways for users to skip these tedious processes.

The downside is that the more we disengage from the configuration and installation process, the more power we give companies over our privacy. You know the saying: “if you want something done well, do it yourself.”

When installing Windows 10, make sure you choose the custom configuration so you can review each of the permissions you give the system. Then go to the privacy settings: press the Windows key and the ‘I’ key at the same time. There you can configure Windows 10 privacy to your liking, although you will not be able to access the settings of other applications; you will have to go through those one by one.

2. Clip Cortana’s wings

Virtual assistants are very useful for making our lives easier, but for them to know us well enough to be truly useful, we must give them access to a whole range of personal data. If you work on your computer, you may not want Cortana to have access to your company’s data or bank details. Even if you ultimately don’t mind letting it work with this information, we recommend at least taking a look at everything Cortana wants to know about you.

In Cortana’s settings, you can delete all the data the assistant holds about you, or just the items that seem too private.

3. Turn off your location

Like a mobile phone, Windows 10 automatically tracks your location, saves this information for about 24 hours, and shares it with any third-party application you have downloaded.

Once again in the privacy settings, you can deactivate location tracking, or re-enable it only when you actually need it. In addition, it is recommended that you review the settings of each application you install on your computer and be aware of which ones are trying to learn where you are.

4. Block ad tracking

Whether you are browsing online stores looking for a new smartphone or simply checking hotels to see how much a weekend trip would cost, you will soon see hundreds of ads for the things you have been looking at.

That’s ad tracking, and it’s a nuisance, but the good news is that it can be blocked. Windows 10 has it activated by default, of course, because it is great for companies to know what you are thinking of spending your money on.

In the privacy settings, you will find a section where you can deactivate the option “Allow applications to use my advertising ID”.

5. Don’t allow access to your camera

How many times have Hollywood films and TV series warned us about hackers taking control of other people’s laptop cameras to spy on them, and how many times have we dismissed it as an invention of the screenwriters?

Well, it is a very real possibility, and experts warn how easy it is to do, so it is highly recommended to block the camera and only activate it when you need it for a conference or a family video call. Moreover, you can not only disable the camera in the computer’s settings, but also cover it with small protectors that give a lot of peace of mind and do not cost much.

6. Disable access to the microphone

If we block the camera, how can we not block the microphone? It is true that you need it to give instructions to Cortana, but if you have decided to silence Cortana completely, you should do the same with the microphone.

7. Who can see your account information?

Another piece of data we give to Windows, which it shares with others, is account data such as your name and email address. If we go to the “Account info” section, we will see that many of the applications we use have access to this data.

Customizing this section of the settings never hurts; share as little data as possible with applications.

8. Eliminate tracking of the timeline

This point is related to point number four. Surfing the internet should not mean that others know what you are looking at, even if the pages are completely harmless. Timeline tracking records all the websites you have visited; eliminating it increases your privacy and your control over your data. The catch is that you have to clear it regularly, just like your browser history, which experts also recommend deleting routinely.

9. Check the privacy settings after each update

We are sorry to inform you that after a system update, many of the steps you have taken today may be undone and you will have to redo them. Microsoft usually restores all the privacy settings to their defaults when the system restarts. Many complaints have been made about this, but they have fallen on deaf ears. This should not stop us from staying attentive: to stay alert, it is best to turn off automatic updates and keep track of when an important one arrives.

This does not mean it is better not to update. Updates are important for interesting new features and improvements to system security, and we should not give them up unless they start causing problems again.

10. Do not leave your privacy in the hands of others

This advice applies to Windows 10 users and to users of any other system or device, even Apple’s. User data is becoming the great business of the 21st century. Our privacy, tastes, interests, and other personal data are worth a lot to companies, but they should be worth more to us, and we should protect them as if they were gold.

It takes only a little effort to stay aware of tricks like these, and of the steps and tools that help keep our privacy as intact as possible.

June 14, 2022
