
data privacy

Data Science Dojo Staff
| August 29

A new era in AI: introducing ChatGPT Enterprise for businesses! Explore its cutting-edge features and pricing now.

To leverage the widespread popularity of ChatGPT, OpenAI has officially launched ChatGPT Enterprise, a tailored version of their AI-powered chatbot application, designed for business use.


Introducing ChatGPT enterprise

ChatGPT Enterprise, which was initially hinted at in a previous blog post earlier this year, offers the same functionalities as ChatGPT, enabling tasks such as composing emails, generating essays, and troubleshooting code. However, this enterprise-oriented iteration comes with added features like robust privacy measures and advanced data analysis capabilities, elevating it above the standard ChatGPT. Additionally, it offers improved performance and customization options.

These enhancements put ChatGPT Enterprise at feature parity with Bing Chat Enterprise, Microsoft’s recently released enterprise-focused chatbot service.


Introducing ChatGPT Enterprise

Privacy, customization, and enterprise optimization

“Today marks another step towards an AI assistant for work that helps with any task, protects your company data, and is customized for your organization. Businesses interested in ChatGPT Enterprise should get in contact with us. While we aren’t disclosing pricing, it’ll be dependent on each company’s usage and use cases.” – OpenAI

Streamlining business operations: The administrative console

ChatGPT Enterprise introduces a new administrative console equipped with tools for managing how employees in an organization utilize ChatGPT. This includes integrations for single sign-on, domain verification, and a dashboard offering usage statistics. Shareable conversation templates enable employees to create internal workflows utilizing ChatGPT, while OpenAI’s API platform provides credits for creating fully customized solutions powered by ChatGPT.

Notably, ChatGPT Enterprise grants unlimited access to Advanced Data Analysis, a feature previously known as Code Interpreter in ChatGPT. This feature empowers ChatGPT to analyze data, create charts, solve mathematical problems, and more, even with uploaded files. For instance, when given a prompt like “Tell me what’s interesting about this data,” ChatGPT’s Advanced Data Analysis feature can delve into data, such as financial, health, or location data, to generate insightful information.



Priority access to GPT-4: Enhancing performance

Advanced Data Analysis was previously exclusive to ChatGPT Plus subscribers, the premium $20-per-month tier for the consumer ChatGPT web and mobile applications. OpenAI intends for ChatGPT Plus to coexist with ChatGPT Enterprise, emphasizing their complementary nature.

ChatGPT Enterprise operates on GPT-4, OpenAI’s flagship AI model, just like ChatGPT Plus. However, ChatGPT Enterprise customers receive priority access to GPT-4, resulting in performance that is twice as fast as the standard GPT-4 and offering an extended context window of approximately 32,000 tokens (around 25,000 words).


Data security: A paramount concern addressed

The context window denotes the text the model considers before generating additional text, while tokens represent individual units of text (e.g., the word “fantastic” might be split into the tokens “fan,” “tas,” and “tic”). Larger context windows in models reduce the likelihood of “forgetting” recent conversation content.
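The effect of a fixed context window can be illustrated with a minimal sketch. It assumes, purely for illustration, that a token is a whitespace-separated word (real tokenizers split text differently, as the “fantastic” example shows) and that the oldest messages are the ones dropped. The function name `fit_to_context` is hypothetical.

```python
def fit_to_context(messages, max_tokens):
    """Keep the most recent messages whose combined token count fits the window."""
    kept, used = [], 0
    for message in reversed(messages):      # walk from newest to oldest
        cost = len(message.split())         # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break                           # older messages fall out of the window
        kept.append(message)
        used += cost
    return list(reversed(kept))             # restore chronological order


history = ["my name is Ada", "I work on compilers", "what is my name"]
print(fit_to_context(history, max_tokens=8))
# ['I work on compilers', 'what is my name']
```

With a window of 8 "tokens", the first message no longer fits, so the model would have "forgotten" the name; a window of 12 would keep all three messages.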

OpenAI is actively addressing business concerns by affirming that it will not use business data sent to ChatGPT Enterprise or any usage data for model training. Additionally, all interactions with ChatGPT Enterprise are encrypted during transmission and while stored.


OpenAI’s announcement on LinkedIn of ChatGPT enterprise

ChatGPT’s impact on businesses

OpenAI asserts strong interest from businesses in a business-focused ChatGPT, noting that ChatGPT, one of the fastest-growing consumer applications in history, has been embraced by teams in over 80% of Fortune 500 companies.

Monetizing the innovation: Financial considerations

However, the sustainability of ChatGPT remains uncertain. According to Similarweb, global ChatGPT traffic decreased by 9.7% from May to June, with an 8.5% reduction in average time spent on the web application. Possible explanations include the launch of OpenAI’s ChatGPT app for iOS and Android and the summer vacation period, during which fewer students use ChatGPT for academic assistance. Increased competition may also be contributing to this decline.

OpenAI faces pressure to monetize the tool, considering the company’s reported expenditure of over $540 million in the previous year on ChatGPT development and talent acquisition from companies like Google, as mentioned in The Information. Some estimates suggest that ChatGPT costs OpenAI $700,000 daily to operate.

Nonetheless, in fiscal year 2022, OpenAI generated only $30 million in revenue. CEO Sam Altman has reportedly set ambitious goals, aiming to increase this figure to $200 million this year and $1 billion in the next, with ChatGPT Enterprise likely playing a crucial role in these plans.


Read more –> Boost your business with ChatGPT: 10 innovative ways to monetize using AI

ChatGPT enterprise pricing details

Positioned as the highest tier within OpenAI’s range of services, ChatGPT Enterprise serves as an extension to the existing free basic service and the $20-per-month Plus plan. Notably, OpenAI has chosen a flexible pricing strategy for this enterprise-level service. Rather than adhering to a fixed price, the company’s intention is to personalize the pricing structure according to the distinct needs and scope of each business.

According to COO Brad Lightcap’s statement to Bloomberg, OpenAI aims to collaborate with each client to determine the most suitable pricing arrangement.


ChatGPT Pricing


OpenAI’s official statement reads, “We hold the belief that AI has the potential to enhance and uplift all facets of our professional lives, fostering increased creativity and productivity within teams. Today signifies another stride towards an AI assistant designed for the workplace, capable of aiding with diverse tasks, tailored to an organization’s specific requirements, and dedicated to upholding the security of company data.”

This individualized approach aims to make ChatGPT Enterprise adaptable to a range of corporate requirements, delivering a more personalized experience than the standardized tiers below it.


Is ChatGPT enterprise pricing justified?

ChatGPT Enterprise operates on the GPT-4 model, OpenAI’s most advanced AI model to date, a feature shared with the more affordable ChatGPT Plus. However, there are notable advantages for Enterprise subscribers. These include privileged access to an enhanced GPT-4 version that functions at double the speed and provides a more extensive context window, encompassing approximately 32,000 tokens, equivalent to around 25,000 words.

Understanding the significance of the context window is essential. Put simply, it represents the amount of text the model can consider before generating new content. Tokens are the discrete text components the model processes; envision breaking down the word “fantastic” into segments like “fan,” “tas,” and “tic.” A model with an extensive context window is less prone to losing track of the conversation, leading to a smoother and more coherent user experience.

Regarding concerns about data privacy, a significant issue for businesses that have previously restricted employee access to consumer-oriented ChatGPT versions, OpenAI assures that ChatGPT Enterprise models will not be trained using any business-specific or user-specific data. Furthermore, the company has implemented encryption for all conversations, ensuring data security during transmission and storage.

Taken together, these enhancements suggest that ChatGPT Enterprise could offer substantial value, particularly for organizations seeking high-speed, secure, and sophisticated language model applications.



Data Science Dojo
Rebecca Merrett
| November 22

There’s more to data security and access control than granting teams within a company different access levels and issuing user passwords.

As data scientists, our job is not to run the whole security operation in our organizations to avoid a security breach. However, as we work very closely with data, we must understand the importance of having good, robust mechanisms in place to prevent sensitive and personally identifiable information from getting into the wrong hands, or from any cyber attack. Hence, the need for data security.

Strong passwords? Not enough

Setting ourselves up with a strong password might not cut it in today’s world. Some of the world’s biggest banks, which have an army of highly skilled security professionals, have suffered ever-smarter cyber attacks. Today, users are logging into work systems and databases through biometrics such as fingerprint scanning technology on smartphones, laptops, and other devices or computers.

Two-factor authentication is also a popular data security mechanism, which goes beyond identifying and authenticating a user through their password alone. Users are now logging into systems using a one-time password, which is sent to their work email and so requires another login, in combination with their fingerprint. Generating a random number or token string each time a user logs into a system reduces the risk posed by a single password being cracked or obtained some other way.
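As a rough illustration of the one-time code idea, here is a minimal sketch using Python’s standard `secrets` and `hmac` modules. The function names and the six-digit length are assumptions made for the example; a production system would also need expiry, rate limiting, and a delivery channel, none of which are shown.

```python
import hmac
import secrets


def issue_one_time_code(digits=6):
    """Return a fresh random numeric code, zero-padded to a fixed width."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def verify_code(submitted, issued):
    """Compare in constant time so timing does not leak partial matches."""
    return hmac.compare_digest(submitted, issued)


code = issue_one_time_code()          # a new code for every login attempt
print(verify_code(code, code))        # the code the user received matches
print(verify_code("000000x", code))   # a wrong guess does not
```

Because a new code is drawn from a cryptographically secure source on each login, a code that leaks is of no use for the next login.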

Finishing the equation

User identity and authentication are only half of the equation, however. The other half is using anomaly detection algorithms or machine learning to pick up on unusual user activity and behavior once a user has logged on. This is something we as data scientists can bring to the table in helping our organizations better secure our customer or business data. Some of the key features of anomaly detection models include the time of access, location of access, type of activity or use of the data, device type, and how frequently a user accesses the database.

The model collects these data points every time a user logs into the database. It continuously calculates a risk score based on how much they deviate from the user’s past logins. If the score gets high enough, an automated mobile alert can be sent to the security team to investigate further or take action.
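The scoring idea can be sketched in a few lines. This is a deliberately simple, hypothetical scheme rather than a production anomaly detector: the hour of access is scored by how many standard deviations it sits from the user’s usual hours, and a never-before-seen city or device adds a fixed penalty. The field names, the penalty of 3.0, and `ALERT_THRESHOLD` are all assumptions for the example.

```python
from statistics import mean, pstdev

ALERT_THRESHOLD = 5.0  # hypothetical cut-off for alerting the security team


def risk_score(past_logins, current):
    """Score how far a login deviates from a user's history (higher = riskier)."""
    score = 0.0
    # Hour of access: distance from the usual hour, in standard deviations.
    # (This ignores that hours wrap around midnight, which a real model would handle.)
    hours = [login["hour"] for login in past_logins]
    spread = pstdev(hours) or 1.0            # avoid dividing by zero
    score += abs(current["hour"] - mean(hours)) / spread
    # Location and device: fixed penalty for a value never seen for this user.
    for field in ("city", "device"):
        if current[field] not in {login[field] for login in past_logins}:
            score += 3.0
    return score


history = [{"hour": h, "city": "Boston", "device": "laptop"} for h in (9, 10, 9, 10)]
usual = {"hour": 10, "city": "Boston", "device": "laptop"}
odd = {"hour": 3, "city": "Berlin", "device": "laptop"}

for login in (usual, odd):
    score = risk_score(history, login)
    print(login["city"], round(score, 1), "ALERT" if score > ALERT_THRESHOLD else "ok")
```

A real system would learn the weights from data and use more features (type of activity, access frequency), but the shape is the same: compare each login with the user’s own baseline and alert above a threshold.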

Data security examples

Some obvious data security examples include a user who lives in Boston who logged out of the database 10 minutes ago but is now accessing the database in Berlin. Or, a user who usually logs in to the database during work hours is now logging in at 3 am.

Other examples include an executive assistant who rarely logs into the database but is now logging in every 10 minutes, or a data scientist who usually aggregates thousands of rows of data but is now retrieving a single row.

A marketer, who usually searches the database for contact numbers, is now attempting to access credit card information, even though that marketer already knows she/he does not have access to this information.

Another way data scientists can safeguard their customer or business data is to keep the data inside the database rather than exporting a subset or local copy of the data onto their computer or device. Nowadays, there are many tools to connect different database providers to R or Python, such as the odbcConnect() function as part of the RODBC library in R, which reads and queries data from a database using an ID and password rather than importing data from a local computer.

The ID and password can be removed from the R or Python file once the user has finished working with the data, so an attacker cannot run the script to get the data without a login. Also, if an attacker were to crack open a user’s laptop, he or she would not find a local copy of the data on that device.
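One way to avoid having to remove the ID and password afterwards is never to write them into the script at all. The sketch below reads them from environment variables and falls back to an interactive prompt. The variable names `DB_USER` and `DB_PASSWORD` are hypothetical, and the resulting pair would be handed to whichever database driver you use.

```python
import os
from getpass import getpass


def database_credentials(env=os.environ):
    """Read the login from the environment, prompting only if it is missing."""
    user = env.get("DB_USER") or input("Database user: ")
    # getpass hides the password while it is typed.
    password = env.get("DB_PASSWORD") or getpass("Database password: ")
    return user, password


# Example with an explicit mapping instead of the real environment:
user, password = database_credentials({"DB_USER": "analyst", "DB_PASSWORD": "s3cret"})
print(user)
```

The script can then be shared or stolen without exposing a login, which is the property the paragraph above is after.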

Row and column access is another example of data security through fine-grained access controls. This mechanism masks certain columns or rows for different users. These masked columns or rows in tabular data usually contain sensitive or personally identifiable information. For example, the columns which contain financial information might be masked for the data science team but not for the finance/payments processing team.
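Column masking by role can be sketched as follows. Databases that support fine-grained access control enforce this on the server; the snippet only illustrates the idea in plain Python, and the role names, column names, and `masked_view` helper are hypothetical.

```python
# Columns hidden from each role; an empty set means the role sees everything.
MASKED_COLUMNS = {
    "data_science": {"card_number", "account_balance"},
    "finance": set(),
}


def masked_view(rows, role):
    """Return copies of the rows with the role's hidden columns replaced by '***'."""
    hidden = MASKED_COLUMNS.get(role)   # None for an unknown role: mask everything
    return [
        {col: "***" if hidden is None or col in hidden else value
         for col, value in row.items()}
        for row in rows
    ]


rows = [{"customer_id": 1, "region": "EU", "card_number": "4111-0000", "account_balance": 250}]
print(masked_view(rows, "data_science"))
print(masked_view(rows, "finance"))
```

Defaulting to "mask everything" for unknown roles is a design choice in this sketch: access has to be granted explicitly rather than assumed.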

Conclusion & other tips

Other ways to safely deal with sensitive and personally identifiable information include differential privacy and k-anonymity. To learn about these techniques, please read Dealing with data privacy – anonymization techniques.

Data Science Dojo
Raja Iqbal
| November 6

What do you think about your privacy? Do you wonder why data privacy and data anonymization are important? Read along to find all the answers. 

Internet companies of the world have the resources and power to collect a microscopic level of detail on each and every one of their users and build user profiles from it. In this day and age, it’s almost delusional to think that we still operate in a world that sticks by the good, old ideals of data privacy.

You have experienced, at some point in your life, a well-targeted email, phone call, letter, or advertisement.

Why should we care?

“If someone had nothing to hide, why should she/he care?” You have heard this argument before. Let’s use an analogy that explains why some people do care about privacy, despite having “nothing to hide”:

You just came home from a date. You are excited and can’t believe how awesome the person you are dating is. In fact, it feels too good to be true how this person just “gets you,” and it feels like he/she has known you for an exceptionally long time. However, as time goes by, the person you are dating starts to change and the romance wears off.

You notice from unintentionally glimpsing at your date’s work desk that there is a folder stuffed with your personal information. From your place of birth to your book membership status and somehow even your parents’ contact information! You realize this data was used to relate to you on a personal level.

The folder doesn’t contain anything that shows you are of bad character, but you still feel betrayed and hurt that the person you are dating disingenuously tried to create feelings of romance. As data scientists, we don’t want to be the date who lost another person’s trust, but we also don’t want to have zero understanding of the other person. How can we work around this challenge?


Simple techniques to anonymize Data

A simple approach to maintaining personal data privacy when using data for predictive modeling or to glean insightful information is to scrub the data.

Scrubbing is simply removing personally identifiable information such as name, address, and date of birth. However, cross-referencing this with public data or other databases you may have access to could be used to fill in the “missing gaps” in the scrubbed dataset.

The classic example of this was when then-MIT student Latanya Sweeney was able to identify an individual by cross-referencing scrubbed health records with voter-registration records.
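Both halves of this story, scrubbing and re-identification by cross-referencing, fit in a short sketch. The records are made up for illustration, and the field names and helpers (`scrub`, `link`) are hypothetical.

```python
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def scrub(record):
    """Drop the directly identifying fields from a record."""
    return {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}


def link(scrubbed, public_rows, keys):
    """Return the public rows that match a scrubbed record on the shared keys."""
    return [row for row in public_rows if all(row[k] == scrubbed[k] for k in keys)]


# Made-up data: one health record and a small public register.
patient = {"name": "A. Example", "address": "1 Main St", "date_of_birth": "1970-01-01",
           "zip_code": "02139", "sex": "F", "diagnosis": "asthma"}
public_register = [
    {"name": "A. Example", "zip_code": "02139", "sex": "F"},
    {"name": "B. Sample", "zip_code": "02139", "sex": "M"},
]

anonymous = scrub(patient)
matches = link(anonymous, public_register, keys=("zip_code", "sex"))
print(anonymous)
print([row["name"] for row in matches])
```

The scrubbed record no longer contains a name, yet the fields left behind match exactly one person in the public register, which is enough to put the name back.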

Tokenization is another commonly used technique to anonymize sensitive data by replacing personally identifiable information such as a name with a token such as a numerical representation of that name. However, the token could be used as a reference to the original data.
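A minimal sketch of tokenization, assuming a simple in-memory lookup table (the `TokenVault` class is hypothetical; real systems keep this mapping in a separately secured store):

```python
import secrets


class TokenVault:
    """Replace sensitive values with random tokens and remember the mapping."""

    def __init__(self):
        self._token_for = {}   # original value -> token
        self._value_for = {}   # token -> original value

    def tokenize(self, value):
        # Reuse the existing token so the same name always maps to the same token.
        if value not in self._token_for:
            token = secrets.token_hex(8)
            self._token_for[value] = token
            self._value_for[token] = value
        return self._token_for[value]

    def detokenize(self, token):
        # Anyone with access to the vault can reverse the token.
        return self._value_for[token]


vault = TokenVault()
token = vault.tokenize("Jane Doe")
print(token != "Jane Doe", vault.detokenize(token))
```

The `detokenize` method is the caveat from the paragraph above in code form: the token is only as anonymous as the vault is secure.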

Sophisticated techniques to anonymize data

More sophisticated workarounds that help overcome the de-anonymization of data are differential privacy and k-anonymity.


Differential privacy

Differential privacy uses mathematical mechanisms to add random noise to the original dataset to mask personally identifiable information, while making it possible to probabilistically return similar search results if you were to run the same query over the original dataset. An analogy is trying to disguise a toy panda with a horse head, creating just enough of a disguise to not recognize it’s a panda.

When queried, the dataset returns the count of toys, which includes the disguised panda, without identifying any individual panda toy.
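One common way to realize this is the Laplace mechanism applied to a count query, sketched below. Note the assumptions: noise is added to the query result rather than to each record, the privacy parameter `epsilon` and the toy data are made up, and Python’s `random` module stands in for the carefully engineered noise source a real deployment would need.

```python
import random


def noisy_count(records, predicate, epsilon, rng=random):
    """Answer a count query with Laplace noise of scale 1/epsilon.

    Adding or removing one record changes a count by at most 1, so a scale of
    1/epsilon is the standard calibration for this query.
    """
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two exponential draws with rate epsilon is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


toys = ["panda", "horse", "panda", "bear", "panda"]
print(noisy_count(toys, lambda toy: toy == "panda", epsilon=1.0))
```

Each answer is off by a small random amount, so no single toy’s presence can be inferred with confidence, yet the answers average out close to the true count of 3. A smaller `epsilon` means more noise and stronger privacy.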

Apple, for example, has started using differential privacy with its iOS 10 devices to uncover patterns in user behavior and activity without having to identify individual users. This allows Apple to analyze purchases, web browsing history, and health data while maintaining your privacy.


K-anonymity also aggregates data. It looks for a specified number, k, of people who share the same identifiable combination of attributes, so that an individual is hidden within that group. Identifiable information such as age can be generalized so that age is replaced with an approximation such as less than 25 years of age or greater than 50 years of age.

However, because it does not use randomization to mask sensitive data, k-anonymity can be vulnerable to re-identification, for example when everyone in a group shares the same sensitive value.
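The generalization and grouping steps described above can be sketched as follows. The age bands follow the example in the text, with a middle band added as an assumption; the helper names and the sample rows are made up.

```python
from collections import Counter


def age_band(age):
    """Generalize an exact age into a coarse band."""
    if age < 25:
        return "<25"
    if age > 50:
        return ">50"
    return "25-50"


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


raw = [{"age": age, "zip_code": "02139"} for age in (22, 24, 31, 47, 52, 67)]
generalized = [{**row, "age": age_band(row["age"])} for row in raw]

print(is_k_anonymous(raw, ("age", "zip_code"), k=2))          # exact ages are unique
print(is_k_anonymous(generalized, ("age", "zip_code"), k=2))  # each band holds two people
```

The check says nothing about the sensitive values inside each group, which is why a group whose members all share the same diagnosis, for instance, still leaks that diagnosis.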

Remember: It’s your data privacy, too

As data scientists, it can be easy to disassociate ourselves from data, which is not personally our own, but other people’s. It can be easy to forget that the data we hold in our hands are not just endless records but are the lives of the people who kindly gave up some of their data privacy so that we could go about understanding the world better.

Besides the serious legal consequences of breaching data privacy, remember that it could be your personal life records in a stranger’s hands.

Data Science Dojo
Mohd Sohel Ather
| June 24

Learn how to configure the security of your Windows 10 account and be the true owner of your personal data and your privacy.

Technology has been improving people’s lives for centuries, making everyday tasks easier and letting us do things that were once impossible. The internet has undeniably changed our lives and connected us with people from all over the globe, but it also has its drawbacks.

The latest advances in technology have reduced our privacy to a level never seen before. You no longer need to be a celebrity for your private life to be worth a lot of money. Many of the services and tools we use every day collect our personal data without our being aware of it, in exchange for making our lives easier.

Whether to protect you from hackers who want to steal your banking information or because you do not feel comfortable sharing your location with all the applications you use, you must be aware of how to protect yourself from these dangers.

Here I propose a few essential steps in Windows 10 to take ownership of your data, find out who wants to know where you are or what pages you visit, and even avoid being spied on through your computer’s camera.

1. Say no to the fast installation of Windows 10

When installing Windows 10 or an application, we often look for the fastest option, the one that requires as little effort as possible, and companies increasingly offer ways for users to skip these tedious processes.

The downside is that the more we disengage from the configuration and installation process, the more power we give companies to do what they want with our privacy. You know the saying: “If you want something done well, you have to do it yourself.”

When installing Windows 10, make sure you choose the custom configuration so you can review each of the permissions you give the system. Then go to the privacy settings. To get there, press the Windows key and the ‘I’ key at the same time to open Settings, then choose Privacy. There you can configure the Windows 10 privacy to your liking, although you will not be able to access the configuration of the other applications. You will have to go through them one by one. Even if you are configuring your Windows Office, you can click here to read about how to do it properly.

2. Clip Cortana’s wings

Virtual assistants are very useful for making our lives easier, but to know us well enough to be really useful, they need access to a whole range of personal data. If you use your computer for work, you may not want Cortana to have access to your company’s data or bank details. Even if in the end you do not mind and let it work with this information, we recommend that you at least take a look at everything Cortana wants to know about you.

In the configuration section of Cortana, you can delete all the data that this assistant has about you or select only the items that seem too private.

3. Turn off your location

Like mobile phones, Windows 10 tracks your location automatically, saves this information for about 24 hours, and can share it with any third-party application that you have downloaded.

Once again in the privacy settings section, you can deactivate location tracking and reactivate it only when you need it. In addition, it is recommended that you review the configuration of each of the applications that you install on your computer and be aware of which ones are trying to know where you are.

4. Block ad tracking

Whether you are browsing the different online stores looking for a new smartphone or you are simply checking hotels to find out how much it would cost you to go on a weekend trip, you will see hundreds of ads about things you have been looking for.

That’s ad tracking and it’s a nuisance, but the good thing is that it can be blocked. Obviously, by default Windows 10 has it activated because it is great for companies to know what you are thinking of spending your money on.

In the privacy settings, you will find a section where you can deactivate the option “Allow applications to use my advertising ID”.

5. Don’t allow access to your camera

How many times has Hollywood warned us with their films and TV series about hackers that are dedicated to controlling the cameras of other people’s laptops to spy on a girl and how many times we would have thought it was an invention of the screenwriters?

Well, it is a very real possibility, and experts warn of how easy it is to do. It is therefore strongly recommended to block the camera and only activate it when you need it for a conference or a family video call. Moreover, not only can you disable the camera in the computer’s settings, you can also cover it with a small protector that gives a lot of peace of mind and does not cost much.

6. Disable access to the microphone

If we block the camera, how can we not block the microphone? It is true that if you want to give instructions to Cortana you need it, but if you have decided to silence her completely, you should also do the same with the microphone.

7. Who can see your account information?

Another piece of data that we give to Windows, and that it shares with others, is our account data, such as name and email address. If we go to the “Account information” section, we will see that many of the applications we use have access to this data.

It never hurts to customize this section of the settings so that you share as little data as possible with applications.

8. Eliminate tracking of the timeline

This point is related to point number four. Surfing the internet should not mean that others know what you are looking for, even if the pages are completely harmless. The timeline tracks all the websites you have visited; clearing it increases your privacy and your control over your data. The downside is that you have to do it regularly, just like your browser history, which experts also recommend deleting regularly.

9. Check the privacy settings after each update

Unfortunately, after a system update many of the steps you have taken today may be lost, and you will have to redo them. Microsoft usually resets the privacy settings to their defaults when the system restarts. Many complaints have been made about this, but they have fallen on deaf ears. No harm is done, but it means we need to stay attentive. To stay alert, it is best to turn off automatic updates and be aware of when an important one arrives.

This does not mean it’s better not to update. Updates are important for getting new features and improvements in system security, and we should not give them up unless they keep causing problems.

10. Do not leave your privacy in the hands of others

This advice serves both Windows 10 users and any other system and device users, even Apple. User data is becoming the great business of the 21st century. Our privacy, tastes, interests, and other personal data are worth a lot to companies but they should be worth more to ourselves and we should protect them as if they were gold.

It requires a little effort to be aware of tricks like these and steps or tools that help us keep our privacy as intact as possible.
