SafetyDetectives spoke with the co-founder of Private AI, Patricia Thaine, about the risks of using a service like ChatGPT without a security layer, cybersecurity challenges created by AI, common misconceptions about online privacy, and more.
Hi Patricia, thank you for taking the time to talk with us today. Can you tell me about your background and what motivated you to start Private AI?
My background is in research in privacy, applied cryptography, and natural and spoken language processing. Before co-founding Private AI, I was doing a PhD in Computer Science at the University of Toronto, which is currently on hold.
The main motivation behind Private AI was the growing need for developers to integrate privacy into their software. Beyond data protection regulations, that integration is the difference between getting access to the data they need to improve their systems (or even make them work) and not getting access at all.
When we started in 2019, no one was tackling the very core of privacy well, which is identifying personal information in the first place (names, addresses, SSNs, credit card numbers, origins, etc.) with high enough accuracy across the 80-90% of collected data that is unstructured (text, documents, audio, images, etc.). Our goal therefore became to build the best system in the world for tackling this problem internationally, to make tools that developers can easily integrate into their software, and to ensure that everything we build generalizes to other problems in the privacy space, in service of our mission to create the Privacy Layer for Software.
What are the main services that Private AI offers?
Since day one, our focus has been helping companies unlock the value of their data without compromising privacy. We use state-of-the-art AI to detect, redact, and replace over 50 types of Personally Identifiable Information (PII) in 49 languages. What’s more, our solution is deployed within the customers’ own environment, meaning their data never has to be transferred to a third-party data processor. The product supports compliance with data protection regulations such as GDPR, CPRA, and HIPAA.
We recently launched PrivateGPT, the Privacy Layer for ChatGPT, which redacts PII from user prompts before sending them through to ChatGPT and then re-populates the PII within the answer for a seamless and secure user experience. Entities can be toggled on or off to give ChatGPT the context it needs to successfully answer the query, or privacy mode can be turned off entirely if no sensitive information needs to be filtered out.
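Conceptually, the flow can be sketched in a few lines of Python. The function names and placeholder scheme below are illustrative assumptions, not Private AI's actual implementation:

```python
# A minimal sketch of the redact -> query -> re-populate pattern behind a
# privacy layer for ChatGPT. The placeholder scheme and function names are
# illustrative assumptions, not Private AI's actual implementation.

def redact_pii(prompt: str, entities: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace each detected PII value with a typed placeholder like
    [NAME_1], keeping the mapping so the answer can be restored later."""
    mapping = {}
    for i, (value, entity_type) in enumerate(entities.items(), start=1):
        placeholder = f"[{entity_type}_{i}]"
        prompt = prompt.replace(value, placeholder)
        mapping[placeholder] = value
    return prompt, mapping

def repopulate(answer: str, mapping: dict[str, str]) -> str:
    """Swap the placeholders in the model's answer back to the original PII."""
    for placeholder, value in mapping.items():
        answer = answer.replace(placeholder, value)
    return answer

# In production the entities would come from a PII-detection model, and the
# redacted prompt would be sent to the ChatGPT API; both are stubbed here.
prompt = "Write a follow-up email to John Smith at john@acme.com."
redacted, mapping = redact_pii(prompt, {"John Smith": "NAME", "john@acme.com": "EMAIL"})
# redacted == "Write a follow-up email to [NAME_1] at [EMAIL_2]."
answer = "Hi [NAME_1], following up as promised at [EMAIL_2]."  # simulated reply
print(repopulate(answer, mapping))  # placeholders restored to the real PII
```

The key property is that the raw PII never leaves the user's environment; only the placeholder-bearing prompt does.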
What are the risks of using ChatGPT without a security layer?
Large Language Models are not excluded from data protection laws and standards such as the GDPR, HIPAA, PCI DSS, or the CPRA. The GDPR, for example, requires companies to get consent for all uses of EU citizens’ personal data (regardless of where those citizens are in the world) and to comply with requests to be forgotten. When companies share personal information with third-party organizations like OpenAI, they lose control over how that data is stored and used, putting themselves at serious risk of compliance violations.
With PrivateGPT, only the necessary information gets shared with OpenAI, and companies can rest assured that, besides remaining compliant with data protection regulations, no personal information will be exposed through bugs or data breaches.
Compliance with data privacy regulations, such as GDPR or CCPA, is crucial in the handling of personal data. How does Private AI address the challenges and complexities of compliance while effectively utilizing AI and machine learning algorithms to scrub PII from various document types?
We use the latest advancements in machine learning to achieve high accuracy in identifying PII. We aim for 99.5%+ accuracy on the types of data our clients handle, process up to 70,000 words per second, and support 49 languages. Private AI’s solution goes beyond simple regex-based methods and can handle real-time redaction, ensuring that sensitive information is effectively scrubbed from the data.
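As a toy illustration (mine, not Private AI's) of where pattern matching stops and context-aware models have to take over:

```python
import re

# A toy example (mine, not Private AI's) of why regex-only redaction falls short.
text = "April Moore paid her bill on April 15. SSN on file: 123-45-6789."

# Rigidly formatted identifiers like SSNs are easy regex targets...
print(re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text))
# -> April Moore paid her bill on April 15. SSN on file: [SSN]

# ...but no pattern can tell that the first "April" is a person while the
# second is a month; that distinction requires a model that reads the
# surrounding context, which is what ML-based entity recognition supplies.
```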
What’s more, the solution is deployed as a container so organizations can maintain full control over their data – it will never be shared with external parties, including Private AI.
Our solution removes unnecessary personal data, allowing security-conscious businesses to filter out all PII, Payment Card Industry (PCI) data, and Protected Health Information (PHI) that doesn’t serve a clear purpose before it even reaches their servers. This approach minimizes the amount of personal data stored, reducing the risk of data breaches and potential liabilities.
We also help companies of all sizes comply with a range of privacy regulations, including GDPR, CPRA, HIPAA, LGPD, and PCI DSS. By integrating Private AI into their cloud infrastructure or database, organizations can identify and redact PII from text, documents, audio recordings, and images to better understand risk and, in turn, reduce it.
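In practice, that integration amounts to a call against the container running inside the organization's own network. The URL, payload shape, and response field below are assumptions for illustration, not Private AI's documented API:

```python
import requests

# Hypothetical call to a PII-redaction container running inside the
# organization's own network. The URL, payload shape, and response field
# are assumptions for illustration, not Private AI's documented API.
CONTAINER_URL = "http://pii-redactor.internal:8080/redact"

def redact(text: str) -> str:
    resp = requests.post(CONTAINER_URL, json={"text": text}, timeout=10)
    resp.raise_for_status()
    return resp.json()["redacted_text"]  # assumed response field

if __name__ == "__main__":
    print(redact("Contact Jane Doe at jane.doe@example.com about claim 55231."))
```

Because the container sits inside the organization's perimeter, the text being scrubbed never crosses a network boundary to a third party.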
We also practice what we preach and only train our models on data that does not contain personally identifiable information. Our model training process is therefore exempt from data protection regulations, since we don’t store or process any personal data within our systems.
What are some common misconceptions about online privacy that you can clear up?
Unfortunately, data privacy is still a foreign concept for many. In fact, a 2019 survey showed that 63% of Americans say they understand very little or nothing at all about the laws and regulations that are currently in place to protect their privacy.
One of the most common misconceptions is that “only illegal activities are monitored” and that you have nothing to worry about if you are not doing anything wrong. That’s not true – corporations can collect a significant amount of data from users for legitimate purposes like improving services or targeted advertising. Protecting your online privacy is essential regardless of whether your activities are legal, since your data can provide the means to manipulate you, or can even lead to identity theft if information like your SSN is stolen. Privacy is about control over your data, and the key to maintaining that control is having organizations maintain strong data protection practices.
On the same note, some believe that they are safe simply because they are “nobody.” Young or old, rich or poor, celebrity, CEO, or stay-at-home parent – it does not matter: your personal information is still at risk of being used for identity theft, fraud, and other online abuses. No matter who you are or what you do, you still want to safeguard your privacy and ensure your information is properly protected and not misused.
While it may be overwhelming to ensure that every online service you use respects your privacy, there are a couple of things you can do that already make a difference: 1) ensure privacy is a core component of the main technologies you use (browser, search engine, email), and 2) encourage your state or federal representatives to take privacy and data protection seriously in legislation.
What are some of the cybersecurity challenges that AI is creating as it becomes more mainstream?
Model inversion attacks are a concern: attackers can attempt to extract sensitive information from AI models, gaining insights into the underlying training data, proprietary algorithms, or other confidential information the model was trained on.
That said, personal and sensitive data can show up in outputs at inference time even without a user explicitly attempting to extract it. In order to learn and produce accurate outputs, AI systems rely on large amounts of data. Sometimes that data includes personal and sensitive information, which is then vulnerable to being exposed in future outputs. A good example of things going wrong is the ScatterLab Lee-Luda chatbot scandal, where a chatbot trained on intimate conversations started spewing out names, addresses, usernames, and other personal information in production. It’s crucial to protect data throughout its lifecycle – from collection and storage to processing and sharing – to prevent unauthorized access to sensitive and personal data.
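The lesson from that incident reduces to a simple guardrail: scrub PII out of conversational data before any model sees it. Below is a hedged sketch with a toy detector (my own example, not ScatterLab's or Private AI's pipeline; a production system would use an ML-based detector covering names, addresses, usernames, and more):

```python
import re

# A toy guardrail: scrub PII from conversational data *before* it enters a
# training set. The detector below only catches emails and phone numbers;
# this is my own illustrative example, not a real production pipeline.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3,4}[-.\s]\d{4}\b"),
}

def scrub(utterance: str) -> str:
    for entity_type, pattern in PATTERNS.items():
        utterance = pattern.sub(f"[{entity_type}]", utterance)
    return utterance

corpus = ["Text me at 555-123-4567 or mail kim@example.com, ok?"]
training_ready = [scrub(u) for u in corpus]
print(training_ready)  # ['Text me at [PHONE] or mail [EMAIL], ok?']
```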
In addition, there’s the risk of cyberthreat generation. Since its launch, ChatGPT has helped millions of users create text, music, and even software code. However, it has also been reported that cybercriminals are using the tool to generate malware. While OpenAI can continue to tighten ChatGPT’s restrictions to address this issue, hackers are sure to find ways to keep abusing this technology, as well as other large language models.