Giskard is at the forefront of tackling AI safety and quality, as emphasized by co-founder and Chief Product Officer Jean-Marie John-Mathews in a recent interview with SafetyDetectives. The company’s open-source ML model testing framework and Python library aim to identify a wide range of vulnerabilities in AI models, including performance bias, data leakage, and ethical concerns. The Giskard server acts as a collaborative platform where users can further refine their models, underscoring the need for AI safety in production environments. Given the rising complexity and ethical implications of Machine Learning (ML) and AI, Giskard’s focus on rigorous testing, continuous monitoring, and education stands out as an industry best practice for ensuring the security and reliability of AI systems.
Can you introduce yourself and talk about what motivated you to co-found Giskard?
I’m Jean-Marie John-Mathews, co-founder and Chief Product Officer at Giskard. During my previous experience in academia and industry, I found that many AI projects did not achieve their potential due to misunderstandings with business stakeholders. My co-founder, Alex, was facing the same issues, and we were often disappointed by the tools available to data scientists and alarmed by the lack of robustness in the AI workflow. We believed that for AI to truly make an impact, we needed to open up the so-called “black boxes” and make the technology more transparent and understandable to the public. Thus, Giskard was born. Our mission has always been to bridge the gap between complex AI technology and its application in real-world scenarios, ensuring robustness and transparency.
What are Giskard’s flagship services?
At Giskard, our flagship services are designed with AI safety and quality in mind. We’ve developed an open-source ML model testing framework: our Python library detects vulnerabilities in AI models, from tabular models to Large Language Models, including issues like performance bias, data leakage, overconfidence, and many more. Users can scan their models directly in their notebooks or scripts, identify vulnerabilities, and generate tailored test suites. Additionally, the Giskard server acts as a collaborative platform, enabling users to curate domain-specific tests, compare and debug models, and collect feedback on ML model performance, all with the aim of ensuring AI safety when going to production.
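To give a sense of what that workflow looks like in practice, here is a minimal sketch based on Giskard’s documented Python API (giskard.Model, giskard.Dataset, giskard.scan). The toy data and column names are invented for illustration, and the exact wrapper arguments may vary between library versions.

```python
# A minimal sketch of the scan-and-test workflow described above.
# Toy data and column names are illustrative only; real projects would
# wrap a production model and dataset.
import pandas as pd
import giskard
from sklearn.ensemble import RandomForestClassifier

# Toy tabular data (hypothetical columns, for illustration only)
df = pd.DataFrame({
    "age": [25, 40, 31, 58, 22, 46],
    "income": [32_000, 85_000, 54_000, 120_000, 28_000, 77_000],
    "approved": [0, 1, 1, 1, 0, 1],
})

clf = RandomForestClassifier(random_state=0).fit(df[["age", "income"]], df["approved"])

# Wrap the model and dataset so the scanner knows how to call them
model = giskard.Model(
    model=lambda d: clf.predict_proba(d[["age", "income"]]),
    model_type="classification",
    classification_labels=[0, 1],
    feature_names=["age", "income"],
)
dataset = giskard.Dataset(df, target="approved")

# Run the vulnerability scan and turn the findings into a reusable test suite
results = giskard.scan(model, dataset)
suite = results.generate_test_suite("Toy approval model checks")
suite.run()
```

On a real project, the generated suite can be re-run as the model and data evolve, which is what makes the scan results reusable rather than one-off.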
What are the primary vulnerabilities you’ve seen in ML models?
In our experience working with ML models, we’ve identified several major vulnerabilities:
- Performance Bias: This occurs when a model shows discrepancies in performance metrics like accuracy or recall across different data segments, even if overall dataset performance seems satisfactory.
- Unrobustness: Models that are overly sensitive to minor input perturbations, such as simple typos, demonstrate this issue (illustrated in the sketch at the end of this answer). These models may produce varying predictions for slightly altered data, leading to unreliability.
- Overconfidence & Underconfidence: Overconfidence in ML pertains to models that assign high confidence scores to incorrect predictions. Conversely, underconfidence happens when models predict with low certainty, even when they should be more assured. Both issues can mislead decision-making processes.
- Ethical Concerns: We’ve seen models that show biases based on gender, ethnicity, or religion, which is deeply concerning, especially when such biases lead to significant real-world implications.
- Data Leakage: This occurs when information that would not be available at prediction time, such as test data or the target itself, unintentionally creeps into training, often inflating performance metrics and giving a misleading picture of the model’s true generalization ability.
- Stochasticity: Some models yield different results upon each execution with the same input, due to inherent randomness in certain algorithms. Ensuring consistent and reproducible results is crucial to build trust in ML systems.
- Spurious Correlation: At times, a model might rely on features that are statistically correlated with the target, even though the relationship is coincidental rather than causal, which can lead to misleading results.
These vulnerabilities underscore the need for rigorous testing and validation of ML models. That is why we created our scan functionality: it helps identify these issues and supports building more reliable and ethically responsible ML models.
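To make the robustness point concrete, here is a small, hand-rolled illustration (not the Giskard scanner itself): it trains a toy sentiment classifier and checks whether a couple of typos are enough to change its prediction. The data and model are invented for the example.

```python
# Illustrative only: a hand-rolled robustness check showing how simple typos
# can flip a toy text classifier's prediction. Not Giskard's scan functionality.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set for a toy sentiment classifier
train_texts = [
    "great product, works well",
    "terrible, broke after a day",
    "excellent value, highly recommend",
    "awful quality, do not buy",
]
train_labels = [1, 0, 1, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train_texts, train_labels)

original = "great value, excellent quality"
perturbed = "graet value, excelent quality"  # same review with simple typos

# A robust model should keep the same prediction under such a tiny perturbation
print("original: ", clf.predict([original])[0])
print("perturbed:", clf.predict([perturbed])[0])
```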
With the increasing complexity of ML models, security becomes a concern. How can organizations ensure the security of their ML deployments?
As ML models grow in complexity, ensuring their security becomes essential. Organizations can increase the security of their ML deployments by:
- Thorough Testing: Regularly testing models against a wide range of inputs, especially adversarial examples, to uncover and rectify vulnerabilities.
- Continuous Monitoring: Setting up real-time monitoring to track the model’s behavior and to detect and address any anomalies (see the sketch after this list).
- Data Privacy: Protecting training data, ensuring that sensitive information is never leaked.
- Regular Audits: Conducting security audits by third-party experts to identify potential weak spots and ensure compliance with security standards.
- Tooling Audits: Ensuring that the open- or closed-source tools used come from reputable sources, have been vetted by a community, incorporate the latest security patches and improvements, and have been analyzed for security vulnerabilities.
- Training: Educating teams on the latest security threats and best practices in ML.
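As a concrete illustration of the continuous-monitoring point above, here is a minimal, generic sketch (not a specific Giskard feature) that compares live prediction scores against a reference window using a two-sample Kolmogorov-Smirnov test; the score distributions and alert threshold are illustrative assumptions.

```python
# Generic monitoring sketch: raise an alert when live prediction scores
# drift away from a reference distribution. Thresholds and data are invented.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference_scores = rng.beta(2, 5, size=1_000)  # scores captured at deployment time
live_scores = rng.beta(4, 3, size=1_000)       # scores from the current traffic window

# Compare the two distributions; a small p-value suggests drift
statistic, p_value = ks_2samp(reference_scores, live_scores)
if p_value < 0.01:  # hypothetical alerting threshold
    print(f"Drift alert: KS statistic={statistic:.3f}, p-value={p_value:.2e}")
```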
Could you explain the significance of integrating quality standards from mature engineering practices, such as manufacturing and software development, into the realm of AI?
While Quality Assurance (QA) has been a cornerstone in traditional software development, its application in AI is quite recent. This is primarily because the AI development lifecycle presents unique challenges, such as the non-deterministic nature of AI models, intricate feedback loops with data, and unpredictable edge cases.
At Giskard, we recognize these challenges and have fused expertise from three AI research domains: ML Testing, XAI (Explainable AI), and FairML.
We bring new methodologies and software technology for ensuring quality across the AI development lifecycle, which represents a completely new value chain.
Are there any best practices for staying up-to-date with the latest advancements and security measures in the field of ML and AI?
I’d recommend regularly following publications from top AI research institutions and actively engaging with the AI community through conferences like NeurIPS and online platforms such as arXiv. Establishing partnerships with academia can yield valuable insights, while continuously monitoring AI journals keeps knowledge of security best practices up to date. Finally, cultivating a culture of shared learning within teams ensures that new advancements do not slip under the radar.