Safety guidelines provide necessary first layer of data protection in AI gold rush


Safety frameworks will provide a necessary first layer of data protection, especially as conversations around artificial intelligence (AI) become increasingly complex. 

These frameworks and principles will help mitigate potential risks while tapping the opportunities offered by emerging technologies, including generative AI (Gen AI), said Denise Wong, deputy commissioner of the Personal Data Protection Commission (PDPC), which oversees Singapore's Personal Data Protection Act (PDPA). She is also assistant chief executive of the industry regulator, the Infocomm Media Development Authority (IMDA). 

Also: AI ethics toolkit updated to include more assessment components

Conversations around technology deployments have become more complex with generative AI, said Wong during a panel discussion at the Personal Data Protection Week 2024 conference, held in Singapore this week. Organizations need to figure out, among other issues, what the technology entails, what it means for their business, and what guardrails are needed. 

Providing basic frameworks can help minimize the impact, she said. Toolkits, including open-source ones freely available on GitHub, can provide a starting point from which businesses can experiment with and test generative AI applications. She added that the Singapore government will continue to work with industry partners to provide such tools.

These collaborations will also support experimentation with generative AI, so the country can figure out what AI safety entails, Wong said. Efforts here include testing and red-teaming large language models (LLMs) for local and regional context, such as language and culture. 

She said insights from these partnerships will be useful for organizations and regulators, such as PDPC and IMDA, to understand how the different LLMs work and the effectiveness of safety measures. 

Over the past year, Singapore has inked agreements with IBM and Google to test, assess, and fine-tune AI Singapore's Southeast Asian LLM, called SEA-LION. The initiatives aim to help developers build customized AI applications on SEA-LION and improve the cultural context awareness of LLMs created for the region. 

Also: As generative AI models evolve, customized test benchmarks and openness are crucial

With the number of LLMs worldwide growing, including major models from OpenAI and open-source alternatives, organizations can find it challenging to understand the different platforms. Each LLM comes with its own paradigms and ways of accessing the model, said Jason Tamara Widjaja, executive director of AI at pharmaceutical company MSD's Singapore Tech Center, who was speaking on the same panel. 

He said businesses must grasp how these pre-trained AI models operate in order to identify the potential data-related risks. Things get more complicated when organizations add their own data to the LLMs and fine-tune the models. Tapping techniques such as retrieval-augmented generation (RAG) further underscores the need for companies to ensure the right data is fed to the model and that role-based data access controls are maintained, he added.
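To make that point concrete, the sketch below shows, in simplified Python, how a RAG pipeline might enforce role-based access controls before any document reaches the model's prompt. It is an illustrative toy, not MSD's or any vendor's implementation: the `Document` class, the keyword-overlap scoring, and the role names are all hypothetical stand-ins for a real vector store and permissions system.

```python
# Minimal illustration (hypothetical, not a real product): a retrieval step that
# enforces role-based access control before any document reaches the prompt.
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    allowed_roles: set = field(default_factory=set)  # roles permitted to see this text

def retrieve(query: str, docs: list, user_role: str, top_k: int = 2) -> list:
    """Rank documents by naive keyword overlap, considering only those the user may access."""
    accessible = [d for d in docs if user_role in d.allowed_roles]
    terms = set(query.lower().split())
    scored = sorted(accessible,
                    key=lambda d: len(terms & set(d.text.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query: str, docs: list, user_role: str) -> str:
    """Assemble the augmented prompt; blocked documents never enter the context."""
    context = "\n".join(d.text for d in retrieve(query, docs, user_role))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    Document("Q3 trial results for compound X show improved efficacy.", {"researcher"}),
    Document("General onboarding guide for new staff.", {"researcher", "contractor"}),
]
print(build_prompt("What were the trial results for compound X?", corpus, user_role="contractor"))
# The contractor's prompt contains only the onboarding document; the trial data stays out.
```

In a production system, the same filtering step would sit in front of an embedding-based search, but the principle is identical: access control is applied at retrieval time, before augmentation, rather than trusting the model to withhold restricted content.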

At the same time, he said businesses also have to assess the content-filtering measures under which AI models may operate, as these can affect the results generated. For instance, data related to women's healthcare may be blocked, even though the information provides essential baseline knowledge for medical research.  

Widjaja said managing these issues involves a delicate balance and is challenging. A study from F5 revealed that 72% of organizations deploying AI cited data quality issues and an inability to expand data practices as key challenges to scaling their AI implementations. 

Also: 7 ways to make sure your data is ready for generative AI

Some 77% of organizations said they did not have a single source of truth for their datasets, according to the report, which analyzed data from more than 700 IT decision-makers globally. Just 24% said they had rolled out AI at scale, with a further 53% pointing to the lack of AI and data skillsets as a major barrier.

Singapore is looking to help ease some of these challenges with new initiatives for AI governance and data generation. 

"Businesses will continue to need data to deploy applications on top of existing LLMs," said Minister for Digital Development and Information Josephine Teo, during her opening address at the conference. "Models must be fine-tuned to perform better and produce higher quality results for specific applications. This requires quality datasets."

And while techniques such as RAG can be used, these approaches only work with additional data sources that were not used to train the base model, Teo said. Good datasets, too, are needed to evaluate and benchmark the performance of the models, she added.

Also: Train AI models with your own data to mitigate risks

"However, quality datasets may not be readily available or accessible for all AI development. Even if they were, there are risks involved [in which] datasets may not be representative, [where] models built on them may produce biased results," she said. In addition, Teo said datasets may contain personally identifiable information, potentially resulting in generative AI models regurgitating such information when prompted. 

Putting a safety label on AI

Teo said Singapore will release safety guidelines for generative AI models and application developers to address the issues. These guidelines will be parked under the country's AI Verify framework, which aims to offer baseline, common standards through transparency and testing.

"Our guidelines will recommend that developers and deployers be transparent with users by providing information on how the Gen AI models and apps work, such as the data used, the results of testing and evaluation, and the residual risks and limitations that the model or app may have," she explained 

The guidelines will further outline safety and trustworthiness attributes that should be tested before the deployment of AI models or applications, addressing issues such as hallucination, toxic statements, and biased content, she said. "This is like when we buy household appliances. There will be a label that says that it has been tested, but what is to be tested for the product developer to earn that label?"
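As a rough illustration of what such pre-deployment testing could look like, the sketch below runs a model under test against a small evaluation set and records simple pass/fail signals. This is a hypothetical toy harness, not the AI Verify toolkit or Singapore's forthcoming guidelines: the word-list toxicity check and the overlap-based groundedness proxy are stand-ins for the trained classifiers and curated benchmarks that real evaluations would use.

```python
# Hypothetical pre-deployment check (illustrative only): evaluate a model against
# a small test set for toxicity and a crude hallucination proxy.

TOXIC_MARKERS = {"idiot", "stupid"}  # placeholder word list; real toolkits use trained classifiers

def check_toxicity(output: str) -> bool:
    """Fail if the output contains any marker from the (illustrative) block list."""
    return not any(word in output.lower() for word in TOXIC_MARKERS)

def check_groundedness(output: str, reference: str) -> bool:
    """Crude hallucination proxy: require some word overlap with a reference answer."""
    return len(set(output.lower().split()) & set(reference.lower().split())) >= 3

def evaluate(model, cases):
    """`model` is any callable str -> str; returns per-case results for a report."""
    results = []
    for prompt, reference in cases:
        output = model(prompt)
        results.append({
            "prompt": prompt,
            "toxicity_ok": check_toxicity(output),
            "grounded_ok": check_groundedness(output, reference),
        })
    return results

if __name__ == "__main__":
    stub_model = lambda p: "Singapore's PDPA governs personal data protection."
    cases = [("What law governs personal data in Singapore?",
              "The Personal Data Protection Act (PDPA) governs personal data in Singapore.")]
    print(evaluate(stub_model, cases))
```

The point of the appliance-label analogy is that such a report, however it is produced, becomes the evidence behind the "tested" label a deployer can show to users.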

PDPC has also released a proposed guide on synthetic data generation, including support for privacy-enhancing technologies, or PETs, to address concerns about using sensitive and personal data in generative AI. 

Also: Transparency is sorely lacking amid growing AI interest

Noting that synthetic data generation is emerging as a PET, Teo said the proposed guide should help businesses "make sense of synthetic data", including how it can be used.

"By removing or protecting personally identifiable information, PETs can help businesses optimize the use of data without compromising personal data," she noted. 

"PETs address many of the limitations in working with sensitive, personal data and open new possibilities by making data access, sharing, and collective analysis more secure."
