Objective: Ensure all data used with AI technologies is handled according to legal, institutional, and ethical standards, with a strong emphasis on respecting intellectual property.
Practical Application: Users should adhere to the following guidelines for data handling and intellectual property:
- Familiarize yourself with the data classification guidelines outlined at https://cybersecurity.illinoinns.edu/data-classification/ and obtain the necessary permissions before using external data with AI models. This includes conducting due diligence on the source of the data and the rights associated with it.
- Ensure that all data used in conjunction with generative AI complies with relevant privacy laws and regulations (e.g., HIPAA, FERPA).
- Where possible, anonymize data to protect individual identities. Remove or obfuscate personally identifiable information (PII) before processing with AI systems.
- Store all data securely using encryption and access control measures. Regularly audit and update security protocols to safeguard against unauthorized access and data breaches.
- Collect only the data that is necessary for the intended purpose. Avoid over-collecting data, which can increase privacy risks and storage costs.
- Ensure the accuracy and quality of the data used. Use clean, relevant, and up-to-date data to train and deploy AI models to avoid biases and inaccuracies.
- When using content for purposes like education, research, or commentary, ensure it falls under fair use provisions. Understand the legal boundaries of fair use to avoid infringement.
- Always attribute the original creators of any data or content used. Adhere to licensing agreements and understand the terms of use for any third-party data or content.
- Secure intellectual property rights for AI-generated content where applicable. This includes understanding the legal status of AI-generated works and pursuing appropriate copyright or patent protections.
- Address and mitigate any biases in the data and models used. Ensure that AI systems do not perpetuate or exacerbate existing biases and inequalities.
- Engage with stakeholders, including data subjects, legal experts, and ethical advisors, to ensure responsible and effective use of generative AI.
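As an illustration of the guideline on removing or obfuscating PII before processing, the following is a minimal Python sketch that masks a few common identifier formats in free text. The pattern set and the function name `mask_pii` are assumptions made for this example and are not part of any institutional tooling. Regular expressions catch only well-structured identifiers (they will miss names and street addresses), so a sketch like this would supplement, not replace, a dedicated de-identification tool and human review.

```python
import re

# Illustrative patterns only (assumed for this example). Real PII detection
# needs broader coverage, such as names, addresses, and record numbers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.edu or 217-555-0100; SSN 123-45-6789."
    print(mask_pii(sample))
```

Masking is applied before the text leaves the local environment, so the placeholder labels, not the original values, are what an external AI service would receive.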
Examples:
- When using Azure OpenAI, OpenAI’s GPT-4, Google’s Bard or other Gen AI to analyze data, ensure that any personal information is either anonymized or covered by the customer’s explicit consent for that use.
- Before uploading a dataset to Azure OpenAI, IBM Watson, Amazon Comprehend or other Gen AI for natural language processing (NLP), use tools to anonymize data by replacing names, addresses, and other PII with pseudonyms or masked values.
- Utilize Azure Key Vault, AWS Key Management Service (KMS), or Google Cloud Key Management to securely manage and access sensitive information, ensuring that only authorized applications and users can access the data used by generative AI services.
- When collecting data for training a language model on Azure OpenAI, Google’s PaLM, Meta’s LLaMA or other AI models, focus on acquiring relevant and specific datasets instead of large, indiscriminate collections of text.
- When using Azure OpenAI, OpenAI’s DALL-E, Google’s Imagen or other Gen AI to generate marketing content, ensure that any images, text, or media incorporated from external sources are properly attributed and licensed.
- When integrating third-party datasets into an Azure OpenAI model, Google’s BERT, LLaMA or other models, review and comply with the licensing terms of those datasets, and provide proper attribution where required.
- If using Azure OpenAI, OpenAI’s Codex, GitHub Copilot or other Gen AI to create proprietary software code or unique designs, consult with legal experts to secure the necessary intellectual property protections.
- Regularly evaluate the outputs of Azure OpenAI, OpenAI’s GPT-4, Meta’s LLaMA or other models for bias and implement corrective measures, such as re-training models with more balanced datasets or using fairness-enhancing algorithms.
- Implement content filters and review processes to ensure that content generated by Azure OpenAI, OpenAI’s ChatGPT, Google’s Bard or other Gen AI does not include offensive, misleading, or inappropriate material.
- When deploying Azure OpenAI, IBM Watson, Google’s Med-PaLM or other Gen AI services in a healthcare setting, collaborate with medical professionals, ethicists, and patients to ensure the AI’s use aligns with ethical standards and patient care priorities.
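The examples on pseudonymizing records and collecting only relevant data can be sketched together in Python. This is a hedged illustration, not a prescribed implementation: the function names `pseudonymize` and `prepare_record`, the field names, and the demo key are all hypothetical. It uses a keyed hash (HMAC-SHA256 from the standard library) so that the same person always maps to the same pseudonym. Keyed hashing is pseudonymization rather than full anonymization, because anyone holding the key can recompute the mapping. In practice the key would be retrieved from a secrets manager such as those named above rather than written in code.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Map a value to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}"


def prepare_record(record: dict, allowed_fields: set, id_field: str,
                   secret_key: bytes) -> dict:
    """Keep only the allowed fields and pseudonymize the identifier field."""
    # Data minimization: drop every field that is not explicitly allowed.
    cleaned = {k: v for k, v in record.items() if k in allowed_fields}
    # Pseudonymization: replace the identifier with a keyed-hash token.
    if id_field in cleaned:
        cleaned[id_field] = pseudonymize(str(cleaned[id_field]), secret_key)
    return cleaned


if __name__ == "__main__":
    # Hypothetical record and demo key, for illustration only.
    record = {
        "name": "Jane Doe",
        "email": "jane.doe@example.edu",
        "ssn": "123-45-6789",
        "feedback": "Great course",
    }
    print(prepare_record(record, {"name", "feedback"}, "name", b"demo-key"))
```

Using an allow-list of fields, rather than a block-list, means that any new column added to the source data is excluded by default until someone decides it is needed.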
Outcome: The aim is to maintain integrity and compliance in all activities involving data, mitigating legal risks and upholding the organization’s reputation.