This section provides a basic architecture for AI data governance, based on existing laws, regulations, and engineering practices.
Data governance can be described along several dimensions, such as data provenance, intellectual property rights, content legitimacy, and personal privacy protection.
- Provenance
- Intellectual property
- Content
- Privacy
- Quality
- Security
- Ethics
- Lifecycle
Dimension | Risk | Solution |
---|---|---|
Provenance | Data introduced from illegitimate sources; inability to provide data source information to customers or regulators | Data versioning; data provenance |
Intellectual property | Unauthorized use or non-compliance with the usage agreement, resulting in legal action; illegal use of copyrighted data by third parties | Data compliance |
Privacy | Sensitive personal information used for AI training; failure to fulfill personal information protection obligations | Protection of private information |
Content | Sexually explicit, violent, and other harmful content used for AI training | Content risk control |
Ethics | Use of data resulting in gender, national, or ethnic discrimination | Synthetic data |
Quality | Pre-training data and labels that fail to meet the requirements of authenticity, accuracy, objectivity, and diversity | Data quality evaluation |
Security | Data stolen by third parties during transmission over unsecured channels; data stolen by third parties during processing in an insecure production environment | Data lineage tracing; data encryption; data system security |
Lifecycle | Mismatch between the data lifecycle and the model and product lifecycles; lack of full management of data from a product perspective | Data lifecycle management |
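
The provenance row pairs the risk of untraceable data with data versioning and data provenance. As a rough illustration only, the sketch below derives a version identifier from a content hash and stores it alongside source and license information. The `ProvenanceRecord` class, its field names, and the helper functions are hypothetical and not taken from any standard or from the architecture described here.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry; field names are illustrative."""
    source: str       # where the data was obtained
    license: str      # usage agreement the data falls under
    sha256: str       # content hash, used here as the version identifier
    recorded_at: str  # ISO 8601 timestamp in UTC


def content_version(data: bytes) -> str:
    """Return a SHA-256 hex digest that changes whenever the bytes change."""
    return hashlib.sha256(data).hexdigest()


def record_provenance(data: bytes, source: str, license: str) -> ProvenanceRecord:
    """Capture source, license, and content version at ingestion time."""
    return ProvenanceRecord(
        source=source,
        license=license,
        sha256=content_version(data),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


def matches_record(data: bytes, record: ProvenanceRecord) -> bool:
    """Check that the data still matches the version that was recorded."""
    return content_version(data) == record.sha256
```

Under these assumptions, a record created at ingestion can later be shown to a customer or regulator, and `matches_record` indicates whether the data on hand is still the version that was registered. A production system would also need durable storage and access control for the records, which this sketch leaves out.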