Global Principles on Artificial Intelligence (AI)

Published: Wed, 06 Sep 2023
https://www.newsmediaalliance.org/global-principles-on-artificial-intelligence-ai/




Introduction

AI developers and regulators have a unique opportunity to establish an ethical AI framework to boost innovation and create new business opportunities, while ensuring that AI develops in a way that is responsible and sustainable. To achieve this, it is essential that AI systems are trained on content and data which is accessed lawfully, including by appropriate prior authorisations obtained for the use of copyright protected works and other subject matter, and that the content and sources used to train the systems are clearly identified. This document sets out principles that the undersigned publisher organisations believe should govern the development, deployment, and regulation of Artificial Intelligence systems and applications. These principles cover issues related to intellectual property, transparency, accountability, quality and integrity, fairness, safety, design, and sustainable development.

The proliferation of AI systems, especially Generative Artificial Intelligence (GAI), presents a sea change in how we interact with and deploy technology and creative content. While AI technologies will provide substantial benefits to the public, content creators, businesses, and society at large, they also pose risks for the sustainability of the creative industries, the public’s trust in knowledge, journalism, and science, and the health of our democracies.

We, the undersigned organisations, fully embrace the opportunities AI will bring to our sector and call for the responsible development and deployment of AI systems and applications. We strongly believe that these new tools will facilitate innovative breakthroughs when developed in accordance with established principles and laws that protect publishers’ intellectual property (IP), valuable brands, trusted consumer relationships, and investments. The indiscriminate appropriation of our intellectual property by AI systems is unethical, harmful, and an infringement of our protected rights.

Our organisations represent thousands of creative professionals around the world, including news, magazine, and book publishers and the academic publishing industry such as learned societies and university presses. Our members invest considerable time and resources creating high-quality content that keeps our communities informed, entertained, and engaged. These principles – applying to the use of our content to train and deploy AI systems, as they are understood and used today – are aimed at ensuring our continued ability to innovate, create and disseminate such content, while facilitating the responsible development of trustworthy AI systems.

Intellectual Property

1) Developers, operators, and deployers of AI systems must respect intellectual property rights, which protect the rights holders’ investments in original content. These rights include all applicable copyright, ancillary rights, and other legal protections, as well as contractual restrictions or limitations imposed by rightsholders on the access to and use of their content. Therefore, developers, operators, and deployers of AI systems—as well as legislators, regulators, and other parties involved in drafting laws and policies regulating AI—must respect the value of creators’ and owners’ proprietary content in order to protect the livelihoods of creators and rightsholders.

2) Publishers are entitled to negotiate for and receive adequate remuneration for use of their IP. AI system developers, operators, and deployers should not be crawling, ingesting, or using our proprietary creative content without express authorisation. Use of intellectual property by AI systems for training, surfacing, or synthesising is usually expressly prohibited in online terms and conditions of the rightsholders, and not covered by pre-existing licensing agreements. Where developers have been permitted to crawl content for one purpose (for example, indexing for search), they must seek express authorisation for use of the IP for other purposes, such as inclusion within LLMs. These agreements should also account for harms that AI systems may cause, or have already caused, to creators, owners, and the public.

3) Copyright and ancillary rights protect content creators and owners from the unlicensed use of their content. Like all other uses of protected works, use of protected works in AI systems is subject to compliance with the relevant laws concerning copyrights, ancillary rights, and permissions within protocols. To ensure that access to content for use in AI systems is lawful, including through appropriate licenses and permissions obtained from relevant rightsholders, it is essential that rightsholders are able effectively to enforce their rights, and where applicable, require attribution and remuneration.

4) Existing markets for licensing creators’ and rightsholders’ content should be recognised. Valuing publishers’ legitimate IP interests need not impede AI innovation because frameworks already exist to permit use in return for payment, including through licensing. We encourage efficient licensing models that can facilitate training of trustworthy and high-quality AI systems.

Transparency

5) AI systems should provide granular transparency to creators, rightsholders, and users. It is essential that strong regulations are put in place to require developers of AI systems to keep detailed records of publisher works and associated metadata, alongside the legal basis on which they were accessed, and to make this information available to the extent necessary for publishers to enforce their rights where their content is included in training datasets. The obligation to keep accurate records should go back to the start of the AI development to provide a full chain of use regardless of the jurisdiction in which the training or testing may have taken place. Failure to keep detailed records should give rise to a presumption of use of the data in question. When datasets or applications developed by non-profit, research, or educational third parties are used to power commercial AI systems, this must be clearly disclosed so that publishers can enforce their rights. Where developers use AI tools as a component in the process of generating knowledge from knowledge, there should be transparency on the application of these tools, including appropriate and clear accountability and provenance mechanisms, as well as clear attribution where appropriate in accordance with the terms and conditions of the publishers of the original content. Without limiting and subject to paragraphs 6 and 9, AI developers should work with publishers to develop mutually acceptable attribution and navigation standards and formats. Users should also be provided with comprehensible information about how such systems operate to make judgments about system and output quality and trustworthiness.
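As a purely illustrative sketch of the record-keeping described in paragraph 5, a developer's log might pair each publisher work with its metadata and the legal basis for access, and be queryable per publisher. The `TrainingRecord` type, its field names, and the `records_for_publisher` helper are hypothetical and do not correspond to any existing standard or regulation; the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRecord:
    """Hypothetical log entry for one publisher work in a training dataset."""
    work_id: str       # identifier of the work (format is an assumption)
    publisher: str     # rightsholder of the work
    accessed_on: str   # ISO 8601 date on which the work was accessed
    legal_basis: str   # e.g. a licence reference, recorded as free text


def records_for_publisher(log, publisher):
    """Return every record in the log that concerns the given publisher."""
    return [record for record in log if record.publisher == publisher]


# Illustrative data only; none of these works or licences are real.
log = [
    TrainingRecord("work-001", "Publisher A", "2023-01-15", "licence-123"),
    TrainingRecord("work-002", "Publisher B", "2023-02-01", "licence-456"),
    TrainingRecord("work-003", "Publisher A", "2023-03-10", "licence-123"),
]
```

A publisher seeking to enforce its rights could then be given the output of `records_for_publisher(log, "Publisher A")`, which in this sketch lists only that publisher's works.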

Accountability

6) Providers and deployers of AI systems should cooperate to ensure accountability for system outputs. AI systems pose risks for competition and public trust in the quality and accuracy of informational and scientific content. This can be compounded by AI systems generating content that improperly attributes false information to publishers. Deployers of AI systems providing informational or scientific content should provide all essential and relevant information to ensure accountability and should not be shielded from liability for their outputs, including through limited liability regimes and safe harbours.

Quality and Integrity

7) Ensuring quality and integrity is fundamental to establishing trust in the application of AI tools and services. These values should be at the heart of the AI lifecycle, from the design and building of algorithms, to inputs used to train AI tools and services, to those used in the practical application of AI. A fundamental principle of computing is that a process can only be as good or unbiased as the input used to teach the system (rubbish-in-rubbish-out). AI developers and deployers should recognise that publishers are an invaluable part of their supply chain, generating high-quality content for training, and also for surfacing and synthesising. Use of high-quality content upstream will contribute to high-quality outputs for downstream users.

Fairness

8) AI systems should not create, or risk creating, unfair market or competition outcomes. AI systems should be designed, trained, deployed, and used in a way that is compliant with the law, including competition laws and principles. Developers and deployers should also be required to ensure that AI models are not used for anti-competitive purposes. The deployment of AI systems by very large online platforms must not be used to entrench their market power, facilitate abuses of dominance, or exclude rivals from the marketplace. Platforms must adhere to the concept of non-discrimination when it comes to publishers exercising their right to choose how their content is used.

Safety

9) AI systems should be trustworthy. AI systems and models should be designed to promote trusted and reliable sources of information produced according to the same professional standards that apply to publishers and media companies. AI developers and deployers must use best efforts to ensure that AI generated content is accurate, correct and complete. Importantly, AI systems must ensure that original works are not misrepresented. This is necessary to preserve the value and integrity of original works, and to maintain public trust.

10) AI systems should be safe and address privacy risks. AI systems and models in particular should be designed to respect the privacy of users who interact with them. Collection and use of personal data in AI system design, training, and use should be lawful with full disclosure to users in an easily understandable manner. Systems should not reinforce biases or facilitate discrimination.

By Design

11) These principles should be incorporated by design into all AI systems, including general purpose AI systems, foundation models, and GAI systems. They should be significant elements of the design, and not considered as an afterthought or a minor concern to be addressed when convenient or when a third party brings a claim.

Sustainable Development

12) The multi-disciplinary nature of AI systems ideally positions them to address areas of global concern. AI systems bear the promise to benefit all humans, including future generations, but only to the extent they are aligned to human values and operate in accordance with global laws. Long-term funding and other incentives for suppliers of high-quality input data can help to align systems with societal aims and extract the most important, up-to-date, and actionable knowledge.

Endorsing Organizations*


*Additional organizations to endorse the Principles following publication include: AMI – Asociación de Medios de Información (Spanish News Media Association); APImprensa, the Portuguese Press Editors and Publishers Association; Association of Online Publishers (AOP) (UK); ARI, Asociación de Revistas (Spanish Magazine Media Association); TU – Swedish Media Publishers Association

Full list of organizations signing onto the Global AI Principles:

  • AMI – Colombian News Media Association
  • AMI – Asociación de Medios de Información (Spanish News Media Association)
  • APImprensa, the Portuguese Press Editors and Publishers Association
  • Asociación de Entidades Periodísticas Argentinas (Adepa)
  • Association of Learned & Professional Society Publishers
  • Association of Online Publishers (AOP) (UK)
  • Associação Nacional de Jornais (Brazilian Newspaper Association) (ANJ)
  • Czech Publishers’ Association
  • Danish Media Association
  • Digital Content Next
  • European Magazine Media Association
  • European Newspaper Publishers’ Association
  • European Publishers Council
  • FIPP
  • Grupo de Diarios América
  • Inter American Press Association
  • Korean Association of Newspapers
  • Magyar Lapkiadók Egyesülete (Hungarian Publishers’ Association)
  • NDP Nieuwsmedia
  • News/Media Alliance
  • News Media Association
  • News Media Canada
  • News Media Europe
  • News Media Finland
  • News Publishers’ Association
  • Nihon Shinbun Kyokai (The Japan Newspaper Publishers & Editors Association)
  • Professional Publishers Association
  • ARI, Asociación de Revistas (Spanish Magazine Media Association)
  • STM
  • TU – Swedish Media Publishers Association
  • World Association of News Publishers (WAN-IFRA)

Related resources:

Joint G7 letter on development of global AI principles (News/Media Alliance, European Publishers Council, and Digital Content Next)

News/Media Alliance AI Principles

Published: Thu, 20 Apr 2023
https://www.newsmediaalliance.org/ai-principles/




The News/Media Alliance (NMA) represents the most trusted publishers in print and digital media based in the United States, from small, local outlets to national and international publications read around the world. Every day, these publishers invest in producing high-quality creative content that is engaging, informative, trustworthy, accurate and reliable. In doing so, they not only make significant economic contributions, but they also play a crucial role in educating, upskilling and informing our communities, building our democracy and economy, and furthering America’s economic, security and political interests abroad.

Introduction

As generative artificial intelligence (GAI) technologies become more prevalent, our membership believes these new tools must only be developed respecting journalistic and creative content, in accordance with principles that protect publishers’ intellectual property (IP), brands, reader relationships, and investments. The unlicensed use of content created by our companies and journalists by GAI systems is an intellectual property infringement: GAI systems are using proprietary content without permission. It’s also critical to acknowledge the societal risks associated with the proliferation of mis- and dis-information through GAI, which high-quality, original content, produced by skilled humans and trusted brands, can help to combat.

GAI developers and deployers must negotiate with publishers for the right to use their content in any of the following manners:

  • Training: Including publishers’ content in datasets and using it for GAI system training and testing.
  • Surfacing: The serving of publishers’ content in response to user inputs, possibly including a cover note generated by the GAI system of what is contained in the surfaced content.
  • Synthesizing: Summaries, explanations, analyses etc. of source content in response to a query.

This document highlights the overarching principles that must guide the development and use of GAI systems as well as the policies and regulations governing them. These principles are founded on our understanding of these systems and technologies as they are currently used – and may therefore be amended as these technologies and uses develop – and apply equally to all publisher content, whether in text, image, audiovisual or any other format.

AI Principles

Intellectual Property

Developers and deployers of GAI must respect creators’ rights to their content. These rights include copyright and all other legal protections afforded to content creators and owners, as well as contractual restrictions or limitations imposed by publishers for the access and use of their content (including through their on-line terms of service). Developers and deployers of GAI systems—as well as legislators, regulators and other parties involved in drafting laws and policies regarding GAI—must maintain an unwavering respect for these rights and recognize the value of creators’ proprietary content. GAI developers and deployers should not use publisher IP without permission, and publishers should have the right to negotiate for fair compensation for use of their IP by these developers. Professional journalism is particularly valuable due to its reliability, accuracy, coherency and timeliness, enhancing GAI system outputs and improving perceptions of system quality. Absent permission and specific licenses, GAI systems are not simply using publishers’ content, they are stealing it.

Use of publishers’ IP requires explicit permission. Use of publisher content by GAI systems for training, surfacing and synthesizing is not authorized by most publishers’ terms and conditions, and authorization for search should not be construed as an authorization for uses such as training GAI systems or displaying more content than contemplated for or as used in traditional search. GAI system developers and deployers should not be crawling, ingesting or using publishers’ proprietary content without express authorization; requiring publishers to opt out is not acceptable. Negotiating written, formal agreements is therefore necessary. Industry standards should be developed to allow for automatic detection of permissions that distinguish among potential uses of crawled or scraped content. These standards and usage agreements can also address other issues such as attribution, monetization, responsibility, and derivative uses.
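The idea of machine-readable permissions that distinguish among uses could be sketched as follows. This is a minimal illustration, not an existing standard: the `UsePermissions` record, its field names, and the `is_permitted` helper are all hypothetical. The use categories come from the text above, and the default-deny behavior reflects the position that express authorization is required and that opt-out is not acceptable.

```python
from dataclasses import dataclass
from enum import Enum


class Use(Enum):
    """Uses of publisher content named in this document."""
    SEARCH_INDEXING = "search_indexing"
    TRAINING = "training"
    SURFACING = "surfacing"
    SYNTHESIZING = "synthesizing"


@dataclass(frozen=True)
class UsePermissions:
    """Hypothetical per-publisher permission record (not a real standard)."""
    publisher: str
    # Only uses listed here are authorized; everything else is denied.
    granted: frozenset = frozenset()


def is_permitted(record, use):
    """Opt-in check: a missing record or an unlisted use means 'no'."""
    if record is None:
        return False
    return use in record.granted


# Illustrative example: a publisher that has authorized search indexing only.
example = UsePermissions(
    publisher="example-publisher",
    granted=frozenset({Use.SEARCH_INDEXING}),
)
```

In this sketch, `is_permitted(example, Use.TRAINING)` is false even though search indexing is granted, mirroring the point that authorization for search is not authorization for training.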

Compensation agreements must account for harms GAI systems may cause publishers and the public. GAI system surfacing and synthesizing are providing much more proprietary content and information from the original sources than traditional search and often provide little or no attribution, and will exacerbate the growing trend toward zero-click, reducing or even eliminating value for publishers. GAI systems use publishers’ proprietary content to generate outputs that may replace their role in the consumer/information provider relationship. In addition to reducing traffic, this harms publisher brands that have taken years, decades, or even centuries to build.

Copyright laws must protect, not harm, content creators. The fair use doctrine does not justify the unauthorized use of publisher content, archives and databases for and by GAI systems. Any previous or existing use of such content without express permission is a violation of copyright law. The Section 1201 triennial rulemaking process should not be used to allow for the bypassing of content protections for GAI development purposes. Exceptions to copyright protections for text and data mining (TDM) should be narrowly tailored to limited nonprofit and research purposes that do not damage publishers or become pathways for unauthorized uses that would otherwise require permission. The U.S. also has made international law commitments in this area that protect its IP-based businesses across multiple sectors and these must be upheld in its approach to AI.

There is an existing market for licensing publishers’ news content. Valuing publishers’ legitimate IP interests need not impede GAI innovation because compensation frameworks (for example, licensing) already exist to permit use in return for payment. GAI innovation should not come at the expense of publishers, but rather at the expense of developers and deployers. Publishers encourage the use of efficient ways to license through standard-setting organizations that can facilitate efficient training of GAI systems.

Transparency

GAI systems should be transparent to publishers. Publishers have a right to know who copied our content and what they are using it for. We call for strong regulations and policies imposing transparency requirements to the extent necessary for publishers to enforce their rights. Publishers have a legitimate interest in determining what content of theirs has been and is used in GAI systems. Using datasets or applications developed by non-profit, research, or educational third parties to power commercial GAI systems must be clearly disclosed and not used to evade transparency obligations or copyright liability.

GAI systems should be transparent to users. Direct relationships between users and publishers are critical for the sustainability of the news media and informational content sector. Surfaced and synthesized outputs should connect, not disintermediate, users with publishers. Members of the public should know the source of information that may affect them. Generative outputs should include clear and prominent attributions in a way that identifies to users the original sources of the output and encourages users to easily and directly navigate to those products, as well as to let them know when content is generated by GAI. Transparency into GAI systems can also help prevent misuse and the spread of mis- and dis-information. Similarly, it enables the evaluation of GAI systems for unintended bias to avoid discriminatory outcomes.

Accountability

Deployers of GAI systems should be held accountable for system outputs. GAI systems pose risks for competition, the integrity of news and creative content, and for public trust in journalistic and creative content. This is aggravated by the ability of AI applications to devalue publisher brands by generating content that attributes false or inaccurate information to publishers who have not published the information and who have processes in place to prevent such publication in the first place. Accordingly, deployers of GAI systems should not be shielded from liability for their outputs—to do so would be to provide deployers of GAI systems with an unfair advantage against which traditional publishers cannot compete and increase the danger to the public and institutions from the unchecked power of this technology.

Fairness

GAI systems should not create, or risk creating, unfair market or competition outcomes. Regulators should be attuned to ensuring GAI systems are designed, deployed, and used in a way that is compliant with competition laws and principles. Developers and deployers should also use their best efforts to ensure that GAI models are not used for anti-competitive purposes. The use of publisher content for GAI purposes without express permission from content owners by firms that have market power in online content distribution should be considered evidence of a violation of competition laws. Regulators should be vigilant for other anti-competitive uses of GAI systems.

Safety

GAI systems should be safe and avoid privacy risks. GAI systems, including GAI models, should be designed to respect the privacy of users who interact with them. Early indications are that GAI tools will exacerbate trends towards digital platforms collecting large volumes of user data. The collection and use of personal data in GAI system design, training and use should be minimal and should be disclosed to users in an easily understandable manner so that users can make informed judgments about how their data is used in exchange for the GAI service. Users should be informed about, and should have the right to prevent, the use of their interactions with GAI systems for the purposes of training or collection of personal data. Systems should also be designed so that paywalled and otherwise protected content cannot be exposed (including, for example, by membership inference methods).

Design

All of the principles discussed above should be incorporated in the very design of GAI systems, as significant elements of the design, and not considered as an afterthought or a minor concern to be addressed when convenient or when a third party brings a claim.
