Coding with AI in an Open-Source World

Understanding Generative AI

The following is an edited version of the content presented in the video.

Many generative AI systems are very good at writing code, and developers increasingly use them to build high-quality software faster than they otherwise could. But companies that use AI to write code must take precautions to avoid infringing copyrights. We addressed general concerns related to IP risk and AI coding in a previous article. Here we focus on risks related to the use of open-source software (OSS).

The OSS Challenge

Generative AI systems are trained on massive troves of data, including large volumes of code. Some of the largest available sources of code are OSS repositories, such as the thousands of publicly available OSS projects on GitHub. In response to user prompts, AI systems may generate code based on code from these repositories, sometimes reproducing it with little or no change. Moreover, it is difficult for users to know when open-source code has been incorporated into the output or where that code came from in the first place.

Nearly all popular OSS licenses contain conditions. “Permissive licenses” often require users to provide notice or attribution when they distribute the OSS code. “Copyleft licenses” (e.g., the “GNU General Public License” family) can include more onerous obligations, including the requirement to make works derived from the OSS available under an open-source license.
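The distinction between the two license categories can be sketched as a small lookup table. This is an illustrative sketch only: the identifiers are SPDX short names for a handful of well-known licenses, while the `LicenseFamily` enum, the `classify` and `obligation_for` helpers, and the one-line obligation summaries are hypothetical names and simplifications introduced here, not a complete or authoritative statement of what any license requires.

```python
from enum import Enum


class LicenseFamily(Enum):
    PERMISSIVE = "permissive"
    COPYLEFT = "copyleft"
    UNKNOWN = "unknown"


# A few well-known licenses keyed by SPDX short identifier.
# The list is deliberately short and is an assumption for illustration.
LICENSE_FAMILIES = {
    "MIT": LicenseFamily.PERMISSIVE,
    "BSD-3-Clause": LicenseFamily.PERMISSIVE,
    "Apache-2.0": LicenseFamily.PERMISSIVE,
    "GPL-2.0-only": LicenseFamily.COPYLEFT,
    "GPL-3.0-only": LicenseFamily.COPYLEFT,
    "AGPL-3.0-only": LicenseFamily.COPYLEFT,
}

# Simplified summaries of the typical condition for each category.
TYPICAL_OBLIGATION = {
    LicenseFamily.PERMISSIVE: "Provide notice or attribution when distributing the code.",
    LicenseFamily.COPYLEFT: "May require making derived works available under an open-source license.",
    LicenseFamily.UNKNOWN: "Not classified here; escalate for review.",
}


def classify(spdx_id: str) -> LicenseFamily:
    """Return the category for an SPDX identifier, or UNKNOWN if it is not listed."""
    return LICENSE_FAMILIES.get(spdx_id.strip(), LicenseFamily.UNKNOWN)


def obligation_for(spdx_id: str) -> str:
    """Return the simplified obligation summary for an SPDX identifier."""
    return TYPICAL_OBLIGATION[classify(spdx_id)]


if __name__ == "__main__":
    for license_id in ("MIT", "GPL-3.0-only", "Some-Custom-License"):
        print(license_id, "->", classify(license_id).value, "|", obligation_for(license_id))
```

Defaulting unlisted identifiers to `UNKNOWN` rather than guessing a category keeps unfamiliar licenses visible for review.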

Companies that use AI to generate code must take care not to unknowingly incorporate snippets of third-party code made available under an OSS license. If they do, and they fail to satisfy the conditions of that license, they could be liable for copyright infringement. This is particularly important for companies that develop code for use in proprietary products.

Addressing Risks Before, During, and After Coding

Companies can take three main steps to reduce OSS-related risks when using AI to generate code.

1. Before coding, set some boundaries. When possible, configure the AI system to reduce the likelihood that it will generate problematic code. GitHub Copilot, for example, allows users to enable a filter that prevents the system from suggesting code that closely or exactly matches public code on GitHub. Companies should also consider limiting the length of AI-generated code snippets that developers are allowed to use. Shorter snippets may be less likely to include third-party code that is entitled to copyright protection.
2. While coding, tag code that was generated by AI. Companies should consider requiring developers to include tags in source code files to identify AI-generated code wherever it is included. This simple intervention can facilitate additional scrutiny in code reviews.
3. Prior to production, use scanning tools to vet code generated by AI. Companies can deploy scanning tools to detect open-source and other third-party code that may be included in a code base. Many software composition analysis (SCA) tools can identify third-party files by their hashes and detect open-source licenses by matching keywords, but options for identifying unlabeled snippets of third-party code are limited. SCA tools that do perform snippet matching, such as Black Duck SCA and Revenera Code Insight, can accommodate generative AI use cases.
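
The tagging, length-limit, and keyword-matching ideas above can be sketched as a small check run before code review. This is a minimal sketch under stated assumptions: the `AI-GENERATED-BEGIN` and `AI-GENERATED-END` comment markers, the 20-line limit, the keyword list, and all function names are hypothetical conventions chosen for illustration. They are not a standard and not the format of any particular tool, and a simple keyword scan like this one is no substitute for an SCA tool that performs snippet matching.

```python
from dataclasses import dataclass

# Hypothetical tag convention that developers would place in comments.
BEGIN_TAG = "AI-GENERATED-BEGIN"
END_TAG = "AI-GENERATED-END"

# Assumed policy limit on the length of a single AI-generated snippet.
MAX_AI_SNIPPET_LINES = 20

# A few phrases that commonly appear in license headers (illustrative only).
LICENSE_KEYWORDS = (
    "SPDX-License-Identifier",
    "GNU General Public License",
    "Apache License",
    "MIT License",
)


@dataclass
class TaggedBlock:
    start_line: int  # 1-based number of the first line between the tags
    end_line: int    # 1-based number of the last line between the tags

    @property
    def length(self) -> int:
        return self.end_line - self.start_line + 1


def find_ai_tagged_blocks(source: str) -> list[TaggedBlock]:
    """Return the line ranges enclosed by BEGIN/END tags.

    A BEGIN tag with no matching END tag is ignored in this sketch.
    """
    blocks = []
    open_line = None
    for number, line in enumerate(source.splitlines(), start=1):
        if BEGIN_TAG in line:
            open_line = number
        elif END_TAG in line and open_line is not None:
            blocks.append(TaggedBlock(open_line + 1, number - 1))
            open_line = None
    return blocks


def blocks_over_limit(blocks: list[TaggedBlock],
                      limit: int = MAX_AI_SNIPPET_LINES) -> list[TaggedBlock]:
    """Return the tagged blocks that are longer than the policy limit."""
    return [block for block in blocks if block.length > limit]


def find_license_keywords(source: str) -> list[tuple[int, str]]:
    """Return (line number, keyword) pairs for case-insensitive keyword hits."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        lowered = line.lower()
        for keyword in LICENSE_KEYWORDS:
            if keyword.lower() in lowered:
                hits.append((number, keyword))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "# SPDX-License-Identifier: MIT",
        "def helper():",
        "    # AI-GENERATED-BEGIN",
        "    total = 0",
        "    for i in range(3):",
        "        total += i",
        "    # AI-GENERATED-END",
        "    return total",
    ])
    tagged = find_ai_tagged_blocks(sample)
    print("Tagged blocks:", tagged)
    print("Over limit:", blocks_over_limit(tagged))
    print("License keywords:", find_license_keywords(sample))
```

A check like this could run in continuous integration so that reviewers see which regions were AI-generated and which exceed the agreed length before the code reaches a full scan.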

* * *

AI is making it easier for companies to generate code and develop software, creating new opportunities for business innovation but also introducing risks that can have significant implications.

This informational piece, which may be considered advertising under the ethical rules of certain jurisdictions, is provided on the understanding that it does not constitute the rendering of legal advice or other professional advice by Goodwin or its lawyers. Prior results do not guarantee a similar outcome.

Contacts

Steven R. Argentieri
Partner
sargentieri@goodwinlaw.com
Boston +1 617 570 1063
Silicon Valley