The AI paradox: can AI and open source development co-exist?


Open source software thrives on transparency and collaboration, while today’s most advanced AI coding assistants are often built as closed, proprietary systems.

As generative AI becomes more widespread, developers and organizations are asking whether these two worlds can truly co-exist.

Opposing philosophies: Open vs. closed development

On the surface, the philosophies of open source development and current AI development appear completely opposed. Open source projects are transparent – anyone can inspect the code, reuse it under defined licenses, and contribute improvements.

In open source, attribution and licensing are core; developers choose licenses that specify how their code can be used and that often require preserving credits.

AI coding assistants, however, operate as opaque learned models. They ingest vast amounts of code (much of it open source) and produce suggestions without revealing the original sources.

The AI’s knowledge is a statistical amalgamation, often lacking clear provenance for the code it generates. Snyk’s researchers warn that black-box AI tools may blend code from multiple sources, risking inadvertent license violations.

Danny Allan, CTO, Snyk.

While open source is built on shared ownership, most AI tools are driven by corporate interests and remain closed. Once AI-generated code is written, there is generally no clear mechanism to track, update, or secure it if it turns out to be flawed.

In contrast, open source projects typically release regular updates and security patches, helping to safeguard code where projects remain actively maintained.

Opening up models and data

Companies are often reluctant to open up their models or training data, citing competitive advantage and security. This lack of transparency can clash with open source values. In fact, parts of the free and open source software (FOSS) community have reacted strongly against the incursion of black-box AI code into their domain.

There’s a very real fear that AI tools could siphon off open source code without proper credit or compliance, undermining the very premise of open collaboration.

Yet, despite these differences, AI and open source are deeply interconnected. Modern AI code assistants owe much of their prowess to open source code – in fact, they are typically trained on millions of public GitHub repositories and other open code archives. One study found that open source components make up about 70% of the average application.

This in itself can create vulnerabilities. In Snyk’s 2023 AI Code Security Report, over half of developers said they frequently encounter security issues in AI-generated code – often because the AI was trained on open source code that contained known bugs or vulnerabilities.

In other words, AI assistants are standing on the shoulders of open-source giants, but they also inherit open source’s ‘warts’ and licensing obligations. What is needed are strategies that marry the speed and power of AI with the transparency and legal clarity of open source.

Where the two worlds intersect

There are some natural alignments between AI-driven development and open source. Both aim to democratize software creation – open source by sharing code, and AI assistants by enabling coding through natural language.

Both can accelerate innovation and productivity. And importantly, both rely on a healthy developer community. AI tools don’t spontaneously generate quality code – they learn from code written by human developers, and they improve through feedback loops with users.

Developers aren’t about to give up helpful AI assistants – nor should they have to, given the benefits – but they need to remain wary of the risks.

Achieving harmony between AI tools and open source development will require effort on both sides: AI providers must build in safeguards and transparency, and developers and communities must adapt their policies and workflows.

Best practices for a peaceful co-existence

Emerging tools can compare AI-generated code against public repositories and surface licensing information for matched snippets. This helps developers assess reuse risks and avoid violations – especially if AI assistants can one day cite their sources in a manner similar to academic references.
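
To make the idea concrete, here is a minimal sketch in Python of how such a comparison might work: fingerprint overlapping token windows of a generated snippet and look them up in an index built from public code. The index, window size, and helper names are all illustrative assumptions, not any particular vendor’s implementation.

```python
import hashlib
import re

# Hypothetical index mapping code fingerprints to source and license
# metadata. A real tool would build this from millions of public repos.
FINGERPRINT_INDEX = {
    # "3f5a...": {"repo": "github.com/example/lib", "license": "GPL-3.0"},
}

def tokenize(code: str) -> list[str]:
    """Strip comments and whitespace so trivial edits don't hide a match."""
    code = re.sub(r"#.*", "", code)
    return re.findall(r"\w+|[^\w\s]", code)

def fingerprints(code: str, window: int = 8):
    """Hash overlapping token windows (a crude winnowing-style scheme)."""
    tokens = tokenize(code)
    for i in range(max(len(tokens) - window + 1, 1)):
        chunk = " ".join(tokens[i:i + window])
        yield hashlib.sha256(chunk.encode()).hexdigest()

def license_hits(generated: str) -> list[dict]:
    """Return indexed sources whose fingerprints appear in the snippet."""
    return [FINGERPRINT_INDEX[fp] for fp in fingerprints(generated)
            if fp in FINGERPRINT_INDEX]

# Usage: vet a suggestion from an AI assistant before accepting it.
for hit in license_hits("def quicksort(xs): ..."):
    print(f"Possible match: {hit['repo']} (license: {hit['license']})")
```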

The easiest way to avoid license violations is to prevent them at the root. If an AI model is only trained on code that is permissively licensed or in the public domain, then the risk of it regurgitating proprietary code without permission drops dramatically.
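
As a rough illustration of that filtering step, the sketch below keeps only repositories with a clearly permissive declared license. The helper names are assumptions, and the license check is deliberately crude; real pipelines use proper SPDX-aware license scanners.

```python
from pathlib import Path

# Licenses commonly considered permissive enough for training data.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

def declared_license(repo: Path) -> str | None:
    """Crude heuristic: look for a known SPDX identifier in a LICENSE file.
    A production pipeline would use a real license scanner instead."""
    lic = repo / "LICENSE"
    if not lic.is_file():
        return None
    text = lic.read_text(errors="ignore")
    return next((spdx for spdx in PERMISSIVE if spdx in text), None)

def training_corpus(repos: list[Path]) -> list[Path]:
    """Admit only clearly permissive repos; when in doubt, exclude."""
    return [r for r in repos if declared_license(r) is not None]
```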

Snyk’s own AI-based security engine, for instance, continuously learns from open source repositories with very specific licenses that allow commercial use. In future, training on permissively licensed data will become a baseline expectation.

AI tools must become safety-conscious citizens of the developer ecosystem. This means building in checks for security vulnerabilities and license compliance as code is being generated. Developers using AI assistants should treat AI outputs with the same diligence as they would third-party code from an unknown source.
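
In practice, that diligence can be automated as a merge gate. The sketch below assumes a generic command-line scanner – shown here as `snyk code test`, but any static analysis tool your team already trusts would slot in the same way.

```python
import subprocess
import sys
from pathlib import Path

# Illustrative merge gate: run the same static analysis on AI-assisted
# changes that you would run on third-party code. The scanner command is
# an assumption – substitute whatever tool your team already uses.
SCANNER = ["snyk", "code", "test"]

def vet_ai_change(path: Path) -> bool:
    """Return True only if the scanner reports no issues for `path`."""
    result = subprocess.run(SCANNER + [str(path)],
                            capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout or result.stderr, file=sys.stderr)
        return False
    return True

if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    sys.exit(0 if vet_ai_change(target) else 1)
```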

Open source communities and enterprise teams alike should develop clear policies on the use of AI-generated code. Blanket bans are one approach, but many projects may opt for a middle ground: allowing AI-assisted contributions with proper oversight – for example, requiring prior approval for any AI-derived code. Regular training and awareness are key so developers understand both the benefits and risks of generative AI in coding.

It’s also important to consider exactly what is being fed into AI. Organizations that use open source tooling alongside AI must be wary of data privacy: sharing proprietary code with an AI assistant could inadvertently make it part of the model’s knowledge. Only share what you are comfortable exposing, and keep truly private code away from third-party AI systems.
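
One lightweight way to enforce that boundary is an explicit allowlist governing which files may ever leave the organization as AI context. The patterns below are placeholders; teams would maintain their own, much as they do a .gitignore.

```python
import fnmatch
from pathlib import Path

# Example patterns only: which paths may be shared with a third-party
# AI assistant, and which must never leave the organization.
SHAREABLE = ["src/public/*", "docs/*", "examples/*"]
NEVER_SHARE = ["*secrets*", "*.pem", "*internal*"]

def may_share(path: str) -> bool:
    """Allow a file only if it matches the allowlist and no denylist rule.
    fnmatch's '*' matches across '/' here, so patterns are deliberately broad."""
    if any(fnmatch.fnmatch(path, pat) for pat in NEVER_SHARE):
        return False
    return any(fnmatch.fnmatch(path, pat) for pat in SHAREABLE)

# Usage: build the context set an assistant is allowed to see.
context = [p for p in map(str, Path(".").rglob("*.py")) if may_share(p)]
```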

A path forward: Bridging open source and AI

The paradox can be resolved by adopting the best of both worlds. We are already seeing movement toward that middle ground, with open source principles influencing AI development and vice versa.

For AI coding assistants to truly flourish in the long term, trust is the key. Developers need to trust that tools will help, not harm, their codebases, which means no hidden security holes and no hidden legal strings attached.

By insisting on openness in our AI and responsibility in our use of open source, we can resolve the paradox, accelerating innovation while upholding the values that made open source a success in the first place.


This article was produced as part of TechRadar Pro's Expert Insights channel, where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadar Pro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro
