Opening up Generative AI

Generative AI has huge potential to revolutionize business, create new opportunities and make employees more efficient in how they work. According to McKinsey, more than a quarter of company leaders say that generative AI is a board-level agenda item, while 79 percent of those surveyed have already used generative AI.

These technologies are already affecting the software industry - IDC found that 40 percent of IT executives think generative AI “will allow us to create much more innovative software”, while GBK Collective estimates that 78 percent of companies expect to use AI for software development within the next three to five years. Around half of video game companies already use generative AI in their working processes, according to research by the Game Developers Conference.

All these signals show that generative AI is growing in use. However, the number of developers with the skills to build generative AI-powered applications themselves is still limited. For enterprises that want to build and operate their own generative AI-powered services, rather than consuming a service from a provider, integration will be essential to make effective use of company data.

Carter Rabasa

Head of Developer Relations at DataStax.

Where are the gaps?

So what are the challenges around generative AI? The first is how to get data ready for generative AI systems. The second is how to integrate those systems and develop software around generative AI capabilities.

For many companies, generative AI is inextricably linked to large language models (LLMs) and services like ChatGPT. These tools take text input, convert it into a form the model can process, and then generate responses based on patterns in their training data. For simple queries, a ChatGPT response can be adequate. But for businesses, this level of general knowledge is not enough.

To solve this problem, techniques like Retrieval Augmented Generation (RAG) are needed. RAG covers how companies can take their data, make it available for querying, and then deliver that information to the LLM for inclusion in its responses. This data can exist in multiple formats, from company knowledge bases or product catalogues through to text in PDFs or other documents. The data has to be gathered and turned into vectors - numeric representations that retain semantic meaning and the relationships between pieces of data.
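
To make that embedding step concrete, here is a minimal sketch in Python using the open-source sentence-transformers library; the model name and sample documents are illustrative placeholders, not a recommendation for any particular stack:

```python
# Minimal sketch: turning text into vectors with sentence-transformers.
# The model and documents below are illustrative placeholders.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small general-purpose embedding model

documents = [
    "Our returns policy allows refunds within 30 days of purchase.",
    "Premium support is available 24/7 for enterprise customers.",
]

# Each document becomes a fixed-length numeric vector; texts with similar
# meanings end up close together in the resulting vector space.
vectors = model.encode(documents)
print(vectors.shape)  # (2, 384): two documents, 384 dimensions each
```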

Getting data ready involves a step called chunking - splitting your text into discrete units that can each be represented by a vector. Several approaches are possible here, from individual words through to sentences or paragraphs. The smaller your chunks, the more vectors you generate and the more capacity and cost they consume; conversely, the bigger each chunk, the less precise your retrieval results tend to be. Chunking is still a very new area and best practices are still being developed, so you may need to experiment with your approach in order to get the best results.
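
As a hedged illustration of one possible approach, the sketch below packs paragraphs into chunks capped at a rough character budget, carrying one paragraph over between chunks so context is not lost at the boundaries; the size and overlap values are arbitrary starting points to experiment with:

```python
# Illustrative chunking: split text into paragraphs, then pack paragraphs
# into chunks capped at max_chars, overlapping by one paragraph so that
# context is preserved across chunk boundaries.
def chunk_text(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in paragraphs:
        if current and size + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:]  # carry the last paragraph(s) forward
            size = sum(len(p) for p in current)
        current.append(para)
        size += len(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```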

Once your data is chunked and converted into vectors, you then have to make it available as part of your generative AI system. When a user request comes in, it is converted into a vector, which can then be used to conduct a search across your data. By comparing the user’s request vector against your company’s vector data, you can find the best semantic matches. These matches can then be passed to your LLM and used as context when it creates the response to the user.
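
The search itself boils down to comparing vectors. Here is a minimal sketch, assuming the embeddings sit in an in-memory NumPy array; a production system would use a vector database with an approximate-nearest-neighbour index instead:

```python
import numpy as np

def top_matches(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine similarity: the dot product of L2-normalised vectors.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k chunks that are semantically closest to the query.
    return np.argsort(scores)[::-1][:k]
```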

RAG has two main benefits. Firstly, it allows you to provide information to your LLM service for processing without adding that data to the model itself, where it could surface in responses to other requests. This means that you can use generative AI with sensitive data, as RAG allows you to stay in control of how that data is used. Secondly, you can provide more time-sensitive data in your responses too - you can keep updating the data in your vector database so it is as up to date as possible, then serve it to customers when the right request comes in.
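
The first point is worth underlining with a sketch: the retrieved chunks travel inside the prompt for a single request, rather than being baked into the model’s weights. The prompt wording below is purely illustrative:

```python
# Illustrative only: retrieved chunks are injected into the prompt for this
# one request; the underlying model is never retrained on them.
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```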

Implementing RAG is a potential challenge, as it relies on multiple systems that are currently very new and developing quickly. The number of developers who are familiar with all the technology involved - data chunking, vector embeddings, LLMs and the like - is still relatively small, and demand for those skills is high. So making it easier for more developers to get working with RAG and generative AI will help everyone.

This is where there can be challenges for developers. Generative AI is most associated with Python, the programming language used by data scientists when building data pipelines. However, Python is only third on the list of most popular languages in Stack Overflow’s 2023 developer survey. Extending support to other languages like JavaScript (the most popular language in that survey) will allow more developers to get involved in building generative AI applications or integrating them with other systems.

Abstracting AI with APIs

One approach that can make this process easier is to support the APIs that developers want to work with. Providing APIs and client libraries for the most common languages lets developers get to grips with generative AI faster and more efficiently.

This also helps to solve another of the bigger problems for developers around generative AI - how to get all the constituent parts working together effectively. Generative AI applications will cover a wide range of use cases, from extensions of today’s customer service bots and search functions to more autonomous agents that can take on full work processes or customer requests. Each of these will involve multiple components working together to fulfil a request.

This integration work will be a significant overhead if it cannot be abstracted away using APIs. Each connection between system components would need to be managed, updated and altered as more functionality is requested or new elements are added to the AI application. By using standardized APIs instead, the job becomes easier for developers to manage over time. This also opens up generative AI to more developers, as they can work with components through APIs as services, rather than having to create and run their own instances for vector data, data integration or chunking. Developers can also choose the LLM that they want to work with, and switch if they find a better alternative, rather than being tied to a specific provider.
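
One way to picture that decoupling is a thin interface between the application and whichever LLM or vector store sits behind it. The sketch below uses Python’s structural typing; all the names here are hypothetical, not any vendor’s actual API:

```python
from typing import Protocol

class LLMClient(Protocol):
    """Any provider that can turn a prompt into a completion."""
    def complete(self, prompt: str) -> str: ...

class Retriever(Protocol):
    """Any vector store that can return relevant text chunks for a query."""
    def search(self, query: str, k: int) -> list[str]: ...

def answer(question: str, retriever: Retriever, llm: LLMClient) -> str:
    # The application depends only on these interfaces, so either component
    # can be swapped for a better alternative without rewriting this code.
    chunks = retriever.search(question, k=3)
    context = "\n\n".join(chunks)
    return llm.complete(f"Context:\n{context}\n\nQuestion: {question}")
```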

This also makes it easier to integrate generative AI systems with front-end developer tools like the React framework and the Vercel platform. Empowering developers to implement generative AI in their applications and websites combines front-end design and delivery with back-end infrastructure, so simplifying the stack will be essential for more developers to get involved. Making it easier to work with the full Retrieval Augmented Generation stack of technologies - or RAGStack - will be needed if companies are going to use generative AI in their businesses.
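
As a rough sketch of what that looks like from the back end, a small HTTP endpoint can expose the whole pipeline to a React front end. FastAPI is used here purely for illustration, and answer_question is a stub standing in for the retrieval and LLM steps sketched above:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    text: str

def answer_question(text: str) -> str:
    # Stub standing in for retrieval + LLM generation.
    return f"(answer for: {text})"

@app.post("/ask")
def ask(question: Question) -> dict:
    # A React component can POST to /ask and render the returned answer.
    return {"answer": answer_question(question.text)}
```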

This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

Carter Rabasa is Head of Developer Relations at DataStax, where he leads the team equipping developers to build real-time generative AI applications. Carter also leads the CascadiaJS developer conference and is an advisor for Heavybit.