Where the LLM Rubber Meets the Road

Practical Questions to Ask Before Starting a Generative AI Project

It’s been over a year and a half since OpenAI made their LLM (Large Language Model) generally available to the public, introducing the world to an exciting new technology that was leaps and bounds above anything anyone had seen in the space so far.

A tidal wave of new ideas, platforms, and businesses quickly surfaced as companies raced to understand what this new technology was capable of and what types of problems it could solve. As we partner with our clients to integrate this new AI technology into their businesses, a list of common questions has emerged that helps determine whether the technology is a good fit for the problem at hand.

Below are a few key items from that list:

What are the volume and length of questions you will be asking/receiving?

One of the first things to pin down when integrating an LLM is cost, and the easiest way to understand cost is to estimate:

  • How many questions will I need to ask the model?
  • How long (in characters) are the questions and answers likely to be?

A lot of great LLM ideas are infeasible because the answer to one of the above is “too many”. Because these models require high computing power compared to traditional workloads, every request to the model carries a financial impact, some much larger than others, which is then amplified by total volume. You can think of cost approximately with the equation:

Total Requests × (Question Character Count + Answer Character Count) × Price per Character ≈ Total Cost
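To make that concrete, below is a minimal back-of-the-envelope sketch in Python. The request volume, character counts, characters-per-token ratio, and per-token prices are all placeholder assumptions; providers actually bill per token rather than per character, and rates vary by model, so substitute your vendor’s published pricing.

```python
# Back-of-the-envelope LLM cost estimate.
# All numbers below are placeholder assumptions -- swap in your own volumes
# and your provider's published per-token prices.

CHARS_PER_TOKEN = 4                   # rough rule of thumb for English text
PRICE_PER_1K_INPUT_TOKENS = 0.0005    # assumed rate; check your provider's price sheet
PRICE_PER_1K_OUTPUT_TOKENS = 0.0015   # assumed rate; output tokens usually cost more

def estimate_monthly_cost(requests_per_month: int,
                          question_chars: int,
                          answer_chars: int) -> float:
    """Estimate monthly spend from request volume and average text lengths."""
    input_tokens = question_chars / CHARS_PER_TOKEN
    output_tokens = answer_chars / CHARS_PER_TOKEN
    cost_per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
                       + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return requests_per_month * cost_per_request

# Example: 500,000 requests/month, ~2,000-character questions, ~1,000-character answers.
print(f"${estimate_monthly_cost(500_000, 2_000, 1_000):,.2f} per month")
```

Even with modest per-request prices, volume dominates the result: multiply the request count by 100 and the same workload becomes a very different budget conversation.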

Axian has found that in some instances, applying an LLM to a given process doesn’t make sense because the cost far exceeds the benefit it provides. That said, cost also provides a useful design constraint:

  • Can you design your interactions such that a short response will suffice (example: have the model answer with a “yes” or “no”)?
  • Can you ask a question in a more concise way?
  • Can you find a way to leverage LLMs for only a subset of total volume?

Getting creative with the way you ask questions and shape LLM responses can turn an integration from cost-prohibitive to financially feasible.
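As one illustration of that kind of shaping, the sketch below forces a classification-style question down to a one-word answer. It assumes the OpenAI Python SDK (v1.x); the model name, prompt wording, and token cap are illustrative choices, and the same idea applies to any provider that lets you limit output length.

```python
# A sketch of shaping an LLM interaction so the answer stays short and cheap.
# Assumes the OpenAI Python SDK (v1.x); model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_about_billing(customer_message: str) -> bool:
    """Ask for a strict yes/no so the answer costs only a handful of tokens."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model; use whatever tier fits your budget
        temperature=0,         # keep the classification as deterministic as possible
        max_tokens=3,          # hard cap keeps the answer (and the bill) short
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word: yes or no."},
            {"role": "user",
             "content": f"Is this customer message about billing?\n\n{customer_message}"},
        ],
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")
```

A cheap classifier like this can also serve the “subset of total volume” idea above: route only the messages it flags to a larger, more expensive model or to a human.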

Do the model’s answers need to be “factually accurate” or “specifically correct”?

Because of the internal workings of a GPT-style LLM, there is an inherent problem called “hallucinations”, where the model will occasionally provide a response that is completely incorrect. This behavior is unpredictable and usually uncommon, but planning for hallucinations has a direct impact on possible use cases:

  • What would the impact be if the model returned something completely wrong 1% of the time?
  • Can your use case/process handle model responses that are entirely unexpected in content or format?
  • Does it impact your business at large if a portion of the model’s responses are not 100% accurate?
  • Are the model’s responses going to be returned directly to end-users without review and could you be held legally liable for them?

Accuracy and correctness cannot be guaranteed 100% of the time, so any use of this technology should be carefully evaluated for potential negative outcomes.

Consider, for example, whether a model hallucination could cause:

  • A significant legal obligation
  • Undue influence over a financial transaction
  • Repeated poor customer support experiences
  • Improper access to restricted information

A successful use of this technology will be intentionally designed to limit exposure to these types of issues.
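One common mitigation is to never trust the raw response: validate it against what your process can actually accept, and send anything unexpected to a safe fallback such as a default answer or a human review queue. The sketch below shows that pattern for the yes/no style of answer discussed earlier; the allowed values and the fallback behavior are illustrative assumptions, not a prescribed design.

```python
# A sketch of a simple guardrail: validate the model's answer before acting on it,
# and route anything unexpected to a safe fallback instead of the end user.
# The allowed values and fallback behavior here are illustrative assumptions.

ALLOWED_ANSWERS = {"yes", "no"}

def guarded_answer(raw_model_output: str) -> str:
    """Return a vetted answer, or 'needs_review' if the model went off-script."""
    cleaned = raw_model_output.strip().lower().rstrip(".")
    if cleaned in ALLOWED_ANSWERS:
        return cleaned
    # Hallucinated, malformed, or unexpected output: don't pass it downstream.
    return "needs_review"

assert guarded_answer("Yes.") == "yes"
assert guarded_answer("As an AI model, that depends on several factors...") == "needs_review"
```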

How much does speed matter?

The process a Large Language Model uses to determine the proper response to a query is computationally heavy and generally slow compared to what we now expect from our software. Depending on the size of your input, the complexity of the problem, and the length of the response the model needs to produce, a query to an LLM can take anywhere from a few hundred milliseconds to five or ten seconds, or even more. It is also difficult to guarantee a consistent response time.

Given this unpredictable processing time, and knowing that even the best case will be slow compared to more standard application workloads, it is worth considering how latency might affect your planned implementation by asking questions like:

  • Does the proposed solution depend on retaining a customer’s attention and if so, how long will the average customer wait?
  • If processing in batch and given an extreme processing time of 10 seconds or more per query, will the batch finish in time to be useful?
  • If the process takes longer to complete than expected due to a model being slow, will that cause a bottleneck that affects downstream systems or processes?
  • Does your use case require mostly single-threaded processing and therefore compound unexpected increases in processing time?
  • Do you need to iterate N times on a query in order to achieve desired results?

For some solutions, generative AI may be cost-efficient yet still a poor fit because of tricky timing constraints. In these scenarios there may be creative ways to use the technology without negatively impacting time-sensitive business processes, but sometimes it simply means we need to find another solution.
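When latency is the deal-breaker rather than cost, one such creative option is to give the model a strict time budget and fall back to a conventional path when it cannot deliver in time. Below is a minimal sketch of that pattern; `call_llm` and `keyword_based_answer` are hypothetical stand-ins for your own provider call and fallback logic.

```python
# A sketch of bounding LLM latency: give the call a hard deadline and fall back
# to a fast, predictable path if the model does not respond in time.
# `call_llm` and `keyword_based_answer` are hypothetical stand-ins.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def call_llm(prompt: str) -> str:
    """Placeholder for a real provider call; simulates an unpredictably slow model."""
    time.sleep(8)  # pretend the model is having a slow day
    return "A long, carefully generated answer..."

def keyword_based_answer(prompt: str) -> str:
    """Placeholder for a fast, conventional fallback (rules, search, canned response)."""
    return "Thanks for reaching out! A specialist will follow up shortly."

# A shared pool so slow calls can finish (and be discarded) in the background
# without blocking the response we owe the caller.
_pool = ThreadPoolExecutor(max_workers=4)

def answer_within(prompt: str, deadline_seconds: float = 3.0) -> str:
    future = _pool.submit(call_llm, prompt)
    try:
        return future.result(timeout=deadline_seconds)
    except TimeoutError:
        # The model missed the deadline for this interaction; use the predictable path.
        return keyword_based_answer(prompt)

print(answer_within("Summarize this support ticket ..."))  # falls back after ~3 seconds
```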

Summary

Generative AI is proving to be incredibly exciting and is evolving almost daily, opening new avenues to create and drive business value. Knowing how to screen out use cases that aren’t yet a fit for this technology helps you plan and design solutions that succeed and deliver value sooner.

Please reach out if you would like to discuss further, want more detailed examples, or are interested in hearing how Axian can help you!