A Proposal for the Joint Development of Generative AI for the Dispute Resolution Profession

From Gary Doernhoefer:

OpenAI’s very public unveiling of ChatGPT has launched a debate in many fields over how artificial intelligence (AI) could be put to good use, weighed against its potential for misuse. There is a path for the dispute resolution profession to lead as an example of responsible development of this technology. First, though, let’s look at what this new technology is, how it has been developed thus far, and some of the issues arising in its application.

ChatGPT, its successor GPT-4, and comparable tools released by others, like Google’s Bard, are based on large language models. These models have been fed vast numbers of examples of human communication available on the internet: books, articles, social media, video, fiction and fact, true or false. These communications have been used to “train” the system in a process known as “unsupervised learning.”

When given a prompt, the system uses algorithms to detect patterns in this vast data set and predict the likelihood of responses that are contextually correct. It responds using perfect grammar, punctuation, and spelling. So a chat prompt of “Hello. How are you?” will likely generate a reply of “I am fine. How are you?” It doesn’t really care how you are; it simply determines from its data set what the statistically best reply to your prompt appears to be.
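
To make the mechanics concrete, here is a minimal sketch of that prompt-and-reply exchange using the OpenAI Python client. The client library, the model name, and the environment setup are assumptions for illustration, not details from the original discussion.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY in the environment

# The prompt from the example above; the model predicts a statistically
# likely reply, not a heartfelt one.
response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[{"role": "user", "content": "Hello. How are you?"}],
)
print(response.choices[0].message.content)
# Typically something like: "I'm doing well, thank you! How are you?"
```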

The system’s power comes from the breadth of its data set, giving it the apparent ability to answer questions in various fields of study, draft entire term papers, or participate in what appears to be a conversation. These systems can write poetry or mimic literary styles from Shakespeare to Michener. Their capabilities rest on the consumption of data examples and the ability to discern patterns and then apply them in response to prompts.

Problems with these systems quickly became apparent. Early versions could generate responses that were profane, racist, or misogynistic. They are not consistently factually accurate. Responses are often generic, lacking any novel content. Litigation has erupted over the intellectual property rights to the input data, which is arguably now being used to create derivative works. There are reports of outright fabrications, dubbed “hallucinations,” in which the system assembles bits of information in entirely fanciful ways. Emotion-laden prompts or philosophical questions could lead the system to respond with emotion-laden replies or answers that implied sentience.

These problems result from the systems simply formulating responses based on patterns in their unsupervised learning data, without “knowing” what they are saying in the way a human does. Nevertheless, these large language models represent a major step in computing capabilities, on the scale of the invention of the internet.

The challenge, then, is to devise a means of harnessing and applying the potential of these tools while mitigating the problems. In a given field, this starts by (1) refining the data set with context- or application-specific supervised learning inputs, and (2) establishing “guardrails” in code to prevent socially inappropriate replies even when a prompt seeks them. The benefits can be further enhanced by carefully considering what one asks the system to do, which gives rise to a new skill called “prompt engineering.” All of this requires proper regard for the ongoing role of humans “in the loop” working with the system’s output.
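
As a rough illustration of those ideas, here is a hedged sketch in Python: a “guardrail” instruction attached to every request, a structured prompt, and a human reviewer between the model’s draft and any use of it. The wording of the instructions and the helper function are hypothetical, not an established standard.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical guardrail instructions prepended to every request.
GUARDRAILS = (
    "You assist dispute resolution professionals. Refuse profane, "
    "discriminatory, or party-disparaging content, do not give legal "
    "advice, and say so plainly when you are unsure."
)

def ask(prompt: str) -> str:
    """Send a guarded prompt; the reply is only a draft for human review."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative choice
        messages=[
            {"role": "system", "content": GUARDRAILS},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

# Prompt engineering: a specific, structured request beats a vague one.
draft = ask("Suggest three neutral ways to reframe an accusatory "
            "opening statement in a workplace mediation.")
# Human in the loop: the neutral edits the draft; it is never used verbatim.
print(draft)
```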

The experience of Casetext, the provider of CoCounsel legal support software, is instructive. It reportedly invested 4,000 hours, using 30,000 legal questions, to test multiple legal skills across a broad range of legal subject matter before launching CoCounsel with ChatGPT in March 2023. It now boasts that CoCounsel outperforms some humans on a state bar exam.

Before ChatGPT or other large language models can be genuinely useful to the dispute resolution profession, a similar, though likely narrower, process of fine-tuning a data set should take place. The ideal model would be a collaboration within the dispute resolution field to create the refined data set, establish the guardrails, and set privacy parameters for the use of data received in prompts, instead of leaving that to individual companies like Casetext.
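
For a sense of what “fine-tuning a data set” involves in practice, here is a hypothetical sketch of a single supervised training example written in the JSONL chat format commonly used for fine-tuning. The question, the answer, and the file name are invented for illustration; a real collaboration would curate thousands of expert-vetted examples.

```python
import json

# One hypothetical supervised example: a question-answer pair reviewed
# by dispute resolution experts before inclusion in the training set.
example = {
    "messages": [
        {"role": "system",
         "content": "You are a neutral assistant for dispute resolution."},
        {"role": "user",
         "content": "How does mediation differ from arbitration?"},
        {"role": "assistant",
         "content": ("In mediation, a neutral facilitates negotiation and "
                     "the parties control the outcome; in arbitration, the "
                     "neutral hears evidence and issues a decision that is "
                     "usually binding.")},
    ]
}

# Many such vetted examples would be appended to a training file.
with open("dr_training.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```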

This would have several advantages: a centralized advisory board to address concerns, such as setting privacy requirements for how queries are received, stored, and used; a concentration of expertise to curate the additional training materials; shared development costs; and the cooperation of industry authors whose materials might be included in the training data set.

Large language models improve with use and ongoing feedback. Thus, once in operation, a single field-specific system would also improve more rapidly than it would if queries were divided among multiple disparate models.

Imagine an AI system jointly developed by experts in dispute resolution that could interactively answer parties’ questions about the pros, cons, and processes of mediation or arbitration, and the parties’ rights related to those processes. All neutrals could offer access to it from their websites. These systems can receive queries and reply in text or voice, in multiple languages, available 24×7. This could enhance access to justice for people with hearing or visual impairments and for non-English speakers.

If a collaboratively developed data set, fine-tuned for dispute resolution, could be created, then individuals or technology service providers could pay a fee for access and independently develop additional ways to apply it. This would spur competition over the best prompt engineering, ease of use, user interface design, and selection of appropriate tasks for the technology.

Here are some examples. Some applications might serve the neutral in case management by generating lists of likely issues to be addressed, potential questions a neutral might ask, or potential proposals to consider, prompted with nothing more than the type of dispute and the parties’ position statements, as sketched below.
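
Here is a sketch of what such a prompt might look like, assuming nothing more than a dispute type and two position statements. The template wording and the sample facts are hypothetical.

```python
# Hypothetical case-management prompt built from minimal inputs.
PROMPT_TEMPLATE = """You are assisting a mediator in a {dispute_type} dispute.

Party A's position statement:
{party_a}

Party B's position statement:
{party_b}

List (1) the likely issues to be addressed, (2) questions the mediator
might ask each party, and (3) proposals that may be worth exploring."""

prompt = PROMPT_TEMPLATE.format(
    dispute_type="residential landlord-tenant",
    party_a="The tenant withheld rent after repeated repair requests.",
    party_b="The landlord says no repair requests arrived in writing.",
)
print(prompt)  # sent to the model via a helper like ask() above
```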

Even today, an AI assistant can access some of the literature on mediation and negotiation skills. A prompt asking for tactics to break an impasse in negotiations will instantly generate a list of suggestions and citations to the literature, and this would improve with a tuned data set. An AI assistant could generate a solid first draft of an agreement prompted by the type of dispute and a few notes on agreed terms. The system could be directed to generate terms that are informal and non-binding, or to prepare a draft formal legal agreement for review. It could even offer advice on sticky ethical questions, such as the need for disclosures to address potential conflicts of interest in a specific situation. One of its strengths is that it has read, and can recall, everything in its data set. With the right prompt, it can recall and apply what it has read in context for the user.
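
Continuing the sketch, a drafting prompt could carry a simple switch between an informal summary of terms and a formal draft for legal review. The function name and the sample terms below are hypothetical.

```python
def drafting_prompt(dispute_type: str, terms: list[str], formal: bool) -> str:
    """Build a hypothetical prompt for a first-draft agreement."""
    style = ("a formal settlement agreement suitable for attorney review"
             if formal
             else "a plain-language, non-binding summary of agreed terms")
    bullet_list = "\n".join(f"- {t}" for t in terms)
    return (f"Draft {style} for a {dispute_type} dispute incorporating:\n"
            f"{bullet_list}\n"
            "Flag any term that is ambiguous or may require a disclosure.")

prompt = drafting_prompt(
    "small-business partnership",
    ["Partner A buys out Partner B's interest within 90 days",
     "Both parties keep client lists confidential"],
    formal=False,
)
```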

New generative AI tools like ChatGPT have tremendous potential to work alongside the profession’s trained practitioners. We should avoid the trap of refusing to begin working with this technology until every objection has been resolved. If not for the forcing function of the pandemic, that would likely have been the fate of videoconferencing in dispute resolution.

We should embrace this new technology. Our field should begin exploring the best ways to put it to work, and address the concerns that arise as we gain a better sense of the technology’s best uses and capabilities.

We have an opportunity to lead in dispute resolution.  Let’s work together to act upon it.

Gary Doernhoefer
Founder, ADR Notable
