Promise Classification With Dual LLMs
Hey guys, let's dive into a cool project! We're building a system that classifies chat messages as promises using two Large Language Models (LLMs). This is super interesting because it helps us understand how people make commitments in online conversations, like group chats during experiments. We'll run the same messages through both OpenAI and Gemini models, aiming for a 95%+ agreement rate between them. Sounds fun, right?
Understanding the Core Goal: Classifying Promises
So, what's the main idea behind this project? We want to build a system that accurately identifies promises made in chat messages. Think about it like this: in a group chat, someone might say, "Let's all contribute 25 coins." That's a clear promise. Or someone replies "Okay" to a proposal, which can be read as a commitment too. Our system needs to be smart enough to recognize these kinds of phrases and classify them as promises. On the other hand, a message like "What do you think?" isn't a promise; it's just asking for an opinion. The system has to distinguish between these cases.
To make things even more interesting, the system considers the context of the conversation rather than looking at each message in isolation. This is crucial because a message's meaning can change depending on what came before it. For example, a simple "Yes" might be a promise if it responds to a specific request, but not if it's just agreeing with a general statement. Both LLMs, OpenAI and Gemini, will see this context when classifying, and we expect a high level of agreement (95%+) between them to ensure accuracy. The output will be a detailed dataset with promise classifications plus relevant information like the round in which the chat took place.
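To make the context idea concrete, here's a minimal sketch of how a context-aware prompt could be assembled from the conversation history. The function name, the `(speaker, text)` tuple format, and the instruction wording are all illustrative assumptions, not the project's actual prompt:

```python
def build_classification_prompt(history, message, max_context=10):
    """Assemble a context-aware prompt for promise classification.

    `history` is a list of (speaker, text) tuples for earlier messages in
    the same chat; `message` is the one being classified. This layout is
    a hypothetical sketch, not the project's real prompt.
    """
    # Keep only the most recent messages so the prompt stays short.
    context_lines = "\n".join(
        f"{speaker}: {text}" for speaker, text in history[-max_context:]
    )
    return (
        "You will see the recent history of a group chat, then one target "
        "message. Decide whether the target message is a PROMISE (a "
        "commitment to act, including short agreements like 'okay' when "
        "they accept a concrete proposal) or NOT_PROMISE.\n\n"
        f"Chat history:\n{context_lines or '(no prior messages)'}\n\n"
        f"Target message:\n{message}\n\n"
        "Answer with exactly one word: PROMISE or NOT_PROMISE."
    )
```

The same builder can feed both models, which also helps keep the two prompts mirrored.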
The Dual LLM Approach: OpenAI and Gemini
Why are we using two LLMs? Great question! We're using a dual LLM approach with OpenAI and Gemini for a couple of key reasons. Firstly, it boosts the accuracy of our classifications. By having two different models analyze the same messages, we can compare their results and identify any disagreements. When both models agree that a message is a promise, we can be more confident in our classification. Secondly, this dual approach helps to reduce the risk of any single model's biases influencing our results. Each LLM has its own strengths and weaknesses. By combining them, we can get a more balanced and reliable analysis.
We'll be using the APIs of both OpenAI and Gemini for this task. For each model we'll write a prompt that supplies the necessary context and instructions, and we'll process every chat message with its full conversation history so the models can make informed decisions about whether a message contains a promise. The prompts for the two models should mirror each other so their outputs are directly comparable. The key target is a 95% or higher agreement rate between the two LLMs, which gives us confidence that the classifications are accurate and reliable. The implementation will also handle edge cases like "okay", "yes", and "sounds good", whose meaning depends on context. Now, let's explore the implementation details!
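Here's a rough sketch of the dual-model loop. To keep it self-contained (and testable without API keys), the two classifiers are passed in as callables; in the real pipeline these would wrap the OpenAI and Gemini API calls, which are omitted here as an assumption:

```python
def classify_with_agreement(messages, classify_openai, classify_gemini):
    """Run two classifiers over the same messages and record agreement.

    `classify_openai` and `classify_gemini` are stand-ins for functions
    that wrap the respective API calls and return True (promise) or
    False (not a promise). Injecting them as parameters is a design
    choice for this sketch, not the project's confirmed structure.
    """
    results = []
    for msg in messages:
        a = classify_openai(msg)
        b = classify_gemini(msg)
        # Record both labels plus whether the two models agreed.
        results.append({"message": msg, "openai": a, "gemini": b, "agree": a == b})
    return results
```

Disagreements surface directly in the `agree` field, which makes them easy to pull out for manual review.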
Implementation Details: From Prompts to Output
Alright, let's get into the nitty-gritty of how this will be implemented. First, we implement the OpenAI API classification with context-aware prompts: each prompt includes the full conversation history, so the model has the information it needs to decide. For example, if someone previously proposed something and the next message is "Sounds good", it's very likely a promise. Next, we implement the Gemini API classification with a prompt structure that mirrors the OpenAI prompts, so the two models' outputs can be compared directly. Both runs process all chat messages with full conversational context. Finally, we calculate the inter-LLM agreement rate, targeting 95% or higher; a high agreement rate is a sign that our classifications are robust.
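The agreement-rate calculation itself is simple; a minimal version, assuming each model's output has already been reduced to a boolean label per message, could look like this:

```python
def agreement_rate(openai_labels, gemini_labels):
    """Fraction of messages on which the two LLMs gave the same label.

    Both arguments are parallel lists of booleans (True = promise),
    one entry per chat message.
    """
    if len(openai_labels) != len(gemini_labels):
        raise ValueError("label lists must be the same length")
    if not openai_labels:
        return 0.0
    matches = sum(a == b for a, b in zip(openai_labels, gemini_labels))
    return matches / len(openai_labels)
```

Checking `agreement_rate(...) >= 0.95` then gives a direct pass/fail on the project's target.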
Finally, we generate a CSV file in the datastore/derived/ directory containing all the results. Each row holds the promise classification for one message at the individual-round level, along with session, supergame, round, player, group, and contribution data, giving a complete picture of each chat message. Edge cases need explicit handling here too: a simple "Okay" can be a promise if it's a direct response to a proposal, so the classification has to stay sensitive to the surrounding conversation.
Handling Edge Cases and Contextual Understanding
Dealing with edge cases is key to building a robust promise classification system. Edge cases are situations where the meaning of a message might not be immediately obvious. For example, a simple word like "Okay", "Yes", or "Sounds good" can be a promise, but it really depends on the context of the conversation. If someone has just made a proposal, and the response is "Sounds good", it's a pretty strong indication that the person is agreeing to the proposal. However, if the same phrase is used in a different context, it might just be a general agreement and not a commitment. That's why considering the context is so important.
To handle these edge cases effectively, we'll develop careful prompts for both OpenAI and Gemini that tell the LLMs how to interpret ambiguous phrases based on the preceding messages. For example, the prompt could say something like: "If the previous message was a proposal and the current message is 'Okay', classify it as a promise." This lets the LLMs make accurate classifications even when they encounter unusual or ambiguous language. It's really the combination of careful prompting and contextual awareness that makes this project so interesting and useful! The accuracy of the system rests on how well it understands each message within the context of the entire conversation, so we'll be testing and refining the prompts and edge-case handling throughout development.
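The edge-case rule can be made concrete with a tiny heuristic. To be clear: in the actual system this judgment is delegated to the LLMs via the prompt; the cue words and function names below are assumptions used purely to illustrate the rule:

```python
# Short acknowledgements whose meaning depends entirely on context.
BARE_ACKNOWLEDGEMENTS = {"okay", "ok", "yes", "sounds good", "sure", "deal"}

def looks_like_proposal(text):
    """Crude proposal detector; the cue list is illustrative only."""
    lowered = text.lower()
    return any(cue in lowered for cue in ("let's", "we should", "how about", "contribute"))

def acknowledgement_is_promise(prev_message, message):
    """Edge-case rule: a bare 'okay'/'yes' counts as a promise only when
    it directly answers a proposal. Returns None for messages that are
    not bare acknowledgements, leaving those to the normal LLM path."""
    if message.strip().lower().rstrip("!.") not in BARE_ACKNOWLEDGEMENTS:
        return None
    return looks_like_proposal(prev_message)
```

So "Okay" after "Let's all contribute 25 coins" is a promise, while "Okay" after "What do you think?" is not, which is exactly the distinction the prompts need to encode.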
Dataset Output and Relevant Round Info
Once the promise classification system is up and running, we'll generate a structured dataset with all the results: a CSV file in the datastore/derived/ directory with one record per chat message. At the individual-round level, each record includes the message's classification as a promise or not, the session identifier, the supergame details, the specific round in which the message was sent, the player who wrote the message, the player's group, and any contribution data related to the round. This gives us a comprehensive view of each interaction and makes the dataset directly useful for analysis.
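A small writer for that output could look like the sketch below. The file name and exact column names are a plausible layout for the fields listed above, not a confirmed schema:

```python
import csv
from pathlib import Path

def write_promise_dataset(rows, out_dir="datastore/derived"):
    """Write one CSV row per chat message with both LLMs' labels.

    `rows` is a list of dicts keyed by the field names below; the
    schema and file name here are assumptions for illustration.
    """
    out_path = Path(out_dir) / "promise_classifications.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = [
        "session", "supergame", "round", "player", "group",
        "message", "openai_promise", "gemini_promise", "contribution",
    ]
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Keeping both models' labels as separate columns (rather than one merged verdict) preserves the disagreements for later analysis.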
This kind of detailed data is incredibly useful for several reasons. It allows us to analyze the types of promises that are made in different situations. For example, are there more promises in certain rounds of the game? Does the group size influence the likelihood of a promise? Having this information also helps us to track patterns and trends in how participants communicate and commit to actions. We can then use this data to investigate different research questions. The goal is to make a really well-organized dataset that can be used for more complex analysis later on. This will give us a deeper understanding of how promises work in the experiment. The format of the dataset is important to allow for easy analysis and integration with other datasets.
Expected Outcomes and Benefits
So, what do we expect to gain from all this? The main outcome is a highly accurate system that can identify promises in chat messages with a high degree of confidence. We're aiming for 95%+ agreement between OpenAI and Gemini, which tells us that the classifications are reliable. This system will give us a rich dataset with the classifications, and information from each round. This dataset can then be used for lots of different kinds of analysis. For example, we could explore the relationship between the number of promises made in a round and the players' behavior in the game. We can also compare how people make promises in different types of situations and experiment designs. This is awesome because it can help researchers to understand the dynamics of cooperation and communication in online games and other collaborative settings.
This project will provide valuable insights into how people make commitments in online communication. By understanding how promises are made, we can better design online environments to encourage cooperation and trust. It could also have applications in areas like understanding negotiation and agreement in online marketplaces or even in customer service interactions. Ultimately, this work is designed to improve the way we understand and support online interactions, helping us to design more effective and user-friendly communication tools.
Conclusion: Looking Ahead
Alright, guys, we've covered the basics of our promise classification system. We're on track to build a system that accurately identifies promises in chat messages using the combined power of OpenAI and Gemini. We'll put a lot of effort into handling the nuances of human language and understanding the context of each message, all to make sure our results are accurate and reliable.
This project is really interesting, and we're excited to see the results. Keep an eye out for updates as we progress through implementation, testing, and dataset generation. We believe this system will provide valuable insights into how people make commitments online. Let's make it happen! Thanks for sticking around, and we'll keep you posted on our progress!