AI Experiments v1.1: Weekly Progress and Learning

October 15th, 2024

Objectives

We’re expanding our data analysis capabilities by integrating more Helm data, this time Harvest time-tracking entries combined with the project management data from last week, allowing for more comprehensive insights into our team's work patterns and project allocations.

At the same time, we built a more sophisticated chat onboarding flow: an AI plus human-guided system designed to give new users a more intimate experience. It walks people through onboarding and builds a unique profile for each person.

Both objectives aim to improve our operational efficiency at Helm and to explore new ways of using artificial intelligence to onboard users into an application.

Learning 1: Balancing Data and Context

Chris tackled the challenge of incorporating Harvest time entries and our Asana project management data into our AI system. He structured the data as JSON, then vectorized it and stored it in a Postgres database, creating a searchable index of our work data.
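
A minimal sketch of what that pipeline might look like, assuming pgvector on Postgres and OpenAI's embeddings API. The table name, connection string, entry fields, and embedding model are illustrative rather than our exact setup:

```python
import json
import psycopg2
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    """Turn one JSON-serialized entry into an embedding vector."""
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def store_entry(conn, entry: dict) -> None:
    """Serialize an entry, embed it, and insert both into Postgres (pgvector assumed)."""
    payload = json.dumps(entry)
    vector = embed(payload)
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO work_entries (payload, embedding) VALUES (%s, %s::vector)",
            (payload, json.dumps(vector)),
        )
    conn.commit()

conn = psycopg2.connect("dbname=helm_ai")  # hypothetical database
store_entry(conn, {
    "source": "harvest",
    "person": "Chris",
    "project": "AI Experiments",
    "hours": 3.5,
    "date": "2024-10-14",
})
```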

The application uses a chat interface where users can ask questions about time allocation and project status. When a query is received, the system performs a search on the vectorized data, retrieving up to 100 relevant documents (data entries) that best match the query.
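
The retrieval step, sketched under the same assumptions and reusing the `embed()` helper and OpenAI client from above; the prompt wording and pgvector's cosine-distance operator are illustrative:

```python
def answer_question(conn, question: str) -> str:
    """Embed the question, fetch the ~100 closest entries, and ask the chat model."""
    query_vector = embed(question)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT payload FROM work_entries "
            "ORDER BY embedding <=> %s::vector LIMIT 100",
            (json.dumps(query_vector),),
        )
        documents = [row[0] for row in cur.fetchall()]

    context = "\n".join(documents)
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the work data provided."},
            {"role": "user", "content": f"Data:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```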

The system can now answer questions about team members' hours and current projects: how busy they are, when they could take on a new project, and what challenges are lingering, all drawn from the documents retrieved for each query.

This approach works well for recent, limited-scope queries. However, we quickly discovered limitations when dealing with broader time ranges or more complex questions. The system struggles with queries like "What was our team's total billable hours last month?" because the 100-document limit often doesn't capture all relevant entries, and we run into GPT-4's context window limits.

Chris found that the AI sometimes provides inaccurate calculations due to incomplete data, highlighting the need for a more condensed data structure. To address this, we're exploring ways to consolidate our project and time entry data, potentially grouping entries by day or week into single JSON files. This would allow us to provide more comprehensive context within the token limits, enabling more accurate and insightful responses for broader queries.
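
One hedged sketch of the consolidation idea: roll individual entries up into a single summary document per person per week before embedding, so each retrieved document carries far more ground truth. The field names mirror the illustrative entries above:

```python
from collections import defaultdict
from datetime import date

def week_key(entry: dict) -> str:
    """Bucket an entry by ISO year and week, e.g. '2024-W42'."""
    iso = date.fromisoformat(entry["date"]).isocalendar()
    return f"{iso.year}-W{iso.week:02d}"

def consolidate(entries: list[dict]) -> list[dict]:
    """Group raw time entries into one summary document per (person, week)."""
    buckets: dict[tuple[str, str], dict] = defaultdict(
        lambda: {"total_hours": 0.0, "projects": set()}
    )
    for entry in entries:
        bucket = buckets[(entry["person"], week_key(entry))]
        bucket["total_hours"] += entry["hours"]
        bucket["projects"].add(entry["project"])

    return [
        {
            "person": person,
            "week": week,
            "total_hours": data["total_hours"],
            "projects": sorted(data["projects"]),
        }
        for (person, week), data in buckets.items()
    ]
```

Each consolidated document would then be embedded and stored just like the individual entries above, so a handful of retrieved documents can cover a whole month.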

Learning 2: Guided AI Interactions

Louis's work on the onboarding chatbot revealed a clever approach. Instead of relying entirely on the model to generate questions, he implemented a system of predefined questions, which lets us steer the conversation toward exactly the answers we need. The AI's role is then to validate user responses and guide the conversation flow.

This approach uses OpenAI's API to detect whether a user has adequately answered a question. If not, it prompts for clarification. This method gives us greater control over the onboarding process while still leveraging AI's language understanding capabilities.
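
Roughly, the flow might look like the sketch below; the question list, prompt wording, and model name are illustrative, not Louis's exact implementation:

```python
from openai import OpenAI

client = OpenAI()

QUESTIONS = [
    "What's your name and role?",
    "What are you hoping to get out of Helm?",
    "How does your team currently track its work?",
]

def answers_question(question: str, reply: str) -> bool:
    """Ask the model for a YES/NO judgement on whether the reply answers the question."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any inexpensive model can do this check
        messages=[
            {"role": "system",
             "content": "Reply with only YES or NO: does the user's message "
                        "adequately answer the question?"},
            {"role": "user", "content": f"Question: {question}\nUser: {reply}"},
        ],
    )
    return completion.choices[0].message.content.strip().upper().startswith("YES")

def run_onboarding(get_reply) -> dict:
    """Walk through each predefined question, re-asking until it's answered."""
    profile = {}
    for question in QUESTIONS:
        reply = get_reply(question)
        while not answers_question(question, reply):
            reply = get_reply(f"Thanks! Could you say a bit more? {question}")
        profile[question] = reply
    return profile
```

Driving this from a terminal is as simple as `run_onboarding(input)`; in the real product the replies would come from the chat UI.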

The system is built to handle various scenarios, including people who provide irrelevant responses or ask their own questions mid-onboarding. In these cases, the AI can adapt and provide appropriate responses without derailing the overall process.

Learning 3: Model Selection Matters

Louis also demonstrated the importance of strategic model selection within the workflow. For simple answer validation tasks, he's using less expensive models like Davinci 003, which checks whether user responses adequately answer the predefined questions.

For more complex interactions or when users go off-script, Louis leverages more advanced models like GPT-4o. This tiered approach optimizes costs while maintaining the ability to handle nuanced interactions when necessary.
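
A small sketch of that tiered routing; the model names here are stand-ins and the routing condition is an assumption, not Louis's exact logic:

```python
from openai import OpenAI

client = OpenAI()

def pick_model(task: str, off_script: bool) -> str:
    """Route routine validation to a cheap model, anything nuanced to a stronger one."""
    if task == "validate_answer" and not off_script:
        return "gpt-4o-mini"  # stand-in for the inexpensive validation model
    return "gpt-4o"           # more capable model for off-script or complex turns

def respond(task: str, off_script: bool, messages: list) -> str:
    completion = client.chat.completions.create(
        model=pick_model(task, off_script),
        messages=messages,
    )
    return completion.choices[0].message.content
```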

Additionally, Louis is using OpenAI's text-embedding model to create embeddings for user responses and messages. These embeddings are then stored in a Pinecone vector database for efficient retrieval and context-building in future interactions.
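
Sketched with the current Pinecone Python client; the index name, per-user namespace scheme, and embedding model are assumptions for illustration:

```python
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("onboarding")  # hypothetical index name

def remember(user_id: str, message: str) -> None:
    """Embed a user message and upsert it under that user's namespace."""
    vector = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative embedding model
        input=message,
    ).data[0].embedding
    index.upsert(
        vectors=[{
            "id": f"{user_id}-{abs(hash(message))}",
            "values": vector,
            "metadata": {"user": user_id, "text": message},
        }],
        namespace=user_id,
    )
```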

Challenges

Context limitations: We're bumping up against token limits when trying to analyze longer time periods. This is particularly evident when attempting to query data spanning months or years.

Deployment hiccups: Chris encountered difficulties deploying the system to our server. The Phidata Streamlit wrapper, which promises one-click deployment to AWS, didn't work as seamlessly as expected. This highlights the often-overlooked challenges in transitioning from development to production environments.

Balancing AI freedom and structure: Finding the optimal balance between guiding the AI with predefined structures and allowing it flexibility to handle unexpected inputs is an ongoing challenge.

Next Up

Data condensation: We're going to package our data more efficiently. This should significantly increase the amount of context we can provide within token limits.

Expanding data sources: We plan to incorporate forward-looking data, such as project forecasts and upcoming scheduled milestones from Asana. This will let us make predictions and provide insights into future workloads.

Enhanced group interactions: Next, we're working on improving the AI's ability to interact in group settings. This involves developing methods to efficiently query both group messages and individual user profiles. We're exploring how to structure and search across multiple namespaces in our vector database to enable these more complex interactions; see the sketch at the end of this list.

Optimizing model usage: We'll continue refining our approach to model selection, potentially incorporating even more granular choices for different tasks. This might include exploring newer models as they become available.
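
Nothing is settled yet for the group interactions, but a multi-namespace search could look something like this sketch. It assumes a Pinecone index handle and a pre-embedded query; the namespace layout and score-based merge are assumptions:

```python
def search_group(index, query_vector, group_id: str, member_ids: list, top_k: int = 5):
    """Query the group's namespace plus each member's namespace, then merge by score."""
    matches = []
    for namespace in [group_id] + member_ids:
        result = index.query(
            vector=query_vector,
            top_k=top_k,
            namespace=namespace,
            include_metadata=True,
        )
        matches.extend(result.matches)
    # Keep the overall best matches across all namespaces.
    return sorted(matches, key=lambda m: m.score, reverse=True)[:top_k]
```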

We're not just implementing off-the-shelf solutions; we're actively grappling with the challenges of integrating AI into our real-world systems at Helm. And we're sharing it all here with you: every obstacle, every lesson, every opportunity.

Take the next step towards your AI future.