
Meeting Summary for CS 486-686 Lecture/Lab – Spring 2025

Date: March 20, 2025
Time: 08:11 AM Pacific Time (US and Canada)
Meeting ID: 893 0161 6954


Quick Recap

Greg led discussions on enhancing the embedding model for code analysis and retrieval. The conversation explored various techniques to improve chunking, indexing, and querying processes. The meeting covered:

  • Different language models and embedding approaches
  • Methods to optimize performance and relevance in code summarization and question-answering tasks
  • Encouragement for experimentation, creativity, and collaboration within the team

Next Steps

  • Data Commitment: Greg will commit and push the remaining ground truth data for student use.
  • System Implementation: Students will implement and test enhanced RAG systems using the provided framework and ground truth data.
  • Embedding Experimentation: Students will experiment with local versus cloud-based embeddings.
  • Model Exploration: Investigate various embedding models (e.g., Voyage, OpenAI).
  • Metadata Enhancement: Consider incorporating additional meta-information into embeddings to improve retrieval performance.
  • Pre-embedding Summarization: Experiment with LLM summarization of code blocks before embedding.
  • Query Development: Develop and test queries targeting specific code patterns or structures.
  • Performance Evaluation: Utilize the provided retrieval performance script to generate metrics and compare enhancements against baseline implementations.
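The retrieval performance script itself belongs to the course framework; as a hedged sketch of the kind of metric such a script might report, comparing an enhanced run against a baseline could look like this (chunk IDs and scores are illustrative):

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Compare the top-k retrieved chunk IDs against ground-truth IDs."""
    top_k = retrieved[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Illustrative data: ground truth for one question, plus two retrieval runs.
ground_truth = ["chunk_3", "chunk_7"]
baseline = ["chunk_1", "chunk_3", "chunk_9"]
enhanced = ["chunk_3", "chunk_7", "chunk_9"]

print(precision_recall_at_k(baseline, ground_truth, 3))  # precision 1/3, recall 1/2
print(precision_recall_at_k(enhanced, ground_truth, 3))  # precision 2/3, recall 1
```

Running both configurations through the same metric is what makes the baseline comparison meaningful.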

Detailed Discussion Topics

Enhancing Embedding Model and Hybrid Experiment

Greg presented recent work on enhancing the embedding model while emphasizing the need for higher confidence in the ground truth data. Key discussion points included:

  • Creating a new prompt that is fed into Claude 3.7 to identify necessary code chunks for answering questions.
  • Evaluating a hybrid experiment using local solutions for XP 6, which is smaller and more manageable on available hardware.
  • Soliciting feedback and suggestions for further improvements.

Code Chunk Reduction Discussion

The team examined efforts to reduce the number of code chunks with mixed results:

  • Reducing the number of chunks produced inconsistent outcomes.
  • There was discussion about the relevance of individual functions within the code and their detectability during retrieval.
  • It was proposed that a more restricted number of chunks might actually benefit retrieval performance.
  • The matter was set aside for further discussion.
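The pruning criteria were left open, but one simple direction is dropping chunks unlikely to help retrieval. A hedged sketch, with purely illustrative heuristics (blank or import-only blocks):

```python
def prune_chunks(chunks, min_lines=2):
    """Drop chunks unlikely to help retrieval.
    Heuristics here are illustrative, not the team's actual criteria."""
    kept = []
    for chunk in chunks:
        lines = [l for l in chunk.splitlines() if l.strip()]
        if len(lines) < min_lines:
            continue  # too short to carry meaning
        if all(l.strip().startswith(("import ", "from ")) for l in lines):
            continue  # import-only block
        kept.append(chunk)
    return kept

chunks = [
    "import os\nimport sys",
    "def area(r):\n    return 3.14159 * r * r",
    "\n",
]
print(len(prune_chunks(chunks)))  # only the function chunk survives
```

Whether such restriction actually improves retrieval is exactly the open question noted above.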

Improving Code Retrieval and Summarization

Several strategies for enhancing retrieval and summarization were reviewed:

  • Modifying the ground truth prompt to provide more contextual information about the chunking process.
  • Comparing performance between pruned and original sets of code chunks.
  • Planning to evaluate enhancements by comparing ground truth data with baseline setups, utilizing a selected embedding model and Chroma as the vector database.
  • Employing a retrieval performance script to generate comparative metrics.
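The planned setup pairs a selected embedding model with Chroma as the vector database. As a hedged sketch of the underlying index-then-query pattern (a toy in-memory store with bag-of-words "embeddings", not Chroma's actual API), comparing a pruned chunk against its original might look like:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyStore:
    """Minimal stand-in for a vector DB such as Chroma: add, then query."""
    def __init__(self):
        self.docs = {}
    def add(self, doc_id, text):
        self.docs[doc_id] = embed(text)
    def query(self, text, n_results=3):
        q = embed(text)
        ranked = sorted(self.docs, key=lambda d: cosine(q, self.docs[d]),
                        reverse=True)
        return ranked[:n_results]

store = ToyStore()
store.add("full_chunk", "def parse(x): tokenize and parse the input string x")
store.add("pruned_chunk", "parse input string")
print(store.query("how is the input string parsed", n_results=1))
```

Swapping the toy store for Chroma and the word counts for real embeddings keeps the same add/query shape while changing the quality of the vectors.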

Improving System Performance and Chunk Sizes

Efforts to boost system performance were discussed through:

  • Adjusting chunk sizes to achieve higher precision.
  • Introducing a tool for more targeted retrieval.
  • Developing a new subcommand, “judge,” to better evaluate answer quality.
  • Early trials indicated that pre-processing interventions to reduce chunk sizes were promising.
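The exact pre-processing intervention was not specified; as a minimal sketch of the chunk-size lever, a naive fixed-window splitter shows the mechanism being tuned:

```python
def split_chunk(code, max_lines=6):
    """Naive pre-processing step: cap chunk size at max_lines lines.
    A real splitter would prefer function or blank-line boundaries."""
    lines = code.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]

big_chunk = "\n".join(f"line {i}" for i in range(10))
pieces = split_chunk(big_chunk, max_lines=6)
print(len(pieces))  # 2 pieces: 6 lines + 4 lines
```

Smaller chunks tend to raise precision (each hit is more focused) at some cost to context, which is the trade-off the early trials were probing.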

Code Summarization System Improvements

Improvements to the code summarization system were highlighted:

  • The system now removes irrelevant information, producing concise code chunks (2–3 lines), which increases the KE value without manual intervention.
  • It can identify and discard less relevant chunks during intervention.
  • Upgrades have been implemented using Claude 3.7 and above (native utility issues arise with versions below 3.0).
  • The process is divided into separate steps—chunking, indexing, and querying—allowing modifications of intermediate files without needing to start over.
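The separate-steps idea can be sketched as three small stages, each reading and writing a JSON intermediate so a later stage can be rerun (or its input hand-edited) without redoing earlier ones. File names and schemas here are illustrative, not the course tool's:

```python
import json
import os
import re
import tempfile

def chunk_step(source, out_path):
    """Step 1: split source text into chunks and save them to JSON."""
    chunks = [{"id": str(i), "text": t}
              for i, t in enumerate(source.split("\n\n"))]
    with open(out_path, "w") as f:
        json.dump(chunks, f)

def index_step(chunks_path, index_path):
    """Step 2: build a toy token index (stand-in for embeddings) from saved chunks."""
    with open(chunks_path) as f:
        chunks = json.load(f)
    index = {c["id"]: re.findall(r"\w+", c["text"].lower()) for c in chunks}
    with open(index_path, "w") as f:
        json.dump(index, f)

def query_step(index_path, query):
    """Step 3: return the chunk ID with the largest token overlap."""
    with open(index_path) as f:
        index = json.load(f)
    q = set(re.findall(r"\w+", query.lower()))
    return max(index, key=lambda cid: len(q & set(index[cid])))

tmp = tempfile.mkdtemp()
chunks_file = os.path.join(tmp, "chunks.json")
index_file = os.path.join(tmp, "index.json")
chunk_step("def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b",
           chunks_file)
index_step(chunks_file, index_file)
print(query_step(index_file, "where is sub defined"))
```

Because each stage only touches its own files, modifying `index.json` by hand and rerunning only `query_step` works exactly as described.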

Simplifying JSON Files for Retrieval

Discussions also focused on:

  • Utilizing different language models (e.g., GPT-3.5 Turbo, GPT-4o mini) to answer questions.
  • Recognizing the challenges of using large language models as the ultimate judge in retrieval-augmented generation (RAG).
  • Streamlining and organizing JSON files by removing unnecessary elements.
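Streamlining the JSON amounts to whitelisting the fields the retrieval step actually reads. A hedged sketch (the `keep` list is illustrative, not the project's actual schema):

```python
import json

def simplify(record, keep=("id", "text", "file")):
    """Drop keys the retrieval step never reads; 'keep' is an
    illustrative whitelist, not the course tool's schema."""
    return {k: v for k, v in record.items() if k in keep}

raw = {"id": "c1", "text": "def f(): pass", "file": "util.py",
       "embedding_debug": [0.1, 0.2], "log": "chunked at 08:11"}
print(json.dumps(simplify(raw)))  # only id, text, and file remain
```

Smaller, flatter records are also easier to inspect by hand between pipeline stages.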

Code Analysis With RAG Systems

The potential of RAG systems for code analysis was a key topic:

  • Emphasized the importance of analyzing code structure rather than relying solely on comments or names.
  • Proposed incorporating additional meta-information into embeddings.
  • Suggested developing new prompts for more insightful retrievals.
  • Experimented with queries targeting specific patterns, such as loops, and encouraged further suggestions from the team.
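The loop-targeting idea illustrates analyzing structure rather than comments or names: a query about loops can be answered from the syntax tree even if no identifier mentions looping. A minimal sketch using Python's `ast` module:

```python
import ast

def find_loops(source):
    """Return line numbers of for/while loops by walking the AST —
    structural analysis, independent of comments or identifier names."""
    tree = ast.parse(source)
    return [node.lineno for node in ast.walk(tree)
            if isinstance(node, (ast.For, ast.While))]

code = """\
def total(xs):
    s = 0
    for x in xs:   # line 3
        s += x
    while False:   # line 5
        pass
    return s
"""
print(find_loops(code))  # [3, 5]
```

Structural facts like these could be attached to chunks as the meta-information proposed above, making pattern queries ("find the loops") directly answerable.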

Exploring AI Models and Embedding Techniques

Finally, the discussion featured an exploration of various embedding strategies and models:

  • Weighed local versus cloud-based embeddings.
  • Considered different AI models and techniques, such as those offered by Voyage and OpenAI.
  • Stressed the role of semantic similarity in query formulation.
  • Encouraged creative approaches and individual input for enhancing system performance.
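The local-versus-cloud comparison hinges on the fact that both kinds of backend produce vectors consumed by the same similarity computation. A hedged sketch with a deterministic toy "local" embedder (vocabulary term counts; a cloud backend such as Voyage or OpenAI would return a dense vector from an API call but plug in identically):

```python
import math

def embed(text, vocab):
    """Toy 'local' embedding: normalized term counts over a fixed vocabulary.
    A cloud model would return a dense vector but feed the same pipeline."""
    tokens = text.lower().split()
    counts = [tokens.count(w) for w in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def similarity(a, b):
    """Cosine similarity (vectors are already unit-normalized)."""
    return sum(x * y for x, y in zip(a, b))

docs = ["iterate over a list of items", "open a network socket"]
query = "iterate over a list"
vocab = sorted({w for text in docs + [query] for w in text.lower().split()})

q = embed(query, vocab)
scores = [similarity(q, embed(d, vocab)) for d in docs]
print(scores)  # the loop-related document scores higher
```

This is also why semantic similarity matters in query formulation: the query is embedded by the same backend as the chunks, so phrasing it in the chunks' vocabulary moves it closer in vector space.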

Visual Representation: Code Summarization Workflow

The following diagram illustrates the overall workflow for code summarization and retrieval improvements:

```mermaid
flowchart TD
    A[Raw Code Input]
    B[Chunking Process]
    C[Indexing with Enhanced Embedding Model]
    D[Querying & Retrieval]
    E[Code Summarization & QA]
    A --> B --> C --> D --> E
```

Visual Representation: Project Enhancement Components

This diagram summarizes the key components and experimental approaches discussed during the meeting:

```mermaid
flowchart LR
    A[Enhancing Embedding Models] --> B[Improved Code Retrieval]
    A --> C[Enhanced Summarization]
    A --> D[Dynamic Query Generation]
    B --> E[Performance Evaluation]
    C --> E
    D --> E
```

This summary provides an organized overview of the meeting discussions and clearly outlines the next steps and experimental approaches aimed at enhancing the code analysis and retrieval system.