
Meeting Summary: CS 486-686 Lecture/Lab, Spring 2025

Date: February 25, 2025, 08:05 AM (Pacific Time – US & Canada)
Meeting ID: 893 0161 6954


Quick Recap

  • Project Progress:
    Greg discussed the development of a baseline implementation for a RAG (Retrieval-Augmented Generation) system using LlamaIndex with ChromaDB as the vector store.

  • Challenges:
    The discussion highlighted several obstacles:
    • Difficulties with using well-known code bases (e.g., a Unix kernel), which are likely already present in the models' training data.
    • Issues encountered with the code splitter.
    • The need to thoroughly evaluate the retrieval process.
  • Team Collaboration:
    The introduction of GitHub Classroom for organizing group projects was discussed.

  • Roadmap:
    The immediate roadmap includes working on the baseline implementation and reading an assigned paper for class discussion.

Next Steps

  • Read the Project 2 Specification: All students are to review the specification posted on the course website.
  • Review the Assigned Paper: All students should read the RAG paper to be discussed in the upcoming Tuesday class.
  • Begin the Baseline RAG Implementation: Start working on implementing the RAG system using the code splitter.
  • Team Formation: Students may form groups of up to 3 members for Project 2 as preferred.
  • Troubleshoot Code Splitter Issues: Greg will continue to address the C code parsing issues with the code splitter.
  • Explore Local Embedding Models: Students should investigate local embedding models as alternatives to cloud APIs.
  • Research Evaluation Metrics: Experiment with different metrics for assessing RAG system performance.
  • Attend Guest Lecture: All students must attend tomorrow’s session for part 2 of the guest lecture by Andrej Karpathy.
  • Prepare Metrics Materials: Greg is tasked with preparing materials on RAG metrics for Thursday’s class.
  • Join GitHub Classroom: Students should join the GitHub Classroom for Project 2 and form teams if collaborating.
  • Share Progress on Campuswire: Students who make headway with the code splitter issues are encouraged to share their findings.

Detailed Discussion Topics

Project Progress and Upcoming Tasks

  • The project specifications and the upcoming discussion paper are available on the class website.
  • The team is working on a baseline implementation using a code splitter.
  • Students are encouraged to form groups (maximum of three), keeping in mind that larger teams face higher expectations.
  • Collaboration on troubleshooting the code splitter is encouraged due to persistent parsing issues with C code.

LLM Performance and Reasoning Challenges

  • Performance Measurement:
    Addressing the challenges in constructing an LLM application, the discussion emphasized the importance of:
    • Quantifying system performance.
    • Measuring retrieval effectiveness.
    • Establishing solid ground truth metrics.
  • Reasoning Capabilities:
    The conversation also covered the nuances of reasoning in LLMs, referencing work by OpenAI and DeepSeek. It was also noted that Anthropic claims its latest Claude model can balance fast responses with variable reasoning depth, though the accuracy of these claims remains in question.

Aider Leaderboard Performance and Tools

  • Benchmark Performance:
    The newly released Claude 3.7 Sonnet was noted to perform slightly better on the Aider leaderboard than the previous leader (DeepSeek R1 paired with Claude 3.5 Sonnet).

  • Tool Highlights:
    • Discussion on the cost implications of running benchmarks.
    • Anthropic’s upgrade from Claude 3.5 to 3.7 was mentioned.
    • Interest in Anthropic’s new model card and API was expressed.
    • Claude Code, Anthropic’s new terminal-based coding assistant, was introduced.
  • Knowledge Sources:
    The potential of using existing code bases as knowledge sources was discussed, with examples such as xv6 and LlamaIndex itself. The role of code splitter and parser tools was underscored as crucial for broader applicability; loading such a corpus is illustrated below.
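
As a small illustration of the ingestion step, the snippet below loads the C sources of a project such as xv6 into LlamaIndex documents. This is a sketch rather than the class code: the directory path is a placeholder for a local clone, and it assumes llama-index ≥ 0.10.

```python
from llama_index.core import SimpleDirectoryReader

# Load every C source and header file from a local checkout of the
# target code base ("xv6-riscv" is a placeholder clone directory).
documents = SimpleDirectoryReader(
    "xv6-riscv",
    recursive=True,               # descend into kernel/, user/, etc.
    required_exts=[".c", ".h"],   # skip Makefiles, docs, and binaries
).load_data()

print(f"Loaded {len(documents)} source files")
```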

Developing the Baseline RAG System

  • Implementation Strategy:
    The baseline was developed leveraging LlamaIndex and ChromaDB; a minimal end-to-end sketch follows the bullets below.

  • Enhancements Proposed:
    • Utilize a code splitter to build a simple RAG system.
    • Improve chunking techniques and metadata post-processing.
    • Verify the functionality of the chunker.
    • Enhance data with additional context such as file paths and code line information.
  • Embedding Options:
    Different methods for computing embeddings were discussed:
    • OpenAI’s embedding service.
    • Stevens’ free service.
    • Local embedding models.
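
Putting these pieces together, here is a minimal end-to-end sketch of the baseline, assuming llama-index ≥ 0.10 with the ChromaDB and tree-sitter extras installed. The `documents` variable comes from the loading sketch earlier; the collection name, chunking parameters, and query are illustrative, and the embedding model defaults to OpenAI’s service unless overridden.

```python
import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.node_parser import CodeSplitter
from llama_index.vector_stores.chroma import ChromaVectorStore

# Split the C sources into syntax-aware chunks using tree-sitter.
splitter = CodeSplitter(
    language="c",
    chunk_lines=40,          # illustrative chunking parameters
    chunk_lines_overlap=15,
    max_chars=1500,
)
nodes = splitter.get_nodes_from_documents(documents)  # `documents` from the loading sketch

# SimpleDirectoryReader already records file_path in each node's metadata;
# further context (e.g., line ranges) can be attached the same way.

# Persist embeddings in a local ChromaDB collection so indexing runs once.
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("baseline_rag")  # placeholder name
vector_store = ChromaVectorStore(chroma_collection=collection)
storage = StorageContext.from_defaults(vector_store=vector_store)

# Build the index; the embedding model defaults to OpenAI unless overridden.
index = VectorStoreIndex(nodes, storage_context=storage)

# Ask the baseline system a question about the code base.
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("How does the scheduler pick the next process to run?"))
```

Persisting the Chroma collection means the embedding pass, typically the slowest and most expensive step, is paid once rather than on every run.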

Below is a mermaid diagram outlining the RAG system development process:

```mermaid
flowchart TD
    A["Start: Baseline RAG System"]
    B["Integrate LlamaIndex & ChromaDB"]
    C["Implement Code Splitter"]
    D["Enhance Chunking Techniques"]
    E["Add Metadata (File Paths, Code Lines)"]
    F["Compute Embeddings (Multiple Options)"]
    G["Evaluate Retrieval Performance"]

    A --> B --> C --> D --> E --> F --> G
```

Measuring Retrieval Performance and Model Evaluation

  • Key Considerations:
    • The distinction between the system’s ability to retrieve relevant information and its effectiveness in using that information.
    • Employing human inspection alongside automated evaluations.
    • Using older models to generate synthetic data as a testing method.
  • Prompt Construction:
    An LLM can be used to construct detailed prompts and synthetic test questions, aiding the measurement of RAG and embedding performance, as sketched below.
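
One way to combine these ideas is to have an LLM synthesize a question for each chunk; the originating chunk then serves as retrieval ground truth. The sketch below assumes the `nodes` list from the baseline sketch is available, and the model and prompt wording are examples, not what was used in class.

```python
from llama_index.llms.openai import OpenAI

# Generate one question per code chunk; the chunk that produced the
# question is treated as the "relevant" result the retriever should find.
llm = OpenAI(model="gpt-4o-mini")  # placeholder model choice

eval_pairs = []
for node in nodes[:50]:  # a sample of chunks from the baseline index
    question = llm.complete(
        "Write one specific question a developer might ask that is "
        f"answered by the following code:\n\n{node.text}"
    ).text.strip()
    eval_pairs.append((question, node.node_id))
```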

Challenges of Advanced Open Weight Models

  • Code Base Dilemma:
    Well-known code bases (e.g., the Unix kernel) are likely already present in the models’ training data, which muddies evaluation, while less common code bases are harder to source.

  • Local Model Limitations:

    • Many companies opt for local models to avoid cloud APIs and protect intellectual property.
    • Running advanced open-weight models locally demands significant hardware investment. Quantized versions are more accessible but do not perform as well as the full models.
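
In the same spirit of avoiding cloud APIs, the embedding side of the baseline can run locally. A minimal sketch, assuming the llama-index-embeddings-huggingface package is installed and using BAAI/bge-small-en-v1.5 purely as an example of a small open model:

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Swap the default cloud embedding service for a small open model that
# runs on a laptop CPU; any sentence-transformers model can be substituted.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Any index built after this point embeds locally instead of calling OpenAI.
```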

ChatGPT and API Security

  • Data Security Concerns:
    • There is ongoing concern about the security of coding and data when using external APIs.
    • Greg argued that language models should be treated like any other cloud service, given that businesses routinely trust these services with their data.
  • Execution Over Ideas:
    The discussion emphasized that execution and dedication to improvement are paramount, illustrated by a reference to the Winklevoss twins’ lawsuit against Mark Zuckerberg: success relies more on effective execution than on the strength of the initial idea.

Evaluation of the Retrieval Step

  • Importance of Evaluation:
    Evaluating the retrieval step is crucial for benchmarking the extraction of relevant code snippets and related information.

  • Proposed Methods:

    • Automated evaluation with possible human oversight; a scoring sketch follows this list.
    • Using a reasoning model to generate ground truth for comparisons.
    • Continued work on enhancing the baseline implementation through improved chunking and metadata inclusion.
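
Given question–chunk pairs such as those generated earlier, the retrieval step can be scored automatically. Below is a sketch of hit rate at k, assuming the `index` and `eval_pairs` objects from the previous sketches; other metrics (e.g., reciprocal rank) follow the same pattern.

```python
# A "hit" means the chunk that generated the question shows up in the
# retriever's top-k results for that question.
retriever = index.as_retriever(similarity_top_k=5)

hits = 0
for question, expected_id in eval_pairs:
    retrieved_ids = [r.node.node_id for r in retriever.retrieve(question)]
    hits += expected_id in retrieved_ids

print(f"Hit rate@5: {hits / len(eval_pairs):.2f}")
```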

Technical Discussions and API Usage

  • Topics Covered:
    • Code splitting techniques.
    • Deployment and usage of models from Hugging Face on CPUs.
    • Laptop memory requirements for running large models.
    • Code changes and troubleshooting for the code splitter.
  • Additional Remarks:
    Brief mentions were made about office temperature issues and the potential reactivation of a cooling system that had been non-operational for seven years.

Code Splitter Troubleshooting and GitHub Classroom

  • Troubleshooting Efforts:
    • Upgrading Tree-sitter and adjusting language-specific settings.
    • Considering the setup of a fresh virtual environment; a quick splitter sanity check is sketched at the end of this section.
  • Team Collaboration:
    • The use of GitHub Classroom was introduced to facilitate group projects.
    • Emphasis was placed on forming teams, inviting collaborators, and choosing effective team names.
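
For the splitter issues above, a quick sanity check after rebuilding the environment is to run the CodeSplitter on a tiny C program in isolation. This is a sketch, assuming the tree-sitter C grammar is installed (recent llama-index builds pull it in via a language pack).

```python
from llama_index.core.node_parser import CodeSplitter

# Minimal C program: if this fails to split, the problem is in the
# tree-sitter installation rather than in the project's own code.
c_source = """
#include <stdio.h>

int main(void) {
    printf("hello\\n");
    return 0;
}
"""

splitter = CodeSplitter(language="c")
chunks = splitter.split_text(c_source)
print(f"Parsed into {len(chunks)} chunk(s)")
```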

Project Progress and Future Plans

  • Current Challenges:
    • Addressing installation issues with a specific tool.
    • Considering the creation of a custom node parser for LlamaIndex.
    • Managing simultaneous work on a shared repository, which may introduce merge conflicts.
  • Planned Supports:
    • Utilizing ChatGPT for assistance.
    • Continuing the baseline implementation and discussing metrics for RAG performance on Thursday.
  • Timeline:
    The project is due after spring break, and a demo day is scheduled to showcase progress.

Below is a mermaid flowchart representing the overall project workflow and next steps:

```mermaid
flowchart LR
    A["Review Project 2 Spec & RAG Paper"]
    B["Begin Baseline RAG Implementation"]
    C["Form Groups (Up to 3 Members)"]
    D["Join GitHub Classroom"]
    E["Troubleshoot Code Splitter Issues"]
    F["Enhance RAG System (Chunking, Metadata)"]
    G["Evaluate Retrieval Performance"]
    H["Prepare for Demo Day Post Spring Break"]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
```

Conclusion

The meeting covered extensive topics including project progress, challenges in LLM performance and reasoning, leader board performance, technical hurdles, and security issues with APIs. The discussion underlined the importance of collaborative problem-solving, rigorous evaluation metrics, and systematic improvements to the RAG system. Clear next steps were outlined for advancing the project and preparing for upcoming class sessions and deliverables.
