
Meeting Summary: CS 486-686 Lecture/Lab – Spring 2025

Date: January 23, 2025, 08:07 AM Pacific Time (US and Canada)
ID: 893 0161 6954


Quick Recap

Greg led a comprehensive discussion on Large Language Models (LLMs), covering a range of topics including:

  • Tools and Frameworks: An overview of various tools for utilizing LLMs.
  • AI Applications and Limitations: Discussion on AI’s potential, its limitations, and multimodal capabilities.
  • Open Weights and AI Agents: Exploration of topics such as open weights and the concept of autonomous AI agents.
  • Artificial General Intelligence: Considerations for AGI and the importance of evaluation within AI systems.
  • Balancing AI and Human Interaction: Insights into how AI efficiency compares with the nuances of human interaction.

Next Steps

  • Tutorial Completion:
    Students are to complete the Real Python tutorial on prompt engineering.

  • Experimentation:
    Students should experiment with modifying the tutorial's data or approach to make it more engaging.

  • Support:
    Students are encouraged to reach out on Campuswire or attend office hours for help with LiteLLM or the tutorial.

  • Lecture Materials:
    Greg will make the previous evening’s lecture materials available.

  • Continued Discussion:
    Greg will continue discussing the AI/LLM landscape in the next lecture.

  • Exploring AI Agents:
    The class will further debate the role and implications of AI agents throughout the semester.

  • Evaluation Methods:
    The class is to investigate appropriate evaluation and validation methods for AI agents based on their responsibilities.

  • Advanced Topics:
    The class will explore synthetic data generation and fine-tuning techniques later in the semester.


Summary of Topics Covered

Exploring LLM Tools and Frameworks

  • Tool Setup and Usage:
    Discussion included setting up various tools for working with LLMs.

  • OpenRouter:
    Highlighted for accessing a wide range of LLMs (e.g., Google and OpenAI models) and its potential for automated testing.

  • Framework Concerns:
    Emphasis on the profusion of available frameworks, with a suggested focus on LiteLLM and LlamaIndex.

  • Introduction to Aider:
    Aider was introduced as an AI pair programmer for editing code with LLM assistance.

  • Evaluation Strategies:
    Stress on the importance of implementing evaluation processes when working with LLMs.

  • Encouragement to Experiment:
    Students were encouraged to explore and test the discussed tools.
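One concrete way to experiment with the tools above is through OpenRouter's OpenAI-compatible chat-completions endpoint, which exposes many providers' models behind one request format. The sketch below is a minimal illustration, not lecture code; the model names shown and the `OPENROUTER_API_KEY` environment variable are assumptions about a typical setup.

```python
import json
import os
import urllib.request

# OpenRouter exposes an OpenAI-compatible chat-completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload; OpenRouter routes it to the
    provider named in the model string (e.g. "openai/..." or "google/...")."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(model: str, prompt: str) -> str:
    """Send the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because every provider is reached through the same payload shape, swapping models for side-by-side comparison (or automated testing) is a one-string change.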

Language Models and Chatbot Arena

  • Model Classification:
    Language models were divided into proprietary frontier models and open-weight models (often loosely called open source).

  • Narrowing Gap:
    It was noted that the performance gap between frontier and open-weight models is shrinking.

  • Chatbot Arena:
    Introduced as a crowdsourced leaderboard that ranks models based on pairwise human preference votes.

  • Proprietary vs. Open Weight Models:
    The top models are mostly proprietary, although Meta’s Llama was noted as an open weight model.

  • Quality, Cost, and Latency Trade-offs:
    Discussion on the balance between model performance, cost, and response times.

Exploring AI Models and Multimodal Capabilities

  • Comparing AI Models:
    Limitations and potentials of various models were addressed, comparing free tiers with usage limits (e.g., Claude Sonnet) against paid, more capable offerings (e.g., OpenAI's models).

  • Industry Investment:
    Highlighted the trend of increased investment in AI to maintain competitive edges.

  • Shift to Multimodal Models:
    Emphasis on models capable of analyzing and generating rich media—demonstrated by creating a photorealistic image with an LLM.

  • Experimental Models:
    Mention of exploring models like DeepSeek and the potential of OpenRouter.

Multimodal AI, Open Weights, and LLMs

  • Advancements in Processing:
    Discussion on progress in multimodal and audio processing, particularly for enhancing conversational interactions.

  • Clarification on Open Weights vs. Open Source:
    • Open Weights: Only the trained model parameters are released.
    • Open Source: The training code and data are released along with the model.
  • Copyright Considerations:
    Addressed the controversy regarding the use of copyrighted material for AI training.

  • Programming Landscape:
    Stressed the significance of understanding deep-learning frameworks such as PyTorch for future industry applications.

Model APIs and Performance Optimization

  • API Usage:
    Covered popular options such as Ollama, LM Studio, and OpenAI.

  • AWS Bedrock:
    Noted for providing an abstraction layer and its Converse API for model access.

  • Stateless Models and Caching:
    Discussion on models being stateless (not retaining chat history) and the use of prompt caching to reduce latency and costs.

  • Memory Limitations:
    Although prompt caching optimizes performance, it is not a substitute for genuine memory, which remains an area of challenge.
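The statelessness point can be made concrete: because the model retains nothing between calls, the client must resend the full conversation every turn, and that repeated, unchanged prefix is exactly what prompt caching exploits. A minimal sketch of client-side history tracking (the class and method names are illustrative, not from the lecture):

```python
class Conversation:
    """LLM APIs are stateless: each request must carry the full chat history.
    The client accumulates messages and resends them all on every turn;
    providers can then cache the unchanged prefix (prompt caching) to
    reduce latency and cost -- without that being genuine model memory."""

    def __init__(self, system: str):
        self.messages = [{"role": "system", "content": system}]

    def add_user(self, text: str) -> list:
        self.messages.append({"role": "user", "content": text})
        return self.messages  # this entire list is what gets sent to the API

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})
```

Each new request grows by the latest turns only; everything earlier is a stable prefix, which is why caching helps most in long conversations.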

Discussion Break, Simon’s Work, and Experimentation

  • Break:
    A 10-minute break was proposed before resuming discussions on Simon’s work.

  • Continuation of AI Landscape Discussion:
    The AI landscape will be revisited in future lectures.

  • Parallel Computing Experiment:
    An experiment was suggested to verify that parallel computations can vary between runs because floating-point arithmetic is not associative.

  • Concise Summaries with Expandable Details:
    The idea of offering summaries that can expand into detailed discussions for complex topics was introduced.

  • Feedback Encouraged:
    Academic dialogue was encouraged regarding any surprising elements or questions from Simon’s presentation.
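The suggested floating-point experiment can be tried directly: addition of doubles is not associative, so regrouping the same operands (as a parallel reduction does) can change the rounded result. A minimal sketch, independent of any particular parallel framework:

```python
import math

# Floating-point addition is not associative: regrouping the same operands
# (as a parallel reduction would) can change the rounded result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one reduction order -> 0.6000000000000001
right = a + (b + c)   # another reduction order -> 0.6
print(left == right)  # False

# The effect accumulates: naive left-to-right summation drifts away from
# the correctly rounded sum that math.fsum computes.
vals = [0.1] * 10
naive = sum(vals)        # 0.9999999999999999
exact = math.fsum(vals)  # 1.0
print(naive == exact)    # False
```

In a parallel setting the grouping depends on how work is split across threads, so the same program can legitimately print slightly different totals from run to run.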

Understanding AI Agents and Decision-Making

  • Definition and Role:
    AI agents were defined as systems automating decision-making processes and interacting autonomously with their environment.

  • Output vs. Action Agents:
    Differentiation was made between agents that generate outputs (e.g., reports, code) and those that take actions (e.g., unsubscribing from services or transferring funds).

  • Human Role in Complex Processes:
    Skepticism was expressed regarding the capability of current agents to replace humans in complex, decision-intensive roles, such as HR.

  • Trust Issues:
    Concerns were noted over trusting autonomous agents lacking shared human experiences and values.
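The output-vs-action distinction suggests a natural control structure: let an agent generate text freely, but gate world-changing actions behind human approval. The toy sketch below illustrates that idea; the `Step` type and the approval policy are hypothetical, not something presented in the lecture.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    kind: str      # "output" (generate text) or "action" (affect the world)
    payload: str

def run_agent(steps: list, approve: Callable) -> list:
    """Execute an agent's plan: output steps run freely, while action
    steps are gated behind an approval policy (e.g., a human in the loop)."""
    log = []
    for step in steps:
        if step.kind == "output":
            log.append(f"produced: {step.payload}")
        elif approve(step):
            log.append(f"executed: {step.payload}")
        else:
            log.append(f"blocked: {step.payload}")
    return log
```

A policy as simple as "block anything touching funds" already separates low-stakes outputs (a report, some code) from high-stakes actions (transferring money), which is where the trust concerns above bite hardest.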

Artificial General Intelligence and Oversight

  • AGI Importance:
    The discussion emphasized understanding and evaluating artificial general intelligence (AGI).

  • Targeted AI Tasks:
    Stress was placed on defining specific tasks for AI to prevent negative outcomes.

  • Example – Self-driving Cars:
    The self-driving car example illustrated that, despite AI’s superior performance in some areas, human oversight remains necessary.

  • Shared Responsibilities:
    Highlighted the importance of aligning AI systems with shared human values and responsibilities.

AI’s Role in Human Interaction and Data

  • Potential Replacement of Human Roles:
    Discussion on AI’s potential to substitute human roles while emphasizing the need for rigorous evaluation.

  • Balancing Efficiency and Empathy:
    Emphasized that while AI may provide efficiency, human interaction and empathy are critical.

  • Synthetic Data:
    The concept of synthetic data was mentioned as a way to bolster AI model performance.

  • Encouragement for Further Exploration:
    Students were inspired to delve deeper into prompt engineering by working through a Python tutorial.
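As a taste of what a prompt-engineering tutorial covers, few-shot prompting can be reduced to assembling a structured string: a task description, worked input/output examples, then the new query. The helper below is a generic illustration; its format is an assumption, not the tutorial's actual code.

```python
def few_shot_prompt(task: str, examples: list, query: str) -> str:
    """Build a few-shot prompt: task description, worked input/output
    examples, then the new query for the model to complete."""
    lines = [task, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I loved it", "positive"), ("Terrible service", "negative")],
    "The tutorial was great",
)
```

The examples steer the model toward the desired output format far more reliably than instructions alone, which is one of the core lessons of prompt engineering.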

