Chip Huyenのフィード｜生成AI・LLM関連テックブログRSS

Chip Huyen

I help companies deploy machine learning into production. I write about AI applications, tooling, and best practices.

フィード

Common pitfalls when building generative AI applications

Chip Huyen

As we’re still in the early days of building applications with foundation models, it’s normal to make mistakes. This is a quick note with examples of some of the most common pitfalls that I’ve seen, both from public case studies and from my personal experience. Because these pitfalls are common, if you’ve worked on any AI product, you’ve probably seen them before. 1. Use generative AI when you don't need generative AI Every time there’s a new technology, I can hear the collective sigh of senior engineers everywhere: “Not everything is a nail.” Generative AI isn’t an exception — its seemingly limitless capabilities only exacerbate the tendency to use generative AI for everything. A team pitched me the idea of using generative AI to optimize energy consumption. They fed a household’s list of energy-intensive activities and hourly electricity prices into an LLM, then asked it to create a schedule to minimize energy costs. Their experiments showed that this could help reduce a household’s ...

2ヶ月前

Agents

Chip Huyen

Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines the field of AI research as “the study and design of rational agents.” The unprecedented capabilities of foundation models have opened the door to agentic applications that were previously unimaginable. These new capabilities make it finally possible to develop autonomous, intelligent agents to act as our assistants, coworkers, and coaches. They can help us create a website, gather data, plan a trip, do market research, manage a customer account, automate data entry, prepare us for interviews, interview our candidates, negotiate a deal, etc. The possibilities seem endless, and the potential economic value of these agents is enormous. This section will start with an overview of agents and then continue with two aspects that determine the capabilities of an agent: tools and planning. Agents,...

2ヶ月前

Building A Generative AI Platform

Chip Huyen

After studying how companies deploy generative AI applications, I noticed many similarities in their platforms. This post outlines the common components of a generative AI platform, what they do, and how they are implemented. I try my best to keep the architecture general, but certain applications might deviate. This is what the overall architecture looks like. This is a pretty complex system. This post will start from the simplest architecture and progressively add more components. In its simplest form, your application receives a query and sends it to the model. The model generates a response, which is returned to the user. There are no guardrails, no augmented context, and no optimization. The Model API box refers to both third-party APIs (e.g., OpenAI, Google, Anthropic) and self-hosted APIs. From this, you can add more components as needs arise. The order discussed in this post is common, though you don’t need to follow the exact same order. A component can be skipped if your sy...

7ヶ月前

Measuring personal growth

Chip Huyen

My founder friends constantly think about growth. They think about how to measure their business growth and how to get to the next order of magnitude scale. If they’re making $1M ARR today, they think about how to get to $10M ARR. If they have 1,000 users today, they think about how to get to 10,000 users. This made me wonder if/how people are measuring personal growth. I don’t want to use metrics like net worth or the number of followers, because that’s not what I live for. After talking with a lot of friends, I found three interesting metrics: rate of change, time to solve problems, and number of future options. Some friends told me they find this blog post mildly sociopathic. Why do I have to measure everything? Life is to be lived, not to be measured. As someone lowkey fascinated by numbers, I don’t see why measuring and living have to be mutually exclusive – measuring often helps me live better – but I see where they come from. This post is more of a thought exercise than a rigoro...

1年前

What I learned from looking at 900 most popular open source AI tools

Chip Huyen

[Hacker News discussion, LinkedIn discussion, Twitter thread] Four years ago, I did an analysis of the open source ML ecosystem. Since then, the landscape has changed, so I revisited the topic. This time, I focused exclusively on the stack around foundation models. The full list of open source AI repos is hosted at llama-police. The list is updated every 6 hours. You can also find most of them on my cool-llm-repos list on GitHub. Data I searched GitHub using the keywords gpt, llm, and generative ai. If AI feels so overwhelming right now, it’s because it is. There are 118K results for gpt alone. To make my life easier, I limited my search to the repos with at least 500 stars. There were 590 results for llm, 531 for gpt, and 38 for generative ai. I also occasionally checked GitHub trending and social media for new repos. After MANY hours, I found 896 repos. Of these, 51 are tutorials (e.g. dair-ai/Prompt-Engineering-Guide) and aggregated lists (e.g. f/awesome-chatgpt-prompts). While thes...

1年前

Predictive Human Preference: From Model Ranking to Model Routing

Chip Huyen

A challenge of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt? Predictive human preference aims to predict which model users might prefer for a specific query. Human preference has emerged to be both the Northstar and a powerful tool for AI model development. Human preference guides post-training techniques including RLHF and DPO. Human preference is also used to rank AI models, as used by LMSYS’s Chatbot Arena. Chatbot Arena aims to determine which model is generally preferred. I wanted to see if it’s possible to predict which model is preferred for each query. One use case of predictive human preference is model routing. For example, if we know in advance that for a prompt, users will prefer Claude Instant’s response over GPT-4, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing c...

1年前

Generation configurations: temperature, top-k, top-p, and test time compute

Chip Huyen

ML models are probabilistic. Imagine that you want to know what’s the best cuisine in the world. If you ask someone this question twice, a minute apart, their answers both times should be the same. If you ask a model the same question twice, its answer can change. If the model thinks that Vietnamese cuisine has a 70% chance of being the best cuisine and Italian cuisine has a 30% chance, it’ll answer “Vietnamese” 70% of the time, and “Italian” 30%. This probabilistic nature makes AI great for creative tasks. What is creativity but the ability to explore beyond the common possibilities, to think outside the box? However, this probabilistic nature also causes inconsistency and hallucinations. It’s fatal for tasks that depend on factuality. Recently, I went over 3 months’ worth of customer support requests of an AI startup I advise and found that ⅕ of the questions are because users don’t understand or don’t know how to work with this probabilistic nature. To understand why AI’s responses ...

1年前

Multimodality and Large Multimodal Models (LMMs)

Chip Huyen

For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). However, natural intelligence is not limited to just a single modality. Humans can read, talk, and see. We listen to music to relax and watch out for strange noises to detect danger. Being able to work with multimodal data is essential for us or any AI to operate in the real world. OpenAI noted in their GPT-4V system card that “incorporating additional modalities (such as image inputs) into LLMs is viewed by some as a key frontier in AI research and development.” Incorporating additional modalities to LLMs (Large Language Models) creates LMMs (Large Multimodal Models). Not all multimodal systems are LMMs. For example, text-to-image models like Midjourney, Stable Diffusion, and Dall-E are multimodal but don’t have a language model component. Multimodal can mean one or more of the following: Input and output are o...

1年前

Open challenges in LLM research

Chip Huyen

[LinkedIn discussion, Twitter thread] Never before in my life had I seen so many smart people working on the same goal: making LLMs better. After talking to many people working in both industry and academia, I noticed the 10 major research directions that emerged. The first two directions, hallucinations and context learning, are probably the most talked about today. I’m the most excited about numbers 3 (multimodality), 5 (new architecture), and 6 (GPU alternatives). 1. Reduce and measure hallucinations Hallucination is a heavily discussed topic already so I’ll be quick. Hallucination happens when an AI model makes stuff up. For many creative use cases, hallucination is a feature. However, for most other use cases, hallucination is a bug. I was at a panel on LLM with Dropbox, Langchain, Elastics, and Anthropic recently, and the #1 roadblock they see for companies to adopt LLMs in production is hallucination. Mitigating hallucination and developing metrics to measure hallucination is a ...

2年前

Generative AI Strategy

Chip Huyen

I had a lot of fun preparing the talk: “Leadership needs us to do generative AI. What do we do?” for Fully Connected. The idea for the talk came from many conversations I’ve had recently with friends who need to figure out their generative AI strategy, but aren’t sure what exactly to do. This talk is a simple framework to explore what to do with generative AI. Many ideas are still being fleshed out. I hope to convert this into a proper post when I have more time. In the meantime, I’d love to hear from your experience through this process. I couldn’t figure out how to make the slides centered on the page. You might want to download the slides. Thanks everyone who responded to my post and shared your thoughts on what I should include in the talk. Thanks Kyle Gallatin, Goku Mohandas, Han-chung Lee, and Jamie de Guerre for thoughtful feedback on the talk.

2年前