2/18/2024 Generative AI In Review
This past week there were so many advancements and announcements I decided to put together a video summary walking through everything in brief detail. Check it out and let me know if you'd like to see more of this type of content!
Links mentioned in this video:
Gemini 1.5 Pro Announcement
Gemini 1.5 Technical Paper
OpenAI Sora Text-to-Video Announcement
LMSYS Chatbot Arena
Chatbot by Mason
Groq
Groq LPU PCI-e Gen4 x16
Mac Pro Technical Specs
Video Transcription:
Hey team. Today I wanted to look at some of the most recent advancements in large language models and generative AI. I thought the easiest way to do it might just be to do a video rather than an in-depth blog post. I will put links below this video to everything that we discuss here, so if you want to look at it later you should be able to find that information below. To start off with, we've got to talk about Google and their latest model, Gemini 1.5. This is a major advancement over the previous model, Gemini 1.0, which just came out a couple months ago. So it's exciting to see them iterating so quickly.
This was just released February 15th and there are two major things that I wanted to look at with this new large language model, Gemini 1.5. The first is right here at the top of the page. I think this really tells the story. This is their current Gemini 1.0 Pro model compared to Gemini 1.5 Pro. Now this is showing the difference in size of the context window. The context window is basically how much memory this large language model has in a specific conversation. One million tokens is what they'll be releasing with Gemini 1.5 Pro, but you can see they went up to 10 million tokens in research. Compare that to even the latest models here, GPT-4 and Claude 2. You can see this is a big advancement in terms of how much content a generative AI model can hold in conversation. Really exciting to think about what might be possible with a million tokens.
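To put those context window sizes in more familiar terms, here's a rough back-of-the-envelope sketch in Python. The words-per-token ratio and the audio and video token rates are my own assumptions, just rules of thumb reverse-engineered from the figures Google shows, not numbers from the paper itself.

```python
# Rough conversion from context-window tokens to familiar units.
# The ratios below are assumptions (common rules of thumb), not Google's figures.
WORDS_PER_TOKEN = 0.7          # English text averages roughly 0.7-0.75 words per token
AUDIO_TOKENS_PER_SECOND = 25   # assumed rate for tokenized audio
VIDEO_TOKENS_PER_SECOND = 275  # assumed rate for tokenized video

def describe_context_window(tokens: int) -> str:
    words = tokens * WORDS_PER_TOKEN
    audio_hours = tokens / AUDIO_TOKENS_PER_SECOND / 3600
    video_hours = tokens / VIDEO_TOKENS_PER_SECOND / 3600
    return (f"{tokens:,} tokens ≈ {words:,.0f} words, "
            f"~{audio_hours:.1f} h audio, ~{video_hours:.1f} h video")

for size in (32_000, 128_000, 1_000_000, 10_000_000):
    print(describe_context_window(size))
```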
You can see they show you here that's going to be an hour of video, 11 hours of audio, or 700,000 words. Really, really impressive, and that's just at the million level. Now one of the other things that I wanted to talk about with the Gemini Pro model, especially the 1.5 release, is a problem AI researchers are focused on with the context window, called needle in a haystack. The needle-in-a-haystack problem is a way of evaluating a context window: how well does the AI recall specific facts from anywhere in the entire context window?
So in other words, if GPT-4 has a 32,000-token context window, how good is it at spotting the right bits and pulling them up from anywhere across all 32,000 tokens? And what we've found is that models like GPT-4 and Claude 2 are really, really accurate at the end of the context window, whatever is at the very end. They're not great in the middle. At the beginning they're okay, but really the end is where we see them shine. The way you see that play out is that folks will write an elaborate prompt for one of these models, make sure to hit a specific key point a few times, and really hit it hard at the end. They're trying to make sure the model recognizes that point and produces the best results. That helps reduce cases where the model gets it wrong, which we call hallucination.
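If you're curious what one of these needle-in-a-haystack tests actually looks like, here's a minimal sketch in Python. Everything in it is hypothetical: toy_model is a stand-in that only "remembers" the tail end of its context, and in a real evaluation you'd swap in a function that calls GPT-4, Gemini, or whichever model you're measuring.

```python
def needle_in_haystack_trial(complete, filler_sentences, needle, depth_fraction):
    """Bury a known fact (the "needle") at a chosen depth inside filler text,
    then ask the model to retrieve it. `complete` stands in for whatever
    function calls the model you're evaluating and returns its text reply."""
    insert_at = int(len(filler_sentences) * depth_fraction)
    haystack = filler_sentences[:insert_at] + [needle] + filler_sentences[insert_at:]
    prompt = " ".join(haystack) + "\n\nWhat is the secret passphrase mentioned above?"
    return "sunset-462" in complete(prompt)

# Toy stand-in for a real model: it can only "see" the last 200 characters,
# mimicking a model that recalls the end of its context best.
def toy_model(prompt: str) -> str:
    return "sunset-462" if "sunset-462" in prompt[-200:] else "I don't know."

filler = ["The quick brown fox jumps over the lazy dog."] * 500
needle = "Remember this: the secret passphrase is sunset-462."
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    ok = needle_in_haystack_trial(toy_model, filler, needle, depth)
    print(f"needle at {depth:.0%} depth -> recalled: {ok}")
```

A real harness sweeps both the needle depth and the total context length and plots recall accuracy, which is exactly the kind of chart Google shows for 1.5 Pro.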
So for Gemini 1.5 Pro, as you can see on the screen here: as we move across the context window, how accurately does the model recall these items? And we have just stunning results. There's almost no degradation from the very beginning, up to 32,000 tokens, all the way to 1 million, and again, in research they even took it to 10 million and that's simply where they stopped for this paper. In other words, that's not a cap imposed by the technology, other than maybe compute; that's just where they chose to stop because they were at 10 million already. So this is a major, major improvement for Gemini 1.5 Pro. Likely we'll see this released in the next couple of months. It's already out as a developer preview. A few folks do have access to it right now, but not a lot, and again we expect general availability within the next couple of months. In 2024 we do expect that Gemini 2.0 will be released, and all bets are off on what will be included given the advancement that we've seen here in 1.5. Now if you've been following generative AI at all, you know about OpenAI, you know about ChatGPT and their text model. You've probably heard of DALL-E and their image model.
Last week the same day as this Gemini 1.5 Pro model was announced by Google, OpenAI announced their Sora model. Now Sora is a video model. You can provide it with a text description and it will create a video for you. This is not entirely new. This technology has been around for a couple of years and we've seen companies produce things like two or three or five second videos that you know are pretty decent. There was a very funny video going around a year ago where Will Smith was eating some spaghetti. Feel free to look that up but just kind of showing where the state of video AI was you know 12 months ago. This is what OpenAI released last week. So check out this video which is totally rendered by AI. We're seeing that this model even did a cut there to the same woman. It's following the prompt appropriately and it's got stunning detail on the blur effect, the city that she's walking through.
There are several examples on this page and I'm going to again share the link so you all can take a look at these videos for yourself. This one just kind of shows that it can do different styles. Here's California at the gold rush period. Look at the camera movements and the way the people continue. That was all decided by the model. This is probably one of my favorites. It almost feels like you're playing a driving video game as we follow along behind this vehicle going down a winding road around the mountain. Just amazing that this can be done from a text prompt. This is another one of my absolute favorites here. The prompt is just a bustling, snowy Tokyo city, and we're kind of following these two people. Notice the way the shot is very cinematic as we come down behind this couple walking. It's really impressive. There are a lot of them here, and of course, as with any of these AI models, they're not perfect. So I also have some examples like this one where the model is just totally hallucinating what a chair would look like coming out of sand. Let's actually look at that again. I just think this is crazy. Wow. It forms into a chair. So lots of exciting things are happening here with OpenAI and their Sora model.
Now this model is not released yet, and we don't expect it to be released for roughly another 8 to 10 months. That follows the historical pattern we've seen with GPT-4 and DALL-E 3: from the time they were first exposed to the public and brought into awareness, it was about 8 to 10 months before public access, and we can kind of guess why. This year in the United States is an election year, and there's already a ton of concern over misinformation and how this technology might be used to mislead people. So I can imagine OpenAI's leadership making the decision that sometime after the election might be a better, safer time for them to release this model. So it's not available yet, but it is an incredibly exciting preview of what is possible today. Again, this is the worst this technology will ever be. It gets better from here. So this is a really exciting place.
The third thing that I wanted to mention was something I saw on X. An X user was looking at a chatbot arena for open source AI models. So this would not be GPT-4, Claude, and Gemini. These are open source models that are available to the public. You can download them, and if you have hardware that can support the models, you can actually run them locally. One of the most exciting companies in this space is Mistral, who have said they have a model they are going to release that can beat GPT-4. We haven't actually seen that model yet. We've seen the paper. It does appear that this model is now available in limited form via API access.
I had a look at this myself and found that, yeah, this model here called Mistral Next does produce some pretty good responses. So let's just think about what we might use this for. How many islands are there in the South Pacific? I don't know. Let's just see what happens here. Approximately 20,000 to 30,000 depending on the size of island you count, so the exact number varies. We're getting a pretty good response here, and a pretty fast one. Again, this is an open source model. We'll do some direct comparisons between Mistral Next and, let's say, GPT-4. This arena site is really interesting because we can actually do that kind of side-by-side. Here's the Gemini Pro dev build. We can do the GPT-4 preview. Pretty cool.
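If you want to poke at Mistral's hosted models outside the arena, here's a minimal sketch of what an API call looks like using their OpenAI-style chat completions endpoint. The model ID below is a placeholder; I don't know exactly which identifier the Next model is or will be exposed under, so check Mistral's docs for what your account can actually use.

```python
import os
import requests

# Minimal sketch of a Mistral API call (OpenAI-style chat completions).
# Assumes a MISTRAL_API_KEY environment variable; the model ID is a placeholder.
API_KEY = os.environ["MISTRAL_API_KEY"]

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "mistral-medium",  # swap in whichever model ID you have access to
        "messages": [
            {"role": "user", "content": "How many islands are there in the South Pacific?"}
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```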
Now if you're interested in testing out these models yourself, I do have a spot for you all to do that. You can head over to the chat link on my website and that will take you to this page where you can actually sign up for an account. I already have one here. I have made some of these models available for folks to use in your own testing. So if you take a look up here, we've got GPT-4 Turbo. We do have some of the Mistral models; I have the Tiny, Small, and Medium models, not the Next model that we were just looking at, but that is something I'm looking at adding. We also have the Perplexity models if you're interested in those. And I'm always adding and testing and tweaking the models that are available here.
So if you're interested in this, explore it and check it out. Come over here to this chat link and give it a try. Let me know what you think about it. And if you have any additional questions about these large language models, I'd be happy to hear from you and learn more about your specific questions and needs around chat models. The final thing that I thought was really interesting that came out last week, again from X, was this company called Groq. Now Groq has been around for a while. They've been exploring hardware for generative AI. What they've done is build what they call an LPU, a language processing unit, which is different from a CPU; it's specific hardware meant for large language models. So I took a look at what they have here and, you know, let's give this a try.
Again, it's interesting that they're using the open source Mistral model here, but this is running on their cloud infrastructure, so different hardware than what we would typically expect these types of models to run on. And let's run that same query again. How many islands are there in the South Pacific? So let's see how this goes. And look at that, we're already done. Let's do a different prompt actually, because that was so fast I don't think we really got to appreciate it; that was 427 tokens per second. Let's do something a little more complicated. It's a cold day and I want to create a warm party soup. Help me create a recipe that includes fresh mushrooms. What else do we want in there? And tomato. There we go. It's already done. So did you catch that? In 2.5 seconds we have: sure, I'd be happy to help you come up with a warm, hearty soup. Here are your ingredients: mushrooms, onion, garlic, a tablespoon of olive oil. There we go. The tomatoes are in there too. Vegetable broth. Sounds pretty good. There we go.
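If you want to check a tokens-per-second number like that yourself, here's a small, provider-agnostic sketch. The generate function is a placeholder for whatever call you make to Groq, OpenAI, or a local model; it just needs to return the text along with the completion token count that most hosted APIs report in their usage field.

```python
import time

def measure_tokens_per_second(generate, prompt: str):
    """Time one completion and report rough throughput.

    `generate` is a placeholder: any function that sends `prompt` to your
    provider and returns (text, completion_tokens)."""
    start = time.perf_counter()
    text, completion_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    print(f"{completion_tokens} tokens in {elapsed:.2f}s "
          f"-> {completion_tokens / elapsed:.0f} tokens/sec")
    return text
```

Roughly 427 tokens coming back in about a second would print 427 tokens/sec, which is far beyond what the big hosted models typically deliver today.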
Now this preview model, we can't have ongoing conversations, but this is really interesting, right? This is extremely fast. This is probably three times faster, maybe eight times faster than you'll see in standard models like GPT-4 or Gemini. This is really, really fast. That got me thinking, what are they really doing here? And so I took a look at Groq's website and you can read a little bit about like their LPU specifically and what that means and why is that so much faster. And if you scroll down on this page, you'll see they even show their hardware. So what we just tested was Groq Cloud. That's the inference that they're running on their own special cluster of these LPUs. If you look down the line here, you'll see they also have the Groq Rack, Groq Node, and Groq Card.
The Groq Card itself, which is sort of the foundational piece of this, is a single chip in a standard PCIe Gen 4 x16 form factor. That's a single card. They have a 4U chassis, what they call a node, and that has eight interconnected cards. And then they have the Groq Rack, which is 64 of these integrated chip cards. Who knows how many they have running on Groq Cloud, which is what we just got to experience. But if you want a Groq Card, one of these new LPUs rather than a GPU, they actually are available right now. So I took a look to see what these cost, and you can see here, I could go ahead and order one right now for the convenient price of $20,000. It's a lot of money. That's what one of these costs. You can sort of work backwards from that to think, okay, well this has eight of them, this has 64 of them, the type of costs that we're talking about with these. But a single one of them is $20,000. What in the world would I run that on?
Well, I need something that has a PCIe Gen 4 x16 slot, first of all. And wouldn't you know it, looking at the new Mac Pro, either of these versions, but they do have a rack-mounted one, which I kind of like. As I was looking through the expansion support, they have two x16-size, full-length PCI Express Gen 4 slots. So you could install two of these LPUs inside of a Mac Pro. If you're spending $40,000 on a couple of LPUs, maybe a $7,500 Mac Pro makes a little bit more sense.
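Just to make that back-of-the-envelope math concrete, here's a tiny sketch of the numbers. These are card-count multiples plus the Mac Pro price mentioned above; they ignore chassis, interconnect, and any volume pricing, so treat them as order-of-magnitude figures only.

```python
# Rough cost math working up from the ~$20,000 card price quoted above.
# Card-count multiples only -- no chassis, interconnect, or volume pricing.
CARD_PRICE = 20_000
MAC_PRO_PRICE = 7_500  # rack-mount Mac Pro, base configuration

configs = {
    "GroqCard (1 chip)": 1 * CARD_PRICE,
    "GroqNode (8 cards, 4U)": 8 * CARD_PRICE,
    "GroqRack (64 cards)": 64 * CARD_PRICE,
    "2 cards + rack-mount Mac Pro": 2 * CARD_PRICE + MAC_PRO_PRICE,
}
for name, cost in configs.items():
    print(f"{name:<30} ~${cost:,}")
```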
I have no idea when or if Apple will ever actually support these LPUs or what that would look like. I imagine you would need a completely different operating system and maybe some sort of dual boot on the Mac to use them. No idea. If you're interested in exploring this and you have the resources to find out, I'd love to talk. That sounds like it would be a lot of fun. But it is interesting to speculate on where this could be going and how this hardware will become cheaper and more accessible. And obviously the language models themselves are improving and becoming more efficient. You can kind of see in the distance a future of in-home AI; picture an AI that runs the house for you. Very interesting stuff. Let me know if you like this type of content where I look at the different news and we do a more casual video format here. Take a look at the links below and I will see you all next week.