
Frank Liu and Jason Lopatecki

Extending the Context Window of LLaMA Models Paper Reading

Published Aug 7, 2023

Sarah Welsh

Contributor

Introduction

During this week’s paper reading event, we are thrilled to be joined by Frank Liu, Director of Operations and ML Architect at Zilliz, who will share valuable insights with us. This paper examines Position Interpolation (PI), a method that extends the context window of LLaMA models up to 32,768 positions with minimal fine-tuning. The extended models showed strong results on tasks requiring long context and retained their quality within the original context window. PI avoids catastrophically high attention scores by linearly down-scaling input position indices to match the original context window size, rather than extrapolating beyond the trained length. The authors demonstrate that the method is stable, and the extended models can reuse existing optimization and infrastructure.

Join us every Wednesday as we discuss the latest technical papers, covering a range of topics including large language models (LLM), generative models, ChatGPT, and more. This recurring event offers an opportunity to collectively analyze and exchange insights on cutting-edge research in these areas and their broader implications.


Main Takeaways

Positional embeddings are not talked about as much as they should be; they are an important signal that lets transformers understand the ordering of tokens.
Extrapolating RoPE was causing catastrophic issues when extending context windows, and much of the research community was off in what look like “incorrect” directions on the cause.
LLaMA community hacker/builder kaiokendev discovered the problem and the solution through incredibly solid debugging.
The Meta paper backed up the diagnosis with a lot of technical detail: extrapolating the positional embedding causes attention scores to explode.
Interpolating RoPE was a tiny change that was able to 4x the context window with solid performance on current models.

Transcript

Jason Lopatecki, Co-Founder and CEO, Arize AI: How’s it going? Hey, Frank, how are you?

Frank Liu, Director of Operations and ML Architect, Zilliz: Okay.

Jason Lopatecki: We could give folks some time to hop in. Where are you based today? Are you in the South Bay?

Frank Liu: Yeah, we’re in our office. So we’re in Redwood City–Redwood Shores to be precise. So right in the middle of the peninsula, right between Sunnyvale and San Francisco. So yeah, feel free to hit me up and grab a coffee or do something along those lines.

Jason Lopatecki: Yes, definitely. Is it warm out there today?

Frank Liu: It was actually cloudy this morning, which I was pretty surprised about. But aside from that, it’s warming up. This room in particular gets so much sunlight in the morning and right around noon that it ends up getting pretty warm, even with the air conditioning. Not a bad thing; it’s actually quite nice. That’s one of the reasons I love the San Francisco Bay Area, right?

Jason Lopatecki: So here’s a side question for you. Where do you think the center of Bay Area AI is? Is it San Francisco these days, is it the Peninsula fighting back?

Frank Liu: It’s definitely San Francisco, 100% San Francisco. I’ll go up to San Francisco every now and then to attend some meetups and events. But it’s definitely San Francisco. Are you familiar with the Cerebral Valley meetup spreadsheet?

Jason Lopatecki: Yeah, I have seen that.

Frank Liu: Yeah, if you look at that one and check the locations, by and large everything’s in SF, and the second place, not even close, is actually remote. So, you know, I suppose that’s how it breaks down.

Jason Lopatecki: So for those who are just joining, we’re talking about the AI center of the world. Have you been to any meetups in the valley recently? Or is it just not happening as much, but still happening down there in Mountain View and Palo Alto?

Frank Liu: You know, it still happens. Last week I went to one hosted by Snorkel, and that one actually had quite a few people. So I don’t think it’s that people don’t want to go to meetups on the Peninsula; they’re just not as popular, right? And there are a lot of great meetups in San Francisco, period. Some of my favorites are actually the ones with Arize and with Ray. Hopefully we see a bit of a resurgence on the Peninsula; it’s just not as active right now, but we’ll see.

Jason Lopatecki: Awesome. Do you want to introduce yourself?

Frank Liu: Yeah, absolutely. Hey everybody, my name is Frank, officially Director of Operations as well as ML Architect here at Zilliz. At Zilliz we’re the creators of arguably the world’s most popular open source vector database, and as part of the Milvus community, one of my key goals is to encourage the adoption of a variety of different embedding models. I sound like a nerd saying this, but oftentimes I’ll train my own models on the weekend to see how different model architectures perform on different tasks, particularly in computer vision. It’s only recently, probably in the past year or two, that I’ve gotten more into natural language.

But yeah, you know, it’s a pleasure to be here. Thank you for inviting me and looking forward to getting this paper reading under way.

Jason Lopatecki: Yeah, excited to have you, and we’ll do more of these for sure. For those of you who don’t know me, I’m Jason, Co-Founder here at Arize. I’d say I’m also somewhat of a nerd hacker, and I try to touch and build this stuff as much as I can; it’s something I just love doing. This one is fascinating because we’re going to go back and forth between a blog and a paper. The paper was written by the Meta team, but a hacker in the Llama community got ahead of its results, with an amazing story of discovering this issue. And it’s really in this abstract area around positional embeddings.

Frank, had you thought about positional embeddings before? Had you read papers prior to this?

Frank Liu: Oh, yeah, absolutely. Coming from computer vision, it was very, very difficult for me to understand what positional embeddings were really there for, until I got a better understanding of how attention scores work. I sort of want to draw this diagram out, and then I’ll hand it back over to you, Jason. The way self-attention works is you maintain what is effectively a large self-attention matrix, right? So it looks something like that. Each axis corresponds to the same sentence, and each entry along an axis is a token. I’ll call them A, B, C, D. What we are really interested in is computing attention scores between all of these tokens. In particular, with transformers, if I were to swap any two of these, for example B and C so that the sentence reads A, C, B, D, you’ll notice that the corresponding rows and columns of my attention matrix are swapped too. So I really lose that positional information, and that’s where positional embeddings come into play. In the original transformer paper, what they actually do is use multiple sinusoidal waves. So it might look something like this, and then I have a higher-frequency one, a much higher-frequency one, and a very, very high-frequency one, and they sample these waves at linear intervals. So I might sample right here, here, and here, and the concatenation of all of these values gives me the positional embedding for a particular token. These are called absolute positional embeddings, right?
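To make the sampling Frank describes concrete, here is a minimal NumPy sketch of absolute sinusoidal positional embeddings in the style of the original transformer paper. The function name and the 2,048 by 64 sizes are illustrative assumptions; each row is the embedding for one position, built by sampling sine and cosine waves of geometrically decreasing frequency at that position.

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions: int, d_model: int,
                                   base: float = 10000.0) -> np.ndarray:
    """Absolute sinusoidal position embeddings, shape (num_positions, d_model)."""
    assert d_model % 2 == 0, "d_model must be even"
    positions = np.arange(num_positions)[:, None]          # (P, 1) integer positions
    i = np.arange(d_model // 2)[None, :]                   # (1, d/2) wave index
    angles = positions / np.power(base, 2 * i / d_model)   # frequency falls as i grows
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dims: sine samples
    pe[:, 1::2] = np.cos(angles)   # odd dims: cosine samples
    return pe

pe = sinusoidal_positional_encoding(num_positions=2048, d_model=64)
print(pe.shape)  # (2048, 64)
```

In the original transformer these vectors are added to the token embeddings, so each token carries a fingerprint of where it sits in the sequence.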

The reason you need positional embeddings is that you lose that information without them. And these embeddings can really seem a bit like black magic. In the original Vision Transformer paper, I believe the embeddings are additive. Really, it shouldn’t work if you think about it from a theoretical, or even an intuitive, perspective. But it does, right? And that, I think, is one of the amazing things about deep learning in general: you can do amazing things with backpropagation and with these different model architectures. But that was a bit of a long-winded answer. I apologize about that, Jason. I’ll stop sharing here and head back to you.

Jason Lopatecki: I guess the big-picture takeaway for me is this: to make transformers fast and able to process things in parallel, unlike the RNNs that came before, you lose that positional information, and these positional embeddings add a little of that signal back so the model can understand something of the order of the words. I think that’s the big point you’re making. What’s so interesting is the core problem people are trying to solve right now: you’ve got a model trained on a small context window, so why can’t I extend that window with fine-tuning? I should be able to take a model trained on something small and extend it. But you’ll see from a lot of this that things break in very non-obvious ways, and by walking through a little of the blog, I think you’ll see the confusion everyone in the industry was having. What you’re going to find here is a very subtle issue with positional embeddings that was behind a myriad of experiments and problems across a lot of different papers.
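The lost ordering is easy to check numerically. This hypothetical sketch builds a single-head attention matrix from random projections with no positional information, then swaps tokens B and C. The result is just the original matrix with the corresponding rows and columns swapped, so nothing in the attention pattern signals that the order changed.

```python
import numpy as np

def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """Softmax attention weights for token embeddings x, with no positional signal."""
    q, k = x @ w_q, x @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))                        # tokens A, B, C, D
w_q, w_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))

perm = [0, 2, 1, 3]                                # swap B and C: A, C, B, D
attn = self_attention(x, w_q, w_k)
attn_swapped = self_attention(x[perm], w_q, w_k)

# Same matrix with rows and columns permuted: the order change is invisible
print(np.allclose(attn_swapped, attn[np.ix_(perm, perm)]))  # True
```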

So the question is really this: can we extend the context window of an existing pre-trained LLM? Can I take the roughly 2,000-token window of the current Llama versions I have and bring it to 16K or 32K? Can I take a fine-tuned transformer and extend its context window? In this work they introduce the concept of position interpolation. You’ll see the idea of interpolation versus extrapolation coming up throughout the paper: interpolation uses position signals in a range the model has seen before, while extrapolation uses signals outside that range. We’ll talk more about what that means as we go. I think this figure is kind of like your drawing of the positional embeddings and the regions.

Frank Liu: Before I go too deep into this: I was very surprised that this paper didn’t cite the original Vision Transformer paper, or even the Data-efficient Image Transformers paper, and I’ll go a little deeper into why I think that matters later. But what you see here is this: say I have a context window of size 2,048, or it could be 512, and I try to extrapolate the positional embedding. We were talking about those sinusoidal waves a little earlier, right? The positional embeddings are effectively multi-frequency sinusoidal embeddings that I sample at individual intervals. Now I try to extrapolate into that unseen range. And maybe we actually want to go to that; there’s another graph down at the bottom that shows the attention scores.

Looking at this, it really just clicked for me, because if you look at the effect of extrapolation, you can see that the attention scores go wild. They go crazy, right? Whereas they’re mostly bounded between -3 and 3 within the context window the model was trained on, if I try to extrapolate those sinusoidal waves, I end up with positional embeddings the network has never seen before.

And that causes the absolute value of my attention scores to, well, first drop off a cliff and then go to Mars, right? But if I look at the effect of interpolation on these attention scores, I can see that they are much better behaved and very tightly bounded. I know it doesn’t really look like that unless you check the scales: the y-axis of the graph on the right runs from -0.2 to 0.2 across those positional differences. That, I think, is a key contribution of this paper.

If we go back to that diagram up there, this is honestly an incredibly simple idea, and I’m surprised that it works so well. Well, I’m not surprised that it works so well, but I’ll get into why a little later. We have this context window in the pre-trained range, and we want to extend into this unseen range. Extrapolating doesn’t work that well. Instead, we interpolate. You can think of it as squishing the range of the sinusoidal waves, or of the positions, that we’ve seen. So as I do the fine-tuning, the last token of the longer window gets interpolated to what the original model considers a token at position 2,048, and that ends up working much, much better. If we look at some of the results at the very bottom, I think they go into that a little deeper, and they also have some proofs in here.
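In symbols, and as we read the paper, position interpolation rescales the position index before the rotary embedding f is applied, where L is the original context window and L′ the extended one:

```latex
\mathbf{f}'(\mathbf{x}, m) = \mathbf{f}\!\left(\mathbf{x}, \frac{mL}{L'}\right),
\qquad 0 \le m < L' \;\Longrightarrow\; 0 \le \frac{mL}{L'} < L
```

For example, with L = 2,048 and L′ = 8,192, the last position m = 8,191 maps to 8,191/4 = 2,047.75, inside the range the model saw during pre-training.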

Back when I was on the computer vision and machine learning team over at Yahoo, we would do these paper readings. One paper we read had this long proof that quantization-aware training is the optimal way to do things, right? And I realized at the very end that what they had really done is take the result they wanted and work backwards to derive it, if that makes sense. So I’m a bit wary of a lot of these proofs, especially in machine learning; it often happens that you get proofs that don’t necessarily translate to anything practical. But that’s a story for another day.

Jason Lopatecki: And I think this is the meat of it here, which is three things. First, it’s pretty easy to do with a pre-trained model and doesn’t require that many fine-tuning steps. Second, position interpolation generates strong results that make use of the extended window. You see that in the results, where they have a couple of different tests on these extended-window models that were fine-tuned. And lastly, the question might be: does it degrade performance within the original window the model was trained on? There’s a slight degradation, but it’s really small. So it works, and works really well. The thing I want to highlight, too: some people believed this was possible, but they were running into a lot of issues, and there’s this concurrent work, a blog that is such a good example of troubleshooting. I’m going to pull it up. All these papers typically hide the hard, daily work that goes into troubleshooting a really, really tough issue, and I think this blog hits on those issues. A lot of people thought this could happen, but no one really knew what was going on.

So let me bring this up. This is the person who wrote it up, and it’s really his story. He notes that Meta referenced it: basically, he wrote this up, then Meta’s paper came out and referenced this blog, which hit on the same things the paper hit on. That’s amazing: you’ve got this Llama hacker doing great work to figure out what the problems were in a very thoughtful way, and coming to the same conclusion as an entire team at Meta. He was starting to think about extending the sequence length and which papers had defined the problem, and there’s quite a lot out there. There’s work on why it fails: maybe it’s distracting tokens, or maybe there’s bias in the positional embeddings. There’s a bunch where they propose fixes; for example, one argues that the embeddings for longer positions are updated far fewer times during training, and another adds padding. But none of them makes that big of a difference. So, community-wide, people were kind of stuck on this, with lots of different people trying to figure out why it was happening. He’s reading and trying to figure out what he should test, and he’s like: okay, I have a better sense of the problem; what are some remedies? There are some around things you might do in pre-training. There are just a bunch of shots at fixing this. Anything else to add, Frank?

Frank Liu: Yeah, I think all of these papers and strategies have their own merit, and I don’t want to take away from those ideas in and of themselves. But oftentimes you can think of them as trying to build on top of what’s already there, rather than looking at the core, the crux, of the problem, which this person goes into. It’s a great blog post that goes into how we compute attention and what role the positional embeddings themselves play. They go into some potential solutions, and there’s actually one line where they highlight… And I feel really bad for skipping over all this stuff. There’s a ton of work here.

Jason Lopatecki: And then he gets to the very end, with the different things he does to test and reproduce this, and he just keeps getting insane results. Just like you saw those attention values explode, the models were not just getting slightly worse, but much, much worse the further you went past the window length. These were very unexpected results, and you can feel that in how he describes them as they were happening.

And he gets to the spot where he says: all I can think of is that I got it upside down; there does not seem to be any reason why this particular model cannot extrapolate, but it can’t. Then he says: well, I was thinking about it wrong. Maybe there’s something preventing it from doing so, or suppressing that behavior. Which is kind of like: okay, let’s start from basics. A lot of us have done this while debugging a million things: rip everything out and start with something simple. That’s kind of what he did here with this sliding mask approach, which I guess the Longformer paper also uses: keep the attention window at a more fixed length, and basically simplify the problem. He was still getting to a spot where things weren’t working, and then, put simply, he gets a hunch that the model has learned the positions it was trained on, and that feeding it positions outside of what it learned causes the problems. So it comes down to: let me keep the position IDs within the trained range. And then, voilà, he’s got something that works, and it’s the same final approach as the Meta paper, which is amazing.
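For readers unfamiliar with the sliding mask, here is a generic sketch of a causal sliding-window attention mask in the Longformer style. It is not kaiokendev’s actual code; the function name and window size are hypothetical. Each query attends to itself and at most window - 1 earlier tokens, which caps the relative distance the model ever sees.

```python
import numpy as np

def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if query i may attend to key j."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    # causal (no future keys) and within `window` tokens of the query
    return (j <= i) & (i - j < window)

mask = sliding_window_causal_mask(seq_len=6, window=3)
print(mask.astype(int))
```

Disallowed positions would typically have their attention scores set to a large negative value before the softmax.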

So it’s just an incredible debugging and hacking job, and it goes to show what Llama being open source is doing for the community, too, I think. An individual tackling this just wouldn’t be possible without it.

Frank Liu: Yeah, I totally want to echo what you said, Jason. Talking about Llama… I think open source, irrespective of what OpenAI or Anthropic or any of these other proprietary LLM vendors say, definitely has a place in the broader AI space today. The reason is that you see hackers like kaiokendev (again, I don’t know if I pronounced that right; I apologize if you’re watching), amazing engineers who are able to hack on top of Llama and these other LLMs and come up with results that would typically be reserved for large, well-funded research teams at big tech companies such as Meta, right? I think it’s really amazing, and I hope we see more of it.

Jason Lopatecki: Yeah, I agree. And I do think that with the Llama 2 license, Meta is definitely moving more in that direction.

Now we get into a little of the tech and math behind this, and I think the core takeaway is what you were hinting at: they transform the embedding so that it carries a sinusoidal positional signal, which can also be represented as multiplication by a complex number, and that’s what they construct here from the embedding and its position. The point of this math is to show that the attention score should depend on relative position, which is the goal of the positional embedding itself: how attention falls off should be based on that.
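The complex-number view fits in a few lines. This sketch assumes the common RoPE convention of rotating consecutive dimension pairs (x[2j], x[2j+1]) by angle m·θ_j with θ_j = 10000^(-2j/d). The helper (a hypothetical name) treats each pair as a complex number and multiplies it by e^(i·m·θ_j). The check at the end illustrates the relative-position property: the query-key dot product is the same whenever m - n is the same.

```python
import numpy as np

def rope(x: np.ndarray, position: float, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive pairs (x[2j], x[2j+1]) by angle position * theta_j."""
    d = x.shape[-1]
    theta = base ** (-2 * np.arange(d // 2) / d)      # theta_j = base^(-2j/d)
    z = x[0::2] + 1j * x[1::2]                        # each pair as a complex number
    z_rot = z * np.exp(1j * position * theta)         # rotation in the complex plane
    out = np.empty_like(x)
    out[0::2], out[1::2] = z_rot.real, z_rot.imag     # back to interleaved reals
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=64), rng.normal(size=64)

# Same offset m - n = 3 at very different absolute positions
score_a = rope(q, 10) @ rope(k, 7)
score_b = rope(q, 1003) @ rope(k, 1000)
print(np.isclose(score_a, score_b))  # True
```

Because rotations preserve length, RoPE changes only the angle between queries and keys, which is why the score can depend on relative position alone.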

Then they talk about how the attention score depends only on relative position. I don’t think this part is new; they’re highlighting it first before telling you why it might explode even though this equation holds. So RoPE depends only on relative positions, but its extrapolation performance is not great, and it’s really the unseen values of this function outside the trained range that cause problems. The point they get to in this section is that these bounds probably hold for the region you’ve trained on but don’t hold outside of it. They say: we see catastrophic issues beyond a certain location, so what is the reason behind it? How could this happen if the attention score decays with relative distance? That’s the point they’re making: the equation above is supposed to say attention decays in some natural way.

And they go on to theorize a bit about why this might be true. It didn’t necessarily feel like a rigorous proof. They talk about how the positional embeddings can act as a set of basis functions, and those basis functions can represent any possible function in this range. So the basis functions can represent something really well within that range but look very different outside of it. That was my very high-level take on what this section is getting at: a possible explanation for why things fit well in this region but don’t fit well outside of it.
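The basis-function picture can be written out. In our own notation, assuming the standard RoPE formulation, treating each query/key dimension pair as a complex number gives a pre-softmax score that depends only on the relative distance s between query position m and key position n:

```latex
a(s) = \operatorname{Re}\Big[\sum_{j=0}^{d/2-1} h_j \, e^{\mathrm{i}\, s\, \theta_j}\Big],
\qquad s = m - n,
\qquad h_j = (q_{2j} + \mathrm{i}\, q_{2j+1})(k_{2j} - \mathrm{i}\, k_{2j+1}),
\qquad \theta_j = c^{-2j/d},\; c = 10000
```

Here the terms e^{isθ_j} play the role of basis functions and the h_j act as coefficients. In this view, training only constrains a(s) for 0 ≤ s < L, so extrapolation evaluates it where nothing pins it down, while position interpolation keeps s inside that range, at possibly fractional values.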

And then they talk about the solution, which, funnily enough, is exactly kaiokendev’s solution; they both arrived at it independently. It’s basically interpolating, fitting those points into the window, a very simple change. And what’s amazing about this is that you don’t need to change how Llama is pre-trained. You simply take the Llama models out there, make this adjustment, and fine-tune to extend the window. And voilà, you’ve got something that actually works.
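The fix really is a one-line change to the position indices. Below is a hypothetical NumPy helper that computes RoPE rotation angles with an optional linear scale on the position ids. It mirrors the idea of position interpolation, but it is a sketch, not Meta’s or kaiokendev’s code. With scale = 2048 / 8192, the largest angle lands just inside the pre-trained range instead of four times beyond it.

```python
import numpy as np

def rope_angles(seq_len: int, dim: int, scale: float = 1.0,
                base: float = 10000.0) -> np.ndarray:
    """RoPE rotation angles of shape (seq_len, dim // 2) with scaled position ids.

    scale = original_ctx / extended_ctx gives position interpolation;
    scale = 1.0 is plain RoPE (extrapolation once seq_len exceeds original_ctx).
    """
    theta = base ** (-2 * np.arange(dim // 2) / dim)
    positions = np.arange(seq_len) * scale     # the one-line change
    return np.outer(positions, theta)

original_ctx, extended_ctx, dim = 2048, 8192, 128
plain = rope_angles(extended_ctx, dim)                                  # extrapolation
interp = rope_angles(extended_ctx, dim, scale=original_ctx / extended_ctx)

# Largest (theta_0 = 1) angle the model sees, in pre-training position units
print(plain[-1, 0], interp[-1, 0])  # 8191.0 2047.75
```

In a real model these angles would feed the cosine and sine tables used to rotate queries and keys; the short fine-tuning run then lets the model adapt to the fractional positions.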

This is again deep in the math section, where they explain why there are some really good bounds here, trying to justify why they won’t get the big explosion you saw in the picture, and laying out the advantages of the approach. They talk about fine-tuning, that is, what’s needed to extend the window, and they also talk about how in the future there may be some way to extrapolate; you might be able to add some regularization so that extrapolation could be used. But they don’t actually test it. Given how all this works, I don’t know when we’ll see someone doing extrapolation, but they do hint that this is an area they’re looking at.

The experiments I thought were pretty good. As I looked at it, the results showed two things in my mind: it didn’t degrade performance within the small window, and it seemed to work when you use the large windows. That was my takeaway from these sections.

Frank Liu: Yeah, you can see in some of the numbers that when they use PI to do the fine-tuning, the perplexity scores actually decrease for those longer context windows, which I think is pretty unique. Now, if you use just pure fine-tuning (all those rows that say FT) and try to extrapolate the positional embeddings rather than squeezing those waves and interpolating, you actually get worse results: your perplexity increases, even though your context window increases as well. So, definitely some great results here, and I think it shows the power of interpolation.
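Since the comparison hinges on perplexity, here is a quick reminder of what the number is: the exponential of the average negative log-likelihood per predicted token, so lower is better. The log-probabilities below are made-up illustrative values, not numbers from the paper.

```python
import numpy as np

def perplexity(token_log_probs: np.ndarray) -> float:
    """Perplexity = exp(mean negative log-likelihood) over predicted tokens."""
    return float(np.exp(-np.mean(token_log_probs)))

# Hypothetical natural-log probabilities a model assigned to each true next token
log_probs = np.log(np.array([0.5, 0.25, 0.5, 0.25]))
print(perplexity(log_probs))  # about 2.83, i.e. 2 ** 1.5
```

A perplexity near 2.83 means the model is, on average, about as uncertain as a uniform choice among roughly 2.83 tokens; when measured over longer windows, a drop suggests the extra context is actually helping.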

Jason Lopatecki: There’s a question here in the chat: “I’m a mathematician, data scientist, and Google partner here in Brazil. Is there research for math PhDs applying these ideas, considering problems like Fourier components? Do you think there are techniques to reduce components and improve precision?”

Absolutely. I think with positional embeddings we’re probably in the first inning in terms of usage. Being able to use long context windows well, and really understanding how this all works and how it ties into attention: there are so many interesting areas to go into, like different ways to use positional embeddings and different approaches to them. Absolutely.

Frank Liu: Totally. And I want to second Jason’s point there, which is that Fourier analysis in particular is really crucial when it comes to positional embeddings. You can see that what folks are trying to do with RoPE and some of these other positional embedding strategies is solve a problem using methods that are, I’m not going to say “simple,” but methods that have better explainability. If we take frequency-domain analysis into consideration in positional embedding research in particular, I think it could be very helpful for the broader machine learning community. But again, that’s just my two cents. I have a background in electrical engineering, and one of my favorite classes in college was actually on Fourier transforms.

Jason Lopatecki: Signals and Systems rearing its head.

So the takeaway on this context window is this. This is the blow-up you were talking about, Frank, and the numbers are insanely bad by comparison. When you add PI, you get some amazing results from a simple change.

Then the study goes beyond perplexity: let’s try some harder tasks and see whether the model actually uses the context window. This one is passkey retrieval, where a random passkey is hidden in a long document and the model has to retrieve it. So it has to use attention reaching back some distance to respond and find that item; it’s almost like a memory lookup. And again the results were pretty good on the question that matters: is it actually using the window sizes needed? This is the table that shows it: what’s the effective window length, estimated from how well the model retrieves the passkey?
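A passkey prompt is simple to generate. The sketch below is our own approximation of the task described above: the filler sentences, instruction wording, and function name are assumptions rather than the paper’s exact prompt. A model passes if its completion begins with the hidden key, and sweeping n_filler and depth probes how far back it can retrieve.

```python
import random

# Hypothetical filler text; any repetitive, irrelevant sentences would do
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. "

def make_passkey_prompt(n_filler: int, depth: float, seed: int = 0) -> tuple[str, str]:
    """Hide a random 5-digit pass key at relative depth (0.0 = start, 1.0 = end)."""
    rng = random.Random(seed)
    passkey = str(rng.randint(10000, 99999))
    before = int(n_filler * depth)               # filler repeats before the key
    body = (FILLER * before
            + f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
            + FILLER * (n_filler - before))
    prompt = ("There is important info hidden inside a lot of irrelevant text. "
              "Find it and memorize it.\n" + body
              + "\nWhat is the pass key? The pass key is")
    return prompt, passkey

prompt, answer = make_passkey_prompt(n_filler=200, depth=0.5)
# Score a model by checking whether its completion starts with `answer`
```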

These all seem to use the window length you’d expect. Some of the other sections run other benchmarks, and I skipped through some here. There’s the passkey task, and there’s a summarization approach as well, so they do both a key lookup and a summarization over the long context. This one is a benchmark on whether the small window is degraded. And then, yeah, here’s the long document summarization. Again, this tests what we were really after in the first place: is the long context window working and doing what we want it to do? Perplexity is a nice measure, but it’s just not task oriented. You can look at the results down here, and it was among the top results of the benchmarked models, too. Go out and use the Llama long context windows; they look good.

Frank Liu: They look good, yeah. And I wouldn’t be surprised if Anthropic, with their 100K models, is doing this sort of thing internally as well. I mean, the results are really quite good. But Jason, did you want to go over the rest of the paper? I actually have some thoughts I’d love to share as well.

Jason Lopatecki: Yeah, let’s do that. We’re kind of at the end here.

Frank Liu: This is just general commentary, but I want to draw everybody’s attention to the original Vision Transformer paper. The reason is that if you look at this paper, they talk about how they do fine-tuning at higher resolution. In particular, they say: “We therefore perform 2D interpolation of the pre-trained position embeddings according to their location in the original image.”

So I’ll explain very quickly what’s happening here; I’m a computer vision nerd, so I love talking about this stuff. Say I have an image (again, this is a vision transformer) that’s 224 by 224, and I want to fine-tune on something larger, let’s say 512 by 512. Vision transformers, like natural language transformers, use positional embeddings as well. For the interpolation in this case, I believe they’re doing bilinear interpolation of the embeddings themselves; they’re not interpolating original sinusoids. This is a very natural thing to do in computer vision, and it turns out to work very, very well. The reason it’s so natural is that if I look at these two corners right here, as I scale the image up, they correspond to the same regions, the same information in the image. That’s why interpolation is a very natural thing to do for vision transformers, and it works well. There’s another paper, Data-efficient Image Transformers, that shows this. I’m not going to pull it up here, but the motivation behind a lot of this work comes from a paper called Fixing the Train-Test Resolution Discrepancy, again from folks at Facebook AI Research. I find this very, very funny, because the paper we were talking about today is from Facebook AI Research, and Hugo Touvron, an author of that resolution paper, is one of the lead authors on Llama 1 and Llama 2 as well. But this is just an interesting tidbit I figured I’d share with everybody.
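The 2D interpolation Frank describes can be sketched as a bilinear resize of the grid of patch position embeddings. This assumes 16 by 16 patches (so 224 pixels gives a 14 by 14 grid and 512 gives 32 by 32) and align-corners sampling so that corners map to corners. The helper name is hypothetical, class-token handling is omitted, and real implementations may use other schemes such as bicubic interpolation.

```python
import numpy as np

def resize_pos_embed(pos_embed: np.ndarray, new_size: int) -> np.ndarray:
    """Bilinearly resize a (g, g, d) grid of position embeddings to (new_size, new_size, d)."""
    g = pos_embed.shape[0]
    coords = np.linspace(0, g - 1, new_size)    # fractional source coordinates
    i0 = np.floor(coords).astype(int)           # lower neighbouring grid index
    i1 = np.minimum(i0 + 1, g - 1)              # upper neighbour, clamped at the edge
    w = coords - i0                             # weight on the upper neighbour
    # interpolate along rows first, then along columns
    rows = (1 - w)[:, None, None] * pos_embed[i0] + w[:, None, None] * pos_embed[i1]
    return (1 - w)[None, :, None] * rows[:, i0] + w[None, :, None] * rows[:, i1]

# 224x224 image with 16x16 patches -> 14x14 grid; 512x512 -> 32x32 grid
old = np.random.default_rng(0).normal(size=(14, 14, 768))
new = resize_pos_embed(old, 32)
print(new.shape)  # (32, 32, 768)
```

Because the corners of the new grid reuse the corner embeddings exactly, the same image region keeps the same positional signal, which is the intuition Frank gives for why this works so naturally in vision.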

But I think that’s really critical for understanding the broader machine learning world: there are so many ideas that can be borrowed from other modalities, and we want to keep an open mind about things. That’s the point I wanted to make much earlier. In particular, I think positional embeddings are still sort of like black magic, and there’s a lot of work to be done, not only to understand them better, but also to ask: hey, how can we use them to great effect to extend the context length of a lot of these transformers?

Jason Lopatecki: Yeah, awesome. I think we have one more question that might be a little beyond me: “Do you think this problem involves issues like sparsity that reduce model precision? Thinking about Fourier analysis, can we correlate interpolation with this behavior?”

This is probably slightly beyond my ability to comment, but I think the point of these is really to give the model a signal for each position that has some really nice properties, and depending on your choice of positional embedding, those properties can vary a bit. I don’t know if sparsity is an issue I’ve heard of in relation to this, but there are definitely different approaches to creating these embeddings that have a whole different suite of properties. And I’ve seen recent papers that combine some of them to get joint properties. Similar to wavelets and Fourier analysis, there are ways of combining decaying things with varying things and getting different properties. But the big idea is that you’re trying to focus attention and understand position, and have those be signals the model can actually pick up, which is what drives a lot of the design of these. I don’t know if I answered the question specifically, but I skirted around the edges of it.

Frank Liu: Yeah, I suppose I don’t quite understand how sparsity relates to positional embeddings in particular. But when it comes to models in general, and especially as part of a vector database company where we’re all about vector embeddings, sparse embeddings definitely have a place in understanding not just text or documents but any type of unstructured data. I think it would be a little harder to correlate that with Fourier analysis, but maybe there is something there as well. I don’t know.

Jason Lopatecki: I did want to give a plug for an August 24th event in SF, so if you’re local and want to come, we’ll be talking about vector databases and how to troubleshoot them. It will probably be pretty technical: how do you debug embeddings? How do you troubleshoot vector databases? So if you can make time among the other 100 AI events you’re attending, this is a good one. And thanks, Frank, we’ll do more of these. People are asking where they can find details for the August 24th event. I think we do have an events page, and our team will send a link to the people who attended. But just follow Arize on LinkedIn and Twitter, and we’ll be posting it there. We also have a meetup on August 10th, which will be a little smaller. Again, we’ll have folks from both Arize and Zilliz on August 10th as well as August 24th.

One more person asked what the next community reading is. We’ve been trying to do them every week, but I think we might be skipping next week, so it’s probably two weeks from today.

Thank you, Frank.

Frank Liu: Let’s definitely do another one soon. I’m looking forward to the next one, Jason.

Jason Lopatecki: Thank you.
