00:12
Hey, Len.
Hey, Itami.
How goes?
How goes? Sometimes that's a loaded question.
But I mean, yeah, things are going.
I think that just became my, you know, neutral response.
Things are going.
Wow, I, you know, there's so much that can be said
when somebody says the word how goes.
You're very right.
How are you is another one.
You know, most of the time we just say what up.
For anybody out there who pays attention to English memes,
there's the, you know, smells like updog in here.
What's up, dog?
Oh, you know, nothing.
And then cries.
Oh my god, that was another thing you shared with me
as a potential topic, right?
I think at some point I shared a variety of these,
like wordplay type things.
Yes, yes.
Yeah, it's the, you know, it's the indirect way
of getting somebody to say the sounds
that are asking how you are, right?
Oh, well, but that's fine.
We'll probably do another fun episode
on all the confusing and conflicting uses
of ups and downs and ins and outs.
Oh yeah, that was another good one.
Of English.
Yeah, yeah, yeah.
Yeah, we'll cover that some other time.
But I recognize that for you, things are simply going,
which is a nice way to say, well, it's fine, probably.
Yeah, I mean, like nothing exceptionally notable.
It's going, just going.
Yeah, it's not, it's not a bad thing, right?
So not a bad thing.
Not a bad thing.
I think you had something in your mind
you wanted to share.
Yeah, I do.
The ChatGPT 5 Announcement
So I guess my first question,
which I sort of tentatively asked you earlier,
is had you heard anything about
OpenAI's new release, ChatGPT 5?
Yes, this morning when I opened ChatGPT
to talk to ChatGPT as I usually do,
it gave me a little pop-up notification
and said, I'm brand new.
ChatGPT 5 is available.
Come talk to me.
And that's all I noticed.
And honestly, I have not read the press release
or anything like that.
Yep, that's probably pretty normal for most people, right?
Most people aren't trying to follow
what's happening around it,
other than, maybe as they're playing with the tool,
a notification pops up saying something has changed, right?
So this is pretty recent.
So I won't necessarily ask, I guess,
if you felt like it's changed or feels different in any way.
I think I know the ways you use it,
but feel free to share at any point.
But I'll get to OpenAI's,
let's say, presentation around GPT 5.
I think that's going to be the most interesting.
And I will note that there is also another interesting thing
about how GPT 5 was simply pushed out as an update
and may or may not have upset a number of people.
All right.
Yeah.
So maybe two things.
But for here: the presentation,
for anybody interested.
So I first became aware of this via The Verge.
So the publication, The Verge.
And it was via their Instagram page that this popped up.
And then I went to their article.
The main point here is that OpenAI decided to,
I don't know, run some sort of release presentation,
which is normal.
That's all fine.
They want to tell you things like, quote,
the best model in the world, end quote, at coding and writing.
This is from the subtitle.
Oh, not vague at all.
No.
However, for it being the best chat or best thing in the world
at coding and writing or whatever,
their entire presentation was full of nonsensical visual graphs.
Entirely.
I have to wonder if somebody at OpenAI either ran ChatGPT
to make the presentation for them,
or maybe this is simply just how sloppy humans are.
The Accuracy of AI Information
Yep.
So look, there might be some suspicion about why these came out this way.
But I'll tell you what.
These types of errors might be humanly possible.
But if you're checking the numbers as you go,
you won't fuck it up this bad.
Okay.
So as for some of their responses,
the Verge post mentions that Altman commented,
quote, the numbers here were accurate,
but we screwed up the bar charts in the live stream overnight.
On another slide, we screwed up numbers.
Oh, that sounds like a my-dog-ate-the-homework type of excuse.
Yes.
It's 100% the dog ate my homework.
It's 100% avoiding the fact that they definitely generated
these charts with their AI model.
And somebody was too sloppy to check it,
or maybe it's not as great as they claim to be.
Yeah.
Look, maybe they didn't just straight up use their AI model to generate them.
But I would say that there are so many ways to create charts, right?
Like, you cannot mess them up like this
unless you have just let something put the numbers together
and expected the visuals to be themed appropriately.
Right.
You know, for instance, one of the big ones here, like,
it's hard to get pictures of it
without going to the video release and stuff.
But there is an image, which they focused on in the Verge article, about deception.
So this is not a directly measurable feature,
but it is certainly one that we attempt to measure,
in the sense that a model's output, not the model, right?
The model is not a personified thing, right?
That the model's output uses deceptive language, right?
Or perhaps attempts to evade, you know, the fact that there's uncertainty or...
Right, yeah, yeah, yeah.
You sense how the definition is a little hard to pin down as well.
It is, it is.
But I think I know almost exactly what you're talking about
because this is kind of precisely what I experienced using these LLMs,
where sometimes you notice that there are some very blatantly obvious mistakes
that the model output generates.
And you, let's say, give them feedback saying, hey, that's wrong.
And then they try to sort of compensate for it by throwing some more nonsensical things.
And that's when you realize, hey, it'll be faster if I just put this in my Excel.
Yeah, yeah, exactly.
And this observation, right, that sometimes it would be faster doing it myself,
and the awareness of being able to do that, is, like, really crucial.
But it requires that the person has developed an awareness that there is a sort of,
not a good or bad thing, but a purpose and reason
and a kind of effectiveness to anything that we make, right?
Yeah, I think the language that people like using is AI literacy or LLM literacy,
where it's kind of on us as well to get the most out of these new technologies.
And unfortunately, unlike learning how to read, you know, doing reading
comprehension, which we've been trained in since we were kids,
all of us are encountering these new models for the first time.
So not every one of us has these kinds of critical-thinking ways of engaging with LLMs.
And they certainly do a very good job of sounding really convincing,
you know, that's kind of their whole output goal, to sound very sound, right?
Yeah, their goal is to, well, I would go as far as to reject that their goal is to sound sound;
their goal is to sound expectedly natural.
Right.
Okay.
Okay.
So, okay.
I can see.
LLM Optimization and the Nature of Its Output
Yeah, I hope the subtlety, the subtle difference, is clear to the listeners.
Yeah, to the listener, that's us trying to tear apart
nuance in words and ways of describing, right?
The way I lean is to take away the reasoning that people attempt to, you know,
sort of analogize here.
But yes, like, I think, well, correct me if I'm wrong, but you're trying to remind us that
what an LLM is optimized for is simply to hit the center of the bell curve every time
in its next predictions, and, you know, to make those based on the training
that it's had, but that tends to sound quite convincing to humans so far.
Yeah.
I mean, if you trained it on garbage, you will get stuff that doesn't sound convincing at all.
Right, right, right.
But, but so far, they do a reasonable job sounding reasonable.
But that's just it: it's not like they understand the logic behind it.
It's just them being optimized to find the mathematically most expected
and most natural sounding outcome.
Right.
Yeah, yes.
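[Editor's note: a toy sketch, in Python, of what "the mathematically most expected outcome" can look like, using nothing more than word-frequency counts over a tiny invented corpus. This is purely illustrative, not how GPT 5 or any real transformer is implemented; the corpus, names, and outputs below are made up for the example.]

from collections import Counter, defaultdict

# Tiny invented training text; real models train on vastly more data.
corpus = (
    "the cat sat on the mat . "
    "the cat chased the dog . "
    "the cat sat on the rug ."
).split()

# Count which word follows which word in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_expected(word):
    # Return the single most frequent continuation seen in training,
    # i.e. the "center of the bell curve"; no understanding involved.
    return following[word].most_common(1)[0][0]

print(most_expected("the"))  # 'cat', because it was seen most often after 'the'
print(most_expected("sat"))  # 'on'

[A real LLM does the analogous thing over whole token contexts with a neural network rather than a lookup table, but the point stands: it returns the statistically expected continuation, not a reasoned answer.]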
Holding onto that is probably the key part.
And what a lot of the differences between tools come down to, you know, is
some background adjustments to, like, you know, rewards and to the way that the,
I'm going to try to be a little careful with terminology,
apologies if I mess this up in my tired state.
Um, within the actual transformer itself, within the attention blocks,
you know, the mechanics of how they do it are the same.
You can make tweaks to those.
But then, you know, the easiest way is to tweak things sort of at the end,
which is where you hear the word temperature for, like, the models that you use.
It's just saying we've made it more random at the end.
Um, and so, you know, something to keep in mind.
That's where some of that nuance, or not nuance, but that, like, deviation,
right, that character, that flavor, comes from: it's in sort of adding,
influencing it to have a little bit more variety, versus there being a,
like, new reasoning pathway that the model takes.
Right, right, right.
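[Editor's note: a minimal sketch, again in Python, of the "temperature" knob mentioned above, assuming an invented set of logits for four candidate next tokens. Dividing the logits by the temperature before the softmax is that "tweak at the end": low temperatures make the output nearly deterministic, high temperatures make it more random. Nothing here is taken from OpenAI's implementation.]

import math
import random

def sample_next_token(logits, temperature=1.0):
    # Scale logits by temperature, turn them into probabilities (softmax), then sample.
    # Lower temperature: sharper distribution, almost always the top token.
    # Higher temperature: flatter distribution, more random-feeling output.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = random.choices(range(len(logits)), weights=probs, k=1)[0]
    return idx, probs

# Invented logits for four candidate tokens that might follow "The sky is".
tokens = ["blue", "clear", "falling", "purple"]
logits = [4.0, 2.5, 0.5, 0.1]

for t in (0.2, 1.0, 2.0):
    idx, probs = sample_next_token(logits, temperature=t)
    print(f"T={t}: picked {tokens[idx]!r}, probs={[round(p, 2) for p in probs]}")

[The model's learned weights never change here; only the final sampling step gets noisier, which is where that "deviation, character, flavor" comes from, rather than from any new reasoning pathway.]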
And, you know, I think there was a,
I will toss this out for anybody interested to go look for it,
I don't have the citation or the title off the top of my head,
I think there was a study that might have happened recently about
using training data that was basically incorrect math,
and this somehow influencing the output in a way that,
I don't think this was just correlation, but maybe the paper says more, it changed
the sort of output to be a lot more, like, maybe evasive or negative
or deceptive or something.
Right.
So there, it's this messy, interwoven connectedness of math where you've just
sort of mashed it all together and said, associate these things based on
not just their frequency relative to each other, but their frequency within context and, like,
you know, all this stuff.
And at the end, you can also throw a wrench into the plans and make it more random.
Um, so yeah, yeah, it's math, the numbers, frequency, and a lack of thought.
Um, yeah, which is interesting.
And I apologize in advance if I'm, like, sort of leading you astray.
No, no, you're good.
That reminds me that, you know, in that pop-up notification that popped up,
um, when they told me that they now have a new brain, basically,
I forget the exact wording, but it's like, it's ChatGPT,
but with thinking, or something like that; the word thinking was in it.
Misconceptions About AI Thinking and Reasoning
Yes.
Like, that was a little red flag for me.
I'm like, Hmm, like, what do they mean by that?
Um, yeah.
To be fair, more to myself than to OpenAI.
Um, yeah, I certainly need to, or would benefit from, trying to read more of the stuff
that they try to argue is reasoning, um, whatever they've done to tune, you know,
kind of the math behind the scenes, so to speak, to put in, like, checking.
Yeah, this could be, this could be a good way to,
this could be like an ongoing conversation, you know, that we do from time to time.
Yeah. That's also fine because, you know,
I do this as part of the work that I do, so I'm happy to; it gives me another reason.
Um, but this, right, this thinking terminology, this reasoning terminology,
I would say raises at least two concerns.
One, by using the terminology, you equate it to thinking and thus bypass the fact that a human's
thinking is different. And the second one is that the thinking and reasoning terminology
is not how the model functions. And so unless you yourself can reason to me
why this new version of the model is actually reasoning,
I don't believe that you should be using that. Right. Even the ones like
Perplexity AI do this as well. It's got its reasoning model and they do all this
stuff, but I think the word is useless. I would say that the important bit is that they're
essentially dragging in context and trying to tune their model to take your words,
expand the context through internet search, dump that all through the model, and
then give you textual output. But this is, I think there's sort of a danger in using
colloquial terms like thinking and reasoning, because that is kind of misleading.
It makes it sound like this model can think and reason, which it doesn't, at least not in the
same way humans can. And that's neither good nor bad so far, it's just different, but we don't
call it differently. We don't separate it. So it's easy for people who are glancing at the headlines
to kind of confuse the two.
Yeah, exactly. And this, you describing it as a danger, right?
This risk of having it advertised that way. It starts with the same idea of advertising
these as artificial intelligence. Right. It doesn't get better when they
continue to do this. Right. I think there are a few things that I'll probably
look into, and maybe we can continue some of these conversations. Yeah, yeah.
This will give me a reason to dig a little bit. But to sort of return us to an example of
what was happening with these graphs, and then I'll make maybe a closing message on
why this is also such a problem, right? Not just the thinking idea, which was lacking in their
presentation, but the issues with this. So when you get a graph, we mentioned a graph.
That's it for the show today. Thanks for listening, and find us on X at
eigotescience. That is E-I-G-O-T-E-S-C-I-E-N-C-E. See you next time.