I took some time to reflect on this post because I’d rather not end up being the moron here; hopefully that didn’t slow you down, because your response is a good one.
To start, I think that hyping up the thinking people do as a counterargument is exactly the same thing I was doing by describing recent advancements in AI, and I’m not sure it moves us forward, so I will count that as a wash.

The next thing I will add here is an article I read a while ago; it shows the explainability of AI decision making in a more advanced way than, say, Shapley values: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
And you basically hit it for me: my biggest gripe is that these are not one-word-at-a-time predictors anymore. So functionally, yes, it’s true, but practically LLMs are doing next-word prediction in the same way you’re doing next-word prediction when you talk: you know where you’re going, and the sentence still gets built one word at a time.
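To make the mechanics concrete, here is a minimal sketch of that generation loop. The tiny hard-coded distribution is entirely made up and just stands in for the learned one a real model would use:

```python
import random

# Toy stand-in for a language model: given everything generated so far,
# return a probability distribution over possible next tokens. A real LLM
# runs this same loop, just with a learned distribution instead of a
# hard-coded one.
def toy_next_token_dist(context):
    if not context:
        return {"the": 0.6, "a": 0.4}
    if context[-1] in ("the", "a"):
        return {"cat": 0.5, "dog": 0.3, "answer": 0.2}
    return {"sat": 0.4, "ran": 0.3, ".": 0.3}

def generate(max_tokens=8):
    context = []
    for _ in range(max_tokens):
        dist = toy_next_token_dist(context)
        tokens, weights = zip(*dist.items())
        next_token = random.choices(tokens, weights=weights)[0]  # sample one token
        context.append(next_token)
        if next_token == ".":
            break
    return " ".join(context)

print(generate())  # e.g. "the cat sat ."
```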
I also don’t think it’s fair to say “oh, fundamentally LLM stuff is just electromagnetism,” because while that is literally true, I think the better analogy is still statistics vs. chemistry. The comments are going to make you feel a certain way, you’ll make a response based on that feeling and what you know, and we will continue. I think that’s functionally the same as “here’s an open prompt, we need to answer it, let’s take a statistical approach to the specific wording we will use.”
Now that we are including end goals, reflection, and rule sets, I don’t think it’s fair to call it just one-word-at-a-time prediction anymore, because training and optimization are happening at the prompt-and-response scale, not the word scale.
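A hand-wavy sketch of that distinction, assuming a REINFORCE-style formulation for the response-scale objective (one standard way RLHF-type training is written down); all the numbers are invented:

```python
import math

# Word scale: classic next-word prediction. One loss term per token,
# using the probability the model assigned to the correct next token.
def token_level_loss(per_token_probs):
    return -sum(math.log(p) for p in per_token_probs) / len(per_token_probs)

# Response scale: one scalar reward for the whole answer (in RLHF this
# comes from a learned preference model; here it is just a number),
# weighting the log-probability of the entire sequence (REINFORCE).
def sequence_level_loss(per_token_probs, reward):
    sequence_log_prob = sum(math.log(p) for p in per_token_probs)
    return -reward * sequence_log_prob

probs = [0.9, 0.7, 0.8]  # made-up probabilities for one short response
print(token_level_loss(probs))                 # grades each word choice
print(sequence_level_loss(probs, reward=0.5))  # grades the whole answer
```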
I have not read anything recently that purports to do something useful by editing NN weights directly, but that doesn’t mean it isn’t happening. I think that is just a side discussion in the end.
In the end, if there is evidence of planning and then adding words to meet that plan, even if it goes one word at a time, I think we are escaping “statistical word generator.” And if you say we didn’t escape that threshold, I would suggest that when we talk we are doing the same thing: we understand grammar at a pretty fundamental level, but when it comes to vocabulary there are only a handful of words that make sense, and we are making that decision in a way that is not altogether different from LLM sentence generation. I think the only sane way to disprove that is to go looking for substantially offbeat phrasing or expression that falls outside the bounds of the “statistical regularity” LLMs are using.
Unless you want to show a history of totally whackadoodle phrasing for ideas on Lemmy, LLMs are no more statistical word predictors than you are.
Okay, yes, we aren’t doing single words at a time. (Technically, models chunk long or rare words into subword tokens anyway, so even from near the beginning we weren’t doing “singular” words at any time.)
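You can see that chunking directly with a tokenizer library. This assumes OpenAI’s tiktoken package (`pip install tiktoken`), and the exact splits depend on the vocabulary:

```python
import tiktoken

# Load a GPT-4-era vocabulary and show how words break into subword chunks.
enc = tiktoken.get_encoding("cl100k_base")
for word in ["cat", "whackadoodle", "electromagnetism"]:
    ids = enc.encode(word)
    print(word, "->", [enc.decode([i]) for i in ids])
# Common words come back as a single token; rarer ones as several chunks.
```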
However, you are both right and wrong in your assertion that LLMs are equivalent to human responses.
The translation of thoughts to words, and of words to motor functions, can both be approximated by ANNs. And yes, the work done to select the words we use is a probabilistic process like you describe. We hear patterns in language, and that makes us more likely to use that phrasing. The more you hear a phrase, the more likely you are to use it over another, and when two or more phrases would communicate what you want to say, your brain basically just picks one.
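A toy version of that exposure-weighted selection, with completely made-up counts:

```python
import random

# Invented exposure counts: how often you have "heard" each phrasing.
exposure = {"on the other hand": 40, "then again": 25, "conversely": 5}

# Pick a phrasing with probability proportional to past exposure.
phrases, counts = zip(*exposure.items())
print(random.choices(phrases, weights=counts)[0])
```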
So, if speech production (or sentence construction) was what you meant by saying “it’s the same for human responses,” then yes, we agree. Both are probabilistic word generators and likely work in similar ways. (In fact, I think place cells were found in Wernicke’s area (?) or one of the other speech cortices, which means some of our word selection is likely similar to the results from transformer architectures.)
However, if you meant that the entirety of human response (from hearing/reading a comment, to thinking about it, to responding) is the same as current LLMs generating text, I strongly disagree.
The actual process of “thinking” is not something an ANN (especially a non-recurrent one) can do. The ability to ruminate on thoughts and make changes or learn new things simply by trying to formulate ideas, before even deciding to comment, cannot be accomplished with a pre-trained static net, not even one with memory or the illusion of memory like current LLMs. (Not to mention that identity also plays a large role in our responses, and it too cannot arise from current deterministic architectures.)
As for my claim that asserting “human response is chemistry” is more like asserting “AI is electromagnetism”: there are many reasons why, but the simplest illustration would be this:
I think it is entirely possible to build an inorganic but still functional human brain on electrical hardware (in other words, full-blown transhumanism, or at the very least “AGI”). If human response is chemistry in organics, it would be electromagnetism in silicon.
> So, if speech production (or sentence construction) was what you meant by saying “it’s the same for human responses,” then yes, we agree. Both are probabilistic word generators and likely work in similar ways.
This is what I meant.
> However, if you meant that the entirety of human response (from hearing/reading a comment, to thinking about it, to responding) is the same as current LLMs generating text, I strongly disagree.
That would be crazy; I’m glad you would disagree with that.
> As for my claim that asserting “human response is chemistry” is more like asserting “AI is electromagnetism”: there are many reasons why, but the simplest illustration would be this:
>
> I think it is entirely possible to build an inorganic but still functional human brain on electrical hardware (in other words, full-blown transhumanism, or at the very least “AGI”). If human response is chemistry in organics, it would be electromagnetism in silicon.
This is moving into a funny gray area, but what you are talking about is, I think, only possible if you take a route like the one covered in Jeff Hawkins’ “A Thousand Brains.” It’s not the most fun read if you’re not into neuroscience, but the second half is pretty relevant regardless.
I haven’t read the book, but I am familiar with the Thousand Brains hypothesis. The real problem, as far as I can tell, seems to be the variations in morphology and connectivity of different neurons.
The brain might make every column the same to begin with, but if that’s the case, the diversity of those initial columns is immense. So many different genes even just for pyramidal neurons, not to mention the interneurons and the glia.
Plus, the functions of many cell types, like chandelier cells, are still unknown. They’re everywhere, they regulate firing, but we’re not sure how. They can be inhibitory or excitatory, and sometimes they fire in response to either inhibitory or excitatory input, etc.
And don’t even get me started on how no one actually seems to agree on the function of the layers of the neocortex. Every paper I read on the topic poses almost entirely different hypotheses for the function of each layer, and the few connection maps you can find show many connections that violate the idea that each layer takes specific inputs.
Sure, spiking networks are much more biologically plausible (and fun to work with, so I recommend you try one out if you’re interested in this field), but the connections and learning rules seem to be what matter more.
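If you do want to dip a toe in, about the smallest possible taste is a single leaky integrate-and-fire neuron, the basic unit most spiking networks build on. A minimal sketch; every parameter value here is an arbitrary toy choice:

```python
import numpy as np

# One leaky integrate-and-fire neuron. All parameters are toy values.
dt, tau = 1.0, 20.0                      # time step, membrane time constant
v_rest, v_thresh, v_reset = 0.0, 1.0, 0.0

rng = np.random.default_rng(0)
input_current = rng.random(200) * 0.12   # invented noisy input drive

v, spikes = v_rest, []
for t, i_in in enumerate(input_current):
    # Membrane potential leaks toward rest while integrating the input.
    v += dt * (-(v - v_rest) / tau + i_in)
    if v >= v_thresh:    # threshold crossing: emit a spike...
        spikes.append(t)
        v = v_reset      # ...and reset the membrane potential
print(f"{len(spikes)} spikes at steps {spikes}")
```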