Wednesday, November 23, 2016

Some material to point to when the uninformed say that GG is dead, which it isn't

Here are several pieces by our own estimable Jeff Lidz that fight the good fight against the forces of darkness and ignorance.  We need much more of this. We need to get stuff into popular venues defending the work that we have done.[1]

The most important is this piece in Scientific American rebutting the profoundly ignorant and pernicious piece by Ibbotson and Tomasello (I&T) (see here and here and here for longer discussion). Jeff does an excellent job of pointing out the issues and debunking the “arguments” that I&T advance. It is amazing, IMO, that T’s views on these issues still garner any attention. That they do no doubt stems from the fact that he has done good work on non-linguistic topics. However, his criticisms of GG are of long standing and of very low quality, and have been so for as long as they have been standing. So, it is good to see that Sci Am has finally opened its pages to those willing to call junk junk. Read it and pass it around widely in your intellectual community.

Here are two other pieces (here and here). The latter is a response to this. All of this appears in PNAS. The two pieces are co-authored with Chung-hye Han and Julien Musolino. The discussion is an accessible entry, for the scientifically literate non-GGer, into the big issues GG broaches. As such, it is excellent for publicity purposes.

So, read and disseminate widely. It is important to call out the idiocy out there. It is even fun.

[1] I have a piece in Current Affairs with Nathan Robinson that I will link to when it is available on the web.

Monday, November 21, 2016

Two things to read

Here is a pair of easy pieces to look at.

The first (here) is a review by Steven Mithen (SM) of a new book on human brain size.  The received wisdom has been that human brains are large compared to our body size. The SM review argues that this is false. The book by Suzana Herculano-Houzel, a neuroscientist from Brazil, makes two important points (and I quote):

(i) What is perhaps more astounding than that number itself, one that is actually less than the often assumed 100 billion neurons, is that 86 billion makes us an entirely typical primate for our size, with nothing special about our brain at all, so far as overall numbers are concerned. When one draws a correlation between body mass and brain mass for living primates and extinct species of Homo, it is not humans—whose brains are three times larger than those of chimpanzees, their closest primate relative—that are an outlier. Instead, it is the great apes—gorillas and the orangutan—with brains far smaller than would be expected in relation to their body mass. We are the new normal in evolution while the great apes are the evolutionary oddity that requires explanation.
(ii) But we remain special in another way. Our 86 billion neurons need so much energy that if we shared a way of life with other primates we couldn’t possibly survive: there would be insufficient hours in the day to feed our hungry brain. It needs 500 calories a day to function, which is 25 percent of what our entire body requires. That sounds like a lot, but a single cupful of glucose can fuel the brain for an entire day, with just over a teaspoon being required per hour. Nevertheless, the brains of almost all other vertebrates are responsible for a mere 10 percent of their overall metabolic needs. We evolved and learned a clever trick in our evolutionary past in order to find the time to feed our neuron-packed brains: we began to cook our food. By so doing, more energy could be extracted from the same quantity of plant stuffs or meat than from eating them raw. 
What solved the energy problem? Cooking. So, the human brain-to-body-mass ratio is normal, but the energy the brain uses is off the charts. Cooking, then, becomes part of the great leap forward.
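The energy figures in the quoted passage can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes that the "calories" are kilocalories (dietary Calories) and uses the rough conversion of about 4 kcal per gram of glucose; neither assumption comes from the review.

```latex
\begin{aligned}
\text{whole-body daily need} &= \frac{500\ \text{kcal}}{0.25} = 2000\ \text{kcal} \\
\text{brain, per hour} &= \frac{500\ \text{kcal}}{24\ \text{h}} \approx 20.8\ \text{kcal/h} \\
\text{glucose, per hour} &\approx \frac{20.8\ \text{kcal/h}}{4\ \text{kcal/g}} \approx 5.2\ \text{g/h} \\
\text{glucose, per day} &\approx \frac{500\ \text{kcal}}{4\ \text{kcal/g}} = 125\ \text{g}
\end{aligned}
```

At roughly 4 g to a teaspoon of granulated sugar (another rough assumption), about 5 g an hour is indeed "just over a teaspoon," and about 125 g a day is consistent with the claim that a cupful suffices for a day.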

The review (and the book) sound interesting. For the minimalistically inclined, the last paragraph is particularly useful. It seems that the idea that language emerged very recently is part of the common physical anthropology worldview. Here is SM's prose:
If a new neuronal scaling rule gave us the primate advantage at 65 million years ago, and learning to cook provided the human advantage at 1.5 million years ago, what, one might ask, gave us the “Homo sapiens advantage” sometime around 70,000 years ago? That was when our ancestors dispersed from Africa, to ultimately replace all other humans and reach the farthest corners and most extreme environments of the earth. It wasn’t brain size, because the Neanderthals’ matched Homo sapiens. My guess is that it may have been another invention: perhaps symbolic art that could extend the power of those 86 billion neurons or maybe new forms of connectivity that provided the capacity for language.
So, around 70kya, something happened that gave humans a way of using their new, big, energy-consuming brains to get another leg up. This adventitious change was momentous. What was it? Who knows. The aim of the Minimalist Program is to characterize abstractly what this could have been. It had to be small given the short time span. This line of reasoning seems to be less and less controversial. Of course, what the right characterization of the change is, at any level of abstraction, is still unclear. But it's nice to know that the problem is well posed.

Here's a "humorous" piece by Rolf Zwaan by way of Andrew Gelman. It's a surefire recipe for getting things into the top journals. It focuses on results in "social priming," but I bet clever types can make the required adaptations for their particular areas of interest. My only amendment would concern the garnish in point 2: I believe that Greek philosophers really are best.

Have a nice Thanksgiving (if you are in the USA). I will be off for at least a week until the turkey festivities end.

Sunday, November 20, 2016

Revisiting Gallistel's conjecture

I recently received two papers that explore Gallistel’s conjecture (see here for one discussion) concerning the locus of neuronal computation. The first (here) is a short paper that summarizes Randy’s arguments and suggests a novel view of synaptic plasticity. The second (here)[1] accepts Randy’s primary criticism of neural nets and couples a neural net architecture with a pretty standard external memory system. Let me say a word about each.

The first paper is by Patrick Trettenbrein (PT) and it appears in Frontiers in Systems Neuroscience. It does three things.

First, it reviews the evidence against the idea that brains store information in their “connectivity profiles” (2). This is the classical assumption that inter-neural connection strengths are the locus of information storage. The neurophysiological mechanisms for this are long term potentiation (LTP) and long term depression (LTD). LTP/D are the technical terms for whatever strengthens or weakens interneuron connections/linkages. I’ve discussed Gallistel and Matzel’s (G&M) critique of the LTP/D mechanisms before (see here). PT reviews these again and emphasizes G&M’s point that there is an intimate connection between this Hebbian “fire together wire together” LTP/D based conception of memory and associationist psychology. As PT puts it: “Crucially, it is only against this background of association learning that LTP and LTD seem to provide a neurobiologically as well as psychologically plausible mechanism for learning and memory” (88). This is why, if you reject associationism and endorse “classical cognitive science” and its “information processing approach to the study of the mind/brain,” you will be inclined to find contemporary connectionist conceptions of the brain wanting (3).
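For readers outside neuroscience, the “connectivity profile” view can be made concrete with a toy. The sketch below is a Hopfield-style auto-associative network, a standard textbook illustration of Hebbian storage chosen here for convenience; it is not PT’s or G&M’s model, and the function names are hypothetical. A memory exists only as a matrix of connection strengths: co-active units strengthen their link (LTP-like) and units with opposite activity weaken it (LTD-like).

```python
# Toy Hebbian (Hopfield-style) memory: the "memory" is nothing but connection strengths.

def hebbian_store(patterns, n):
    """Outer-product rule with +1/-1 coding: w[i][j] grows when units i and j
    agree (LTP-like) and shrinks when they disagree (LTD-like)."""
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, cue):
    """One synchronous update: each unit takes the sign of its summed weighted input."""
    n = len(cue)
    out = []
    for i in range(n):
        h = sum(w[i][j] * cue[j] for j in range(n))
        out.append(1 if h >= 0 else -1)
    return out

stored = [1, -1, 1, 1, -1, -1, 1, -1]
w = hebbian_store([stored], len(stored))
noisy = stored[:]
noisy[0] = -noisy[0]  # corrupt one unit of the retrieval cue
print(recall(w, noisy) == stored)  # True: the pattern is reconstructed from the weights
```

Note where the memory lives: no location in `w` holds the stored pattern as a readable record. It can only be reconstructed by letting activity settle through the weights, which is the feature Gallistel argues makes such systems poor at storing and retrieving complex data structures.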

Second, there is recent evidence that connection strength cannot be the whole story. PT reviews the main evidence. It revolves around retaining memory traces despite very significant alterations in connectivity profiles. So, for example, “memories appear to persist in cell bodies and can be restored after synapses have been eliminated” (3), which would be odd if memories lived in the synaptic connections. Similarly it has recently been shown that “changes in synaptic strength are not directly related to storage of new information in memory” (3). Finally, and I like this one the best (PT describes it as “the most challenging to the idea that the synapse is the locus of memory in the brain”), PT quotes a 2015 paper by Bizzi and Ajemian which makes the following point:

If we believe that memories are made of patterns of synaptic connections sculpted by experience, and if we know, behaviorally, that motor memories last a lifetime, then how can we explain the fact that individual synaptic spines are constantly turning over and that aggregate synaptic strengths are constantly fluctuating?

Third, PT offers a reconceptualization of the role of these neural connections. Here’s an extended quote (5):

…it occurs to me that we should seriously consider the possibility that the observable changes in synaptic weights and connectivity might not so much constitute the very basis of learning as they are the result of learning.

This is to say that once we accept the conjecture of Gallistel and collaborators that the study of learning can and should be separated from the study of memory to a certain extent, we can reinterpret synaptic plasticity as the brain's way of ensuring a connectivity and activity pattern that is efficient and appropriate to environmental and internal requirements within physical and developmental constraints. Consequently, synaptic plasticity might be understood as a means of regulating behavior (i.e., activity and connectivity patterns) only after learning has already occurred. In other words, synaptic weights and connections are altered after relevant information has already been extracted from the environment and stored in memory.

This leaves a place for connectivity, not as the mechanism of memory but as what allows memories to be efficiently exploited.[2] Memories live within the cell, but putting them to good use requires connections to other parts of the brain where other cells store other memories. That’s the basic idea. Or as PT puts it (6):

The role of synaptic plasticity thus changes from providing the fundamental memory mechanism to providing the brain’s way of ensuring that its wiring diagram enables it to operate efficiently…

As PT notes, the Gallistel conjecture and PT’s own tentative proposal are speculative, as theories of the relevant cell-internal mechanisms don’t currently exist. That said, neurophysiological (and computational, see below) evidence against the classical Hebbian view is mounting, and the serious problems with storing memories in usable form in connection strengths (the basis of Gallistel’s critique) are becoming more and more widely recognized.

This brings us to the second paper noted above, which appears in Nature. It endorses the Gallistel critique of neural nets and recognizes that neural net architectures are poor ways of encoding memories. It adds a conventional RAM to a neural net, and this combination allows the machine to “represent and manipulate complex data structures.”

Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data.
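To give a feel for what “read from and write to an external memory matrix” means in a differentiable setting, here is a minimal sketch loosely modeled on the content-based addressing and erase-then-add writing described for DNCs. It is an illustration under simplifying assumptions, not DeepMind’s implementation: it omits the controller network, temporal link matrix, usage-based allocation, and all learning, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; the small epsilon guards against zero-length rows."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def content_weights(memory, key, beta):
    """Softmax over key/row similarity; beta (key strength) sharpens the focus."""
    scores = [beta * cosine(row, key) for row in memory]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Soft read: a weighted blend of rows rather than a hard lookup, so it stays differentiable."""
    width = len(memory[0])
    return [sum(w * row[k] for w, row in zip(weights, memory)) for k in range(width)]

def write(memory, weights, erase, add):
    """Erase-then-add write, applied to each row in proportion to its weight."""
    return [
        [m * (1 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for row, w in zip(memory, weights)
    ]

memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
wts = content_weights(memory, [0.9, 0.1, 0.0], beta=10.0)
print(wts)                # row 0 gets nearly all of the weight
print(read(memory, wts))  # close to row 0
```

The design point relevant to Gallistel’s critique is that the stored vectors sit in explicit, addressable rows, separate from the network that learns how to use them.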

Note that the system is still “associationist” in that learning is largely data driven (and as such it will necessarily run into PoS problems when applied to any interesting cognitive domain like language), but it at least recognizes that neural nets are not good for storing information. This latter is Randy’s point. The paper is significant because it comes from Google’s DeepMind, which means that Randy’s general observations are making intellectual inroads with important groups. Good.

However, this said, these models are not cognitively realistic for they still don’t make room for the domain specific knowledge that we know characterizes (and structures) different domains. The main problem remains the associationism that the Google model puts at the center of the system. As we know that associationism is wrong and that real brains characterize knowledge independently of the “input,” we can be sure that this hybrid model will need serious revision if intended as a good cog-neuro model.

Let me put this another way. Classical cog sci rests on the assumption that representations are central to understanding cognition. Fodor and Pylyshyn and Marcus long ago argued convincingly that connectionism did not successfully accommodate representations (and, recall, that connectionists agreed that their theories dumped representations) and that this was a serious problem for connectionist/neural net architectures. Gallistel further argued that neural nets were poor models of the brain (i.e. not only of the mind) because they embody a wrong conception of memory, one that makes it hard to read/write/retrieve complex information (data structures) in usable form. This, Gallistel noted, starkly contrasts with more classical architectures. The combined Fodor-Pylyshyn-Marcus-Gallistel critique, then, is that connectionist/neural net theories were a wrong turn because they effectively eschewed representations, and that this is a problem from both the cognitive and the neuro perspective. The Google Nature paper effectively concedes this point, recognizes that representations (i.e. “complex data structures”) are critical, and resolves the problem by adding a classical RAM to a connectionist front end.

However, there is a second feature of most connectionist approaches that is also wrong. Most such architectures are associationist. They embody the idea that brains are entirely structured by the properties of the inputs to the system. As PT puts it (2):

Associationism has come in different flavors since the days of Skinner, but they all share the fundamental aversion toward internally adding structure to contingencies in the world (Gallistel and Matzel 2013).

Yes! Connectionists are weirdly attracted to associationism as well as to rejecting representations. This is probably not that surprising. Once one thinks about representations, it quickly becomes clear that many of their properties are not reducible to statistical properties of the inputs. Representations have formal properties above and beyond what one finds in the input, which, once you look, turn out to be causally efficacious. However, strictly speaking, associationism and anti-representationalism are independent dimensions. What makes Behaviorists distinctive among Empiricists is their rejection of representations. What unifies all Empiricists is their endorsement of associationism. Seen from this perspective, Gallistel and Fodor and Pylyshyn and Marcus have been arguing that representations are critical. The Google paper agrees. This still leaves associationism, however, a position the Googlers embrace.[3]

So is this a step forward? Yes. It would be a big step forward if the information processing/representational model of the mind/brain became the accepted view of things, especially in the brain sciences. We could then concentrate (yet again) all of our fire on the pernicious Empiricism that so many cog-neuro types embrace.[4] But, little steps my friends, little steps. This is a victory of sorts. Better to be arguing against Locke and Hume than against Skinner![5]

That’s it. Take a look.

[1] Thx to Chris Dyer for bringing the paper to my attention. I put the URL up rather than linking to the paper directly, as the link did not seem to work. Sorry.
[2] Redolent of a competence/performance distinction, isn’t it?  The physiological bases of memory should not be confused with the physical bases for the deployment of memory.
[3] I should add that it is not clear that the Googlers care much about the cog-neuro issues. Their concerns are largely technological, it seems to me. They live in a Big Data world, not one where PoS problems (are thought to) abound. IMO, even in a uuuuuuge data environment, PoS issues will arise, though finding them will take more cleverness. At any rate, my remarks apply to the Google model as if intended as a cog-neuro one.
[4] And remember, as Gallistel notes (and PT emphasizes), much of the connectionism one sees in the brain sciences rests on thinking that the physiology has a natural associationist interpretation psychologically. So, if we knock out one strut, the other may be easier to dislodge as well (I know that this is wishful thinking, btw).
[5] As usual, my thinking on these issues was provoked by some comments by Bob Berwick. Thx.

Saturday, November 12, 2016

Linguistic diversity

Here’s an interesting piece by Nick Evans on the indigenous languages of Australia. It is imbued with a sensibility concerning the study of language quite different than my own (which is partly why I found it interesting) but it also raises some questions that someone who approaches linguistic questions from my direction should find intriguing. In what follows I will discuss both points of con- and di-vergence. But before starting, let me reiterate that I found the piece intriguing and I could imagine spending quite a bit of pleasant time over several cold beers talking to Nick about his work, which is a long-winded way of saying that you should take a look at the piece for yourself.[1]

Some comments:

(1) Nick worries about a question whose utility from where I sit is not at all evident: How to distinguish a language from a dialect (see 4). This is in service of trying to establish the integrity of the Australian language family, which is in turn in service of trying to estimate how fast languages change and how old language families are. The idea that Nick moots is that the Australian language family is 60,000 years old and that this raises the possibility that the emergence of the Faculty of Language is much older still. In other words, Nick takes the dating of the language family question as bearing on the emergence of the FL question. Clearly, the second one is of interest to devotees of the Minimalist Program.

However, I am not sure that the question is nearly as well posed as Nick takes it to be. I do not see a principled way of distinguishing languages from dialects. The one he proposes is the following: “a language is something that is distinct enough to needs its own distinctive descriptive grammar” (5). But what does ‘distinct enough’ mean? Darn if I know. For me a G is a mental construct. It is almost certain that no two Gs are the same (i.e. no two people have exactly the same G). So the question is one of more or less. But so far as I know, this becomes a question of G overlap, and the degree of overlap will not be precise. Yet we need some measure of this to see how different two Gs are, so as to get a measure of G difference and hence of change. Maybe such measures exist, but I know of none, and unless one specifies some dimensions of similarity (which may exist; recall, I am no expert on these matters), the rate-of-change issue becomes hard to specify.

This said, if we could establish a rate of G change, then this might help establish how old FL is. Given that the only evidence we have for when it emerged is indirect (the emergence of complex cultural artifacts, i.e. the big bang), this would be useful. That said, I doubt that it would significantly alter the backdrop for Darwin’s Problem as it applies to language. The big fact is that FL appeared more or less in one piece and has not evolved since. There is no indication from what Nick writes that these older Gs are qualitatively different from contemporary ones. This means that the FL required to acquire them is effectively the same as the one that we still possess. And if that is the case, then the logic of Darwin’s Problem as it applies to MP remains unchanged. So far as someone with my interests is concerned, that is enough.

Let me add a question before moving on: is there a measure of G change (or, more ambitiously, of the rate of G change) out there? Note that this would be a measure of how Gs of the same language change. This seems to require reifying languages so that two Gs can be Gs of the same language even if they differ in detail. So far as I know, modern GG has only an inchoate, qualitative purchase on the notion of a language, and it has not been important to make it more precise. In fact, the notion is part of a dispensable idealization concerning ideal speaker-hearers. Nick’s project requires theoretically grounding the informal notion that suffices for most GG inquiries. I am skeptical, but wish him luck.
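To make the worry concrete, here is a deliberately naive sketch of what a measure of G overlap would minimally require. Everything in it is an assumption not found in the post: that a G reduces to settings of one shared, finite set of binary parameters, that every parameter counts equally, and the parameter labels and values themselves, which are hypothetical.

```python
# Hypothetical sketch: assumes a G can be summarized as settings of a shared,
# finite set of equally weighted binary parameters. The post doubts exactly this.

def overlap(g1, g2):
    """Proportion of shared parameters on which two Gs agree (1.0 = same settings)."""
    shared = g1.keys() & g2.keys()
    if not shared:
        raise ValueError("the two Gs share no parameters to compare")
    agree = sum(1 for p in shared if g1[p] == g2[p])
    return agree / len(shared)

def change_per_millennium(g_then, g_now, years):
    """Divergence (1 - overlap) per 1,000 years between two stages of 'one' language.
    Treating both Gs as stages of the same language is itself an assumption."""
    return (1 - overlap(g_then, g_now)) / (years / 1000)

# Illustrative labels and values only; not descriptions of any actual language.
g_then = {"head_initial": True, "null_subject": False, "wh_fronting": True, "polysynthetic": False}
g_now = {"head_initial": True, "null_subject": True, "wh_fronting": True, "polysynthetic": False}
print(overlap(g_then, g_now))                      # 0.75
print(change_per_millennium(g_then, g_now, 2000))  # 0.125
```

The arithmetic is trivial. The hard part is everything the code takes for granted: the choice of dimensions, their weighting, and the decision that `g_then` and `g_now` are Gs of the same language, which is the reification the post is skeptical about.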

(2) Nick raises a second question: why are there so many languages anyhow (8ff)? He asks this in order to focus efforts on identifying “the social processes that drive differentiation.” I also find this question interesting, but in a slightly different way. From my perspective, Gs are products of three factors: (i) the structure of FL/UG, (ii) the nature of the PLD (the input data that the LAD uses to construct its G given the options FL/UG allows), and (iii) the learning theory that LADs use to organize the PLD and to construct a particular G given (i) and (ii).[2] The question I find interesting is why FL/UG makes so many Gs available. Why not simply hardwire one G and be done with it? Why is FL/UG so open textured and environmentally sensitive (i.e. open to the effects of PLD)? Note that FL/UG could have specified one G for the species (say, all Gs have more or less the syntax of “English”). This is roughly what happens in some songbirds: all birds of a species sing the same song. Why isn’t this what happened for language? In P&P terms, this would mean an FL/UG with no parameters. Why don’t we have this? And does the fact that we don’t have this tell us anything interesting about FL/UG?
There are several possibilities. Mark Baker has offered a kind of evolutionary rationale. He thinks that Gs are codes that enable speakers of the same language to conceal information from outsiders (here:8):
Suppose that the language faculty has a concealing function as well as a revealing function. Our language faculty could have the purpose of communicating complex propositional information to collaborators while concealing it from rivals that might be listening in.
I say evolutionary, for I am assuming that we have such a code because concealment can confer selective advantages. Though this is an ingenious idea, I am skeptical for the obvious reasons. This parameterized coding scheme is now species-wide, and anyone can acquire any of the coding schemes (aka Gs) if placed in the right linguistic environment. If the goal were an opacity useful for segregating in-groups from out-groups, then schemes that made it impossible (or at least very difficult) for outlanders to acquire the code would have been a superior option. But so far as we can tell, all humans are equally adept at learning any G (i.e. any set of parameter values). Perhaps what Mark has in mind is that it is hard to learn a non-native G later in life, and that this suffices for whatever advantages concealment promotes. Maybe.

I have remarked before that parametrization is a very curious fact (if it is a fact) (here), one that suggests that, contrary to standard assumptions, typological differences tell us very little about the structure of FL. However, putting this to one side, it is interesting that Gs can be so different, and Nick’s question of why there is so much variation is a good one.

What’s his answer? There are social processes that drive differentiation and we need to identify these. He suggests two steps (8-9):

The first step is to see how new linguistic elements are born: new sounds, new grammatical structures, new words, new meanings. What makes the range of these more or less diverse in different groups? For example, does being multilingual add options to the pool? ...

The second step is to find how the society promotes one variant over another. It is clear that some groups have linguistic ideologies that place a high premium on harnessing linguistic means to say “Our clan is different”, “our moiety is different” and so on…

This might be right so far as it goes, but it presupposes that FL/UG allows all of these options to begin with. In other words, it asks: given that FL allows diverse Gs, what drives the specific diversity we see? Baker and I are interested in another question: why does FL allow the diversity to begin with? What’s wrong with an FL that, as it were, had no parameters at all?
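In toy P&P terms the contrast can be put numerically. The sketch below assumes, purely for illustration, independent binary parameters with hypothetical names: n of them make 2**n Gs available, while an FL/UG with no parameters makes exactly one available, the songbird-like case.

```python
from itertools import product

def grammar_space(parameters):
    """Every G that a toy FL/UG with these binary parameters makes available.
    Each G is a dict mapping a (hypothetical) parameter name to its setting."""
    return [dict(zip(parameters, values))
            for values in product([False, True], repeat=len(parameters))]

print(len(grammar_space([])))                  # 1: no parameters, one G for the species
print(len(grammar_space(["p1", "p2", "p3"])))  # 8 = 2**3
```

Nothing in the sketch says why n should be greater than zero; that is precisely the question at issue.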

Here’s my thought: an FL with fixed parameters is more biologically expensive than an open-textured one. The idea is that if evolution can rely on there always being enough PLD to allow a child to acquire the local G, then there is no reason for evolution to code into the genome information that the PLD makes readily available. If fixing info in the genome is costly, then it will not be put there unless it must be. So, an open-textured system is what we should expect. That’s the idea.

I think that this fits pretty well with MP thinking as well. Suppose that what allows FL to emerge is a small addition, say an operation like Merge (an addition that remains very stable and unchanging over time), and that Merge is consistent with various surface differences. Then, so long as the parts of FL that are not linguistically proprietary suffice, together with Merge, to generate Gs, we should not expect more linguistically proprietary info to be biologically coded. If Merge is enough, then it’s all that we will get. Note that this suggests that MP-like systems will not likely have an FL/UG specification of a particular parameter space (see here and here for some discussion). If this can be fleshed out, then the reason we have G diversity is that fixed parameters are costly, and MP takes FL to be what we get when we add only a smidgen of linguistically proprietary structure to an otherwise language-ready cognitive system. In other words, typological diversity (PLD-sensitive G generation) is just what MP ordered.

(3) Nick provides sort of an antidote for my tolerance for inferring UG principles from the properties of a single G. As he puts it (12-3):

We are just coming out of half a century where generative linguistics, as inspired by the great linguist Noam Chomsky, placed great emphasis on ‘Universal Grammar’, very much seeing all languages as alike with only minor variations. Part of this emphasis meant claiming there are all sorts of imaginable design options that are simply not found in language. For example, Steven Pinker and Paul Bloom wrote, in the early 90s, that ‘‘no language uses noun affixes to express tense’’. Now clearly this is simply wrong for Kayardild. It is an example of what can go wrong, scientifically, when one extrapolates prematurely from too limited a range of cases. Now there’s nothing wrong with the scientific strategy of making strong statements to invite falsification. But what Kayardild shows us – and many other languages I could have used to illustrate the structural originality of Australian languages, in different ways – is that we really need to get out there and describe languages, as they are, to realize the full richness and diversity of how humans have colonized the design space of language through the languages they have built through use.

I say “sort of” because Nick’s observations are couched not in terms of Gs but in terms of languages, and the problems he cites have less to do with the properties of Gs than with their surface manifestations. Chomsky did not (and does not) see “all languages as alike.” What he saw (and sees) is that all I-languages are pretty much alike. Missing the ‘I’ prefix threatens to confuse Chomsky with Greenberg. I can understand that if one’s interests are mainly typological and diversity is what gets you excited, then dropping the ‘I’ will seem like the best way to import Chomsky’s insights into your work. But this is a mistake (as you knew I would say). If your goal is GGish, it is not the diversity of languages that we need to investigate but the diversity of Gs, and these will be only indirectly related to the surface patterns we observe. The Pinker-Bloom example is very much a Greenbergian conception of a universal, at least as Nick takes it to be refuted by Kayardild (it appears to deal with features of overt affixes). If we are to learn about FL/UG by exploring the rich “design space of language,” then we need to keep in mind that it is I-language space we should be exploring. Moreover, when it comes to I-language space, I am less sure than Nick is that

[t]he world of languages holds more possibilities than any linguist has imagined, and Australian languages have taken the ‘design space’ in lots of rare and unusual directions, so that we’re still finding new phenomena that people hadn’t imagined before (14).

In fact, from where I sit, we have actually found relatively few new universals since the mid-1980s. If this is correct, then, oddly, exploring the ‘design space’ has enriched our understanding of language diversity but has left our understanding of I-language variation pretty much where it was when only a small number of languages served as linguistic model organisms.[3]

That’s it. I think that Nick has asked some interesting questions, the most interesting being why FL/UG allows G variation. We are interested in different things, but the paper was fun to read and Kayardild sounds like it can take you on a wild ride. Like I said, I’d love to have a beer with him.

[1] Thx to Kleanthes for sending me the URL.
[2] This follows Anderson, discussed a bit here.
[3] See here for a partial list. The observant reader will note that most of these are very old. It would be nice to have some candidate universals that are of more recent vintage, say discovered in the last 20 years. If my hunch is right that recent contributions to the list have been sparse of late, this is interesting and worth trying to understand.