Sunday, October 25, 2015

Brains, grammars and hype (part 1)

This recent neuro-ling paper by Frankland and Greene (F&G) in PNAS has generated a lot of critical comment by linguists, and rightly so.  The paper demonstrates several unfortunate flaws in the thinking (and moral character (I return to this)) of cog-neuro (CN) types. The most startling, intellectually speaking, is CNers’ deep-seated dualism. It appears (if judged by their writing rather than their occasional denunciation of ghosts and souls) that CNers do not believe in the identity thesis (i.e. minds and brains are the same thing). In fact, they seem to believe that behavioral evidence, no matter how subtle or empirically well grounded or replicable or statistically significant or robust or of large effect size or…is inherently incapable of doing much to advance our understanding about how brains are organized. The only real evidence, on this view, comes from fMRI/MEG/etc. brain studies. Thus, mentalistic investigations of brains (behavioral, cognitive, psychological) cannot possibly inform us about how brains are structured. Talk about dualism! Not even Descartes would have been caught dead saying such things. But for many CNers it’s either show them the meat (literally) or take a hike.

This is quite evident in the F&G paper, which is why it has generated such pushback from linguists. Angelika Kratzer’s reaction (here) is right on the mark. Let me quote her:

One quote (attributed to Steven Frankland) in the Harvard Gazette article may point to the source of the communication problem: “This [the systematic representation of agents and themes/patients, A.K.] has been a central theoretical discussion in cognitive science for a long time, and although it has seemed like a pretty good bet that the brain works this way, there’s been little direct empirical evidence for it.” This quote makes it appear as if the idea that the human mind systematically represents agents and themes/patients has had the mere status of a bet before the distinction could be actually localized in the brain. That the distinction is systematically represented in all languages of the world is not given the status of a fact in this quote - it doesn't count as  "empirical evidence". It's like denying your pulse the status of a fact before we can localize the mechanisms that regulate it in the brain.

Yup.[1] As Angelika notes, this quote presupposes that behavioral work, in this case linguistic work, no matter how extensive, does not even rise to the level of “evidence” about brain structure. It seems that minds are one thing and brains another. Dualism anyone?

At any rate, take a look at Angelika’s piece. It makes this point well. Consequently, I thought I would try to do something uncharacteristic in what follows. Instead of zeroing in on the inanities (which, to repeat, are many, and which I will be unable to refrain from mentioning from time to time), I would like to zero in on the substantive contribution F&G makes to our understanding of the brain bases of language competence. But read this for yourself. I am no expert in these matters. However, I am channeling the wisdom of others in what follows. The UMD ling dept congregated recently to discuss the paper and I left convinced that despite the overblown rhetoric (I will return to this) and false advertising (I return to this too) the paper does make a modest contribution. And in the spirit of babies and bathwaters I will try to outline what this might be.  It goes without saying, but I will say it nonetheless, that I am no expert in these matters and I am relying on the knowledge of others here, and I might have screwed things up in translation (and if so, I hope others will chime in), but with all these caveats, here’s why I think that the paper is not just wrongheaded (though it is that too), but makes a possible contribution to our understanding of a very difficult topic. [2]

F&G is a fishing expedition of a kind that we have seen before. For example, Pallier, Devauchelle and Dehaene (discussed here) do something similar in hunting for where the Merge operation lives in the brain. F&G are hunting for brain correlates of “thematic” (yes these are scare quotes, I return to this) information; specifically, “whether and how” (p.11732) the brain codes the “who did what to whom” information that sentences express.

The “whether” part of the question, linguists rightly believe, has been already well-established. The most generous reading of F&G is that it agrees but notes that this still leaves open three questions: (1) can we find more neuro-based indices of this well-established cognitive fact (i.e. fMRI or MEG/EEG or lesion data), (2) can we localize these “thematic” brain effects and (3) what might such localization tell us about how brains code this information. IMO, the most interesting features of F&G regard the first two questions, for which it provides tentative answers. What are these?

F&G conducts two kinds of experiments. The first “identifies a broad region,” the left medial Superior Temporal Cortex (aka: lmSTC), that is able to reliably distinguish sentences by the theta information they express. For example, it can distinguish the sentence pairs John kicked Mary/Mary was kicked by John from Mary kicked John/John was kicked by Mary. By “averaging” over the active and passive pairs, the experiment zeros in on the doers and done-tos and abstracts away from surface syntax.  At the least the experiment shows that this region is not sensitive to just the words involved as these are held constant in the contrasting pairs. What matters is the “thematic” structure.

How well does lmSTC do in distinguishing these contrasts? It succeeds about 57% of the time. By ling standards this is really not enough. After all, humans succeed about 100% of the time. Thus, we need to explain why a region that fails 43% of the time to correctly distinguish what speakers never fail to distinguish (and this is the kind of thing that speakers are virtually perfect at) nonetheless is the brain basis of this overt behavioral capacity.

And there is a ready possible account: the resolution of the fMRI probe is not good enough to eliminate interfering noise and this noise is what reduces discrimination to a mere 57%. However, as the area discriminates above chance then it is a reasonable guess that it is not only sensitive to thematic distinctions, but is where these distinctions get coded and we would see this yet more clearly were we able to get an even finer probe.[3]
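To make "above chance" a bit more concrete, here is a minimal sketch of the arithmetic involved. It assumes a two-way classification with a 50% chance level, and the trial counts are made up for illustration (they are not F&G's numbers); the function name `binomial_tail` is likewise mine. The point is only that a 57% hit rate can be weak or strong evidence depending on how many test trials stand behind it.

```python
from math import comb

def binomial_tail(successes: int, trials: int, chance: float = 0.5) -> float:
    """One-sided probability of at least `successes` correct answers in
    `trials` attempts if the classifier were merely guessing at rate `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical trial counts, NOT taken from F&G.
for trials in (100, 1000):
    successes = round(0.57 * trials)
    p_value = binomial_tail(successes, trials)
    print(f"{successes}/{trials} correct: P(at least this many by guessing) = {p_value:.2g}")
```

Note that this says nothing about why the region tops out at 57%. It only quantifies how unlikely such a rate is under pure guessing, which is a much weaker claim than "this is where the distinction is coded."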

So, lmSTC tracks thematic information.  Before going on, I should add that F&G identifies another area that tracks this information with roughly the same accuracy (the right posterior insula/extreme capsule region (call this region R-2)). However, the paper treats the response of this region as (at best) secondary and most likely not relevant. Why? Because whereas the lmSTC predicts further downstream brain responses, the second area does not.  The neuro crowd at our little UMD discussion got all hot about this wrinkle, so let me tell you a bit about it.

F&G shows that there is a correlation between responses to sentences in lmSTC and the amygdala, where affective responses are apparently evoked. The paper shows that the amygdala gets all hyped up to sentence pairs like The grandfather kicked the baby/the baby was kicked by the grandfather but not to The baby kicked the grandfather/the grandfather was kicked by the baby. Why the differential response? Because the amygdala doesn’t like it when babies are badly treated by wicked granddads but thinks that babies kicking old folks is not really very bad (after all how much harm could a baby kick do?). F&G interprets this, reasonably enough, as showing that the info extracted in lmSTC is used by the amygdala in responding. R-2 shows no such correlation. So whatever is going on there does not correlate with downstream amygdala responses. F&G concludes that R-2 “failed to meet additional minimal functional criteria for encoding sentence meaning” (11733). From what I can tell, the only criterion it failed to meet is this downstream impact. Make of this what you will. From where I sit, it does not imply that the same thematic distinctions are not coded in R-2 as in lmSTC, only that the active use of this information pipelines directly from the latter to the amygdala but not the former. But, the CNers really liked this, so I offer it to you for your appreciation.

The second experiment zeros in on the fine structure of lmSTC. In particular, it aims to see whether and where in this already identified region doers and done-tos are coded. F&G does this, again, by seeing how the region’s discrimination powers generalize. How do the subparts of the region react to doers and done-tos as such? Here’s how F&G describes the procedure for isolating doers and done-tos as such in lmSTC:

For our principal searchlight analyses, four-way classifiers were trained to identify the agent or patient using data generated by four out of five verbs. The classifiers were then tested on data from sentences containing the withheld verb. For example, the classifiers were tested using patterns generated by “the dog chased the man,” having never previously encountered patterns generated by sentences involving “chased,” but having been trained to identify “dog” as the agent and “man” as the patient in other verb contexts. …Thus, this analysis targets regions that instantiate consistent patterns of activity for (for example) “dog as agent” across verb contexts, discriminable from “man as agent” …A region that carries this information therefore encodes “who did it?” across nouns and verb contexts tested. (11734).
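For readers who have not met this kind of design before, the logic of "train on four verbs, test on the withheld fifth" can be sketched in a few lines. Everything here is an illustrative assumption rather than F&G's actual pipeline: the data are synthetic, the placeholder labels (`noun3`, `verb2`, etc.) are invented, and I have swapped in a simple nearest-centroid classifier where the paper uses its own classifiers within a searchlight. What the sketch does show is why success on a withheld verb suggests a code for "X as agent" that does not depend on the particular verb.

```python
import random

# Placeholder labels: only "dog", "man" and "chased" come from the quoted passage.
NOUNS = ["dog", "man", "noun3", "noun4"]             # four-way agent label
VERBS = ["chased", "verb2", "verb3", "verb4", "verb5"]
DIM = 20                                              # size of a synthetic activity pattern

def make_data(rng, trials_per_cell=10, signal=2.0, noise=1.0):
    """Synthetic patterns = agent code + verb code + noise (an assumption)."""
    agent_code = {n: [rng.gauss(0, signal) for _ in range(DIM)] for n in NOUNS}
    verb_code = {v: [rng.gauss(0, 1) for _ in range(DIM)] for v in VERBS}
    data = []
    for verb in VERBS:
        for agent in NOUNS:
            for _ in range(trials_per_cell):
                pattern = [a + b + rng.gauss(0, noise)
                           for a, b in zip(agent_code[agent], verb_code[verb])]
                data.append((verb, agent, pattern))
    return data

def centroid(patterns):
    """Mean pattern, computed dimension by dimension."""
    return [sum(column) / len(patterns) for column in zip(*patterns)]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def leave_one_verb_out(data):
    """Train agent centroids on four verbs, test on the withheld verb."""
    correct = 0
    for held_out in VERBS:
        train = [d for d in data if d[0] != held_out]
        test = [d for d in data if d[0] == held_out]
        centroids = {n: centroid([p for _, a, p in train if a == n]) for n in NOUNS}
        for _, agent, pattern in test:
            guess = min(NOUNS, key=lambda n: sq_dist(pattern, centroids[n]))
            correct += guess == agent
    return correct / len(data)

accuracy = leave_one_verb_out(make_data(random.Random(0)))
print(f"agent decoding on withheld verbs: {accuracy:.2f} (chance = 0.25)")
```

Because the synthetic agent code is built to be verb-independent, decoding generalizes to the withheld verb by construction. In real data, such generalization is precisely what has to be demonstrated.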

It turns out that two proximate yet distinct parts of lmSTC seem to discriminate doers from done-tos. Actually, the results for done-tos are cleaner. The borders of the doer region are muddier (moreover, active and passives of the same roles don’t function quite in parallel).[4] At any rate, F&G conclude that the lmSTC spatially bifurcates the two roles, and, at least to me (and more importantly the neuro-psycho people in the UMD discussion group), this conclusion seems reasonable given the data.

That’s what F&G shows: if correct, it identifies a role sensitive region and areas within that region differentially sensitive to the doer and done-to roles. In other words, if correct, F&G offers a hypothesis about where roles get coded.  But F&G claims to do a whole lot more. Does it? We return to this in the next post.

[1] Lest you think that Frankland indulged in hyperbole for the delectation of the newshounds alone, the same sentiment permeates the PNAS piece.
[2] Thanks particularly to Ellen Lau for organizing this and to Allyson Ettinger for a vigorous defense of the paper. I have shamelessly stolen all that I could from the excellent discussion.
[3] Note that this “guess” is quite a bit more precarious than the “bet” that who did what to whom info is represented in brains. There is no doubt that brains code for such information, and this is much more solid than the proposal that it gets coded in lmSTC. This just reiterates Angelika’s apt remarks above.
[4] F&G discusses why not, but the discussion is pretty inconclusive. If lmSTC exclusively tracks “thematic” information then this result is clearly unexpected.

Friday, October 23, 2015

Chomsky's dumb evolutionary conjecture

So, once again Chomsky’s naivety (nay, ignorance) has been revealed for all the world to see. Just imagine thinking that one could isolate a single factor as key to language facility, restrict it to but a single species and proclaim that it just popped into existence acting as a gateway innovation resulting in complex patterns of cognition and behavior all without the shaping effects of natural selection. Imagine it! The stupidity of endorsing the discredited “hopeful monster” point of view of language! How naïve! How uninformed! How irresponsible.

But wait. It seems that Chomsky is not the only naïf endorsing such views. It seems that he now has a fellow traveller (no doubt another one of his duped acolytes), a certain guy called Richard Dawkins. Some of you might have heard about him. He has apparently done some work on evolutionary theory (here). Almost certainly not in the same league as those evolutionary luminaries like Hurford, or Lieberman, or Pinker or Jackendoff, or Tomasello, but, I have been told, Dawkins is at least in the first tier of the second rank. Sort of like Francois Jacob, another biologist who has views not unlike Chomsky’s (see here). At any rate, Dawkins has recently come out and endorsed Chomsky’s evolutionary scenario, zeroing in on recursion as the key innovation behind the human leap into language (and subsequently culture) and arguing that this step had to be taken in one bound as there is no conceptually coherent scenario where smaller steps take you to unbounded recursion. Let me elaborate.

Recently, Bob Berwick told me he was reading the second installment of Dawkins’s autobiography (here). In it Dawkins discusses the evolution of language and Chomsky’s musings on the topic. I asked him for the page references so that I could share them with you. Here are some relevant quotes (with some comments).

As I mentioned on page 290, the main qualitative feature that separates human language from all other animal communication is syntax: hierarchical embedment of relative clauses, prepositional clauses etc. The software trick that makes this possible, at least in computer languages and presumably in human language too, is the recursive subroutine.

It looks as though the human brain must possess something equivalent to recursive subroutines, and it’s not totally implausible that such a faculty might have come about in a single mutation, which we should probably call a macro-mutation. (382)

Note the parts that I bolded. Dawkins accepts that the key linguistic innovation is recursion, in fact, hierarchical recursion. Moreover, it is not implausible to think that this recursive capacity arose in one go. Why does Dawkins think that this is “not implausible”? Here’s what he says:

The reason I am prepared to contemplate macro-mutation in this case is a logical one. Just as you can’t have half a segment, there are no intermediates between a recursive and a non-recursive subroutine. Computer languages either allow recursion or they don’t. There’s no such thing as half-recursion. It’s an all or nothing software trick. And once that trick has been implemented, hierarchically embedded syntax immediately becomes possible and capable of generating indefinitely extended sentences. The macro-mutation seems complex and ‘747-ish’ but it really isn’t. It’s a simple addition – a ‘stretched DC-8 mutation’ – to the software, which abruptly generates huge, runaway complexity as an emergent property. ‘Emergent’: important word, that. (383)

Again, note the bit in bold. This is an important point and, if correctly understood, it undercuts the relevance of those studies that take the existence of finite frames as important linguistic precursors of our kind of competence. So, many have pointed to proposed earlier stages of simple syntactic combination (e.g. NVN structures) as key evolutionary precursors of our full blown recursive mechanisms. Dawkins is pointing out the logical fallacy of this suggestion. There are no steps towards recursion. You either have it or you don’t. Thus, whether or not earlier “finite” stages existed cannot possibly explain how the recursive system arose. There is an unbridgeable logical gap between the two. And that’s an important point for it invalidates virtually all research trying to show that human language is just a simple quantitative extension of our ancestors’ capacities.
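Since Dawkins leans on the computer language analogy, here is a toy sketch of the contrast between a finite frame and a recursive subroutine. The word lists and function names are mine and purely illustrative; nothing here is a claim about how brains (or Gs) implement the trick. The finite NVN frame fills three slots and stops. The recursive noun phrase calls itself, and that one self-reference is all it takes to get embedding of any depth.

```python
NOUNS = ["dog", "cat", "rat", "flea"]   # illustrative vocabulary
VERBS = ["chased", "bit", "carried"]

def finite_frame(subject: str, verb: str, obj: str) -> str:
    """A fixed N-V-N template: slots can be filled, but nothing can be embedded."""
    return f"{subject} {verb} {obj}"

def noun_phrase(depth: int) -> str:
    """A noun phrase that embeds a relative clause by calling itself."""
    noun = NOUNS[depth % len(NOUNS)]
    if depth == 0:
        return f"the {noun}"
    verb = VERBS[depth % len(VERBS)]
    return f"the {noun} that {verb} {noun_phrase(depth - 1)}"

print(finite_frame("the dog", "chased", "the cat"))
for depth in range(4):
    print(noun_phrase(depth))
```

Adding more frames to the first function gets you a longer list of sentence types, but never the second function. That is the sense in which there is no "half recursion."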

Dawkins continues the above quote with the following, where he asks whether the communicative function of language was a plausible driving force for spreading the novel language change:

If a mutant human was born, suddenly capable of true hierarchical syntax, you might well ask who she could talk to. Wouldn’t she have been awfully lonely? If the hypothetical ‘recursion gene’ was dominant, this would mean that our first mutant individual would express it and so would 50 per cent of her offspring. Was there a First Linguistic Family? Is it significant that Fox P2 actually does happen to be a genetic dominant? On the other hand, it’s hard to imagine how, even if a parent and half her children did share the software apparatus for syntax, they could immediately start using it to communicate. (383)

Like Chomsky, Dawkins does not see how the communicative function of language was a plausible force. He does not speculate, as Chomsky and Jacob have, that the capacity for recursion enhanced cognition in the lucky individual even if there was no plausible communicative benefit. However, just like Chomsky, he does not see how communicative benefits could play any useful role.

Dawkins ends with the following accreditation:

Noam Chomsky is the genius mainly responsible for our understanding of hierarchically nested grammar, as well as other linguistic principles. He believes that human children, unlike the young of any other species, are born with a genetically implanted language-learning apparatus in the brain. The child learns the particular language of her tribe or nation, of course, but it is easy for her to do so because she is simply fleshing out what her brain already ‘knows’ about language, using her inherited language machine.

But Chomsky’s hereditarian position in this one instance makes sense and, more to the point, interesting sense. The origin of language may represent a rare example of the ‘hopeful monster’ theory of evolution. (383-4)

Note one last time the bold stuff. Dawkins finds nothing evolutionarily suspect about Chomsky’s hypothesis. Indeed, it makes “interesting sense.” Might we say that it is a bold conjecture?

Does Dawkins’s endorsement show that Chomsky’s evolutionary conjecture is right? NO!! But hopefully it will put to rest the idea that it’s some crackpot out in left field idea that anybody who knew anything about evolution would immediately see was ridiculous. It’s not and never has been. Maybe our local evolutionary mavens can stop suggesting otherwise. Or, more modestly, if what Chomsky believes is considered reasonable by Dawkins and Jacob (among other biologists I am quite sure) then maybe that is sufficient to indicate that it is not biologically suspect on its face. In fact, one might go further and note that it is the right kind of proposal; one that isolates a simple property that should it have arisen could be expected to have far reaching evolutionary consequences. So, Chomsky’s proposal might be wrong, but it is a contender, indeed an “interesting” one. And as the movie notes, all anybody really wants is to be a contender.

Thursday, October 22, 2015

Some things I've read lately; singing whales, beautiful theories and cargo cult neuroscience

Here are some things I’ve run across lately that might be of more general interest.

First, in the long line of singing animal posts (Go Mice!!), here is a nice review of the largest bass-baritone critters: whales. The piece compares their songs with those of birds. They are amazingly similar once one speeds up the whale stuff and adjusts the register or slows down the bird stuff and adjusts the register.  It appears that complex vocalization is something that sits there in many species quite far removed from one another on the evolutionary bush ready to evolve when the circumstances are propitious. So, mice, some birds, whales, humans, and I am sure, much more.

Second is this paper by Frank Wilczek (Nobel winner) on beauty in scientific theory. He identifies two properties that make a theory beautiful: (i) it has symmetrical laws and (ii) it is “exuberant.” 

Now, current linguistics is not physics, but it seems to me that theories do have aesthetic virtues that are revealing. We have no conception of symmetry (or none that I know of) but we do value theories that have fewer moving parts and are less “fine tuned” than their competitors. Thus one reason to value “reduction” (as in e.g. reducing anaphora or control to movement (te he!)) or unifying phrase building and displacement as instances of Merge is that it provides a prettier theory than one where all of these phenomena are treated as sui generis. Here “pretty” means more constraining and more explanatory. Here’s a corollary: one reason to be suspicious of the injudicious use of grammatical features is that they allow too much fine tuning of our accounts and explanation is at odds with fine tuning. Pretty theories explain, and that is part of what makes them pretty. For the interested there is a pretty good discussion of the vice of fine tuning and its relation to explanation in Steven Weinberg’s (another Nobelist) recent Whig history of modern physics (here).

The exuberance condition is also a good sign that your theory is onto something. I am sure I am not alone in being surprised that some account generalizes to phenomena it was not constructed to account for. Hertz describes this (according to Wilczek) as “get[ting] more out of them [i.e. our theories, NH] than we put into them.” Again exuberance and reduction/unification go hand in hand, as does the avoidance of fine tuning. As Wilczek puts it:

The second source of beauty in the laws of physics is their productivity – what I call their exuberance. Just a handful of basic principles generates an astonishing wealth of consequences – everything in the physical world! You can write the equations of the core theories of physics – known as the standard model – quite comfortably on a T-shirt. To paraphrase Hertz, they give back far more than we put in.

It is interesting that the real sciences consider such aesthetic discussions worth having while less mature disciplines (linguistics?) seem, IMO, to find them generally embarrassing. Maybe it is a mark of a field’s explanatory achievements that it is willing to entertain aesthetic considerations in its evaluation of truth.

Third, and last, here is a terrific rant on current neuroscience and how much we understand about the brain. Not much according to this piece.

The first point on C. elegans is worth thinking through carefully. If it is correct (and I have heard the point made before) that we have the entire inventory of neurons and how they are wired up for C. elegans but we still have no idea how its brain works then this should lead us to question the utility of complete wiring diagrams as the holy grail of neuro understanding. I really don’t know if this rant is accurate (though several neuro types I respect did not declare its contents BS), but if it is anywhere near the truth, then there is little current reason for thinking that the demand that cognitive claims should justify themselves in neuro terms should be afforded any respect. From what I can tell, rather the reverse should hold. We have pretty good stories about some domains of cognition (linguistics being one very good story) and next to nothing about neural mechanisms. So which should be cart and which horse? Here’s the rant’s useful warning:

So, the next time you see a pretty 3D picture of many neurons being simulated, think “cargo cult brain”. That simulation isn’t gonna think any more than the cargo cult planes are gonna fly. The reason is the same in both cases: We have no clue about what principles allow the real machine to operate. We can only create pretty things that are superficially similar in the ways that we currently understand, which an enlightened being (who has some vague idea how the thing actually works) would just laugh at.

Monday, October 19, 2015

What's in UG (part 3)

Here is the third and final post on the CHY paper (see here and here).

The second CHY argument goes as follows: (i) the clear categorical complementary distribution of BT-anaphors and pronominals that one finds in languages like English is merely a preference in other languages and (ii) ungrammaticality implies categorical unacceptability. In other words, mere preference (i.e. graded acceptability) is a sure indicator that the acceptability difference cannot reflect G structure.[1] This argument form is one that we’ve encountered before (see here) and it is no more compelling here than it was there, or so I will again argue. Let’s go to the videotape for details.

What is the CHY case of interest? It describes two dialects of Malay. In one the categorical judgments found in English are replicated (call this M-1). In the other, the same kinds of sentences evoke preference judgments rather than categorical judgments (call this M-2). The argument is that because Gs only license categorical judgments, M-2’s preferences cannot be explained Gishly. But as M-1 and M-2 are so similar, then whatever account offered for one must extend to the other. Thus, because the account of M-2 cannot be a Gish one, the account of M-1 can’t be either. That’s the argument. Not very good unless one accepts that categorical (un)acceptability is a necessary property of (un)grammaticality. Reject this and the argument goes nowhere. And reject this we should. Here’s why.

The basic judgment data that linguists use involve relative acceptability (usually under an interpretation). Sometimes, the relevant comparison class is obvious and the data is so clean that we can treat the data as categorical (as I argued here, I think that this is not at all uncommon). However, little goes awry if judgment data is glossed in terms of relative acceptability and virtually all the data can be so construed. Now, in these terms, the perception that some judgments (acceptability under an interpretation in this case) are preferable to others is perfectly serviceable. And it may (need not, but may) reflect underlying Gish properties. It will depend on the case at hand.

I mention this because, as noted, CHY describes M-1 and M-2 as making the same distinctions but with M-1 judgments being categorical and M-2 being preferences. CHY concludes that these should be treated in the same way. I agree. However, I do not see how this implies that the distinction is non-grammatical, unless one assumes that preferences cannot reflect underlying grammatical form.  CHY provides no argument for this. It takes it as obvious. I do not.

Is there anything to recommend CHY’s assumption? There is one line of reasoning that I can think of. How is one to explain gradient acceptability (aka preference) if one takes grammaticality to be categorical?  This is the question extensively discussed here (see the discussion thread in particular) and here. The problem in the domain of island phenomena is that even when we find the diagnostic marks of islandhood (super additivity effects) and we conclude that there is “subliminal” ungrammaticality, we are left asking why for speakers of one G the effects of ungrammaticality manifest themselves in stronger unacceptability judgments than for speakers of another G. In other words, why if island violations are ungrammatical do some find them relatively acceptable? The same question arises in the case of the binding data that CHY discusses. And in both instances the question raised is a good one. What kind of answer should/might we expect?

Here’s a proposal: we should expect ungrammaticality ceteris paribus to get reflected in categorical judgments of unacceptability. However, ceteri are seldom paribused.  We know that lots goes into an acceptability judgment and it is hard to keep all things equal.  So for example, it is not inconceivable that sentences with many probable parses are more demanding performance wise than those without. More concretely, imagine a sentence where the visible functional surface vocabulary (FSV) fails to make clear what the underlying structure is. I am assuming, as is standard, that functional vocabulary can be a guide to underlying form (hence the relative “acceptability” of Jabberwocky). Say that in some languages the surface morphology is more closely correlated to the underlying syntactic categories than in others. And say that a looser correlation creates problems mapping from the utterance to the underlying G form. And say that this manifests itself in muddier (un)acceptability judgments. To say this another way; the less ambiguous the mapping from surface forms to underlying forms the more categorical the judgment will be. If something like this is right, then if we find a language where the morphology does not disambiguate BT-anaphors from exempt anaphors then we might expect acceptability to be less than categorical. Think of it as the acceptability judgment averaging over the two G possibilities (maybe a weighted average). On this scenario, the absence of a “dedicated reflexive” form (see CHY p. 9) in M-2 will make it harder to apply BT than in a language where there is a dedicated form, as in M-1. Note, this is consistent with the assumption that in both languages the G distinguishes well-formed forms from ill-formed forms. However, it is harder to “see” this in M-2 given the obscurity of the surface FSVs than it is in M-1 where the distinction has been “grammaticalized.”[2]
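The "weighted average" idea is simple enough to spell out. What follows is a back-of-the-envelope sketch, not a model anyone has proposed: the 1 to 7 rating scale, the particular numbers, and the function name `expected_acceptability` are all assumptions made for illustration. The G verdict for each parse is categorical; the gradience comes entirely from uncertainty about which parse the surface form signals.

```python
def expected_acceptability(p_bt_anaphor: float,
                           rating_if_bt: float,
                           rating_if_exempt: float) -> float:
    """Average the two categorical G verdicts, weighted by how likely the
    surface form is to be parsed as a BT-anaphor rather than an exempt anaphor."""
    if not 0.0 <= p_bt_anaphor <= 1.0:
        raise ValueError("p_bt_anaphor must be a probability")
    return p_bt_anaphor * rating_if_bt + (1.0 - p_bt_anaphor) * rating_if_exempt

# Hypothetical 1-7 scale: 1 = out (BT violation), 7 = fine (exempt anaphor).
m1 = expected_acceptability(1.0, rating_if_bt=1.0, rating_if_exempt=7.0)  # dedicated reflexive form
m2 = expected_acceptability(0.5, rating_if_bt=1.0, rating_if_exempt=7.0)  # ambiguous surface form
print(f"M-1 style judgment: {m1}, M-2 style judgment: {m2}")
```

With a dedicated form the parse is unambiguous and the judgment sits at the categorical extreme. With an ambiguous form the very same G yields a middling rating, which is what a "preference" looks like.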

I mention this option for it is consistent with everything that CHY discusses and, as I hope is evident, it leaves the question of the UG status of BT untouched. In short, in this particular case it is easy enough to cook up an explanation for why binding judgments in M-2 are murkier than those in M-1 while still assuming that both reflect the operations of a common UG.[3] Thus, the CHY conclusion is not only based on a debatable premise, but in this particular case there is a pretty obvious way of explaining why the two dialects might provide different acceptability judgments. I should also add that this little story I’ve provided is more than CHY does. Here’s what I mean.

Curiously CHY does not explain how M-1 and M-2 are related except to say that M-1 has grammaticalized a distinction that M-2 has not. Which? M-1 has grammaticalized the notion of anaphor. What kind of process is “grammaticalizing”?  CHY does not say. It does not provide an account of what grammaticalization actually is, it only points to some of its effects in Malay and suggests that this goes on in creolization. Nor does CHY explain how undergoing grammaticalization renders preferences in the pre-grammaticalization period categorical in the post-grammaticalization period. In CHY “grammaticalization” is Voltarian.[4] Let me offer a proposal of what grammaticalization is (actually this is implicit in CHY’s discussion).

Here’s one proposal: grammaticalization involves sharpening the FSV so that it more directly reflects the underlying G structure. In other words, grammaticalization is a process that aligns surface functional vocabulary with underlying grammatical forms. It might even be the case that language change is driven to sharpen this alignment (though I doubt that the force is very strong (personal opinion). Why? Because FSV muddies over time as well as sharpens, and lots of FSV is very misleading). But if this is what grammaticalization is, then it can hardly challenge the UG nature of BT as it presupposes it. Grammaticalization is the process whereby the underlying categories of FL/UG act as attractors for overt functional morphology (i.e. LADs try to treat visible functional vocabulary as reflecting underlying G categories and so over time the surface functional vocabulary will come to (more) perfectly delineate UG cleavages). In fact on this view, CHY, inadvertently, argues FOR the UG nature of BT for it assumes that grammaticalization is the operative process linking M-1 and M-2.

As noted, CHY does not explain what grammaticalization is (nor, to my knowledge has anybody else), though it does note what drives it. It is the usual suspect in such cases; the facilitation of processing (p. 17).  Unfortunately, even were this so (and I am skeptical that this actually means anything), it leaves unexplained how languages like M-2 could exist. After all, if processing ease is a good thing, then why should only M-1 partake?  The answer must be that something stops it from enjoying the fruits of parsing efficiency. What might this be? Well, how about the fact that the PLD only murkily maps the binding relevant FSVs (i.e. the surface forms of the anaphoric morphemes) onto the relevant underlying grammatical categories. But, as noted, if this is what grammaticalization is and what it does, then it is not merely compatible with the view that BT is part of UG and that it is innate, but virtually presupposes that something like this must be the case. Attractors cannot attract without existing.

Let me end here, with a diagnosis of what I take the fundamental error that drives the CHY discussion to be. It is not a new mistake, but one that is, sadly, endemic. It rests on the confusion between Greenberg and Chomsky universals. CHY assumes that BT aims to catalogue the surface distribution of overt morphemes. On this construal, BT is indeed not universal (as even a rabid nativist like me would concede). It is clearly not the case that languages all distinguish overt morphological categories subject to different BT principles. Some languages don’t have a clearly demarcated distinction between overt anaphors and pronominals among their FSVs, some don’t even have dedicated overt functional forms for reflexivization or pronominalization. If one understands UG as committing hostages to surface functional morphology, then CHY is right that BT is not universal. However, this is not how GGers ever understood (or, more accurately, ever should have understood) UG and universals. Chomsky universals are not Greenberg universals. They are more abstract and can be hard to discern from the surface (btw, this is what makes them interesting, IMO). Thus criticizing BT because it is wrong when understood in Greenbergian terms is not much of a criticism given that it was not supposed to be so understood (i.e. another Dan Everett moment (see here)). What is surprising is that the distinction between the two kinds of universals seems so difficult for linguists to grasp. Why is this?

Here’s an unfair (though I believe close to accurate) speculation: it results from the confluence of two powerful factors (i) the attraction of Empiricist conceptions of learning and (ii) the fascination with language diversity.

The first is a horse that I have hobbied on many times before. If you think that acquisition is largely inductive then universals without clear surface reflexes are a challenging concept. Being Eish with a taste for universals leads one naturally, and erroneously, to understand Chomsky universals as Greenberg universals (as Greenberg universals are the only ones that Eism tolerates).

The second force leading to the confusion between Greenberg and Chomsky universals comes from a fascination with linguistic variation (clearly something that is at the center of CHY). FL/UG rests on the idea that underlyingly there is very little real G variation. If one’s interest is in variation, then this notion of UG will seem way off track. Just look at all the differences! To be told that this is just surface morphology will seem unhelpful at best and hostile at worst. The natural response is to look for helpful typological universals and these, not surprisingly, will be Greenbergian. Here the generalizations concern surface patterns, as do typological differences if Chomsky’s conception of FL/UG is on the right track. Typological interests do not require embracing a Greenberg conception of universals (unlike a commitment to Eism, which does). However, it is, I believe, a constant temptation. The fact is that a Chomskyan conception of UG is consistent with the view that there are very few (if any) robust typological (i.e. surface true) universals. UG in Chomsky's sense doesn’t need them. It just needs a way of mapping overt forms to underlying forms. In other words, UG needs to be coupled with a theory of acquisition, but this theory does not require that there be surface true universals. Of course, there may be some but they are not conceptually required.

So that’s it. CHY’s conclusions only follow from a flawed understanding of what a universal is and what UG enjoins. The argument is not very good. Sadly, it might well be influential, which is why I spent so much effort trying to dismember it. It appeared in an influential cog sci journal. It will be read as undermining the notion of UG and the relevance of PoS reasoning. It will do so not because the arguments are sound but because this is a welcome conclusion to many. I strongly suggest that GGers educate their psycho counterparts and explain to them why Cognition has once again failed to understand what Chomskyan linguistics is all about. I also suggest that understanding a PoS argument be placed at the center of the field’s pedagogical concerns. It really helps to know how to construct one.

[1] I have no idea why this assumption is so robust among linguists. I don’t believe that anyone ever argued the case and many argued that it was not. So, for example, Chomsky explicitly denies that one could operationalize grammaticality in terms of (categorical, or otherwise) judgments of acceptability (see, e.g. Current Issues: 7-9 and chapter 3). In fact, there is little reason to believe that there can be operational criteria of FL/UG notions of grammaticality, as holds true for any interesting abstract scientific notions (see Current Issues: 56-7). If this is correct, then systematic preference judgments might be just as revealing of underlying grammatical form as categorical judgments. At any rate, the assumption that preferences exclusively reflect extra grammatical factors is tendentious. It really depends.
[2] I return in a moment to explicate this term.
[3] It is actually harder to come up with a good story for variable island effects, though as I’ve mentioned before I believe that Kush’s ambiguity hypothesis is likely on the right track.
[4] As in: why does this morphine put you to sleep? In virtue of its dormitive powers.
I should add that CHY needs to offer an account of what the process consists in on pain of undermining its main argument. The argument is that M-1 and M-2 are sensitive to the same distinction. But this distinction cannot be a Gish one because Gish ones result in categorical judgments. But the M-2 judgments are not categorical, therefore the distinction cannot be grammatical. How then to explain M-1 judgments? Well, they are categorical because the non G distinction in M-2 has been grammaticalized in M-1. This invites the obvious question: what’s the output of the process of grammaticalization? It sounds like the end product is to render the distinction a grammatical one. But if this is so, then the premise that the distinction is the same in both M-1 and M-2 fails, for what is a non-grammatical distinction in M-2 is a grammatical distinction in M-1. The only way to explicate what is going on in the argument is to specify what grammaticalization is and what it does. CHY does not do this.