Sunday, April 28, 2013

More on Science Hygiene

I was going to say a word or two about today's article in the NYT on scientific fraud (here). But Marc van Oostendorp beat me to it. I liked his discussion and have hoisted it from the comments for greater visibility. Here's his comment:

As it happens, the New York Times had an article about another case of fraud, in my home country:

I believe this case can serve to illustrate some of Norbert's points (even though in this case, the professor in question WAS fired). My impression is that social psychology is even worse than economics in its equating science with finding correlations in large data sets: especially if you read Stapel's book it becomes clear that social psychology is all about doing experiments and interpreting them in the 'right' way statistically, and hardly ever about trying to construct a theory with some explanatory depth.

If Stapel's research had not been fraudulent, not much would have changed. He found correlations between eating meat and being aggressive, or between seeing the word 'capitalism' and eating M&M's. In this way, he became an academic superstar, at least on the Dutch scale: he published in Science, was a Dean at Tilburg University (where, as you may know, a thriving department of linguistics was closed a few years ago because of its unfeasibility) and appeared on tv a lot with the outcomes of his 'research'.

People are now discussing what should be the consequences of this. The die-hard empiricists say that experiments should be more standardly replicated, we should do statistics on data sets just to see how likely it is that they have been made up, etc. But it seems to me that having a good and solid theory, or a number of competing theories, also helps here.

The point is, once you have a theory, some data can be PROBLEMATIC (or 'funny-looking', as Norbert says) for somebody believing in that theory, so that person will become suspicious and therefore motivated to replicate the experiments, or at least check all the relevant data. This apparently is hardly ever the case in social psychology: the (fabricated) observation that people who see the word 'capitalism' eat more M&Ms was just not problematic for anybody, since nobody had any deep expectations about the relation between seeing that word and consuming chocolate sweets to begin with.

But to be fair, it has to be noted that in this case after a number of years a few junior researchers were brave enough to discover the fraud and talk to the rector about it, and the guy was fired. (A detail which might interest linguists, and which is not mentioned in the NYT article, is that the committee which examined the fraud was led by the well-known psycholinguist Willem Levelt.) And that might shed some light on the asymmetry between the Hauser case and the RR case. The differences might have less to do with issues of methodology than with prestige and political power.
(I have to admit that I know much more about the Stapel case than about Hauser or RR.) 

Let me add a word or two.

First, one very significant difference (for me) between the Hauser case and the other two that Bhattacharjee (the author) mentions is that all of Hauser's disputed work was REPLICATED.  This is really a very big deal.  It means that whatever shortcuts there may have been, they were inconsequential, given that the results stand. In fact, I would go further: if the stuff replicates, then this constitutes prima facie evidence that the investigation was done right to begin with.  Replicability is the gold standard. The problem with Suk's results and Stapel's is that the stuff was not only dishonest but totally unstable, or so I gather from the article. Only they could get their results. In this regard Hauser's results are very different.

Second, as Marc points out, it looks like Stapel's work really didn't matter.  There was nothing deep there. Nor, a priori, could you expect there to be.  His experiments were unlikely to touch the underlying psychological mechanisms of the behavior of interest because the kinds of behaviors Stapel was interested in are just too damn complicated. Great experiments isolate single causal factors. The kinds of powers implicated in these experiments are many and interact, no doubt, in many complex ways. Thus, it is no surprise that the effect sizes were expected to be small; indeed, Stapel had to cook the numbers so that the effect sizes did not appear to be large ("He knew that the effect he was looking for had to be small in order to be believable…"). He was trolling for statistical, not scientific, significance.  His ambitions were political rather than scientific. At any rate, he believed that the appearance of credibility was tied to having small significant effect sizes. Maybe the right question is why anyone should care about small effects in this sort of domain to begin with.

Third, I think that Bhattacharjee ends by noting that fraud is not likely to be the most serious polluter of the data stream. Let me quote here:

   "Fraud like Stapel's - brazen and careless in hindsight - might represent the lesser threat to the integrity of science than the massaging of data and selective reporting of experiments... tweaking results [NH] - like stopping data collection once the results confirm the hypothesis - is a common practice. "I would certainly see that if you do it in more subtle ways, it's more difficult to detect," Ap Dijksterhuis, one of the Netherlands' best-known psychologists, told me...

So, is fraud bad? Sure. Nobody endorses it in science any more than anywhere else.  But, this article, as Marc so eloquently notes, shows that it is easier to do where one is trolling for correlations rather than exploring underlying causal powers (i.e. in the absence of real theory) and where effect sizes are likely to be small because of the complex interaction of multiple causal factors.  Last, let's never forget about replicability. It matters, and where it exists, fraud may not be easily distinguishable from correct design.

Friday, April 26, 2013

UMD Mayfest 2013

Every year the Ling department runs a small conference on a specialized topic. It is generally relentlessly inter-disciplinary. This year is no exception. Here is the description:

Mayfest is a workshop that brings together researchers from a variety of disciplines and perspectives to discuss fundamental issues in linguistics. Over the course of two days, participants engage in talks and discussion sessions to stimulate new insights and collaboration. This year, we will be discussing the use of prediction in language and its neural instantiation. Researchers studying language perception, production, and development have been invited to speak about the representational properties, temporal dynamics, and neural underpinnings of expectations in language.

Admission is free (but please register), so if you are in the area you might want to drop by. Here is a link to the schedule and short descriptions of what the invitees will be discussing.

Thursday, April 25, 2013

Methodological Hygiene

I was planning to write something thoughtful today on formal versus substantive universals and how we have made lots of progress wrt the former but quite a bit less wrt the latter. That post, however, will have to wait.  Why? Lunch!  Over lunch I had an intriguing discussion of the latest academic brouhaha and how it compares with one that linguists know well: "L'affaire Hauser."  For those of you who don't read the financial pages, or don't religiously follow Krugman (here, first of many), or don't watch Colbert (here), let me set the stage.

In 2010, two big-name economists, Carmen Reinhart and Kenneth Rogoff (RR), wrote a paper building on their very important work (This Time is Different) chronicling the aftermath of financial crises through the ages. The book garnered unbelievable reviews and made the two rock stars of the Great Recession.  The paper that followed was equally provocative, though not nearly as well received.  The paper claimed to find an important kind of debt threshold which, when crossed, caused economic growth to tank.  Actually, this is a bit tendentious.  What the paper claimed was that there was a correlation between debt-to-GDP ratios of 90% and higher and the collapse of growth.  Note: correlation, not causation.  However, what made the paper hugely influential was the oft-suggested hint that the causality ran from high debt to slow growth rather than the other way around, or some combination of the two. The first interpretation was quickly seized upon by the "Very Serious People" (VSP), aka "austerians," to justify policies of aggressively cutting budget deficits rather than fiscally priming the economic pump to combat high unemployment.[1] Keynesians like Krugman (and many others, including Larry Summers, another famous Harvardian) argued that the causality ran from slow growth to large deficits, and so the right policy was to boost government spending to fight unemployment, as doing this would also alleviate the debt "problem."[2] At any rate, it is safe to say that RR's 2010 paper had considerable political and economic impact.  Ok, let's shift to the present, or at any rate the last week.

Three U Mass economists (Herndon, Ash and Pollin: the lead author being a first-year grad student whose econometrics class project was to replicate some well-known result in order to learn econometric methods) showed that the 2010 paper was faulty in several important ways: (i) there was a spreadsheet error that left out some important data (this accounted for a small part of RR's result), (ii) there was a trimming decision in which some data points that could be deemed relevant, as they trended against the RR conclusion, were left out (this accounted for a decent percentage of the RR effect), and (iii) there was a weighting decision in which one year's results were weighted the same as 17 years' worth of results (this accounted for a good chunk of RR's results).  All together, when these were factored in, RR's empirical claim disappeared. Those who click on the Colbert link above will get to meet the young grad student who started all of this.  If you are interested in the incident, just plug "Reinhart and Rogoff" into Google and start reading. To say that this is now all over the news is an understatement. Ok, why do I find this interesting for us?  Several reasons.
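The weighting issue in (iii) is easy to see with a toy computation. The numbers below are hypothetical, chosen only to mirror the structure of the criticism (one country with a single bad high-debt year averaged against another with many ordinary ones); they are not RR's actual data:

```python
# Hypothetical growth figures (illustration only, not RR's data):
# country A has 1 high-debt year; country B has 17 high-debt years.
growth_a = [-7.9]          # one year of sharply negative growth
growth_b = [2.4] * 17      # seventeen years of modest positive growth

# Country-weighted average: compute each country's mean first, then
# average the means, so A's single year counts as much as B's 17.
country_means = [sum(growth_a) / len(growth_a),
                 sum(growth_b) / len(growth_b)]
country_weighted = sum(country_means) / len(country_means)

# Pooled average: every country-year observation counts once.
all_years = growth_a + growth_b
pooled = sum(all_years) / len(all_years)

print(round(country_weighted, 2))  # -2.75: growth appears to collapse
print(round(pooled, 2))            # 1.83: growth appears merely modest
```

Same data, two defensible-sounding averaging conventions, opposite headlines. That is why an undisclosed weighting choice matters.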

First, though this is getting well discussed and amply criticized in the media, I have not read anywhere that Harvard is putting together a panel to investigate bad scientific practice. Spreadsheet errors are to be expected. But the other maneuvers look like pretty shoddy empirical practice; i.e., even if defensible, they should be front and center in any paper. They weren't. But still no investigation. Why not? It cannot be because this is "acceptable," for once exposed it seems that everyone finds it odd. Moreover, RR's findings have been politically very potent, i.e. consequential.  So, the findings were important, false and shoddy. Why no investigation? Because this stuff, though really important, is hard to distinguish from what everyone does?

Second, why no exposé in the Chronicle accompanied by a careful think piece about research ethics?  One might think that this would be front-page academic news and that venues that got all excited over fraud would find it right up their alley to discuss such an influential case.

It is worth comparing this institutional complacency to the reaction our own guardians of scientific virtue had wrt Hauser.  They went ape (tamarin?) shit! Professors were impaneled to review his lab's work, he was censured and effectively thrown out of the university, big-shot journal editors reviled him in the blogosphere, and he was held up as an object lesson in scientific vice. The Chronicle also jumped onto the bandwagon, tsk-tsking about dishonesty and how it derails serious science. Moreover, even after all the results in all the disputed papers were replicated, there were no second thoughts, no revisiting and re-evaluating of the issues, nothing.  However, if one were asked to weigh the risks to scientific practice of RR's behavior against Hauser's alleged malpractice, it's pretty clear that the former are far more serious than the latter. RR's results did not replicate. And, I am willing to bet, their sins are far more common and so pollute the precious data stream much, much more. Indeed, there is a recent paper (here) suggesting that the bulk of research in neuroscience is not replicable, i.e. that the data are simply not, in general, reliable. Do we know how generally replicable results in psycho are?  Anyone want to lay a bet that the number is not as high as we like to think?

Is this surprising? Not really, I think. We don't know that much about the brain or the mind. It strikes me that a lot of research consists of looking for interesting phenomena rather than testing coherent hypotheses. When you know nothing, it's not clear what to count or how to count it.  The problem is that the powerful methods of statistics encourage us to think that we know something when in fact we don't. John Maynard Smith, I think, said that statistics is a tool that allows one to do 20 experiments and get one published in Nature (think p < .05).  Fraud is not the problem, and I suspect that it never has been. The problems lie in the accepted methods, which, unless used very carefully and intelligently, can muddy the empirical waters substantially. What recent events indicate (at least to me) is that if you are interested in good data, then it's the accepted methods that need careful scrutiny. Indeed, if replicability is what we want (and isn't that the gold standard for data?), maybe we should all imitate Hauser, for he seems to know how to get results that others can get as well.
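The Maynard Smith quip is just multiple-comparisons arithmetic. A quick sketch (standard probability, not tied to any of the particular cases discussed here):

```python
# If every null hypothesis is true and each test has a 5% false-positive
# rate, the chance that at least one of 20 independent experiments comes
# out "significant" (p < .05) is substantial.
alpha = 0.05
n_experiments = 20
p_at_least_one_hit = 1 - (1 - alpha) ** n_experiments
print(round(p_at_least_one_hit, 2))  # 0.64
```

Run 20 studies, publish the one that "worked," and you have nearly a two-in-three chance of a Nature-ready result from pure noise.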

I will end on a positive note: we linguists are pretty lucky.  Our data is easily accessed and very reliable (as Sprouse and Almeida have made abundantly clear).  We are also lucky in that we have managed to construct non-trivial theories with reasonable empirical reach.  This acts to focus research and, just as importantly, makes it possible to identify “funny looking” data so that it can be subjected to careful test. Theories guard against gullibility.  So, despite the fact that we don’t in general gather our data as “carefully” as neuroscientists and psychologists and economists gather theirs, we don’t need to.  It’s harder to “cheat,” statistically or otherwise, because we have some decent theory and because the data is ubiquitous, easy to access and surprisingly robust.  This need not always be so. In the future, we may need to devise fancy experiments to get data relevant to our theories. But to date, informal methods have proven sufficient.  Strange that some see this as a problem, given the myriad ways there are to obscure the facts when one is being ultra careful.

[1] VSP is Krugman’s coinage. I am not sure who first coined the second.
[2] The scare quotes are to indicate that there is some debate about whether there actually is a problem, at least in the medium term.

Tuesday, April 23, 2013

Are Linguistic Journals Tombstones?

Paul Krugman notes (here) that within economics, dissemination of research is not via the journals.  Rather, these act as repositories (he calls them 'tombstones') of stale information, used mainly for "validating your work" and citable "when seeking tenure," but not the place to go to "keep up with what's happening now." I've tended to regard ling journals in the same way: their prime function is not to lead the investigative pack, but to bless work that has been done.  However, given the time and effort devoted to reviewing, editing and publishing journals of record, it behooves us to consider whether or not all the effort is worth it. If the aim is to support research, it seems that far less formal venues are already doing most of the heavy lifting.  How important are journals in this context, and are they worth the time, money and effort demanded? Are our journals mainly a service for deans and university presidents?  Would their importance go to zero were tenure to disappear?  Just asking.

Monday, April 22, 2013

A Nice Video

Christine notes the availability of a video of a recent Chomsky talk in Ireland (here). It's long, but pretty entertaining and informative.  He discusses issues concerning linear order and its place in FL around the 27-minute mark. He also gets asked to compare his notion of modularity with Jerry Fodor's (at about the 1:08 mark), which is very interesting.  Chomsky and Fodor really don't agree on the right notion, and Chomsky explains how he sees the issues here.  He also discusses (and criticizes) Fodor's referential/causal theory of content, but I can't give you the specific time reference.  At any rate, the video is fun to watch (the chair is clearly very amused at all the contrary positions that Chomsky takes). It is vintage stuff, and if you have a free 90 minutes, it's almost as good as Game of Thrones or Mad Men.

Saturday, April 20, 2013

Mea Culpa: Update on some other SMT Relevant Papers

The material covered in detail in the posts on the SMT is not the only stuff that bears on the issues discussed. I mentioned, but did not discuss, the early work by Berwick and Weinberg. In addition, as David Pesetsky has pointed out to me, Martin Hackl has work analogous in many ways to that of PLHH that the interested can chase down (see here).

Furthermore, there is lots of interesting psycho work being done on matters other than islands. C-command is another hot topic, and people like Sturt, Dillon, Kush, Lidz, Phillips (and many others that I don't know, but papers by these people will lead you to them) are finding that in many cases c-command is deployed online.  Just as interesting, in many cases it seems that structure is ignored (cf. work by Vasishth, Lau and Wagers, Dillon and Xiang).  To be very clear: this is just stuff that has been thrown my way. So a warning: I am rather poorly read, and what I discuss just happens to be what has crossed my path. It does not comprise all that is worth reading. So apologies to those I have not mentioned whose work I should know about.

Oh yes, one perfectly terrific way to rectify my ignorance and fruitfully add to the discussion is to mention stuff that you consider relevant in the comment sections.  And, please feel free to send me stuff that you think is fun and worth discussing, or even better, discuss it yourself and send it to me: I am looking for more guest posts.

Friday, April 19, 2013

One Final Time into the Breach (I hope): More on the SMT

The discussion of the SMT posts has gotten more abstract than I hoped. The aim of the first post discussing the results by Pietroski, Lidz, Halberda and Hunter was to bring the SMT down to earth a little and concretize its interpretation in the context of particular linguistic investigations.  PLHH investigate the following: there are many ways to represent the meaning of most, all of which are truth functionally equivalent. Given this, are the representations empirically equivalent, or are there grounds for choosing one representation over the others? PLHH propose to get a handle on this by investigating how these representations are used by the ANS+visual system in evaluating dot scenes wrt statements like most of the dots are blue. They discover that the ANS+visual system always uses one of three possible representations to evaluate these scenes, even when use of the others would be both doable and very effective in that context. When one further queries the core computational predilections of the ANS+visual system, it turns out that the predicates it computes easily coincide with those that the "correct" representation makes available. The conclusion is that one of the three representations is actually superior to the others qua linguistic representation of the meaning of most, i.e. it is the linguistic meaning of most.  This all fits rather well with the SMT. Why? Because the SMT postulates that one way of empirically evaluating candidate representations is with regard to their fit with the interfaces (ANS+visual) that use them. In other words, the SMT bids us look to how grammars fit with interfaces, and, as PLHH show, if one understands 'fit' to mean 'be transparent with' then one meaning trumps the others when we consider how the candidates interact with the ANS+visual system.

It is important to note that things need not have turned out this way empirically. It could have been the case that, despite the core capacities of the ANS+visual system, the evaluation procedure the interface used when evaluating most sentences was highly context dependent, i.e. in some cases it used the one-to-one strategy, in others the '|dots ∩ blue| > |dots ∩ not-blue|' strategy, and sometimes the '|dots ∩ blue| > |dots| - |dots ∩ blue|' strategy.  But, and this is important, this did not happen. In all cases the interface exclusively used the third option, the one that fit very snugly with the basic operations of the ANS+visual system. In other words, the representation used is the one that the SMT (interpreted as the Interface Transparency Thesis) implicates. Score one for the SMT. 
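For concreteness, here is a toy sketch (my own illustration, not PLHH's actual task code) of the three truth-functionally equivalent verification strategies for most of the dots are blue. The point is that all three return the same verdict on any scene, so truth conditions alone cannot choose among them; only evidence about how the ANS+visual system actually computes can:

```python
def one_to_one(blue, nonblue):
    """Pair off blue and non-blue dots; 'most' is true iff blue dots remain."""
    blue, nonblue = list(blue), list(nonblue)
    while blue and nonblue:
        blue.pop()
        nonblue.pop()
    return len(blue) > 0

def compare_cardinalities(dots, blue):
    """|dots ∩ blue| > |dots ∩ not-blue|"""
    return len(dots & blue) > len(dots - blue)

def subtract(dots, blue):
    """|dots ∩ blue| > |dots| - |dots ∩ blue| (the third option)"""
    return len(dots & blue) > len(dots) - len(dots & blue)

# A scene with 6 blue dots out of 10: all three strategies agree.
dots = {f"d{i}" for i in range(10)}
blue = {f"d{i}" for i in range(6)}
verdicts = [one_to_one(dots & blue, dots - blue),
            compare_cardinalities(dots, blue),
            subtract(dots, blue)]
print(verdicts)  # [True, True, True]
```

Since the verdicts always coincide, the finding that the interface invariably runs the subtraction strategy is an empirical discovery about representations, not about truth conditions.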

Note that the argument puts together various strands: it relies on specific knowledge of how the ANS+visual system functions. It relies on specific proposals for the meaning of most, and given these it investigates what happens when we put them together. The kicker is that if we assume that the relation between the linguistic representation and what the ANS+visual system uses to evaluate dot scenes is "transparent," then we are able to predict[1] which of the three candidate representations will in fact be used in a linguistic+ANS+visual task (i.e. the task of evaluating a dot scene for a given most sentence[2]).[3]

The upshot: we are able to use information from how the interface behaves to determine a property of a linguistic representation.  Read that again slowly: PLHH argue that understanding how these tasks are accomplished provides evidence for what the linguistic meanings are (viz. what the correct representations of the meanings are). In other words, experiments like this bear on the nature of linguistic representations and a crucial assumption in tying the whole beautiful package together is the SMT interpreted along the lines of the ITT. 

As I mentioned in the first post on the SMT and Minimalism (here), this is not the only exemplar of the SMT/ITT in action. Consider one more, this time concentrating on work by Colin Phillips (here). As previously noted (here), there are methods for tracking the online activities of parsers. So, for example, the Filled Gap Effect (FGE) tracks the time course of mapping a string of words into structured representations.  Question: what rules do parsers use in doing this? The SMT/ITT answer is that parsers use the "competence" grammars that linguists investigate with their methods. Colin tests this by considering a very complex instance: gaps within complex subjects. Let's review the argument.

First some background.  Crain and Fodor (1985) and Stowe (1986) discovered that the online process of relating a "filler" to its "gap" (e.g. in trying to assign a Wh a theta role by linking it to its theta-assigning predicate) is very eager.  Parsers try to shove wayward Whs into positions even if these are filled by another DP.  This eagerness shows up behaviorally as slowdowns in reading times when the parser discovers a DP already homesteading in the thematic position it wants to shove the un-theta-marked DP into. Thus in (1a) (in contrast to (1b)), there is a clear and measurable slowdown in reading times at Bill because it is a place where the who could have received a theta role.

(1)  a. Who did you tell Bill about
b. Who did you tell about Bill

Thus, given the parser's eagerness, the FGE becomes a probe for detecting linguistic structure built online. A natural question is: where do FGEs appear? In other words, do they "respect" conditions that "competence" grammars code?  BTW, all I mean by 'competence grammars' are those things that linguists have proposed using their typical methods (ones that some Platonists seem to consider the only valid windows into grammatical structure!). The answer appears to be that they do. Colin reviews the literature and I refer you to his discussion.[4]  How do FGEs show that parsers respect grammatical structure? Well, they seem not to apply within islands! In other words, parsers do not attempt to relate Whs to gaps within islands. Why? Well, given the SMT/ITT, it is because Whs could not have moved from positions within islands and so these are not potential theta-marking sites for the Whs that the parser is eagerly trying to theta mark. In other words, given the SMT/ITT we expect parser eagerness (viz. the FGE) to be sensitive to the structure of grammatical representations, and it seems that it is.

Observe again, that this is not a logical necessity. There is no a priori reason why the grammars that parsers use should have the properties that linguists have postulated, unless one adopts the SMT/ITT that is. But let’s go on discussing Colin’s paper for it gets a whole lot more subtle than this. It’s not just gross properties of grammars that parsers are sensitive to, as we shall presently see.

Colin considers gaps within two kinds of complex subjects. Both prevent direct extraction of a Wh (2a/3a); however, sentences like (2b) license parasitic gaps while those like (3b) do not:

(2)  a. *What1 did the attempt to repair t1 ultimately damage the car
      b. What1 did the attempt to repair t1 ultimately damage t1
(3)  a. *What1 did the reporter that criticized t1 eventually praise the war
      b. *What1 did the reporter that criticized t1 eventually praise t1

So the grammar allows gaps related to extracted Whs in (2b) but not (3b), but only if this is a parasitic gap.  This is a very subtle set of grammatical facts.  What is amazing (in my view nothing short of unbelievable) is that the parser respects these parasitic gap licensing conditions.  Thus, what Colin shows is that we find FGEs at the italicized expressions in (4a) but not (4b):

(4)  a. What1 did the attempt to repair the car ultimately …
      b. What1 did the reporter that criticized the war eventually …

This is a case where the parser is really tightly cleaving to distinctions that the grammar makes. It seems that the parser codes for the possibility of a parasitic gap while processing the sentence in real time.  Again, this argues for a very transparent relation between the “competence” grammar and the parsing grammar, just as the SMT/ITT would require.

I urge the interested to read Colin’s article in full. What I want to stress here is that this is another concrete illustration of the SMT.  If grammatical representations are optimal realizations of interface conditions then the parser should respect the distinctions that grammatical representations make. Colin presents evidence that it does, and does so very subtly. If linguistic representations are used by interfaces, then we expect to find this kind of correlation. Again, it is not clear to me why this should be true given certain widely bruited Platonic conceptions. Unless it is precisely these representations that are used by the parser, why should the parser respect its dicta?  There is no problem understanding how this could be true given a standard mentalist conception of grammars. And given the SMT/ITT we expect it to be true. That we find evidence in its favor strengthens this package of assumptions.

There are other possible illustrations of the SMT/ITT.  We should develop a sense of delight at finding these kinds of data. As Colin's stuff shows, the data is very complex and, in my view, quite surprising, just like PLHH's stuff. In addition, they can act as concrete illustrations of how to understand the SMT in terms of Interface Transparency.  An added bonus is that they stand as a challenge to certain kinds of Platonist conceptions, I believe.  Bluntly: either these representations are cognitively available or we cannot explain why the ANS+visual system and the parser act as if they were. If Platonic representations are cognitively (and neurally, see note 4) available, then they are not different from what mentalists have taken to be the objects of study all along. If from a Platonist perspective they are not cognitively (and neurally) available, then Platonists and mentalists are studying different things and, if so, they are engaged in parallel rather than competing investigations. In either case, mentalists need take heed of Platonist results exactly to the degree that they can be reinterpreted mentalistically. Fortunately, many (all?) of their results can be so interpreted.  However, where this is not possible, they would be of absolutely no interest to the project of describing linguistic competence. Just metaphysical curiosities for the ontologically besotted.

[1] Recall, as discussed here, ‘predict’ does not mean ‘explain.’
[2] Remember, absent the sentence and in specialized circumstances the visual system has no problem using strategies that call on powers underlying the other two non-exploited strategies. It’s only when the visual system is combined with the ANS and with the linguistic most sentence probe that we get the observed results.
[3] Actually, I overstate things here: we are able to predict some of the properties of the right representation, e.g. that it doesn’t exploit negatively specified predicates or disjunctions of predicates.
[4] Actually, there are several kinds of studies reviewed, only some of which involve FGEs. Colin also notes EEG studies that show P600 effects when one has a theta-undischarged Wh and one crosses into an island. I won’t make a big deal out of this, but there is not exactly a dearth of neuro evidence available for tracking grammatical distinctions.  They are all over the place. What we don’t have are good accounts of how brains implement grammars. We have tons of evidence that brain responses track grammatical distinctions, i.e. that brains respond to grammatical structures. This is not very surprising if you are not a dualist. After all we have endless amounts of behavioral evidence (viz. acceptability judgments, FGEs, eye movement studies, etc.) and on the assumption that human behavior supervenes on brain properties it would be surprising if brains did not distinguish what human subjects distinguish behaviorally. I mention this only to state the obvious: some kinds of Platonism should find these kinds of correlations challenging. Why should brains track grammatical structure if these live in Platonic heavens rather than brains?  Just asking.

Wednesday, April 17, 2013

Always nice to see what our neuro friends are saying...

Greg Hickok (an honest-to-God neuro type) writes (here) about "what language really is," observing that though "culture is reflected through language," that "doesn't mean that language IS culture." Amen. I like his analogy to vision, where the same moves made in the language case would be strongly resisted. BTW, those of you who don't read Talking Brains might like to check out the site.

Minimalist Physics?

I was recently rereading some essays by Steven Weinberg (here) and was reminded why it is that I have a severe case of physics envy. It is not only the depth of the results, both theoretical and empirical, and not only the fact that, when it comes to explanation, what you find in physics is the gold standard; my envy is also grounded in a respect for the disciplined way that physicists talk about even the most obscure methodological matters. Weinberg, in discussing his reductionist urges, makes several points that should resonate with those who have a minimalist pulse.  He discusses these on pages 37-40.  Here are the "lessons" he draws from "the history of science in the last three hundred years" (btw, I am pretty sure that the last quoted passage is said/written with tongue firmly in cheek).

1.     The aim of fundamental physics is to “explain why everything is the way it is. This is Newton’s dream, and it is our dream.”
2.     “The importance of phenomena in everyday life is, for us, a very bad guide to their importance in the final answer.”
3.     Indeed, whether something is ubiquitous or common is a very unreliable guide to its interest or importance: “We do not know about muons in our everyday life. But as far as we know, muons play just as fundamental a role (which may or may not be very fundamental) as electrons  [which is “ubiquitous in ordinary matter” –NH] in the ultimate scheme of things.”
4.     “We are not particularly interested in our electrons or our muons. We are interested in the final principles that we hope we will learn about by studying these particles. So the lesson is that the ordinary world is not a very good guide to what is important.”
5.     “…if we are talking about very fundamental phenomena, then ideas of beauty are important in a way that they wouldn’t be if we were talking about mere accidents…[P]lanetary orbits don’t have to be beautiful curves like circles because planets are not very important on any fundamental level. On the other hand, when we formulate the equations of quantum field theories or string theories we demand a great deal of mathematical elegance, because we believe that the mathematical elegance that must exist at the root of things in nature has to be mirrored at the level where we are working. If the particles and fields we’re working on were mere accidents…then the use of beauty as a criterion in formulating our theories would not be so fruitful.”
6.     “…in the theories we are trying to formulate, we are looking for a sense of uniqueness, for a sense that when we understand the final answer, we will see that it could not have been any other way. My colleague John Wheeler has formulated this as the prediction that when we learn the ultimate laws of nature we will wonder why they were not obvious from the beginning.”

These have more than a passing resemblance to oft-quoted minimalist dicta. (1) is the familiar minimalist credo: the aim is not only to know what is the case but to know why it is the case.

(2) sums up why it is that looking at large collections of surface data is very likely to be irrelevant. Real explanations lie hidden beneath the surface of things, just as much in the study of FL as in the study of basic physics.

(3) reinforces the point in (2) and also suggests the hazards of concentrating on the common, a feature not unknown to the statistically inclined. Common/frequent does not imply theoretically important unless one has a pretty surfacy/empiricist conception of theory.

(4) is a favorite of mine: think of the distinction between linguistics and languistics. But it goes deeper. We study languages because we believe that studying these will tell us something about the nature of human cognition and biology. They are instruments for probing minds/brains, the latter (not languages) being the ultimate objects of inquiry. In this sense linguistics is not about language any more than physics, in Weinberg’s conception, is about electrons or muons.

(5) starts moving onto very subtle but important territory. Weinberg and Chomsky share the idea that at fundamental levels reality is simple and elegant. The converse of this is that complexity is the result of interacting systems. As fundamental theory describes single (i.e. non-interacting) systems, there is no place within fundamental theories for interaction effects. Thus we expect (and find) simplicity and elegance. So here’s a regulative ideal: simplicity holds at fundamental levels and complexity arises from the interaction of simple systems. Thus, when looking for the fundamental, look for the simple, and when it eludes you, assume that the complexity is a sign of two or more interacting simple systems. This is, admittedly, murky advice. However, at times, murky methodological advice can be important and consequential. Weinberg and Chomsky identify two projects where this is likely to be the case.

And last we have (6): it provides a kind of ex post conception of a notion Chomsky has been fond of: “virtual conceptual necessity” (VCN). Ex post, VCN cannot guide research: it simply describes what we are looking for: a theory that, when stated, will seem both obviously true and inevitable. We can, however, get an inkling about what kinds of notions such a theory will contain. For example, FL will have to contain a rule that puts elements together (merge), and it would be very surprising, given what we know, if its operations were not regulated by natural computational considerations like cyclicity and locality. These are candidate basic notions for a theory that has VCN. As Weinberg puts it:

That [Wheeler’s conception above -NH] may very well be true. If it is, I suspect it will be because by the time we learn the ultimate laws of nature, we will have been so much changed by the learning process that it will become difficult to imagine that the truth could be anything else…

Note the historical gloss Weinberg gives to Wheeler’s dictum. This is why I added “given what we know” above. Given minimalism’s roots in Generative Grammar research over the last 60 years, operations like merge and notions like cyclicity, locality, and economy must be part of any reasonable account. That’s Weinberg’s version of VCN, and as we can see, it’s not limited to the dreams of linguists. All good science, at least all good basic science, appears to share the dream.