A Data Utopia for Science-of-Science

Here I want to briefly sketch out a vision for how to solve a key set of problems facing science-of-science researchers, using the relatively new idea of a ‘data trust.’

In my ideal world, science-of-science researchers would have access to a data utopia — a giant longitudinal dataset encompassing individual and project-level data on the end-to-end scientific process for researchers worldwide, which is technically secure and safeguards privacy.

I think if you’re one of the five people reading this, the value of such a dataset is self-evident. But to be explicit: I think this would be a large advance in our ability to ask and answer questions about the nature of the scientific process, which in turn would help researchers reach their potential through better tools, methods, and processes, resulting in faster scientific progress to the benefit of us all!

A data utopia, maybe

Okay so to be more specific, here are the problems that I think such a data utopia would need to solve:

  1. Safeguarding privacy of the contributing individuals. This is goal number one — none of us wants to live in a world filled with surveillance.
    • Individuals need to be able to specify what data they’re comfortable sharing with science-of-science researchers, if any.
    • Individuals need to be able to specify how that data is to be used (e.g. for public research only; for private research conditional on payment, etc.)
    • Individuals need to be able to specify who is able to access and analyze the data.
  2. Safeguarding the technical security of the data. All of (1) is for naught if the database is insecure.
  3. Unify individual-level and project-level data horizontally, across steps of the scientific process. That is, we would want to see a given project as it advances from the idea stage, to funding, to team formation, to experiments/modelling, to writing up the results, to presenting at conferences, to publication. Obviously these steps will look different for every project, but the point is we want a view of the same project over time, from start to finish.
  4. Unify individual-level and project-level data vertically, across disparate tools. That is, say Person A uses Microsoft Word to write their research paper, Person B uses LaTeX Studio, and Person C uses Overleaf. These are all in the same stage of the scientific process — writing up the results — but right now there is no standardized metadata format to compare the writing done with each tool. So, we would need some standardization in the way these metadata are captured. The same goes for, say, conference presentations — we should be able to represent a paper presentation at NeurIPS or at NBER in a standardized way, even though they’re very different conferences serving very different sets of researchers.

So that sounds like a pipe dream.

But, I’m optimistic that these are solvable problems, and that some form of a data trust can solve them.

What is a data trust?

I don’t really know! I’m only just learning about them. My high-level understanding is this: it is a professionally-managed, legal organization which mediates the relationship between individuals who can provide their own data, and third-parties (say, academic researchers) who want to analyze that data. Critically, my understanding is that the professional managers of the data trust have a legal duty to the contributing individuals to act on their behalf, with their best interests in mind.

In practice, I think what that means is something like this: someone like the Sloan Foundation funds the creation of a new Science-of-Science Data Trust, whose explicit legally mandated purpose is to advance science-of-science research on behalf of anyone contributing data to the Trust.

The managers of the Trust go out to universities, national labs, etc., and try to convince some researchers to set up a data pipeline into the Trust (I’m sure they also need to negotiate with the university/lab lawyers — seems easy!). At this step, and through some always-available mechanism (say a web UX), the individual researchers contributing data would be able to specify their requirements for privacy as in (1) above — what data they’re willing to share, with whom, and for what purpose is that data able to be used. Said data pipelines would then send structured data about the scientific process of the contributing researchers to databases managed by the Trust.
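Just to make requirement (1) feel concrete, here’s a purely hypothetical sketch of what a single contributor’s consent record inside the Trust might look like. None of these fields, categories, or mechanisms exist anywhere today; it’s just one way to write down “what data, used how, accessible by whom” as a data structure.

```python
# Purely hypothetical sketch of a single contributor's consent record.
# None of these fields or categories exist anywhere today; they are one way
# to encode "what data, used how, accessible by whom."
from dataclasses import dataclass, field
from enum import Enum

class PermittedUse(Enum):
    PUBLIC_RESEARCH = "public research only"
    PRIVATE_RESEARCH_PAID = "private research, conditional on payment"

@dataclass
class ConsentRecord:
    researcher_id: str
    shared_categories: list[str] = field(default_factory=list)    # e.g. ["funding", "writing", "conferences"]
    permitted_uses: list[PermittedUse] = field(default_factory=list)
    permitted_analysts: list[str] = field(default_factory=list)   # e.g. ["accredited academic researchers"]

record = ConsentRecord(
    researcher_id="researcher-123",
    shared_categories=["funding", "writing"],
    permitted_uses=[PermittedUse.PUBLIC_RESEARCH],
    permitted_analysts=["accredited academic researchers"],
)
print(record)
```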

Presumably, the Trust would develop metadata standards such that we can do things like (4) and compare data across disparate tools — it might need to go talk to e.g. Overleaf, LaTeX Studio, the NeurIPS organizers, or the NBER organizers, so that, conditional on individuals’ permission, the Trust can get standardized data through its pipelines from these third parties (whether originating from the individual’s computing resources, their lab’s shared resources, or in fact the third parties’ servers).

With all this data assembled from contributing researchers, the Trust would engage academic researchers in some vetting process, taking proposals for doing academic research with these data; maybe there’s some mechanism where these proposals are also vetted by the trustees; and the academic researchers would get access to the data and be allowed to publish research papers about it.

What’s Next?

Well for me, first I need to read everything I can about data trusts, because as outlined above, I don’t really know very much. Maybe what I’m proposing is not at all compatible with the idea of data trusts, and reading more will set me right. Second, if after my reading data trusts still seem like they could solve these problems, I need to talk to a couple sets of people: (1) any of the folks who right now are piloting data trusts in the wild, because I’m sure their practical experience will differ a lot from what I find in just reading about it, and (2) some folks at e.g. the Sloan Foundation, to see if anyone has interest in funding such a thing. Then, I don’t know, let’s see if we can get a few institutions to buy in, fund a pilot, and try it out!

Many chances for failure, and I’m quite open to being wrong about any or all of this, but this has been in the back of my mind for a year or two now and nothing in my brain has yet realized why the idea is stupid and bad, so I’m taking that as a signal that it’s worth a try. Will write up more as I learn more; and if you read this and have useful direction, whether it’s stuff to read, people to talk to, or reasons why this will all fail, please do let me know!

Engineering Proximity

Matt Clancy wrote a typically wonderful post, this one about how proximity affects innovation. If you haven’t read that yet, go read it first.

I want to follow up on one sentence in particular: “To maximize discovery, you should group people who are unlikely to know each other via other avenues, but who might plausibly benefit from being able to share ideas.” Based on the literature to date, this seems absolutely right, and I’d like to “yes, and” it by saying that this seems like a perfect place for some economic engineering.

To be specific, I think there’s a fruitful research agenda to be had engaging with the above as a formal optimization problem. I’m quite bad at notation but I believe it would go something like this:

Given a laboratory floorplan with desks d ∈ D, and scientists s ∈ S, match each scientist to a desk, m_ds ∈ {0,1}, in order to maximize publication count U. Now of course the fun bits are that each pair of seats is a fixed distance apart, x_d1d2, and each scientist pair has some relevant attributes like, “are unlikely to know each other via other avenues, but who might plausibly benefit from being able to share ideas.” You might proxy these by, I don’t know, asking people who they already know, so something like k_s1s2 ∈ {0,1}, and then if you want to be fancy grabbing some doc2vec representations of each person’s research field from their prior publications, which can just be like r_s ∈ [0,1]. Finally, we have some functions relating publication count to these attributes, such that for each pair of scientists, their publication count is given by U_s1s2 = f(x, k, r). E.g., we can relate prior ties k and walking distance x to publication count U through functions like the ones from Hasan and Koning in Matt’s post:

And then from Lane et al. (2020) we can relate each scientist pair’s research similarity (r_s1 – r_s2) to publication counts U with an inverted-U shape (haha, told you I was bad at notation), expressing that scientists who have very similar research shouldn’t sit close together, but rather they should have moderately similar research in order to boost collaboration:

With that all formulated, it’s just a matter of solving the combinatorial optimization problem. Also if you want I think you can cast the problem at the team-level instead of individual-level, but not much changes in the setup.
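To make that formulation concrete, here’s a toy sketch: brute-forcing the seating for a tiny lab, with an invented pairwise function f that loosely mimics the shapes described above (proximity helps, prior ties dampen it, research similarity enters as an inverted U). Every number is made up, and a real version would need the estimated functional forms and a proper combinatorial solver rather than enumeration.

```python
# Toy brute-force version of the desk-assignment formulation above.
# Every number and the functional form of f are invented for illustration.
import itertools
import numpy as np

rng = np.random.default_rng(0)

n = 6                                    # scientists == desks, to keep it tiny
x = rng.uniform(1, 30, size=(n, n))      # x[d1, d2]: walking distance between desks
x = (x + x.T) / 2
k = rng.integers(0, 2, size=(n, n))      # k[s1, s2]: 1 if the pair already know each other
k = np.triu(k, 1) + np.triu(k, 1).T
r = rng.uniform(0, 1, size=n)            # r[s]: 1-d proxy for research-field position

def f(dist, know, sim):
    """Invented pairwise publication function: proximity helps, prior ties
    dampen the proximity benefit, and research similarity is inverted-U."""
    proximity = np.exp(-dist / 10.0)
    novelty = 1.0 - 0.5 * know
    inverted_u = 4.0 * sim * (1.0 - sim)     # peaks at moderate similarity
    return proximity * novelty * inverted_u

best_value, best_assignment = -np.inf, None
for perm in itertools.permutations(range(n)):     # perm[s] = desk assigned to scientist s
    total = 0.0
    for s1, s2 in itertools.combinations(range(n), 2):
        dist = x[perm[s1], perm[s2]]
        sim = 1.0 - abs(r[s1] - r[s2])
        total += f(dist, k[s1, s2], sim)
    if total > best_value:
        best_value, best_assignment = total, perm

print(best_assignment, round(float(best_value), 3))
```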

So the first point I would make is, as far as I know such an algorithm does not exist today, and it is non-trivial to formulate and solve! And I don’t think you could get as far as the above formulation without the foundational research from all the people mentioned in Matt’s post (Roche, Catalini, Oettl, Hasan, Koning, Miranda, Claudel, and more), but there’s still a big missing piece in not having such an algorithm expressed formally with some simulated (or even better, real-world) results, articulated in a research paper.

I don’t know if economists take it as given/trivial that someone will do it, but I don’t believe it is. If I were asked to solve this problem for a lab, I would say “wait a minute” and go grab the nearest PhD to do it instead. Why? Not only do I think the above formulation is beyond most people to set up and solve (including me), it’s the easy version! As Matt notes:

  1. This is actually a dynamic problem, where the functions are probably nonlinear with respect to time, so right away we go from a single-shot solve (LP) to a multi-timestep problem (DP, RL, what have you).
  2. It looks like we can’t just use “distance between desks,” because walking paths also matter.
  3. And besides which, in a real lab you have people coming and going all the time, which means stochasticity.

So all in all, not trivial, and you could easily get a bunch of solid research papers just from making the problem progressively more realistic.

And then the second point I would make is about why I would call this an ‘economic engineering’ problem and not just a straightforward optimization problem: we really aren’t settled on the dynamics, functions, or relevant variables yet. That is, there’s still causal inference work to do. So I think a proper research agenda around this problem actually looks like solving the optimization problem, then maybe ping-ponging back to a causal inference toolbox for a bit to find out what we’re missing in the formulation, and then updating the optimization algorithm with new, more correct functional forms or more relevant variables.

Anyway I’d quite like to see someone do this. Clearly there can be a lot of real-world value, and the literature so far has done great work in establishing some of the causal structure of the problem, but I don’t think it’s close enough to application that entrepreneurs can take the existing research and make a useful product for lab managers. I think the research agenda above would still be foundational enough that it’s most appropriate for academia, and I think it would be necessary before we see the impact of this literature out in the world.

Dynamic Control vs. Causal Inference

A bunch of people had really good comments on the “No more RCTs” post, and I think I owe it to people’s good faith engagement to flesh out what was a very barebones thing. Before I dive in I want to be very clear that there are other, much more in-depth formal treatments of these issues (e.g. Dean Eckles, here), so I’m not claiming to be doing anything new, just trying to get some clarity for myself and think about it in the context of the field I care most about, which is science-of-science.

I’m going to start with some basic terminology and notation definitions, but I’ll label things with Section headers so you can skip those parts if you’re familiar. Where we’re headed is a more clearly defined version of the original hypothesis: I think people are spending too much time on causal inference relative to working on dynamic control.

RL basics

Okay if you already know all about reinforcement learning (RL) / Markov decision process terminology, feel free to skip this bit. And/or if you want the actual, formal treatment without what are sure to be mistakes on my part, Sutton/Barto is the way to go. I’m going to be talking in terms of RL because I’ve worked the most with it relative to other methods, but in practice you’ll see lots of techniques that are not machine-learning based, e.g. Linear Programming, Dynamic Programming, Mixed Integer Programming, Genetic Algorithms, and I’m sure a bunch of stuff I don’t know about. The important thing is to set some notation for talking about dynamic control problems, so:

So basically speaking, we’re going to have a State, an Action, and a Reward: {S,A,R}. RL is dynamic control: dynamic because we will proceed through timesteps, and what we do at timestep t has bearing on the state, and therefore on our decision at timestep t+1; control because we’re concerned with choosing the action, at each timestep, that maximizes rewards (more accurately, we want the trajectory of actions over all timesteps in the episode, or over an infinite horizon with discounting, that maximizes the discounted sum of rewards).

The State is usually a vector of relevant contextual features about the environment. In a game of chess, this would be some representation of where all the pieces are on the board. The Action can actually be any vector representing any number of actions, discrete or continuous, so in chess, you’d have as many possible actions each timestep as there are pieces * feasible moves for each piece. And the Reward is a bit tricky, but in chess canonically it would just be {0,1} for either lose or win — of course people spend a lot of time and energy figuring out ways to look at more immediate rewards, so we can learn faster, but that’s for later.

Generally, you also have a State transition function, which a lot of times is going to be unknown to whatever algorithm you use. But it would be the specific dynamics, whether deterministic or stochastic, that govern how you get from one State to the next given your Action, or more formally, S_t+1 = T(S_t, A_t).

A Policy in this context has a specific meaning — using the Sutton/Barto definition (p. 13) it’s a “decision-making rule,” usually denoted by pi, π, and usually used to talk about π(a|s), or in other words the “probability of taking action a in state s under stochastic policy π.” Basically, you can think of it as a mapping from state to action — this’ll be the output of your RL algorithm: a policy to control actions.
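To make the notation concrete, here is a toy sketch of the whole {S,A,R} loop: a made-up two-state MDP (the transition and reward tables are invented purely for illustration) plus tabular Q-learning recovering a greedy policy. None of this is from any particular paper; it’s just the machinery in miniature.

```python
# Toy two-state MDP plus tabular Q-learning; transition and reward tables
# are invented purely to illustrate the {S, A, R} notation.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
# Transition function: P[s, a, s_next] = probability of landing in s_next
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
# Reward function: R[s, a]
R = np.array([[0.0, 1.0],
              [0.5, 2.0]])

Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

s = 0
for t in range(20_000):
    # epsilon-greedy action selection
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P[s, a])
    # one-step Q-learning update toward reward plus discounted future value
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

policy = Q.argmax(axis=1)    # greedy policy: best action in each state
print(Q.round(2), policy)
```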

Bringing in Causal Graphs

Okay so probably you can already imagine how all this relates to causal inference, but the easiest bridge in my mind is via causal graphs. Generally speaking, you can choose to represent the MDP notation above just as well with a causal graph.

In some joint work with people much smarter than me, we take a few canonical toy problems and show how they look as causal graphs (we talk about ‘factored graphs’ in the paper because some reviewers got cranky about terminology, but they’re causal graphs). This is not at all the only work to do this, it’s not special, just easy for me to reference. So anyway you can take problems like Bin Packing, or Newsvendor, and show how they look with a causal graph, like so:

And now you see it’s only a hop, skip, and a jump over to whatever causal inference method you like, on any part of this graph. DAGs don’t require you to be in Pearl’s world; go wild with diff-in-diffs, RDDs, whatever. You can always translate back and forth to a graph, and you can usually contextualize a problem you’re attacking with causal inference inside a larger MDP graph that’s really a dynamic control problem.

Getting to the Point

So to give an example with the Newsvendor problem above, one obvious extension here is you might want to forecast customer demand, so imagine we also have a bunch of nodes pointing into “Customer Demand” as State variables. Like, I don’t know, we’re trying to buy Michael Jordan Jerseys, and so we want to figure out the demand for Michael Jordan Jerseys, and there’s a node going into demand that’s like… “Release of The Last Dance.”

If you’re just doing pure Reinforcement Learning, you don’t really care one whit about causal inference, you just toss that new variable into the State vector, and off you go.

But, if your whole toolkit comes from causal inference methods, what I would see a lot of people doing is caring about estimating only the causal effect of “Release of The Last Dance” on “Customer Demand for Michael Jordan Jerseys.” And you could of course do this in any number of ways, pick your favorite method — I was picking on RCTs, but obviously lots of people like diff-in-diffs, or maybe you’re well set up for an RDD, or whatever.

(One important side note is that a fair number of RCTs are actually about testing some Policy π_1 against π_2 — but a lot of times these are crude, inflexible/brittle π that are completely state-independent. I do think there’s a solid use-case for RCTs about π_1 against π_2, but I don’t really see people spend a lot of effort making sure either π_1 or π_2 is optimal. Like the Chetty paper testing a 4-week or 6-week deadline for peer reviews — why are we not treating the deadline weeks as an Action and trying to optimize it? Even if we’re going to ignore State, it’s at minimum a bandit problem to find the optimal number of weeks).
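To make that parenthetical concrete, here’s a minimal sketch of treating the deadline length itself as a bandit problem (Thompson sampling over a few candidate deadlines). The deadline options and the “true” on-time probabilities are entirely invented; in practice the reward would be whatever the editor actually cares about (on-time reviews, review quality, and so on).

```python
# Treating the review deadline itself as a bandit arm via Thompson sampling.
# Deadline options and "true" on-time probabilities are entirely invented.
import numpy as np

rng = np.random.default_rng(0)

deadline_weeks = [3, 4, 5, 6]
true_on_time_prob = [0.55, 0.65, 0.72, 0.70]    # hypothetical, unknown to the algorithm

successes = np.ones(len(deadline_weeks))        # Beta(1, 1) priors on each arm
failures = np.ones(len(deadline_weeks))

for review in range(5_000):
    samples = rng.beta(successes, failures)     # draw a plausible on-time rate per arm
    arm = int(samples.argmax())                 # pick the arm that looks best this round
    on_time = rng.random() < true_on_time_prob[arm]
    successes[arm] += on_time
    failures[arm] += 1 - on_time

best = int((successes / (successes + failures)).argmax())
print("converged toward a deadline of", deadline_weeks[best], "weeks")
```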

My main point is that, how much should we actually care about getting a really well identified causal estimate of “Release of The Last Dance” on “Customer Demand for Michael Jordan Jerseys,” as opposed to spending research effort on developing a really good Policy π(a|s) for the Action “Buy Michael Jordan Jerseys” to maximize the Reward “Profit”?

Or, in the context of a problem I care much more about, how much should we care about getting a really well identified causal estimate of “author-reviewer homophily” on “reviewer score” (which, remember, is just one really small piece of what is probably a very big causal graph in a {S,A,R} MDP formulation) instead of working on a dynamic control Policy π(a|s) for the Action “Accept/Reject Paper to Journal/Conference/Grant-Funding” to maximize the Reward “Scientific Impact”?

Where I’m Not Sure

So this is the part I’m not sure about: when are causal inference methods actually the best use of limited research time and resources, and when should dynamic control methods be the focus instead? My guess, my intuition, is that you really want both, but right now I see the research community, at least in science-of-science, doing like 99% causal inference, and if there’s a 1% doing dynamic control I’ve never met them / they’re computer scientists.

As the paper I linked above shows, the mere fact of knowing the directional relationships of the causal graph helps you train a Policy faster. And I think there are some massive complications when we get to talking about science-of-science policies (or Policies), because as opposed to something quite well-known and well-studied like Newsvendor, I don’t think we know very much about the relevant State, let alone the State transition function — that is to say it would be hard to bring many practical dynamic control algorithms to bear, because we’d lack an accurate simulator, and I think you’d have to rely a lot just on the data + domain knowledge, which can be scary (or more relevantly, can lead you to implement bad Policies).

Probably most central, and the part where I really would not be able to do the math without a lot of help, is that I think we’re basically talking about sample efficiency here. Which is better: to spend 100 timesteps in the mode of an RCT to obtain an estimate between two nodes and then update your Policy at the 101st timestep, or to spend those same 100 timesteps actively trying to improve your Policy at every timestep?
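Here’s a crude little simulation of that exact question, with every number invented: two candidate policies (arms) with unknown payoffs and 200 total decisions. One strategy randomizes for the first 100 timesteps and only then commits (the RCT-then-update mode); the other nudges its policy every timestep with epsilon-greedy. This is nowhere near a formal treatment of sample efficiency, just the shape of the comparison.

```python
# Crude comparison of "randomize for 100 steps, then commit" vs. updating the
# policy every timestep (epsilon-greedy). Two arms, invented payoffs.
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.50, 0.60]        # hypothetical expected reward per arm
horizon, burn_in, eps = 200, 100, 0.1

def pull(arm):
    return float(rng.random() < true_means[arm])

def rct_then_commit():
    counts, sums, total = np.zeros(2), np.zeros(2), 0.0
    best_arm = None
    for t in range(horizon):
        if t < burn_in:
            arm = t % 2                                    # alternate arms: the "RCT" phase
        else:
            if best_arm is None:
                best_arm = int((sums / counts).argmax())   # commit to the estimated winner
            arm = best_arm
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def epsilon_greedy():
    counts, sums, total = np.ones(2), np.zeros(2), 0.0     # counts start at 1 to avoid 0/0
    for t in range(horizon):
        arm = rng.integers(2) if rng.random() < eps else int((sums / counts).argmax())
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

print("RCT-then-commit, average total reward:", np.mean([rct_then_commit() for _ in range(500)]))
print("epsilon-greedy, average total reward: ", np.mean([epsilon_greedy() for _ in range(500)]))
```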

And of course, there are a number of other complications there, both theoretical and practical. For one thing, we often don’t face such a strict choice — why not both RCTs and dynamic control? Moreover, a lot of econometrics today makes use of historical (observational) data — who’s to say we shouldn’t use such data as a starting point, with causal inference methods to sharpen our estimates of the State transition function, and then use our preferred dynamic control algorithm, which will perform much better because we now have a nice simulator or the right assumptions or whatever. Also, mostly people use citations or patents as a proxy for impact, and if we want to do control, we really are going to want some type of nearer-term reward signal. And then practically, as Tom Wollmann pointed out when I was complaining about this stuff a while back, there are quite a lot of instances where you don’t have anything close to a nice enough setup to adjust your Policy on the fly. E.g., if you’re giving out cash transfers by physically writing checks, it’s kind of hard to change the amounts day by day (although it’s much easier if you’re doing the transactions programmatically through Mpesa or something).

Closing Thoughts

  1. A large chunk of the problems we care about in science-of-science can be cast as dynamic control problems, and I think keeping the larger {S,A,R} context in mind is really critical for researchers even if you’re focusing solely on a causal estimate between two state variables.
  2. I suspect we want researchers working on both causal inference methods and dynamic control algorithms in the science-of-science field, but right now it’s weighted really heavily toward causal inference, and that’s a problem because:
  3. Decision-makers have to make decisions every day anyway (on funding, on mentorship matching, on conference reviewer assignment, etc. etc. etc.), and they are implementing what I think are probably really suboptimal dynamic control policies on their own, given that those Policies are usually informal, human-driven, manual heuristics that often don’t even take into account the causal evidence anyway and aren’t optimizing for anything.

No more RCTs

I mean not really, but…

Here’s what I worry about. Pierre Azoulay is one of the smartest people I’ve ever met, and he can’t convince the NIH to do RCTs so we can discover better ways of allocating research funding. But more than that, let’s say Pierre was successful, the NIH saw the light, and decided to run a new RCT with its funding model every single year — would that actually maximize research output more than if we just gave the NIH some funding algorithms directly?

For example, one year you might play around with reviewer blinds. The next year maybe you swap around reviewers according to homophily with each other. Then homophily with the PIs. Then research similarity with the PIs. Then number of reviewers. Then the scoring system. Then the scoring system again. Etc., etc. Each time, let’s say for the sake of argument, we get back results of the RCT within the year, get a nice causal estimate of the effect of X on Y, and then update how we do things next year.

And my worry is, isn’t that way, way, way too slow? Why wouldn’t we instead spend all that research effort just coming up with funding control algorithms, handing them off to NIH managers, and letting competition sort out the best algorithm in practice? Shouldn’t we just hand algorithmic control of the relevant action levers to an optimization agent, and not worry so much about perfect identification strategies to isolate the causal effect of X on Y, when we have maybe hundreds of Xs to sort through? Isn’t every single funding decision made under the previous policy by definition worse than the state of the art policy? Why would we wait a year for a robust causal estimate instead of updating our funding policy the second we get back enough data to shift the policy decision?

Anyway these are the things that keep me up at night. If anyone knows some proper papers that have done all this formally (i.e. under what conditions does it make sense to expend limited research effort on precise causal estimates instead of on direct control/optimization algorithms), do let me know.

Economic Engineering Pt II

A good natural follow-up question to the first post is: what concrete examples do you have in mind?  The usual disclaimers apply, pretty much all covered by this nice paper that Wei Yang Tham pointed to, but: this doesn’t mean that research to date hasn’t been valuable (it has!), doesn’t mean we should stop doing theory, or stop doing ‘basic’ research without a specific problem in mind, or stop doing work estimating causal quantities.  Just that the pendulum seems to me to have swung too far in one direction right now in econ (and consequently in science of science), and I think it would be productive for the field and the world if it swung a bit more toward implementable solutions.

I’ll approach this from two angles: first, by way of pointing to other instances outside of the field that have done this well from time to time, and second, by diving into some sample problems from science of science and breaking down more specifically what I think should be done.

Market Design / Auction Design

I’ll start with a close cousin of economics, which is market design and auction design.  Here we’re thinking in the tradition of Roth, Milgrom, etc.  The moment you read one of these papers, it’s apparent they’re different in kind from most other econ.  In particular, they are proposing solutions to existing problems, like kidney exchange, hospital residency matching, and FCC spectrum auctions.  Crucially, you see this work getting picked up rather directly in practice, with economists like Al Roth and John McMillan employed to design these things.

Modern work carries on this tradition – here’s a beautiful paper from 2015 by Eric Budish examining the game-theoretic flaws in continuous-time order book markets, and critically for our purposes, proposing and testing a specific solution in the form of discrete-time frequent batch auctions.  Now, at least as of 2017 people were still calling Eric a communist for all this, but eventually someone at NASDAQ or another exchange will go ahead and realize he and his coauthors are right and they’ll implement it, or something very like it (if they haven’t already).

There are a few items of note here.  First, I’m sure there’s some of it somewhere, but the absence of anyone caring about estimating a causal quantity is notable to me.  Second, notice how in all the major cases, we get a readily implementable solution to real-world problems.  They’re not without flaws, it’s not like theory went out the window or anything, and for the bigger cases you still really want the actual PhDs designing the systems in practice, but they really went out and just did things.

Operations Research

Let’s step one branch further on the family tree, to my dear friends in OR.  A relevant historical note is that OR was born out of WWII, and was originally all about solving practical problems with military applications.  From there we go on to industrial applications, but I think it’s not a coincidence that from the start this field was solidly aimed at creating things for solving specific, real-world problems.

So basically, anywhere you look in OR we have useful stuff.  Say I’m opening an ice cream store, and I want to know how much ice cream to buy from my supplier – hey, that’s a Newsvendor problem, we have algorithms for that.  We can treat it simply, with an extremely pared-down model you could solve with a closed-form solution in your sleep, or we can relax assumptions and make it NP-hard with things like stochastic demand and lead-times.  Let’s say my ice cream shop got bigger, and now I’ve got multiple registers and lines are all out of control, how should I set these up to make sure my customers can get ice cream faster?  Hey now, we’ve got some queuing theory for you!  Now I’m really in business, I’ve got an ice cream franchise on my hands, with 5 locations!  I can move ice cream from store to store, but of course that costs me – how should I distribute ice cream from the manufacturer among my stores?  That’s an inventory placement problem.  How should my stores be distributed around the city?  Network design.  How can I load up my trucks most efficiently?  Bin packing.
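Just to show how pared down the simple version really is, here’s a sketch of the textbook single-period newsvendor solution for the ice cream example: order up to the critical fractile of the demand distribution. The prices, costs, and the normal-demand assumption are all made up for illustration.

```python
# Textbook single-period newsvendor: order up to the critical fractile.
# Prices, costs, and the normal demand assumption are made up.
from scipy.stats import norm

price, cost, salvage = 5.00, 2.00, 0.50       # per unit of ice cream
demand_mean, demand_sd = 200.0, 40.0          # assumed normal daily demand

underage = price - cost       # profit lost per unit of unmet demand
overage = cost - salvage      # loss per unit left over at the end of the day

critical_fractile = underage / (underage + overage)
order_quantity = norm.ppf(critical_fractile, loc=demand_mean, scale=demand_sd)

print(f"order about {order_quantity:.0f} units (critical fractile {critical_fractile:.2f})")
```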

Now I’m not saying OR has everything solved.  Far from it.  But imagine you were at the very start, and you just said “How do I optimize my ice cream store?”  That seems like a huge, impossible problem.  But OR folks didn’t approach it by saying, we need microdata from every ice cream store in the U.S., and then we’ll be able to tell you in 5 years if the variable “number of registers” has a positive causal relationship with profitability – they modeled the problem, agreed where the thing was separable, and then went about inventing new techniques for solving progressively harder versions of that problem, optimizing for a profit function.  And no, not every ice cream store today uses this stuff, but it is widely available, and you absolutely can hire some OR folks to help you optimize your logistics.

Machine Learning

Probably everyone is tired of hearing about this but briefly indulge me.  Much to the chagrin of statistics departments everywhere, ML has taken off despite being statistics with a jaunty hat on top.  I’m not going to pick out examples because we all already know this stuff – it’s probably so widely applicable as to be a GPT.  For our purposes though, I’m going to pick out two particular aspects of the ML adoption ramp-up which I think are salient for us to consider.  First, competition on benchmark datasets.  Imagenet is the canonical example at this point, but we’ve got plenty of others.  When new papers come out these days, everyone is claiming state-of-the-art performance against other solutions.  So, barring things like informational frictions, if I’ve got a computer vision problem, all I really need to do is find an implementation of something from NeurIPS in the past 3 years and bam, I’ve got a working algorithm that does something useful.  And this leads to the other salient point – the tooling got really good and really accessible.  Stata this is not.  There’s a reason almost all of this stuff is in Python, and it isn’t because it’s the fastest language.  Now, on this last front, to be fair to econ I know there are some people doing valiant work, but we should keep this sort of simplicity of deployment (import new_paper_neural_net, model = new_paper_neural_net(x,y)) in mind as a desirable target, not a niche nice-to-have.
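For the avoidance of doubt, this is the kind of two-line deployment I mean, with scikit-learn standing in for “the thing from the paper” and random noise standing in for a real dataset:

```python
# Illustrative only: scikit-learn as a stand-in for importing and fitting
# whatever last year's NeurIPS paper shipped. The data here is random noise.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

X, y = np.random.rand(200, 5), np.random.randint(0, 2, 200)
model = GradientBoostingClassifier().fit(X, y)
print(model.predict(X[:3]))
```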

Moneyball (Sabermetrics)

I’ll be honest in saying I don’t know very much about this stuff, but suffice to say that I find it embarrassing for scientists, writ large, to have been beaten to the punch on optimizing their work by sports institutions so legendarily resistant to change that Brad Pitt got to play the hero who saves the day (sort of) with data.  To pick two examples I do know: the shift in basketball shooting to favor 3-point shots (and the subsequent, predictable equilibrium with recently increasing midrange shots, as those get easier once defenses prioritize the perimeter), and pitch framing for catchers in baseball, which apparently is now completely ubiquitous.  And this is an interesting case because I don’t think this kind of research even produced, let’s say, software tools that are in any way comparable to an LP or a neural net.  But you still have catchers adopting new practices based on data en masse, and I think it’s largely because the research was aimed at saying “Hey, this action is superior to doing not_action” rather than establishing something to the high bar of causality, and because there’s a pretty nice apparatus in place for communicating these results out more widely.  So anyway, I’m sure other people who know this better could make a more thorough and convincing case, but this likewise seems relevant for us, and I have to think there are lessons to be drawn from how critical the Sloan Conference is in all this.

Conference Design, Grant Funding Design, and Lab Management

So now let’s turn to the topics at hand, and see if we can sketch out some fruitful directions for ‘economic engineering’ research in the science of science context, using lessons from the above.

First, let’s consider conference design.  The very first useful thing to be done here I think is for all the theoreticians to get together and decide what we want to optimize for, normatively.  Could be a cascade optimization, who knows: information diffusion -> peer feedback -> network formation -> good vibes.  Then I think it would be awfully useful if some of the modelers would tell us where the most important, separable subproblems in conference design are.  Just off the top of my head, I’ll call out how to assign/conduct reviews of submitted works. 

This bit is going to pop out as the framing and solution to an optimization problem, so one could imagine we end up with a benchmark dataset (e.g., Imagenet) or canonical model of the problem (e.g., Newsvendor), and then a line of research proposing new algorithms for solving the same (e.g., Computer Vision research), and then folks packaging them up very tightly in Python for use by all the thousands of people who run conferences.
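As a flavor of what that canonical model might look like at its very simplest, here’s a sketch of review assignment as a plain assignment problem, solved with the Hungarian algorithm. Real conference assignment adds reviewer loads, multiple reviews per paper, conflicts of interest, and bidding; the affinity scores here are just random stand-ins for something like text similarity.

```python
# Review assignment as a bare-bones assignment problem: maximize total
# reviewer-paper affinity with the Hungarian algorithm. Affinities are random
# stand-ins for something like text-similarity scores.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

n_reviewers, n_papers = 8, 8
affinity = rng.uniform(0, 1, size=(n_reviewers, n_papers))   # higher = better fit

# linear_sum_assignment minimizes cost, so negate the affinities
rows, cols = linear_sum_assignment(-affinity)
for reviewer, paper in zip(rows, cols):
    print(f"reviewer {reviewer} -> paper {paper} (affinity {affinity[reviewer, paper]:.2f})")
```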

Other bits pop up naturally – at what cadence should you hold a conference?  How big should a conference be?  How do you order the sessions?  How do you conduct the feedback?  Are discussants actually a useful feature?  Should we interlace connectivity events (like mixers) with feedback sessions?

Next let’s talk about grant funding design.  I think there’s already some interesting writing on this, and I commend Pierre Azoulay for banging his head against the wall with NIH/NSF, but the mode I’d advise against is trying to tackle this either with some natural experiments or with A/B tests.  Why?  Well at root this can be reframed as a dynamic control problem, with interesting characteristics like the arrival of grants, time cost of writing them, the uncertain distribution of quality, and so on.  But let’s say Pierre broke through to the NIH/NSF and they ran an A/B test… would it be enough?  How many A/B tests do they have to run until we’re happy about some estimation of a causal quantity?  I think instead we need to jump directly to the ‘control’ aspect of the problem, and propose ways to directly optimize the grant funding policy (with, e.g., an LP, or MIP, or Bandits, or RL, take your pick).  And then subsequently package that up in Python and filter it out to the many, many granting organizations out there who will never have the power necessary to run statistically valid RCTs.  This is one good example I had in mind when saying we should be doing fewer A/B tests and more online optimization.

Finally, consider lab management, which somehow appears basically unstudied.  This is where the Moneyball comparison comes in most nicely.  A lot of what’s going on here, at the manager level and at the individual level, is the basically arbitrary setup and execution of workflows and behaviors.  Not unlike watching a baseball team practice in the early 20th century.  It’s all pretty haphazard.  For this, I don’t think you particularly need to invent some nice optimizers or Python software, but nor do I think you want to be spending time estimating causal relationships.  Instead, I think you want to be doing what the folks at MIT have done: look at the dominant behaviors right now, and propose alternate behaviors.  Maybe this is something like Pentland proposing coffee breaks.  Maybe it’s something like Agile for researchers.  Maybe you realize we need tracking data like stadiums have before we can do this kind of research – fine, let’s go write a grant about it!  But fundamentally, what are we doing here when scientists are the last profession to realize that just winging it is not going to maximize productivity?  And it’s too early to tell, but I’m optimistic about ICSSI or something like it functioning like the Sloan conference, fruitfully bringing together academics and practitioners so there’s a tighter pipeline from research to implementation.

Anyway that’s a lot to think about for now! I’ll make efforts to get even more specific and home in on one of these problems to start describing the math in more detail, but we’ll save that for a later date…

Economic Engineering

Well and we’re back. Now that my day job relates to science of science, I am once again diving back into the literature, and once again a blog is a nice way of getting myself to write down and semi-organize my thoughts.

After my years in the wilderness, let me explain some updated thinking on the state of academic econ, science of science research, where I think it is, and where I think it ought to go. Important to note of course I’ve never been an official part of academic econ, but rather have been adjacent to it for ~10 years: first in Research & Policy at Kauffman, then doing various RA work during my MBA, and then seeing things in practice at Amazon.

So from that limited perspective, I will say that my initial impression formed at Kauffman was that if only we had more and better research, technocrats could go make better policy decisions and ideally some people’s lives would be better. And of course I had in mind some sort of hierarchy of research ‘goodness,’ with A/B tests being the gold standard, natural experiments being next, and other things after that.

Now, I did have one rather frustrating bit of disillusionment near the end there, because there was quite clear consensus on the academic side that high-skill immigration is a huge boon for the U.S., and yet Congress could not get together and do something as simple as expand the number of H1Bs available — and my recollection is the votes were there, but Democrats decided to hold that bit hostage in exchange for other concessions on other issues, which they never got anyway (this could be completely factually wrong, I’m just telling you my memory and the resulting impressions it formed). So. That was my first hint that quality research is not everything.

Then at UChicago, I was quite lucky to be allowed to take an intro-PhD course on Market Design from Eric Budish and Mohammad Akbarpour, and while as usual the math largely eluded me, I understood enough conceptually to enjoy that here was econ making a mark on the world (e.g. kidney exchange, hospital residency assignment, etc.) without needing to rely on politics, and (interesting point) with really nothing to do with causal inference or A/B tests. So mark that as point number two on an emerging arrow of thought.

Finally I get to Amazon, and without going into any details, I was exposed to a lottttttt of science different from anything I had ever seen, particularly getting to see a lot of ML, and a lot of Operations Research, applied to practical problems. I even got to contribute to some stuff myself. Optimization, who knew! (Literally, I didn’t know what an objective function was until I was maybe 27 — life is weird). Getting to see some non-econ takes on science at a professional level: mark that as point number three.

So then this is where I’m at now, and while I’m sure some people would say “yeah obviously we all already know that…” and others might say “no what are you talking about that’s all wrong…”, my current point of view is that we could use something a bit new, which I’ll call Economic Engineering. You see, I think we have all this truly wondrous research in things like science of science, and lots of people way smarter than me working on the topic in the academy. But I will tell you that I am probably one of the most prolific consumers of this research, and not too unsophisticated a consumer I like to think, and I still feel lost a large fraction of the time. Sometimes (sometimes!) there is a nice A/B test someone ran, somewhere else, that I believe has enough external validity to be useful in my context.

But most of the time, I am awfully hungry for tools, of the sort that I saw in market design, in machine learning, and in operations research.

And I understand that the problems are a lot less nicely scoped, without clear action levers, or they have a huge design space, but… I just get the overwhelming feeling we can do better. If I’m running NeurIPS, how do I even begin to think about running the best conference I can? Let’s even just pare the problem down to what papers we ought to accept — it’s not even clear what we should optimize for! Probability of most future citations? Maximizing information diffusion? I really don’t know, and as far as I can tell, no one really does :/

So anyway, what would I like to see here? Well, suppose I’m running NeurIPS, I would love to know a) what are the things theory says I should be optimizing for, b) what are the parts of the conference most amenable to any kind of optimization technique, c) what specific algorithms are state-of-the-art for doing a given optimization in this context?

Finally I’ll say one more provocative thing, which is that I think we should have way fewer A/B tests and natural experiments in the literature, and way more attempts at online optimization. The potential action space is just too large, the functions are likely nonlinear, and basically my biggest worry is we end up 20-30 years down the line still guessing at the design of major institutional features of science, like conferences and grant funding. I think academic researchers need to take more seriously the hard limits on our time, the fact that these design decisions have to get made every year regardless, and the fact that we’d do a lot better designing these institutions if researchers put forth more concrete, ready-to-use tools in their papers. I’d like to see a new Economic Engineering.

15.S07, Week 1, The Bakeoff

I am, week by week, going through Pierre Azoulay’s course on innovation.

Last week, I read “The Bakeoff,” and live-tweeted it.  Here are some extended thoughts:

At its core, this article is about a noble experiment by a naive operator (Gundrum), trying to answer the question: what is the optimal way we should form teams, in order to innovate?

I like Gundrum, because hey, he invented Mrs. Fields, but I call him naive because the 3 forms of team were kind of haphazardly thrown together.  In particular, one team is organized in a traditional hierarchical style, and the other two are inspired by software coding practices (XP, or pair programming, and open source).

Besides the first, which is a control, I really don’t think the other two represent distinct “methods of innovation” as Gundrum intends.  Which makes sense — neither pair programming nor open source was developed in order to innovate faster or better, per se.

What would we like to see instead?  Well, just think about any good experiment.  What we want to do is a) have a theory about some independent variable X that we think will influence our rate/quality of innovation, and b) set up our teams such that, as much as possible, all that is different between them is a randomized tweak of X.  Obviously you can’t literally clone a team and do exactly that, but it’s the form we want to approximate, to say anything about X’s effect on Y (innovation).

There are of course other ways to set up the experiment (e.g., you could have one team, and introduce a plausibly exogenous shock — Z — into the system that only acts on Y through X), but basically we are after isolating effects.

I actually thought Gundrum did a reasonably good job of thinking about how to measure Y, though — after all, a representative sample engaged in a fair taste test and voted on which cookie they liked best.  This is a not-bad, if maybe high-cost and slow, way to measure how “good” at innovating each team was (although of course, innovation != tastes best, necessarily…).

Anyway a good first exercise, and helpful in prodding us to think about how we might set up a better experiment, if we were in Gundrum’s shoes.

Annotated: “Academic signaling and the post-truth world”

Wherein I annotate things.

Today, responding to (the more fun half of) Noah Smith’s blog post, “Academic signaling and the post-truth world”:

Lots of people are freaking out about the “post-truth world” and the “war on science“. People are blaming Trump, but I think Trump is just a symptom.

For one thing, rising distrust of science long predates the current political climate; conservative rejection of climate science is a decades-old phenomenon. It’s natural for people to want to disbelieve scientific results that would lead to them making less money. And there’s always a tribal element to the arguments over how to use scientific results; conservatives accurately perceive that people who hate capitalism tend to over-emphasize scientific results that imply capitalism is fundamentally destructive.

But I think things are worse now than before. The right’s distrust of science has reached knee-jerk levels. And on the left, more seem willing to embrace things like anti-vax, and to be overly skeptical of scientific results saying GMOs are safe.

I’m choosing to skip over this bit, for many reasons, but mostly because it just wouldn’t be fun for me.


Monday Night Science & Innovation Links, December 20, 2016

Got catching up to do!  More links!

“This is just a pointer to two new (non-technical) papers of mine that look at the implications of various falling costs associated with new technologies.” — Digitopoly | Falling Costs: Two Non-Technical Papers

“Those departures put pressure on Alphabet to transform its science project into a working commercial product.” — Google is launching a new self-driving car company called Waymo – Vox

“So, to sum up: They aren’t privy to his data. He isn’t privy to them. And because they work from encrypted data, they can’t use their machine learning models on other data—and neither can he. But Craib believes the blind can lead the blind to a better hedge fund.” — Numerai Used 7,500 Faceless Coders Paid in Bitcoin to Build Its Hedge Fund’s Brain | WIRED

“If a thousand virtual worlds take shape, so too can a thousand AIs.” — Google’s Improbable Deal to Recreate the Real World in VR | WIRED

“Interested readers can view our complete recommendations, but a new Trump national space policy should declare:[…]” — Opinion: Dear President Trump: Here’s How to Make Space Great Again | WIRED

“Are Ideas Getting Harder to Find? Yes, say Bloom, Jones, Van Reenen, and Webb.” — A Very Depressing Paper on the Great Stagnation – Marginal REVOLUTION

Sunday Night Science & Innovation Links, December 19, 2016

Annnnd we’re back.  Finals is a pain.

“Using data on expenditure on research and development, and patent applications, receipts, and citations, we show that the Chinese economy has become increasingly innovative. ” — From “Made in China” to “Innovated in China”: Necessity, Prospect, and Challenges

“Yes, it may be a damaging four years for research, innovation, the economy (driven by R&D), and the environment – some irrevocable. But that’s not reason to lose hope. Instead it’s a challenge to all of us to get involved. We must be more dedicated than ever to work for change.” — Dear Scientists: Our Government Needs You – Scientific American Blog Network

“Science, rather than appearing like a human enterprise, full of fits and starts in the never-ending search for knowledge, is expected to prove claims once a week, or even more frequently. And I think that’s bad for readers and viewers.” — Why science news embargoes are bad for the public – Vox

“This means the social costs of new techniques (as opposed to the costs captured in market prices) are systematically underestimated.” — Bite-back, Joel Mokyr

“What will happen to those efforts under a Donald Trump presidency? One thing seems likely: Set aside Mars. Private companies are going to get a chance to do business on the moon.” — What a Trump presidency means for NASA and the future of space exploration — Quartz