Wednesday, October 4, 2017

How to train a postdoc? - by Uschi Symmons

A couple of weeks ago I was roped into a Twitter discussion about postdoc training, which seemed to rapidly develop into a stalemate between two parties: postdocs, who felt they weren't getting the support and training they wanted and needed, and PIs, who felt their often substantial efforts were being ignored. Many of the arguments sounded familiar: over the past two years I’ve been actively involved in our postdoc community, and have found that when it comes to postdocs, often every side feels misunderstood. This can lead to a real impasse for improvements, so in this blog post I’ve put together a couple of points summarizing problems and some efforts we've made to work around these to improve training and support.

First off, here are some of the problems we encountered:
1. Postdocs are a difficult group to cater for, because they are very diverse in almost every aspect:
- work/lab experience and goals: ranging from college-into-grad-school-straight-into-postdoc to people who have multi-year work experience outside academia to scientists who might be on their second or third postdoc. This diversity typically also translates into future ambitions: many wish to continue in academic research, but industry/teaching/consulting/science communication are also part of the repertoire.
- training: Some postdocs come from colleges and grad schools with ample opportunity for soft-skill training. Others might never have had a formal course in even such basic skills as writing a paper or giving a talk.
- postdoc duration: there is a fair amount of variation in how long postdocs stay, depending on both personality and field of research. In our department, for example, postdoc positions vary widely, ranging from 1-2 years (eg computational sciences, chemistry) to 5-7 years (biomedical sciences).
- nationality: I don’t know if postdocs are actually more internationally diverse than grad students, but the implications of that diversity are often greater. Some postdocs might be preparing for a career in the current country, others might want to return to their home country, which makes it difficult to offer them the same kind of support. Some postdocs may have stayed in the same country for a long time and know the funding system inside-out, others may have moved country repeatedly and have only a vague idea about grant opportunities.
- family status: when I was in grad school three people in my year (<5%) had kids. In our postdoc group that percentage is way higher (I don’t have numbers, but would put it around 30-40%), and many more are in serious long-term relationships, some of which require long commutes (think two-body problem). Thus, organising postdoc events means dealing with people on very diverse schedules.

2. In addition, postdocs are often a smaller group than grad students. For example, at UPenn, we have as many postdocs in the School of Engineering as we have grad students in a single department of the school (Bioengineering). In fact, I have often heard disappointed faculty argue that postdocs “don’t make use of available resources”, because of low turnout at events. In my experience this is not the case: having organised events both as a grad student and as a postdoc, I have found that turnout is typically around 30-40% in either case; postdoc events simply seem less well attended because the base is so much smaller.

3. Finally, postdocs frequently have lower visibility: whereas grad students are typically seen by many faculty during the recruitment process or during classes, it is not unusual for postdocs to encounter only their immediate working group. And unlike grad students, postdocs do not come in as part of a cohort, but at different times during the year, which also makes it difficult to plan things like orientation meetings, where postdocs are introduced to the department in a timely manner.

Given all of the above, it is easy to see why training postdocs can be difficult. On the one hand, the problems are conceptual: Do you try to cater to everyone’s needs or just the majority? Do you try to help the “weakest link” (the people with least prior training) or advance people who are already at the front of the field? On the other hand, there are also plenty of practical issues: Do you adjust events to the term calendar, even if postdocs arrive and leave at different times? Do you organise the same events annually or every couple of years? Is it OK to have evening/weekend events? But these are not unsolvable dilemmas. Based on our experiences during the past two years, here are some practical suggestions*:

  1. Pool resources/training opportunities with the grad school and/or other postdoc programmes close-by: for a single small postdoc program, it is impossible to cater to all needs. But more cross-talk between programs means more ground can be covered. Such cross-talk is most likely going to be a win-win situation, both because it bolsters participant numbers and because postdocs can contribute with their diverse experiences (eg in a “how to write a paper” seminar; even postdocs who want more formal training will have written at least one paper). Our postdoc programme certainly benefits from access to the events from UPenn’s Biomedical Programme, as well as a growing collaboration with GABE, our department’s graduate association.

  2. Have a well(!)-written, up-to-date wiki/resource page AND make sure you tell incoming postdocs about this. As a postdoc looking for information about pretty much anything (taxes, health insurance, funding opportunities) I often feel like Arthur in the Hitchhiker’s Guide to the Galaxy:

    Once you know where to look and what you’re looking for, it can be easy to find, but occasionally I am completely blindsided by things I should have known. This can be especially problematic for foreign postdocs (I’ve written more about that here), and so telling postdocs ahead of time about resources can avoid a lot of frustration. A good time for this could be when the offer letter is sent or when postdocs deal with their initial admin. Our department still doesn’t have a streamlined process for this, but I often get personal enquiries, and I typically refer postdocs to either the National Postdoc Association's Survival Guide for more general advice or the aforementioned Biomedical Postdoc Program for more UPenn-related information.

  3. Have an open dialogue with postdocs and listen to their needs: More often than not, I encounter PIs and admin who want to help postdocs. They provide training in areas they have identified as problematic, and given the diversity of the postdoc group most likely that training is genuinely needed by some. But often postdocs would like more: more diversity, other types of training, or maybe they even have completely different pressing issues. Yet, without open dialogue between departmental organisers and the postdoc community it’s hard to find out about these needs and wishes. Frustratingly, one tactic I encounter frequently is departmental organisers justifying the continuation or repetition of an event based on its success, without ever asking the people who did not attend, or wondering if a different event would be equally well received. To build a good postdoc program, universities and departments need to get better at gauging needs and interests, even if this might mean re-thinking some events, or how current events are integrated into a bigger framework.
    This can be difficult. As a case in point, Arjun, my PI, likes to point out that, when asked, the vast majority of postdocs request training in how to get a faculty position. So departments organise events about getting faculty positions. In fact, I am swamped with opportunities to attend panel discussions on “How to get a job in academia”: we have an annual one in our School, multiple other departments at the university host such discussions and it’s a much-favored trainee event at conferences. But after seeing two or three such panels, there’s little additional information to be gained. This does not mean that departments should do away with such panels, but coordinating with other departments (see point 1) or mixing it up with other events (eg by rotating events in two to three year cycles) would provide the opportunity to cater to the additional interests of postdocs.
    Frequent topics I’ve heard postdocs ask for are management skills, teaching skills, grant writing and external feedback/mentoring by faculty. For us, successful new programs included participation in a Junior Investigators Symposium on campus, which included two very positively received sessions about writing K/R awards and a “speed mentoring” session, where faculty provided career feedback in a 10-minute, one-on-one setting. Similarly, postdocs at our school who are interested in teaching can partake in training opportunities by UPenn’s Center for Teaching and Learning, and those interested in industry and the business side of science can make use of a paid internship program by Penn’s Center for Innovation to learn about IP and commercialization. While only a small number of postdocs make use of these opportunities per year, they provide a very valuable complement to the programs offered by the school/department.

  4. Make a little bit of money go a long way: Many fledgling postdoc programs, such as ours, operate on a shoestring. Obviously, in an ideal world neither PIs nor administrative bodies should shy away from spending money on postdoc training - after all, postdocs are hired as trainees. But in reality it is often difficult to get substantial monetary support: individual PIs might not want to pay for events that are not of interest for their own postdocs (and not every event will cater for every postdoc) and admin may not see the return on investment for activities not directly related to research. However, you may have noticed that many of the above suggestions involved little or no additional financial resources: faculty are often more than willing to donate their time to postdoc events, postdocs themselves can contribute to resources such as wikis, and collaborations with other programs on campus can help cover smaller costs. In addition, individual postdocs may have grants or fellowships with money earmarked for training. Encouraging them to use those resources can be of great value, especially if they are willing to share some of the knowledge they gained. My EMBO postdoctoral fellowship paid for an amazing 3-day lab management course, and I am currently discussing with our graduate association to implement some of the training exercises that we were taught.

As my final point I’d like to say that I personally very rarely encounter faculty who consider postdocs cheap labor. If anything, most PIs I talk to have their postdocs' best interests at heart. Similarly, postdocs are often more than willing to organize events and mediate the needs of their fellows. However, in the long run the efforts of individual PIs and postdocs cannot replace a well-organized institutional program, which I think will likely require taking on board some of my above suggestions and building them into a more systematic training program.

*The National Postdoc Association has a much more elaborate toolkit for setting up and maintaining a postdoc association and there's also a great article about initiating and maintaining a postdoc organisation by Bruckman and Sebestyen. However, not all postdoc groups have the manpower or momentum to dive directly into such a program, so the tips listed here are more to get postdocs involved initially and create that sense of community and momentum to build an association.

Wednesday, August 2, 2017

Figure scripting and how we organize computational work in the lab

Saw a recent Twitter poll from Casey Brown on the topic of figure scripting vs. "Illustrator magic", the former of which is the practice of writing a program to completely generate the figure vs. putting figures into Illustrator to make things look the way you like. Some folks really like programming it all, while I've argued that I don't think this is very efficient, and so arguments go back and forth on Twitter about it. Thing is, I think ALL of us having this discussion here are already way in the right hand tail in terms of trying to be tidy about our computational work, while many (most?) folks out there haven't ever really thought about this at all and could potentially benefit from a discussion of what an organized computational analysis would look like in practice. So anyway, here's what we do, along with some discussion of why and what the tradeoffs are (including talking about figure scripting).

First off, what is the goal? Here, I'm talking about how one might organize a computational analysis in finalized form for a paper (will touch on exploratory analysis later). In my mind, the goal is to have a well-organized, well-documented, readable and, most importantly, complete and consistent record of the computational analysis, from raw data to plots. This has a number of benefits: 1. it is more likely to be free of mistakes; 2. it is easier for others (including within the lab) to understand and reproduce the details of your analysis; 3. it is more likely to be free of mistakes. Did I mention more likely to be free of mistakes? Will talk about that more in a coming post, but that's been the driving force for me as the analyses that we do in the lab become more and more complex.

[If you want to skip the details and get more to the principles behind them, please skip down a bit.]

Okay, so what we've settled on in lab is to have a folder structured like this (version controlled or Dropboxed, whatever):

I'll focus on the "paper" folder, which is ultimately what most people care about. The first thing is "extractionScripts". This contains scripts that pull out numbers from data and store them for further plot-making. Let me take this through the example of image data in the lab. We have a large software toolset called rajlabimagetools that we use for analyzing raw data (and that has its own whole set of design choices for reproducibility, but that's a story for another day). That stores, alongside the raw data, analysis files that contain things like spot counts and cell outlines and thresholds and so forth. The extraction scripts pull data from those analysis files and put it into .csv files, which are stored in extractedData. For an analogy with sequencing, this is like maybe taking some form of RNA-seq data and setting up a table of TPM values in a .csv file. Or whatever, you get the point. plotScripts then contains all the actual plotting scripts. These load the .csv files and run whatever to make graphical elements (like a series of histograms or whatever) and store them in the graphs folder. finalFigures then contains the Illustrator files in which we compile the individual graphs into figures. Along with each figure, we keep a readme (like Fig1readme.txt) that describes exactly which .eps or .pdf files from the graphs folder ended up in, say, Figure 1f (and, ideally, which script made them). Thus, everything is traceable back from the figure all the way to raw data. Note: within the extractionScripts is a file called "extractAll.m" and in plotScripts "plotAll.R" or something like that. These master scripts basically pull all the data and make all the graphs, and we rerun these completely from scratch right before submission to make sure nothing changed. Incidentally, of course, each of the folders often has a massive number of subfolders and so forth, but you get the idea.
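
One way to make the "rerun from scratch and make sure nothing changed" step concrete is to fingerprint an output folder before and after the rerun. What follows is only a minimal Python sketch under stated assumptions, not the lab's actual tooling (the post describes MATLAB and R master scripts): the function names are hypothetical, and `rerun` is a stand-in for whatever calls extractAll or plotAll. Byte-level comparison is probably most meaningful for the .csv tables in extractedData, since PDF or EPS files may embed creation timestamps and differ even when the plot is identical.

```python
import hashlib
from pathlib import Path


def snapshot(folder):
    """Map each file path (relative to folder) to the SHA-256 of its bytes."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return sorted names that were added, removed, or whose contents differ."""
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))


def rerun_and_compare(folder, rerun):
    """Snapshot folder, call rerun() (hypothetical stand-in for the master
    scripts), snapshot again, and report which files changed."""
    before = snapshot(folder)
    rerun()
    return changed_files(before, snapshot(folder))
```

An empty list from `rerun_and_compare("extractedData", rerun)` would suggest the extracted tables are reproducible; any names in the list are the files worth inspecting before submission.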

What are the tradeoffs that led us to this workflow? First off, why did we separate things out this way? Back when I was a postdoc (yes, I've been doing various forms of this since 2007 or so), I tried to just arrange things by having a folder per figure. This seemed logical at the time, and has the benefit that the output of the scripts is in close proximity to the script itself (and the figure), but the problem was that figures kept getting endlessly rearranged and remixed, leading to endless tedious (and error-prone) rescripting to regain consistency. So now we just pull in graphical elements as needed. This makes things a bit tricky, since for any particular graph it's not immediately obvious what made that graph, but it's usually not too hard to figure out with some simple searching for filenames (and some verbose naming conventions).
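
The "simple searching for filenames" step could look something like the sketch below. It assumes (hypothetically) that each plotting script mentions the name of the graph file it writes somewhere in its text; the function name and the example filenames are made up for illustration.

```python
from pathlib import Path


def scripts_mentioning(graph_filename, scripts_dir):
    """Return the scripts under scripts_dir whose text contains graph_filename."""
    root = Path(scripts_dir)
    hits = []
    for script in sorted(root.rglob("*")):
        if not script.is_file():
            continue
        # errors="ignore" lets the search continue past stray non-text files
        text = script.read_text(encoding="utf-8", errors="ignore")
        if graph_filename in text:
            hits.append(script.relative_to(root).as_posix())
    return hits
```

For example, `scripts_mentioning("spotCountHistogram.pdf", "plotScripts")` would list every script that refers to that graph, which is where verbose, unique file names pay off.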

The other thing is why have the extraction scripts separated from the plots? Well, in practice, the raw data is just too huge to distribute easily this way, and if it was all mushed together with the code and intermediates, it would be hard to distribute. But, at least in our case, the more important fact is that most people don't really care about the raw data. They trust that we've probably done that part right, and what they're most interested in are the tables of extracted data. So this way, in the paper folder, we've documented how we pulled out the data while keeping the focus on what most people will be most interested in.

[End of nitty gritty here.]

And then, of course, figure scripting, the topic that brought this whole thing up in the first place. A few thoughts. I get that in principle, scripting is great, because it provides complete documentation, and also because it potentially cuts down on errors. In practice, I think it's hard to efficiently make great figures this way, so we've chosen perhaps a slightly more tedious and error prone but flexible way to make our figures. We use scripts to generate PDFs or EPSs of all relevant graphical elements, typically not spending time to optimize even things like font size and so forth (mostly because all of those have to change so many times in the end anyway). Yes, there is a cost here in terms of redoing things if you end up changing the analysis or plot. Claus Wilke argued that this discourages people from redoing plots, which I think has some truth to it. At the same time, I think that the big problem with figure scripting is that it discourages graphical innovation and encourages people to use lazy defaults that usually suffer from bad design principles—indeed, I would argue it's way too much work currently to make truly good graphics programmatically. Take this example:

Or imagine writing a script for this one:

Maybe you like or don't like these types of figures, but either way, not only would it take FOREVER to write up a script for these (at least for me), but by the time you've done it, you would probably never build up the courage to remix these figures the dozen or so times we've reworked this one over the course of publication. It's just faster, easier, and more intuitive to do with a tool for, you know, playing with graphical elements, which I think encourages innovation. Also, many forms of labeling of graphs that reduce cognitive burden (like putting text descriptors directly next to the line or histogram that they label) are much easier in Illustrator and much harder to do programmatically, so again, this works best for us. It does also, however, introduce a human element for error, and that has happened to us, although I should say that programmatic figures are a typo away from errors as well, and that's happened, too. There is also the option to link figures, and we have done that with images in the past, but in the end, relying on Illustrator to find and maintain links as files get copied around just ended up being too much of a headache.

Note that this is how we organize final figures, but what about exploratory data analysis? In our lab, that ends up being a bit more ad-hoc, although some of the same principles apply. Following the full strictures for everything can get tedious and inhibitory, but one of the main things we try and encourage in the lab is keeping a computational lab notebook. This is like an experimental lab notebook, but, uhh, for computation. Like "I did this, hoped to see this, here's the graph, didn't work." This has been, in practice, a huge win for us, because it's a lot easier to understand human descriptions of a workflow than try and read code, especially after a long time and double especially for newcomers to the lab. Note: I do not think version control and commit messages serve this purpose, because version control is trying to solve a fundamentally different problem than exploratory analysis. Anyway, talked about this computational lab notebook thing before, should write something more about it sometime.
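
A computational lab notebook can be as simple as a text file that gets a dated entry per analysis attempt. The sketch below is one hypothetical way to do that in Python; the field names simply mirror the "did this, hoped to see this, here's the graph, didn't work" pattern above and are an assumption, not a description of the lab's actual notebook format.

```python
from datetime import date
from pathlib import Path


def add_entry(notebook, did, hoped, outcome, graph=None, day=None):
    """Append one dated, human-readable entry to a plain-text notebook file."""
    day = day or date.today()
    lines = [day.isoformat(), f"Did: {did}", f"Hoped to see: {hoped}"]
    if graph:
        lines.append(f"Graph: {graph}")
    lines.append(f"Outcome: {outcome}")
    # Append mode keeps earlier entries intact; a blank line separates entries
    with Path(notebook).open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n\n")
    return lines
```

Keeping the entry as prose rather than code is the point: the file is meant to be read by a newcomer months later, not executed.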

One final point: like I said, one of the main benefits to these sorts of workflows is that they help minimize mistakes. That said, mistakes are going to happen. There is no system that is foolproof, and ultimately, the results will only be as trustworthy as the practitioner is careful. More on that in another post as well.

Anyway, very interested in what other people's workflows look like. Almost certainly many ways to skin the cat, and curious what the tradeoffs are.

Sunday, July 30, 2017

Can we measure science?

I was writing a couple grants recently, some with page limits and some with word limits. Which of course got me thinking about the differences in how to game these two constraints. If you have a word limit, you definitely don’t want to use up your limit on a bunch of little words, which might lead to a bit more long-wordiness. With the page limit, though, you spend endless time trying to use shorter words to get that one pesky paragraph one little line shorter (and hope the figures don’t jump around). Each of these constraints has its own little set of games we play trying to obey the letter of the law while seemingly breaking its spirit. But here’s the thing: no amount of "gaming the system" will ever allow me to squeeze a 10 page grant into 5 pages. While there’s always some gamesmanship, in the end, it is hard to break the spirit of the metric, at least in a way that really matters. [Side note, whoever that reviewer was who complained that I left 2-3 inches of white space at the end of my last NIH grant, that was dumb—and yes, turns out the whole method does indeed work.]

I was thinking about this especially in the context of metrics in science, which is predicated on the idea that we can measure science. You know, things like citations and h-index and impact factor and RCR (NIH’s relative citation ratio) and so forth. All of which many (if not most) scientists these days declare as being highly controversial and without any utility or merit. "Just read the damn papers!" is the new (and seemingly only) solution to everything that ails science. Gotta say, this whole thing strikes me as surprisingly unscientific. I mean, we spend our whole lives predicated on the notion that carefully measuring things is the way to understand the world around us, and yet as soon as we turn the lens on ourselves, it’s all “oh, it’s so horribly biased, it’s a popularity contest, all these metrics are gamed, there’s no way to measure someone’s science other than just reading their papers. Oh, and did I mention that time so and so didn’t cite my paper? What a jerk.” Is everyone and every paper a special snowflake? Well, turns out you can measure snowflakes, too (Libbrecht's snowflake work is pretty cool, BTW 1 2).

I mean, seriously, I think most of us wish we had the sort of nice quantitative data in biology that we have with bibliometrics. And I think it’s reasonably predictive as well. Overall, better papers end up with more citations, and I would venture to say that the predictive power is better than most of what we find in biology. Careers have certainly been made on worse correlations. But, unlike the rest of biomedical science, any time someone even insinuates that metrics might be useful, out come the anecdotes:
  • “What about this undercited gem?” [typically one of your own papers]
  • “What about this overhyped paper that ended up being wrong?” [always someone else’s paper]
  • “What about this bubble in this field?” [most certainly not your own field]
Ever see the movie “Minority Report”, where there is this trio of psychics who can predict virtually every murder, leading to a virtually murder-free society? And it’s all brought down because of a single case the system gets wrong about Tom Cruise? Well, sign me up for the murder-free society and send Tom Cruise to jail, please. I think most scientists would agree that self-driving cars will lead to statistically far fewer accidents than human-driven cars, and so even if there’s an accident here and there, it’s the right thing to do. Why doesn’t this rational approach translate to how we think about measuring the scientific enterprise?

Some will say these metrics are all biased. Like, some fields are more hot than others, certain types of papers get more citations, and so forth. Since when does this mean we throw our hands up in the air and just say “Oh well, looks like we can’t do anything with these data!”? What if we said, oh, got more reads with this sequencing library than that sequencing library, so oh well, let’s just drop the whole thing? Nope, we try to correct and de-bias the data. I actually think NIH did a pretty good job of this with their relative citation ratio, which generally seems to identify the most important papers in a given area. Give it a try. (Incidentally, for those who maintained that NIH was simplistic and thoughtless in how it was trying to measure science during the infamous "Rule of 21" debate, I think this paper explaining how RCR works belies that notion. Let's give these folks some credit.)

While I think that citations are generally a pretty good indicator, the obvious problem is that for evaluating younger scientists, we can't wait for citations to accrue, which brings us to the dreaded Impact Factor. The litany of perceived problems with impact factor is too long and frankly too boring to reiterate here, but yes, they are all valid points. Nevertheless, the fact remains that there is a good amount of signal along with the noise. Better journals will typically have better papers. I will spend more time reading papers in better journals. Duh. Look, part of the problem is that we're expecting too much out of all these metrics (restriction of range problem). Here's an illustrative example. Two papers published essentially simultaneously, one in Nature and one in Physical Review Letters, with essentially the same cool result: DNA overwinds when stretched. As of this writing, the Nature paper has 280 citations, and the PRL paper has 122. Bias! The system is rigged! Death to impact factor! Or, more rationally, two nice papers in quality journals, both with a good number of citations. And I'm guessing that virtually any decent review on the topic is going to point me to both papers. Even in our supposedly quantitative branch of biology, aren't we always saying "Eh, factor of two, pretty much the same, it's biology…"? Point is, I view it as a threshold. Sure, if you ONLY read papers in the holy triumvirate of Cell, Science and Nature, then yeah, you're going to miss out on a lot of awesome science, and I don't know a single scientist who does that. (It would also be pretty stupid to not read anything in those journals, can we all agree to that as well?) And there is certainly a visibility boost that comes with those journals that you might not get otherwise. But if you do good work, it will more often than not publish well and be recognized.

Thing is that we keep hearing these "system is broken" anecdotes about hidden gems while ignoring all the times when things actually work out. Here's a counter-anecdote from my own time in graduate school. Towards the end of my PhD, I finally wrapped up my work on stochastic gene expression in mammalian cells, and we sent it to Science, Nature and PNAS (I think), with editorial rejections from all three (yes, this journal shopping is a demoralizing waste of time). Next stop was PLoS Biology, which was a pretty new journal at the time, and I remember liking the whole open access thing. Submitted, accepted, and then there it sat. I worked at a small institute (Public Health Research Institute), and my advisor Sanjay Tyagi, while definitely one of the most brilliant scientists I know, was not at all known in the single cell field (which, for the record, did actually exist before scRNA-seq). So nobody was criss-crossing the globe giving talks at international conferences on this work, and I was just some lowly graduate student. And yet even early on, it started getting citations, and now 10+ years later, it is my most cited primary research paper—and, I would say, probably my most influential work, even compared to other papers in "fancier" journals. And, let me also say that there were several other similar papers that came out around the same time (Golding et al. Cell 2005, Chubb et al. Curr Biol 2006, Zenklusen and Larson et al. Nat Struct Mol Bio 2008), all of which have fared well over time. Cool results (at least within the field), good journals, good recognition, great! By the way, I can't help but wonder if we had published this paper in the hypothetical preprint-only journal-less utopia that seems all the rage these days, would anyone have even noticed, given our low visibility in the field?

So what should we do with metrics? To be clear, I'm not saying that we should only use metrics in evaluation, and I agree that there are some very real problems with them (in particular, trainees' obsession with the fanciest of journals; chill, people!). But I think that the judicious use of metrics in scientific evaluation does have merit. One area I've been thinking about is more nefarious forms of bias, like gender and race, which came up in a recent Twitter discussion with Anne Carpenter. Context was whether women face bias in citation counts. And the answer, perhaps unsurprisingly, is yes: check out this careful study in astrophysics (also 1 2 with similar effects). So again, should we just throw our hands up and say "Metrics are biased, let's toss them!"? I would argue no. The paper concludes that the bias in citation count is about 10% (actually 5% raw, then corrected to 10%). Okay, let's play this out in the context of hiring. Let's say you have two men, one with 10% fewer citations than the other. I'm guessing most search committees aren't going to care much whether one has 500 cites on their big paper instead of 550. But now let's keep it equal and put a woman's name on one of the applications. Turns out there are studies on that as well, showing a >20% decrease in hireability, even for a technician position, and my guess is that this would be far worse in the context of faculty hiring. I know of at least two stories of people combating bias (effectively, I might add) in these higher level academic selection processes by using hard metrics. Even simple stuff like counting the number of women speakers and attendees at a conference can help. Take a look at the Salk gender discrimination lawsuit. Yes, the response from Salk about how the women scientists in question had no recent Cell, Science, or Nature papers or whatever is absurd, but notice that the lawsuits themselves mention various metrics: percentages, salary, space, grants, not to mention "glam" things like being in the National Academies as proxies for reputation. Don't these hard facts make their case far stronger and harder to dismiss? Indeed, isn't the fact that we have metrics to quantify bias critical here? Rather than saying "citations are biased, let's not use them", how about we just boost women's cites by 10% in any comparison involving citations, adjusting as new data comes in?
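
To spell out the arithmetic of that last suggestion: the sketch below is purely illustrative Python, with a hypothetical function name, using the roughly 10% figure quoted above as a default that could be changed as new data comes in. It takes the "boost by 10%" wording literally, which is one of several possible ways to apply such a correction.

```python
def adjusted_citations(citations, correction=0.10):
    """Scale a raw citation count up by a fractional correction, rounded to a whole count."""
    return round(citations * (1 + correction))


raw_count = 500  # the example count used in the text above
print(raw_count, "->", adjusted_citations(raw_count))
```

With the default correction, the 500-citation paper from the example above is compared as if it had 550, which is exactly the gap the post argues most committees would already treat as negligible.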

Another interesting aspect of the metric debate is that people tend to use them when it suits their agenda and dismiss them when they don't. This became particularly apparent in the Rule of 21 debate, which was cast as having two sides: those with lots of grants and seemingly low per dollar productivity per Lauer's graphs, and those with not much money and seemingly high per dollar productivity. At the high end were those complaining that we don't have a good way to measure science, presumably to justify their high grant costs because the metrics fail to recognize just how super-DUPER important their work is. Only to turn around and say that actually, upon reanalysis, their output numbers actually justify their high grant dollars. So which is it? On the other end, we have the "riff-raff" railing against metrics like citation counts for measuring science, only to embrace them wholeheartedly when they show that those with lower grant funding yielded seemingly more bang for the buck. Again, which is it? (The irony is that the (yes, correlative) data seem to argue most for increasing those with 1.5 grants to 2.5 or so, which probably pleases neither side, really.)

Anyway, metrics are flawed, data are flawed, methodologies are flawed, that's all of science. Nevertheless, we keep at it, and try to let the data guide us to the truth. I see no reason that the study of the scientific enterprise itself should be any different. Oh, and in case I still have your attention, you know, there's this one woefully undercited gem from our lab that I'd love to tell you about… :)

Tuesday, July 4, 2017

A system for paid reviews?

There's been some discussion on the internet about how slow reviews have gotten, how few reviewers respond, etc. The suggestion floated was paid review, something on the order of $100 per review. I have always found this idea weird, but I have to say that I think review times have gotten bad enough that perhaps we have to do something, and some economists have research showing that paying reviewers speeds up review.

In practice, lots of hurdles. Perhaps the most obvious way to do this would be to have journals pay for reviews. The problem would be that it would make publishing even more expensive. Let's say a paper gets 6-9 reviews before getting accepted. Then in order for the journal to be made whole, they'd either take a hit on their crazy profits (haha!), or they'd pass that along in publication charges.

How about this instead? When you submit your paper, you (optionally) pay up front for timely reviews. Like, $300 extra for the reviews, on the assumption that you get a decision within 2 weeks (if not, you get a refund). The journal could maybe even keep a small cut of this for payment overhead. Perhaps a smaller fee for re-review. Would I pay $300 for a decision within 2 weeks instead of 2 months? Oftentimes, I think the answer would be yes.

I think this would have the added benefit of people submitting fewer papers. Perhaps people would think a bit harder before submitting their work and try a bit harder to clean things up before submission. Right now, submitting a paper incurs an overhead on the community to read, understand and provide critical feedback for your paper at essentially no cost to the author, which is perhaps at least part of the reason the system is straining so badly.

One could imagine doing this on bioRxiv, even. Have a service where authors pay and someone commissions paid reviews, THEN the paper gets shopped to journals, maybe after revisions. Something like this was out there (Axios Review), but I guess it closed recently, so maybe not such a hot idea after all.


Friday, June 30, 2017


___ toiled over ridiculous reviewer experiments for over a year for the honor of being 4th author.

___ did all the work but somehow ended up second author because the first author "had no papers".

___ told the first author to drop the project several times before being glad they themselves thought of it.

___ was better to have as an author than as a reviewer.

___ ceased caring about this paper about 2 years ago.

Nobody's quite sure why ___ is an author, but it seems weird to take them off now.

___ made a real fuss about being second vs. third author, so we made them co-second author, which only serves to signal their own utter pettiness to the community.

Friday, May 5, 2017

Just another Now-that-I'm-a-PI-I-get-nothing-done day

Just had another one of those typically I-got-nothing-done days. I’m sure most PIs know the feeling: the day is somehow over, and you’re exhausted, and you feel like you’ve got absolutely nothing to show for it. Like many, I've had more of these days than I'd care to count, but this one was almost poetically unproductive, because here I am at the end of the day, literally staring at the same damn sentence I’ve been trying to write since the morning.

Why the case of writer's block? Because I spent today like most other work days: sitting in the lab, getting interrupted a gazillion times, not being able to focus. I mean, I know what I should do to get that sentence written. I could have worked from home, or locked myself in my office, and I know all the productivity rules I violate on a routine basis. But then I thought back on what really happened today…

Arrived, sat down, opened laptop, started looking at that sentence. Talked with Sydney about strategy for her first grant. Then met with Caroline to go over slides for her committee meeting—we came up with a great scheme for presenting the work, including some nice schematics illustrating the main points. Went over some final figure versions from Eduardo, which were greatly improved from the previous version, and also talked about the screens he’s running (some technical problems, but overall promising). And also, Eduardo and I figured out the logic needed for writing that cursed sentence. Somewhere in there, watched Sara hit submit on the final revisions for her first corresponding author paper! Meanwhile, Ian’s RNATag-seq data is looking great, and the first few principal components are showing exactly what we want. Joked around with Lauren about some mistake in the analysis code for her images, and talked about her latest (excellent) idea to dramatically improve the results. Went to lunch with good friend and colleague John Murray, talked about kids and also about a cool new idea we have brewing in the lab; John had a great idea for a trick to make the data even cooler. Chris dragged me into the scope room because the CO2 valve on the live imaging setup was getting warm to the touch, probably because CO2 had been leaking out all over the place because a hose came undone. No problem, I said, should be fine—and glad nobody passed out in the room. Uschi showed me a technical point in her SNP FISH analysis that suggests we can dramatically reduce our false-positive rate, which is awesome (and I’m so proud of all the coding she’s learned!). I filled our cell dewar with liquid nitrogen for a while, looks like it’s fully operational, so can throw away the return box. Sydney pulled me into the scope room to look at this amazing new real-time machine learning image segmentation software that Chris had installed. 
Paul’s back in med school, but dropped by and we chatted about his residency applications for a bit. While we were chatting, Lauren dropped off half a coffee milkshake I won in a bet. Then off to group meeting, which started with a spirited discussion about how to make sure people make more buffers when we run out, after which Ally showed off the latest genes she’s been imaging with expansion microscopy, and Sareh gave her first lab meeting presentation (yay!) on gene induction (Sara brought snacks). Then collaborators Raj and Parisha stayed for a bit after group meeting to chat about that new idea I’d talked about with John—they love the idea, but brought up a major technical hurdle that we spent a while trying to figure out (I think we’ll solve it, either with brains or brute force). And then, sat down, stared at that one half-finished sentence again, only to see that it was time to bike home to deal with the kids.

So yeah, an objective measure of the day would definitely be, hey, I was supposed to write this one sentence, and I couldn’t even get that done. But all in all, now that I think about it, it was a pretty great day! I think PIs so often lament their lack of time to think, reminiscing about the Good Old Days when we had time to just focus on our work with no distractions, that we maybe forget about how lucky we are to have such rich lives filled with interesting people doing interesting things.

That said, that sentence isn’t going to write itself. Hmm. Well, maybe if I wait long enough…

Wednesday, May 3, 2017

Quick take on NIH point scale: will this shift budget uncertainty to the NIH?

Just heard about the new NIH point scale, and was puzzling through some of the implications. First, quick summary: NIH, in an effort to split the pie more evenly, is implementing a system in which each grant you have is assigned a point value, and you are capped at 21 points (3 R01 equivalents). Other grants are worth less. The consequences of this are of course vast, and I'm assuming most of this is going to be covered elsewhere. I'll just say that I do think some labs are just plain overfunded, so this will probably help with that. Also, it's clear from the point breakdown that some things are incentivized and disincentivized, which probably has some pluses and minuses.

Anyway, I did start wondering about what life would be like for a big lab working with 3 R01s. One of the realities of running such a lab is budget uncertainty. I remember early on when I started at Penn, a (very successful) senior faculty member took me to lunch and was talking about funding and said, "Jeez, my lab is too big, and I've been thinking about how I got here. Thing is you have a grant expiring and you want to replace it, so you have to submit 3 grants hoping that one will come in, but then maybe you get 2 or even all 3, and now you have to spend the money, and your lab gets too big." Clearly, this is bad, and the new system will really help with that. I guess what will happen is that if you get those 3 grants, then you will only take one of them. And, you may have to give back the rest of the grant you already have so that you don't go over 21. Think about this now from the point of view of the NIH: you're going to have money coming back that you didn't expect, and grants not funded that you thought would be funded. The latter is I suppose easy to deal with (just give it to someone else), but I wouldn't be surprised if the former might cause some budgetary problems. Basically, the fluctuations in funding would shift from the PIs to the NIH. Which I think is on balance a good thing. It makes a lot more sense to have NIH manage a large pool of uncertainty in funding than to have individual scientists try and manage crazy step function changes in funding, which will hopefully allow scientists to have more certainty on how much money to expect moving forward. Nice. But maybe I haven't thought through all the angles here.
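
For what it's worth, here is a toy sketch of the arithmetic in that scenario. The 21-point cap is from the NIH summary above; the 7 points per R01 is just my inference from "21 points = 3 R01 equivalents", and the function name and the accept-in-order rule are assumptions made up for illustration, not anything NIH has specified.

```python
POINT_CAP = 21               # cap described in the NIH summary above
R01_POINTS = POINT_CAP // 3  # assumption: 3 R01 equivalents = 21 points, so 7 each


def awards_acceptable(current_points, new_award_points, cap=POINT_CAP):
    """Return the new awards a PI could accept without exceeding the cap.

    Hypothetical rule: awards are considered in the order given and
    accepted only if the running total stays at or under the cap.
    """
    accepted = []
    total = current_points
    for points in new_award_points:
        if total + points <= cap:
            accepted.append(points)
            total += points
    return accepted


# The senior colleague's scenario: a lab with 3 R01s has one expiring
# (leaving 14 points), submits 3 applications, and all 3 get funded.
remaining = 2 * R01_POINTS
funded = [R01_POINTS] * 3
print(awards_acceptable(remaining, funded))
```

Under those assumptions only one of the three new awards fits, so the other two go back into the NIH's pot, which is exactly the shift in uncertainty described above.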