Estrogen, COVID-19, and Rapid Reviews

In my day, philosophers encountered birth control pills at least twice during training. First and easiest, your account of causation could not simply say birth control pills reduce the chance of pregnancy - they have no effect on men’s chance. Second, and trickier, how to handle their effect on blood clots? By simulating pregnancy, they increase the risk of clots. But by preventing pregnancy they decrease it. Once, you could publish papers about that.

Well, here they are again. MIT Press has a new journal devoted entirely to rapid reviews of COVID-19 papers. (Hopkins does too.) And by way of introducing it, here are some of its most recent reviews.

This study on estrogen got two strong reviews. Women are less susceptible than men to C19.* Estrogen is one possibility. The authors confirm that post-menopausal women had worse symptoms, but then age is an even stronger risk factor.

However, thanks to medicine apparently invented to confound philosophers, it’s possible to separate estrogen from age. Among pre-menopausal women, those on oral contraceptives appear to have fared better. (This was not clear for older women on hormone replacement.) The study has some limits - it’s not a randomized trial.

But once again causal analysis of pills and blood clots is relevant, and here the pill itself provides a quasi-intervention. Causal payback, baby.

——

  • It was tempting to write “Women are less susceptible to C19 than men.” The ambiguity is delightful, and one wonders if it is true.

What is a replication?

In a recent Nature essay urging pre-registering replications, Brian Nosek and Tim Errington note:

Conducting a replication demands a theoretical commitment to the features that matter.

That draws on their paper What is a replication? and Nosek’s earlier UQ talk of the same name arguing that a replication is a test with “no prior reason to expect a different outcome.”

Importantly, it’s not about procedure. I wish I’d thought of that, because it’s obvious after it’s pointed out. Unless you are offering a case study, you should want your result to replicate when there are differences in procedure.

But psychology is a complex domain with weak theory. It’s hard to know what will matter. There is no prior expectation that the well-established Weber-Fechner law would fail among people of the Kalahari – but it would be interesting if it did. The well-established Müller-Lyer illusion does seem to fade in some cultures. That requires a different kind of explanation.

Back to the Nature essay:

What, then, constitutes a theoretical commitment? Here’s an idea from economists: a theoretical commitment is something you’re willing to bet on. If researchers are willing to bet on a replication with wide variation in experimental details, that indicates their confidence that a phenomenon is generalizable and robust. … If they cannot suggest any design that they would bet on, perhaps they don’t even believe that the original finding is replicable.

This has the added virtue of encouraging dialogue with the original authors rather than drive-by refutations. And by pre-registering, you both declare that before you saw the results, this seemed a reasonable test. Perhaps that will help you revise beliefs given the results, and suggest productive new tests.

What is the purpose of retraction? Clearly it’s appropriate in cases of fraud or negligence. But what of the routine error of novel science? Surely this is defensible:

I agree we were wrong and an unpublished specimen will eventually prove it, but I disagree that a retraction was the best way to handle the situation.

Taken from RetractionWatch.

Retractable masks?

In [a piece about masks](https://somethingstillbugsme.substack.com/p/many-people-say-that-it-is-patriotic), reporter Cat Ferguson blogs at @somethingstillbugsme@substack.com:

To any journalists reading this who cover COVID-19 science: please keep an eye on Retraction Watch’s list of retracted or withdrawn papers. If something seems too good to be true, push on it. Whether or not it’s premature to say the retraction rate is exceptionally high for COVID-19 papers, it’s worth it to be overly skeptical….

For COVID forecasting, remember the superforecasters at Good Judgment. They currently place US deaths by March 2021 in the 200K - 1.1M range, with 3:2 odds of above 350K, up from 1:1 on July 11.

“Foolish demon, it did not have to be so.” But Taraka was no more. ~R. Zelazny

Alan Jacobs with a cautionary tale about assuming news is representative of reality, and remembering to sanity-check our answers. blog.ayjay.org/proportio…

Protests and COVID

Worried about #COVID, I did not join #BLM protests. Even if outdoors + masks, marches bunch up, & there are only so many restrooms. It’s been an open question what effect they had. NCRC has reviewed a 1-JUN NBER study: @ county level, seems no effect. Can’t address individual risk.

Good news: Despite the rise in cases, excess deaths have been dropping, nearly back to 100% after a high of 142%. Bad news: @epiellie thinks it’s just lag: earlier testing ➛ more lead time. Cases up 3-4 wks ago, ICU 2-3, deaths up in the latest period. Q: why do ensemble models expect a steady death rate?

I saw my old and much-loved Monash colleague #ChrisWallace https://en.wikipedia.org/wiki/Chris_Wallace_(computer_scientist) trending on Twitter. Alas, it turns out it’s just some reporter with a 5-second clip.

How about #WallaceTreeMultiplier, #MML, #ArrowOfTime, #SILIAC.

Open access is good, unless you're a journal?

Bob Horn sent me this news in Nature:

Open-access Plan S to allow publishing in any journal: funders will override policies of subscription journals that don’t let scientists share accepted manuscripts under open licence. ~Richard Van Noorden, Nature News, 16 July 2020

This seems good news, unless you’re a journal.

I expect journals to do good quality control, and top journals to do top quality control. At minimum, good review and gatekeeping (they are failing here, but assume that gets fixed separately). But also production: most scientists can neither write nor draw, and I want journals to minimize typos and maximize production quality. If I want to struggle with scrawl, I’ll go to preprints: it’s fair game there.

So, if you (the journal) can’t charge me for access, and I still expect high quality, you need to charge up front. The obvious candidates are the authors and funders. The going rate right now seems to be around $2000 per article, which is a non-starter for authors. Authors of course want to fix this by getting the funders to pay, but that money comes from somewhere.

Challenge: How to get up-front costs below $500 per article?

Here’s some uninformed back-of-the-envelope arithmetic saying that will be hard.

Editors. Even Rowling needs editors.

  • Assume paper subscriptions pay for themselves and peer review is free.
  • For simplicity, assume we’re paying one editor $50K to do all the key work.
  • Guess: they take at least 5 hours per 10-page paper on correspondence, editing, typesetting, and production. Double the salary for benefits etc., so roughly $50/hour; five hours is $250 per paper.

Looking good so far!

Webslingers: someone has to refill the bit buckets.

  • Suppose webslinger + servers is $64K/year. Magically including benefits.
  • The average journal has 64 articles in a year.
  • Uh-oh: that’s $1000 right there.

So… seems one webslinger needs to be able to manage about 10 journal websites. Is that doable? How well do the big publishers scale? Do they get super efficient, or fall prey to Parkinson’s law?
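Here’s that arithmetic in a few lines of Python (a sketch only: the salary, hours, server cost, and article count are the guesses above, not data):

```python
# Back-of-the-envelope per-article cost, using the guesses above.
editor_salary = 50_000     # $/year; doubled below for benefits etc.
hourly = 2 * editor_salary / 2_000      # ~2000 work hours/year -> $50/hour
editor_cost = 5 * hourly                # 5 hours per paper -> $250

web_total = 64_000         # webslinger + servers, $/year, benefits included
articles_per_journal = 64  # average articles per journal per year

for n_journals in (1, 10):  # journals one webslinger might support
    web_cost = web_total / (n_journals * articles_per_journal)
    print(f"{n_journals:>2} journal(s): ${editor_cost + web_cost:,.0f} per article")
# ->  1 journal(s): $1,250 per article (well over the $500 target)
# -> 10 journal(s): $350 per article (under it)
```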

Alternative: societies and funders subsidize the journals as necessary infrastructure, the way we fund roads. That might amount to half the costs. But how much subsidy before it effectively insulates the new journals from accountability for quality… again?

AI Bias

Dr. Rachel Thomas writes,

When we think about AI, we need to think about complicated real-world systems [because] decision-making happens within complicated real-world systems.

For example, bail/bond algorithms live in this world:

for public defenders to meet with defendants at Rikers Island, where many pre-trial detainees in NYC who can’t afford bail are held, involves a bus ride that is two hours each way and they then only get 30 minutes to see the defendant, assuming the guards are on time (which is not always the case)

Designed or not, this system leads innocents to plead guilty so they can leave jail faster than if they waited for a trial. I hope that was not the point.

Happily I don’t work on bail/bond algorithms, but one decision tree is much like another. “We do things right” means I need to ask more about decision context. We know decision theory; our customers don’t. Decisions should weigh the costs of false positives against false negatives. It’s tempting to hand them the optimized ROC curve and make threshold choice Someone Else’s Problem. But Someone Else often accepts the default.

False positives abound in cyber-security. The lesser evil is being ignored like some nervous “check engine” light. The greater is being too easily believed. We can detect anomalies - but usually the customer has to investigate.

We can help by providing context. Does the system report confidence? Does it simply say “I don’t know” when appropriate? Do we know the relative costs of misses and false alarms? Can the customer adjust those for the situation?
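To make that concrete, here’s a minimal sketch (invented names and data, not from any deployed system) of cost-sensitive threshold choice: sweep candidate thresholds over validation scores and keep the one that minimizes total cost for whatever miss and false-alarm costs the customer supplies.

```python
import numpy as np

def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return the score cutoff minimizing total cost on validation data.

    scores: model scores, higher = more alarming; labels: true 0/1;
    cost_fp: cost of a false alarm; cost_fn: cost of a miss.
    """
    best_t, best_cost = None, np.inf
    for t in np.unique(scores):
        pred = scores >= t
        fp = np.sum(pred & (labels == 0))   # false alarms
        fn = np.sum(~pred & (labels == 1))  # misses
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Toy data: positives score somewhat higher, with plenty of overlap.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)
scores = 0.3 * labels + 0.7 * rng.random(1000)

# When misses cost 10x what false alarms do, the threshold drops.
print(best_threshold(scores, labels, cost_fp=1.0, cost_fn=1.0))
print(best_threshold(scores, labels, cost_fp=1.0, cost_fn=10.0))
```

The point isn’t the loop; it’s that cost_fp and cost_fn are the customer’s numbers, not ours.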

Rapid Reviews: COVID-19

The announcement of RR:C19 seems a critical step forward. Similar to Hopkins' Novel Coronavirus Research Compendium. Both are mentioned in Begley’s article.

So… would it help to add prediction markets on replication, publication, and citations?

Unsurprisingly, popular media is more popular:

A new study finds that “peer-reviewed scientific publications receiving more attention in non-scientific media are more likely to be cited than scientific publications receiving less popular media attention.”

PDF: Systematically, the NYT COVID-19 counts are high, & CovidTracker’s low. Two more fall in between. Small relative differences, though notable in absolute terms (1000s). Preprint, so needs a check.

www.medrxiv.org/content/1…

From @Broniatowski: Postdoc with the MOSI project at the Institute for Data Democracy & Politics at GWU. Work on “detecting, tracking, and correcting disinformation/misinformation.” Could start this summer.

www.gwu.jobs/postings/…

Old tabs: found a nice recommendation from one of our @replicationmarkets forecasters, on their beautifully designed website, “Follow the Argument.”

followtheargument.org/sunday-01…

A solid Rodney Brooks essay on the value and flaws of peer review, from the inside. No solution, but worth reading for context and detail. HTT Bob Horn. rodneybrooks.com/peer-revi…

I’m not accustomed to saying anything with certainty after only one or two observations. ~Andreas Vesalius, The China Root Epistle, 1546.

PDF: US intervention timing simulations using county-level & mobility data: “had these same control measures been … 1-2 weeks earlier,” 55% of reported deaths could have been avoided. Strong claim. Haven’t looked, but counterfactuals are hard.

PDF: Simulation: age separation reduced C19 mortality even with interactions held fixed. “Separating just the older group… reduces the overall mortality… by 66%.” But “age separation is difficult.” Is Erdős-Rényi an OK model here?

  • A new short PDF, Heparin and C19, reviews ~2K Spanish C19 patients. Heparin halved mortality after adjusting for age and gender; same or better after adding severity and other drugs. Needs a randomized followup, but I guess ~80% likely to reproduce.

A few COVID-19 PDFs

A non-random sample of new PDFs whose titles caught my eye yesterday. Based on a quick scan, so I may have missed something. (These are brand new PDFs - so use even more doubt than for well-published papers.)

  • Mortality rate and estimate of fraction of undiagnosed… cases… Simple amateur model estimates the mortality curve for US March & April diagnoses, using data from worldometers and a Gaussian[!?] model. The March peak death rate is on the 13th day after diagnosis, similar to Chinese data. Total mortality was 21%[!!?], suggesting severe under-testing [or data problems]. Whatever, the same method applied to cases after 1-APR finds 6.4%, suggesting more testing. If the real rate is 2.4% [!?], then 89% of March cases and 63% of April cases went undiagnosed. [The 2.4% IFR seems ~4x too high by other estimates. They got it by averaging rates from China, the Diamond Princess, Iceland, and Thailand. It’s not clear if they weighted those. The first author cites themselves in a basically unrelated physics paper. But then, I’m not an epidemiologist either.]

  • [Estimation of the true infection fatality rate… in each country](https://www.medrxiv.org/content/10.1101/2020.05.13.20101071v2?%253fcollection=) Short paper adjusting estimates because a low PCR exam rate means exams are restricted to suspicious cases, and vice versa. “Reported IRs [infection rates] in USA using antibody tests were 1.5% in Santa Clara, 4.6% in Los Angeles county, and 13.9% in New York state, and our estimate of TIR [true infection rate] in the whole USA is 5.0%.” Estimates US IFR [infection fatality rate] as 0.6% [95% CI: 0.33 - 1.07], slightly higher than Germany and Japan’s ~0.5%, a bit lower than Sweden’s 0.7%, and much lower than Italy’s or the UK’s, around 1.6%. [This is similar to the running Metaculus estimate, and note the 2.4% above is way outside the interval.]

  • Estimation of the infection fatality rate… More “simple methodology”, but this one claims to be from science & epi faculty in Mexico. They assume all members of a household will be infected together, which is plausible. But they don’t really dig into household data or case data; they just explore the method on “available data”. Eh.

  • Reproductive number of COVID-19: A systematic review and meta-analysis…: Global average of R=2.7 [95% CI 2.2-3.3] with lots of variation. They note that’s above WHO but lower than another summary. Among the 27 studies in their analysis, only two [both South Korea] publish estimates <1. Take with salt. By non-epidemiologists. They include the Diamond Princess [R>14, albeit with large error bars]. And they claim method [MLE vs SEIR vs MCMC vs…] matters, but with so few samples per category and so many comparisons, I don’t think it means anything.

  • Restarting after… Comparing policies and infections in the 16 states of Germany, the UC-Davis authors find that contact restrictions were more effective than border closures. Mobility data comes from Google. Using SEIR models, they then predict the effect of ways to relax policy. They think social distancing (German-style) reduced case counts by about 97% – equivalently, the total case count would have been about 38X higher. Contact restrictions were estimated to be about 50% effective, versus about 2% for border closures. [What you’d expect for closing the gate after the Trojan horse is inside.] They put a lot of effort into modeling different parts of the transportation system: cars vs. trucks; public transit. They think that compared to keeping restrictions, lifting contact restrictions will cause a 51% or a 27% increase, depending on scenario. Relaxing initial business closures yields a 29% or 16% increase; relaxing non-essential closures, a 7% or 4% increase.

  • Relationship between Average Daily Temperature and… (v2). The authors say warmer is better, but note it could be UV or a hidden correlation with e.g. age, pollution, or social distancing. And “when no significant differences exist in the average daily temperature of two cities in the same country, there is no significant difference in the average cumulative daily rate of confirmed cases.” Fifteen F-tests, and no obvious correction for multiple tests; see the sketch after this list. I didn’t read in enough detail to know the right correction, but eyeballing the results, I’m guessing it would cut out half their “significant” correlations. Still, it remains plausible.

  • A surprising formula… A mathematician at UNC Chapel Hill claims surprising accuracy using a two-phase differential equation model. (It switches from one curve to the other at the point of closest approach.) I haven’t had time to dive into the details, but I’m partial to phase space for modeling dynamical systems. The paper argues for hard measures once triggered, a theory the author calls epidemic “momentum managing”, which he expands in a longer linked piece.

  • Disparities in Vulnerability… From 6 scholars in population health and aging, at 4 major US universities: “particular attention should be paid to the risk of adverse outcomes in midlife for non-Hispanic blacks, adults with a high school degree or less, and low-income Americans”. So, the usual. Oh, and “our estimates likely understate those disparities.” An AI model trained on pre-pandemic medical claims created a vulnerability index for severe respiratory infection, by education, income, and race-ethnicity. High school education or less: 2x risk vs. college. Lowest income quartile: 3x risk vs. highest. Both because of early-onset underlying health conditions, esp. hypertension. See Risk Factor Chart below.
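On the multiple-testing worry in the temperature paper above, a toy illustration (the p-values are invented, not taken from the paper) of how even the bluntest correction, Bonferroni, thins out “significant” results when you run fifteen tests:

```python
# Fifteen hypothetical F-test p-values -- invented for illustration only.
p_values = [0.001, 0.004, 0.008, 0.012, 0.02, 0.03, 0.04, 0.045,
            0.06, 0.10, 0.15, 0.22, 0.35, 0.50, 0.80]
alpha = 0.05

naive = sum(p < alpha for p in p_values)                      # uncorrected
corrected = sum(p < alpha / len(p_values) for p in p_values)  # alpha/15

print(f"{naive} significant uncorrected, {corrected} after Bonferroni")
# -> 8 significant uncorrected, 1 after Bonferroni
```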

Oops: ‘astonishingly, the forensic medical professional had not died at all. [And] they “do not know for sure and cannot scientifically confirm that the virus moved from the dead body.”’ ~RetractionWatch bit.ly/2Tt71Mv

Innumeracy: In mid-Feb I sent my team home due to C19 worries. ~1 week later someone at the office was diagnosed, so it seemed a good call. The official case load in my county was still <10.

Three months later with 8K cases (1:140 people), I find I’m getting lax on cleaning & handwashing. :-s

Grumpy Geophysicist argues against public preprint servers.

Confirmation bias is a risk within the scientific community; it is positively rampant in the broader public.

Slopen science?