Wednesday, July 12, 2017

Why gene expression has a log-normal distribution

In a new paper just out, Biochemical Complexity Drives Log-Normal Variation in Genetic Expression, I explain a biological mystery: why do log-normal distributions keep showing up in gene expression data?

Anybody who's spent much time looking at gene expression data has probably noticed this: lots of distributions tend to have nice bell-curve shapes when plotted on a log scale. Consider, for example, a few samples of a gene being repressed by various levels of LmrA:

Some typical distributions taken from the Cello LmrA repressor transfer curve, all approximately log-normal

In short, these distributions are approximately log-normal, though they might also be described by one of a number of similar heavy-tailed distributions like the Gamma or Weibull distributions. Indeed, the typical explanation for gene expression variation has been that it's a Gamma distribution, based on the underlying randomness of chemical reactions causing stochastic bursts of gene expression.

What kept bugging me about that explanation, though, is that it just doesn't fit what we know about how gene expression actually works.  If it's basically about randomness in chemical reactions, then as expression gets stronger, the law of large numbers should take over and the distributions should get tighter. Think about it like flipping coins: when you flip a few coins there's a lot of variation in how many come up heads and how many come up tails, but when you flip lots of coins it always comes out pretty even.  But in most cases we deal with in synthetic biology, that just doesn't happen. Consider for example, the distributions of LmrA above: the high and low levels of expression are just about as wide, even though one's nearly 100 times higher than the other.
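The coin-flip intuition can be checked numerically. The following is only a rough sketch, assuming fair coins and using NumPy; the function name `relative_spread` and the trial counts are illustrative choices, not anything from the paper. It measures how wide the distribution of heads is relative to its mean, which for fair coins should shrink like one over the square root of the number of flips.

```python
import numpy as np

def relative_spread(n_flips, n_trials=20_000, seed=0):
    """Std / mean of the number of heads seen in n_flips fair coin flips."""
    rng = np.random.default_rng(seed)
    heads = rng.binomial(n_flips, 0.5, size=n_trials)
    return heads.std() / heads.mean()

# More flips -> tighter distribution relative to its mean.
for n in (10, 1_000, 100_000):
    theory = 1 / np.sqrt(n)  # exact value for a fair coin
    print(f"{n:>7} flips: simulated {relative_spread(n):.4f}, theory {theory:.4f}")
```

If expression noise were mostly this kind of counting noise, a gene expressed 100 times more strongly would be expected to show a distribution roughly 10 times narrower in relative terms.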

Instead, the answer turns out to be a beautifully simple emergent phenomenon. Gene expression is a really, really complicated chemical process. Most of the time, we don't pay attention to most of that complexity because we're not attempting to affect it, just use it as a given. But that complexity means we can describe gene expression as a catalytic chemical reaction whose rate is the product of a lot of different factors. And the same Central Limit Theorem that tells us that coin flips should make a nice bell-shaped normal distribution also says that when we multiply a lot of independent random factors, the product should tend to a log-normal distribution: the logarithm of a product is a sum of logarithms, so the theorem applies on the log scale.
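Here is a small simulation of that multiplicative argument. It is a toy sketch, not the model from the paper: the number of factors (30) and their distribution (uniform between 0.5 and 1.5) are arbitrary assumptions made purely for illustration. The point is only that the product comes out strongly skewed on a linear scale, but close to symmetric once you take its logarithm.

```python
import numpy as np

def skewness(x):
    """Sample skewness: roughly 0 for a symmetric, bell-shaped distribution."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered**3).mean() / (centered**2).mean() ** 1.5

rng = np.random.default_rng(1)
n_cells, n_factors = 50_000, 30

# Hypothetical: each "cell" gets 30 independent factors, none of them log-normal.
factors = rng.uniform(0.5, 1.5, size=(n_cells, n_factors))
rate = factors.prod(axis=1)  # expression rate = product of all the factors

print("skewness, linear scale:", skewness(rate))       # strongly positive (long right tail)
print("skewness, log scale:", skewness(np.log(rate)))  # close to zero
```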

This has a few different implications, but the most important ones are these:

  • When you are analyzing gene expression data, you should use geometric mean and geometric standard deviation, not ordinary mean and standard deviation. 
  • When you plot gene expression data, you should use logarithmic axes, not linear axes.
Any discussion of gene expression data that does otherwise, without good reason, will end up with distorted data and misleading graphs. In short: welcome to a brave new world of geometric statistics!
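For anyone who hasn't computed geometric statistics before, here is a minimal sketch using NumPy. The helper names and the three expression values are made up for illustration. The geometric mean is the exponential of the mean of the logs, and the geometric standard deviation is the exponential of the standard deviation of the logs (the sample standard deviation is used here, which is one reasonable choice among others).

```python
import numpy as np

def geometric_mean(values):
    """exp(mean(log(x))); values must all be strictly positive."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(logs.mean()))

def geometric_std(values):
    """exp(sample std of log(x)); a unitless multiplicative factor."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(logs.std(ddof=1)))

expression = [1.0, 10.0, 100.0]  # made-up values spanning two decades
gm = geometric_mean(expression)
gsd = geometric_std(expression)

print("arithmetic mean:", np.mean(expression))  # pulled toward the largest value
print("geometric mean:", gm)
print("geometric std (fold):", gsd)
print("one-sigma range:", gm / gsd, "to", gm * gsd)
```

Note that the geometric standard deviation is a fold-change: the "one sigma" range is the geometric mean divided by and multiplied by it, not plus or minus.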

Friday, May 26, 2017

Communication

Last night, on my way back home from a scientific meeting, I received my first ever coherent email from my nearly-five-year-old daughter, written all by herself from her own email account as she was getting ready for sleep. Just three short sentences, complete with misspellings and in her own inimitable style, but it was the defining moment of my night, and struck me much harder than I expected.

I have saved her email in a permanent location. The content is unimportant: what matters to me is the vista of communication it opens up. I am overjoyed and frightened as my little one begins to dip her toe into the great river of human knowledge and communication. From this moment, she begins to tie herself into the much larger world beyond our home and family, her friends and her school. Now I can start to write to her directly when I travel, to send her the pictures I take and stories I write for her while I am away.

And it's also time to start talking about information safety and privacy. Knowledge, consent, boundaries. Notice, for instance, that I have not actually shared the content of her email, because I feel those are not my words to share. Just like with other big issues, like sex and relationships, my belief is that these conversations need to start happening, at the age-appropriate level, long before they are likely to start becoming critical.

I am excited and scared, and it is wonderful and terrifying. Just like so many other parts of parenting.

From the email Harriet was responding to: her stuffed animal representative on the trip and me, all sweaty from a long terminal-to-terminal run to catch a plane.

Monday, May 01, 2017

Explaining CAR T-cell therapy with marshmallows

Last week, I had fun giving a guest lecture at my daughter's preschool on some cutting edge synthetic biology research. Part of what made it so fun was figuring out how to communicate the essence of the subject on an appropriately comprehensible level.

My daughter's class has been learning about the body, things like muscles and bones and the heart and blood.  One day a few weeks ago, she came home bubbling with excitement about having made blood out of candy that day: into some diluted corn syrup (plasma), they put mini-marshmallows to be white blood cells, red cinnamon candies to be red blood cells, and sprinkles to be platelets. I thought this sounded awesome, and it inspired me to build on that for a lesson about CAR T-cell therapy.


For this lesson, you will need white, pink, green, and orange mini-marshmallows, food coloring, and toothpicks. The white marshmallows are white blood cells, the pink ones are healthy cells, the green ones are germs, and the orange ones are cancer cells.


Dip the toothpicks into the food coloring, then poke them into marshmallows to make patterns of three colored dots on the marshmallows. Put patterns on the marshmallows as follows:

  • Give all of the pink healthy cells the same pattern.
  • Give the orange cancer cells a pattern that's almost the same as the healthy cells---but with one difference.
  • Give the germs patterns that are quite different from the pink healthy cells.
  • Give the white blood cells patterns that match germs, but not the healthy cells or cancer cells.

Remember that marshmallows can get flipped around, so "red-red-blue" is effectively the same as "blue-red-red"!


You should now have a bunch of marshmallows with patterns on them.  The lesson goes like this:

  • All cells have patterns of chemicals on their outsides (Show some patterns).
  • White blood cells tell which cells are diseases by matching patterns (Show some white blood cell patterns).
  • A white blood cell leaves your healthy cells alone because they don't match (Show a white blood cell not matching a healthy cell).
  • The white blood cells learn the patterns of diseases and when they match the germs (Show a white blood cell matching a germ), they kill the germ (Eat the germ marshmallow).
  • But cancer cells are tricky, and sometimes their patterns are too close to healthy cells for the white blood cells to learn their patterns (Show how the cancer cell and healthy cell patterns are similar).
  • But now there is a new type of medicine people are trying to make work, where we can take some white blood cells out and teach them a new pattern to recognize (Take a white blood cell and mark it with the cancer pattern).
  • Now we put the white blood cells back in, and they recognize the cancer (Show how the pattern matches now) and kill it! (Eat the cancer marshmallow).
That's CAR T-cell therapy in a nutshell in 5-7 minutes, minus all the details and the cautions and concerns. I had a great time teaching this class, and these 3-to-5-year-old kids asked really good questions, like "Does everybody have white blood cells?" and "How do you teach the cells the patterns?" so I think they learned.

And as I was writing this, my daughter arrived home, bringing a heartmeltingly lovely thank you card her classmates had made.


I think that her class got it.  Science communication win!

Monday, April 24, 2017

The edge of science is never far away

My preschooler daughter started her bedtime routine rather late tonight, because I am always a sucker for certain types of questions. Tonight, she asked about how our lungs move air in and out. That led to how the heart works, then how muscles work, which zoomed in and in through fiber bundles to individual cells, fibers within the cells, and actin and myosin.

Each picture or diagram came with an "And what is inside that part? [point]" until we were looking at an actin protein, then a molecule of ATP, an oxygen atom, and finally the stark and simple table of the standard model itself. What's inside an electron? Nobody knows, or even if that question really makes sense. We know somebody who's working with CERN on the Higgs. Quarks have really funny names. We're out at the edge of science and I'm grinning and telling her that when she grows up, she could be a scientist and help try to find out the answers to all of these questions.

We know so much about our world, remarkably much, but the nearness of the edge of science continues to exhilarate me. It doesn't take many questions to get you out there, and the path is simpler than many realize. Our children can walk it easily, if we do not discourage them and if we smile and appreciate the "I don't knows."

After Harriet gets out of her bath, we're going to omit the usual bedtime story and watch "Powers of Ten" instead. I'm looking forward to it.

What it takes to do an interlaboratory study

In another step of my ongoing quest to make synthetic biology engineering simpler and more reliable, my collaborators and I are starting another big interlaboratory study focusing on precise measurement of fluorescence. We're now in the very nervous part, where all of the samples of material that everybody helping out with the study is going to measure have just been shipped out, and I'm hoping that the numbers that come back will be nice and tight, just like the preliminary study showed.

It takes a lot of work to put a study like this together---much more than I would have anticipated before I started doing this sort of thing. We've spent several months figuring out how exactly we want to run the experiment, and documenting it all as precisely as possible in order to make sure everybody does it the same way. Then my colleague Nicholas at MIT spent quite a bit of time over the past 24 hours preparing 875 sample tubes and packing them into boxes. As Nicholas put it: "On a completely unrelated note my lab is currently low on Eppie tubes."

Nicholas DeLateur preparing samples for shipment.

It is one step at a time, through such careful and unglamorous work, that science and engineering move forward, and I am grateful for all of the people I have found who understand its value and join in working together on such steps.

Thursday, April 20, 2017

Reducing DNA context dependence in bacterial promoters

Swati Carr's work on insulating promoters is now out as an article in PLOS ONE, with me and Doug Densmore. I've talked about this work before, and I'm very happy to have been involved in it as something that I consider a very solid piece of engineering.

Basically, promoters are the "control switches" that determine how much a gene is expressed, and which other chemicals in the cell can regulate that expression. The problem is, at least in bacteria, that the promoters we usually use are extremely sensitive to what you put in front of them---even to the point that the tiny "scars" left in the DNA sequence from stitching genes together can have a radical effect on their operation. With Swati's method, Degenerate Insulation Screening (DIS), we now have a simple "shake and bake" engineering method for insulating these promoters, which works very well to make a promoter behave consistently, despite changes in what is placed in front of it.

Let me illustrate it very simply, with two images that I suspect will be clear to even a non-synthetic-biologist. In these pictures, the green and red bars show the behavior of two genes in a bunch of different variations of a small circuit. The more similar the bars are, the better, because it means the genes are behaving more reliably.

In short, this is your circuit:

This is your circuit without DIS:

Any questions?

Thursday, April 06, 2017

Grace on Biology

I just made this image for a talk introducing ideas in synthetic biology to programmers, and cannot resist sharing. Bonus points if you recognize one of my scientific heroes without help. Images courtesy of Wikipedia.

Thursday, March 09, 2017

Predatory publishers "cloaking" themselves with Editorial Manager

It appears that low-quality "predatory publishers" are now using the well-known and trusted Editorial Manager software to try to make themselves appear more like legitimate scientific publications. So far, the company that runs Editorial Manager is effectively supporting this practice by declining to exercise due diligence in their business partnerships. I have a guest post up on "Retraction Watch" that gives all the details.

Sunday, February 12, 2017

Zen Walks

We're having unusually beautiful weather for February right now, so it's a good time for going out on zen walks. That's my name for a way that I walk when I want to go get in touch with myself, or when I really need to think something over.

In its essence, a zen walk is quite simple: put on your shoes, go outside, and begin walking. As you go, make no decisions about which way to go, but simply follow where your feet are taking you. At every intersection (and sometimes between them also), take the motions that feel the most natural in that moment. Let go of your image of where you might be going to, so that you do not just follow a habitual path, yet also let go of the image of avoiding your habitual paths, which also constrains your movements. Go, let go, and find out one step at a time where you are going.

Zen walking is without a goal, except for being there, and it brings me space to find out what I care about and what is on my mind. I am not a person to whom contentment and satisfaction come naturally, though there is surely enough in my life that I "should" be happy about. It is far too easy for me, though, to become caught up in "should" and "ought" and lists that I make for myself that generate unhappiness in their lack of completion. A zen walk is one of my tactics for letting myself re-discover that fact and finding my way back to which things I truly care about and why.

Friday, February 10, 2017

Protein Engineering Diagrams

We've got a new paper that's just been accepted, working toward extending the SBOL visual diagram language to be able to describe the engineering of proteins as well as DNA and RNA.  The core driving force behind this effort has been Sid Cox, who's done a good bit of work in the area and has had the courage to make this first surely-imperfect proposal, with a number of others of us helping critique, refine, and bend things towards compatibility and integration.

The idea behind the language is surprisingly simple: despite the ferocious complexity of how proteins fold and interact, when we engineer with proteins our actions can often be described much more simply. Proteins, particularly in complex eukaryotic organisms, are often quite modular, with specific domains controlling things like where they go in a cell, what they interact with, and how they decay. These are, in turn, laid out along an initial single line of amino acids (and encoded in DNA or RNA), and can often be recombined by mixing and matching these components. Doing that isn't simple, but explaining what you have done and why often can be fairly simple.

That's what our new diagram language aims for. Each protein in a system is represented by a line decorated with glyphs representing structured (oval) and unstructured (line) regions, membrane domains (zigzags), binding domains (open boxes), etc. With a brief glance, you can get a pretty good idea of what the protein or protein system is supposed to do and how it's supposed to do it.

Diagram for a two-protein design that provides light-inducible programmed localization to the cell membrane.

This is by no means a finished product, but it's a good solid start. Now that we've got a proposal, people can start critiquing it, and we can start working on various tweaks and philosophical debates necessary to get it integrated with the other diagram standards already in place, like SBOLv. This won't be fast, but it should hopefully produce a reasonable consensus on how to describe what's currently typically just shown as all sorts of random ad-hoc blobs.

If your institution permits, you can see the paper where it's been accepted at ACS Synthetic Biology, or you can read a preprint, and you can also play with the associated online diagram software.