SCNS Toolkit

Single Cell Network Synthesis Toolkit


Introduction

Determining the structure and function of transcriptional regulatory networks will be crucial to advancing our understanding of developmental and disease processes, and to designing better reprogramming strategies. Classical networks derived from population-based data are built on measurements that report averages across potentially millions of cells, and so provide little insight into the cellular heterogeneity that may be critical for the lineage commitment of individual cells. Single cell gene expression profiling therefore offers an attractive alternative for generating transcriptional regulatory networks at greater resolution.

The SCNS Toolkit is a set of tools for synthesising Boolean gene regulatory networks from single cell gene expression experiments. It was originally designed for single cell qPCR data but can also be used with RNA-seq data. The toolkit discretises measured expression into binary values, and the resulting binary profiles can be viewed as states in the state space of an asynchronous Boolean network. A synthesis algorithm then identifies the underlying Boolean logic governing each gene, from which networks can be built. Stable state analysis and in silico perturbations of these networks can be carried out to generate hypotheses about gene regulation and function.
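To make the phrase "asynchronous Boolean network" concrete, here is a minimal Python sketch, not taken from the toolkit itself: each gene has an update function of the current state, and under asynchronous semantics only one gene is updated per step, so every successor of a state differs from it in exactly one gene. The function name async_successors and the three-gene network are hypothetical illustrations.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]
# An update function maps the whole current state to the next value of one gene.
UpdateFn = Callable[[State], int]


def async_successors(state: State, rules: List[UpdateFn]) -> List[State]:
    """Return the states reachable in one asynchronous step.

    Only one gene is updated per step, so each successor differs from
    `state` in exactly one position. A state with no successors is stable.
    """
    successors = []
    for i, rule in enumerate(rules):
        new_value = rule(state)
        if new_value != state[i]:
            successors.append(state[:i] + (new_value,) + state[i + 1:])
    return successors


# Hypothetical three-gene network: A sustains itself, B = A, C = A and not B.
rules = [
    lambda s: s[0],
    lambda s: s[0],
    lambda s: int(s[0] and not s[1]),
]

print(async_successors((1, 0, 0), rules))  # → [(1, 1, 0), (1, 0, 1)]
```

In this toy network (1, 1, 0) has no successors, so it is a stable state; stable states of this kind are what the analysis described below looks for.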

All code is freely available on GitHub (https://github.com/swoodhouse/SCNS-Toolkit).

Synthesis Engine

The synthesis engine is written in F# and uses the Z3 theorem prover. It compiles and runs on Linux with F# 3.1 and Mono 3.12.1, and on Windows with F# 3.1 and .NET 4.5. It has not yet been tested on Mac OS X.

The algorithm is described in detail in Synthesising Executable Gene Regulatory Networks from Single-cell Gene Expression Data, Computer Aided Verification (CAV), 2015.

Scripts

The toolkit provides two scripts: one converts single cell gene expression data into a format that the synthesis engine can read, and the other analyses the synthesised Boolean networks.

constructSTG.fsx is an F# script for Linux, Windows or Mac OS X that discretises single cell gene expression data from a CSV file into binary expression values and then constructs a state transition graph. The input CSV file must have genes as column names and unique cell identifiers as row names. The script produces CSV files for input to the synthesis engine and a SIF file for visualisation in Cytoscape (or a similar tool).
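The sketch below shows one way such a state transition graph could be built and exported; it is a Python illustration, not a port of constructSTG.fsx. It assumes that two binary states are connected when they differ in the expression of exactly one gene, consistent with asynchronous updates, and writes the edges as tab-separated SIF lines (source, relation, target). The helper names and the relation label "neighbour" are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of genes whose binary value differs between two states."""
    return sum(x != y for x, y in zip(a, b))


def build_stg(states: Dict[str, str]) -> List[Tuple[str, str]]:
    """Connect every pair of states that differ in exactly one gene.

    `states` maps a cell identifier to its binary expression string,
    one character per gene in a fixed gene order.
    """
    edges = []
    for (c1, s1), (c2, s2) in combinations(sorted(states.items()), 2):
        if hamming(s1, s2) == 1:
            edges.append((c1, c2))
    return edges


def to_sif(edges: List[Tuple[str, str]], relation: str = "neighbour") -> str:
    """Render edges as SIF lines ("source<TAB>relation<TAB>target")."""
    return "\n".join(f"{a}\t{relation}\t{b}" for a, b in edges)


cells = {"cell1": "100", "cell2": "110", "cell3": "111", "cell4": "001"}
edges = build_stg(cells)
print(to_sif(edges))
```

Cells with identical binary states have a Hamming distance of zero and are not linked here; a real pipeline would more likely collapse them into a single node first.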

genysis_perturbations.R is an R script for Linux that automates running GenYsis (http://lsisrv5.epfl.ch/lsi/~garg/genysis_v2.html) on a model and on every single-gene perturbation of that model.

The perturbed models are then compared with the wild-type model in terms of changes to the stable states each model can reach. Two kinds of change can be important: failure to reach stable states that the wild-type model normally reaches, which can mimic, for example, the failure of a cell to develop down a given lineage; and stabilisation at novel "unnatural" states, which could be used to gain mechanistic understanding of pathological cellular states (such as those of cancer cells). A summary of these results is collated into a CSV file.
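The comparison itself reduces to two set differences: wild-type stable states missing from the perturbed model ("lost") and perturbed stable states absent from the wild type ("novel"). The Python sketch below assumes stable states are already available as tuples of 0/1 values; the function names, CSV column names and the perturbation label GeneX_KO are hypothetical, and the real script's output layout may differ.

```python
import csv
import io
from typing import Dict, List, Set, Tuple

State = Tuple[int, ...]


def compare(wild_type: Set[State], perturbed: Set[State]) -> Dict[str, List[State]]:
    """Split stable-state changes into lost wild-type states and novel states."""
    return {
        "lost": sorted(wild_type - perturbed),
        "novel": sorted(perturbed - wild_type),
    }


def summary_csv(results: Dict[str, Dict[str, List[State]]]) -> str:
    """Write one summary row per perturbation (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["perturbation", "lost_wild_type_states", "novel_states"])
    for name, r in results.items():
        writer.writerow([name, len(r["lost"]), len(r["novel"])])
    return buf.getvalue()


wild_type = {(1, 0, 1), (0, 1, 0)}
knockout = {(0, 1, 0), (0, 0, 0)}
results = {"GeneX_KO": compare(wild_type, knockout)}
print(summary_csv(results))
```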

Datasets

Blood development study

Decoding the regulatory network of early blood development from single-cell gene expression measurements, Nature Biotechnology, 2015.

Ct values from the Fluidigm BioMark (datasets/Ct_values.xlsx). Filtered but unnormalised Ct values for 46 genes in 3,934 single cells measured on the Fluidigm BioMark platform. The prefix of each cell identifier (e.g. PSA2) encodes the embryonic stage and sorting strategy (PS), the sort (A or B) and the number of the embryo the cell came from. The three-digit number that follows is the cell's unique identifier. For more details on filtering, see the paper.
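When splitting cells by stage, sort or embryo, the identifier can be parsed with a regular expression. The Python sketch below is a guess at the layout: it assumes the stage code contains no "A" or "B", that the identifier ends in exactly three digits, and that an optional non-digit separator (such as "_") may sit between the embryo number and the cell number. The example identifiers and the names CellId and parse_cell_id are hypothetical; check the pattern against the actual spreadsheet before relying on it.

```python
import re
from typing import NamedTuple


class CellId(NamedTuple):
    stage: str   # embryonic stage and sorting strategy, e.g. "PS"
    sort: str    # sort, "A" or "B"
    embryo: int  # embryo number
    cell: str    # three-digit unique cell identifier


# Hypothetical pattern: stage code without "A"/"B", then the sort letter, the
# embryo number, an optional non-digit separator and a three-digit cell number.
CELL_ID = re.compile(r"^(?P<stage>.+?)(?P<sort>[AB])(?P<embryo>\d+?)[^0-9]*(?P<cell>\d{3})$")


def parse_cell_id(name: str) -> CellId:
    """Split a cell identifier into stage, sort, embryo and cell number."""
    m = CELL_ID.match(name)
    if m is None:
        raise ValueError(f"unrecognised cell identifier: {name!r}")
    return CellId(m["stage"], m["sort"], int(m["embryo"]), m["cell"])


print(parse_cell_id("PSA2_001"))  # → CellId(stage='PS', sort='A', embryo=2, cell='001')
```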

Binary gene expression values (datasets/Binary_expression_values.xlsx). The 1,448 binary states in the connected state graph, each labelled with the first anatomical stage at which that cell state arose. Gene expression was assigned a value of 1 where amplification was detected above the limit of detection, and 0 where it was at or below the limit of detection (assigned a Ct value of 25).
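Because a lower Ct value means more template, the rule above corresponds to a simple threshold on Ct. This Python sketch assumes that any Ct value of 25 or more marks expression at or below the limit of detection (so it maps to 0) and that any smaller Ct maps to 1. The gene names and values are illustrative only.

```python
from typing import Dict

LOD_CT = 25.0  # limit of detection; reactions without amplification carry this Ct


def binarise(ct_values: Dict[str, float], lod: float = LOD_CT) -> Dict[str, int]:
    """Return 1 where amplification was detected above the limit of detection.

    Lower Ct means earlier amplification (more template), so expression above
    the limit of detection is Ct < lod; Ct >= lod is treated as not expressed.
    """
    return {gene: int(ct < lod) for gene, ct in ct_values.items()}


# Illustrative values for one cell (hypothetical genes).
cell = {"GeneA": 25.0, "GeneB": 18.3, "GeneC": 24.9}
print(binarise(cell))
```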

List of cells (datasets/Reachability_states.xlsx) used as the initial and final cell states for the reachability constraint of the network synthesis.
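One plausible reading of a reachability constraint is that every final state must be reachable from at least one initial state by following edges of the state transition graph. The Python sketch below checks that reading with a breadth-first search over an explicit adjacency list. It is only an illustration: the synthesis engine encodes its constraints for the Z3 solver and may define reachability differently, and the state names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """All states reachable from `start` (including itself) by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def satisfies_reachability(graph: Dict[str, List[str]],
                           initial: Iterable[str],
                           final: Iterable[str]) -> bool:
    """True if every final state is reachable from at least one initial state."""
    reach: Set[str] = set()
    for s in initial:
        reach |= reachable(graph, s)
    return all(f in reach for f in final)


graph = {"s0": ["s1"], "s1": ["s2"], "s3": []}
print(satisfies_reachability(graph, ["s0"], ["s2"]))
```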

List of cells (datasets/Equal_cell_states.xlsx) that share identical binary states for the 33 TFs. Includes cells that do not appear in the state transition graph.
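A list like this can be derived by grouping cells on their binary values restricted to the TF columns. The Python sketch assumes each cell's binary profile is a sequence over all measured genes and that the TF columns are given by index. The cell names, values and the function name group_equal_states are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_equal_states(binary: Dict[str, Sequence[int]],
                       tf_indices: Sequence[int]) -> List[List[str]]:
    """Return groups of two or more cells whose TF binary states are identical."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for cell, values in binary.items():
        key = tuple(values[i] for i in tf_indices)  # keep only the TF columns
        groups[key].append(cell)
    return [sorted(cells) for cells in groups.values() if len(cells) > 1]


# The fourth column is a non-TF gene, so c1 and c2 match on the TFs despite differing there.
cells = {"c1": (1, 0, 1, 0), "c2": (1, 0, 1, 1), "c3": (0, 0, 1, 0)}
print(group_equal_states(cells, tf_indices=[0, 1, 2]))  # → [['c1', 'c2']]
```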

Stable state analysis after in silico perturbations (datasets/Perturbations.xlsx). Each gene in the Boolean network was knocked out or overexpressed, and the stable state analysis was repeated for each perturbation. In the stable states listed, 1 indicates that a gene is expressed and 0 that it is not. Also provided is a summary of each perturbation, including the number of TFs that regulate the perturbed factor and the number it regulates, which wild type stable states the perturbed network can reach, and the number of additional stable states the perturbed network can reach.
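The idea behind this analysis can be shown on a toy network: a knockout or overexpression fixes one gene's update function to the constant 0 or 1, and a stable state is a state that every update function leaves unchanged. The Python sketch below enumerates all states by brute force, which is only feasible for a handful of genes; it is not how GenYsis works internally, and the three-gene network and the names clamp and stable_states are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]


def clamp(rules: List[Rule], gene: int, value: int) -> List[Rule]:
    """Knock out (value=0) or overexpress (value=1) one gene by fixing its rule."""
    clamped = list(rules)
    clamped[gene] = lambda s, v=value: v  # default argument binds the constant now
    return clamped


def stable_states(rules: List[Rule]) -> List[State]:
    """Enumerate fixed points: states where every rule returns the current value."""
    n = len(rules)
    return [s for s in product((0, 1), repeat=n)
            if all(rule(s) == s[i] for i, rule in enumerate(rules))]


# Hypothetical network: A sustains itself, B = A, C = not A.
rules: List[Rule] = [lambda s: s[0], lambda s: s[0], lambda s: 1 - s[0]]

print(stable_states(rules))               # wild type
print(stable_states(clamp(rules, 0, 0)))  # knockout of A
print(stable_states(clamp(rules, 1, 0)))  # knockout of B
```

In this toy example the knockout of A loses the wild-type state (1, 1, 0), while the knockout of B gains the novel state (1, 0, 0).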

Where a stable state of the perturbed network differs from a wild type stable state only in the perturbed factor, the perturbed network is counted as unable to reach that wild type state. For example, the Lmo2 knockout is counted as unable to reach state 7, because Lmo2 is not expressed in the knockout.
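In other words, states are compared across all genes, including the perturbed one. The Python sketch below contrasts this exact comparison with a looser one that ignores the perturbed gene, which is not the rule used here. The gene positions and states are hypothetical (gene 0 stands for the knocked-out factor).

```python
from typing import Iterable, Tuple

State = Tuple[int, ...]


def reaches_wild_type(perturbed_stable: Iterable[State], wt_state: State) -> bool:
    """Exact match over all genes, including the perturbed one (the rule described)."""
    return any(s == wt_state for s in perturbed_stable)


def matches_ignoring(perturbed_stable: Iterable[State], wt_state: State, gene: int) -> bool:
    """Looser comparison that ignores the perturbed gene (shown for contrast only)."""
    def strip(s: State) -> State:
        return s[:gene] + s[gene + 1:]
    return any(strip(s) == strip(wt_state) for s in perturbed_stable)


wt_state = (1, 1, 0)       # hypothetical wild-type state with gene 0 expressed
knockout = [(0, 1, 0)]     # knockout of gene 0: differs only in that gene
print(reaches_wild_type(knockout, wt_state))    # → False
print(matches_ignoring(knockout, wt_state, 0))  # → True
```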

Consensus model (datasets/combined_embryo_model.net) of the synthesised Boolean networks, for stable state analysis using GenYsis.

Publications

Decoding the regulatory network of early blood development from single-cell gene expression measurements, Nature Biotechnology, 2015.
Synthesising Executable Gene Regulatory Networks from Single-cell Gene Expression Data, Computer Aided Verification (CAV), 2015.
Diffusion maps for high-dimensional single-cell analysis of differentiation data, Bioinformatics, 2015.