Vireo

Vireo is a flexible demultiplexing software that can demultiplex with reference SNP genotypes for all of the donors in the pool, with reference SNP genotypes for only a subset of the donors, or without any reference SNP genotypes. If you have reference SNP genotypes for all of the donors in your pool, you could also use Demuxlet or Souporcell. If you don’t have reference SNP genotypes, you could alternatively use Freemuxlet or ScSplit.

Data

This is the data that you will need to have prepared to run Vireo:

Required

  • Common SNP genotypes vcf ($VCF)

    • If you have reference SNP genotypes for individuals in your pool, you can use those

      • For Vireo you should only have the donors that are in this pool in the vcf file

    • If you do not have reference SNP genotypes, they can be from any large population resource (e.g. 1000 Genomes or HRC)

    • Filter for common SNPs (> 5% minor allele frequency) and SNPs overlapping genes

  • Barcode file ($BARCODES)

  • Number of samples in pool ($N)

  • Bam file ($BAM)

    • Aligned single cell reads

  • Output directory ($VIREO_OUTDIR)
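
The commands on this page read these inputs from shell variables. The following is a minimal POSIX shell sketch for setting them and catching empty values before any step runs. Every path and the donor count are hypothetical placeholders that you should replace with your own.

```shell
#!/bin/sh
# Hypothetical placeholder values: replace each one with your own data.
VCF=/path/to/common_snps.vcf            # common SNP genotypes (reference or population)
BARCODES=/path/to/barcodes.tsv          # cell barcodes of the droplets to demultiplex
N=8                                     # number of donors in the pool (hypothetical)
BAM=/path/to/possorted_genome_bam.bam   # aligned single cell reads
VIREO_OUTDIR=/path/to/vireo_output      # output directory

# Stop early if any variable was left empty.
for var in VCF BARCODES N BAM VIREO_OUTDIR; do
  eval "value=\"\${$var}\""
  if [ -z "$value" ]; then
    echo "ERROR: \$$var is empty" >&2
    exit 1
  fi
done

# The donor count must be a positive whole number.
case "$N" in
  ''|*[!0-9]*) echo "ERROR: \$N must be a whole number" >&2; exit 1 ;;
esac
[ "$N" -gt 0 ] || { echo "ERROR: \$N must be greater than 0" >&2; exit 1; }
```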

Run Vireo

CellSNP Pileup

First, you need to count the number of alleles at each SNP in each droplet using cellSNP-lite:

singularity exec Demuxafy.sif cellsnp-lite -s $BAM -b $BARCODES -O $VIREO_OUTDIR -R $VCF -p 20 --minMAF 0.1 --minCOUNT 20 --gzip

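You can alter the -p (number of parallel processes), --minMAF (minimum minor allele frequency of a SNP) and --minCOUNT (minimum aggregated count of a SNP across droplets) parameters to fit your data and your needs. We have found these settings to work well with our data.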

If the pileup is successful, you will have these new files in your $VIREO_OUTDIR:

.
├── cellSNP.base.vcf.gz
├── cellSNP.samples.tsv
├── cellSNP.tag.AD.mtx
├── cellSNP.tag.DP.mtx
└── cellSNP.tag.OTH.mtx
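
Before you move on, you can confirm that every pileup file listed above was written and is non-empty. This is a minimal POSIX shell sketch. The function name check_cellsnp_outputs is hypothetical, and the file list assumes a run with --gzip as shown above.

```shell
#!/bin/sh
# Hypothetical helper: report any expected cellSNP-lite output that is
# missing or empty in a directory. File names match the --gzip run above.
check_cellsnp_outputs() {
  dir="$1"
  missing=0
  for f in cellSNP.base.vcf.gz cellSNP.samples.tsv \
           cellSNP.tag.AD.mtx cellSNP.tag.DP.mtx cellSNP.tag.OTH.mtx; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every file is present and non-empty.
  [ "$missing" -eq 0 ]
}

# Usage: check_cellsnp_outputs "$VIREO_OUTDIR" && echo "pileup looks complete"
```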

Additional details about the outputs are available in the Vireo Results and Interpretation section below.

Demultiplex with Vireo

Next, we can use the cellSNP results to demultiplex the data with Vireo. As already mentioned, you can use Vireo with multiple different levels of reference SNP genotypes. We’ve provided an example command for each of these differing amounts of donor SNP genotype data.

You will need to specify which genotype field ($FIELD) your donor SNP genotype file provides (GT, GP or PL); the default is PL.
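
If you are unsure which of these fields your donor SNP genotype file contains, check the FORMAT lines of the VCF header. This is a minimal sketch. list_genotype_fields is a hypothetical helper, and it assumes a plain or gzip/bgzip-compressed VCF. Running `bcftools view -h $VCF` would show the same header lines.

```shell
#!/bin/sh
# Hypothetical helper: print which of GT, GP and PL are declared as FORMAT
# fields in a VCF header, as candidates for $FIELD.
# Handles plain .vcf files and gzip/bgzip-compressed .vcf.gz files.
list_genotype_fields() {
  case "$1" in
    *.gz) reader="gzip -dc" ;;
    *)    reader="cat" ;;
  esac
  # Header lines look like: ##FORMAT=<ID=GT,Number=1,Type=String,...>
  # Stop reading at the #CHROM line, where the header ends.
  $reader "$1" | awk '/^#CHROM/ { exit }
    /^##FORMAT=<ID=(GT|GP|PL),/ { sub(/^##FORMAT=<ID=/, ""); sub(/,.*/, ""); print }'
}

# Usage: list_genotype_fields "$VCF"
```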

STRONGLY Recommended

For Vireo you should only have the donors that are in this pool in the vcf file. Vireo assumes that all the individuals in your vcf are in the pool, so if the file is left unfiltered, Vireo will consider every individual in the reference SNP genotype file as a possible donor.

Vireo also runs more efficiently when the SNPs in the donor $VCF have been filtered to the SNPs identified by cellSNP-lite. Therefore, it is highly recommended to subset the vcf first.

We can do both of these filtering actions at the same time with bcftools:

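Note: bcftools needs the $VCF to be bgzipped and indexed to use the -R option. If your reference SNP genotype $VCF is not bgzipped (i.e. it ends in .vcf rather than .vcf.gz), you should first bgzip and index your file with the commands below. If it is already bgzipped but has no index, run only the tabix command.

singularity exec Demuxafy.sif bgzip -c $VCF > $VCF.gz
singularity exec Demuxafy.sif tabix -p vcf $VCF.gz

Then subset the vcf to the donors in your pool (replace sample1,sample2 with their IDs) and to the SNPs identified by cellSNP-lite. If you compressed the file in the previous step, use $VCF.gz in place of $VCF:

singularity exec Demuxafy.sif bcftools view $VCF -R $VIREO_OUTDIR/cellSNP.base.vcf.gz -s sample1,sample2 -Ov -o $VIREO_OUTDIR/donor_subset.vcf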

Alternatively, if you have the individuals from the pool in a file with one individual per line (individual_file.tsv), then you can use -S individual_file.tsv instead of -s.
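
This is a minimal sketch for creating that individual file and the equivalent comma-separated list for -s. The donor IDs are hypothetical, so replace them with the sample names exactly as they appear in your $VCF. If $VIREO_OUTDIR is unset, the sketch falls back to the current directory.

```shell
#!/bin/sh
# Hypothetical donor IDs: replace them with the donors in your pool,
# spelled exactly as in the sample columns of your $VCF.
OUTDIR="${VIREO_OUTDIR:-.}"
printf '%s\n' donor_A donor_B donor_C > "$OUTDIR/individual_file.tsv"

# The same list joined by commas, for use with -s instead of -S.
SAMPLES=$(paste -sd, "$OUTDIR/individual_file.tsv")
echo "$SAMPLES"

# Either file can then be passed to bcftools view, for example:
#   ... bcftools view $VCF -R ... -S "$OUTDIR/individual_file.tsv" ...
#   ... bcftools view $VCF -R ... -s "$SAMPLES" ...
```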

To run Vireo with reference SNP genotype data for your donors (ideally filtered as shown above):

singularity exec Demuxafy.sif vireo -c $VIREO_OUTDIR -d $VIREO_OUTDIR/donor_subset.vcf -o $VIREO_OUTDIR -t $FIELD
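
The command above covers the case where reference genotypes are available for every donor. The Vireo documentation describes the other two cases as differing only in the -d and -N options. With no reference genotypes, you pass -N $N instead of -d. With genotypes for only some donors, you pass both -d and -N $N, where $N is the total number of donors in the pool. The helper below, build_vireo_cmd, is hypothetical and only prints the matching command so you can check it before running it. Treat it as a sketch of that rule, not as part of Demuxafy. Arguments are not quoted in the printed string, so paths with spaces would need manual quoting.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the Vireo command for the amount of
# reference genotype data available.
#   $1 = cellSNP-lite output directory   $2 = Vireo output directory
#   $3 = number of donors in the pool ($N), or "" if every donor is in the VCF
#   $4 = donor VCF, or "" if no reference genotypes are available
#   $5 = genotype field in the donor VCF (GT, GP or PL; defaults to PL)
build_vireo_cmd() {
  if [ -z "$3" ] && [ -z "$4" ]; then
    echo "ERROR: give a donor VCF, a donor count, or both" >&2
    return 1
  fi
  cmd="singularity exec Demuxafy.sif vireo -c $1"
  # Reference genotypes (all donors or a subset): pass the VCF and its field.
  if [ -n "$4" ]; then
    cmd="$cmd -d $4 -t ${5:-PL}"
  fi
  # No reference, or only some donors in the VCF: pass the total donor count.
  if [ -n "$3" ]; then
    cmd="$cmd -N $3"
  fi
  echo "$cmd -o $2"
}

# Usage (reference for a subset of donors):
#   build_vireo_cmd "$VIREO_OUTDIR" "$VIREO_OUTDIR" "$N" "$VIREO_OUTDIR/donor_subset.vcf" "$FIELD"
```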

If Vireo is successful, you will have these new files in your $VIREO_OUTDIR:

.
├── cellSNP.base.vcf.gz
├── cellSNP.samples.tsv
├── cellSNP.tag.AD.mtx
├── cellSNP.tag.DP.mtx
├── cellSNP.tag.OTH.mtx
├── donor_ids.tsv
├── donor_subset.vcf
├── fig_GT_distance_estimated.pdf
├── _log.txt
├── prob_doublet.tsv.gz
├── prob_singlet.tsv.gz
└── summary.tsv

Additional details about the outputs are available in the Vireo Results and Interpretation section below.

Vireo Results and Interpretation

After running the Vireo steps, you will have a number of files in your $VIREO_OUTDIR. These are the files that most users will find the most informative:

  • summary.tsv

    • The number of droplets assigned to each donor, called as doublets, or left unassigned.

      Var1        Freq
      113_113     1342
      349_350     1475
      352_353     1619
      39_39       1309
      40_40       1097
      41_41       1144
      42_42       1430
      43_43       1561
      465_466     1104
      596_597     1271
      597_598     1532
      632_633     871
      633_634     967
      660_661     1377
      doublet     2770
      unassigned  113

  • donor_ids.tsv

    • The classification of each droplet along with some per-droplet metrics (an excerpt is shown below).

      cell                donor_id  prob_max  prob_doublet  n_vars  best_singlet  best_doublet
      AAACCTGAGATAGCAT-1  41_41     1.00e+00  9.13e-09      115     41_41         40_40,41_41
      AAACCTGAGCAGCGTA-1  465_466   1.00e+00  5.03e-17      239     465_466       349_350,43_43
      AAACCTGAGCGATGAC-1  113_113   1.00e+00  7.57e-07      98      113_113       113_113,633_634
      AAACCTGAGCGTAGTG-1  349_350   1.00e+00  8.07e-07      140     349_350       349_350,597_598
      AAACCTGAGGAGTTTA-1  632_633   1.00e+00  5.99e-11      177     632_633       40_40,113_113
      AAACCTGAGGCTCATT-1  39_39     1.00e+00  4.44e-06      110     39_39         39_39,40_40
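
You can also compute a couple of quick summaries from these files with awk. This is a sketch that assumes the tab-separated layouts shown above, each with a single header line. doublet_fraction and singlet_barcodes are hypothetical helper names.

```shell
#!/bin/sh
# Fraction of all droplets in summary.tsv (columns Var1, Freq) that
# Vireo called "doublet". Skips the header line.
doublet_fraction() {
  awk -F'\t' 'NR > 1 { total += $2; if ($1 == "doublet") d = $2 }
              END { if (total > 0) printf "%.4f\n", d / total }' "$1"
}

# Barcode and donor ID of every droplet in donor_ids.tsv assigned to a
# single donor (i.e. donor_id is neither "doublet" nor "unassigned").
singlet_barcodes() {
  awk -F'\t' 'NR > 1 && $2 != "doublet" && $2 != "unassigned" { print $1 "\t" $2 }' "$1"
}

# Usage:
#   doublet_fraction "$VIREO_OUTDIR/summary.tsv"
#   singlet_barcodes "$VIREO_OUTDIR/donor_ids.tsv" > "$VIREO_OUTDIR/singlets.tsv"
```

With the summary.tsv shown above, doublet_fraction would report 2770 / 20982 ≈ 0.1320, meaning about 13% of droplets were called doublets.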

Merging Results with Other Software Results

We have provided a script that will help merge and summarize the results from multiple demultiplexing tools. See Combine Results.

Citation

If you used the Demuxafy platform for analysis, please reference our paper (REFERENCE) as well as Vireo.