Comparing and Combining Results

After you have run each of the Demultiplexing and/or Doublet Detecting software you would like to use, it is helpful to convert the results to a common nomenclature and combine them into a single dataframe. We have also found it helpful to summarize how many droplets fall into each combination of classifications across the software. To streamline this process, we provide a script that integrates the results of all the software you have run into a single dataframe and can:

  1. Generate a dataframe that has all the software assignments per droplet in the pool

    • A tab-separated dataframe with the singlet-doublet classification and, for demultiplexing software, the individual assignment of each droplet

  2. Generate an upset plot that shows the droplet classifications by each software and the final classifications

  3. Generate a droplet type summary file

    • Provides the number of droplets in each combination of droplet classifications across the software

  4. Generate a demultiplexing individual assignment summary file (only if you included demultiplexing software)

    • Provides the number of droplets in each combination of individual assignments across the demultiplexing software

  5. If individuals have not been assigned to each cluster for the reference-free demultiplexing software, create a common assignment across all demultiplexing software for easy comparison

  6. Combine the software results into a final droplet assignment

    • Uses one of four intersectional methods to combine the software assignments into a single combined assignment per barcode

Data

In order to use our script to combine the results from the various demultiplexing and doublet detecting softwares, you need the following:

Required

  • Output directory ($OUTDIR)

  • Path to the results of each software you would like to merge into a single dataframe.

    • You must provide the path to at least one software result; otherwise, the script will not run.

Merging Results with Combine_Results.R

The script has options for providing the path to each software result you would like to combine. To see all of the options, run:

singularity exec Demuxafy.sif Combine_Results.R -h

This prints the available parameters:

usage: /opt/Demultiplexing_Doublet_Detecting_Docs/scripts/Combine_Results.R
      [-h] -o OUT [-z DEMUXALOT] [-d DEMUXLET] [-q DROPULATION]
      [-f FREEMUXLET] [-g FREEMUXLET_ASSIGNMENTS]
      [-a FREEMUXLET_CORRELATION_LIMIT] [-s SCSPLIT] [-w SCSPLIT_ASSIGNMENTS]
      [-j SCSPLIT_CORRELATION_LIMIT] [-u SOUPORCELL]
      [-x SOUPORCELL_ASSIGNMENTS] [-k SOUPORCELL_CORRELATION_LIMIT]
      [-v VIREO] [-e DOUBLETDECON] [-t DOUBLETDETECTION] [-i DOUBLETFINDER]
      [-n SCDBLFINDER] [-c SCDS] [-r SCRUBLET] [-l SOLO] [-b REF]
      [-p PCT_AGREEMENT] [-m METHOD]

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --out OUT     The file where results will be saved
  -z DEMUXALOT, --demuxalot DEMUXALOT
                        Path to demuxalot results. Only use this option if you
                        want to include the demuxalot results.
  -d DEMUXLET, --demuxlet DEMUXLET
                        Path to demuxlet results. Only use this option if you
                        want to include the demuxlet results.
  -q DROPULATION, --dropulation DROPULATION
                        Path to dropulation results. Only use this option if
                        you want to include the dropulation results.
  -f FREEMUXLET, --freemuxlet FREEMUXLET
                        Path to freemuxlet results. Only use this option if
                        you want to include the freemuxlet results.
  -g FREEMUXLET_ASSIGNMENTS, --freemuxlet_assignments FREEMUXLET_ASSIGNMENTS
                        Path to freemuxlet cluster-to-individual assignments.
                        Only use this option if you have used reference SNP
                        genotypes to assign individuals to clusters for the
                        freemuxlet results.
  -a FREEMUXLET_CORRELATION_LIMIT, --freemuxlet_correlation_limit FREEMUXLET_CORRELATION_LIMIT
                        The minimum correlation between the cluster and the
                        individual SNP genotypes which should be considered as
                        a valid assignment. If you want no limit, use 0.
                        Default is 0.7.
  -s SCSPLIT, --scSplit SCSPLIT
                        Path to scSplit results. Only use this option if you
                        want to include the scSplit results.
  -w SCSPLIT_ASSIGNMENTS, --scSplit_assignments SCSPLIT_ASSIGNMENTS
                        Path to scSplit cluster-to-individual assignments.
                        Only use this option if you have used reference SNP
                        genotypes to assign individuals to clusters for the
                        scSplit results.
  -j SCSPLIT_CORRELATION_LIMIT, --scSplit_correlation_limit SCSPLIT_CORRELATION_LIMIT
                        The minimum correlation between the cluster and the
                        individual SNP genotypes which should be considered as
                        a valid assignment. If you want no limit, use 0.
                        Default is 0.7.
  -u SOUPORCELL, --souporcell SOUPORCELL
                        Path to souporcell results. Only use this option if
                        you want to include the souporcell results.
  -x SOUPORCELL_ASSIGNMENTS, --souporcell_assignments SOUPORCELL_ASSIGNMENTS
                        Path to souporcell cluster-to-individual assignments.
                        Only use this option if you have used reference SNP
                        genotypes to assign individuals to clusters for the
                        souporcell results.
  -k SOUPORCELL_CORRELATION_LIMIT, --souporcell_correlation_limit SOUPORCELL_CORRELATION_LIMIT
                        The minimum correlation between the cluster and the
                        individual SNP genotypes which should be considered as
                        a valid assignment. If you want no limit, use 0.
                        Default is 0.7.
  -v VIREO, --vireo VIREO
                        Path to vireo results. Only use this option if you
                        want to include the vireo results.
  -e DOUBLETDECON, --DoubletDecon DOUBLETDECON
                        Path to DoubletDecon results. Only use this option if
                        you want to include the DoubletDecon results.
  -t DOUBLETDETECTION, --DoubletDetection DOUBLETDETECTION
                        Path to DoubletDetection results. Only use this option
                        if you want to include the DoubletDetection results.
  -i DOUBLETFINDER, --DoubletFinder DOUBLETFINDER
                        Path to DoubletFinder results. Only use this option if
                        you want to include the DoubletFinder results.
  -n SCDBLFINDER, --scDblFinder SCDBLFINDER
                        Path to scDblFinder results. Only use this option if
                        you want to include the scDblFinder results.
  -c SCDS, --scds SCDS  Path to scds results. Only use this option if you want
                        to include the scds results.
  -r SCRUBLET, --scrublet SCRUBLET
                        Path to scrublet results. Only use this option if you
                        want to include the scrublet results.
  -l SOLO, --solo SOLO  Path to solo results. Only use this option if you want
                        to include the solo results.
  -b REF, --ref REF     Which demultiplexing software to use as a reference
                        for individuals when you do not have an assignment key
                        for all demultiplexing methods. Options are 'Demuxlet',
                        'Freemuxlet', 'scSplit', 'Souporcell' and 'Vireo'. If
                        blank when assignment keys are missing, default
                        softwares to use if present are Demuxalot, then Vireo,
                        then Demuxlet, then Freemuxlet, then Souporcell, then
                        Dropulation, then scSplit.
  -p PCT_AGREEMENT, --pct_agreement PCT_AGREEMENT
                        The proportion of a cluster that match the 'ref'
                        assignment to assign that cluster the individual
                        assignment from the reference. Can be between 0.5 and
                        1. Default is 0.9.
  -m METHOD, --method METHOD
                        Combination method. Options are 'MajoritySinglet',
                        'AtLeastHalfSinglet', 'AnySinglet' or 'AnyDoublet'. We
                        have found that 'MajoritySinglet' provides the most
                        accurate results in most situations and therefore
                        recommend this method. See
                        https://demultiplexing-doublet-detecting-docs.readthedocs.io/en/latest/CombineResults.html
                        for a detailed explanation of each intersectional method.
                        Leave blank if you just want all the softwares to be
                        merged into a single dataframe.
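The four --method options differ in how many per-software singlet calls a droplet needs. As a rough illustration, the shell sketch below applies each rule to the droplet type calls for a single droplet. It reflects our reading of the method names, not the code in Combine_Results.R: combine_calls is a hypothetical function, and counting "unassigned" as a non-singlet vote is an assumption. See the CombineResults page linked in the help message for the exact definitions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the Combine_Results.R code) of the four --method
# rules applied to ONE droplet.
# Usage: combine_calls METHOD CALL [CALL ...]
# Assumption: any call other than "singlet" (e.g. "doublet" or "unassigned")
# counts as a non-singlet vote.
combine_calls() {
  local method total singlets call
  if [ "$#" -lt 2 ]; then
    echo "usage: combine_calls METHOD CALL [CALL ...]" >&2
    return 2
  fi
  method=$1
  shift
  total=$#
  singlets=0
  for call in "$@"; do
    if [ "$call" = "singlet" ]; then
      singlets=$((singlets + 1))
    fi
  done
  case $method in
    MajoritySinglet)    [ $((2 * singlets)) -gt "$total" ] ;;  # more than half singlet
    AtLeastHalfSinglet) [ $((2 * singlets)) -ge "$total" ] ;;  # half or more singlet
    AnySinglet)         [ "$singlets" -gt 0 ] ;;               # one singlet call is enough
    AnyDoublet)         [ "$singlets" -eq "$total" ] ;;        # one non-singlet call -> doublet
    *) echo "unknown method: $method" >&2; return 2 ;;
  esac && echo singlet || echo doublet
}
```

For example, combine_calls MajoritySinglet doublet unassigned singlet singlet prints doublet, because only two of the four calls are singlets, whereas AtLeastHalfSinglet returns singlet for the same calls. Droplet AAACCTGAGTAGCCGA-1 in the example output further down likewise has two singlet calls out of four, and its MajoritySinglet call is doublet.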

There are two different ways to use this script:

  1. Combine the results and call the droplet type jointly across the included software (run with --method).

  2. Combine the results without any joint droplet calling (leave out --method). You might choose this if you just want to see how the different software perform on your data before deciding which to move forward with for final joint droplet calling.

The example below combines the Demuxlet, Souporcell, Solo and scds results.

First, let’s assign the variables that will be used to execute each step.

Example Variable Settings

Below is an example of the variables used in the command that follows. These files are from the test dataset available in the Data Preparation Documentation. Please replace the paths with the full paths to the data on your system.

OUTDIR=/path/to/output/combined
DEMUXLET_OUTDIR=/path/to/output/demuxlet
SOUPORCELL_OUTDIR=/path/to/output/souporcell
SOLO_OUTDIR=/path/to/output/solo
SCDS_OUTDIR=/path/to/output/scds
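Before launching the container, it can help to confirm that every path you set actually exists. The function below is a minimal sketch; check_inputs is a hypothetical helper, not part of Demuxafy, and it accepts either result directories or individual result files.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Demuxafy): report any input
# path that is empty or does not exist. Works for directories and files.
check_inputs() {
  local status path
  status=0
  for path in "$@"; do
    if [ -z "$path" ] || [ ! -e "$path" ]; then
      echo "missing input: '$path'" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run check_inputs "$DEMUXLET_OUTDIR" "$SOUPORCELL_OUTDIR" "$SOLO_OUTDIR" "$SCDS_OUTDIR" && mkdir -p "$OUTDIR". Creating $OUTDIR up front is a precaution in case the script does not create it.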

The first option is to select a method to make joint calls on the individual assignment and singlet-doublet droplet type using the included software:

singularity exec Demuxafy.sif Combine_Results.R \
  -o "$OUTDIR/combined_results.tsv" \
  --demuxlet "$DEMUXLET_OUTDIR" \
  --souporcell "$SOUPORCELL_OUTDIR" \
  --solo "$SOLO_OUTDIR" \
  --scds "$SCDS_OUTDIR" \
  --method "MajoritySinglet" ## other methods are also available; see the help message above

Note

Providing the directory paths works if the files they contain have the names expected from the example tutorials. If you used a different naming convention or renamed the files, you can instead provide the full path to the exact file for each software.

Results and Interpretation

After running the Combine_Results.R script, the number of output files depends on whether you included demultiplexing software and whether you used joint droplet calling. Here, we show the results for the above example, which also includes combined calls from the “MajoritySinglet” method.

/path/to/output/combined
├── combined_results_demultiplexing_summary.tsv
├── combined_resultsSinglets_upset.pdf
├── combined_results_summary.tsv
├── combined_results.tsv
└── combined_results_w_combined_assignments.tsv

Note

  • You will only have the combined_results_demultiplexing_summary.tsv file if you included demultiplexing softwares.

  • You will only have the combined_results_w_combined_assignments.tsv file if you ran the script with --method.
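To check which of these files a run produced, the sketch below derives the expected names from the -o path. list_outputs is a hypothetical helper, and the naming pattern is inferred from the example listing above rather than from the script's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the expected outputs for a given -o path and
# mark each one present or missing. The naming pattern is inferred from
# the example listing, e.g. combined_results.tsv -> combined_results_summary.tsv.
list_outputs() {
  local prefix f
  prefix=${1%.tsv}
  for f in "$prefix.tsv" \
           "${prefix}_summary.tsv" \
           "${prefix}_demultiplexing_summary.tsv" \
           "${prefix}Singlets_upset.pdf" \
           "${prefix}_w_combined_assignments.tsv"; do
    if [ -e "$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}
```

For example, run list_outputs "$OUTDIR/combined_results.tsv". A missing _demultiplexing_summary.tsv or _w_combined_assignments.tsv file is expected if you ran no demultiplexing software or left out --method.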

Here’s a deeper look at the contents of each of these results:

  • combined_resultsSinglets_upset.pdf

    • This is an upset figure of the droplets, colored by their final individual or doublet classification.

    • A filled circle indicates that those droplets are classified as singlets by that software, while an empty circle indicates a doublet classification by that software.

    Example figure: combined_resultsSinglets_upset.png
  • combined_results.tsv

    • Contains the selected results combined into one dataframe, with only the key columns:

      Barcode             Demuxlet_DropletType  Demuxlet_Individual_Assignment  Souporcell_Individual_Assignment  Souporcell_Cluster  Souporcell_DropletType  scds_score          scds_DropletType  solo_DropletType  solo_DropletScore
      AAACCTGAGATAGCAT-1  singlet               41_41                           41_41                             6                   singlet                 0.116344358493288   singlet           singlet           -8.442187
      AAACCTGAGCAGCGTA-1  singlet               465_466                         465_466                           11                  singlet                 0.539856378453988   singlet           singlet           -2.8096201
      AAACCTGAGCGATGAC-1  singlet               113_113                         113_113                           5                   singlet                 0.0237184380134577  singlet           singlet           -2.8949203
      AAACCTGAGCGTAGTG-1  singlet               349_350                         349_350                           3                   singlet                 0.163695865366576   singlet           singlet           -5.928284
      AAACCTGAGGAGTTTA-1  singlet               632_633                         632_633                           7                   singlet                 0.11591462421927    singlet           doublet           0.2749935
      AAACCTGAGGCTCATT-1  singlet               39_39                           39_39                             12                  singlet                 0.0479944175570073  singlet           singlet           -5.2726507
      AAACCTGAGGGCACTA-1  singlet               465_466                         465_466                           11                  singlet                 0.374426050641161   singlet           singlet           -0.65760195
      AAACCTGAGTAATCCC-1  singlet               660_661                         660_661                           4                   singlet                 0.247842972104563   singlet           singlet           -3.5948637
      AAACCTGAGTAGCCGA-1  doublet               doublet                         unassigned                        unassigned          unassigned              0.342998285281922   singlet           singlet           -0.50507957

  • combined_results_summary.tsv

    • The number of droplets in each combination of droplet type classifications across the software:

    Demuxlet_DropletType  Souporcell_DropletType  scds_DropletType  solo_DropletType  N
    singlet               singlet                 singlet           singlet           16193
    doublet               doublet                 doublet           doublet           1714
    singlet               singlet                 singlet           doublet           947
    doublet               doublet                 singlet           singlet           468
    singlet               singlet                 doublet           singlet           392
    singlet               singlet                 doublet           doublet           345
    doublet               doublet                 singlet           doublet           335
    doublet               singlet                 singlet           singlet           171
    doublet               doublet                 doublet           singlet           169
    doublet               singlet                 doublet           doublet           114
    doublet               singlet                 singlet           doublet           44
    doublet               singlet                 doublet           singlet           18
    singlet               doublet                 singlet           singlet           17
    singlet               unassigned              singlet           singlet           13
    doublet               unassigned              singlet           singlet           11
    singlet               doublet                 doublet           doublet           9
    singlet               doublet                 singlet           doublet           6
    singlet               doublet                 doublet           singlet           5
    doublet               unassigned              singlet           doublet           4
    doublet               unassigned              doublet           doublet           3
    doublet               unassigned              doublet           singlet           2
    unassigned            unassigned              singlet           singlet           2

  • combined_results_demultiplexing_summary.tsv

    • The number of droplets in each combination of individual assignments across the demultiplexing software:

      Demuxlet_Individual_Assignment  Souporcell_Individual_Assignment  N
      doublet                         doublet                           2706
      352_353                         352_353                           1603
      43_43                           43_43                             1547
      597_598                         597_598                           1510
      349_350                         349_350                           1450
      42_42                           42_42                             1417
      660_661                         660_661                           1358
      113_113                         113_113                           1333
      39_39                           39_39                             1289

  • combined_results_w_combined_assignments.tsv

    • A dataframe combining all the software results, plus the combined assignment from the selected method:

    Barcode             Demuxlet_DropletType  Demuxlet_Individual_Assignment  Souporcell_Cluster  Souporcell_Individual_Assignment  Souporcell_DropletType  scds_score          scds_DropletType  solo_DropletType  solo_DropletScore  MajoritySinglet_DropletType  MajoritySinglet_Individual_Assignment
    AAACCTGAGATAGCAT-1  singlet               41_41                           6                   41_41                             singlet                 0.116344358493288   singlet           singlet           -8.442187          singlet                      41_41
    AAACCTGAGCAGCGTA-1  singlet               465_466                         11                  465_466                           singlet                 0.539856378453988   singlet           singlet           -2.8096201         singlet                      465_466
    AAACCTGAGCGATGAC-1  singlet               113_113                         5                   113_113                           singlet                 0.0237184380134577  singlet           singlet           -2.8949203         singlet                      113_113
    AAACCTGAGCGTAGTG-1  singlet               349_350                         3                   349_350                           singlet                 0.163695865366576   singlet           singlet           -5.928284          singlet                      349_350
    AAACCTGAGGAGTTTA-1  singlet               632_633                         7                   632_633                           singlet                 0.11591462421927    singlet           doublet           0.2749935          singlet                      632_633
    AAACCTGAGGCTCATT-1  singlet               39_39                           12                  39_39                             singlet                 0.0479944175570073  singlet           singlet           -5.2726507         singlet                      39_39
    AAACCTGAGGGCACTA-1  singlet               465_466                         11                  465_466                           singlet                 0.374426050641161   singlet           singlet           -0.65760195        singlet                      465_466
    AAACCTGAGTAATCCC-1  singlet               660_661                         4                   660_661                           singlet                 0.247842972104563   singlet           singlet           -3.5948637         singlet                      660_661
    AAACCTGAGTAGCCGA-1  doublet               doublet                         unassigned          doublet                           doublet                 0.342998285281922   singlet           singlet           -0.50507957        doublet                      doublet
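The TSV outputs are easy to explore from the command line. The helpers below are a minimal sketch, not part of Demuxafy: tally, agreement and discordant are hypothetical function names, and they assume tab-separated files with a single header row and, for the summary files, a final N column, as in the examples above. Columns are looked up by header name because the column order differs between combined_results.tsv and combined_results_w_combined_assignments.tsv.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Demuxafy) for exploring the TSV outputs.
# Assumes tab-separated files with one header row; summary files end in N.

# tally FILE COLUMN: number of droplets per value of COLUMN, largest first.
tally() {
  awk -F'\t' -v c="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == c) k = i
      if (!k) { print "column not found: " c > "/dev/stderr"; exit 1 }
      next
    }
    { count[$k]++ }
    END { for (v in count) print v "\t" count[v] }
  ' "$1" | sort -k2,2nr -k1,1
}

# agreement SUMMARY_FILE: droplets whose calls are identical across all
# software columns (every column except the final N), out of the total.
agreement() {
  awk -F'\t' '
    NR == 1 { next }
    {
      total += $NF
      same = 1
      for (i = 2; i < NF; i++) if ($i != $1) same = 0
      if (same) agree += $NF
    }
    END {
      if (total > 0)
        printf "%d of %d droplets (%.1f%%)\n", agree, total, 100 * agree / total
    }
  ' "$1"
}

# discordant FILE COL_A COL_B: barcodes where two named columns differ.
discordant() {
  awk -F'\t' -v a="$2" -v b="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) idx[$i] = i
      if (!("Barcode" in idx) || !(a in idx) || !(b in idx)) {
        print "column not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    $(idx[a]) != $(idx[b]) { print $(idx["Barcode"]) "\t" $(idx[a]) "\t" $(idx[b]) }
  ' "$1"
}
```

For example, tally combined_results_w_combined_assignments.tsv MajoritySinglet_Individual_Assignment counts droplets per final assignment, agreement combined_results_summary.tsv reports how many droplets received the same call from every software, and discordant combined_results.tsv Demuxlet_Individual_Assignment Souporcell_Individual_Assignment lists the barcodes on which the two demultiplexing tools disagree.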

Citation

If you used the Demuxafy platform for analysis, please reference our preprint.