ScDblFinder

ScDblFinder is a transcriptome-based doublet detection method that simulates doublets from the droplets in the dataset and uses them to identify doublets. We provide a wrapper script that takes the common arguments for ScDblFinder, as well as example code that you can run manually if you prefer.

Data

This is the data that you will need to have prepared to run ScDblFinder:

Required

  • A counts matrix ($COUNTS)

    • The directory path containing your cellranger counts matrix files (directory containing barcodes.tsv, genes.tsv and matrix.mtx or barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz)

      or

    • h5 file (filtered_feature_bc_matrix.h5)

      • If you don’t have your data in this format, you can run ScDblFinder manually in R and load the data in using a method of your choosing.
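The two accepted input layouts listed above can be checked before launching a run. The following is a minimal sketch only: `check_counts` is a hypothetical helper introduced for illustration (it is not part of Demuxafy), and it simply tests for the file names listed in this section.

```shell
# Hypothetical helper (not part of Demuxafy): report which of the input
# layouts described above a path matches, or fail if it matches none.
check_counts() {
    counts="$1"

    # Case 1: a single h5 file such as filtered_feature_bc_matrix.h5
    if [ -f "$counts" ]; then
        case "$counts" in
            *.h5) echo "h5 file"; return 0 ;;
        esac
    fi

    # Case 2: a directory holding the three matrix files
    if [ -d "$counts" ]; then
        if [ -f "$counts/barcodes.tsv" ] && [ -f "$counts/genes.tsv" ] \
            && [ -f "$counts/matrix.mtx" ]; then
            echo "uncompressed matrix directory"; return 0
        fi
        if [ -f "$counts/barcodes.tsv.gz" ] && [ -f "$counts/features.tsv.gz" ] \
            && [ -f "$counts/matrix.mtx.gz" ]; then
            echo "gzipped matrix directory"; return 0
        fi
    fi

    echo "no recognised counts input at: $counts" >&2
    return 1
}
```

For example, `check_counts "$COUNTS"` prints the detected layout and returns a non-zero status when neither layout is found.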

Optional

  • Output directory ($SCDBLFINDER_OUTDIR)

    • If you don’t provide an $SCDBLFINDER_OUTDIR, the results will be written to the present working directory.

  • Filtered barcode file

    • A list of barcodes that are a subset of the barcodes in your h5 or matrix.mtx files. This is useful if you have run other QC software such as CellBender or DropletQC to remove empty droplets or droplets with damaged cells.

    • This file should not have a header.
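Because the filtered barcode file must be a headerless subset of the matrix barcodes, a quick consistency check can catch a stray header or mismatched barcode suffixes early. This is a sketch under stated assumptions: `check_barcodes` is a hypothetical helper, the filtered file is assumed to hold one barcode per line, and the comparison only applies to a matrix directory (for an h5 input the barcodes would first need to be extracted by other means).

```shell
# Hypothetical helper: count how many filtered barcodes are absent from the
# barcodes file of a 10x matrix directory. A header line in the filtered file
# would show up as one "missing" barcode.
check_barcodes() {
    filtered="$1"       # one barcode per line, no header
    all_barcodes="$2"   # barcodes.tsv or barcodes.tsv.gz

    # Stream the matrix barcodes, decompressing only when needed
    case "$all_barcodes" in
        *.gz) gzip -cd "$all_barcodes" ;;
        *)    cat "$all_barcodes" ;;
    esac | awk -v filtered="$filtered" '
        { seen[$1] = 1 }                        # remember every matrix barcode
        END {
            while ((getline line < filtered) > 0) {
                sub(/[ \t\r]+$/, "", line)      # drop trailing whitespace
                total++
                if (!(line in seen)) missing++
            }
            printf "%d of %d filtered barcodes are missing from the matrix\n", missing, total
        }'
}
```

A result of `0 of N` indicates that every filtered barcode is present in the matrix.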

Run ScDblFinder

Expected Resource Usage

~1 min using a total of 3 GB of memory with 2 threads for the full Test Dataset, which contains ~20,982 droplets from 13 multiplexed donors.

You can either run ScDblFinder with the wrapper script we have provided or you can run it manually if you would prefer to alter more parameters.

First, let’s assign the variables that will be used to execute each step.

Example Variable Settings

Below is an example of the variables that we can set up to be used in the command below. These are files provided as a test dataset available in the Data Preparation Documentation. Please replace the paths with the full paths to the data on your system.

COUNTS=/path/to/TestData4PipelineFull/test_dataset/outs/filtered_gene_bc_matrices/Homo_sapiens_GRCh38p10/
SCDBLFINDER_OUTDIR=/path/to/output/scDblFinder

To run ScDblFinder with our wrapper script, simply execute the following in your shell:

singularity exec Demuxafy.sif scDblFinder.R -o $SCDBLFINDER_OUTDIR -t $COUNTS

To see all the parameters that this wrapper script will accept, run:

singularity exec Demuxafy.sif scDblFinder.R -h

  usage: scDblFinder.R file.
        [-h] -o OUT -t TENX_MATRIX [-b BARCODES_FILTERED]

  optional arguments:
    -h, --help            show this help message and exit
    -o OUT, --out OUT     The output directory where results will be saved
    -t TENX_MATRIX, --tenX_matrix TENX_MATRIX
                          Path to the 10x filtered matrix directory or h5 file.
    -b BARCODES_FILTERED, --barcodes_filtered BARCODES_FILTERED
                          Path to a list of filtered barcodes to use for doublet
                          detection.
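The optional `-b` flag shown in the help output can be added only when a filtered barcode file is available. The sketch below wraps the documented command in a small function; `run_scdblfinder` and the `DRY_RUN` switch are hypothetical names introduced for illustration, and only the `-o`, `-t` and `-b` flags from the help text are used.

```shell
# Hypothetical wrapper: build the documented scDblFinder.R call and append
# -b only when a filtered barcode file is given as the third argument.
run_scdblfinder() {
    counts="$1"
    outdir="$2"
    barcodes="${3:-}"

    # Start from the command shown in this documentation
    set -- singularity exec Demuxafy.sif scDblFinder.R -o "$outdir" -t "$counts"
    if [ -n "$barcodes" ]; then
        set -- "$@" -b "$barcodes"
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

Setting `DRY_RUN=1` before calling `run_scdblfinder "$COUNTS" "$SCDBLFINDER_OUTDIR" /path/to/filtered_barcodes.tsv` prints the command so it can be inspected before it is executed; the barcode path here is a placeholder.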

ScDblFinder Results and Interpretation

After running ScDblFinder with the wrapper script or manually, you should have two files in the $SCDBLFINDER_OUTDIR:

/path/to/output/scDblFinder
├── scDblFinder_doublets_singlets.tsv
└── scDblFinder_doublet_summary.tsv

Here’s a more detailed description of each of those files:

  • scDblFinder_doublet_summary.tsv

    • A summary of the number of singlets and doublets predicted by ScDblFinder.

      Classification    Droplet N
      --------------    ---------
      doublet           3323
      singlet           17659

  • scDblFinder_doublets_singlets.tsv

    • The per-barcode singlet and doublet classification from ScDblFinder.

      Barcode               scDblFinder_DropletType    scDblFinder_Score
      ------------------    -----------------------    ---------------------
      AAACCTGAGATAGCAT-1    singlet                    0.0033526041079312563
      AAACCTGAGCAGCGTA-1    doublet                    0.9937564134597778
      AAACCTGAGCGATGAC-1    singlet                    5.045032594352961e-
      AAACCTGAGCGTAGTG-1    singlet                    0.007504515815526247
      AAACCTGAGGAGTTTA-1    singlet                    0.00835108570754528
      AAACCTGAGGCTCATT-1    singlet                    0.028838597238063812
      AAACCTGAGGGCACTA-1    doublet                    0.9985504746437073
      AAACCTGAGTAATCCC-1    singlet                    0.005869860760867596
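Both result files can be inspected directly from the shell. The sketch below assumes that each file is tab-separated with a single header row and the column order shown above; `doublet_percent` and `singlet_barcodes` are hypothetical helper names introduced for illustration.

```shell
# Percentage of droplets classified as doublets, read from the summary file
# (assumed columns: Classification, Droplet N).
doublet_percent() {
    awk -F'\t' '
        NR > 1 { n[$1] = $2; total += $2 }    # skip the header row
        END { if (total > 0) printf "%.2f\n", 100 * n["doublet"] / total }
    ' "$1"
}

# Barcodes classified as singlets, read from the per-barcode file
# (assumed columns: Barcode, scDblFinder_DropletType, scDblFinder_Score).
singlet_barcodes() {
    awk -F'\t' 'NR > 1 && $2 == "singlet" { print $1 }' "$1"
}
```

With the summary shown above (3323 doublets and 17659 singlets), `doublet_percent "$SCDBLFINDER_OUTDIR/scDblFinder_doublet_summary.tsv"` prints 15.84. The output of `singlet_barcodes "$SCDBLFINDER_OUTDIR/scDblFinder_doublets_singlets.tsv"` is a headerless list with one barcode per line.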

Merging Results with Other Software Results

We have provided a script that will help merge and summarize the results from multiple software tools. See Combine Results.

Citation

If you used the Demuxafy platform for analysis, please reference our preprint as well as ScDblFinder.