Test data set

After installing IgDiscover, you should run it once on the small test data set that we provide, both to test your installation and to familiarize yourself with running the program.

  1. Download and unpack the test data set (version 0.4). To do this from the command line, use these commands:

    wget https://bitbucket.org/igdiscover/testdata/get/v0.4.tar.gz
    tar xvf v0.4.tar.gz

    The test data set contains some paired-end reads from a rhesus monkey and a simplified version of the publicly available VH, DH, and JH gene lists as three FASTA files.

  2. Initialize the IgDiscover pipeline directory:

    igdiscover init --db igdiscover-testdata-*/db/ --reads igdiscover-testdata-*/reads.1.fastq.gz discovertest

    The last argument, discovertest, is the name of the pipeline directory that will be created. Only the path to the first reads file needs to be given; the second file is found automatically.

    The command prints a message telling you that the pipeline directory has been initialized, that you should edit the configuration file, and how to run IgDiscover after that.

  3. Instead of editing the configuration by hand, copy the prepared configuration file that comes with the test data set into the pipeline directory:

    cp igdiscover-testdata-*/igdiscover.yaml discovertest/

    For your own runs, you will need to read through the configuration file and adjust it to your needs. You may want to have a look at it while the pipeline is running in the next step. The configuration is in YAML format. When editing the file, follow the structure it already has.

  4. Run the analysis. To do so, change into the pipeline directory and run this command:

    cd discovertest && igdiscover run

    On this small data set, running the pipeline should take no more than about 5 minutes.

  5. Finally, inspect the results in the discovertest/final directory. For example, the final list of discovered V genes is in discovertest/final/database/V.fasta.

    See the explanation of final result files.
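
As a quick sanity check after the run, you can count how many V genes were discovered. The following is a minimal sketch and not part of IgDiscover: count_fasta_records is a hypothetical helper that counts FASTA header lines, and the path assumes that you run it from the directory that contains discovertest (if you are still inside discovertest after running the pipeline, change to the parent directory first).

```shell
# Hypothetical helper (not an IgDiscover command):
# count the records in a FASTA file by counting header lines, which start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Path taken from the text above; adjust it if your pipeline directory has another name.
v_fasta="discovertest/final/database/V.fasta"

if [ -f "$v_fasta" ]; then
    echo "Discovered V genes: $(count_fasta_records "$v_fasta")"
else
    echo "Not found: $v_fasta (has 'igdiscover run' finished?)"
fi
```

If the file is reported as not found, the pipeline has either not finished or was run in a different directory.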

Other test data sets

ENA project PRJEB15295 contains the data for our Nature Communications paper from 2016, in particular ERR1760498, which is the data for the human “H1” sample (multiplex PCR, IgM heavy chain).

Data used for testing TCR detection (human, RACE): SRR2905677 and SRR2905710.