Using refgenie with iGenomes

If you're already using iGenomes, it's easy to configure refgenie to use your existing folder structure. iGenomes is project that provides sequences and annotation files for commonly analyzed organisms. Each iGenome is available as a compressed file that contains sequences and annotation files for a single genomic build of an organism.

Initialize a refgenie config file if you don't have one you want to use for your iGenomes assets:

export REFGENIE='igenome_config.yaml'
refgenie init -c $REFGENIE

And then you have two options:

This command line tool is distributed with refgenie and is ready to use after installing refgenie. It adds all the assets enclosed in the genome archive downloaded from the iGenomes website to the refgenie local asset inventory. The required inputs are:

  • -g: name of the genome that should be assigned to the assets,
  • -p: a path to the downloaded archive or a directory (unarchived iGenomes folder).

usage:

$ import_igenome -h

usage: import_igenome [-h] -p PATH -g GENOME [-c CONFIG]

Integrates every asset from the downloaded iGenomes tarball/directory with
Refgenie asset management system

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to the desired genome tarball or directory to
                        integrate
  -g GENOME, --genome GENOME
                        name to be assigned to the selected genome
  -c CONFIG, --config CONFIG
                        path to local genome configuration file. Optional if
                        'REFGENIE' environment variable is set.

Example:

$ import_igenome -g staph -p Staphylococcus_aureus_NCTC_8325_NCBI_2006-02-13.tar.gz

Extracting 'Staphylococcus_aureus_NCTC_8325_NCBI_2006-02-13.tar.gz'
Moved 'Staphylococcus_aureus_NCTC_8325_NCBI_2006-02-13.tar.gz' to '/Users/mstolarczyk/Desktop/testing/test_genomes/staph'
Added assets:
- staph/Chromosomes
- staph/BWAIndex
- staph/BowtieIndex
- staph/AbundantSequences
- staph/Bowtie2Index
- staph/WholeGenomeFasta

Option 2: refgenie add

You can also add individual assets you want refgenie to track with refgenie add. This way of iGenomes integration with refgenie is useful if you do not plan to add all of the assets for the downloaded iGenome. It is also useful beyond iGenomes, since you can technically add whatever assets you want, from whatever sources, into your refgenie.

refgenie add genome/asset -p RELATIVE_PATH

So, after downloading an archive from iGenomes website:

tar -xf Staphylococcus_aureus_NCTC_8325_NCBI_2006-02-13.tar.gz
refgenie add staph/bowtie2_index \
  -p Staphylococcus_aureus_NCTC_8325/NCBI/2006-02-13/Sequence/Bowtie2Index

Now we can seek any added assets:

refgenie seek staph/BWAIndex

Or remove unwanted/faulty ones:

refgenie remove staph/BWAIndex

This way you can configure refgenie to use your iGenomes assets, so you can wean yourself off of the iGenomes hard structure and transition to the refgenie-managed path system.