reference genome manager
What is refgenie?
Refgenie is a full-service reference genome manager that organizes storage, access, and transfer of reference genomes. It provides command-line and Python interfaces to download pre-built reference genome "assets", such as the indexes used by bioinformatics tools. It can also build assets for custom genome assemblies. Refgenie provides programmatic access to a standard genome folder structure, so software can swap from one genome to another.
What makes refgenie better?
It provides a command-line interface to download individual resources. Think of it as GitHub for reference genomes. You just type
refgenie pull -g hg38 -a bwa_index.
It's scripted. If you need resources that are not on the server, such as for a custom genome, you can build them yourself:
refgenie build -g custom_genome -a bowtie2_index.
It simplifies finding local asset locations. When you need a path to an asset, you can seek it, making your pipelines portable across computing environments:
refgenie seek -g hg38 -a salmon_index.
It includes a Python API. Tool developers can use
cfg = refgenie.RefGenConf("genomes.yaml")
to get a Python object with paths to any genome asset.
Install and initialize
pip install --user refgenie
export REFGENIE='genome_config.yaml'
refgenie init -c $REFGENIE
Download indexes and assets for a remote reference genome
First, view available remote assets:
refgenie listr
Querying available assets from server: http://refgenomes.databio.org/assets
Remote genomes: hg19, hg19_cdna, hg38, hg38_cdna
Remote assets:
  hg19: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
  hg19_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
  hg38: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
  hg38_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
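If you want to check for an asset from a script before pulling it, a listing like the one above can be turned into a lookup table. The following is a minimal Python sketch, not part of refgenie itself: parse_remote_assets is a hypothetical helper, and it assumes the listing has been captured as text with one "genome: asset; asset" entry per line after a "Remote assets:" header. The exact output layout may differ between refgenie versions.

```python
from typing import Dict, List


def parse_remote_assets(listing: str) -> Dict[str, List[str]]:
    """Map each genome name to its asset names.

    Assumes (hypothetically) one 'genome: asset; asset' entry per line,
    appearing after a 'Remote assets:' header line.
    """
    assets: Dict[str, List[str]] = {}
    in_assets = False
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("Remote assets:"):
            in_assets = True  # entries start on the following lines
            continue
        if not in_assets or ":" not in line:
            continue  # skip the preamble and blank lines
        genome, _, names = line.partition(":")
        assets[genome.strip()] = [n.strip() for n in names.split(";") if n.strip()]
    return assets


# Sample text copied from the listing shown above.
SAMPLE = """\
Querying available assets from server: http://refgenomes.databio.org/assets
Remote genomes: hg19, hg19_cdna, hg38, hg38_cdna
Remote assets:
  hg19: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
  hg19_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
  hg38: bismark_bt1_index; bismark_bt2_index; bowtie2_index; bwa_index; fasta; hisat2_index
  hg38_cdna: bowtie2_index; hisat2_index; kallisto_index; salmon_index
"""

remote = parse_remote_assets(SAMPLE)
print("bowtie2_index" in remote.get("hg38", []))
```

A pipeline could use such a table to decide whether to pull an asset or build it locally.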
Next, pull one:
refgenie pull --genome hg38 --asset bowtie2_index
Starting pull for 'hg38/bowtie2_index'
'hg38/bowtie2_index' archive size: 3.5GB
Downloading URL: http://refgenomes.databio.org/asset/hg38/bowtie2/archive
...
Build your own indexes and assets for a custom reference genome
refgenie build --genome mygenome --asset bwa_index --fasta mygenome.fa.gz
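If you need several assets for the same custom genome, the build command above can be repeated from a script. The sketch below is one possible way to do that in Python, under stated assumptions: build_commands and build_all are hypothetical helper names, the asset names are taken from the examples in this document, and it assumes each asset can be built with the same --fasta argument as shown above, which may not hold for every asset type or refgenie version.

```python
import subprocess
from typing import List, Sequence


def build_commands(genome: str, fasta: str, assets: Sequence[str]) -> List[List[str]]:
    """Return one `refgenie build` argument list per requested asset."""
    return [
        ["refgenie", "build", "--genome", genome, "--asset", asset, "--fasta", fasta]
        for asset in assets
    ]


def build_all(genome: str, fasta: str, assets: Sequence[str]) -> None:
    """Run the builds one after another, stopping at the first failure."""
    for command in build_commands(genome, fasta, assets):
        # check=True raises CalledProcessError if refgenie exits non-zero.
        subprocess.run(command, check=True)


# Preview the commands without running them.
for command in build_commands("mygenome", "mygenome.fa.gz", ["bwa_index", "bowtie2_index"]):
    print(" ".join(command))
```

Separating command construction from execution makes it easy to inspect what would run before starting a long build.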
Retrieve paths to refgenie-managed assets
Once you've populated your refgenie with a few assets, it's easy to get paths to them:
refgenie seek --genome mm10 --asset bowtie2_index
This will return the path to the particular asset of interest, regardless of your computing environment. This gives you an ultra-portable asset manager! See further reading on retrieving asset paths.
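Inside a pipeline script, the same lookup can be done by calling the command-line tool. The following is a small Python sketch rather than refgenie's own API: seek_command and seek are hypothetical helpers, and the sketch assumes that the REFGENIE environment variable points at your genome configuration file (as set during initialization) and that refgenie seek writes only the asset path to standard output.

```python
import subprocess
from typing import List


def seek_command(genome: str, asset: str) -> List[str]:
    """Build the argument list for `refgenie seek` with the long flags shown above."""
    return ["refgenie", "seek", "--genome", genome, "--asset", asset]


def seek(genome: str, asset: str) -> str:
    """Run `refgenie seek` and return the path it prints.

    Assumes REFGENIE is set in the environment and that the path is the
    only text on standard output; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        seek_command(genome, asset),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

A pipeline would then call seek("mm10", "bowtie2_index") once and pass the returned path to the aligner, so no environment-specific path needs to be hard-coded.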
If you want to read more about the motivation behind refgenie and the software engineering that makes refgenie work, proceed next to the overview.