Retrieve paths to assets
Once you've assembled a few assets, either by downloading or by building them, you'll be able to use
refgenie seek to retrieve the paths. It's quite simple, really -- say you've built the
fasta assets for
hg38. If you type:
refgenie seek -c CONFIG.yaml hg38/bowtie2_index
You'll get back the absolute path on your system to the
bowtie2_index asset, something like:
seek uses a concept called seek keys. Let's formally define how these differ from an asset:
- asset: a folder consisting of one or more files related to a specific reference genome
- seek key: a string identifier for a particular file or folder contained within an asset that can be located with
As asset can have multiple seek keys. When an asset recipe is defined, the author records which files in the asset are potentially interesting for a user to retrieve a path to. These files are added to the recipe as seek keys.
Many assets have only a single seek key, because the asset really only consists of a single thing. For example, a bowtie2_index asset has only 1 thing you'd need to seek to: the folder containing the indexes. But other assets can have multiple seek keys. For example, the
fasta asset provides not only a fasta file, but a
.fai fasta index as well as a chrom_sizes file that is required by some tools (notably, ucsc tools). These files are really tightly coupled with a particular fasta file and it doesn't make sense to make separate assets for each of them, and this is where seek keys come in.
Once you have built a
fasta asset, you'll be able to retrieve not just the fasta file, but also the
chrom_sizes files. These will show up when you type
refgenie list as
fasta.chrom_sizes seek keys.
Seek keys make things portable
Why is this useful? Instead of writing your shell script to take a path to each the
chrom_sizes file and the
bowtie2_index path, you just take the genome name, and use
refgenie seek to find the correct path. Not only have you simplified the interface to your analysis, but both the script itself and your call of it are now portable.
In other words, you just write:
python script.py --genome hg38
and then use refgenie to seek any assets you need. Previously, you would have done:
python script.py --chrom_sizes /local/path/to/hg38/chrom_size \ --bt2_index /local/path/to/hg38/bowtie2_index
This second command obviously differs by computing environment, whereas the first does not.
GENOME="hg38" refgenie seek -g $GENOME -a bowtie2_index refgenie seek -g $GENOME -a fasta.chrom_sizes refgenie seek -g $GENOME -a fasta
refgenie seek hg38/bowtie2_index refgenie seek hg38/fasta.chrom_sizes refgenie seek hg38/fasta