Refgenie from within Python
Third-party python tools can rely on our Python object for access to refgenie assets. For this we have a Python package called
refgenconf which provides a class with methods to access local and remote genome assets.
You should already have
refgenconf if you've installed
refgenie, but if needed you can also install it separately with some variant of
pip install refgenconf.
RefGenConf object, which is the package's main data type. You just need to give it a refgenie genome configuration file (in YAML format). You can create a template using
As a general rule, the CLI functions are available from within Python under the same names, e.g.
refgenie list ... is available as
import refgenconf rgc = refgenconf.RefGenConf("genome_config.yaml")
Now, you can interact with it:
Use this to show all available remote assets:
In a tool, you're probably most interested in using refgenie to locate reference genome assets, for which you want to use the
get_asset function. For example:
# identify genome (perhaps provided by user) genome = "hg38" # get the local path to bowtie2 indexes: bt2idx = rgc.seek(genome, "bowtie2_index") # run bowtie2...
This enables you to write python software that will work on any computing environment without having to worry about passing around brittle environment-specific file paths. See this tutorial for more comprehensive example of how to work with
refgenconf as a tool developer.
See the complete refgenconf python API for more details.