Many genomic analyses require less stringent comparison: simple compatibility between assets that are not necessarily identical.
For example, if we wanted to annotate genomic regions based on aligned BAM file using feature annotations we would need the reads to share just the coordinate system (the sequence lengths and names). Importantly, it does not require sequence identity at all. In this case, we would like to confirm that the given
bowtie2_index asset is at least compatible with the coordinate system of feature annotation file.
To sum up, we need a more detailed comparison that may ignore sequences but allow to compare other aspects of the reference genome.
Refgenie facilitates such fine-grained comparison of genomes via
refgenie compare command. A useful "byproduct" of genome initialization described in Use aliases how-to guide is a JSON file with annotated sequence digests, that are required to compare FASTA file contents.
In order to compare two initialized genomes one needs to issue the following command
refgenie compare hg38 hg38_plus
The result is a set of informative flags that determine the level of compatibility of the genomes, for example:
Flag: 2005 Binary: 0b11111010101 CONTENT_ALL_A_IN_B LENGTHS_ALL_A_IN_B NAMES_ALL_A_IN_B CONTENT_A_ORDER CONTENT_B_ORDER CONTENT_ANY_SHARED LENGTHS_ANY_SHARED NAMES_ANY_SHARED
Based on the output above we can conclude that genome
hg38_plus is a superset of