5. Run analyses
GenomeHubs supports importing the results of analyses such as InterProScan into Ensembl databases to add functional annotations to imported assemblies. Docker container images are provided to allow these analyses to be run with the correct settings, but the analyses can also be run in whichever way is best suited to your compute infrastructure, provided the output files have the required format.

Blastp against SwissProt
Hits to sequences in the SwissProt database can be imported into a GenomeHubs Ensembl database to provide functional annotation.
Download and unzip the latest SwissProt database:
Format the SwissProt BLAST database:
Run blastp:
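A minimal sketch of these three steps, assuming the BLAST+ tools (makeblastdb and blastp) are installed locally rather than run through the GenomeHubs container. The download URL, the file names, the e-value cutoff and the tabular output format are illustrative assumptions, not the settings used by the GenomeHubs container, so check the format the import step expects before relying on them.

```python
import gzip
import shlex
import shutil
import urllib.request
from pathlib import Path

# Assumption: current SwissProt release on the UniProt download server.
SWISSPROT_URL = (
    "https://ftp.uniprot.org/pub/databases/uniprot/current_release/"
    "knowledgebase/complete/uniprot_sprot.fasta.gz"
)


def download_swissprot(dest_dir):
    """Download and unzip SwissProt; return the path to the FASTA file."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "uniprot_sprot.fasta.gz"
    fasta = dest / "uniprot_sprot.fasta"
    urllib.request.urlretrieve(SWISSPROT_URL, archive)
    with gzip.open(archive, "rb") as src, open(fasta, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return fasta


def makeblastdb_cmd(fasta):
    """Command that formats a protein FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "prot"]


def blastp_cmd(proteins, db, out, threads=16):
    """Command that searches a protein set against the database."""
    return [
        "blastp",
        "-query", str(proteins),
        "-db", str(db),
        "-outfmt", "6",        # assumption: plain tabular output
        "-evalue", "1e-10",    # assumption: illustrative cutoff
        "-num_threads", str(threads),
        "-out", str(out),
    ]


# Print the commands for review; nothing is downloaded or executed here.
# File names below are hypothetical placeholders.
print(shlex.join(makeblastdb_cmd("swissprot/uniprot_sprot.fasta")))
print(shlex.join(blastp_cmd("proteins.fa", "swissprot/uniprot_sprot.fasta", "blastp.tsv")))
```

To execute a command instead of printing it, pass the list to `subprocess.run(cmd, check=True)` after calling `download_swissprot("swissprot")`.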
Run InterProScan
InterProScan provides functional domain annotations that can be displayed in a GenomeHubs Ensembl browser.
N.B. The InterProScan container has not been updated to the latest version, but result files should be compatible with all versions of GenomeHubs.
Modify InterProScan configuration to suit your system:
Edit interproscan.properties and change the maxnumber.of.embedded.workers values to match your number of threads (e.g. 16)
Run InterProScan:
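The configuration change and the run step can be sketched as follows, assuming a local InterProScan installation with interproscan.sh on the PATH. The helper names, the sample property values and the file names are hypothetical, and the chosen output options (TSV with GO terms and InterPro lookup) are an assumption rather than the settings used by the GenomeHubs container.

```python
import re
import shlex


def set_property(text, key, value):
    """Return properties text with `key` set to `value`; other lines are kept."""
    # Anchor at line start so a shorter key cannot match inside a longer one.
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*=.*$", flags=re.MULTILINE)
    if not pattern.search(text):
        raise KeyError(f"{key} not found in properties text")
    return pattern.sub(f"{key}={value}", text)


def interproscan_cmd(proteins, out_dir, threads=16):
    """Command that runs InterProScan on a protein FASTA file."""
    return [
        "interproscan.sh",
        "-i", proteins,
        "-d", out_dir,
        "-f", "TSV",       # assumption: tab-separated results
        "-goterms",
        "-iprlookup",
        "-cpu", str(threads),
    ]


# Hypothetical excerpt of interproscan.properties, used for illustration only.
sample = "number.of.embedded.workers=6\nmaxnumber.of.embedded.workers=8\n"
print(set_property(sample, "maxnumber.of.embedded.workers", 16))
print(shlex.join(interproscan_cmd("proteins.fa", "interproscan_out")))
```

To apply the change to a real installation, read interproscan.properties into a string, pass it through `set_property`, and write the result back after keeping a copy of the original file.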
Run RepeatMasker
N.B. Running RepeatMasker with RepBase libraries requires a RepBase subscription. See below for an alternative repeat masking approach using Repeat Detector and redmask.
The latest version of RepeatMasker is compatible with the open source DFAM libraries, but DFAM currently has limited taxonomic scope and we have yet to get this version running reliably in a Docker container. Results from the version described below should be compatible with all versions of GenomeHubs.
Clone the GenomeHubs RepeatMasker Docker repository:
Download a copy of the latest RepeatMasker libraries from RepBase:
Build the Docker image:
Run RepeatMasker:
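For installations where RepeatMasker is run directly rather than through the Docker image, the final step might look like the sketch below. It assumes the RepeatMasker executable is on the PATH with its libraries already installed. The species name, file names and thread count are hypothetical placeholders, and the choice of soft masking with GFF output is an assumption about what the import step needs.

```python
import shlex


def repeatmasker_cmd(genome, species, out_dir, threads=16):
    """Command that soft-masks a genome assembly with RepeatMasker."""
    return [
        "RepeatMasker",
        "-species", species,   # taxon used to select repeat libraries
        "-pa", str(threads),   # number of parallel jobs
        "-xsmall",             # soft masking: repeats in lower case
        "-gff",                # also write annotations as GFF
        "-dir", out_dir,
        genome,
    ]


# Print the command for review; nothing is executed here.
print(shlex.join(repeatmasker_cmd("genome.fa", "Lepidoptera", "repeatmasker_out")))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it.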
Run Repeat Detector using redmask
redmask is provided as an alternative to RepeatMasker for generating a soft-masked genome, but it does not classify repeats.
Run redmask:
Run CEGMA
CEGMA is no longer supported and its author suggests using BUSCO (see below) instead. However, the tool still works and provides an assessment of genome completeness against core eukaryotic genes that can be imported into a GenomeHubs Ensembl database.
Run CEGMA:
Run BUSCO
BUSCO is an actively maintained alternative to CEGMA that uses sets of single-copy orthologues identified in OrthoDB for various taxonomic groups.
Clone the GenomeHubs BUSCO Docker repository:
Fetch BUSCO lineages - choose the lineage(s) most appropriate for the taxon you wish to analyse:
Build the Docker image:
Run BUSCO:
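A rough sketch of the run step for a BUSCO installation outside Docker is shown below. It assumes the executable is named busco, as in recent releases; older releases use a differently named script. The lineage name, file names and thread count are illustrative placeholders, so substitute the lineage(s) fetched in the earlier step.

```python
import shlex


def busco_cmd(genome, lineage, out_name, threads=16, mode="genome"):
    """Command that assesses assembly completeness with BUSCO."""
    return [
        "busco",               # assumption: executable name in recent releases
        "-i", genome,
        "-l", lineage,         # lineage dataset chosen for the taxon
        "-o", out_name,        # name used for the output directory
        "-m", mode,            # "genome" for assemblies
        "-c", str(threads),
    ]


# Print the command for review; nothing is executed here.
print(shlex.join(busco_cmd("genome.fa", "lepidoptera_odb10", "busco_genome")))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it.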