GenomeHubs

Understand the GFF parser


Last updated 4 years ago


The EasyImport GFF parser within GenomeHubs is designed to accommodate the diversity of real-world GFF files. This flexibility makes configuring a gene model import somewhat more complex, but it lets the parser load gene names and descriptions into an Ensembl database from the various places where a valid GFF file may specify them, and repair many of the problems that can make a GFF file invalid.

A full description of the GFF parser and other import options is available in the EasyImport documentation at easy-import.readme.io.

Parsing valid GFF

STABLE_IDS

An Ensembl database stores the primary name for each gene, transcript and translation in a stable_id field. This should be a stable identifier that will continue to be used if the same gene is subsequently imported from an updated assembly/annotation.

The first step in importing gene models is to identify which features and attributes to use as sources of stable IDs. These are typically the values of the ID or Name attributes of the corresponding feature (gene, mRNA or CDS):

  • the syntax follows the general pattern feature->attribute

  • the /(.+)/ is a Perl regular expression that matches the entire value of the selected attribute

$ nano ~/genomehubs/v1/ensembl/conf/database.ini
[GENE_STABLE_IDS]
    GFF = [ gene->ID /(.+)/ ]
[TRANSCRIPT_STABLE_IDS]
    GFF = [ mRNA->ID /(.+)/ ]
[TRANSLATION_STABLE_IDS]
    GFF = [ CDS->ID /(.+)/ ]
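To illustrate how these rules could resolve stable IDs from a GFF file, here is a minimal Python sketch. It is not EasyImport's own (Perl) implementation. It assumes each `GFF = [ ... ]` list holds a single `feature->attribute /regex/` entry, that the input is tab-separated GFF3, and that the first capture group of the regex supplies the stable ID. `read_stable_id_rules`, `parse_attributes` and `collect_stable_ids` are hypothetical helper names, and the GFF lines are made-up sample data. Python's `re.search` stands in for an unanchored Perl match, which behaves the same for a simple pattern like `/(.+)/`.

```python
import re
from urllib.parse import unquote

# Stable-ID sections copied from the database.ini example above.
CONFIG = """\
[GENE_STABLE_IDS]
    GFF = [ gene->ID /(.+)/ ]
[TRANSCRIPT_STABLE_IDS]
    GFF = [ mRNA->ID /(.+)/ ]
[TRANSLATION_STABLE_IDS]
    GFF = [ CDS->ID /(.+)/ ]
"""

SECTION_RE = re.compile(r"^\[(\w+)\]\s*$")
# Assumption: each GFF list holds a single "feature->attribute /regex/" entry.
RULE_RE = re.compile(r"^\s*GFF\s*=\s*\[\s*(\S+?)->(\S+)\s+/(.*)/\s*\]\s*$")


def read_stable_id_rules(text):
    """Map each section name to (feature type, attribute name, compiled regex)."""
    rules, section = {}, None
    for line in text.splitlines():
        if m := SECTION_RE.match(line):
            section = m.group(1)
        elif section and (m := RULE_RE.match(line)):
            feature, attribute, pattern = m.groups()
            rules[section] = (feature, attribute, re.compile(pattern))
    return rules


def parse_attributes(column9):
    """Split GFF3 column 9 into a dict, undoing GFF3 percent-encoding."""
    attrs = {}
    for field in column9.split(";"):
        key, sep, value = field.strip().partition("=")
        if sep:
            attrs[key] = unquote(value)
    return attrs


def collect_stable_ids(gff_text, rules):
    """Return {section: [unique captured IDs in file order]}."""
    found = {section: [] for section in rules}
    for line in gff_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments, directives and malformed lines
        attrs = parse_attributes(cols[8])
        for section, (feature, attribute, regex) in rules.items():
            if cols[2] != feature or attribute not in attrs:
                continue
            # Unanchored search, like a Perl match; group 1 is the stable ID.
            m = regex.search(attrs[attribute])
            # A CDS split across exons repeats its ID on several lines; keep one.
            if m and m.group(1) not in found[section]:
                found[section].append(m.group(1))
    return found


gff = "\n".join([
    "##gff-version 3",
    "scf1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene001;Name=abc1",
    "scf1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene001-RA;Parent=gene001",
    "scf1\tmaker\tCDS\t150\t400\t.\t+\t0\tID=gene001-RA-CDS;Parent=gene001-RA",
    "scf1\tmaker\tCDS\t500\t850\t.\t+\t2\tID=gene001-RA-CDS;Parent=gene001-RA",
])
print(collect_stable_ids(gff, read_stable_id_rules(CONFIG)))
# {'GENE_STABLE_IDS': ['gene001'], 'TRANSCRIPT_STABLE_IDS': ['gene001-RA'],
#  'TRANSLATION_STABLE_IDS': ['gene001-RA-CDS']}
```

The two CDS lines share one ID, so the sketch records a single translation stable ID. This is a simplification for illustration; the EasyImport documentation describes how the real parser handles such cases.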
easy-import.readme.io