Import variation data

Variants can be imported using a wrapper around the Ensembl Import VCF Script, which exposes a subset of the full functionality.

The input must be a bgzipped VCF file:

$ bgzip variants.vcf

Create and edit a panel file to associate samples with populations:

  • this file has two tab-separated columns: sample name and population

  • sample names must match the sample names in the VCF file

  • only samples listed in this file will be imported

$ nano /path/to/data/panel.tsv
sample_1           population 1
sample_2           population 1
sample_3           population 2
sample_4           population 2
...
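
Because only samples listed in the panel file are imported, and their names must match the VCF, it can be worth checking the panel before running the import. The following is a minimal sketch, not part of GenomeHubs: the function name missing_panel_samples and the file vcf_samples.txt are hypothetical, and the sample list is assumed to have been written beforehand with bcftools query -l variants.vcf.gz > vcf_samples.txt.

```shell
# Hypothetical helper (not part of GenomeHubs): print every sample name in
# the panel file that does not appear in the list of VCF sample names.
#   $1 = panel file (tab-separated: sample name, population)
#   $2 = file with one VCF sample name per line (assumed to come from
#        `bcftools query -l variants.vcf.gz`)
missing_panel_samples() {
    awk -F'\t' '
        FILENAME == ARGV[1] { seen[$0] = 1; next }   # first file: VCF sample names
        !($1 in seen)       { print $1 }             # second file: panel rows
    ' "$2" "$1"
}
```

An empty result means every panel sample was found in the VCF; any name printed would need correcting in the panel file.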

Create and edit a description file to add a description for each sample:

  • this file has two tab-separated columns: sample name and description

  • sample names must match the sample names in the VCF file

  • HTML markup is supported and can be used to add a link to a related SRA accession, if available

$ nano /path/to/data/description.tsv
sample_1           description of sample 1
sample_2           description of sample 2
sample_3           description of sample 3
sample_4           description of sample 4
...
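
Both the panel and description files must use a tab as the column separator, and editors often insert spaces instead. As a rough sketch (the function name check_two_columns is hypothetical and not part of GenomeHubs), the layout can be checked with awk:

```shell
# Hypothetical helper (not part of GenomeHubs): report every line that does
# not have exactly two tab-separated columns. Returns non-zero if any line
# fails, so it can be used in a conditional.
check_two_columns() {
    awk -F'\t' '
        NF != 2 {
            printf "line %d: expected 2 columns, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, check_two_columns /path/to/data/description.tsv prints nothing and succeeds when every line is well formed. Blank lines are reported too, since they contain no columns.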

Create and edit a configuration file to set database and variant details:

  • as with the core database import, common settings can be specified in a default.ini file and passwords can be set in an overwrite.ini file

  • if database connection settings are not set in a [DATABASE_VARIATION] section, values from [DATABASE_CORE] will be reused

  • the variation database name must match the corresponding core database name, with "variation" in place of "core"

  • when importing local files, specify the path to the file as mounted in the container

  • the FILTER value will be passed to the bcftools view command with the -i flag; it is not needed if your SNP data are already filtered
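
The database naming rule above can be illustrated with a short sketch. The function name variation_db_name and the example database name are hypothetical, and the code assumes the core database name contains the substring _core_, as Ensembl-style database names usually do.

```shell
# Hypothetical helper (not part of GenomeHubs): derive the expected variation
# database name by replacing the first "_core_" in the core database name
# with "_variation_". Assumes the core name contains "_core_".
variation_db_name() {
    printf '%s\n' "$1" | sed 's/_core_/_variation_/'
}

# Example with a made-up core database name:
variation_db_name genus_species_core_32_85_1
```

The name printed by this sketch is the one the variation database would be expected to have in the configuration file.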

Run the GenomeHubs variation container:

  • depending on the number of SNPs in your VCF file after filtering, this step is likely to take several hours to run

Modify the EasyMirror configuration to load variation databases:

  • EasyMirror will attempt to load every database type listed in SPECIES_DB_AUTOEXPAND, so this setting can also be used to load other databases mirrored from Ensembl, such as funcgen

Restart your Ensembl site to load the newly created variation database:
