Import variation data
Variants can be imported using a wrapper around the Ensembl VCF import script, which exposes a subset of the full functionality.
The input must be a bgzipped VCF file:
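As a quick sanity check before importing, the following Python sketch tests whether a file looks bgzipped rather than plainly gzipped. It relies on the BGZF layout (a gzip header with the extra-field flag set and a `BC` subfield); the function name `is_bgzipped` is illustrative and not part of GenomeHubs.

```python
import struct


def is_bgzipped(path):
    """Return True if the file starts with a BGZF (bgzip) block header."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        # gzip magic (1f 8b), deflate method (08) and the FEXTRA flag bit (04)
        if len(header) < 12 or header[:3] != b"\x1f\x8b\x08" or not header[3] & 4:
            return False
        # XLEN: length of the extra field, little-endian uint16 at offset 10
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # walk the extra subfields looking for the 'BC' subfield that bgzip writes
    pos = 0
    while pos + 4 <= len(extra):
        subfield_id = extra[pos:pos + 2]
        (subfield_len,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if subfield_id == b"BC" and subfield_len == 2:
            return True
        pos += 4 + subfield_len
    return False
```

A file compressed with ordinary `gzip` fails this check and would need to be recompressed with `bgzip` before import.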
Create and edit a panel file to associate samples with populations:
this file has two tab-separated columns
sample names must match the sample names in the VCF file
only samples listed in this file will be imported
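The panel file rules above can be sketched in Python as follows. This is a minimal illustration, not the GenomeHubs tooling: it assumes the column order is sample name then population, and the helper names are hypothetical. `gzip.open` can read bgzipped files because BGZF is a valid multi-member gzip stream.

```python
import gzip


def read_vcf_samples(vcf_path):
    """Return the sample names from the #CHROM header line of a (b)gzipped VCF."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # the first nine columns are fixed; sample names follow FORMAT
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data lines without finding a header
    return []


def write_panel(panel_path, sample_to_population, vcf_samples):
    """Write 'sample<TAB>population' rows, refusing names absent from the VCF."""
    unknown = sorted(set(sample_to_population) - set(vcf_samples))
    if unknown:
        raise ValueError("samples not in VCF: " + ", ".join(unknown))
    with open(panel_path, "w") as handle:
        for sample, population in sample_to_population.items():
            handle.write(f"{sample}\t{population}\n")
```

Samples present in the VCF but left out of the mapping are simply not written, which matches the rule that only listed samples are imported.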
Create and edit a description file to add a description for each sample:
this file has two tab-separated columns
sample names must match the sample names in the VCF file
HTML markup is supported and can be used to add a link to a related SRA accession, if available
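A description file following these rules could be generated with a sketch like the one below. It assumes the column order is sample name then description; the NCBI URL pattern, the helper names, and any sample names or accessions passed in are illustrative assumptions rather than values from GenomeHubs.

```python
from html import escape


def sra_link(accession):
    """Build an HTML link to an SRA record (the URL pattern is an assumption)."""
    safe = escape(accession, quote=True)
    return f'<a href="https://www.ncbi.nlm.nih.gov/sra/{safe}">{safe}</a>'


def description_rows(samples):
    """Yield 'sample<TAB>description' lines from (name, text, accession or None)."""
    for name, text, accession in samples:
        description = escape(text)
        if accession:
            description += " (" + sra_link(accession) + ")"
        # a tab or newline in the description would break the two-column layout
        if "\t" in description or "\n" in description:
            raise ValueError(f"description for {name} contains a tab or newline")
        yield f"{name}\t{description}"
```

Writing the rows to disk is then a matter of joining them with newlines, for example `open(path, "w").write("\n".join(description_rows(samples)) + "\n")`.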
Create and edit a configuration file to set database and variant details:
as with the core database import, common settings can be specified in a default.ini file and passwords can be set in an overwrite.ini file
if database connection settings are not set in a [DATABASE_VARIATION] section, values from [DATABASE_CORE] will be reused
the variation database name must match the corresponding core database name with "variation" in place of "core"
when importing local files, specify the path to the file as mounted in the container
the FILTER value is passed to the bcftools view command with the -i flag; it is not needed if your SNP data are already filtered
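The layering and fallback behaviour described above can be illustrated with a short Python sketch. This is an approximation under stated assumptions, not the container's actual implementation: the option names (`HOST`, `PORT`, `RW_USER`, `RW_PASS`) are hypothetical, the fallback is assumed to apply per setting, and the filter expression and file path in the usage note are made-up examples.

```python
import configparser


def load_layers(*ini_texts):
    """Parse ini texts in order; values in later layers override earlier ones."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    for text in ini_texts:
        parser.read_string(text)
    return parser


def variation_connection(config):
    """Return [DATABASE_VARIATION] settings, reusing [DATABASE_CORE] where unset."""
    core = dict(config["DATABASE_CORE"]) if config.has_section("DATABASE_CORE") else {}
    variation = (
        dict(config["DATABASE_VARIATION"])
        if config.has_section("DATABASE_VARIATION")
        else {}
    )
    return {**core, **variation}


def bcftools_filter_command(vcf_path, filter_expression):
    """Build a bcftools call that keeps only sites matching the expression (-i)."""
    return ["bcftools", "view", "-i", filter_expression, vcf_path]
```

For example, `bcftools_filter_command("/import/snps.vcf.gz", "QUAL>=30")` yields the argument list for `bcftools view -i 'QUAL>=30' /import/snps.vcf.gz`, which shows why the path has to be the one visible inside the container.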
Run the GenomeHubs variation container:
depending on the number of SNPs in your VCF file after filtering, this step is likely to take several hours to run
Modify the EasyMirror configuration to load variation databases:
EasyMirror will attempt to load the database types listed in SPECIES_DB_AUTOEXPAND, so this setting can also be used to load funcgen and other database types mirrored from Ensembl
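To preview which database names such a setting would imply, the naming rule (the core database name with the type in place of "core") can be sketched as below. The bracketed, space-separated list syntax for `SPECIES_DB_AUTOEXPAND` and the example database name are assumptions made for illustration; check your own EasyMirror configuration for the exact format.

```python
def parse_bracket_list(value):
    """Parse a value such as '[ variation funcgen ]' into a list of type names."""
    return value.strip().strip("[]").split()


def expanded_database_names(core_db_name, autoexpand_value):
    """Derive one database name per listed type by swapping 'core' for the type."""
    if "_core_" not in core_db_name:
        raise ValueError(f"{core_db_name!r} does not look like a core database name")
    return [
        core_db_name.replace("_core_", f"_{db_type}_", 1)
        for db_type in parse_bracket_list(autoexpand_value)
    ]
```

Calling `expanded_database_names("species_name_core_32_85_1", "[ variation ]")` gives the single variation database name that would need to exist for the import described on this page.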
Restart your Ensembl site to load the newly created variation database: