GenomeHubs
Import variation data
Variants can be imported using a wrapper around the Ensembl Import VCF script, which exposes a subset of that script's full functionality.
The input must be a bgzipped vcf file:
$ bgzip variants.vcf
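Plain gzip output is not the same as bgzip output, so it can be worth confirming the compression before starting a long import. The following is a minimal sketch, assuming the file was written by bgzip (which puts the BGZF "BC" subfield first in the gzip extra field). The function name is_bgzf is illustrative and not part of GenomeHubs or htslib:

```shell
# is_bgzf FILE: succeed if FILE starts with a BGZF (bgzip) block header.
# Hypothetical helper, not part of GenomeHubs or htslib.
is_bgzf() {
  # First 14 bytes of the file as one hex string (28 hex characters).
  header=$(od -An -tx1 -N14 "$1" | tr -d ' \t\n')
  case "$header" in
    # gzip magic (1f 8b), deflate (08), FEXTRA flag (04),
    # 8 bytes that are ignored here, then the "BC" subfield id (42 43).
    1f8b0804????????????????4243) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_bgzf variants.vcf.gz || echo "not bgzipped"` would flag a file that was compressed with plain gzip.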
Create and edit a panel file to associate samples with populations:
  • this file has 2 tab-separated columns
  • sample names must match the sample names in the VCF file
  • only samples listed in this file will be imported
$ nano /path/to/data/panel.tsv
sample_1 population 1
sample_2 population 1
sample_3 population 2
sample_4 population 2
...
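Because only samples listed in the panel file are imported, and their names must match the VCF, a quick comparison before the import can catch typos. The sketch below is one possible check, assuming the panel is tab-separated with the sample name in column 1 and the VCF has a standard #CHROM header line. The function name is hypothetical:

```shell
# panel_samples_not_in_vcf VCF_GZ PANEL
# Print each sample name from column 1 of PANEL that is absent from the
# #CHROM header line of VCF_GZ. Hypothetical helper, not part of GenomeHubs.
panel_samples_not_in_vcf() {
  vcf_samples=$(mktemp)
  # bgzip output is gzip-compatible, so gzip -dc can read it.
  # Sample names start at column 10 of the #CHROM header line.
  gzip -dc "$1" |
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' > "$vcf_samples"
  # The first file fills the lookup table; the second file is the panel.
  awk -F '\t' '
    FILENAME == ARGV[1] { in_vcf[$1] = 1; next }
    NF && !($1 in in_vcf) { print $1 }
  ' "$vcf_samples" "$2"
  rm -f "$vcf_samples"
}
```

Running `panel_samples_not_in_vcf variants.vcf.gz /path/to/data/panel.tsv` should print nothing when every panel sample is present in the VCF.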
Create and edit a description file to add a description for each sample:
  • this file has 2 tab-separated columns
  • sample names must match the sample names in the VCF file
  • HTML markup is supported and can be used to link to a related SRA accession, if available
$ nano /path/to/data/description.tsv
sample_1 description of sample 1
sample_2 description of sample 2
sample_3 description of sample 3
sample_4 description of sample 4
...
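As an illustration of the HTML option, a description line with a link could be generated rather than typed by hand. This is only a sketch: the function name is hypothetical, the accession SRR0000000 is a placeholder, and the URL pattern is an assumption about the NCBI SRA site rather than anything GenomeHubs prescribes:

```shell
# description_line SAMPLE ACCESSION TEXT
# Print one tab-separated line: the sample name, then a description that
# ends with an HTML link to the accession. Hypothetical helper; the URL
# pattern is an assumption, not something GenomeHubs defines.
description_line() {
  printf '%s\t%s (<a href="https://www.ncbi.nlm.nih.gov/sra/%s">%s</a>)\n' \
    "$1" "$3" "$2" "$2"
}
```

A call such as `description_line sample_1 SRR0000000 "description of sample 1" >> /path/to/data/description.tsv` would append one line for sample_1.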
Create and edit a configuration file to set database and variant details:
  • as with the core database import, common settings can be specified in a default.ini file and passwords can be set in an overwrite.ini file
  • if database connection settings are not set in a [DATABASE_VARIATION] section, values from [DATABASE_CORE] will be reused
  • the variation database name must match the corresponding core database name, with "variation" in place of "core"
  • when importing local files, specify the path to the file as mounted in the container
  • the FILTER value is passed to the bcftools view command with the -i flag; it is not needed if your SNP data are already filtered
e85
$ nano /path/to/conf/example_variants.ini
[DATABASE_CORE]
NAME = heliconius_erato_demophoon_v1_core_32_85_1
HOST = genomehubs-mysql
PORT = 3306
RW_USER = importer
RW_PASS = CHANGEME
RO_USER = anonymous
[DATABASE_VARIATION]
NAME = genus_species_assembly_variation_32_85_1
[META]
SPECIES.PRODUCTION_NAME = genus_species_assembly
SPECIES.SCIENTIFIC_NAME = Genus species
[FILES]
VCF = [ vcf /import/data/variants.vcf.gz ]
PANEL = [ tsv /import/data/panel.tsv ]
[STUDY]
SOURCE = Anonymous 2017
[BCFTOOLS]
FILTER = QUAL>=30 & FMT/DP>=10 & FMT/DP<=100 & SUM(FMT/DP)<=N_SAMPLES*100 & FMT/SB<200 & MIN(FMT/GQ)>=30
e89
$ nano /path/to/conf/example_variants.ini
[DATABASE_CORE]
NAME = heliconius_erato_demophoon_v1_core_36_89_1
HOST = genomehubs-mysql
PORT = 3306
RW_USER = importer
RW_PASS = CHANGEME
RO_USER = anonymous
[DATABASE_VARIATION]
NAME = genus_species_assembly_variation_36_89_1
[META]
SPECIES.PRODUCTION_NAME = genus_species_assembly
SPECIES.SCIENTIFIC_NAME = Genus species
SPECIES.DIVISION = EnsemblMetazoa
[FILES]
VCF = [ vcf /import/data/example_variants/variants.vcf.gz ]
PANEL = [ tsv /import/data/example_variants/panel.tsv ]
DESCRIPTION = [ tsv /import/data/example_variants/description.tsv ]
[STUDY]
SOURCE = Anonymous 2017
DESCRIPTION = Anonymous 2017. Article title. Journal. Vol:pages
[BCFTOOLS]
FILTER = QUAL>=30 & FMT/DP>=10 & FMT/DP<=100 & SUM(FMT/DP)<=N_SAMPLES*100 & FMT/SB<200 & MIN(FMT/GQ)>=30
[MODIFY]
OVERWRITE_DB = 1
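The naming rule for the variation database can be applied mechanically to avoid a mismatch with the core database. A minimal sketch, assuming the core name contains the component "_core_" exactly as in the examples above (the function name is hypothetical):

```shell
# variation_db_name CORE_NAME
# Print CORE_NAME with its first "_core_" component replaced by
# "_variation_". Hypothetical helper, not part of GenomeHubs.
variation_db_name() {
  printf '%s\n' "$1" | sed 's/_core_/_variation_/'
}
```

For instance, `variation_db_name heliconius_erato_demophoon_v1_core_36_89_1` prints `heliconius_erato_demophoon_v1_variation_36_89_1`, which is the value to use for NAME in the [DATABASE_VARIATION] section.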
Run the GenomeHubs variation container:
  • depending on the number of SNPs in your VCF file after filtering, this step is likely to take several hours to run
e85
docker run --rm \
-d \
--name genomehubs-variation \
-u $UID:$GROUPS \
-v /path/to/conf:/import/conf \
-v /path/to/data:/import/data \
-e FLAGS="-i" \
-e VARIANTS=example_variants \
genomehubs/variation:17.03
e89
docker run --rm \
-d \
--name genomehubs-variation \
-u $UID:$GROUPS \
-v /path/to/conf:/import/conf \
-v /path/to/data:/import/data \
-e FLAGS="-i" \
-e VARIANTS=example_variants \
genomehubs/variation:17.06
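Since the container runs detached and the import can take hours, it may help to poll for completion before moving on. The sketch below is one way to do that with `docker ps`; the function names are hypothetical, and it assumes the container name genomehubs-variation used above. Note that the name filter matches substrings, so other containers with similar names would also count:

```shell
# import_running NAME: succeed while a running container matches NAME.
# Hypothetical helper built on `docker ps`; not part of GenomeHubs.
# The name filter matches substrings of container names.
import_running() {
  [ -n "$(docker ps -q --filter "name=$1")" ]
}

# wait_for_import NAME [SECONDS]: poll until no matching container is
# running, sleeping SECONDS (default 60) between checks.
wait_for_import() {
  while import_running "$1"; do
    sleep "${2:-60}"
  done
}
```

Calling `wait_for_import genomehubs-variation` returns once the container has exited. Because the container is started with --rm, it is removed on exit along with its logs, so `docker logs -f genomehubs-variation` needs to be run while the import is still in progress to see its output.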
Modify the EasyMirror configuration to load variation databases:
  • EasyMirror will attempt to load every database type listed in SPECIES_DB_AUTOEXPAND, so the same setting can also be used to load other database types (funcgen, etc.) mirrored from Ensembl
$ nano ~/genomehubs/v1/ensembl/conf/setup.ini
[DATA_SOURCE]
SPECIES_DB_AUTOEXPAND = [ variation ]
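Before restarting the site, the setting can be checked from the command line. This is a rough sketch, assuming list entries are separated by whitespace as in the example above; the function name is hypothetical:

```shell
# autoexpand_has TYPE FILE: succeed if the SPECIES_DB_AUTOEXPAND line in
# FILE lists TYPE as a whitespace-separated entry.
# Hypothetical helper, not part of EasyMirror.
autoexpand_has() {
  awk -v type="$1" '
    /^[ \t]*SPECIES_DB_AUTOEXPAND[ \t]*=/ {
      for (i = 1; i <= NF; i++) if ($i == type) found = 1
    }
    END { exit !found }
  ' "$2"
}
```

For example, `autoexpand_has variation ~/genomehubs/v1/ensembl/conf/setup.ini || echo "variation not listed"`.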
Restart your Ensembl site to load the newly created variation database:
e85
$ docker rm -f genomehubs-ensembl
$ docker run -d \
--name genomehubs-ensembl \
-v ~/genomehubs/v1/ensembl/conf:/ensembl/conf:ro \
--link genomehubs-mysql \
-p 8081:8080 \
genomehubs/easy-mirror:17.03
e89
$ docker rm -f genomehubs-ensembl
$ docker run -d \
--name genomehubs-ensembl \
-v ~/genomehubs/v1/ensembl/conf:/conf:ro \
--link genomehubs-mysql \
-p 8081:8080 \
genomehubs/easy-mirror:17.06