VarFish User Manual
VarFish is a system for the filtration of variants. Currently, the main focus is on small/sequence variants called from high-throughput sequencing data (in contrast to structural variants).

The VarFish global “home” screen showing the demo project with the Corpas family quartet.
Important
VarFish is research-use-only software.
API Token Management
This page allows you to manage API tokens. This feature is useful if you want to use software (or develop some yourself) that interfaces with SODAR programmatically, or if you want to use the API import feature of VarFish to easily import your cases.

You can create API tokens with the Create Token button. Each token can be deleted through the little cog button towards the right of the token list. In the token list, you can see the time of creation, the expiry time, and the first 8 characters of the key.
Please note that after creating the token, you will only be able to see the first 8 characters again (for reidentification). For security reasons, the token itself is stored only as a one-way hash. It is possible to check whether a given token matches the one in the database, but it is not possible to retrieve a lost token. Instead, you would delete the old token from VarFish and create a new one. Note that the token is completely independent from any token that you might obtain from a CUBI-developed or hosted web app (in particular it is separate from any SODAR API token).
Please also note that if you create and use an API token then, currently, whoever bears your token has the same permissions to the SODAR system through the API as your user. Limiting the scope of a token is on the list of future features but has not been implemented yet.
On creation, you can choose the number of hours that the token should be valid. Using an expiry time of 0 will make the token never expire.
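The storage and expiry behaviour described above can be pictured with a small sketch. It is not the actual VarFish or SODAR implementation: the function names, the choice of SHA-256, and the record layout are assumptions made for illustration only.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone


def create_token(ttl_hours, now):
    """Return (plain_token, stored_record); only the record would be kept server-side."""
    token = secrets.token_hex(32)
    record = {
        "prefix": token[:8],  # shown in the token list for reidentification
        "digest": hashlib.sha256(token.encode()).hexdigest(),  # one-way hash
        # An expiry time of 0 hours means that the token never expires.
        "expires": None if ttl_hours == 0 else now + timedelta(hours=ttl_hours),
    }
    return token, record


def is_valid(candidate, record, now):
    """Check a presented token against the stored digest and expiry time."""
    digest = hashlib.sha256(candidate.encode()).hexdigest()
    if not hmac.compare_digest(digest, record["digest"]):
        return False
    return record["expires"] is None or now < record["expires"]
```

Because only the digest is stored, a presented token can be verified but a lost token cannot be recovered from the record.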
User Profile
The user profile screen displays information about your account to you. Further, you can change global settings by using the Update link in the Settings box.

Currently, you can adjust the following settings:
- UMD Predictor API Token
For enabling variant pathogenicity score using the UMD predictor, add your API token here. For more information see umd-predictor.eu.
- GA4GH Beacon Network Widget
Opt in to displaying the Global Alliance for Genomics and Health Beacon Network Widget. This allows you to query the beacon network for variants that you see in your cases.
- Changelog seen in version
This value stores the version in which you last clicked the New Features! button. In a future version, this setting will be hidden from normal users.
- Display project UUID copying link
Whether or not to display a little icon next to project name for easy copying of the project UUID to the clipboard.
Import Release Info
This screen displays the release version information of the background databases that have been imported into VarFish. These are databases such as gnomAD or OMIM that are used for enriching variant annotation when displaying variant results.

Please note that the cases are annotated independently of the VarFish web server. The databases used to annotate individual case VCFs can be found in the case overview.
IGV Configuration
For each variant in the result set, VarFish provides you with an IGV link that will show the locus of the variant in the IGV browser window. For this, you must have IGV running locally and properly configured.
To configure IGV, open the preferences window by first clicking the View menu entry and then the Preferences menu entry.
Select the Advanced tab.
There, make sure that the Enable port checkbox is ticked and the port value is set to 60151.
Finally, click OK to save your changes.
These settings are also illustrated in the following figure.
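With the port enabled, IGV accepts simple HTTP requests on it. The sketch below builds such a request URL for a locus. The helper name, the flank size, and the example coordinates are made up for illustration, and the exact links that VarFish generates may differ.

```python
from urllib.parse import quote

IGV_PORT = 60151  # the port value configured in the IGV preferences


def igv_goto_url(chrom, position, flank=50, host="127.0.0.1", port=IGV_PORT):
    """Build a URL asking a locally running IGV to show a window around a position."""
    start = max(1, position - flank)
    end = position + flank
    locus = f"{chrom}:{start}-{end}"
    return f"http://{host}:{port}/goto?locus={quote(locus, safe=':-')}"


# Hypothetical coordinates, for illustration only.
print(igv_goto_url("chr10", 1000000))
```

Opening the printed URL in a browser while IGV is running should move the IGV view to that window. If nothing happens, check the port setting first.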

VarFish Kiosk Tutorial
This is the tutorial for the VarFish Kiosk mode. It walks you through uploading a VCF into VarFish Kiosk and analyzing it using the filtration and prioritization features of VarFish. Finally, you download the result as an Excel file.
Note
The VarFish Kiosk Mode
VarFish can be run in two modes of operation: (1) The “classic” mode is available on site-specific installations and has all the features, including multi-case projects and collaboration between multiple users. (2) The “kiosk” mode is available centrally at https://varfish-kiosk.bihealth.org. Here you can upload your cases for analysis; all data will be discarded after 2 weeks.
In the case of any questions or problems, don’t hesitate to contact Manuel Holtgrewe.
Download Example Data
Please first download the pfeiffer-singleton.vcf.gz file.
This is the exome data published into the public domain by Manuel Corpas as the corpasome with a spike-in of the variant FGFR2:p.Glu566Gly, which is associated with Pfeiffer syndrome in ClinVar.
The example data is taken from the Exomiser project.
Data Upload
Next, navigate to https://varfish-kiosk.bihealth.org/ and you will be presented with the following screen.

Select the previously downloaded pfeiffer-singleton.vcf.gz as VCF File and click Submit.
Optionally, you can upload a pedigree information file with the PED File field or give its text content in the text field PED Text below.
The example VCF file only contains one sample so this is not necessary here.
Note
PED Files
If you have a VCF file with multiple samples and do not specify a PED file, then no family information will be available subsequently.
The PED file format is as follows.
Each line gives the information of one pedigree member.
Each line has the following format (separated by spaces): Family ID (if uncertain, put FAM), name of person, name of father (0 if founder), name of mother (0 if founder), specification of sex (1 for male, 2 for female, 0 if unknown), specification of disease (1 for unaffected, 2 for affected, 0 if unknown).
The sample names must match the names in the VCF file!
For example, a trio (with an affected male child and unaffected parents) could look as follows:
FAM index father mother 1 2
FAM father 0 0 1 1
FAM mother 0 0 2 1
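To check a pedigree before uploading it, the six-column format described above can be parsed with a few lines of code. This is a minimal sketch and not part of VarFish: the names `PedMember` and `parse_ped` are hypothetical, and the checks are limited to what the format description states.

```python
from typing import NamedTuple


class PedMember(NamedTuple):
    family: str
    name: str
    father: str   # "0" if founder
    mother: str   # "0" if founder
    sex: int      # 1 = male, 2 = female, 0 = unknown
    disease: int  # 1 = unaffected, 2 = affected, 0 = unknown


def parse_ped(text):
    """Parse whitespace-separated PED text into a list of PedMember records."""
    members = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"line {number}: expected 6 fields, got {len(fields)}")
        family, name, father, mother, sex, disease = fields
        if sex not in ("0", "1", "2") or disease not in ("0", "1", "2"):
            raise ValueError(f"line {number}: sex and disease must be 0, 1, or 2")
        members.append(PedMember(family, name, father, mother, int(sex), int(disease)))
    return members


TRIO = """\
FAM index father mother 1 2
FAM father 0 0 1 1
FAM mother 0 0 2 1
"""

for member in parse_ped(TRIO):
    print(member)
```

The sketch does not compare the sample names against the VCF header; that check still has to be done separately.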
After hitting Submit, your file will be uploaded, sorted, and annotated.

In the case of failure, an error message will be displayed. Otherwise, you will be redirected to the case screen.
Note
Data Security in VarFish Kiosk
VarFish Kiosk is a login-less service.
This allows you to get started with VarFish quickly but there is (currently) no way to display all of your uploaded cases etc.
Instead, you have to copy and save the address of your case after upload to retrieve it.
The URL is virtually impossible to guess.
You can simply pass on a case that you uploaded by sending the address in an email.
Similarly, you have to be careful not to publish the case address as anyone with the case URL can access the case.
Data on varfish-kiosk.bihealth.org will be removed after two weeks and measures are in place to block users found trying to guess case URLs.
Depending on your local legislation and the consent given for your data, VarFish Kiosk might not be suitable to analyze your clinical data.
Case Overview
In the case screen, you can find information about your case. Once you annotate variants with flags or text, this information will be displayed here as well.

Overview shows a summary of your case.
Quality Control shows quality control measures derived for your case, similar to the Peddy method.
Relatedness allows you to validate whether the members of your pedigree are related as specified.
Rate of het. calls on chrX allows you to do a rough check of biological sex based on variant calls on the X chromosome.
Depth and heterozygosity gives insight into the coverage and ratio of heterozygous variant calls.
Variant types shows variant counts by variant type.
Variant effects shows a histogram of variants by predicted molecular effect.
Indel sizes shows the distribution of the sizes of indels in your data.
Variant Annotation shows your manual annotation of variants in your case.
After quality control, you can cut straight to the chase and click Filter Variants on the top right. This will bring you to the variant filtration screen.
Variant Filtration
It is best to start out with a Quick Preset. Let us assume a dominant mode of inheritance for our case. Click Load Presets –> dominant, which will select values that are a good starting point:
The maximal allowed population frequency will be set to ~0.2%.
Variant quality restrictions are set to relatively strict values.
Variants are limited to those where an amino acid change or change in splicing is predicted.
Click through the Frequency tab and the entries below More… to inspect the different filter options. You can quickly adjust the settings for individual categories such as Frequency or Impact using the drop-down boxes between the categories. Once you make such a change, the corresponding settings pane is displayed and you can see the effect of your action or perform further fine-adjustments.
Once you are happy with your selection (we recommend that you go back to defaults for dominant mode of inheritance with Load Presets –> dominant), click Filter & Display to start querying.
Note
Query Speed
The time a query takes to complete is proportional to the number of returned variants. It is thus recommended to start with relatively strict filter settings and screen the resulting variants. If you are unhappy with the results, then relax the settings to obtain more results. In our hands, this proved to be the most time-efficient way.

After some patience, you will be shown your resulting list of variants.
Note
Result Record Count
Note that by default the number of records to display is limited to 200.
You can adjust this at More… –> Miscellaneous but this comes with longer query times and places a heavier burden on your browser.
Below we show the anatomy of a result line:

Click to expand for more details about the variant.
Click the flag or comment symbol to flag the variant or add comments. Flagged or commented variants are marked with filled out symbols. The gray field next to those symbols opens the ACMG criteria form and will be filled with a number and color response.
The first symbol in this group of three symbols marks if the variant is seen in dbSNP. The second symbol marks if the variant is seen in ClinVar, while the third symbol marks if the variant is seen in HGMD.
The starting position of this variant.
Reference and alternative allele of this variant.
Frequency, number of homozygotes, and pLI score from ExAC (by default). This can be changed to other frequency databases such as gnomAD or 1000G at the top of the results list.
The gene name along with a dropdown menu for link-outs to several services for more information about the gene.
A red doctor symbol right next to the gene name indicates that this gene is listed in the ACMG incidental findings list.
The protein effect for this variant.
The genotype for each variant and member of the pedigree.
Look up this variant in MutationTaster MT, jump to the position in your IGV browser or query other services for this variant.
Variant Prioritization
With our filter settings, we got 126 variants from the query.
Of course, it is not feasible to review all of these variants.
Instead, it is state of the art to obtain pathogenicity prediction scores for one's variants (e.g., using CADD or MutationTaster) and also to compare the phenotypes associated with the gene that a variant affects to the phenotypes of your patient.
Note
Query Performance, Again
Pathogenicity and (in a less pronounced fashion) phenotype similarity computation will increase your query times. Try to first filter without scores and then activate the prioritization on not more than a few hundred resulting variants.

Click Prioritization to show the prioritization options.
Next, enable variant pathogenicity prioritization and switch it to CADD.
Then, enable phenotype prioritization and select HiPhive (human only).
We don’t have real patient information for the spiked-in variant but the HPO website tells us that Pfeiffer syndrome includes the following phenotypes: HP:0004440; HP:0003196; HP:0000244; HP:0000218.
Just copy and paste these HPO terms into the HPO Terms field.
Finally, again hit Filter & Display to run the query with prioritization enabled. After waiting a few seconds, you will see the results and the spiked-in variant should be on the top.

We now go on to flag it as the final causative variant with good phenotype match…

… and also perform an assessment of the variant following the ACMG guidelines.

After flagging, commenting and assigning an ACMG rating, the resulting row will be highlighted.

To get an overview of your flagged and commented variants for the whole case, go to the case overview by clicking Back to Case and then switch to the Variant Annotation tab.

Finally, we go to the case overview by switching back to the Overview tab and mark the case as solved.

Export Results As Excel File
To export your results as an Excel file, go to the filter form again. Instead of clicking Filter & Display, click the arrow right next to it. This will open a dropdown menu. Selecting Download as File will start the export and redirect you to the status page of the export process.

The export will take a moment. The page does not refresh automatically, so please click Refresh page every once in a while. The process logs are displayed at the end of the page.

Once the export has finished, you will be offered a link to download the resulting file.

Closing Remarks
This is the end of this tutorial.
A good next step is to try this again with the following quartet VCF file which is again based on the public Corpasome data having the Pfeiffer variant spiked into one of the children as a de novo variant. You can use the following pedigree information:
FAM index father mother 1 2
FAM sibling father mother 1 1
FAM father 0 0 1 1
FAM mother 0 0 2 1
After uploading the data and selecting Load Presets –> dominant, identifying the variant should be quick.
Another good next step is going through this manual. You can navigate using the links on the left.
While VarFish Kiosk is nice for ad-hoc analysis of single VCF files, we recommend that sites anticipating a higher throughput perform a dedicated installation of VarFish Classic. This documentation also contains instructions for the installation but this will require fast server hardware and knowledge about Linux server administration.
Variants & Cases
The variants are assigned to Cases. Use the Cases link on the left to see all cases in a project. Then, click on the case name to go to the case’s detail view.
Case Detail View
On the case detail view, you can see the following information:
- Details
Case detail information such as creation date, case name, and names of individuals.
- Pedigree
The full pedigree information, including whether variants are present for each individual (i.e., whether the individual was sequenced).
- Flagged Variants
The variants flagged in the individual.
- Comment Variants
The variants that were commented in the individual.
- Background Jobs Overview
List of background jobs for this case, e.g., for file export generation.

The case details view for the demo case. Note the details on the different aspects of the case and in particular the Filter Case and ClinVar Report buttons on the top right.
Case Detail View Actions
On the top right, you can see the following button:
- Filter Case
This takes you to the Variant Filtration view. Here you can filter the case’s variants by a multitude of criteria including genotype, call quality, and variant effect.
Variant Statistics & QC
VarFish provides you with integrated tools for quality control (QC) of your variant calls. When importing your cases, VarFish will compute statistics about the variants in your cases and check them against your pedigrees.
Sex & Relation QC
The first consistency check performed is whether all individuals used as father or mother in your pedigree have the appropriate sex. In the case of any issue, little red icons are displayed in your case listings and pedigree displays.

Example for sex and relationship problems displayed in the case list. The little “venus-mars” icon indicates a problem with sex assignment, the little “people” icon indicates a problem with relationship.
The second check that is performed is computing the ratio of het./hom. calls on the X chromosome outside of the pseudoautosomal regions. This ratio should be small for males and large for females. Male samples whose ratio is above 1.0 and female samples whose ratio is below 1.0 are flagged as erroneous. In the case of problems, little red icons are displayed in the same way as with the incorrect parent-sex assignment described above.
The third check that is performed is looking at the relationship of your parent-child and sibling-sibling pairs in each pedigree. A relationship ratio is computed as well as the IBS0 value according to Pedersen & Quinlan (2018). The relationship ratio is higher for closely related individuals (about 0.5 for parent-child and sibling-sibling pairs). The IBS0 value is the number of variant sites at which the two individuals do not share any allele. This value should be close to 0 for parent-child relations and also small for siblings.
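The second and third checks can be sketched in a few lines. This is an illustration of the rules as stated above and not the code that VarFish runs. The function names are hypothetical, and genotypes are assumed to be encoded as the number of alternative alleles (0, 1, or 2).

```python
def chrx_sex_check(n_het, n_hom, pedigree_sex, threshold=1.0):
    """Return (ratio, ok) for the chrX het./hom. check.

    pedigree_sex is 'male', 'female', or 'unknown'.
    """
    ratio = n_het / n_hom if n_hom else float("inf")
    if pedigree_sex == "male":
        ok = ratio <= threshold   # males above the threshold are flagged
    elif pedigree_sex == "female":
        ok = ratio >= threshold   # females below the threshold are flagged
    else:
        ok = True                 # nothing to compare against
    return ratio, ok


def ibs0(genotypes_a, genotypes_b):
    """Count sites where two samples share no allele (0 vs. 2 alternative alleles)."""
    return sum(1 for a, b in zip(genotypes_a, genotypes_b) if {a, b} == {0, 2})
```

For a true parent-child pair, `ibs0` should stay close to zero, because a child always carries one allele from each parent. Any remaining count then mostly reflects genotyping errors.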
QC Plots
Further, the case details view displays six plots helpful for variant quality control.

The six statistics and QC plots described in this section.
Rate of het. calls on chrX
This plot displays the rate of heterozygous over homozygous variants on the X chromosome outside of the pseudoautosomal regions. This count is displayed for samples classified as male, female, and unknown in the pedigree. Values falling on the wrong side of the threshold of 1.0 described in Sex & Relation QC are colored red.
Depth and Heterozygosity
This plot displays the fraction of heterozygous calls vs. the median depth. Depth outliers are colored blue while ratio outliers are colored red. Values are counted as outliers if they are more than 3 inter-quartile ranges from the median. Keep this in mind when interpreting these plots.
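The outlier rule can be written down as follows. This is only a sketch: the function name is hypothetical, and the way the quartiles are computed (the "inclusive" method of Python's `statistics` module) is an assumption, so borderline values may be classified differently by VarFish.

```python
import statistics


def iqr_outliers(values, k=3.0):
    """Return one boolean per value: True if it lies more than k IQRs from the median."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    limit = k * (q3 - q1)
    return [abs(value - median) > limit for value in values]


# Hypothetical median depths of eight samples; only the last one is an outlier.
depths = [30, 31, 32, 33, 34, 35, 36, 100]
print(iqr_outliers(depths))
```

With few samples, the inter-quartile range is itself unstable, which is one reason to interpret colored points in small cohorts with care.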
Variant Type Histogram
For each sample, the number of called on-exome SNVs, indels and MNVs is displayed. Note that some variant callers such as the widely used GATK tools do not call MNVs but break them up into individual SNVs. Thus the MNV count will be 0 in many cases.
Variant Effect Histogram
For many relevant variant effect classes, the absolute frequency in on-exome variants is displayed in this histogram for each sample.
Indel Size Histogram
The number of bases deleted (negative) and inserted (positive) from 1 to 20 is displayed in this histogram for each sample.
QC Metrics
Variant Filtration
This view allows you to filter variants by a number of criteria. Further, you can trigger an export of the variants with your current criteria to a downloadable VCF, Excel, or TSV file.
You can open the variant filtration view for each case by first navigating to the case’s detail page and then clicking the Filter Case button on the top right.
On the top of the page, you can see the Variant Filtration Form for setting the parameters for creating your filtration. Below, the results will be displayed after submitting the form.
Note
VarFish will store every query that you make. When loading the filtration form, your previous form settings will be restored and a notification will be displayed to notify you of this.
Note
The implementation of the variant filter in VarFish is monolithic as we use the data from the user submitted form to compile a single, rather large, SQL query from it. This enables us to have a very efficient (in terms of computing time and resources) filtering step. The downside of this is that we can’t track how many variants are actually filtered out by which filter setting.
Variant Filtration Form
Note
As in many places, VarFish offers in-place online help: Move your mouse cursor over any item to display its tooltip description (if it has any).
The form has the following components. Note that some form tabs will be hidden below the More… tab depending on your screen size.
Genotype tab
Frequency tab
Variants & Effects tab
Quality tab
Gene Lists tab
Flags & Comments tab
ClinVar & HGMD tab
Configure Downloads tab
Miscellaneous tab
Filter Import Export tab
Load Presets button
RefSeq / ENSEMBL switch
- Filter & Display button
The little triangle on the right gives access to the Download as File and Submit to MutationDistiller menu entries.
Genotype

In this tab, the individuals of your pedigree are displayed with their name, father and mother, sex, and disease state.
Here, you can configure the genotype pattern that you want to query for. The Genotype column contains select fields for each of your pedigree individuals. The value meanings are:
- any (default)
Any genotype is allowed.
- 0/0
The genotype of this individual should be reference.
- 0/1
The genotype of this individual should be heterozygous.
- 1/1
The genotype of this individual should be homozygous alternative.
- variant
The genotype of this individual should be heterozygous OR homozygous alternative.
- non-variant
The genotype of this individual should be reference or no-call (./.).
- non-reference
The genotype of this individual should be heterozygous OR homozygous alternative OR no-call (./.).
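The value meanings listed above amount to a lookup from each choice to the set of genotypes that it accepts. The sketch below spells this out. The names are hypothetical, and the handling of phased genotypes such as `1|0` is an assumption that the list above does not cover.

```python
# Genotypes are written as in a VCF file; "./." denotes a no-call.
GENOTYPE_CHOICES = {
    "0/0": {"0/0"},
    "0/1": {"0/1"},
    "1/1": {"1/1"},
    "variant": {"0/1", "1/1"},
    "non-variant": {"0/0", "./."},
    "non-reference": {"0/1", "1/1", "./."},
}


def normalize(genotype):
    """Treat phased and unphased calls alike and order alleles, e.g. '1|0' -> '0/1'."""
    alleles = genotype.replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def matches(genotype, choice):
    """Return True if a sample genotype satisfies the selected filter choice."""
    if choice == "any":
        return True  # the default places no restriction on the genotype
    return normalize(genotype) in GENOTYPE_CHOICES[choice]
```

Note the difference between the last two choices in the table: both accept a no-call, but "non-variant" pairs it with reference calls and "non-reference" pairs it with variant calls.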
Further, you can check the enable comp. het. mode checkbox. In this case, the values of the Genotype column’s select fields are ignored. Instead, the list of variants will be filtered as follows:
All variants are filtered according to the remaining tabs of the filtration form (all except Genotype).
- Two sets of variants are created:
A paternal set with variants that are in heterozygous state in both the index and the father and which are reference in the mother.
A maternal set with variants that are in heterozygous state in both the index and the mother and which are reference in the father.
For each gene occurring in either set, the number of variants is counted, leading to a paternal count and a maternal count for each gene.
Only those genes where both the paternal and maternal count are above zero are kept.
All variants in the genes kept in this way are reported. This can include genes where the paternal or maternal count is above one.
Note
The compound heterozygous mode currently only works if you have a full trio in your data set (father/mother/child). Further, only the genotypes of these three individuals will be considered in the filtration.
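The compound heterozygous steps can be sketched as follows, assuming that the variants have already passed all other filters. The dictionary layout and the function name are hypothetical. In VarFish itself, the filter is compiled into an SQL query rather than run in Python.

```python
from collections import defaultdict


def compound_het(variants):
    """Filter a trio's variants for compound heterozygous candidates.

    Each variant is a dict with the keys 'gene', 'index', 'father', and 'mother'
    (hypothetical layout); genotypes are '0/0', '0/1', or '1/1'.
    """
    paternal = defaultdict(list)
    maternal = defaultdict(list)
    for variant in variants:
        if variant["index"] != "0/1":
            continue
        if variant["father"] == "0/1" and variant["mother"] == "0/0":
            paternal[variant["gene"]].append(variant)
        elif variant["mother"] == "0/1" and variant["father"] == "0/0":
            maternal[variant["gene"]].append(variant)
    result = []
    # Keep only genes that are hit at least once from each parent.
    for gene in sorted(paternal.keys() & maternal.keys()):
        result.extend(paternal[gene] + maternal[gene])
    return result
```

A variant that is heterozygous in both parents lands in neither set, because its parental origin in the index cannot be determined from genotypes alone.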
Frequency

Here you can filter variants by their relative frequency in variation databases or by how often they occur in heterozygous or homozygous state. The population databases are 1000 Genomes Phase 3, ExAC, gnomAD exomes, and gnomAD genomes. You switch a population on or off for consideration with the little checkbox on the left.
The column Homozygous count limits the maximal number of occurrences of a variant in homozygous state for each database.
For example, when setting 10 for 1000 Genomes, all variants occurring 11 times or more often in homozygous state in the 1000 Genomes dataset will be excluded.
The Heterozygous count field works the same way but for the number of occurrences in heterozygous state.
The Frequency field works as follows.
Here, you specify the maximal frequency in any sub population of the given database.
For example, when setting 0.01 for ExAC, you will exclude all variants occurring with a frequency higher than 1% in any sub population, e.g., if the variant has 2% in the African ExAC samples and 0.1% in the European samples, then it will be excluded.
In all homozygous/heterozygous/frequency fields, you can disable the corresponding filter by leaving the field empty.
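The behaviour of the three fields for a single database can be summarized in a short sketch. The record layout and the function name are hypothetical, and `None` stands in for an empty form field.

```python
def passes_frequency(record, max_frequency=None, max_hom=None, max_het=None):
    """Apply one database's limits; a limit of None (empty field) disables that check.

    `record` is a hypothetical dict with 'frequencies' (one value per sub population)
    and the 'hom' and 'het' counts of a single variant in a single database.
    """
    if max_frequency is not None and record["frequencies"]:
        # The highest frequency over all sub populations is decisive.
        if max(record["frequencies"].values()) > max_frequency:
            return False
    if max_hom is not None and record["hom"] > max_hom:
        return False
    if max_het is not None and record["het"] > max_het:
        return False
    return True


# The example from the text: 2% in African and 0.1% in European samples.
record = {"frequencies": {"AFR": 0.02, "EUR": 0.001}, "hom": 11, "het": 40}
print(passes_frequency(record, max_frequency=0.01))
```

Taking the maximum over sub populations makes the filter conservative: a variant that is common in any one population is removed even if it is rare overall.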
Variants & Effects

This tab allows for the fine-grained selection of variants based on the variant effects.
The Variant Types section allows you to select whether to include SNVs (single nucleotide variants, e.g., A>C), Indels (insertions or deletions, e.g., AC>T, A>CT, ACT>GG), or MNVs (multi-nucleotide variants where reference and alternative allele have the same number of bases and more than one base is affected, e.g., CC>TT, CCC>TTT).
The Transcript Type section allows you to select whether to include coding and/or non-coding variants.
In the Detailed Effects section, you can perform selection of variants on the finest level of granularity. The Effect Groups allow you to quickly select and unselect fields from the Detailed Effects section.
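The three variant types from the Variant Types section can be told apart from the reference and alternative alleles alone. The following sketch mirrors the definitions given above. The function name is hypothetical, and alleles are assumed to be plain base strings as in a VCF file.

```python
def variant_type(ref, alt):
    """Classify a variant as 'SNV', 'MNV', or 'indel' from its alleles."""
    if len(ref) != len(alt):
        return "indel"  # bases are inserted and/or deleted
    differing = sum(1 for r, a in zip(ref, alt) if r != a)
    if differing == 0:
        raise ValueError("reference and alternative allele are identical")
    return "SNV" if differing == 1 else "MNV"


for ref, alt in [("A", "C"), ("AC", "T"), ("A", "CT"), ("ACT", "GG"), ("CC", "TT")]:
    print(f"{ref}>{alt}: {variant_type(ref, alt)}")
```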
Quality

This tab allows you to set quality thresholds on the genotype calls on a per-sample level. Further, you control how calls not passing the threshold in individuals are treated.
- min DP het.
Minimal coverage of heterozygous variants to pass the quality filter.
- min DP hom.
Minimal coverage of homozygous variants to pass the quality filter.
- min AB
Minimal allelic balance. This setting is applied to heterozygous variant calls only. Given a variant with total coverage c and a reads supporting the alternative allele, the allelic balance AB is defined as a/c. A well-balanced variant has an allelic balance that is not too far from 0.5. To pass the quality filter, the allelic balance must be: min AB <= AB <= 1 - min AB.
- min GQ
Minimal (Phred-scaled) genotype quality for variants to pass the quality filter.
- min AD
Minimal number of reads supporting the alternative allele to pass the quality filter.
The “on FAIL” column determines the action to take for variants that don’t pass the quality filter:
- drop variant
The whole variant is removed from the result if the quality filter fails in this individual. This makes a low-quality call in the particular sample remove the variant even if the quality is high in other individuals.
- ignore
The quality filter is ignored for the particular sample.
- no-call
The variant in this individual is counted as “no-call” in the Genotype filter settings.
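The thresholds and the “on FAIL” actions interact as sketched below. This is an illustration under assumptions and not the VarFish implementation: the dictionary keys and function names are hypothetical, and the sketch only covers calls that carry the alternative allele (0/1 or 1/1).

```python
def passes_quality(call, min_dp_het, min_dp_hom, min_ab, min_gq, min_ad):
    """Check one sample's call; `call` has the keys 'gt', 'dp', 'ad', and 'gq'."""
    is_het = call["gt"] == "0/1"
    if call["dp"] < (min_dp_het if is_het else min_dp_hom):
        return False
    if call["gq"] < min_gq or call["ad"] < min_ad:
        return False
    if is_het:
        # Allelic balance: alternative reads over total coverage.
        ab = call["ad"] / call["dp"] if call["dp"] else 0.0
        if not (min_ab <= ab <= 1 - min_ab):
            return False
    return True


def apply_on_fail(call, passed, on_fail):
    """Return the genotype to use downstream, or None if the whole variant is dropped."""
    if passed or on_fail == "ignore":
        return call["gt"]
    if on_fail == "no-call":
        return "./."
    if on_fail == "drop variant":
        return None
    raise ValueError(f"unknown on FAIL action: {on_fail!r}")
```

With "no-call", a failing call is handed to the genotype filter as `./.`, so whether the variant survives then depends on the choice made in the Genotype tab for that individual.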
Gene Lists

Enter any Entrez gene ID, ENSEMBL gene ID, or HGNC/HUGO gene symbol in the Gene Blocklist field to remove variants in this gene from the result list. If a variant affects more than one gene, blocklisting one of these genes will not remove the variant for the other genes.
Similarly, enter any Entrez gene ID, ENSEMBL gene ID, or HGNC/HUGO gene symbol into the Gene Allowlist field to limit variants to those in the allow-listed genes. Leave the allowlist empty to not apply any allow-listing.
Flags & Comments

Here you can filter your variants based on the user-provided flags.
ClinVar & HGMD

You can use this to require membership in ClinVar and HGMD Public. When requiring ClinVar membership, you can limit the reported variants to those with a particular pathogenicity.
Note that the HGMD Public data is taken from the ENSEMBL browser and is several years behind the current HGMD Public and Professional versions.
Configure Downloads

These fields allow you to configure how your file downloads are created. You can select the file type to use for the export (Excel, TSV, or VCF).
Further, you can select the individuals to include. This is useful for generating single-individual VCF files if you want to use a tool that does not support multi-sample VCF files.
Also, you can select whether you want to export your flags and comments.
Miscellaneous

Here you can select a row limit on the online variant display.
This limit will not be applied to your file downloads.
Filter Import Export

Here you find the configuration stored in JSON format. While the format is machine and not human-oriented, it allows you to save your current form settings in a text file and restore them later.
Click the Download JSON button to download a text file with the value of the text area above. Clicking the JSON >> Settings button applies the changes from the text area to the form. The text area is automatically updated to reflect the current form settings when you change any form field.
Load Presets
Here you find shortcuts to several presets. Note that these are “factory” defaults at the moment. Currently, it is not possible to create your own presets. This will be possible in a future version.
RefSeq / ENSEMBL switch
Use this to choose between RefSeq and ENSEMBL transcripts when filtering for variant effects.
Variant Filtration Results

After form submission, the results are displayed below the form.
Filtration Results Header
The header contains a Frequencies switch that allows you to select the database used for displaying population frequencies. Further, it shows the number of displayed records and the total number of result records. Lastly, it displays the transcript data source used.
Warning
Always monitor the number of displayed vs. total records. You might have to adjust the number of displayed rows so you don’t miss any variants!
Result Rows
The result rows consist of the following elements:
Clicking the right-pointing arrow will show you more details on your variant below the result row.
The little bookmark sign indicates whether the variant has been flagged (filled if flags are present). The summary flag status is also indicated by the row color. Click on the bookmark sign to adjust the flags for this variant.
The little speech bubble indicates whether there are any comments for this variant (filled if comments are present).
The little database icon (three disks) indicates dbSNP membership of the variant (dark if present in dbSNP, very light if not). Click on the icon to go to its dbSNP entry.
The little hospital icon indicates ClinVar membership (again dark if present in ClinVar, very light if not).
The little circle indicates membership in HGMD Public (see ClinVar & HGMD for information about HGMD Public age).
The following columns indicate the variant position, reference and alternative bases.
This is followed by the frequency display from the population database selected in the header.
The next column shows the gene symbol. Clicking on the little triangle next to it allows you to look up the gene in various databases.
The variant effect on the protein level in HGVS notation. Moving the cursor over this field will show a textual explanation of the effect.
The next columns show the genotypes in the individuals. Moving the cursor over this field will show the genotype quality and number of reference and alternative reads.
The MT button will query MutationTaster for this variant.
The IGV button opens the selected locus in IGV if you have it running in the background with the port enabled and set to 60151. Clicking the little triangle next to IGV allows you to open the variant locus in various other genome browsers.
Project-Wide Queries and Stats
Project-Wide Statistics
You can also view joint statistics for all cases within a project. For this, open a project’s case list (open the project detail view, then click the Cases icon in the left bar).
Here, the project-wide variant statistics will be displayed above your cases if it has been generated already. If you want to (re)-generate it, use the Recompute Project-Wide Stats button on the top right. This will create a background job for the recomputation (it might take quite some time). After the job is complete, the updated data will be displayed on the case list.
Project-Wide Queries
Further, you can perform queries across all cases in your project. For this, navigate to the project’s case list. Then, click the Joint Filtration button on the top right.
The form that opens is very similar to the one described in Variant Filtration with the following differences:
All members of all cases in your project will appear.
Instead of having one row for each variant and one genotype column for each sample, you have one row for each variant and sample and one column with genotype information only. There is an additional column that gives the name of the sample that the row is for.
The TSV and Excel file download generation creates similarly-structured tables.
VCF export is not yet supported.
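The difference in table layout can be illustrated by converting one row of the per-case layout into rows of the project-wide layout. The field names, sample names, and coordinates below are hypothetical and do not reflect the column headers of the actual TSV or Excel files.

```python
def to_long_rows(variant, genotypes):
    """Turn one variant with per-sample genotypes into one row per sample."""
    return [
        {**variant, "sample": sample, "genotype": genotype}
        for sample, genotype in genotypes.items()
    ]


# Made-up variant and samples, for illustration only.
variant = {"chromosome": "10", "position": 1000000, "reference": "A", "alternative": "G"}
genotypes = {"index": "0/1", "father": "0/0", "mother": "0/1"}
for row in to_long_rows(variant, genotypes):
    print(row)
```

The long layout keeps the number of columns fixed no matter how many samples the project contains, which is what makes a single table for all cases possible.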
Variant Annotation
Variant Comments & Flags
Creating Comments & Flags

The flag marker (little bookmark) and comment marker (little text bubble) are shown for each result row. They are filled when the flags have been set or a comment has been submitted for the variant.
Use the little bookmark-shaped or text bubble icon next to each variant to open the “flagging / comment” window. Check the desired flags and/or enter your comment text in the text box below. Click Save to create a new comment and/or flags.

When clicking the flag/comment markers, the “Flags & Comments” popup opens. Select the flags that you want to apply and/or enter a comment in the text box and then click the Save button. The Summary label also determines the color of the result row (green, yellow, red, or no coloring). Selecting no Summary but any other flag will highlight the result row in gray.

The flag and comment markers are now filled.
Exporting Comments & Flags
You can export comments and flags together with your variants into an Excel file.
Viewing Comments & Flags

Comments and flags are displayed when expanding the variant details.
The comments and flags for a variant are displayed in the variant details. For this, click the arrow at the beginning of a resulting row. The comments and flags are displayed in the box in the top right of the expanded variant details.
Alternatively, you can also view your comments and flags in the case details overview as described below in the “Viewing Annotations” section.
ACMG Rating

The ACMG marker (little gray box with a dash in the middle) is shown for each result row. It is filled with the ACMG rating and a corresponding coloring when the ACMG rating has been set for the variant.
Use the little gray box next to each variant to open the “ACMG Rating” window. Check the desired classifications and click Save to create an ACMG rating. The actual class is automatically computed. You can override the computation and set your own class by entering a number in the Class override box.

When clicking the ACMG marker, the “ACMG Rating” popup opens. Select the classes that you want to apply and then click the Save button. The actual class is automatically computed. You can override the computation and set your own class by entering a number in the Class override box.

The ACMG rating marker is now filled.
Viewing Annotations
You can get a complete list of all the comments, flags and ACMG ratings for a case in the case details view. For this, go back to the case detail page and click on Variant Annotations.

You can see all variant flags, comments and ACMG ratings in the case details view.
Databases
This section gives information about the integrated databases and tools and the ones that are linked out to. Further, it provides some pointers on how to extend VarFish’s database and tool collection.
Integrated Databases and Tools
The following databases are integrated into VarFish, meaning that their contents are available from within VarFish itself.
| Category | Database |
|---|---|
| Frequency | gnomAD |
| | ExAC |
| | 1000 Genomes |
| | mtDB |
| | helixMTdb |
| | MITOMAP |
| Clinical | ClinVar |
| | HGMD Public |
| Variant Database | dbSNP |
| Variant Tools | VariantValidator |
| Phenotype | HPO |
| | OMIM |
| | MGI Mapping |
| Gene Description | HGNC |
| | NCBI Gene Summary |
| | NCBI GeneRIF |
| | ACMG Recommendations Gene |
| | HPO |
| Pathways | KEGG |
| Constraint Scores | gnomAD pLI/LOEUF |
| | ExAC pLI |
| Conservation | UCSC 100 Vertebrates |
Link-Out Databases and Tools
VarFish links out to the following databases and tools.
| Category | Database |
|---|---|
| Gene | OMIM |
| | GeneCards |
| | NCBI Entrez |
| | HGNC |
| | HGMD Public |
| | ProteinAtlas |
| | PubMed |
| | ClinVar |
| | EnsEMBL |
| | MetaDome |
| | PanelApp |
| | MGI |
| Variant Score/Tool | MutationTaster |
| | varSEAKSplicing |
| | UMD Predictor |
| | PolyPhen 2 |
| | Human Splicing Finder |
| Variant Database | Beacon Network |
| | Varsome |
| Genome Browser | Locus in local IGV |
| | Public UCSC |
| | DGV |
| | EnsEMBL |
| | gnomAD |
Adding and Updating Databases and Tools
We invite users to contribute to VarFish databases and tools (and of course to VarFish itself) through our project and GitHub issue tracker at https://github.com/bihealth/varfish-server or by emailing us directly. In this section, we summarise the process of extending the database and tool selection. However, as this is a very large topic, we suggest users contact us with their suggestions by email or through the GitHub issue tracker to get more information. We will also be happy to work with users in finding the best way of integrating new tools and databases.
Link-Outs Modifications
Adding a new tool or database by adding a link from a variant or gene is usually very simple. However, it requires that the database is accessible through the web (or can be controlled by hyperlinks as is the case with IGV). Further, it must be possible to create “deep links” into the database or tool. For example, it is possible to directly create a link to a position in the UCSC genome browser like this:
https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=21:11038733-11038733
However, tools such as VariantValidator do not allow this; an API must be used instead. For some databases and tools it is not possible to create deep links at all, and the database or tool author would have to add this functionality first.
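As a minimal sketch of what such a deep link amounts to, the following Python snippet rebuilds the UCSC URL shown above from its parts. The function name `ucsc_link` and its parameters are hypothetical and not part of VarFish; only the URL layout is taken from the example above.

```python
from urllib.parse import urlencode

# Base URL of the UCSC genome browser (European mirror), as in the example above.
UCSC_BASE = "https://genome-euro.ucsc.edu/cgi-bin/hgTracks"


def ucsc_link(release: str, chrom: str, start: int, end: int) -> str:
    """Build a deep link to a position in the UCSC genome browser.

    Hypothetical helper for illustration only; not part of VarFish.
    """
    query = urlencode(
        {"db": release, "position": f"{chrom}:{start}-{end}"},
        safe=":",  # keep the colon readable instead of encoding it as %3A
    )
    return f"{UCSC_BASE}?{query}"


print(ucsc_link("hg19", "21", 11038733, 11038733))
```

A link-out of this kind only needs the genome release and the variant coordinates, which is why it is the simplest type of integration.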
Database Modifications
Updating databases is more complicated. Overall, the steps are as follows:
- The data must be downloaded and converted into TSV (tab-separated values) file(s). For this, we are maintaining a Snakemake workflow on GitHub at https://github.com/bihealth/varfish-db-downloader.
- The VarFish source code must be modified to
  - create a new Django model class to manage the database table(s) for the new database,
  - create importer code for loading the data into the database,
  - adjust the code for the user interface to display the data (or use it in a different fashion),
  - (potentially) adjust the query generation code to incorporate the new database in the queries.
- Also, the documentation has to be adjusted.
We strongly recommend that users contact us for support with this.
Installation
This chapter describes how to install the VarFish core components and their requirements. It is aimed at those who want to install VarFish on their own infrastructure.
Since v0.22.1 (about February 2021), the recommended way of installing VarFish is using Docker Compose. Docker Compose allows you to describe the programs/services that are required to run VarFish as a set of Docker containers. Docker containers allow packaging the whole runtime environment of complex software packages in a transparent and efficient manner.
For the following, knowledge of Linux administration and exposure to Docker is required. Deeper knowledge of Docker and Docker Compose is of great help when debugging. If you have problems, please open an issue in our Issue Tracker or send an email to cubi-helpdesk@bihealth.de. Please note that VarFish is academic software and we try to provide support on a best-effort basis.
You can find a quickstart-style manual in the varfish-docker-compose README.
Note that this will only perform installation of VarFish and related services with data (re)distributed by the VarFish authors. See Extra Services for installing extra services such as annotation with CADD scores.
Prerequisites
- Hardware:
  - Memory: 64 GB of RAM
  - CPU: 16 cores
  - Disk: 600+ GB of free and fast disk space
    - ~500 GB for the initial database (on compression-enabled ZFS it will consume only 167 GB)
    - on installation: ~100 GB for the data package file
    - per exome: ~200 MB
    - a few (~5) GB for the Docker images
- Operating System:
  - a modern Linux that is supported by Docker
  - outgoing HTTPS connections to the internet are allowed to download data and Docker images
  - server ports 80 and 443 are open and free on the host that VarFish runs on
- Software:
  - Git
Tuning database servers is an art of its own and you can have a look at the section Performance Tuning for getting started.
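To get a feeling for the disk figures listed in the prerequisites, the following Python sketch adds them up for a planned number of exomes and compares the result with the free space on a file system. The constants are the approximate numbers from the list above; the function names are hypothetical and the estimate is only a rough guide.

```python
import shutil

# Approximate figures taken from the prerequisites list above.
DATABASE_GB = 500       # initial database, uncompressed
DATA_PACKAGE_GB = 100   # data package file, needed during installation
DOCKER_IMAGES_GB = 5    # Docker images
PER_EXOME_MB = 200      # additional space per imported exome


def required_disk_gb(num_exomes: int) -> float:
    """Rough disk space estimate in GB for a given number of exomes."""
    fixed_gb = DATABASE_GB + DATA_PACKAGE_GB + DOCKER_IMAGES_GB
    return fixed_gb + num_exomes * PER_EXOME_MB / 1000


def has_enough_space(path: str, num_exomes: int) -> bool:
    """Compare the estimate with the free space on the file system holding `path`."""
    free_gb = shutil.disk_usage(path).free / 1000**3
    return free_gb >= required_disk_gb(num_exomes)


if __name__ == "__main__":
    print(f"1000 exomes need roughly {required_disk_gb(1000):.0f} GB")
    print("enough space in current directory:", has_enough_space(".", 1000))
```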
Install with Docker Compose
This section assumes that you have installed the prerequisites Git, Docker and Docker Compose. So the following three commands should work.
$ git version
git version 1.8.3.1
$ docker-compose --version
docker-compose version 1.28.2, build 67630359
$ docker version
Client: Docker Engine - Community
Version: 20.10.3
[...]
First, we will obtain a checkout of varfish-docker-compose.
This repository contains the docker-compose.yml and configuration files.
On execution, about ten Docker containers will be spun up, each running a part of the services that are required to run VarFish.
These include the Postgres database (that does the heavy lifting), Redis for caching, Jannovar for full functional effect annotation, Exomiser for variant prioritization, queue workers for performing database queries and similar tasks, and the VarFish web server itself.
But this will come later.
$ git clone https://github.com/bihealth/varfish-docker-compose.git
$ cd varfish-docker-compose
Next, download and extract the VarFish site data archive which contains everything you need to get started (the download is ~100GB of data).
This will create the volumes directory (500GB of data, ZFS compression yields us 167GB disk usage).
Replace grch37 with grch38 in the command below if you want to use the GRCh38 release.
We currently only provide prebuilt databases for either GRCh37 or GRCh38.
$ wget --no-check-certificate https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-site-data-v1-20210728-grch37.tar.gz{,.sha256}
$ sha256sum --check varfish-site-data-v1-20210728-grch37.tar.gz.sha256
$ tar xf varfish-site-data-v1-20210728-grch37.tar.gz
$ ls volumes
exomiser jannovar minio postgres redis traefik
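The checksum step above can also be reproduced without the sha256sum tool, for example from a provisioning script. The following Python sketch assumes the usual sha256sum file layout (hex digest, whitespace, file name); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(archive: Path, checksum_file: Path) -> bool:
    """Check `archive` against a sha256sum-style file ("<hex digest>  <name>")."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(archive) == expected
```

For the archive above, one would call `matches_checksum_file(Path("varfish-site-data-v1-20210728-grch37.tar.gz"), Path("varfish-site-data-v1-20210728-grch37.tar.gz.sha256"))`. Reading in chunks keeps memory usage flat even for a ~100GB file.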
The next step is to create an installation-specific configuration file .env as a copy of env.example.
You will have to at least set the DJANGO_SECRET_KEY variable to something random (a bash one-liner for this is tr -dc A-Za-z0-9 </dev/urandom | head -c 64 ; echo '').
$ cp env.example .env
$ $EDITOR .env
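If tr or /dev/urandom is not at hand, a random value for DJANGO_SECRET_KEY can also be produced with Python's standard library. This is a minimal sketch using the same alphanumeric character set and length as the shell one-liner above; the helper name `random_secret` is hypothetical.

```python
import secrets
import string

# Same character set as A-Za-z0-9 in the shell one-liner.
ALPHABET = string.ascii_letters + string.digits


def random_secret(length: int = 64) -> str:
    """Return a random alphanumeric string from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Print a line that can be pasted into the .env file.
print(f"DJANGO_SECRET_KEY={random_secret()}")
```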
You can now bring up the site with Docker Compose.
The site will come up at your server and listen on ports 80 and 443 (make sure that the ports are open); you can access it at https://<your-host>/ in your web browser.
This will create a lot of output and will not return you to your shell.
You can stop the servers with Ctrl-C.
$ docker-compose up
You can also let Docker Compose run the containers in the background:
$ docker-compose up -d
Starting compose_exomiser-rest-prioritiser_1 ... done
Starting compose_jannovar_1 ... done
Starting compose_traefik_1 ... done
Starting compose_varfish-web_1 ... done
Starting compose_postgres_1 ... done
Starting compose_redis_1 ... done
Starting compose_minio_1 ... done
Starting compose_varfish-celeryd-query_1 ... done
Starting compose_varfish-celeryd-default_1 ... done
Starting compose_varfish-celeryd-import_1 ... done
Starting compose_varfish-celerybeat_1 ... done
You can check that everything is running (the versions might be different in your installation):
$ docker ps
3ec78fb9f12c bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 17 hours ago Up 31 seconds 8080/tcp compose_varfish-celeryd-import_1
313afb611ab1 bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 17 hours ago Up 30 seconds 8080/tcp compose_varfish-celerybeat_1
4d865726e83b bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 17 hours ago Up 31 seconds 8080/tcp compose_varfish-celeryd-query_1
a5f90232c4da bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 17 hours ago Up 31 seconds 8080/tcp compose_varfish-celeryd-default_1
96cec7caebe4 bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 17 hours ago Up 33 seconds 8080/tcp compose_varfish-web_1
8d1f310c9b48 postgres:12 "docker-entrypoint.s…" 17 hours ago Up 32 seconds 5432/tcp compose_postgres_1
8f12e16e20cd minio/minio "/usr/bin/docker-ent…" 17 hours ago Up 32 seconds 9000/tcp compose_minio_1
03e877ac11db quay.io/biocontainers/jannovar-cli:0.33--0 "jannovar -Xmx6G -Xm…" 17 hours ago Up 33 seconds compose_jannovar_1
6af09b819e59 traefik:v2.3.1 "/entrypoint.sh --pr…" 17 hours ago Up 33 seconds 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp compose_traefik_1
514cb4386224 redis:6 "docker-entrypoint.s…" 19 hours ago Up 32 seconds 6379/tcp compose_redis_1
5678b9e6797b quay.io/biocontainers/exomiser-rest-prioritiser:12.1.0--1 "exomiser-rest-prior…" 19 hours ago Up 34 seconds compose_exomiser-rest-prioritiser_1
In the case of any error please report it to us via the Issue Tracker or email to cubi-helpdesk@bihealth.de. Please include the full output as a text file attachment.
Actually, your VarFish website is now ready to be used. Visit the website at https://<your-host>/ and login with the account root and password changeme.

There will be a warning about self-signed certificates; see TLS / SSL Configuration on how to deal with this. You can change the password in the Django Admin (available from the menu with the little user icon on the top right). You can also use the Django Administration interface to create new user records.
You will observe that the database came with some demo data sets of public IGSR data that are ready for exploration.

Updating the Database
First, the tables that are to be updated should be generated. For this, follow the instructions in the VarFish DB Downloader repository.
At this point you should have a folder structure available that resembles:
varfish-db-downloader/
    GRCh37/
        <table_group>/
            <version>/
                <table>.tsv
                <table>.release_info
    GRCh38/
        [...]
    noref/
        [...]
    import_versions.tsv
    [...]
If the HPO and OMIM tables are supposed to be updated, it would look like this:
varfish-db-downloader/
    noref/
        hpo/
            20220126/
                Hpo.release_info
                Hpo.tsv
                HpoName.release_info
                HpoName.tsv
        mim2gene/
            20220126/
                Mim2geneMedgen.release_info
                Mim2geneMedgen.tsv
    import_versions.tsv
    [...]
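Before copying the folder to the server, it can be useful to check that it follows the layout described above. The following Python sketch lists every table found and reports .tsv files without a matching .release_info file. It assumes the `<release>/<table_group>/<version>/<table>.tsv` layout from the listings above; the function names are hypothetical and this is not the check that the VarFish importer itself performs.

```python
from pathlib import Path

# Glob pattern for <release>/<table_group>/<version>/<table>.tsv below the root.
TABLE_PATTERN = "*/*/*/*.tsv"


def find_tables(root: Path):
    """Return (release, table_group, version, table) for each table file under `root`."""
    tables = []
    for tsv in sorted(root.glob(TABLE_PATTERN)):
        version_dir = tsv.parent
        group_dir = version_dir.parent
        release_dir = group_dir.parent
        tables.append((release_dir.name, group_dir.name, version_dir.name, tsv.stem))
    return tables


def missing_release_info(root: Path):
    """Return the .tsv files that lack a matching <table>.release_info file."""
    return [
        tsv
        for tsv in sorted(root.glob(TABLE_PATTERN))
        if not tsv.with_suffix(".release_info").exists()
    ]
```

For the HPO example above, `find_tables(Path("varfish-db-downloader"))` would contain the entry `("noref", "hpo", "20220126", "Hpo")`; the top-level import_versions.tsv is not matched because it sits directly in the root folder.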
Copy this structure onto the machine where Docker Compose is running. Take Docker Compose down (this will shut down your VarFish instance!):
$ cd varfish-docker-compose # make sure to be in the docker compose folder
$ docker-compose down
Modify the docker-compose.yml file by finding the following entry:
varfish-web:
  image: ghcr.io/bihealth/varfish-server:VERSION
  env_file:
    - .env
  networks:
    - varfish
  restart: unless-stopped
  volumes:
    - "/root/varfish-server-background-db-20210728:/data:ro"
  [...]
And add another volume that maps your directory into the container:
volumes:
  - "/root/varfish-server-background-db-20210728:/data:ro"
  - type: bind
    source: varfish-db-downloader/
    target: /data-db-downloader
    read_only: true
Start docker compose again:
$ docker-compose up
Once done, attach to your container:
$ docker exec -it varfish-docker-compose_varfish-web_1 bash -i
Switch to the application directory and start the import:
varfish-web-container$ cd /usr/src/app
varfish-web-container$ python manage.py import_tables --tables-path /data-db-downloader
The output of the command should look something like this:
Disabling autovacuum on all tables...
Hpo -- Importing Hpo 2022/01/26 (, source: /data-db-downloader/noref/hpo/20220126/Hpo.tsv) ...
Mim2geneMedgen -- Importing Mim2geneMedgen 2022/01/26 (, source: /data-db-downloader/noref/mim2gene/20220126/Mim2geneMedgen.tsv) ...
Hpo -- Removing old Hpo results.
Mim2geneMedgen -- Removing old Mim2geneMedgen results.
Mim2geneMedgen -- Importing new Mim2geneMedgen data
Hpo -- Importing new Hpo data
Mim2geneMedgen -- Finished importing Mim2geneMedgen 2022/01/26 (Mim2geneMedgen.tsv)
Hpo -- Finished importing Hpo 2022/01/26 (Hpo.tsv)
HpoName -- Importing HpoName 2022/01/26 (, source: /data-db-downloader/noref/hpo/20220126/HpoName.tsv) ...
HpoName -- Removing old HpoName results.
HpoName -- Importing new HpoName data
HpoName -- Finished importing HpoName 2022/01/26 (HpoName.tsv)
Enabling autovacuum on all tables...
To verify the import, switch to the VarFish web interface, find the users menu on the top right corner and select the Import Release Info entry. The updated tables should have the latest version.

Extra Services
This section describes the installation of extra services.
Install Scoring with CADD
This section describes how to enable the scoring of variants with CADD using the CADD-scripts provided by the CADD authors. Note well that CADD-scripts is only free for non-commercial users as expressed in the CADD-scripts license. The installation is described for using a VarFish Docker Compose based installation.
First, create a directory volumes/cadd-rest-api inside the varfish-docker-compose directory and download an updated version of the install script.
$ cd varfish-docker-compose
$ mkdir -p volumes/cadd-rest-api/db
$ curl https://raw.githubusercontent.com/kircherlab/CADD-scripts/7502f47/install.sh \
> volumes/cadd-rest-api/install.sh
Next, download the appropriate files using the install.sh script you just downloaded.
The script will ask you for some decisions and the corresponding lines are highlighted below.
$ docker run -it -e CADD=/opt/miniconda3/share/cadd-scripts-1.6-0 \
-v $PWD/volumes/cadd-rest-api:/data bihealth/cadd-rest-api:0.3.1-0 \
bash /data/install.sh -b
Using kircherlab.bihealth.org as download server
CADD-v1.6 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health 2013-
2020. All rights reserved.
The following questions will quide you through selecting the files and dependencies needed for CADD.
After this, you will see an overview of the selected files before the download and installation starts.
Please note, that for successfully running CADD locally, you will need the conda environment and at least one set of
annotations.
Do you want to install the virtual environments with all CADD dependencies via conda? (y)/n n
Do you want to install CADD v1.6 for GRCh37/hg19? (y)/n y
Do you want to install CADD v1.6 for GRCh38/hg38? (y)/n n
Do you want to load annotations (Annotations can also be downloaded manually from the website)? (y)/n y
Do you want to load prescored variants (Makes SNV calling faster. Can also be loaded/installed later.)? y/(n) y
Do you want to load prescored variants for scoring with annotations (Warning: These files are very big)? y/(n) y
Do you want to load prescored variants for scoring without annotations? y/(n) y
Do you also want to load prescored InDels? We provide scores for well known InDels from sources like ClinVar, gnomAD/TOPMed etc. y/(n) y
The following will be loaded: (disk space occupied)
- Download CADD annotations for GRCh37-v1.6 (121 GB)
- Download prescored SNV inclusive annotations for GRCh37-v1.6 (248 GB)
- Download prescored InDels inclusive annotations for GRCh37-v1.6 (3.4 GB)
- Download prescored SNV (without annotations) for GRCh37-v1.6 (78 GB)
- Download prescored InDels (without annotations) for GRCh37-v1.6 (0.6 GB)
Please make sure you have enough disk space available.
Ready to continue? (y)/n y
Starting installation. This will take some time.
[...]
Connecting to kircherlab.bihealth.org (kircherlab.bihealth.org)|141.80.169.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61 [application/x-gzip]
Saving to: ‘InDels_inclAnno.tsv.gz.tbi.md5’
InDels_inclAnno.tsv.gz.tbi.md5 100%[======================================================================================>] 61 --.-KB/s in 0s
2021-03-08 18:55:10 (19.9 MB/s) - ‘InDels_inclAnno.tsv.gz.tbi.md5’ saved [61/61]
InDels_inclAnno.tsv.gz: OK
InDels_inclAnno.tsv.gz.tbi: OK
Then, update the .env file by uncommenting the lines that configure the variant prioritization with CADD in VarFish (use the contents of the .env file as the lines below might not be completely up to date).
# Extra: CADD REST API *****************************************************
# Uncomment the following lines to enable variant prioritization using the
# CADD score. See the VarFish Server manual for installation instructions,
# in particular how to download the required data.
VARFISH_ENABLE_CADD=1
VARFISH_CADD_REST_API_URL=http://cadd-rest-api:8080
VARFISH_CADD_MAX_VARS=5000
Also, uncomment the lines in the docker-compose.yml file for the cadd-rest-api-server and cadd-rest-api-celeryd containers (the following listing is abridged; the docker-compose.yml file is up to date).
# Uncomment the following lines to enable the CADD REST API server that
# is used for variant prioritization using the CADD score. We need both
# the server and the CADD-based worker.
cadd-rest-api-server:
  image: bihealth/cadd-rest-api:0.3.1-0
  env_file: cadd-rest-api.env
  command: ["wsgi"]
  # [...]
# You have to provide multiple cadd-rest-api-celeryd-worker container if
# you want to handle more than one query at a time.
cadd-rest-api-celeryd-worker-1:
  [...]
cadd-rest-api-celeryd-worker-3:
  image: bihealth/cadd-rest-api:0.3.2-0
  env_file: cadd-rest-api.env
  command: ["celeryd"]
  networks: [varfish]
  restart: unless-stopped
  volumes:
    - "./volumes/cadd-rest-api/data/annotations:/opt/miniconda3/share/cadd-scripts-1.6-0/data/annotations:ro"
    - "./volumes/cadd-rest-api/data/prescored:/opt/miniconda3/share/cadd-scripts-1.6-0/data/prescored:ro"
    - "./volumes/cadd-rest-api/db:/data/db:rw"
Finally, restart your Docker container cluster including the new containers with docker-compose down && docker-compose up -d.
System Configuration
This section describes how to configure the varfish-docker-compose setup.
When running with the varfish-docker-compose files and the provided database files, VarFish comes preconfigured with sensible default settings and also contains some example datasets to try out.
There are a few things that you might want to tweak.
Please note that there might be more settings that you can change when exploring the VarFish source code but right now their use is not supported for external users.
VarFish & Docker Compose
The recommended (and supported) way to deploy VarFish is using Docker Compose.
The VarFish server and its components are not installed on the system itself; rather, a number of Docker containers with fixed Docker images are run and work together.
The base docker-compose.yml file starts a fully functional VarFish server.
Docker Compose supports using so-called override files.
Basically, the mechanism works by providing a docker-compose.override.yml file that is automatically read at startup when running docker-compose up.
This file is listed in .gitignore, so it is not in the varfish-docker-compose repository but rather created in the checkouts (e.g., manually or using a configuration management tool such as Ansible).
On startup, Docker Compose will first read the base docker-compose.yml file.
It will then read the override file (if it exists) and recursively merge both YAML files, with the override file taking precedence over the base file.
Note that the recursive merging will be done on YAML dicts only; lists will be overwritten.
The mechanism is described in detail in the official documentation.
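The merge rule as described in this section (dicts are merged recursively, lists and scalar values from the override file replace the base values) can be sketched in a few lines of Python. This only mirrors the description given here and is not Docker Compose's own implementation; Compose applies additional per-option rules, so the official documentation remains the reference. The function name `merge_override` and the sample data are hypothetical.

```python
def merge_override(base, override):
    """Recursively merge `override` into `base` without modifying either input.

    Dicts are merged key by key; any other value (list, string, number)
    from the override replaces the base value as a whole.
    """
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = merge_override(base[key], value) if key in base else value
        return merged
    return override


# Illustrative data shaped like a parsed docker-compose.yml and override file.
base = {"services": {"traefik": {"image": "traefik:v2.3.1", "ports": ["80:80", "443:443"]}}}
override = {"services": {"traefik": {"ports": ["8080:80", "8443:443"]}}}

merged = merge_override(base, override)
print(merged["services"]["traefik"])
```

In the sample, the image key survives from the base file because the traefik dicts are merged, while the ports list is taken from the override as a whole.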
We provide the following files that you can use/combine into the local docker-compose.override.yml file of your installation.
- docker-compose.override.yml-cert: use TLS encryption with your own certificate from your favourite certificate provider (by default an automatically generated self-signed certificate will be used by traefik, the reverse proxy).
- docker-compose.override.yml-letsencrypt: use letsencrypt to obtain a certificate.
- docker-compose.override.yml-cadd: spawn Docker containers for allowing pathogenicity annotation of your variants with CADD.
The overall process is to copy any of the *.override.yml-* files to docker-compose.override.yml and adjust it to your needs (e.g., merging with another such file).
Note that you could also explicitly provide multiple override files but we do not consider this further. For more information on the override mechanism see the official documentation.
The following sections describe the possible adjustments with Docker Compose override files.
TLS / SSL Configuration
The varfish-docker-compose setup uses traefik as a reverse proxy and must be reconfigured if you want to change the default behaviour of using self-signed certificates.
Use the contents of docker-compose.override.yml-cert for providing your own certificate.
You have to put the server certificate and key into config/traefik/tls/server.crt and server.key and then restart the traefik container.
Make sure to provide the full certificate chain if needed (e.g., for DFN issued certificates).
If your site is reachable from the internet then you can also use the contents of docker-compose.override.yml-letsencrypt which will use letsencrypt (https://letsencrypt.org/) to obtain the certificates.
Make sure to adjust the line with --certificatesresolvers.le.acme.email= to your email address.
Note well that if you make your site reachable from the internet then you should be aware of the implications.
VarFish is MIT licensed software which means that it comes “without any warranty of any kind”, see the LICENSE file for details.
After changing the configuration, restart the site (e.g., with docker-compose down && docker-compose up -d if it is running in detached mode).
LDAP Configuration
VarFish can be configured to use up to two upstream LDAP servers (e.g., OpenLDAP or Microsoft Active Directory).
For this, you have to set the following environment variables in the file .env in your varfish-docker-compose checkout and restart the site.
The variables are given with their default values.
ENABLE_LDAP=0
Enable primary LDAP authentication server (values: 0, 1).
AUTH_LDAP_SERVER_URI=
URI for primary LDAP server (e.g., ldap://ldap.example.com:port or ldaps://...).
AUTH_LDAP_BIND_DN=
Distinguished name (DN) to use for binding to the LDAP server.
AUTH_LDAP_BIND_PASSWORD=
Password to use for binding to the LDAP server.
AUTH_LDAP_USER_SEARCH_BASE=
DN to use for the search base, e.g., DC=com,DC=example,DC=ldap
AUTH_LDAP_USERNAME_DOMAIN=
Domain to use for user names, e.g. with EXAMPLE users from this domain can login with user@EXAMPLE.
AUTH_LDAP_DOMAIN_PRINTABLE=${AUTH_LDAP_USERNAME_DOMAIN}
Domain used for printing the user name.
If you have the first LDAP configured then you can also enable the second one and configure it.
ENABLE_LDAP_SECONDARY=0
Enable secondary LDAP authentication server (values: 0, 1).
The remaining variable names are derived from the ones of the primary server but using the prefix AUTH_LDAP2 instead of AUTH_LDAP.
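The naming rule for the secondary server can be written down as a small Python sketch that derives each secondary variable name from its primary counterpart. The list of primary names is taken from this section; the helper name `secondary_name` is hypothetical, and the env.example file remains the authoritative list of variables.

```python
PRIMARY_PREFIX = "AUTH_LDAP"
SECONDARY_PREFIX = "AUTH_LDAP2"

# Primary server variables as listed in this section.
PRIMARY_VARIABLES = [
    "AUTH_LDAP_SERVER_URI",
    "AUTH_LDAP_BIND_DN",
    "AUTH_LDAP_BIND_PASSWORD",
    "AUTH_LDAP_USER_SEARCH_BASE",
    "AUTH_LDAP_USERNAME_DOMAIN",
    "AUTH_LDAP_DOMAIN_PRINTABLE",
]


def secondary_name(primary: str) -> str:
    """Replace the AUTH_LDAP prefix of a primary variable name with AUTH_LDAP2."""
    if not primary.startswith(PRIMARY_PREFIX + "_"):
        raise ValueError(f"not a primary LDAP variable: {primary!r}")
    return SECONDARY_PREFIX + primary[len(PRIMARY_PREFIX):]


for name in PRIMARY_VARIABLES:
    print(f"{name} -> {secondary_name(name)}")
```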
SAML Configuration
Besides LDAP configuration, it is also possible to authenticate with existing SAML 2.0 ID Providers (e.g. Keycloak). Since VarFish is built on top of SODAR Core, you can also refer to the SODAR Core documentation for further help in configuring the ID Providers.
To enable SAML authentication with your ID Provider, a few steps are necessary. First, add a SAML Client for your ID Provider of choice. The SODAR Core documentation features examples for Keycloak. Make sure you have assertion signing turned on and allow redirects to your VarFish site.
The SAML processing URL should be set to the externally visible address of your VarFish deployment, e.g. https://varfish.example.com/saml2_auth/acs/.
Next, you need to obtain your metadata.xml as well as the signing certificate and key file from the ID Provider. Make sure you convert these keys to standard OpenSSL format before starting your VarFish instance (you can find more details here).
If you deploy VarFish without Docker, you can pass the file paths of your metadata.xml and key pair directly. Otherwise, make sure that you have placed them in a single folder and added the corresponding folder to your docker-compose.yml (or to a docker-compose.override.yml), like in the following snippet.
varfish-web:
  ...
  volumes:
    - "/path/to/my/secrets:/secrets:ro"
Then, define at least the following variables in your Docker Compose .env file (or the environment variables when running the server natively).
ENABLE_SAML
[Default 0] Enable [1] or Disable [0] SAML authentication
SAML_CLIENT_ENTITY_ID
The SAML client ID set in the ID Provider config (e.g. “varfish”)
SAML_CLIENT_ENTITY_URL
The externally visible URL of your varfish deployment
SAML_CLIENT_METADATA_FILE
The path to the metadata.xml file retrieved from your ID Provider. If you deploy using docker, this must be a path inside the container.
SAML_CLIENT_IDP
The url to your IDP. In case of keycloak it can look something like https://keycloak.example.com/auth/realms/<my_varfish_realm>
SAML_CLIENT_KEY_FILE
Path to the SAML signing key for the client.
SAML_CLIENT_CERT_FILE
Path to the SAML certificate for the client.
SAML_CLIENT_XMLSEC1
[Default /usr/bin/xmlsec1] Path to the xmlsec executable.
By default, the SAML attributes map is configured to work with Keycloak as SAML Auth provider. If you are using a different ID Provider, or different settings, you also need to adjust the SAML_ATTRIBUTES_MAP option.
SAML_ATTRIBUTES_MAP
A dictionary identifying the SAML claims needed to retrieve user information. You need to set at least email, username, first_name and last_name. Example: SAML_ATTRIBUTES_MAP="email=email,username=uid,first_name=firstName,last_name=name"
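To illustrate the format of the SAML_ATTRIBUTES_MAP value, the following Python sketch splits such a string into a dictionary and checks that the four required keys are present. It is meant as a way to sanity-check a value before putting it into the .env file; it is not the parser that VarFish or SODAR Core uses, and the function name is hypothetical.

```python
# Keys that must be present according to the description above.
REQUIRED_KEYS = {"email", "username", "first_name", "last_name"}


def parse_attributes_map(value: str) -> dict:
    """Parse "key=claim,key=claim,..." into a dict and validate the required keys."""
    mapping = {}
    for pair in value.split(","):
        key, sep, claim = pair.partition("=")
        if not sep or not key.strip() or not claim.strip():
            raise ValueError(f"malformed entry: {pair!r}")
        mapping[key.strip()] = claim.strip()
    missing = REQUIRED_KEYS - mapping.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return mapping


example = "email=email,username=uid,first_name=firstName,last_name=name"
print(parse_attributes_map(example))
```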
To set initial user permissions on first login, you can use the following options:
SAML_NEW_USER_GROUPS
Comma separated list of groups for a new user to join.
SAML_NEW_USER_ACTIVE_STATUS
[Default True] Whether a new user is considered active.
SAML_NEW_USER_STAFF_STATUS
[Default True] New users get the staff status.
SAML_NEW_USER_SUPERUSER_STATUS
[Default False] New users are marked superusers (I advise leaving this one alone).
If you encounter any trouble with this rather involved procedure, feel free to take a look at the discussion forums on GitHub and open a thread.
Sending of Emails
You can configure VarFish to send out emails, e.g., when permissions are granted to users.
PROJECTROLES_SEND_EMAIL=0
Enable sending of emails.
EMAIL_SENDER=
String to use for the sender, e.g., noreply@varfish.example.com.
EMAIL_SUBJECT_PREFIX=
Prefix to use for email subjects, e.g., [VarFish].
EMAIL_URL=
URL to the SMTP server to use, e.g., smtp://user:password@mail.example.com:1234.
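The EMAIL_URL value packs scheme, credentials, host and port into one string. As a minimal sketch, Python's standard library can take such a URL apart, which helps to verify that each component is what you intended before restarting the site. The function name is hypothetical and this is not how VarFish itself reads the setting.

```python
from urllib.parse import urlsplit


def describe_email_url(url: str) -> dict:
    """Split an SMTP URL into its components for inspection."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


print(describe_email_url("smtp://user:password@mail.example.com:1234"))
```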
External Postgres Server
In some setups, it might make sense to run your own Postgres server. The most common use case would be that you want to run VarFish in a setting where fast disks are not available (virtual machines or in a “cloud” setting). You might still have a dedicated, fast Postgres server running (or available as a service from your cloud provider). In this case, you can configure the database connection settings as follows.
DATABASE_URL=postgresql://postgres:password@postgres/varfish
Adjust to the credentials, server, and database name that you want to use.
The default settings do not make for secure settings in the general case.
However, Docker Compose will create a private network that is only available to the Docker containers.
In the default docker-compose setup, the Postgres server is thus not exposed to the outside and only reachable by the VarFish web server and queue workers.
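When pointing DATABASE_URL at an external server, note that user names and passwords containing characters such as @ or / generally have to be percent-encoded to keep the URL unambiguous. The following Python sketch assembles a URL of the shape shown above under that assumption; the function name and the sample credentials are hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def database_url(user: str, password: str, host: str, database: str,
                 port: Optional[int] = None) -> str:
    """Assemble a postgresql:// URL, percent-encoding user name and password."""
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    if port is not None:
        netloc += f":{port}"
    return f"postgresql://{netloc}/{database}"


# Reproduces the default value shown above.
print(database_url("postgres", "password", "postgres", "varfish"))
# A password with special characters, against a hypothetical external host.
print(database_url("varfish", "p@ss/word", "db.example.com", "varfish", 5432))
```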
Miscellaneous Configuration
VARFISH_LOGIN_PAGE_TEXT
Text to display on the login page.
FIELD_ENCRYPTION_KEY
Key to use for encrypting secrets in the database (such as saved public keys for the Beacon Site feature). You can generate such a key with the following command: python -c 'import os, base64; print(base64.urlsafe_b64encode(os.urandom(32)).decode())'.
VARFISH_QUERY_MAX_UNION
Maximal number of cases to query for at the same time for joint queries. Default is 20.
Sentry Configuration
Sentry is a service for monitoring web apps. Their open source version can be installed on premise. You can configure Sentry support as follows:
ENABLE_SENTRY=0
Enable Sentry support.
SENTRY_DSN=
A sentry DSN to report to. See Sentry documentation for details.
HGMD Professional Documentation
Users can enable gene-wise and variant-wise link-outs to HGMD Professional as follows.
VARFISH_ENABLE_HGMD_PRO_LINKOUT=0
Enable HGMD Professional link-out.
VARFISH_HGMD_PRO_LINKOUT_URL_PREFIX=https://my.qiagendigitalinsights.com/bbp/view/hgmd/pro
Configure the URL prefix for HGMD Professional link-outs.
System and Docker (Compose) Tweaks
A number of customizations of the installation can be done using Docker or Docker Compose. Other customizations have to be done on the system level. This section lists those that the authors are aware of; network-related settings in particular can be done on many levels.
Using Non-Default HTTP(S) Ports
If you want to use non-standard HTTP and HTTPS ports (defaults are 80 and 443) then you can tweak this in the traefik
container section.
You have to adjust two parts. Below, we give them separately with full YAML “key” paths.
services:
  traefik:
    ports:
      - "80:80"
      - "443:443"
To listen on ports 8080
and 8443
instead, your override file should have:
services:
  traefik:
    ports:
      - "8080:80"
      - "8443:443"
Also, you have to adjust the command line arguments to traefik for the web
(HTTP) and websecure
(HTTPS) entrypoints.
services:
  traefik:
    command:
      # ...
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
Use the following in your override file.
services:
  traefik:
    command:
      # ...
      - "--entrypoints.web.address=:8080"
      - "--entrypoints.websecure.address=:8443"
Based on the docker-compose.yml file alone, your docker-compose.override.yml file should contain the following lines.
You will have to adjust the file accordingly if you want to use a custom static certificate or letsencrypt by incorporating the files from the provided example docker-compose.override.yml-*
files.
services:
  traefik:
    ports:
      - "8080:80"
      - "8443:443"
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
      - "--entrypoints.web.http.redirections.entrypoint.permanent=true"
      - "--entrypoints.websecure.address=:443"
Then, restart by calling docker-compose up -d
in the directory with the docker-compose.yml
file.
Listening on Specific IPs
By default, the traefik
container will listen on all IPs and interfaces of the host machine.
You can change this by prefixing the ports
list with the IPs to listen on.
The settings to adjust here are:
services:
  traefik:
    ports:
      - "80:80"
      - "443:443"
And they need to be overwritten as follows in your override file.
services:
  traefik:
    ports:
      - "10.0.0.1:80:80"
      - "10.0.0.1:443:443"
More details can be found in the corresponding section of the Docker Compose manual.
Of course, you can combine this with adjusting the ports, e.g., to 10.0.0.1:8080:80
etc.
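The entries of the ports list follow the pattern IP:HOST_PORT:CONTAINER_PORT, where the IP part is optional. As a small illustration, the following sketch builds such entries and rejects malformed values. The helper port_mapping is hypothetical, and it assumes IPv4 addresses only.

```python
import ipaddress


def port_mapping(host_port, container_port, ip=None):
    """Format one entry of a Compose `ports` list (IPv4 only in this sketch)."""
    for port in (host_port, container_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    if ip is None:
        return f"{host_port}:{container_port}"
    ipaddress.IPv4Address(ip)  # raises ValueError for malformed addresses
    return f"{ip}:{host_port}:{container_port}"
```

For example, port_mapping(8080, 80, "10.0.0.1") yields the combined form 10.0.0.1:8080:80 mentioned above.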
Limit Incoming Traffic
In some settings you might want to limit incoming traffic to certain networks / IP ranges.
In principle, this is possible with adjusting the Traefik load balancer/reverse proxy.
However, we would recommend you to use the firewall of your operating system or your overall network for this purpose.
Consult the corresponding manual (e.g., of firewalld
for CentOS/Red Hat or of ufw
for Debian/Ubuntu) for instructions.
We remark that in most cases it is better to perform an actual separation of networks and place each (virtual) machine into one network only.
Understanding Volumes
The volumes
sub directory of the varfish-docker-compose
directory contains the data for the containers.
These are as follows.
cadd-rest-api
Databases for variant annotation with CADD (large).
exomiser
Databases for variant prioritization (medium)
jannovar
Transcript databases for annotation (small).
minio
Storage for files uploaded from client via REST API (big).
postgres
PostgreSQL databases (very big).
redis
Storage for the work queues (small).
traefik
Configuration and certificates for load balancer (very small).
In principle, you can put these on different storage systems (e.g., some over the network and some on directly attached disks).
The main motivation is that fast storage is expensive.
Putting the small and medium sized directories on slower, cheaper storage will have little or no effect on storage efficiency.
At the same time, access to redis
and exomiser
directories should be fast.
As for postgres
, this storage is accessed most heavily and should be on storage as fast as you can afford.
cadd-rest-api
should also be on fast storage but it is accessed almost only read-only.
You can put the minio
folder on slower storage to shave off some storage costs from your VarFish installation.
To summarize:
You can put minio on cheaper storage.
As for cadd-rest-api, you can probably get away with putting it on cheaper storage.
Put everything else, in particular postgres, on storage as fast as you can afford.
As described in the section Performance Tuning, the authors recommend using an advanced file system such as ZFS on multiple SSDs for large, fast storage and enabling compression. You will get excellent performance and can expect storage saving of 50%.
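To decide which directory goes onto which storage class, it helps to know how large each one actually is on your installation. The following is a minimal sketch that sums up the file sizes below each sub directory of volumes. The function names are hypothetical, and the result counts apparent file sizes, not the space used after file system compression.

```python
import os


def directory_size(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # do not follow symlinks
                total += os.path.getsize(file_path)
    return total


def volume_sizes(volumes_dir):
    """Map each sub directory of volumes_dir to its size in bytes."""
    return {
        entry.name: directory_size(entry.path)
        for entry in os.scandir(volumes_dir)
        if entry.is_dir()
    }
```

You would call volume_sizes with the path to the volumes directory of your varfish-docker-compose checkout. Reading the postgres directory may require root permissions.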
Beacon Site (Experimental)
Experimental support for the GA4GH Beacon protocol.
VARFISH_ENABLE_BEACON_SITE=
Whether or not to enable experimental beacon site support.
Undocumented Configuration
The following points remain to be implemented with Docker Compose and documented.
Kiosk Mode
Updating Extras Data
Ingesting Variants
This step describes how to ingest data into VarFish, that is
annotating variants and preparing them for import into VarFish, and
actually importing them into VarFish.
All of the steps below assume that you are running the Linux operating system. It might also work on Mac OS but this is currently unsupported.
Variant Annotation
In order to import a VCF file with SNVs and small indels, the file has to be prepared for import into VarFish server. This is done using the VarFish Annotator software.
Installing the Annotator
The VarFish Annotator is written in Java and you can find the JAR on the varfish-annotator GitHub releases page.
However, it is recommended to install it via bioconda.
For this, you first have to install bioconda as described in their manual.
Please ensure that you have the channels conda-forge
, bioconda
, and defaults
set in the correct order as described in the bioconda installation manual.
A common pitfall is to forget the channel setup, which then causes the installation of varfish-annotator to fail.
The next step is to install the varfish-annotator-cli package or create a conda environment with it.
# EITHER
$ conda install -y varfish-annotator-cli==0.14.0
# OR
$ conda create -y -n varfish-annotator varfish-annotator-cli==0.14.0
$ conda activate varfish-annotator
As a side remark, you might consider installing mamba first and then using mamba install and mamba create instead of conda install and conda create.
Obtaining the Annotator Data
The downloaded archive has a size of ~10 GB while the extracted data has a size of ~55 GB.
$ GENOME=grch37 # alternatively use grch38
$ RELEASE=20210728
$ mkdir varfish-annotator-20210728-$GENOME
$ cd varfish-annotator-20210728-$GENOME
$ wget --no-check-certificate \
https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz{,.sha256} \
https://file-public.cubi.bihealth.org/transient/varfish/anthenea/jannovar-db-$RELEASE-$GENOME.tar.gz{,.sha256}
$ sha256sum --check varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz.sha256
varfish-annotator-db-20210728-grch37.h2.db.gz: OK
$ sha256sum --check jannovar-db-$RELEASE-$GENOME.tar.gz.sha256
jannovar-db-20210728-grch37.tar.gz: OK
$ gzip -d varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz
$ tar xf jannovar-db-$RELEASE-$GENOME.tar.gz
$ rm jannovar-db-$RELEASE-$GENOME.tar.gz{,.sha256} \
    varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz.sha256
$ mv jannovar-db-$RELEASE-$GENOME/* .
$ rmdir jannovar-db-$RELEASE-$GENOME
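If sha256sum is not available on your system, the checksum verification can be sketched in Python as follows. This is an illustration using only the standard library; the function names are hypothetical and it assumes that the .sha256 file holds a single line in the usual sha256sum layout (digest, whitespace, file name) and lies next to the data file.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_path):
    """Check the first "<digest>  <file name>" line of a .sha256 file."""
    with open(checksum_path, "rt") as handle:
        expected, file_name = handle.readline().split(maxsplit=1)
    # sha256sum prefixes the name with "*" when run in binary mode.
    file_name = file_name.strip().lstrip("*")
    data_path = os.path.join(os.path.dirname(checksum_path), file_name)
    return sha256_of_file(data_path) == expected.lower()
```

Reading in chunks keeps memory usage low, which matters for the ~10 GB archive.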
Annotating VCF Files
First, obtain some test data for annotation and later import into VarFish Server.
# use $GENOME and $RELEASE from above
$ wget --no-check-certificate \
https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-test-data-v1-20211125.tar.gz{,.sha256}
$ sha256sum --check varfish-test-data-v1-20211125.tar.gz.sha256
varfish-test-data-v1-20211125.tar.gz: OK
$ tar -xf varfish-test-data-v1-20211125.tar.gz
varfish-test-data-v1-20211125/
...
varfish-test-data-v1-20211125/GRCh37/vcf/HG00107-N1-DNA1-WES1/bwa.gatk_hc.HG00107-N1-DNA1-WES1.vcf.gz
...
Annotating Small Variant VCFs
Next, you can use the varfish-annotator
command.
You must provide a bgzip-compressed VCF file INPUT.vcf.gz.
# Use the path to the FASTA file that you used for alignment.
$ REFERENCE=path/to/hs37fa.fa--or--hs38.fa
# use $GENOME and $RELEASE from above
$ varfish-annotator \
    -XX:MaxHeapSize=10g \
    -XX:+UseConcMarkSweepGC \
    annotate \
    --db-path varfish-annotator-20210728-$GENOME/varfish-annotator-db-$RELEASE-$GENOME.h2.db \
    --ensembl-ser-path varfish-annotator-20210728-$GENOME/ensembl*.ser \
    --refseq-ser-path varfish-annotator-20210728-$GENOME/refseq_curated*.ser \
    --ref-path $REFERENCE \
    --input-vcf "INPUT.vcf.gz" \
    --release "$GENOME" \
    --output-db-info "FAM_name.db-info.tsv" \
    --output-gts "FAM_name.gts.tsv" \
    --case-id "FAM_name"
Let us dissect this call.
The first three lines contain the call to the wrapper script and some arguments for the java binary to allow for enough memory when running.
$ varfish-annotator \
    -XX:MaxHeapSize=10g \
    -XX:+UseConcMarkSweepGC \
The next lines use the annotate
sub command and provide the needed paths to the database files needed for annotation.
The .h2.db
file contains information from variant databases such as gnomAD and ClinVar.
The .ser
file are transcript databases used by the Jannovar library.
The .fa
file is the path to the genome reference file used.
While only release GRCh37/hg19 is supported, using a file with UCSC-style chromosome names having chr
prefixes would also work.
    --db-path varfish-annotator-20210728-$GENOME/varfish-annotator-db-$RELEASE-$GENOME.h2.db \
    --ensembl-ser-path varfish-annotator-20210728-$GENOME/ensembl*.ser \
    --refseq-ser-path varfish-annotator-20210728-$GENOME/refseq_curated*.ser \
    --ref-path $REFERENCE \
The following lines provide the path to the input VCF file, specify the release name (must be GRCh37
) and the name of the case as written out.
This could be the name of the index patient, for example.
    --input-vcf "INPUT.vcf.gz" \
    --release "GRCh37" \
    --case-id "index" \
The last lines specify the paths of the output TSV files.
    --output-db-info "FAM_name.db-info.tsv" \
    --output-gts "FAM_name.gts.tsv"
After the program terminates, you should create gzip files for the created TSV files and md5 sum files for them.
$ gzip -c FAM_name.db-info.tsv >FAM_name.db-info.tsv.gz
$ md5sum FAM_name.db-info.tsv.gz >FAM_name.db-info.tsv.gz.md5
$ gzip -c FAM_name.gts.tsv >FAM_name.gts.tsv.gz
$ md5sum FAM_name.gts.tsv.gz >FAM_name.gts.tsv.gz.md5
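The two shell steps per file can also be scripted, for example when many cases are processed in a pipeline. The following is a minimal sketch with a hypothetical helper compress_and_checksum; it writes the .gz file and a .md5 file in the same "digest, two spaces, file name" layout that md5sum prints.

```python
import gzip
import hashlib
import os
import shutil


def compress_and_checksum(tsv_path):
    """Write tsv_path + ".gz" and tsv_path + ".gz.md5"; return the MD5 hex digest."""
    gz_path = tsv_path + ".gz"
    with open(tsv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    digest = hashlib.md5()
    with open(gz_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    with open(gz_path + ".md5", "wt") as out:
        # Same "<digest>  <file name>" layout that md5sum prints.
        out.write(f"{digest.hexdigest()}  {os.path.basename(gz_path)}\n")
    return digest.hexdigest()
```

As with gzip -c, the uncompressed TSV file is left in place.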
The next step is to import these files into VarFish server. For this, a PLINK PED file has to be provided. This is a tab-separated values (TSV) file with the following columns:
family name
individual name
father name or 0 for founder
mother name or 0 for founder
sex of individual, 1 for male, 2 for female, 0 if unknown
disease state of individual, 1 for unaffected, 2 for affected, 0 if unknown
For example, a trio would look as follows:
FAM_index index father mother 2 2
FAM_index father 0 0 1 1
FAM_index mother 0 0 2 1
while a singleton could look as follows:
FAM_index index 0 0 2 1
Note that you have to link family individuals with pseudo entries that have no corresponding entry in the VCF file. For example, if you have genotypes for two siblings but none for the parents:
FAM_index sister father mother 2 2
FAM_index brother father mother 1 2
FAM_index father 0 0 1 1
FAM_index mother 0 0 2 1
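The rule that every referenced parent needs a line of its own is easy to violate when writing PED files by hand. The following sketch checks a pedigree for this and for invalid sex and disease state codes. The names parse_ped and check_ped are hypothetical. As a simplification, the parser splits on any whitespace, so it assumes that names contain no spaces.

```python
PED_COLUMNS = ("family", "name", "father", "mother", "sex", "affected")


def parse_ped(text):
    """Parse PED lines into dicts; columns are split on any whitespace."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split()
        if len(fields) != len(PED_COLUMNS):
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        rows.append(dict(zip(PED_COLUMNS, fields)))
    return rows


def check_ped(rows):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    by_name = {row["name"]: row for row in rows}
    for row in rows:
        for column in ("sex", "affected"):
            if row[column] not in ("0", "1", "2"):
                problems.append(f"{row['name']}: invalid {column} {row[column]!r}")
        for role in ("father", "mother"):
            parent = row[role]
            # "0" marks a founder; any other name needs a line of its own.
            if parent != "0" and parent not in by_name:
                problems.append(f"{row['name']}: {role} {parent!r} has no line")
    return problems
```

For the sibling example, leaving out the two pseudo entries for the parents would be reported as four problems, two per sibling.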
Annotating Structural Variant VCFs
Structural variants can be annotated as follows.
# use $GENOME and $RELEASE from above
$ varfish-annotator \
    annotate-svs \
    -XX:MaxHeapSize=10g \
    -XX:+UseConcMarkSweepGC \
    \
    --default-sv-method="YOURCALLERvVERSION" \
    --release $GENOME \
    \
    --db-path varfish-annotator-20210728-$GENOME/varfish-annotator-db-$RELEASE-$GENOME.h2.db \
    --ensembl-ser-path varfish-annotator-20210728-$GENOME/ensembl*.ser \
    --refseq-ser-path varfish-annotator-20210728-$GENOME/refseq_curated*.ser \
    \
    --input-vcf FAM_sv_calls.vcf.gz \
    --output-db-info FAM_sv_calls.db-info.tsv \
    --output-gts FAM_sv_calls.gts.tsv \
    --output-feature-effects FAM_sv_calls.feature-effects.tsv
Note
varfish-annotator annotate-svs
will write out the INFO/SVMETHOD
column to the output file.
If this value is empty then the value from --default-sv-method
will be used.
You must either provide INFO/SVMETHOD
or --default-sv-method
.
Otherwise, you will get errors in the import step (visible in the case import background task view).
You can use the following shell snippet for adding INFO/SVMETHOD
to your VCF file properly.
Replace YOURCALLERvVERSION
with the value that you want to provide to VarFish.
cat >$TMPDIR/header.txt <<"EOF"
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
EOF
bcftools annotate \
--header-lines $TMPDIR/header.txt \
INPUT.vcf.gz \
| awk -F $'\t' '
BEGIN { OFS = FS; }
/^#/ { print $0; }
/^[^#]/ { $8 = $8 ";SVMETHOD=YOURCALLERvVERSION"; print $0; }
' \
| bgzip -c \
> OUTPUT.vcf.gz
tabix -f OUTPUT.vcf.gz
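For illustration, the per-line logic of the awk step above can be written in Python as follows. This is a sketch with a hypothetical function add_svmethod, not part of any VarFish tool. Unlike the awk version, it leaves lines that already carry SVMETHOD unchanged and replaces an empty INFO column (.) instead of appending to it. The ##INFO header declaration still has to be added separately, as done with bcftools annotate above.

```python
def add_svmethod(line, method):
    """Add SVMETHOD=<method> to the INFO column of one VCF data line."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("VCF data line with fewer than 8 columns")
    info = fields[7]
    keys = {item.split("=", 1)[0] for item in info.split(";")}
    if "SVMETHOD" in keys:
        return line  # keep an existing value
    if info in (".", ""):
        fields[7] = f"SVMETHOD={method}"
    else:
        fields[7] = f"{info};SVMETHOD={method}"
    return "\t".join(fields) + newline
```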
Again, you have to compress the output TSV files with gzip
and compute MD5 sums.
$ gzip -c FAM_sv_calls.db-info.tsv >FAM_sv_calls.db-info.tsv.gz
$ md5sum FAM_sv_calls.db-info.tsv.gz >FAM_sv_calls.db-info.tsv.gz.md5
$ gzip -c FAM_sv_calls.gts.tsv >FAM_sv_calls.gts.tsv.gz
$ md5sum FAM_sv_calls.gts.tsv.gz >FAM_sv_calls.gts.tsv.gz.md5
$ gzip -c FAM_sv_calls.feature-effects.tsv >FAM_sv_calls.feature-effects.tsv.gz
$ md5sum FAM_sv_calls.feature-effects.tsv.gz >FAM_sv_calls.feature-effects.tsv.gz.md5
Variant Import
As a prerequisite you need to install the VarFish command line interface (CLI) Python app varfish-cli.
You can install it from PyPI with pip install varfish-cli or from Bioconda with conda install varfish-cli.
Second, you need to create a new API token as described in API Token Management.
Then, set up your VarFish CLI configuration file ~/.varfishrc.toml as:
[global]
varfish_server_url = "https://varfish.example.com/"
varfish_api_token = "XXX"
Now you can import the data that you annotated above.
You will also find some example files in the test-data
directory.
For the import you will also need the project UUID. You can get this from the URLs in VarFish that list project properties. The figure below shows this for the background job list but this also works for the project details view.
$ varfish-cli --no-verify-ssl case create-import-info --resubmit \
94777783-8797-429c-870d-c12bec2dd6ea \
    test-data/tsv/HG00102-N1-DNA1-WES1/*.{tsv.gz,ped}
When executing the import as shown above, you have to specify:
a pedigree file with suffix .ped,
a genotype annotation file as generated by varfish-annotator ending in .gts.tsv.gz,
a database info file as generated by varfish-annotator ending in .db-info.tsv.gz.
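A quick pre-flight check can catch a missing file before the import job is submitted. The sketch below lists which of the three required suffixes has no matching file in a directory. The function name missing_import_files is hypothetical and the check only looks at file names, not at file contents.

```python
import os

# The three file types that the import needs, as listed above.
REQUIRED_SUFFIXES = (".ped", ".gts.tsv.gz", ".db-info.tsv.gz")


def missing_import_files(directory):
    """Return the required suffixes for which the directory holds no file."""
    names = os.listdir(directory)
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name.endswith(suffix) for name in names)
    ]
```

An empty result means that all three file types are present.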
Optionally, you can also specify a TSV file with BAM quality control metrics ending in .bam-qc.tsv.gz.
Currently, the format is not properly documented yet but documentation and supporting tools are forthcoming.
If you want to import structural variants for your case, then you simply submit the output files from the SV annotation step together with the .feature-effects.tsv.gz
and .gts.tsv.gz
files from the small variant annotation step.
Running the import command through VarFish CLI will create a background import job as shown below. Once the job is done, the created or updated case will appear in the case list.

Case Quality Control
You can provide an optional TSV file with case quality control data.
The file name should end in .bam-qc.tsv.gz and the file should be accompanied by an MD5 file.
The format is a bit peculiar and will be documented better in the future.
The TSV file has three columns and starts with the header.
case_id set_id bam_stats
It is then followed by exactly one line where the first two fields have to have the value of a dot (.).
The last field of this line is then a PostgreSQL-encoded JSON dict with the per-sample quality control information.
You can obtain the PostgreSQL encoding by replacing each string delimiter (") with three of them (""").
The format of the JSON file is formally defined in varfish-server case QC info.
Briefly, the keys of the top level dict are the sample names as in the case that you upload. On the second level:
bamstats
The keys/values from the output of the samtools stats command.
min_cov_target
Coverage histogram per target (the smallest coverage per target/exon counts for the whole target). You provide the start of each bin, usually starting at "0", in increments of 10, up to "200". The keys are the bin lower bounds, the values are of JSON/JavaScript number type, so floating point numbers.
min_cov_base
The same information as min_cov_target but considering coverage base-wise and not target-wise.
summary
A summary of the target information.
idxstats
A per-chromosome count of mapped and unmapped reads as returned by the samtools idxstats command.
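The file layout and quote encoding described above can be sketched in a few lines of Python. The function bam_qc_tsv is hypothetical and only follows the description in this section: a header line, then one line with two dots and the JSON dict in which every double quote is tripled. It assumes that keys and values contain no double quotes themselves.

```python
import json


def bam_qc_tsv(qc):
    """Build the two-line content of a .bam-qc.tsv file from a per-sample QC dict."""
    header = "\t".join(["case_id", "set_id", "bam_stats"])
    # Encode the JSON dict by replacing each string delimiter with three.
    encoded = json.dumps(qc).replace('"', '"""')
    row = "\t".join([".", ".", encoded])
    return header + "\n" + row + "\n"
```

The returned text still has to be written to a file, compressed with gzip, and accompanied by an MD5 file.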
You can find the example of a real-world JSON QC file below for the first sample.
{
"index": {
"bamstats": {
"raw total sequences": 154189250,
"filtered sequences": 0,
"sequences": 154189250,
"is sorted": 1,
"1st fragments": 77094625,
"last fragments": 77094625,
"reads mapped": 153919815,
"reads mapped and paired": 153863370,
"reads unmapped": 269435,
"reads properly paired": 153071356,
"reads paired": 154189250,
"reads duplicated": 7273644,
"reads MQ0": 2701485,
"reads QC failed": 0,
"non-primary alignments": 129724,
"total length": 19427845500,
"total first fragment length": 9713922750,
"total last fragment length": 9713922750,
"bases mapped": 19393896690,
"bases mapped (cigar)": 19238950186,
"bases trimmed": 0,
"bases duplicated": 916479144,
"mismatches": 61093079,
"error rate": 0.003175489,
"average length": 126,
"average first fragment length": 126,
"average last fragment length": 126,
"maximum length": 126,
"maximum first fragment length": 126,
"maximum last fragment length": 126,
"average quality": 35,
"insert size average": 192.6,
"insert size standard deviation": 54.3,
"inward oriented pairs": 73269191,
"outward oriented pairs": 3391556,
"pairs with other orientation": 12579,
"pairs on different chromosomes": 258359,
"percentage of properly paired reads (%)": 99.3
},
"min_cov_target": {
"0": 100,
"10": 87.59,
"190": 12.31,
"200": 10.74
},
"min_cov_base": {
"0": 100,
"10": 95.89,
"190": 46.55,
"200": 43.88
},
"summary": {
"mean coverage": 206.69,
"target count": 232447,
"total target size": 57464133
},
"idxstats": {
"1": {
"mapped": 14553406,
"unmapped": 5166
},
"MT": {
"mapped": 10058,
"unmapped": 7
},
"*": {
"mapped": 0,
"unmapped": 212990
}
}
},
"father": {
"bamstats": {
Performance Tuning
This chapter describes how to optimize the performance of VarFish and its components. Mainly, this amounts to optimizing the hardware and software of the PostgreSQL server used by VarFish. The audience of this chapter are those who have installed VarFish on their own infrastructure.
Selecting Hardware
Hardware selection is the most critical point. The sizing of CPU and RAM is not so critical for VarFish. 16 CPU cores and 64 GB of RAM should be good to start with while more will not hurt and is not that expensive these days. The focus should be in using a server with fast disk I/O.
From the authors’ experience the ideal build consists of
multiple SSD disks,
a host bus adapter (as opposed to a RAID controller), and
a ZFS setup.
The SSDs offer overall good throughput and excellent random I/O performance in particular.
They should appear as block devices (e.g., sda
) to the operating system such that ZFS can use them properly.
You will find that there is some discussion on the best setup of ZFS.
We have found ten SSDs in a single raidz2 pool with enabled compression (default) on the file system to offer excellent performance.
Further, up to two disks can fail without loss of data.
Of course, you can also use a classic hardware RAID controller. We would advise against storing data on a SAN system and always recommend local disks (aka direct storage). While VarFish will run fine in a virtual machine, you have to take good care that disk access is fast. In particular, the QCOW driver of KVM is known to offer bad performance.
Configuration Tuning
The varfish-docker-compose repository contains a postgresql.conf
file with pre-tuned database settings.
When using Docker Compose for your VarFish site you will get this configuration automatically.
This should be good enough for most instances.
Below are some proposals for starting points on tuning configuration. Please consult the Postgres configuration documentation on all settings. You will also find many resources on Postgres performance tuning on the internet using your favourite search engine.
ZFS optimization.
In the case that you store your database files on a ZFS file system you can try setting the full_page_writes
setting to off
.
This will improve the write performance and according to various sources ZFS file systems are “torn page resilient” which prevents data loss.
full_page_writes = off # only do this on ZFS (!)
SSD optimization.
If you are using SSDs then you can adjust the value of random_page_cost
.
This value helps the Postgres query planner to estimate the cost of random vs. sequential data access.
For SSDs, you can set this to 1.1
:
random_page_cost = 1.1 # optimized for SSD
Placing Tables and Indices
In principle, you can use the tablespace feature of PostgreSQL to move certain tables and indices to different storage classes. The following tables and their indices are large and read-only after the initial import.
conservation_knowngeneaa
dbsnp_dbsnp
frequencies_*
extra_annos_*
Moving them to cheaper storage with higher latency than the rest of the data might be feasible if you are hard-pressed for saving storage. The authors have not tried this and would be very interested in experience reports.
Reference Times
For reference, here are some timings for importing the background database on different hardware.
Data | VarFish | Postgres | Storage | File System | Time |
---|---|---|---|---|---|
20210728-grch37 | v0.23.9+42 | 12.9 | 25xSSD RBD 16.2.7 | XFS | 13.5h |
20210728-grch38 | v0.23.9+42 | 12.9 | 25xSSD RBD 16.2.7 | XFS | 15h |
And some times for importing exome cases. Note that you can import multiple cases at the same time.
Data | VarFish | Postgres | Storage | File System | Time |
---|---|---|---|---|---|
WES singleton | v0.23.9+42 | 12.9 | 25xSSD RBD 16.2.7 | XFS | 2-3 min |
WES trio | v0.23.9+42 | 12.9 | 25xSSD RBD 16.2.7 | XFS | 5-10 min |
Upgrade VarFish Installation
This section contains upgrade instructions for upgrading your VarFish Server installation using VarFish Docker Compose.
Problem with Data Release 20210728 and GRCh37
The data release has a problem with the GRCh37 extra annotations.
If you can, use the updated site data release 20210728b.
If you already have an instance with 20210728
background data then you can use the following data file.
Download and extract the file and mount it as /data
inside the varfish-web
container.
You can then apply the patch to your database with the following command.
$ docker exec -it varfish-docker-compose_varfish-web_1 python /usr/src/app/manage.py \
import_tables --tables-path /data --truncate --force
You can find out more details, give feedback, and ask for help in this GitHub discussion.
v0.23.0 to v1.2.0
This includes all versions in between, v0.23.1, …, v1.2.0.
Summary
These are minor bug fix releases with small added features.
You should be able to upgrade by just updating your varfish-docker-compose
repository clone and calling docker-compose up -d
.
v0.23.1 to v0.23.2
Summary
This is a minor bug fix release that improved the deployment of the VarFish Demo and Kiosk sites.
You should be able to upgrade by just updating your varfish-docker-compose
repository clone and calling docker-compose up -d
.
v0.22.1 to v0.23.0
Summary
The Docker Compose installer now provides support for setting up CADD score annotation via cadd-rest-api.
The environment variable FIELD_ENCRYPTION_KEY should be set up properly by the user.
Two new celery queues are needed: maintenance and export.
To enable the new and optional feature for uploading variants to SPANR you have to set the environment variable VARFISH_ENABLE_SPANR_SUBMISSION to 1.
Detailed Instructions
Docker Compose: cadd-rest-api
Update your varfish-docker-compose installation with the changes from the GitHub repository without installing cadd-rest-api.
This will give you commented out lines for running one cadd-rest-api-server
and multiple cadd-rest-api-celeryd-worker-?
containers.
For enabling them, follow the instructions in Install Scoring with CADD.
Additional Celery Queues
After updating your varfish-docker-compose.yml file, ensure that you have the two additional containers varfish-celeryd-maintenance and varfish-celeryd-export.
These will run the background jobs for running maintenance tasks and export results.
They will be started when running docker-compose up
.
Environment Variable: FIELD_ENCRYPTION_KEY
Set the environment variable in the .env
file as documented in Miscellaneous Configuration.
The default value is also stored in the public repository and thus not very secure.
PAP Configuration
This section describes the setup of VarFish behind a PAP (packet filter, application gateway, packet filter) structure.
VarFish stores human genetic data which is by its very nature highly privacy sensitive. Administrators will thus want to set up VarFish in protected institution networks that are not accessible by the outside world. However, certain data exchange is generally desired, such as connecting two or more VarFish instances with the clinical beacon protocol.
PAP Structure
In such cases, the German agency for information security (BSI) recommends the P-A-P structure (link to 2021 edition of their recommendation). The following figure illustrates the structure

Overview of VarFish server behind P-A-P structure.
The structure is as follows:
A demilitarized zone (DMZ) network is set up to contain an application gateway. In the case of HTTP(S), this is a reverse proxy.
Incoming traffic from the internet into the gateway passes through a packet filter (in other words: a firewall).
Outgoing traffic out of the gateway passes through another packet filter and then reaches the destination server in the protected network.
The reasoning behind the structure is explained in the NET 3.2 document linked to above. In the following section, we will explain the technical implementation.
Firewall and Network Setup
The German specification NET.3.2.A16 is as follows:
NET.3.2.A16 Aufbau einer “P-A-P” Struktur (S) Eine “Paketfilter - Application-Level-Gateway - Paketfilter”-(P-A-P)-Struktur SOLLTE eingesetzt werden. Sie MUSS aus mehreren Komponenten mit jeweils dafür geeigneter Hard- und Software bestehen. Für die wichtigsten verwendeten Protokolle SOLLTEN Sicherheitsproxies auf Anwendungsschicht vorhanden sein. Für andere Dienste SOLLTEN zumindest generische Sicherheitsproxies für TCP und UDP genutzt werden. Die Sicherheitsproxies SOLLTEN zudem innerhalb einer abgesicherten Laufzeitumgebung des Betriebssystems ablaufen.
Which translates into English roughly as follows:
NET.3.2.A16 Creating a “P-A-P” Structure (S) A “packet filter - application level gateway - packet filter”-(P-A-P)-Structure SHOULD be used. It MUST consist of multiple components with appropriate hardware and software. For the most important protocols, security proxies SHOULD exist on the application layer. For other services, at least generic security proxies for TCP and UDP SHOULD be used. The security proxies SHOULD run inside a secured runtime environment of the operating system.
A possible implementation looks as follows:
The VarFish server runs in the internal network with IP 10.0.10.10.
Create a separate VLAN for the PAP structure and use a /30 (or lower) CIDR prefix. Only place proxy services there, ideally only one.
Example: use 1.2.3.0/30 with IP gateway 1.2.3.1 and application gateway server 1.2.3.2.
Configure the firewall to allow incoming traffic via HTTPS (TCP/443) to 1.2.3.2 only.
Allow outgoing traffic from 192.168.0.1 via the packet filter to 10.0.10.10 via HTTPS (TCP/443) only.
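To see why a /30 prefix fits this purpose, note that it leaves exactly two usable host addresses, one for the IP gateway and one for the application gateway server. The following sketch uses the ipaddress module of the Python standard library; the helper usable_hosts is only for illustration.

```python
import ipaddress


def usable_hosts(cidr):
    """List the usable host addresses of an IPv4 network given in CIDR notation."""
    # hosts() leaves out the network and the broadcast address.
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]
```

For the example network 1.2.3.0/30 this gives the two addresses 1.2.3.1 and 1.2.3.2 used above.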
The following section describes how to set up a Linux Docker container with the traefik reverse proxy. To the authors’ best understanding, this fulfills all of the required and optional rules for P-A-P by BSI.
Traefik Reverse Proxy Setup
Traefik is a versatile reverse proxy (and load balancer). It works well with Docker but configuring it can be a bit daunting for beginners. The following describes a straightforward and minimal setup.
Preparation:
Install a modern Linux distribution on the gateway server (1.2.3.2 from above).
On the server, install Docker following the official instructions.
Also install Docker Compose following the official instructions.
Set up public DNS (e.g., varfish-ext.example.com) to point to 1.2.3.2 and ensure that public resolvers can resolve it (e.g., Google DNS at 8.8.8.8).
We assume that your internal VarFish instance is available as varfish-int.example.com and that it is set up with a valid TLS certificate.
Collect the public IPs of the hosts on the internet that you want to be able to access your VarFish instance. These might be cluster IPs if the remote servers are behind NAT. In the example below we use the sub network 2.3.4.0/28 and IP 3.4.5.6 as valid sources.
First, create some directories with the following commands:
# mkdir -p /etc/reverse-proxy
# mkdir -p /etc/reverse-proxy/var/traefik
# mkdir -p /etc/reverse-proxy/etc/traefik
# mkdir -p /etc/reverse-proxy/etc/traefik/conf.d
Now, create the file /etc/reverse-proxy/docker-compose.yaml
as follows.
version: "2"
services:
  traefik:
    image: traefik:latest
    restart: always
    ports:
      - "443:443"
    networks:
      - web
    volumes:
      - ./var/traefik:/var/traefik:rw
      - ./etc/traefik:/etc/traefik:ro
    container_name: traefik
networks:
  web:
This will create a new container named traefik
with the latest version of Traefik.
The container goes into its own network and the port 443 is exposed.
The container can read /etc/reverse-proxy/etc/traefik as /etc/traefik via a bind mount and read and write /etc/reverse-proxy/var/traefik as /var/traefik.
The first will contain configuration, the latter will be used for storing letsencrypt certificate generation state.
Next, create /etc/reverse-proxy/etc/traefik/traefik.yaml and /etc/reverse-proxy/etc/traefik/conf.d/dynamic_config.yaml. The first file, traefik.yaml, is as follows; the second file, dynamic_config.yaml, is shown further below.
entryPoints:
  websecure:
    address: ":443"
providers:
  file:
    directory: /etc/traefik/conf.d
  docker:
    exposedByDefault: false
certificatesResolvers:
  le:
    acme:
      email: youremail@example.com
      storage: /var/traefik/acme.json
      tlsChallenge: true
This will set up traefik correctly using a letsencrypt certificate.
Note
Regarding use of “legacy” technical language.
Please note that the term ipwhitelist
below is part of the traefik configuration syntax.
We will update our documentation once updated terms are available.
# (1) TLS store
tls:
  stores:
    default: {}
http:
  # (2) set routing source for reverse proxy
  routers:
    varfish:
      middlewares:
        - varfish-add-prefix
        - varfish-ip-allowlist
      entryPoints:
        - websecure
      service: varfish
      rule: "Host(`varfish-ext.example.com`)"
      tls:
        certresolver: le
  # (3) routing destination for the reverse proxy
  services:
    varfish:
      loadBalancer:
        servers:
          - url: "https://varfish-int.example.com"
  middlewares:
    # (4) expose only beaconsite endpoint
    varfish-add-prefix:
      addprefix:
        prefix: "/beaconsite/endpoint"
    varfish-ip-allowlist:
      ipwhitelist:
        sourcerange: "2.3.4.0/28,3.4.5.6"
This will
set up the TLS store for the certificates,
set the routing source and
the routing destination for the reverse proxy,
automatically add the /beaconsite/endpoint prefix so only the beaconsite endpoint is exposed, and
restrict access to the given source sites.
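To reason about which remote hosts the source range admits, the matching can be reproduced with the Python standard library. The sketch below is not how traefik implements the check; the function is_allowed is a hypothetical helper that treats the value as a comma-separated list of networks and single addresses.

```python
import ipaddress


def is_allowed(source_ip, source_range):
    """Return True if source_ip falls into one entry of a comma-separated range."""
    address = ipaddress.ip_address(source_ip)
    for entry in source_range.split(","):
        # A bare address such as "3.4.5.6" is read as a /32 network.
        if address in ipaddress.ip_network(entry.strip(), strict=False):
            return True
    return False
```

With the example value, 2.3.4.0/28 covers the addresses 2.3.4.0 to 2.3.4.15, so 2.3.4.16 would be rejected.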
You can now start up the reverse proxy:
# cd /etc/reverse-proxy
# docker-compose up -d
You can inspect the logs by using docker logs --tail=100 --follow traefik
.
You can increase the log verbosity by placing the following block on top of traefik.yaml
.
log:
level: DEBUG
Data Backups
This section describes how to create data backups in VarFish. The assumption is that you are running VarFish in the recommended way via Docker Compose.
All valuable state is kept in the VarFish PostgreSQL database.
VarFish provides a convenient way to call the PostgreSQL tool pg_dump.
You can call it in the following way when VarFish is running under Docker Compose and the postgres container is running as well.
# docker exec -it varfish-docker-compose_varfish-web_1 \
    python /usr/src/app/manage.py pg_dump --mode=MODE
This will execute python /usr/src/app/manage.py pg_dump --mode=MODE in the docker container that is running the VarFish web server.
You can use one of the following dump modes.
full
This will perform a full data dump including all background data.
backup-large
This will exclude the huge background data tables, e.g., dbSNP and gnomAD.
backup-small
This will also exclude all imported variant data. The assumption is that you have a separate backup of the imported TSV files or can easily regenerate them from the VCF files that you still have.
Here is an example of how to create a compressed “small” dump file named varfish-${day_of_week}.sql.gz such that you get a rotating daily dump.
# docker exec -it varfish-docker-compose_varfish-web_1 \
    python /usr/src/app/manage.py pg_dump --mode=backup-small \
    | gzip -c \
    > varfish-$(date +%a).sql.gz
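If the dump is triggered from a script rather than typed into a shell, the command line and the rotating file name can be assembled as in the following sketch. The container name is taken from the example above and will differ between installations; the helper names are hypothetical, and the sketch only builds and prints the command instead of running it.

```python
import datetime
import shlex
from typing import List

# Container name as used in the example above; adjust to your installation.
CONTAINER = "varfish-docker-compose_varfish-web_1"
VALID_MODES = ("full", "backup-large", "backup-small")
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def dump_file_name(day: datetime.date) -> str:
    """Name of the rotating dump file; the same name recurs every seven days."""
    return f"varfish-{WEEKDAYS[day.weekday()]}.sql.gz"


def dump_command(mode: str) -> List[str]:
    """Build the docker exec call for the given dump mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown dump mode: {mode}")
    # No -it here: a script or cron job has no interactive terminal attached.
    return [
        "docker", "exec", CONTAINER,
        "python", "/usr/src/app/manage.py", "pg_dump", f"--mode={mode}",
    ]


today = datetime.date.today()
print(shlex.join(dump_command("backup-small")) + " | gzip -c > " + dump_file_name(today))
```

Because the file name only encodes the day of the week, each dump overwrites the one from seven days earlier, which gives one week of history.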
Introduction
This part describes strategies and procedures for the filtration of germline variant cases using the VarFish platform. It is meant as an addition to the standard VarFish manual in that it does not explain the individual VarFish functions in detail. Instead, it provides detailed instructions on how to filter germline cases and contains proposed values and thresholds for the filter settings.
Intended Audience
The intended reader has a good understanding of both human/medical genetics and high-throughput sequencing variant analysis (whole genome sequencing, or targeted/exome sequencing) and the resulting variant types. Further, the reader is interested in clinical genetics and the identification of pathogenic variants in Mendelian (monogenic) disorders. The reader comes from a clinical diagnostics or research setting (or both). Thus, the overall aim is not to fundamentally educate in the application of high-throughput sequencing in a clinical setting. Rather, this part provides instructions on how to use VarFish for this application.
Structure of the SOPs
The term SOP (standard operating procedure) is meant here as a best effort to create reproducible approaches for causative variant identification in a research setting. The SOPs contained herein can serve as a starting point for creating actual clinical SOPs with adjustments to the clinical and laboratory setting. Of course, they should also be refined for the reader’s actual laboratory setting when used in a research setting.
Generally, all SOPs have the sections Aims/Scope, Results, Steps, and Thresholds. They document
the considered scope (and what is out of scope),
the expected result (and thus provide some guideline of what to check against),
the individual steps (in such brevity that each SOP fits on 1-2, ideally 1, page), and
finally the thresholds used for the individual steps with some reasoning (the thresholds are the largest reason for the second page).
References and Disclaimer
We expect the reader to be familiar with the relevant literature, including the following guidelines:
Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., … & Voelkerding, K. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in medicine, 17(5), 405.
Ellard, S., Baple, E. L., Owens, M., Eccles, D. M., Abbs, S., Deans, Z. C., … & McMullan, D. J. (2017). ACGS best practice guidelines for variant classification 2017. ACGS Guidelines.
We close this introduction by emphasizing that VarFish is for research use only and by quoting the disclaimer of the ACMG guidelines, which applies in spirit to the VarFish manual and the following SOPs as well.
These ACMG Standards and Guidelines were developed primarily as an educational resource for clinical laboratory geneticists to help them provide quality clinical laboratory services. Adherence to these standards and guidelines is voluntary and does not necessarily assure a successful medical outcome. These Standards and Guidelines should not be considered inclusive of all proper procedures and tests or exclusive of other procedures and tests that are reasonably directed to obtaining the same results. In determining the propriety of any specific procedure or test, the clinical laboratory geneticist should apply his or her own professional judgment to the specific circumstances presented by the individual patient or specimen. Clinical laboratory geneticists are encouraged to document in the patient’s record the rationale for the use of a particular procedure or test, whether or not it is in conformance with these Standards and Guidelines. They also are advised to take notice of the date any particular guideline was adopted and to consider other relevant medical and scientific information that becomes available after that date. It also would be prudent to consider whether intellectual property interests may restrict the performance of certain tests and other procedures.
Supporting SOPs
This appendix contains SOPs that do not directly deal with variant filtration but are supportive in the workflow of causative variant identification in Mendelian diseases from high-throughput sequencing data.
SOP: Quality Control
Aims and Scope
This SOP explains how to use VarFish to gauge the quality of the exome at hand. For this, VarFish provides technical metrics such as exon depth of coverage and metrics that allow inference about the donor and thus allow detecting sample swaps.
Result
The result of this step is to indicate whether the sequencing results can be trusted in terms of consistency with pedigree and sex meta data and in terms of quality (depth of coverage and percentage of duplicated reads).
Steps
Consider the section “Alignment Quality Control”.
The table “Target Coverage” indicates the percentage of targets in each sample that have coverage of at least 10x, 20x, etc. The detection of het. variants below 10x is not reliable; 20x or more is recommended. However, also note that if a target has a coverage above 10x in all but one position, where it falls to 9x, the target counts as not having at least 10x coverage. The thresholds given below have worked well for the authors using recent technologies.
The table “Stats” gives some overall sequencing metrics. The value of “Duplicates” should be checked against the thresholds given below. The values of “Pairs”, “Average Insert Sizes”, and “SD Insert Size” are mostly of informative value. They are useful for detecting outliers in the context of multiple samples of the same study.
Consider the Figures in QC Plots
The plot “Relatedness vs. IBS0” is only informative for families. The relatedness coefficient (RC) should be around 1.0 for parent-child relations, 0.5 for siblings and decreases further with lower relatedness. The IBS0 value is around 0.0 for parent-child relations and increases with lower relatedness. The RC between monozygotic twins and technical replicates of same sample is expected to be around 2.0. Parent-child relations and sibling-sibling relations should be in the top left of the plot. Unless the parents are consanguineous, they should be in the lower-right corner of the plot. Unexpected RC counts indicate possible sample swaps or discordance between the samples’ genetics and the pedigree from the meta data.
The plot “Rate of het. calls on chrX” allows inference of the genetic sex of the sample. This ratio is expected to be well below 0.5 for male individuals and well above 0.5 (actually around 1.0-2.0) for female individuals. Unexpected ratios indicate a sample swap and the corresponding points will be indicated in red color.
In the case of unexpected relatedness, samples must be checked for sample swap. Unexpected inferred sex can be caused by incorrect meta data (e.g., for fetuses) and can also help resolve cases of unexpected relationships (e.g., child/parent swaps). Samples with technical quality metrics violating thresholds are candidates for being repeated.
Thresholds
Thresholds of course always depend on overall sequencing depth and technology. Based on our experience with recent technologies (Agilent SureSelect Human All Exome V6 on Illumina NextSeq 500 or HiSeq 4000 machines in 2018/2019) we propose the following thresholds. We recommend adjusting them to your setting depending on technology and previous experience.
Metric | Good / Green | Acceptable / Yellow | Below Standards / Red |
---|---|---|---|
10x coverage | ≥ 98% | ≥ 95% | < 95% |
20x coverage | ≥ 98% | ≥ 95% | < 95% |
Duplicates | ≤ 10% | ≤ 20% | > 20% |
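When many samples have to be checked, the traffic-light grading from the table can be applied programmatically. The following is a minimal sketch; the function names are made up for this example and the thresholds are simply the proposed values from above, which you may want to adjust.

```python
# Thresholds (in percent) copied from the table above; adjust them to your setting.
COVERAGE_GREEN, COVERAGE_YELLOW = 98.0, 95.0
DUPLICATES_GREEN, DUPLICATES_YELLOW = 10.0, 20.0


def grade_at_least(value: float, green: float, yellow: float) -> str:
    """Grade a metric where higher is better (e.g., percent of targets with >= 10x)."""
    if value >= green:
        return "green"
    if value >= yellow:
        return "yellow"
    return "red"


def grade_at_most(value: float, green: float, yellow: float) -> str:
    """Grade a metric where lower is better (e.g., percent of duplicated reads)."""
    if value <= green:
        return "green"
    if value <= yellow:
        return "yellow"
    return "red"


def grade_sample(cov_10x: float, cov_20x: float, duplicates: float) -> dict:
    """Return the grade of each metric for one sample (all inputs in percent)."""
    return {
        "10x coverage": grade_at_least(cov_10x, COVERAGE_GREEN, COVERAGE_YELLOW),
        "20x coverage": grade_at_least(cov_20x, COVERAGE_GREEN, COVERAGE_YELLOW),
        "Duplicates": grade_at_most(duplicates, DUPLICATES_GREEN, DUPLICATES_YELLOW),
    }


print(grade_sample(cov_10x=99.1, cov_20x=96.0, duplicates=21.5))
```

A sample with any red metric would be a candidate for being repeated, in line with the steps described above.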
SOP: Database & Literature Research
Aims and Scope
The aim of this section is to highlight the most important databases that are either integrated into VarFish or that VarFish links out to. The list is not comprehensive and we refer the reader to the ACMG guidelines.
Result
Steps
SOP: Pathogenicity Score Interpretation
Aims and Scope
The aim of this section is to provide guidelines for the interpretation of variant pathogenicity scores. Please refer to the original scoring methods’ publications for authoritative information.
Result
For each scored variant, an understanding of how likely it is that the variant has a pathogenic biomedical effect.
Steps
VarFish uses the PHRED-scaled CADD score. The CADD authors recommend a cutoff of 15 (“somewhere between 10 and 20, maybe 15”). As a frame of reference: a CADD score of 10 corresponds to the top 10% of CADD-scored SNVs, 15 to about the top 3.2%, 20 to the top 1%, and 30 to the top 0.1%.
MutationTaster provides a classification into one of four possible types: disease causing automatic - known to be disease causing, polymorphism automatic - known to be benign, disease causing - predicted to be deleterious, polymorphism - predicted to be benign. Additionally, a probability for the prediction’s correctness by a Bayes classifier is given. The variants annotated with automatic can generally be trusted. The reliability of the other predictions can be gauged by the Bayes classifier probability. The probabilities themselves are difficult to interpret; they are best interpreted in relation to each other.
UMD Predictor can only be used for scoring SNVs. The scores range from 0 to 100 and the authors give the following thresholds in their original publication: “(i) <50 polymorphism; (ii) 50–64 probable polymorphism; (iii) 65–74 probably pathogenic mutation; and (iv) >74 pathogenic mutation.”
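The frame of reference given for CADD follows directly from the PHRED scaling, where a scaled score s corresponds to the top 10^(-s/10) fraction of scored variants. A small sketch of this conversion (the function name is chosen for this example only):

```python
def cadd_phred_to_top_fraction(phred: float) -> float:
    """Fraction of CADD-scored variants ranked at or above the given PHRED-scaled score."""
    return 10.0 ** (-phred / 10.0)


for score in (10, 15, 20, 30):
    fraction = cadd_phred_to_top_fraction(score)
    print(f"CADD {score}: top {fraction:.2%}")
```

The conversion makes clear that each increase of 10 in the scaled score narrows the selection by a factor of ten.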
Thresholds
The following thresholds/grading of variants can be used for grading pathogenicity scores. Note that pathogenicity scores are extremely useful for sorting/ranking variants in the prioritization step. However, any cutoff and assignment of a pathogenicity will have false positives and false negatives.
score | benign | likely benign | likely pathogenic | pathogenic |
---|---|---|---|---|
CADD | <10 | ≥10, <15 | ≥15, <20 | ≥20 |
MutationTaster | polymorphism (automatic) | | | disease causing (automatic) |
UMD Predictor | <50 | ≥50, <65 | ≥65, <75 | ≥75 |
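For the two numeric scores, the grading can be expressed as a lookup over the cutoffs listed above. The sketch below uses the standard bisect module; the names GRADES, CUTOFFS, and grade are hypothetical, and MutationTaster is left out because its classification is categorical rather than numeric.

```python
import bisect

GRADES = ("benign", "likely benign", "likely pathogenic", "pathogenic")

# Lower bounds of the second, third, and fourth grade, copied from the table above.
CUTOFFS = {
    "CADD": (10, 15, 20),
    "UMD Predictor": (50, 65, 75),
}


def grade(method: str, score: float) -> str:
    """Map a numeric pathogenicity score to one of the four grades."""
    # bisect_right counts the cutoffs that are <= score, which is the grade index.
    return GRADES[bisect.bisect_right(CUTOFFS[method], score)]


print(grade("CADD", 17.3))
print(grade("UMD Predictor", 49))
```

As noted above, any such cutoff will produce false positives and false negatives, so the grades are better used for ranking than for a final classification.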
SOP: Phenotype Score Interpretation
Aims and Scope
The aim of this section is to provide guidelines for the interpretation of phenotype match scores. Please refer to the original scoring methods’ publications for authoritative information.
Result
For each scored set of genes, an understanding of the individual scores.
Steps
Generally, the phenotype scores are computed for each gene and compare the phenotypes given for the affected individual and the phenotypes linked to the gene. Thus, they depend on a good clinical annotation of the case and the curation of the gene-to-phenotype database. VarFish uses the Exomiser software for implementing the Phenix, Phive, and HiPhive scores.
The Phenix score is built from phenotypes of known human disease genes based on a concept called information content. Thus, only already known disease genes will obtain a non-zero score.
An important caveat is that Phenix will normalize the scores with respect to the genes from the filtered variant list. Thus, a change in filter parameters and subsequently in the list of genes in the query will change the score of a given gene.
The Phive score also incorporates mouse phenotypes by linking human and mouse physiology and homologous genes. Thus, it can be used to find new disease genes in human if the gene’s mouse homologue has a proper phenotype annotation.
TODO: also normalized relatively?
The HiPhive score extends the Phive idea with zebrafish and protein-protein interaction networks. It is the most powerful of the Phenix/Phive/HiPhive family in that new disease genes can be identified from mouse, fish, and also by a link via protein interactions. However, it also allows for relatively indirect links, for which it might be more complex to follow up and prove the etiology.
TODO: also normalized relatively?
Overall, the phenotype prioritization scores are extremely useful for ranking genes by matches to the clinical phenotype annotation of the individual. However, they cannot be interpreted meaningfully on their own and are only meaningful when compared for the same list of genes.
Variant Filtration SOPs
This chapter contains SOPs directly related to the filtration, prioritization, and interpretation of variants. The first SOPs cover the filtration of variants for singleton and trio exomes in various modes of inheritance. Different case structures (e.g., siblings or only having one parent present) can be handled with adjusted trio SOPs. This is followed by SOPs for assessing variants for pathogenicity and suitability as candidate variants.
SOP: Filtering Singletons for Autosomal Variants
Aims and Scope
The aim of this SOP is the filtration of singleton data for variants on the autosomal chromosomes. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.
Filtration for variants on the X chromosome is described in SOP: Filtering Singletons for X-chromosomal Variants. The evaluation of variants is described in SOP: Variant Assessment. The use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.
Result
The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):
de novo | dominant | hom. rec. | comp. rec. |
---|---|---|---|
0-80 | 100-500 | 0-30 | TODO |
Steps
Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).
Configure the Genotype according to the table below.
setting | de novo | dominant | hom. rec. | comp. rec. |
---|---|---|---|---|
presets | De Novo | Strict | Recessive | Recessive |
genotype | 0/1 | 0/1 | 1/1 | c/h index |
For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode.
Click Filter & Display.
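Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed (check the First N of M records label above the results table and, if necessary, adjust the Result row limit setting).
Handle unexpectedly high or low numbers of variants.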
In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2.
Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.
Thresholds
SOP: Filtering Singletons for X-chromosomal Variants
Aims and Scope
The aim of this SOP is the filtration of singleton data for variants on the X chromosome. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.
Filtration for variants on the autosomes is described in SOP: Filtering Singletons for Autosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.
Result
The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):
X de novo | X dominant | X hom. rec. | X comp. rec. |
---|---|---|---|
TODO | TODO | TODO | TODO |
Steps
Note
The following needs work by a geneticist, also in terms of practicability.
Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).
Configure the Genotype according to the table below.
setting | X de novo | X dominant | X hom. rec. | X comp. rec. |
---|---|---|---|---|
presets | De Novo | Strict | Recessive | Recessive |
genotype (M) | 1/1 | 1/1 | N/A | N/A |
genotype (F) | 0/1 | 0/1 | 1/1 | c/h index |
The genotype of the index is chosen based on its sex (male M, female F).
For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the daughter enables the comp. het. mode.
Enter chrX into the appropriate filter field.
Click Filter & Display.
Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed (check the First N of M records label above the results table and, if necessary, adjust the Result row limit setting).
Handle unexpectedly high or low numbers of variants.
In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2.
Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.
Thresholds
SOP: Filtering Trios for Autosomal Variants
Aims and Scope
The aim of this SOP is the filtration of trio data for variants on the autosomal chromosomes. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.
Filtration for variants on the X chromosome is described in SOP: Filtering Trios for X-chromosomal variants. The evaluation of variants is described in SOP: Variant Assessment. The use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.
Result
The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):
de novo | dominant | hom. rec. | comp. rec. |
---|---|---|---|
0-3 | 50-150 | 2-75 | 2-20 |
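The comparison of an observed variant count against these typical ranges can be written down as a simple lookup. The sketch below copies the trio ranges from the table above; the names EXPECTED_TRIO_COUNTS and assess_variant_count are made up for this example, and the ranges will need adjusting for other enrichment kits.

```python
# Typical variant counts for trio WES data, copied from the table above.
EXPECTED_TRIO_COUNTS = {
    "de novo": (0, 3),
    "dominant": (50, 150),
    "hom. rec.": (2, 75),
    "comp. rec.": (2, 20),
}


def assess_variant_count(mode: str, count: int) -> str:
    """Compare an observed variant count with the typical range for the mode."""
    low, high = EXPECTED_TRIO_COUNTS[mode]
    if count < low:
        return "too few"
    if count > high:
        return "too many"
    return "as expected"


print(assess_variant_count("de novo", 12))
print(assess_variant_count("hom. rec.", 40))
```

A result of "too many" for de novo together with "too few" in the other modes would be a reason to revisit the relatedness checks from SOP: Quality Control.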
Steps
Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).
Configure the Genotype according to the table below.
setting | de novo | dominant | hom. rec. | comp. rec. |
---|---|---|---|---|
presets | Strict | Strict | Recessive | Recessive |
genotype index | 0/1 | 0/1 | 1/1 | c/h index |
genotype parents | 0/0, 0/0 | 0/0, 0/1 | 0/1, 0/1 | – |
For dominant mode of inheritance, set the genotypes of the affected parent to 0/1 and the unaffected parent to 0/0.
For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode and the parents’ genotypes do not have to be selected.
Click Filter & Display.
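Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed (check the First N of M records label above the results table and, if necessary, adjust the Result row limit setting).
Handle unexpectedly high or low numbers of variants.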
Too many de novo and too few variants in the other modes of inheritance can be an indicator of issues with the sample relatedness (cf. SOP: Quality Control).
In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2. In the case of too few de novo variants, try setting the max AD setting of the parents to 2.
Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.
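The genotype settings for the trio can also be read as patterns that a variant's genotypes must match. The following sketch encodes the three single-variant modes from the settings table; it is an illustration of the logic only, not VarFish's implementation, and all names are hypothetical. Compound recessive inheritance is left out because it requires pairs of variants per gene.

```python
# (index genotype, sorted parent genotypes) per mode, following the settings table above.
TRIO_PATTERNS = {
    "de novo": ("0/1", ("0/0", "0/0")),
    "dominant": ("0/1", ("0/0", "0/1")),
    "hom. rec.": ("1/1", ("0/1", "0/1")),
}


def matches_pattern(mode: str, index: str, father: str, mother: str) -> bool:
    """Check whether the genotypes of a trio fit the given mode of inheritance."""
    expected_index, expected_parents = TRIO_PATTERNS[mode]
    # Sorting makes the check independent of which parent carries the variant;
    # for the dominant mode, the carrier should be the affected parent.
    return index == expected_index and tuple(sorted((father, mother))) == expected_parents


print(matches_pattern("de novo", index="0/1", father="0/0", mother="0/0"))
print(matches_pattern("de novo", index="0/1", father="0/1", mother="0/0"))
print(matches_pattern("dominant", index="0/1", father="0/1", mother="0/0"))
```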
Thresholds
TODO
SOP: Filtering Trios for X-chromosomal variants
Aims and Scope
The aim of this SOP is the filtration of trio data for variants on the X chromosome. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for X-linked de novo, dominant, and recessive variants.
Filtration for variants on the autosomes is described in SOP: Filtering Trios for Autosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.
Result
The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):
X de novo | X dominant | X hom. rec. | X comp. rec. |
---|---|---|---|
TODO | TODO | TODO | TODO |
Steps
Note
The following needs work by a geneticist, also in terms of practicability.
Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).
Configure the Genotype according to the table below.
setting | X de novo | X dominant | X hom. rec. | X comp. rec. |
---|---|---|---|---|
presets | Strict | Strict | Recessive | Recessive |
genotype index (M) | 1/1 | 1/1 | N/A | c/h index |
genotype index (F) | 0/1 | 0/1 | 1/1 | c/h index |
genotype mother | 0/0 | 0/1 or 0/0 | 0/1 | – |
genotype father | 0/0 | 1/1 or 0/0 | 1/1 | – |
The genotype of the index is chosen based on its sex (male M, female F).
For dominant mode of inheritance, set the genotypes of the affected parent to variant (0/1 or 1/1 according to the table) and of the unaffected to 0/0.
For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode and the parents’ genotypes do not have to be selected.
Enter chrX into the appropriate filter field.
Click Filter & Display.
Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed (check the First N of M records label above the results table and, if necessary, adjust the Result row limit setting).
Handle unexpectedly high or low numbers of variants.
Too many de novo and too few variants in the other modes of inheritance can be an indicator of issues with the sample relatedness (cf. SOP: Quality Control).
In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2. In the case of too few de novo variants, try setting the max AD setting of the parents to 2.
Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.
Thresholds
SOP: Prioritization with Phenotype and Pathogenicity Scores
Aims and Scope
The aim of this SOP is to use scores for prioritizing a list of candidate variants. Phenotype scores can be used for ranking variants by their affected gene’s match to the patient’s phenotypes. Pathogenicity scores can be used for estimating the impact of a variant.
The filtration of variants is described in the SOPs above. For guidelines on interpreting the scores see SOP: Phenotype Score Interpretation and SOP: Pathogenicity Score Interpretation.
Result
The result is a list of variants annotated with phenotype and/or pathogenicity scores that can be used for sorting and ranking variants. Further, by putting thresholds on the largest rank to consider or thresholds on the scores, the list of variants to be assessed can be shortened.
Steps
Open the tab with the prioritization settings.
For using phenotype-based prioritization
tick the Enable phenotype-based prioritization box,
select an appropriate prioritization algorithm, and
enter (or paste) the HPO terms into the HPO Terms field.
For using variant pathogenicity prioritization
tick the Enable variant pathogenicity-based prioritization box, and
select the scoring method [2] to use.
Click Filter & Display to trigger the filtration.
Also check that all query result records are displayed (check the First N of M records label above the results table and, if necessary, adjust the Result row limit setting). The limit is applied to the variants sent for prioritization: if the limit N of records to display is smaller than the query result size, you will not see the N top-ranking records but a ranking of an arbitrary selection of N records.
Click on the score and rank heading below the phenotype, pathogenicity, and/or pheno. & patho. columns to sort the table by phenotype, pathogenicity, or a combination of both scores.
Consider the top variants by one of the sorting methods from above, stop based on the rank or score:
Rank: Consider the top N (e.g., N = 20) variants only.
If you are in a time-limited setting, you should pick the number N in advance of your study to get reproducible results in terms of diagnostic yield.
Score: (Note that the distribution of the different scores varies significantly).
Consider the top-scoring variants until the score drops by a factor of 2 from one variant to the next.
Consider the top-scoring variants until the score drops below a threshold T.
See SOP: Phenotype Score Interpretation and SOP: Pathogenicity Score Interpretation for more information on score interpretation.
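The three stopping rules from the steps above (top N, drop by a factor of 2, threshold T) can be sketched as follows for a list of scores that is already sorted in descending order. The function names are made up for this example, and the drop rule as written assumes positive scores.

```python
from typing import List


def top_n(scores: List[float], n: int) -> List[float]:
    """Rank rule: keep the N top-ranking entries only."""
    return scores[:n]


def until_drop(scores: List[float], factor: float = 2.0) -> List[float]:
    """Score rule: keep entries until the score drops by `factor` from one to the next."""
    kept = scores[:1]
    for previous, current in zip(scores, scores[1:]):
        if current * factor <= previous:
            break
        kept.append(current)
    return kept


def above_threshold(scores: List[float], threshold: float) -> List[float]:
    """Score rule: keep entries until the score drops below the threshold T."""
    kept = []
    for score in scores:
        if score < threshold:
            break
        kept.append(score)
    return kept


scores = [0.9, 0.8, 0.75, 0.3, 0.29]  # already sorted by descending score
print(top_n(scores, 2))
print(until_drop(scores))
print(above_threshold(scores, 0.5))
```

Whichever rule is used, fixing its parameter (N, the factor, or T) before looking at the data keeps the procedure reproducible.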
[2] For using the UMD Predictor score you have to obtain an API token from https://umd-predictor.eu/ and enter it in VarFish in your user profile. You can reach the user profile by clicking on the person icon on the top left and then selecting the user profile entry. Note that UMD Predictor can only score SNVs.
Thresholds
SOP: Variant Assessment
Aims and Scope
This SOP describes how to assess variants with the information integrated into VarFish. Clicking the little “>” on the left of the result table folds out the details of the given variant.
Result
The result is a better understanding of the variant and gene.
Steps
Note
The following needs refinement. Actually, it does not read like an SOP but rather like an extended manual.
Consider the Gene information box.
The Name, Gene Family, and NCBI Summary give a first impression of the gene, its molecular function, and its implication in diseases. Genes with a missing or very short NCBI Summary are often not well characterized and such genes are hard to link to diseases.
ClinVar for Gene gives the number of pathogenic and likely pathogenic variants in the gene and shows how often the gene has been implicated in disease in ClinVar.
HPO Terms displays all HPO terms associated with a gene and, if present, the annotated modes of inheritance of diseases linked to this gene.
OMIM Phenotypes gives the OMIM diseases linked to the gene.
Gene RIFs displays short “reference into function” notes on PubMed articles that report on the gene.
Constraints shows gene constraint scores from ExAC and gnomAD for this gene.
The remaining fields provide link-outs into NCBI Entrez, ENSEMBL, and OMIM.
The ClinVar for Variant table shows ClinVar annotations for the given variant, if any.
The Frequency Details table provides detailed information about the frequency of the variant in different populations given in the different population databases.
The Transcript Information table shows the impact of the variant on all transcripts of the gene.
The Genotype and Call Infos provides detailed information about the variant call.
The UCSC 100 Vertebrate Conservation box shows the alignment of the corresponding amino acid in the UCSC 100 vertebrate alignment (the evolutionary distance to human decreases from left to right), if available. This information can be used for getting a feeling for how conserved the location is in the gene.
SOP: Using Variant Link-Outs
Aims and Scope
This SOP describes how to use the most relevant link-out features of VarFish for estimating the pathogenicity and relevance of a given variant for a case’s disorder. Note that this is a non-comprehensive list of pragmatic points that fit on two pages of paper. For a comprehensive treatment, we refer the reader to the ACMG and ACGS guidelines.
Result
The result is a better understanding of the variant’s pathogenic potential.
Steps
Use the IGV button on the right of the variant result table row. If IGV is running and configured properly then IGV will jump to the given position such that you can inspect the variant in the raw data.
Use the MT button on the right of the variant result table row. This will run MutationTaster (MT) on your variant. The result page displays the analysis summary for each affected transcript and then details for each affected transcript.
The predictions disease causing (automatic) and polymorphism (automatic) are most important, followed by the probability given by the MT classifier.
The splice site analysis indicates whether splicing is predicted to be affected.
The conservation section provides information about the conservation of the affected position.
The following link-outs are shown when clicking on the little downward arrow next to IGV.
Use Locus @UCSC to consider the locus in UCSC genome browser.
Use Human Splicing Finder (HSF) for estimating the effect of a variant on the splicing of a gene’s transcripts. The link-out will open a new tab showing the results of the HSF (which will also give predictions for deep intronic variants).
Use Query varSEAK Splicing for also estimating the effect of a variant on splicing of a gene’s transcripts. varSEAK does not show results of deep intronic variants.
Use Query PolyPhen 2 for obtaining PolyPhen 2 scores of missense variants.
Use Query UMD Predictor [2] for querying the UMD Predictor (note that this only works for SNVs).
Use Query Varsome for looking up the variant in Varsome.
SOP: Using Gene Link-Outs
Aims and Scope
This SOP describes how to use the most relevant link-out features of VarFish for estimating the relevance of a given gene for a case’s disorder. Note that this is a highly non-comprehensive list that only highlights selected aspects of some databases and that fits on two pages of paper.
Result
The result is a better understanding of whether a defect in the gene can be responsible for the case’s disorder.
Steps
Note
The following needs to be done.
REST API Overview
VarFish provides a growing set of REST APIs. You can find a Python library for accessing the API and a command line interface in varfish-cli.
Note
This documentation section is under development.
Using the API
Usage of the REST API is detailed in this section. Basic knowledge of HTTP APIs is assumed.
Authentication
The API supports authentication through Knox authentication tokens as well as logging in using your SODAR username and password. Tokens are the recommended method for security purposes.
For token access, first retrieve your token using the API Tokens site app on the VarFish web UI. Note that you can only see the token once when creating it.
Add the token in the Authorization header of your HTTP request as follows:
Authorization: token 90c2483172515bc8f6d52fd608e5031db3fcdc06d5a83b24bec1688f39b72bcd
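As a minimal sketch of how such a header can be attached from Python, the following uses only the standard library and builds a request without sending it. The URL is a hypothetical placeholder and not a documented VarFish endpoint, and the token is a dummy value.

```python
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Create a request object that carries the API token in the Authorization header."""
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


# Hypothetical URL and dummy token, for illustration only.
req = build_request("https://varfish.example.com/some/api/endpoint/", "0123456789abcdef")
print(req.get_header("Authorization"))
```

Passing the request object to urllib.request.urlopen would send it. Since the token grants the same permissions as your user, it is best read from an environment variable or a protected file rather than hard-coded.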
Versioning
The VarFish REST API uses accept header versioning. While specifying the desired API version in your HTTP requests is optional, it is strongly recommended. This ensures you will get the appropriate return data and avoid running into unexpected incompatibility issues.
To enable versioning, add the Accept header to your request with the following media type and version syntax.
Accept: application/vnd.bihealth.varfish+json; version=x.y.z
Replace the version number with your expected version.
Specific sections of the SODAR API may require their own accept header. See the exact header requirement in the respective documentation on each section of the API.
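The Accept header value consists of a media type followed by a version parameter. The following sketch assembles it from these two parts; the two media types are the ones listed in the versioning sections of the individual APIs further below, and the helper name accept_header is made up for this example.

```python
# Media types as listed in the versioning sections of the individual APIs.
VARFISH_MEDIA_TYPE = "application/vnd.bihealth.varfish+json"
SODAR_CORE_MEDIA_TYPE = "application/vnd.bihealth.sodar-core+json"


def accept_header(media_type: str, version: str) -> str:
    """Build the value of the Accept header for accept header versioning."""
    return f"{media_type}; version={version}"


headers = {"Accept": accept_header(VARFISH_MEDIA_TYPE, "0.23.9")}
print(headers["Accept"])
```

Keeping the version in one place makes it easier to update a client when the server is upgraded.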
Model Access and Permissions
Objects in SODAR API views are accessed through their sodar_uuid field.
In the REST API documentation, “UUID” refers to the sodar_uuid field of each model unless otherwise noted.
For permissions the API uses the same rules which are in effect in the SODAR GUI. That means you need to have appropriate project access for each operation.
Return Data
The return data for each request will be a JSON document unless otherwise specified.
If return data is not specified in the documentation of an API view, it will return the appropriate HTTP status code along with an optional detail JSON field upon a successfully processed request.
Project Management API
The REST API for project access and management operations is described in this section.
API Views
The project management API is provided by the SODAR Core package. The documentation for the REST API views can be found in the SODAR Core Documentation.
Versioning
For accept header versioning, the following media type and version are expected in the current VarFish version:
Accept: application/vnd.bihealth.sodar-core+json; version=0.10.7
Case Import API
The REST API for case import functionality is documented in this section.
API Views
Note
This is currently not implemented.
Versioning
For accept header versioning, the following media type and version are expected in the current VarFish version:
Accept: application/vnd.bihealth.varfish+json; version=0.23.9
Case & Query API
The REST API for case access and queries is described in this section. Cases are not managed directly but through the Case Import API.
Versioning
For accept header versioning, the following media type and version are expected in the current VarFish version:
Accept: application/vnd.bihealth.varfish+json; version=0.23.9
Return Data
The return data for each request will be a JSON document unless otherwise specified.
If return data is not specified in the documentation of an API view, it will return the appropriate HTTP status code along with an optional detail JSON field upon a successfully processed request.
For creation views, the sodar_uuid of the created object is returned along with other object fields.
Query Settings
The query follows a JSON Schema.
API Views
- class variants.views_api.CaseListApiView(**kwargs)
  List all cases in the current project.
  URL: /variants/api/case/{project.sodar_uuid}/
  Methods: GET
  Returns: List of case details (see CaseRetrieveApiView)
- class variants.views_api.CaseRetrieveApiView(**kwargs)
  Retrieve detail of the specified case.
  URL: /variants/api/case/{project.sodar_uuid}/{case.sodar_uuid}/
  Methods: GET
  Returns:
  - date_created - creation timestamp (ISO 8601 str)
  - date_modified - modification timestamp (ISO 8601 str)
  - index - index sample name (str)
  - name - case name (str)
  - notes - any notes related to case (str or null)
  - num_small_vars - number of small variants (int or null)
  - num_svs - number of structural variants (int or null)
  - pedigree - list of dict representing pedigree entries, dict have keys
    - sex - PLINK-PED encoded biological sample sex (int, 0-unknown, 1-male, 2-female)
    - father - father sample name (str)
    - mother - mother sample name (str)
    - name - current sample’s name (str)
    - affected - PLINK-PED encoded affected state (int, 0-unknown, 1-unaffected, 2-affected)
    - has_gt_entries - whether sample has genotype entries (boolean)
  - project - UUID of owning project (str)
  - release - genome build (str, one of ["GRCh37", "GRCh38"])
  - sodar_uuid - case UUID (str)
  - status - status of case (str, one of "initial", "active", "closed-unsolved", "closed-uncertain", "closed-solved")
  - tags - list of str tags
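To illustrate the PLINK-PED encodings listed above, the following sketch turns the pedigree of a case detail document into readable labels. The function name `describe_pedigree` and the example case are made up for illustration; only the keys `name`, `sex`, and `affected` and their integer codes are taken from the field list above.

```python
# Integer codes as documented for the pedigree entries.
SEX_LABELS = {0: "unknown", 1: "male", 2: "female"}
AFFECTED_LABELS = {0: "unknown", 1: "unaffected", 2: "affected"}


def describe_pedigree(case: dict) -> list:
    """Return one human-readable line per pedigree member."""
    lines = []
    for member in case["pedigree"]:
        sex = SEX_LABELS.get(member["sex"], "unknown")
        affected = AFFECTED_LABELS.get(member["affected"], "unknown")
        lines.append(f"{member['name']}: {sex}, {affected}")
    return lines


# Made-up example data shaped like the documented return value.
example_case = {
    "name": "example-case",
    "pedigree": [
        {"name": "child", "sex": 1, "affected": 2, "has_gt_entries": True},
        {"name": "father", "sex": 1, "affected": 1, "has_gt_entries": True},
    ],
}
summary = describe_pedigree(example_case)
```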
- class variants.views_api.SmallVariantQueryListApiView(**kwargs)
  List small variant queries for the given case.
  URL: /variants/api/query-case/list/{case.sodar_uuid}
  Methods: GET
  Parameters:
  - page - specify page to return (default/first is 1)
  - page_size - number of elements per page (default is 10, maximum is 100)
  Returns:
  - count - number of total elements (int)
  - next - URL to next page (str or null)
  - previous - URL to previous page (str or null)
  - results - list of case small variant query details (see SmallVariantQuery)
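Because the listing is paginated, a client usually follows the `next` field until it is null. The sketch below shows this loop with the HTTP call abstracted behind a `fetch_page` callable; that callable, the helper name `iter_results`, and the fake pages are assumptions for illustration, so the example runs without a server.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_results(first_url: str, fetch_page: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every element of a paginated listing by following `next`."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch_page(url)
        yield from page["results"]
        url = page["next"]  # JSON null decodes to None, which ends the loop


# Stand-in for real HTTP calls: two fake pages keyed by hypothetical URLs.
FAKE_PAGES = {
    "page-1": {"count": 3, "next": "page-2", "previous": None,
               "results": [{"n": 1}, {"n": 2}]},
    "page-2": {"count": 3, "next": None, "previous": "page-1",
               "results": [{"n": 3}]},
}

all_queries = list(iter_results("page-1", FAKE_PAGES.__getitem__))
```

In real use, `fetch_page` would perform an authenticated GET request and decode the JSON body.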
- class variants.views_api.SmallVariantQueryCreateApiView(**kwargs)
  Create new small variant query for the given case.
  URL: /variants/api/query-case/create/{case.sodar_uuid}
  Methods: POST
  Parameters:
  - form_id: query settings form (str, use "variants.small_variant_filter_form")
  - form_version: query settings version (int, only valid: 1)
  - query_settings: the query settings (dict, cf. Case Query Schema V1)
  - name: optional string (str, defaults to None)
  - public: whether or not this query (and its settings) is public (bool, defaults to False)
  Returns: JSON serialization of case small variant query details (see SmallVariantQuery)
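A request body for this view could be assembled as in the sketch below. It assumes that the parameters are sent as a JSON document, which is not stated explicitly above; the helper name `build_create_payload` is hypothetical, and the empty `query_settings` is only a placeholder for settings that follow Case Query Schema V1.

```python
import json
from typing import Optional


def build_create_payload(query_settings: dict,
                         name: Optional[str] = None,
                         public: bool = False) -> str:
    """Serialize the documented create parameters as a JSON string."""
    payload = {
        "form_id": "variants.small_variant_filter_form",
        "form_version": 1,  # the only valid version per the parameter list
        "query_settings": query_settings,
        "name": name,
        "public": public,
    }
    return json.dumps(payload)


# Placeholder settings; real settings must follow Case Query Schema V1.
body = build_create_payload(query_settings={}, name="my first query")
```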
- class variants.views_api.SmallVariantQueryRetrieveApiView(**kwargs)
  Retrieve small variant query details for the given query.
  URL: /variants/api/query-case/retrieve/{query.sodar_uuid}
  Methods: GET
  Parameters: None
  Returns: JSON serialization of case small variant query details (see SmallVariantQuery)
- class variants.views_api.SmallVariantQueryStatusApiView(**kwargs)
  Returns the status of the small variant query.
  URL: /variants/api/query-case/status/{query.sodar_uuid}
  Methods: GET
  Parameters: None
  Returns: dict with one key status (str)
- class variants.views_api.SmallVariantQueryUpdateApiView(**kwargs)
  Update small variant query for the given query.
  URL: /variants/api/query-case/update/{query.sodar_uuid}
  Methods: PUT, PATCH
  Parameters:
  - name: new name attribute of the query
  - public: whether or not to make this query public
  Returns: JSON serialization of updated case small variant query details (see SmallVariantQuery)
- class variants.views_api.SmallVariantQueryFetchResultsApiView(*args, **kwargs)
  Fetch results for small variant query.
  Will return an HTTP 400 if the results are not ready yet.
  URL: /variants/api/query-case/results/{query.sodar_uuid}
  Methods: GET
  Parameters:
  - page - specify page to return (default/first is 1)
  - page_size - number of elements per page (default is 10, maximum is 100)
  Returns:
  - count - number of total elements (int)
  - next - URL to next page (str or null)
  - previous - URL to previous page (str or null)
  - results - list of results (dict)
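Since the view answers with HTTP 400 while results are not ready, a client may retry a few times before giving up. The following sketch assumes a `fetch` callable that returns the HTTP status code and the decoded JSON body; the callable, the helper name `wait_for_results`, and the retry limits are hypothetical. Note that a 400 may also signal a malformed request, so a bounded number of attempts is advisable.

```python
import time
from typing import Callable, Dict, Tuple


def wait_for_results(fetch: Callable[[], Tuple[int, Dict]],
                     attempts: int = 10,
                     delay_seconds: float = 5.0) -> Dict:
    """Call `fetch` until it returns HTTP 200, retrying on HTTP 400."""
    for attempt in range(attempts):
        status_code, body = fetch()
        if status_code == 200:
            return body
        if status_code != 400:
            raise RuntimeError(f"unexpected HTTP status {status_code}")
        if attempt + 1 < attempts:
            time.sleep(delay_seconds)  # results not ready yet, wait and retry
    raise TimeoutError("results not ready after all attempts")


# Stand-in for real HTTP calls: not ready twice, then an empty first page.
_responses = iter([
    (400, {}),
    (400, {}),
    (200, {"count": 0, "next": None, "previous": None, "results": []}),
])
first_page = wait_for_results(lambda: next(_responses), delay_seconds=0.0)
```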
- class variants.views_api.SmallVariantQuerySettingsShortcutApiView(**kwargs)
  Generate query settings for a given case by certain shortcuts.
  URL: /variants/api/query-case/settings-shortcut/{case.sodar_uuid}
  Methods: GET
  Parameters:
  - database - the database to query, one of "refseq" (default) and "ensembl"
  - quick_preset - overall preset selection using the presets below, valid values are
    - defaults - applies presets that are recommended for starting out without a specific hypothesis
    - de_novo - applies presets that are recommended for starting out when the hypothesis is dominant inheritance with de novo variants
    - dominant - applies presets that are recommended for starting out when the hypothesis is dominant inheritance (but not with de novo variants)
    - homozygous_recessive - applies presets that are recommended for starting out when the hypothesis is recessive with homozygous variants
    - compound_heterozygous - applies presets that are recommended for starting out when the hypothesis is recessive with compound heterozygous variants
    - recessive - applies presets that are recommended for starting out when the hypothesis is recessive mode of inheritance
    - x_recessive - applies presets that are recommended for starting out when the hypothesis is X recessive mode of inheritance
    - clinvar_pathogenic - applies presets that are recommended for screening for known pathogenic variants present in Clinvar
    - mitochondrial - applies presets recommended for starting out to filter for mitochondrial mode of inheritance
    - whole_exomes - applies presets that return all variants of the case, regardless of frequency, quality etc.
  - inheritance - preset selection for mode of inheritance, valid values are
    - any - no particular constraint on inheritance (default)
    - dominant - allow variants compatible with dominant mode of inheritance (includes de novo variants)
    - homozygous_recessive - allow variants compatible with homozygous recessive mode of inheritance
    - compound_heterozygous - allow variants compatible with compound heterozygous recessive mode of inheritance
    - recessive - allow variants compatible with recessive mode of inheritance of a disease/trait (includes both homozygous and compound heterozygous recessive)
    - x_recessive - allow variants compatible with X recessive mode of inheritance of a disease/trait
    - mitochondrial - mitochondrial inheritance (also applicable for “clinvar pathogenic”)
    - custom - indicates custom settings such that none of the above inheritance settings applies
  - frequency - preset selection for frequencies, valid values are
    - dominant_super_strict - apply thresholds considered “very strict” in a dominant disease context
    - dominant_strict - apply thresholds considered “strict” in a dominant disease context (default)
    - dominant_relaxed - apply thresholds considered “relaxed” in a dominant disease context
    - recessive_strict - apply thresholds considered “strict” in a recessive disease context
    - recessive_relaxed - apply thresholds considered “relaxed” in a recessive disease context
    - custom - indicates custom settings such that none of the above frequency settings applies
  - impact - preset selection for molecular impact values, valid values are
    - null_variant - allow variants that are predicted to be null variants
    - aa_change_splicing - allow variants that are predicted to change the amino acid of the gene’s protein and also splicing variants
    - all_coding_deep_intronic - allow all coding variants and also deeply intronic ones
    - whole_transcript - allow variants from the whole transcript (exonic/intronic)
    - any_impact - allow any predicted molecular impact
    - custom - indicates custom settings such that none of the above impact settings applies
  - quality - preset selection for variant call quality values, valid values are
    - super_strict - very strict quality settings
    - strict - strict quality settings, used as the default
    - relaxed - relaxed quality settings
    - any - ignore quality, all variants pass filter
    - custom - indicates custom settings such that none of the above quality settings applies
  - chromosomes - preset selection for selecting chromosomes/regions/genes allow/block lists, valid values are
    - whole_genome - the default settings selecting the whole genome
    - autosomes - select the variants lying on the autosomes only
    - x_chromosome - select variants on the X chromosome only
    - y_chromosome - select variants on the Y chromosome only
    - mt_chromosome - select variants on the mitochondrial chromosome only
    - custom - indicates custom settings such that none of the above chromosomes presets applies
  - flags_etc - preset selection for “flags etc.” section, valid values are
    - defaults - the defaults also used in the user interface
    - clinvar_only - select variants present in Clinvar only
    - user_flagged - select user-flagged variants only
    - custom - indicates custom settings such that none of the above flags etc. presets applies
  Returns:
  - presets - a dict with the following keys; this mirrors back the quick presets and further presets selected in the parameters
    - quick_presets - one of the quick_presets preset values from above
    - inheritance - one of the inheritance preset values from above
    - frequency - one of the frequency preset values from above
    - impact - one of the impact preset values from above
    - quality - one of the quality preset values from above
    - chromosomes - one of the chromosomes preset values from above
    - flags_etc - one of the flags_etc preset values from above
  - query_settings - a dict with the query settings ready to be used for the given case; this will follow Case Query Schema V1.
JSON Schema
This section contains the JSON schemas used in the VarFish Server API.
Case Query Schema V1
varfish-server case query settings
https://raw.githubusercontent.com/bihealth/varfish-server/main/variants/schemas/case-query-v1.json
Single case query settings for varfish-server
type: object
properties:
- The transcript database to use - You can select between either using refseq or ensembl transcripts, defaults to refseq (type: string; examples: refseq, ensembl; default: refseq)
- The effects schema - An explanation about the purpose of this instance. (type: array; examples: missense_variant, stop_gained, stop_lost; default: empty; additionalItems: False; uniqueItems: True)
  items: type string, enum: 3_prime_UTR_exon_variant, 3_prime_UTR_intron_variant, 5_prime_UTR_exon_variant, 5_prime_UTR_intron_variant, coding_transcript_intron_variant, complex_substitution, direct_tandem_duplication, disruptive_inframe_deletion, disruptive_inframe_insertion, downstream_gene_variant, exon_loss_variant, feature_truncation, frameshift_elongation, frameshift_truncation, frameshift_variant, inframe_deletion, inframe_insertion, intergenic_variant, internal_feature_elongation, missense_variant, mnv, non_coding_transcript_exon_variant, non_coding_transcript_intron_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, start_lost, stop_gained, stop_lost, stop_retained_variant, structural_variant, synonymous_variant, transcript_ablation, upstream_gene_variant
- Whether to enable ExAC frequency filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal allele frequency in ExAC - When … (type: null or number; examples: 0.05; maximum: 0.05; minimum: 0)
- Maximal heterozygous state count in ExAC - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal homozygous state count in ExAC - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal hemizygous state count in ExAC - When … (type: null or integer; examples: 1, 10; minimum: 0)
- Whether to enable gnomAD exomes frequency filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal allele frequency in gnomAD exomes - When … (type: null or number; examples: 0.05; maximum: 0.05; minimum: 0)
- Maximal heterozygous state count in gnomAD exomes - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal homozygous state count in gnomAD exomes - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal hemizygous state count in gnomAD exomes - When … (type: null or integer; examples: 1, 10; minimum: 0)
- Whether to enable gnomAD genomes frequency filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal allele frequency in gnomAD genomes - When … (type: null or number; examples: 0.05; maximum: 0.05; minimum: 0)
- Maximal heterozygous state count in gnomAD genomes - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal homozygous state count in gnomAD genomes - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal hemizygous state count in gnomAD genomes - When … (type: null or integer; examples: 1, 10; minimum: 0)
- Whether to enable thousand genomes frequency filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal allele frequency in thousand genomes - When … (type: null or number; examples: 0.05; maximum: 0.05; minimum: 0)
- Maximal heterozygous state count in thousand genomes - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal homozygous state count in thousand genomes - When … (type: null or integer; examples: 1, 10; default: 0)
- Maximal hemizygous state count in thousand genomes - When … (type: null or integer; examples: 1, 10; minimum: 0)
- Whether to enable in-house database frequency filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal carrier count in in-house database - When … (type: null or integer; examples: 20; minimum: 0)
- Maximal heterozygous state count in in-house database - When … (type: null or integer; examples: 10; minimum: 0)
- Maximal homozygous state count in in-house database - When … (type: null or integer; examples: 10; minimum: 0)
- Maximal hemizygous state count in in-house database - When … (type: null or integer; examples: 10; minimum: 0)
- Whether to enable mtdb frequency filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal number/absolute frequency of carriers in mtdb - When … (type: null or integer; examples: 1, 10; minimum: 0)
- Maximal relative frequency of carriers in mtdb - When … (type: null or number; examples: 0.05; maximum: 0.05; minimum: 0)
- Whether to enable helixmtdb frequency filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal carrier frequency in helixmtdb - When … (type: null or number; examples: 0.001, 0.05; maximum: 1; minimum: 0)
- Maximal heteroplasmy frequency in helixmtdb - When … (type: null or integer; examples: 1, 10; minimum: 0)
- Maximal homoplasmy frequency in helixmtdb - When … (type: null or integer; examples: 1, 10; minimum: 0)
- Whether to enable the mitomap carrier filter - Set to … (type: boolean; examples: True, False; default: False)
- Maximal number of carriers in mtDB - When … (type: null or integer; examples: 10; minimum: 0)
- The mitomap_frequency schema - When … (type: null or number; examples: 0.001, 0.05; maximum: 1; minimum: 0)
- Include variants on coding transcripts - When enabled, variants whose most pathogenic effect is on a coding transcript are included (type: boolean; examples: True, False; default: False)
- Include variants on non-coding transcripts - When enabled, variants whose most pathogenic effect is on a non-coding transcript are included (type: boolean; examples: True, False; default: True)
- Include SNV variants - When set to … (type: boolean; examples: True, False; default: True)
- Include indel variants - When set to … (type: boolean; examples: True, False; default: True)
- Include MNV variants - When set to … (type: boolean; examples: True, False; default: True)
- The largest distance to exons - When set then only variants with at most … (type: null or integer; examples: 1, 10; minimum: 0)
Each of the following flag properties has type boolean, examples True and False, and default True:
- Include variants marked with no flag - When set (default) then variants that have no simple flag set are included in the result
- Include variants marked with “bookmarked” flag - When set (default) then variants that have the “bookmarked” simple flag set are included in the result
- Include variants marked with “candidate” flag - When set (default) then variants that have the “candidate” simple flag set are included in the result
- Include variants marked with “does not segregate” flag - When set (default) then variants that have the “does not segregate” simple flag set are included in the result
- Include variants marked with “final causative” flag - When set (default) then variants that have the “final causative” simple flag set are included in the result
- Include variants marked with “for validation” flag - When set (default) then variants that have the “for validation” simple flag set are included in the result
- Include variants marked with “no disease association” flag - When set (default) then variants that have the “no disease association” simple flag set are included in the result
- Include variants marked with “segregates” flag - When set (default) then variants that have the “segregates” simple flag set are included in the result
- Include variants that have the “molecular” flag unset - When set (default) then variants that have the “molecular” flag unset are included in the result
- Include variants marked with “molecular” flag set to “negative” - When set (default) then variants that have the “molecular” flag set to “negative” are included in the result
- Include variants marked with “molecular” flag set to “positive” - When set (default) then variants that have the “molecular” flag set to “positive” are included in the result
- Include variants marked with “molecular” flag set to “uncertain” - When set (default) then variants that have the “molecular” flag set to “uncertain” are included in the result
- Include variants that have the “phenotype match” flag unset - When set (default) then variants that have the “phenotype match” flag unset are included in the result
- Include variants marked with “phenotype match” flag set to “negative” - When set (default) then variants that have the “phenotype match” flag set to “negative” are included in the result
- Include variants marked with “phenotype match” flag set to “positive” - When set (default) then variants that have the “phenotype match” flag set to “positive” are included in the result
- Include variants marked with “phenotype match” flag set to “uncertain” - When set (default) then variants that have the “phenotype match” flag set to “uncertain” are included in the result
- Include variants that have the “summary” flag unset - When set (default) then variants that have the “summary” flag unset are included in the result
- Include variants marked with “summary” flag set to “negative” - When set (default) then variants that have the “summary” flag set to “negative” are included in the result
- Include variants marked with “summary” flag set to “positive” - When set (default) then variants that have the “summary” flag set to “positive” are included in the result
- Include variants marked with “summary” flag set to “uncertain” - When set (default) then variants that have the “summary” flag set to “uncertain” are included in the result
- Include variants that have the “validation” flag unset - When set (default) then variants that have the “validation” flag unset are included in the result
- Include variants marked with “validation” flag set to “negative” - When set (default) then variants that have the “validation” flag set to “negative” are included in the result
- Include variants marked with “validation” flag set to “positive” - When set (default) then variants that have the “validation” flag set to “positive” are included in the result
- Include variants marked with “validation” flag set to “uncertain” - When set (default) then variants that have the “validation” flag set to “uncertain” are included in the result
- Include variants that have the “visual” flag unset - When set (default) then variants that have the “visual” flag unset are included in the result
- Include variants marked with “visual” flag set to “negative” - When set (default) then variants that have the “visual” flag set to “negative” are included in the result
- Include variants marked with “visual” flag set to “positive” - When set (default) then variants that have the “visual” flag set to “positive” are included in the result
- Include variants marked with “visual” flag set to “uncertain” - When set (default) then variants that have the “visual” flag set to “uncertain” are included in the result
- List of genes to restrict the resulting variants to - List of gene symbols, entrez gene identifiers, or ENSEMBL gene identifiers to limit variants for (for a variant affecting multiple genes, the combinations of the variants and genes will be reported independently), leave empty to apply no such filter (type: array; examples: TTN; default: empty; items: type string, pattern ^([a-zA-Z0-9_-]+)$; additionalItems: True)
- List of genes to exclude from the result - List of gene symbols, entrez gene identifiers, or ENSEMBL gene identifiers to exclude variants for (for a variant affecting multiple genes, the combinations of the variants and genes will be reported independently), leave empty to apply no such filter (type: array; examples: TTN; default: empty; items: type string, pattern ^([a-zA-Z0-9_-]+)$; additionalItems: True)
- Remove variant if it exists in local copy of dbSNP - Set to true to exclude variants that are present in dbSNP from the result set (type: boolean; examples: True, False; default: False)
- Restrict variants to those in local copy of Clinvar - Set to true to restrict variants to those present in local copy of Clinvar (type: boolean; examples: True, False; default: False)
- Weaken weight of ‘criteria provided’ in variant assessment - When set, then variant assessments with and without assertion are interpreted as equally important. By default, they are not; those with assessment override the others. (type: boolean; examples: True, False; default: False)
- Whether to include variants marked as benign in local Clinvar copy if require_in_clinvar - Set to true (default) to make variants pass the filter that are marked as benign in the local Clinvar copy, set to false to make them not pass the filter (type: boolean; examples: True, False; default: True)
- Whether to include variants marked as pathogenic in local Clinvar copy if require_in_clinvar - Set to true (default) to make variants pass the filter that are marked as pathogenic in the local Clinvar copy, set to false to make them not pass the filter (type: boolean; examples: True, False; default: True)
- Whether to include variants marked as likely benign in local Clinvar copy if require_in_clinvar - Set to true (default) to make variants pass the filter that are marked as likely benign in the local Clinvar copy, set to false to make them not pass the filter (type: boolean; examples: True, False; default: True)
- Whether to include variants marked as likely pathogenic in local Clinvar copy if require_in_clinvar - Set to true (default) to make variants pass the filter that are marked as likely pathogenic in the local Clinvar copy, set to false to make them not pass the filter (type: boolean; examples: True, False; default: True)
- Whether to include variants marked as of unknown significance in local Clinvar copy if require_in_clinvar - Set to true (default) to make variants pass the filter that are marked as of unknown significance in the local Clinvar copy, set to false to make them not pass the filter (type: boolean; examples: True, False; default: True)
- List of genomic regions to limit the query to - When set then only variants contained in or overlapping with the given genomic regions pass the filter, leave empty to apply no region filter (type: array; examples: "chr1:100,000,00-110,00,00", "chrY", "X", "Y"; default: empty; items: type string, pattern ^[a-zA-Z0-9]+(:(\d+(,\d+)*)-(\d+(,\d+)*))?$)
- Enable pathogenicity annotation - Set to true to enable annotation with pathogenicity, requires setting a value for … (type: boolean; examples: True, False; default: False)
- The pathogenicity score to use for annotating variants - Select pathogenicity score to use if … (type: null or string; examples: cadd, mutationtaster)
- Enable phenotype-based prioritization of variants - Select … (type: boolean; examples: True, False; default: False)
- The phenotype-based prioritization algorithm to use for prioritizing variants - Select algorithm to use if … (type: null or string; examples: phenix, hiphive, hiphive-human, hiphive-mouse)
- The prio_hpo_terms schema - An explanation about the purpose of this instance. (type: null or array; examples: empty; default: empty; items: type string, pattern HP:\d+; additionalItems: True)
- The require_in_hgmd_public schema - An explanation about the purpose of this instance. (type: boolean; examples: False; default: False)
|
anyOf |
type |
null |
||
Enable and select the biallelic recessive inheritance filter |
|||||
Use “compound-recessive” to restrict to variants compatible with compound recessive mode of inheritance and “recessive” to restrict to compatibility with either compound or homozygous recessive mode of inheritance. Use |
|||||
type |
string |
||||
enum |
recessive, compound-recessive |
||||
|
anyOf |
type |
null |
||
Select the recessive index |
|||||
Set to the identifier of the recessive index |
|||||
type |
string |
||||
examples |
CHILD-NAME |
||||
|
anyOf |
type |
null |
||
Select the denovo index |
|||||
Set to the identifier of the de novo index |
|||||
type |
string |
||||
examples |
CHILD-NAME |
||||
|
Quality filter threshold |
||||
Set quality thresholds for each individual. The keys are the individual names and the values follow the schema defined below |
|||||
type |
object |
||||
examples |
SAMPLE |
dp_het |
10 |
||
dp_hom |
5 |
||||
ab |
0.3 |
||||
gq |
20 |
||||
ad |
3 |
||||
ad_max |
200 |
||||
fail |
drop-variant |
||||
FATHER |
gq |
40 |
|||
fail |
ignore |
||||
MOTHER |
gq |
40 |
|||
fail |
ignore |
||||
CHILD |
gq |
40 |
|||
fail |
drop-variant |
||||
patternProperties |
|||||
|
type |
object |
|||
properties |
|||||
|
Minimal total depth of coverage for heterozygous variants |
||||
If set then exclude variants with lower total depth of coverage in sample’s genotype call for heterozygous variants |
|||||
type |
integer |
||||
minimum |
0 |
||||
default |
0 |
||||
|
Minimal total depth coverage for homozygous and hemizygous variants |
||||
If set then exclude variants with lower total depth of coverage in sample’s genotype call for homozygous variants |
|||||
type |
integer |
||||
minimum |
0 |
||||
default |
0 |
||||
|
Minimal allelic balance for heterozygous variants |
||||
If set then exclude variants with lower allelic balance in sample’s genotype call |
|||||
type |
number |
||||
maximum |
1 |
||||
minimum |
0 |
||||
default |
0 |
||||
|
Minimal genotype call quality |
||||
If set then exclude variants with lower genotype quality in sample’s genotype call |
|||||
type |
integer |
||||
minimum |
0 |
||||
default |
0 |
||||
|
Minimal number of reads supporting the alternate allele |
||||
If set then exclude variants with lower depth of coverage on alternate allele in sample’s genotype call |
|||||
type |
integer |
||||
minimum |
0 |
||||
default |
0 |
||||
|
anyOf |
type |
null |
||
Maximal alternate allele depth of coverage |
|||||
If set then exclude variants with higher depth of coverage on alternate allele in sample’s genotype call |
|||||
type |
integer |
||||
minimum |
0 |
||||
|
Action to perform when genotype filter threshold is not passed |
||||
Actions: ignore: ignore failure, drop-variant: drop whole variant (if ONE genotype in the variant fails filter), no-call: interpret as no-call |
|||||
type |
string |
||||
enum |
ignore, drop-variant, no-call |
||||
default |
ignore |
||||
additionalProperties |
False |
||||
|
Genotype filter settings |
||||
Set genotype filter for each individual, must be given for each individual in query with genotype data |
|||||
type |
object |
||||
examples |
SAMPLE |
hom |
|||
FATHER |
ref |
||||
MOTHER |
ref |
||||
CHILD |
het |
||||
patternProperties |
|||||
|
anyOf |
type |
null |
||
type |
string |
||||
enum |
any, ref, het, hom, non-hom, variant, non-variant, non-reference |
||||
additionalProperties |
False |
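The constraints listed above for the quality settings, genotype values, and region strings can be illustrated with a short Python sketch. The helper names (`check_quality`, `is_valid_genotype`, `is_valid_region`) are hypothetical and not part of VarFish; only the keys, the enumerations, and the region pattern are taken from the schema text. It is meant as a rough pre-flight check, not as a replacement for real JSON schema validation.

```python
import re

# Region pattern, enumerations, and keys copied from the schema text above.
REGION_RE = re.compile(r"^[a-zA-Z0-9]+(:(\d+(,\d+)*)-(\d+(,\d+)*))?$")
FAIL_ACTIONS = {"ignore", "drop-variant", "no-call"}
GENOTYPES = {
    "any", "ref", "het", "hom", "non-hom",
    "variant", "non-variant", "non-reference",
}
NON_NEGATIVE_INT_KEYS = {"dp_het", "dp_hom", "gq", "ad"}


def _is_non_negative_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def check_quality(settings):
    """Return a list of problems in one sample's quality settings (hypothetical helper)."""
    problems = []
    for key, value in settings.items():
        if key in NON_NEGATIVE_INT_KEYS:
            if not _is_non_negative_int(value):
                problems.append(f"{key}: expected integer >= 0")
        elif key == "ad_max":
            # ad_max may be null (None) or a non-negative integer.
            if value is not None and not _is_non_negative_int(value):
                problems.append("ad_max: expected null or integer >= 0")
        elif key == "ab":
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0 <= value <= 1:
                problems.append("ab: expected number between 0 and 1")
        elif key == "fail":
            if value not in FAIL_ACTIONS:
                problems.append(f"fail: expected one of {sorted(FAIL_ACTIONS)}")
        else:
            # additionalProperties is False, so unknown keys are rejected.
            problems.append(f"{key}: unknown key")
    return problems


def is_valid_genotype(value):
    """Genotype settings are either null (None) or one of the listed values."""
    return value is None or value in GENOTYPES


def is_valid_region(region):
    """Check a region string against the pattern from the schema."""
    return REGION_RE.match(region) is not None


if __name__ == "__main__":
    sample = {"dp_het": 10, "dp_hom": 5, "ab": 0.3, "gq": 20,
              "ad": 3, "ad_max": 200, "fail": "drop-variant"}
    print(check_quality(sample))
    print(is_valid_genotype("het"), is_valid_region("chrY"))
```

Keeping the three checks separate mirrors the three independent parts of the schema, so each can be applied to the matching part of a query settings object.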
Case QC Schema V1
varfish-server case QC info
https://raw.githubusercontent.com/bihealth/varfish-server/main/importer/schemas/case-qc-v1.json |
|||||
Per case quality control information for varfish |
|||||
type |
object |
||||
patternProperties |
|||||
|
type |
object |
|||
properties |
|||||
|
type |
object |
|||
properties |
|||||
|
raw total sequences |
||||
type |
integer |
||||
minimum |
0 |
||||
|
filtered sequences |
||||
type |
integer |
||||
minimum |
0 |
||||
|
sequences |
||||
type |
integer |
||||
minimum |
0 |
||||
|
is sorted |
||||
type |
integer |
||||
minimum |
0 |
||||
|
1st fragments |
||||
type |
integer |
||||
minimum |
0 |
||||
|
last fragments |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads mapped |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads mapped and paired |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads unmapped |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads properly paired |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads paired |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads duplicated |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads MQ0 |
||||
type |
integer |
||||
minimum |
0 |
||||
|
reads QC failed |
||||
type |
integer |
||||
minimum |
0 |
||||
|
non-primary alignments |
||||
type |
integer |
||||
minimum |
0 |
||||
|
total length |
||||
type |
integer |
||||
minimum |
0 |
||||
|
total first fragment length |
||||
type |
integer |
||||
minimum |
0 |
||||
|
total last fragment length |
||||
type |
integer |
||||
minimum |
0 |
||||
|
bases mapped |
||||
type |
integer |
||||
minimum |
0 |
||||
|
bases mapped (cigar) |
||||
type |
integer |
||||
minimum |
0 |
||||
|
bases trimmed |
||||
type |
integer |
||||
minimum |
0 |
||||
|
bases duplicated |
||||
type |
integer |
||||
minimum |
0 |
||||
|
mismatches |
||||
type |
integer |
||||
minimum |
0 |
||||
|
error rate |
||||
error rate as fractions of 1 |
|||||
type |
number |
||||
maximum |
1 |
||||
minimum |
0 |
||||
|
average length |
||||
type |
number |
||||
minimum |
0 |
||||
|
average first fragment length |
||||
type |
number |
||||
minimum |
0 |
||||
|
average last fragment length |
||||
type |
number |
||||
minimum |
0 |
||||
|
maximum length |
||||
type |
integer |
||||
minimum |
0 |
||||
|
maximum first fragment length |
||||
type |
integer |
||||
minimum |
0 |
||||
|
maximum last fragment length |
||||
type |
integer |
||||
minimum |
0 |
||||
|
average quality |
||||
type |
number |
||||
minimum |
0 |
||||
|
insert size average |
||||
type |
number |
||||
minimum |
0 |
||||
|
insert size standard deviation |
||||
type |
number |
||||
minimum |
0 |
||||
|
inward oriented pairs |
||||
type |
integer |
||||
minimum |
0 |
||||
|
outward oriented pairs |
||||
type |
integer |
||||
minimum |
0 |
||||
|
pairs with other orientation |
||||
type |
integer |
||||
minimum |
0 |
||||
|
pairs on different chromosomes |
||||
type |
integer |
||||
minimum |
0 |
||||
|
percentage of properly paired reads (%) |
||||
type |
number |
||||
maximum |
100 |
||||
minimum |
0 |
||||
|
Minimal coverage percentage, counted per target |
||||
Considering all targets, histogram of distribution regarding “minimal coverage of…”, the smallest coverage on a target makes the whole target count at that value |
|||||
type |
object |
||||
patternProperties |
|||||
|
Minimal coverage value histogram entry |
||||
type |
number |
||||
examples |
100 |
||||
99.9 |
|||||
0 |
|||||
maximum |
100 |
||||
minimum |
0 |
||||
additionalProperties |
False |
||||
|
Minimal coverage percentage, counted per base |
||||
Considering all target bases, histogram of distribution regarding “minimal coverage of…” |
|||||
type |
object |
||||
patternProperties |
|||||
|
Minimal coverage value histogram entry |
||||
type |
number |
||||
examples |
100 |
||||
99.9 |
|||||
0 |
|||||
maximum |
100 |
||||
minimum |
0 |
||||
additionalProperties |
False |
||||
|
Coverage summary |
||||
type |
object |
||||
properties |
|||||
|
Mean on-target coverage |
||||
type |
number |
||||
examples |
0 |
||||
100 |
|||||
minimum |
0 |
||||
|
Total number of targets |
||||
type |
integer |
||||
examples |
0 |
||||
100 |
|||||
minimum |
0 |
||||
|
Total target size in bp |
||||
type |
integer |
||||
examples |
0 |
||||
100 |
|||||
minimum |
0 |
||||
additionalProperties |
False |
||||
|
type |
object |
|||
patternProperties |
|||||
|
Read count for each chromosome |
||||
type |
object |
||||
properties |
|||||
|
Mapped read count |
||||
Number of mapped reads on chromosome |
|||||
type |
integer |
||||
examples |
0 |
||||
100 |
|||||
minimum |
0 |
||||
|
Unmapped read count |
||||
Number of unmapped reads on chromosome (usually the mate maps) |
|||||
type |
integer |
||||
examples |
0 |
||||
100 |
|||||
minimum |
0 |
||||
additionalProperties |
False |
Clinical Beacon Protocol
This section describes the “Clinical Beacon” protocol version 1 (“Clinical Beacon v1”). It follows the GA4GH Beacon Protocol v1 (“Beacon v1”) in large parts with slight deviations. The end points and payloads are the same as in Beacon v1. However, we add two important features, as explained below.
The client sends the current user in the X-Beacon-User header.
The client has to sign the X-Beacon-User and Date HTTP headers using the Signing HTTP Messages IETF draft.
You can find a simple Python implementation of a standalone client on Github.
X-Beacon-User Header
The GA4GH Beacon v1 protocol is meant to be used in a “zero trust” environment and specifies that authentication is done using OAuth2. In an ideal world, sites that have installed VarFish would be able to connect to local OpenID instances. In reality, many sites are located in clinical environments where Microsoft ActiveDirectory is used for authentication and Microsoft Federated Services use SAML instead.
Further, VarFish sites connecting to each other will have real-world paper contracts for data exchange agreements and after signing such contracts they can trust each other. In the first version we thus decided not to implement zero trust concepts.
The client thus has to set the X-Beacon-User header to a string that uniquely identifies the querying user.
The client decides whether to send interpretable user names or, for the sake of user data security, pseudonyms.
This is left to the discretion of the implementing sites and contract partners.
VarFish currently implements this by sending the clear text user names.
Date Header
This is a standard HTTP header that is mandatory in the Clinical Beacon v1 protocol.
Header Signing
We use the Signing HTTP Messages IETF draft for signing HTTP requests. The signature header will typically look as follows (without wrapping of course):
Signature keyId="org.bihealth.varfish",algorithm="rsa-sha512",headers="date x-beacon-user",\
signature="mxY7+9vizRbO7mUJVyvxXm3VgpYycQWNulrAafMOWJ29WYQYMf2i5PBPP3jYBhIGd/3zZ+x+mlQw8xEw\
M6UWvE3QRqzlzBE0ZHeWKgX4h11N1MhtXTnhXL9CL/VqbcgbBI9trkwB/xxaXhUOpvavA37J1ljrdTbXhghCHZ65hMi\
04fUnKKkFhuwOzZ6N5/amIuizc2JeDe73Pg+D5HA4AnE2bnCmf8AqhKLd434SdchcYAHqYTJaxBA2Pxngerg6oSenli\
rgukzrBdbdRpvnFFtQzZsQ56v9hS8cqF/phtl+isAT/dcwvO9/lCKaf3QE8YKCcQmDnPJiQLdtQ9mZKw==",\
created="1646407724"
Where
keyId is the ID of the key pair used for signing,
algorithm is the algorithm that has been used for generating the key pair,
headers is the space-separated list of headers that are signed (must be date x-beacon-user), and
signature is the Base 64 encoded signature.
This leaves open the question of how to generate the key. We use standard RSA and ECDSA keys. VarFish supports the following algorithms:
rsa-sha256
rsa-sha512
ecdsa-sha256
ecdsa-sha256
The standalone client on Github provides examples for key generation.
Key exchange is simple because only the public key needs to be shared. However, the server must have registered the client's public key before the client makes any query.
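To make the signing step more concrete, here is a minimal Python sketch of how a client might assemble the string to be signed and format the resulting header value. It assumes the convention of the Signing HTTP Messages draft, where each signed header contributes a lowercase `name: value` line and the lines are joined by newlines. The function names are hypothetical, and the demo uses an HMAC from the standard library purely as a stand-in for the signature bytes. A real client would sign with its RSA or ECDSA private key instead; see the standalone client for a complete implementation.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def build_signing_string(headers, signed_names):
    """Join the signed headers as lowercase 'name: value' lines."""
    lookup = {name.lower(): value for name, value in headers.items()}
    return "\n".join(f"{name}: {lookup[name]}" for name in signed_names)


def build_signature_header(key_id, algorithm, signed_names, signature_bytes):
    """Format the header value in the same layout as the example above."""
    encoded = base64.b64encode(signature_bytes).decode("ascii")
    names = " ".join(signed_names)
    return (
        f'Signature keyId="{key_id}",algorithm="{algorithm}",'
        f'headers="{names}",signature="{encoded}"'
    )


if __name__ == "__main__":
    # "alice" and the key ID are illustrative values only.
    headers = {"Date": formatdate(usegmt=True), "X-Beacon-User": "alice"}
    signed = ["date", "x-beacon-user"]
    signing_string = build_signing_string(headers, signed)

    # Stand-in only: HMAC is NOT one of the algorithms listed above.
    # Replace this with an RSA or ECDSA signature over signing_string.
    stand_in = hmac.new(b"demo-secret", signing_string.encode("utf-8"),
                        hashlib.sha256).digest()
    print(build_signature_header("org.example.client", "hmac-sha256",
                                 signed, stand_in))
```

Separating the construction of the signing string from the actual signature makes it easy to swap in whichever key type a site uses.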
Final Remarks
Thus, the Clinical Beacon Protocol v1 is equal to the GA4GH Beacon Protocol v1 with the exception that:
sites are expected to have a certain level of trust as they share non-public data,
sites send a string with each query to identify the querying user, and
all queries are signed with public/private key pairs and each client first needs to register with each server by sending its public key.
As a final remark, API endpoints should of course be deployed behind HTTPS but that is out of scope here.
Installation
The VarFish installation for developers should be set up differently from the installation for production use.
The reason is that the installation for production use runs completely in a Docker environment. All containers are assigned to a Docker network that the host by default has no access to, except for the reverse proxy that gives access to the VarFish web interface.
The developer installation is intended not to carry the full VarFish database, so that it is light-weight and fits on a laptop. We advise installing the services directly on the host rather than running them in Docker containers.
Install Postgres
Follow the instructions for your operating system to install Postgres. Make sure that the version is 12 (11 and 13 would also work). Ubuntu 20 already includes postgresql 12. In case of older Ubuntu versions, this would be:
sudo apt install postgresql-12
Install Redis
Redis is the broker that celery uses to manage the queues. Follow the instructions for your operating system to install Redis. For Ubuntu, this would be:
sudo apt install redis-server
Install miniconda
miniconda helps to set up encapsulated Python environments. This step is optional. You can also use pipenv, but in our experience, resolving the dependencies in pipenv is terribly slow.
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
$ source ~/miniconda3/bin/activate
$ conda init
$ conda create -n varfish python=3.8 pip
$ conda activate varfish
Clone git repository
Clone the VarFish Server repository and switch into the checkout.
$ git clone https://github.com/bihealth/varfish-server
$ cd varfish-server
Install Python Requirements
Some required packages have dependencies that are usually not preinstalled. Therefore, run
$ sudo apt install libsasl2-dev python-dev libldap2-dev libssl-dev
Now, with the conda/Python environment activated, install all the requirements.
$ for i in requirements/*; do pip install -r $i; done
Setup Database
Use the tool provided in utility/ to set up the database. The name for the database should be varfish (create new user: yes, name: varfish, password: varfish).
$ bash utility/setup_database.sh
Setup vue.js
Use the tool provided in utility/ to set up vue.js.
$ sudo bash utility/install_vue_dev.sh
Open an additional terminal and switch into the vue directory. Then install the Clinvar Export vue app.
$ cd clinvar_export/vueapp
$ npm install
When finished, keep this terminal open to run the vue app.
$ npm run serve
Setup VarFish
First, create a .env file with the following content.
export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local
If you wish to enable structural variants, add the following line.
export VARFISH_ENABLE_SVS=1
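Before starting the server, it can help to confirm that the `.env` file contains the variables listed above. The following Python sketch parses `export KEY=VALUE` lines and reports missing keys. The helper names `parse_env` and `missing_keys` are hypothetical and not part of VarFish, and the parser only handles the simple one-assignment-per-line form shown above.

```python
from urllib.parse import urlsplit

# Variables from the .env example above.
REQUIRED = (
    "DATABASE_URL",
    "CELERY_BROKER_URL",
    "PROJECTROLES_ADMIN_OWNER",
    "DJANGO_SETTINGS_MODULE",
)


def parse_env(text):
    """Parse 'export KEY=VALUE' lines into a dict, stripping optional quotes."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip().strip('"').strip("'")
    return result


def missing_keys(env):
    """Return the required variables that are absent from the parsed file."""
    return [key for key in REQUIRED if key not in env]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        env = parse_env(handle.read())
    print("missing:", missing_keys(env))
    if "DATABASE_URL" in env:
        # Show which host and database the URL points to.
        url = urlsplit(env["DATABASE_URL"])
        print(url.hostname, url.path.lstrip("/"))
```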
To create the tables in the VarFish database, run the migrate command. This step can take a few minutes.
$ python manage.py migrate
Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named root (the setting can be changed in the .env file with the PROJECTROLES_ADMIN_OWNER variable).
$ python manage.py createsuperuser
Last, download the icon sets for VarFish and make scripts, stylesheets and icons available.
$ python manage.py geticons -c bi cil fa-regular fa-solid gridicons octicon
$ python manage.py collectstatic
When done, open two terminals and start the VarFish server and the celery server.
terminal1$ make serve
terminal2$ make celery
Database Import
First, download the pre-built database files that we provide and unpack them. Please make sure that you have enough space available. The packed file consumes 31 GB. When unpacked, it consumes an additional 188 GB.
$ cd /plenty/space
$ wget https://file-public.bihealth.org/transient/varfish/varfish-server-background-db-20201006.tar.gz{,.sha256}
$ sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
$ tar xzvf varfish-server-background-db-20201006.tar.gz
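A quick way to check the space requirement before downloading is sketched below in Python. The figures are the packed and unpacked sizes stated above. The helper name `has_enough_space` is hypothetical, and `/plenty/space` is the placeholder path used in the commands above.

```python
import shutil

PACKED_GB = 31     # size of the packed download, as stated above
UNPACKED_GB = 188  # additional size after unpacking, as stated above


def has_enough_space(path, needed_gb=PACKED_GB + UNPACKED_GB):
    """Return True if the file system holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


if __name__ == "__main__":
    print(has_enough_space("/plenty/space"))
```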
We recommend excluding the large databases: frequency tables, extra annotations and dbSNP. Also, keep in mind that importing the whole database takes >24h, depending on the speed of your HDD.
This is a list of the possible imports, sorted by size:
Component | Size | Exclude | Function |
---|---|---|---|
gnomAD_genomes | 80G | highly recommended | frequency annotation |
extra-annos | 50G | highly recommended | diverse |
dbSNP | 32G | highly recommended | SNP annotation |
thousand_genomes | 6.5G | highly recommended | frequency annotation |
gnomAD_exomes | 6.0G | highly recommended | frequency annotation |
knowngeneaa | 4.5G | highly recommended | alignment annotation |
clinvar | 3.3G | highly recommended | pathogenicity classification |
ExAC | 1.9G | highly recommended | frequency annotation |
dbVar | 573M | recommended | SNP annotation |
gnomAD_SV | 250M | recommended | SV frequency annotation |
ncbi_gene | 151M | | gene annotation |
ensembl_regulatory | 77M | | frequency annotation |
DGV | 43M | | SV annotation |
hpo | 22M | | phenotype information |
hgnc | 15M | | gene annotation |
gnomAD_constraints | 13M | | frequency annotation |
mgi | 10M | | mouse gene annotation |
ensembltorefseq | 8.3M | | identifier mapping |
hgmd_public | 5.0M | | gene annotation |
ExAC_constraints | 4.6M | | frequency annotation |
refseqtoensembl | 2.0M | | identifier mapping |
ensembltogenesymbol | 1.6M | | identifier mapping |
ensembl_genes | 1.2M | | gene annotation |
HelixMTdb | 1.2M | | MT frequency annotation |
refseqtogenesymbol | 1.1M | | identifier mapping |
refseq_genes | 804K | | gene annotation |
mim2gene | 764K | | phenotype information |
MITOMAP | 660K | | MT frequency annotation |
kegg | 632K | | pathway annotation |
mtDB | 336K | | MT frequency annotation |
tads_hesc | 108K | | domain annotation |
tads_imr90 | 108K | | domain annotation |
vista | 104K | | orthologous region annotation |
acmg | 16K | | disease gene annotation |
You can find the import_versions.tsv file in the root folder of the package. This file determines which component (called table_group and represented as folder in the package) gets imported when the import command is issued. To exclude a table, simply comment out (#) or delete the line.
Excluding tables that are not required for development can reduce time and
space consumption. Also, the GRCh38 tables can be excluded.
A space-consumption-friendly version of the file would look like this:
build table_group version
GRCh37 acmg v2.0
#GRCh37 clinvar 20200929
#GRCh37 dbSNP b151
#GRCh37 dbVar latest
GRCh37 DGV 2016
GRCh37 ensembl_genes r96
GRCh37 ensembl_regulatory latest
GRCh37 ensembltogenesymbol latest
GRCh37 ensembltorefseq latest
GRCh37 ExAC_constraints r0.3.1
#GRCh37 ExAC r1
#GRCh37 extra-annos 20200704
GRCh37 gnomAD_constraints v2.1.1
#GRCh37 gnomAD_exomes r2.1
#GRCh37 gnomAD_genomes r2.1
#GRCh37 gnomAD_SV v2
GRCh37 HelixMTdb 20190926
GRCh37 hgmd_public ensembl_r75
GRCh37 hgnc latest
GRCh37 hpo latest
GRCh37 kegg april2011
#GRCh37 knowngeneaa latest
GRCh37 mgi latest
GRCh37 mim2gene latest
GRCh37 MITOMAP 20200116
GRCh37 mtDB latest
GRCh37 ncbi_gene latest
GRCh37 refseq_genes r105
GRCh37 refseqtoensembl latest
GRCh37 refseqtogenesymbol latest
GRCh37 tads_hesc dixon2012
GRCh37 tads_imr90 dixon2012
#GRCh37 thousand_genomes phase3
GRCh37 vista latest
#GRCh38 clinvar 20200929
#GRCh38 dbVar latest
#GRCh38 DGV 2016
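Commenting out lines by hand works well for a few entries. For repeated setups, the same edit could be scripted, as in the following Python sketch. The function name `exclude_table_groups` is hypothetical. It assumes that the file has a `build table_group version` header line and whitespace-separated columns, as in the listing above.

```python
def exclude_table_groups(text, excluded, builds=()):
    """Comment out lines whose table_group is in `excluded` or whose build is in `builds`.

    Header lines, already commented lines, and malformed lines are left untouched.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()
        is_header = fields[:2] == ["build", "table_group"]
        if is_header or line.startswith("#") or len(fields) < 3:
            out.append(line)
            continue
        build, table_group = fields[0], fields[1]
        if table_group in excluded or build in builds:
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # The large tables marked "highly recommended" for exclusion above.
    large = {"gnomAD_genomes", "extra-annos", "dbSNP", "thousand_genomes",
             "gnomAD_exomes", "knowngeneaa", "clinvar", "ExAC"}
    with open("import_versions.tsv", encoding="utf-8") as handle:
        original = handle.read()
    print(exclude_table_groups(original, large, builds={"GRCh38"}), end="")
```

The sketch prints the modified content instead of overwriting the file, so the result can be reviewed before it replaces `import_versions.tsv`.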
To perform the import, issue:
$ python manage.py import_tables --tables-path /plenty/space/varfish-server-background-db-20201006
Performing the import twice will automatically skip tables that are already imported. To re-import tables, add the --force parameter to the command:
$ python manage.py import_tables --tables-path varfish-db-downloader --force
Development
Working With Sodar Core
VarFish is based on the Sodar Core framework which has a developer manual itself. It is worth reading its development instructions. The following lists the most important topics:
Running Tests
Running the VarFish test suite is easy, but can take a long time to finish (>10 minutes).
$ make test
You can exclude time-consuming UI tests:
$ make test-noselenium
If you are working on only a few tests, it is better to run them directly. To specify them, follow the path to the test file, add the class name and the test function, all separated by a dot:
$ python manage.py test -v2 --settings=config.settings.test variants.tests.test_ui.TestVariantsCaseFilterView.test_variant_filter_case_multi_bookmark_one_variant
This would run the UI tests in the variants app for the case filter view.
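The dotted test label can also be derived from a file path, as the following Python sketch shows. The helper `build_test_label` is hypothetical, and the file path `variants/tests/test_ui.py` is inferred from the label in the command above rather than stated in this manual.

```python
from pathlib import PurePosixPath


def build_test_label(test_file, class_name=None, function_name=None):
    """Turn a test file path plus optional class and function into a dotted label."""
    # Drop the ".py" suffix and use the path components as the module path.
    parts = list(PurePosixPath(test_file).with_suffix("").parts)
    if class_name:
        parts.append(class_name)
        if function_name:
            parts.append(function_name)
    return ".".join(parts)


if __name__ == "__main__":
    label = build_test_label(
        "variants/tests/test_ui.py",
        "TestVariantsCaseFilterView",
        "test_variant_filter_case_multi_bookmark_one_variant",
    )
    print(f"python manage.py test -v2 --settings=config.settings.test {label}")
```

Omitting the function name yields a label for the whole test class, and omitting both yields a label for the whole module.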
Working With Git
In this section we will briefly describe the workflow how to contribute to VarFish. This is not a git tutorial and we expect basic knowledge. We recommend gitready for any questions regarding git. We do use git rebase a lot.
In general, we recommend working with git gui and gitk.
The first thing for you to do is to create a fork of our github repository in your github space.
To do so, go to the VarFish repository and click on the Fork button in the top right.
Update Main
$ git pull --rebase
Create Working Branch
Always create your working branch from the latest main branch.
Use the ticket number and description as name, following the format <ticket_number>-<ticket_title>, e.g.
$ git checkout -b 123-adding-useful-feature
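If you prefer to generate branch names, the naming format could be produced by a small helper like the Python sketch below. The function `branch_name` is hypothetical. Lowercasing the title and replacing non-alphanumeric runs with hyphens is an assumption based on the example above, not a documented rule.

```python
import re


def branch_name(ticket_number, ticket_title):
    """Build '<ticket_number>-<ticket_title>' with a lowercase, hyphenated title."""
    slug = re.sub(r"[^a-z0-9]+", "-", ticket_title.lower()).strip("-")
    return f"{ticket_number}-{slug}"


if __name__ == "__main__":
    print(branch_name(123, "Adding useful feature"))
```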
Write A Sensible Commit Message
A commit message should only have 72 characters per line. As the first line is the representative, it should sum up everything the commit does. Leave a blank line and add three lines of github directives to reference the issue.
Fixed serious bug that prevented user from doing x.
Closes: #123
Related-Issue: #123
Projected-Results-Impact: none
Cleanup Before Pull Request
We suggest to first squash your commits and then do a rebase to the main branch.
Squash Multiple Commits (Or Use Amend)
We prefer to have only one commit per feature (most of the time there is only one feature per branch). When your branch is rebased on the main branch, do:
$ git rebase -i main
Alternatively, you can always use git commit --amend to modify your last commit.
This also allows you to change your latest commit message.
Rebase To Main
Make sure your main is up-to-date. In your branch, do:
$ git checkout 123-adding-useful-feature
$ git rebase main
In case of conflicts, resolve them (find <<<< in conflicting files) and do:
$ git add conflicting.file
$ git rebase --continue
If unsure, abort the rebase:
$ git rebase --abort
Push To Origin
$ git push origin 123-adding-useful-feature
In case you squashed and/or rebased and already pushed the branch, you need to force the push:
$ git push -f origin 123-adding-useful-feature
Kiosk
The Kiosk mode in VarFish enables users to upload VCF files. This is not intended for production use as every upload will create its own project, so there is no way of organizing your cases properly. The mode serves only as a way for external users to try out VarFish.
Configuration
First, you need to download the VarFish annotator data (11 GB) and unpack it.
$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-{,transcripts-}20191129.tar.gz{,.sha256}
$ tar xzvf varfish-annotator-20191129.tar.gz
$ tar xzvf varfish-annotator-transcripts-20191129.tar.gz
If you want to enable Kiosk mode, add the following lines to the .env file.
export VARFISH_KIOSK_MODE=1
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFSEQ_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_refseq_curated.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_ENSEMBL_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_ensembl.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFERENCE_PATH=/path/to/unpacked/varfish-annotator-20191129/hs37d5.fa
export VARFISH_KIOSK_VARFISH_ANNOTATOR_DB_PATH=/path/to/unpacked/varfish-annotator-20191129/varfish-annotator-db-20191129.h2.db
export VARFISH_KIOSK_CONDA_PATH=/path/to/miniconda/bin/activate
Run
To run the kiosk mode, simply (re)start the web server and the celery server.
terminal1$ make serve
terminal2$ make celery
Templates (for Issues etc.)
We organize bug reports and feature requests in the Github issue tracker. Please choose the template that best fits what you want to report and answer its questions to help us decide how to approach the task.
Bug Reports
The template for bug reports has the following form (an up-to-date form is located in the Github issue tracker):
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.
Root Cause Analysis
In the following, a root cause analysis (RCA) needs to be done. The ticket will get an answer with the title Root Cause Analysis and a thorough description of what might cause the bug.
Resolution Proposal
When the root cause is determined, a solution needs to be proposed, following this form:
**Resolution Proposal**
e.g. The component X needs to be changed to Y so Z is not executed when M occurs.
**Affected Components**
e.g. VarFish server
**Affected Modules/Files**
e.g. variants module or queries.py
**Required Architectural Changes**
e.g. Function F needs to be moved to X.
**Required Database Changes**
i.e. name any model that needs changing, to be added and will lead to a migration
**Backport Possible?**
e.g., "Yes" if this is a bug fix or small change and should be backported to the current stable version
**Resolution Sketch**
e.g. Change X in F. Then do Y.
Commits
Almost all commits should refer to a ticket in trailing parenthesis, e.g.
Resolve some issue (#NUMBER)
Trailing lines are required for each commit.
You must specify either Related-Issue or No-Related-Issue.
Examples:
Related-Issue: #123
No-Related-Issue: Short text reason
Further, each commit should state whether it is expected to change filtration results, using Projected-Results-Impact.
Allowed values are none or require-revalidation.
Projected-Results-Impact: none
Projected-Results-Impact: require-revalidation
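The commit message rules above (72 characters per line, an issue trailer, and a results-impact trailer) could be checked automatically, for example in a local git hook. The Python sketch below shows one possible check. The function `check_commit_message` is hypothetical and not part of the VarFish tooling.

```python
ALLOWED_IMPACT = {"none", "require-revalidation"}


def check_commit_message(message, max_len=72):
    """Return a list of rule violations found in a commit message."""
    problems = []
    lines = message.splitlines()
    for number, line in enumerate(lines, start=1):
        if len(line) > max_len:
            problems.append(f"line {number} is longer than {max_len} characters")

    # Collect 'Key: value' trailer lines.
    trailers = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            trailers[key.strip()] = value.strip()

    if "Related-Issue" not in trailers and "No-Related-Issue" not in trailers:
        problems.append("missing Related-Issue or No-Related-Issue")
    if trailers.get("Projected-Results-Impact") not in ALLOWED_IMPACT:
        problems.append("Projected-Results-Impact must be none or require-revalidation")
    return problems


if __name__ == "__main__":
    message = (
        "Resolve some issue (#123)\n"
        "\n"
        "Related-Issue: #123\n"
        "Projected-Results-Impact: none\n"
    )
    print(check_commit_message(message))
```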
Fix & Pull Request
Create new branch (name starts with issue number), e.g. 123-fix-for-issue
Create pull request in “Draft” state
Fix problem, ideally in a test-driven way, remove “Draft” state
Review & Merge
Perform code review
Ensure fix is documented in changelog (link to bug and PR #ids)
Feature Requests
A feature request follows the same workflow as a bug report (an up-to-date form is located in the Github issue tracker):
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.
Design
In the following, the design of the feature needs to be specified:
**Implementation Proposal**
e.g. The component X needs to be changed to Y so Z is not executed when M occurs.
**Affected Components**
e.g. VarFish server
**Affected Modules/Files**
e.g. variants module or queries.py
**Required Architectural Changes**
e.g. Function F needs to be moved to X.
**Implementation Sketch**
e.g. Change X in F. Then do Y.
Implement & Test
Create feature branch, named starting with issue ID
Perform implementation, ideally in a test-driven way
Tests and documentation must be augmented/updated as well
Review & Merge
Perform code review
Ensure change is documented in changelog (link to feature issue and PR #ids)
Checklists
Releases
Prerequisites:
Have all issues done for the next milestone.
Tasks:
Create ticket with the following template and assign it to the proper milestone.
Release for version vVERSION
- [ ] edit `HISTORY.rst` and ensure a proper section is added
- [ ] edit `admin_upgrade.rst` to reflect the upgrade instructions
- [ ] create a git tag `v.MAJOR.MINOR.PATCH` and `git push --tags`
- [ ] create a "Github release" based on the tag with the text ``` All details can be found in the `HISTORY.rst` file. ```
Follow through the items.
Data & Software Validation
Prerequisites:
Have all background data imported into dedicated instances for validation. (Internally we use varfish-build-release-{37,38}.cubi.bihealth.org).
Create the varfish-site-data-X.tar.gz tarball with the database dump.
Have a token ready for the root user.
Tasks:
Create a ticket with the following template.
Validate data for:
- **VarFish:** vMAJOR.MINOR.PATCH
- **Site Data:** vVERSION (`sha256:CHECKSUM`)
- **Genome Build:** GRCh37 or GRCh38
Result Reports:
PASTE HERE
Use the varfish-wf-validation Snakemake workflow for running the validation.
Paste the result reports into the tickets.
Docker & Data Builds
This section describes how to build the Docker images and the VarFish site data tarballs. The intended audience is VarFish developers.
Build Docker Images
Building the image:
$ ./docker/build-docker.sh
By default, the latest tag is used. You can change this with:
$ GIT_TAG=v0.1.0 ./docker/build-docker.sh
Get varfish-docker-compose
The database is built in varfish-docker-compose.
$ git clone git@github.com:bihealth/varfish-docker-compose.git
$ cd varfish-docker-compose
$ ./init.sh
First-Time Container Startup
You have to start up the postgres container once to create the Postgres database. Once it has been initialized, shut it down with Ctrl-C.
$ docker-compose up postgres
<Ctrl-C>
Now copy over the postgresql.conf file that has been tuned for the VarFish use cases.
$ cp config/postgres/postgresql.conf volumes/postgres/data/postgresql.conf
Bring up the site again so we can build the database.
$ docker-compose up
Wait until varfish-web is up and running and all migrations have been applied; look for VARFISH MIGRATIONS END in the output of run-docker-compose-up.sh.
Pre-Build Postgres Database
Download static data
$ cd /plenty/space
$ wget https://file-public.bihealth.org/transient/varfish/anthenea/varfish-server-background-db-20201006.tar.gz{,.sha256}
$ sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
$ tar xzvf varfish-server-background-db-20201006.tar.gz
Adjust the docker-compose.yml file such that /plenty/space is visible in the varfish-web container.
volumes:
- "/plenty/space:/data"
Get the name of the running varfish-web container.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
44be6ece102e minio/minio "/usr/bin/docker-ent…" 11 minutes ago Up About a minute 9000/tcp varfish-docker-compose_minio_1
3b23113e5aa1 quay.io/biocontainers/exomiser-rest-prioritiser:12.1.0--1 "exomiser-rest-prior…" 11 minutes ago Up About a minute varfish-docker-compose_exomiser-rest-prioritiser_1
b8c49e8c24a6 quay.io/biocontainers/jannovar-cli:0.33--0 "jannovar -Xmx6G -Xm…" 11 minutes ago Up About a minute varfish-docker-compose_jannovar_1
409a535b9951 bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 12 minutes ago Up About a minute 8080/tcp varfish-docker-compose_varfish-celerybeat_1
7eb7425c59e2 bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 12 minutes ago Up About a minute 8080/tcp varfish-docker-compose_varfish-celeryd-import_1
020811fde306 bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 12 minutes ago Up About a minute 8080/tcp varfish-docker-compose_varfish-celeryd-query_1
87b03ee0249b bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 12 minutes ago Up About a minute 8080/tcp varfish-docker-compose_varfish-celeryd-default_1
7a3fdb337fae bihealth/varfish-server:0.22.1-0 "docker-entrypoint.s…" 12 minutes ago Up About a minute 8080/tcp varfish-docker-compose_varfish-web_1
9295a101570f postgres:12 "docker-entrypoint.s…" 12 minutes ago Up About a minute 5432/tcp varfish-docker-compose_postgres_1
1c4d6e235074 traefik:v2.3.1 "/entrypoint.sh --pr…" 12 minutes ago Up About a minute 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp varfish-docker-compose_traefik_1
8d72fd096743 redis:6 "docker-entrypoint.s…" 12 minutes ago Up About a minute 6379/tcp varfish-docker-compose_redis_1
Initialize the tables (while at least docker-compose up varfish-web postgres redis is running).
$ docker exec -it -w /usr/src/app varfish-docker-compose_varfish-web_1 python manage.py import_tables --tables-path /data --threads 8
Then, shut down the docker-compose up, remove the volumes: entry for varfish-web, and create a tarball of the postgres database to have a clean copy.
Add Other Data
Copy the other required data for jannovar and exomiser.
You can find the appropriate files to download on the Jannovar (via Zenodo) and Exomiser data download sites.
You should use the hg19 data for Exomiser for any genome release, as we will only use the gene-to-phenotype prioritization, which is independent of the genome release.
The result should look similar to this:
# tree volumes/jannovar volumes/exomiser
volumes/jannovar
├── hg19_ensembl.ser
├── hg19_refseq_curated.ser
└── hg19_refseq.ser
volumes/exomiser
├── 1909_hg19
│ ├── 1909_hg19_clinvar_whitelist.tsv.gz
. . [..]
│ └── 1909_hg19_variants.mv.db
└── 1909_phenotype
├── 1909_phenotype.h2.db
├── phenix
│ ├── 10.out
. . [..]
│ ├── ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt
│ ├── hp.obo
│ └── phenotype_annotation.tab
└── rw_string_10.mv
3 directories, 55 files
Create a Superuser
While the docker-compose up is running:
$ docker exec -it -w /usr/src/app varfish-docker-compose_varfish-web_1 python manage.py createsuperuser
Username: root
Email address:
Password: <changeme>
Password (again): <changeme>
Superuser created successfully.
Setup Initial Data
Create test category & project.
Obtain API key and configure varfish-cli.
Import some test data through the API.
$ varfish-cli --no-verify-ssl case create-import-info --resubmit \
92f5d735-0967-4db2-a801-50fe96359f51 \
$(find path/to/variant_export/work/*NA12878* -name '*.tsv.gz' -or -name '*.ped')
Create Data Tarballs
Now create the released data tarballs.
tar -cf - volumes | pigz -c > varfish-site-data-v1-20210728-grch37.tar.gz && sha256sum varfish-site-data-v1-20210728-grch37.tar.gz >varfish-site-data-v1-20210728-grch37.tar.gz.sha256 &
tar -cf - volumes | pigz -c > varfish-site-data-v1-20210728-grch38.tar.gz && sha256sum varfish-site-data-v1-20210728-grch38.tar.gz >varfish-site-data-v1-20210728-grch38.tar.gz.sha256 &
tar -cf - test-data | pigz -c > varfish-test-data-v1-20211125.tar.gz && sha256sum varfish-test-data-v1-20211125.tar.gz >varfish-test-data-v1-20211125.tar.gz.sha256
ClinVar Notes
This section contains notes regarding ClinVar and its integration into VarFish. It outlines issues with the interpretation of variants as well as their resolution in VarFish and the rationale for the taken decisions.
ClinVar entries have two major labels:
- variant assertion
The assertion about the pathogenicity of a variant, e.g., likely benign or pathogenic.
- review status
A grading of how well a variant is reviewed. This is shown as a star rating on the ClinVar website.
Some reference ClinVar records (RCV identifiers) refer to one submission (SCV identifiers). Multiple reference ClinVar records are summarised in variant ClinVar records (VCV identifiers).
Review Status Interpretation
The interpretation of the status of a ClinVar record can be challenging. This is caused by two points.
Overall, the following review status terms occur in ClinVar (as of June 4, 2020). Note that some only make sense together with the others (e.g., “no conflicts” only makes sense if there is more than one submission).
Count | ClinVar Status
---|---
12,342 | conflicting interpretations
839,966 | criteria provided
55,467 | multiple submitters
71,858 | no assertion criteria provided
17,068 | no assertion provided
55,467 | no conflicts
5,751 | practice guideline
11,172 | reviewed by expert panel
772,157 | single submitter
In ClinVar, the star ratings are assigned as follows:
Stars | Description
---|---
none | no assertion criteria provided OR single submitter, no assertion provided
one | single submitter, criteria provided OR criteria provided & multiple submitters, conflicting interpretations
two | criteria provided, multiple submitters, no conflicts
three | reviewed by expert panel
four | practice guideline
In particular, the missing distinction between “no assertion criteria provided” and “no assertion provided” is misleading. It can also be misleading that records with assertion criteria override those without. In several records, good literature has been curated without assertion criteria, while many records from clinical testing companies have assertion criteria but no phenotype and were prepared with less diligence than good research.
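The star table above can be read as a small decision function. The following Python sketch is only an illustration of that table: the function name `clinvar_stars` and the representation of a review status as a set of terms are assumptions made for this example and are not part of ClinVar or VarFish.

```python
def clinvar_stars(terms):
    """Return the ClinVar star count for a set of review-status terms.

    `terms` is a set of strings from the status table above, e.g.
    {"criteria provided", "single submitter"} (hypothetical encoding).
    """
    terms = set(terms)
    if "practice guideline" in terms:
        return 4
    if "reviewed by expert panel" in terms:
        return 3
    # No assertion criteria (or no assertion at all) means no stars.
    if "criteria provided" not in terms:
        return 0
    if "multiple submitters" in terms and "no conflicts" in terms:
        return 2
    # Single submitter with criteria, or multiple submitters with conflicts.
    return 1
```

Note that both “no assertion criteria provided” and “no assertion provided” end up in the same zero-star bucket, which is exactly the missing distinction criticised above.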
Merging of ClinVar Records
The algorithm that ClinVar uses to merge multiple records for display as VCV records is not public. Also, given the issues with ClinVar’s star rating described above, VarFish uses a display that differs from ClinVar’s. Instead of ClinVar’s gold stars, VarFish assigns points.
Points | Condition
---|---
none | origin is somatic OR no assertion provided
one | single submitter OR multiple submitters, conflicting interpretations
two | multiple submitters, no conflicting interpretation
three | reviewed by expert panel OR practice guideline
Importantly, VarFish will still display all ClinVar records in the variant display and link out to ClinVar so the user can make their own assessment. The role of ClinVar in VarFish is to assist the user in quickly finding variants present in ClinVar and not to override the user in any way.
The rationale:
ClinVar entries for somatic variants and those without a variant assessment are of little interest.
Multiple submitters are better than one submitter, regardless of the assertion criteria. Requiring assertion criteria or expert panel status is good for ClinVar to foster submission of assertion criteria or applications for expert panels but less important for VarFish users.
Variants for practice guideline are less important for VarFish’s use case. Thus, collapsing them with “reviewed by expert panel” should not pose a problem.
VarFish merges ClinVar records based on the following algorithm.
0. Generally, benign and likely benign are merged to likely benign/benign; the same applies to pathogenic and likely pathogenic. Records with uncertain significance are ignored in merging if there is at least one (likely) benign/pathogenic assessment.
1. Records flagged with practice guideline or expert panel will be assigned three points and override any other assessment. Within three-point variants, practice guideline beats expert panel.
2. In the case that there is only one record, that record’s assessment is used. Note that this will include RCV records in ClinVar that are already merged. Assign one point.
3. In the case of two or more records:
   - Ignore uncertain significance records as outlined in (0).
   - If there are conflicting interpretations, mark the record as such.
   - Otherwise, merge likely and non-likely assertions and add no conflicting interpretation if there is more than one non-uncertain significance record.
   - Assign one point in case of conflicts and two points in case of consistency.
Further, each variant is annotated with an ACMG-style rating. In the case of a “likely X/X” assertion, ACMG:1.5 or ACMG:4.5 is assigned. In the case of conflicting assertions, an ACMG score of 3 is assigned but the variant is flagged with a “C” to indicate conflicting interpretations. Note that neither uncertain vs. benign nor uncertain vs. pathogenic creates a conflict.
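The merging rules can be sketched in a few lines of Python. This is a minimal illustration written from the description above and not VarFish’s actual implementation: the names `merge_records` and `merged_label`, the `(review_status, significance)` tuple encoding, and the choice to report conflicts as “uncertain significance” are assumptions made for this example. ACMG ratings and the zero-point cases (somatic origin, no assertion provided) are deliberately left out.

```python
BENIGN = {"benign", "likely benign"}
PATHOGENIC = {"pathogenic", "likely pathogenic"}


def merged_label(significances, conflict):
    """Build the merged significance label (hypothetical helper)."""
    if conflict:
        return "uncertain significance"
    if significances == BENIGN:
        return "likely benign/benign"
    if significances == PATHOGENIC:
        return "likely pathogenic/pathogenic"
    return "/".join(sorted(significances))


def merge_records(records):
    """Merge (review_status, significance) pairs into (points, label, conflict)."""
    if not records:
        raise ValueError("at least one record is required")
    # Rule 1: practice guideline beats expert panel; both override the rest.
    points = None
    for status in ("practice guideline", "reviewed by expert panel"):
        top = [r for r in records if r[0] == status]
        if top:
            records, points = top, 3
            break
    # Rule 0: ignore uncertain significance if another assessment exists.
    decided = [r for r in records if r[1] in BENIGN | PATHOGENIC]
    if decided:
        records = decided
    significances = {sig for _, sig in records}
    conflict = bool(significances & BENIGN) and bool(significances & PATHOGENIC)
    if points is None:
        # Rules 2 and 3: a single record or a conflict gives one point,
        # several consistent records give two points.
        points = 1 if len(records) == 1 or conflict else 2
    return points, merged_label(significances, conflict), conflict
```

For instance, `merge_records([("single submitter", "pathogenic"), ("single submitter", "likely benign")])` yields one point and a conflict flag, because a benign-side and a pathogenic-side assertion remain after uncertain records are dropped.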
Examples
- INPUT
practice guideline, likely pathogenic
reviewed by expert panel, likely pathogenic
single submitter, pathogenic
- OUTPUT
reviewed by expert panel, likely pathogenic
three points; ACMG:4-LP
- INPUT
single submitter, pathogenic
multiple submitters, no conflict, likely pathogenic
- OUTPUT
multiple submitters, no conflict, likely pathogenic/pathogenic
two points; ACMG:4.5-LP-P
- INPUT
single submitter, pathogenic
single submitter, uncertain significance
single submitter, likely pathogenic
- OUTPUT
multiple submitters, no conflict, likely pathogenic/pathogenic
two points; ACMG:4.5-LP-P
- INPUT
single submitter, pathogenic
multiple submitters, uncertain significance
- OUTPUT
single submitter, likely pathogenic
one point; ACMG:4-LP
- INPUT
single submitter, pathogenic
multiple single submitters, likely benign
- OUTPUT
multiple submitters, conflicting interpretations, uncertain significance
one point; ACMG:3
Contributors
In alphabetical order:
Dieter Beule
Felix Boschann
Nadja Ehmke
Manuel Holtgrewe
Oliver Stolpe
Release Cycle
This section documents the versioning and branching model of VarFish. Generally, we follow the idea of release cycles as also employed by Ceph.
There is a new stable release every year, targeting the month of April. Each stable release receives a name (e.g., “Anthenea”) and a major release number (e.g., 1, as “A” is the first letter of the alphabet).
Releases are named after starfish species.
Version numbers have three components, x.y.z.
x identifies the release cycle (e.g., 1 for Anthenea).
y identifies the release type:
- x.0.z - development versions (the bleeding edge)
- x.1.z - release candidates (for test users)
- x.2.z - stable/bugfix releases (for the general public)
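As a small illustration of this numbering scheme, the following Python sketch classifies a version string. The function name `release_type` and the returned labels are assumptions made for this example; the sketch only covers versions from release cycle 1 onwards, since earlier versions (e.g., v0.23.9) predate the scheme.

```python
RELEASE_TYPES = {0: "development", 1: "release candidate", 2: "stable"}


def release_type(version):
    """Return (release cycle, release type) for a string such as 'v1.2.3'."""
    cycle, kind, _patch = (int(part) for part in version.lstrip("v").split("."))
    if cycle < 1 or kind not in RELEASE_TYPES:
        raise ValueError(f"not an x.y.z version of this scheme: {version!r}")
    return cycle, RELEASE_TYPES[kind]
```

With this, `release_type("v1.2.3")` returns `(1, "stable")` and `release_type("1.1.0")` returns `(1, "release candidate")`.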
Stable Releases (x.2.z)
There will be a new stable release per year (“x”) with a small number of bug fix and “trivial feature” releases (“z”). Stable releases will be supported for 14-16 months, so users have some time to upgrade.
Release Candidates (x.1.z)
We will start feature freezes roughly a month before the next stable releases. The release candidates are suitable for testing the
Development Versions (x.0.z)
These releases are suitable for sites that are involved in the development of VarFish themselves or that want to track the “bleeding edge” very closely. The main developing sites (currently Berlin, Bonn) deploy self-built Docker containers from the current development branch.
Release Names
Year | Version | Release Name | Species
---|---|---|---
2022 | 1.y.z | Anthenea | Anthenea aspera
2023 | 2.y.z | Bollonaster | Bollonaster pectinatus
2024 | 3.y.z | Culcita | Culcita coriacea
2025 | 4.y.z | Doraster | Doraster constellatus
2026 | 5.y.z | Euretaster | Euretaster cribrosus
Releases History
Starting with the 1.0.0 release.
History / Changelog
v1.2.3 (anthenea)
End-User Summary
Create single result row even for multiple clinvar entries (#565).
Adding warning in case of truncated display (#641).
Adding coordinate indices on HelixMtDb and Mitomap (#635).
Fixing clinvar pathogenic filter (#296).
Improving Clinvar record aggregation (#640).
Fixed wrong colored WIP result rows (#673).
Fixing ClinVar submission XML generation (#677).
Regular refresh ClinVar individual from Case (#158).
Fixing hemizygous count display in fold-outs (#646).
Fixing clinvar submission sex/gender update (#686).
Fixing issue with phenotype name in Clinvar (#689).
Changing ClinVar link-out to VCV entry instead of coordinates (#693)
Bugfix that allow clinvar export submission set deletion (#713).
Adding genepanels app for defining gene panels (#723).
Allow excluding cases from in-house database (#579).
Allow to upload per-case gene annotation (#575).
Adding varannos app (#747).
Adding ACMG v3.0 + v3.1 incidental findings to gene allowlist preset (#829).
Adding locus link-out to genoox Franklin (#748).
Full Change List
Create single result row even for multiple clinvar entries (#565).
Adding warning in case of truncated display (#641).
Adding coordinate indices on HelixMtDb and Mitomap (#635).
Fixing Docker builds (#660)
Fixing clinvar pathogenic filter (#296).
Improving Clinvar record aggregation (#640).
Fixed wrong colored WIP result rows (#673).
Fixing ClinVar submission XML generation (#677).
Regular refresh ClinVar individual from Case (#158).
Fixing hemizygous count display in fold-outs (#646).
Fixing clinvar submission sex/gender update (#686).
Fixing issue with phenotype name in Clinvar (#689).
Changing ClinVar link-out to VCV entry instead of coordinates (#693)
Adding unit tests for clinvar export Vue app (#692)
Move varfish export Vue app (#711)
Bugfix that allow clinvar export submission set deletion (#713).
Removing dependency on bootstrap-vue (#716)
Migrating clinvar export to Pinia/Vue3 (#720).
Adding genepanels app for defining gene panels (#723).
Allow excluding cases from in-house database (#579).
Allow to upload per-case gene annotation (#575).
Add missing directory in Dockerfile.
Adding varannos app (#747).
Adding ACMG v3.0 + v3.1 incidental findings to gene allowlist preset (#829).
Adding locus link-out to genoox Franklin (#748).
v1.2.2 (anthenea)
End-User Summary
Add Transcripts gnomAD constraints and clinvar reports in the export (#568).
Extra annotations in export completed and tested (#495).
Fixed bug where Exac and thousand genomes settings were not shown in frequency tab for GRCh37 (#597).
Form template reports error if genomebuild variable is not set (#607).
Added locus link-out for genoox Franklin (#748).
Full Change List
Extra annotations in export completed and tested (#495).
Fixing issue with sync-from-remote when no remote is defined (#570).
Fixed bug where Exac and thousand genomes settings were not shown in frequency tab for GRCh37 (#597).
Form template reports error if genomebuild variable is not set (#607).
Added locus link-out for genoox Franklin (#748).
v1.2.1 (anthenea)
End-User Summary
Starting with branch of stable version Anthenea (VarFish v1).
Documenting problem with extra annotations in 20210728 data release (#450). Includes instructions on how to apply patch to get 20210728b.
Removing problematic username modification behaviour on login page (#459).
Displaying login page text from settings again (#458).
Suppress “submit to CADD” and “submit to SPANR” buttons for multi-case form (#478). This has not been implemented so far.
Fixing paths in “Variant Ingest” documentation (#472).
Small extension of “Resolution proposal” template (#472).
Adjusting wrong release name to “anthenea” (#479).
Adding “show all variant carriers” feature (#470).
Properly display the clinvar annotations that we have in the database (#464).
Adjusting default frequency filters for “clinvar pathogenic” filter: remove all thresholds (#464).
Adding note about difference with upstream Clinvar (#464).
Switching scoring to MutationTaster 85 interface, added back MT 85 link-out alongside MT 2021 link-out (#509).
Made flag filter and flag form nomenclature consistent (#297).
Fixed broken VariantValidator query (#523).
Fixed smallvariant flags filter query (#502).
Added flags segregates, doesnt_segregate and no_disease_association to file export (#502).
Adding feature to enable and configure link-out to HGMD (#576).
Full Change List
Starting with branch of stable version Anthenea (VarFish v1).
Documenting problem with extra annotations in 20210728 data release (#450). Includes instructions on how to apply patch to get 20210728b.
Removing problematic username modification behaviour on login page (#459).
Displaying login page text from settings again (#458).
Suppress “submit to CADD” and “submit to SPANR” buttons for multi-case form (#478). This has not been implemented so far.
Fixing paths in “Variant Ingest” documentation (#472).
Small extension of “Resolution proposal” template (#472).
Adjusting wrong release name to “anthenea” (#479).
Adding “show all variant carriers” feature (#470).
Properly display the clinvar annotations that we have in the database (#464).
Adjusting default frequency filters for “clinvar pathogenic” filter: remove all thresholds (#464).
Adding note about difference with upstream Clinvar (#464).
Switching scoring to MutationTaster 85 interface, added back MT 85 link-out alongside MT 2021 link-out (#509).
Made flag filter and flag form nomenclature consistent (#297).
Fixed broken VariantValidator query (#523).
Fixed smallvariant flags filter query (#502).
Added flags segregates, doesnt_segregate and no_disease_association to file export (#502).
Converted not cooperative tooltip to standard title on Filter & Display button (#508).
Adding feature to enable and configure link-out to HGMD (#576).
v1.2.0
This is the first stable VarFish Server release. It is the same as v1.1.4.
v1.1.4
End-User Summary
Full Change List
Installing same postgres version as in docker-compose server (12).
v1.1.3
End-User Summary
Fixing problem with import info display for non-superusers (#431)
Schema and documentation for case QC info (#428)
Adding support for HGNC IDs in gene allow lists (#432)
PanelApp will now populate the gene allow list with HGNC gene IDs (#432)
Full Change List
Fixing problem with import info display for non-superusers (#431)
Schema and documentation for case QC info (#428)
Adding support for HGNC IDs in gene allow lists (#432)
PanelApp will now populate the gene allow list with HGNC gene IDs (#432)
Adding pg_dump admin command and documentation (#430)
v1.1.2
End-User Summary
Fixing bug in XLSX export (#417)
Fixing problem with multi-sample queries (#419)
Fixing issue with cohort queries (#420)
Fixing issue with mutationtaster queries (#423)
Fixing problem with multi-variant update (#419)
Full Change List
Fixing bug in corner case of multi variant annotation (#412)
Updating documentation for v1 release (#410)
Fixing issue with fa-solid:refresh icon (#409)
Fixing page titles (#409)
Fixing bug in XLSX export (#417)
Fixing problem with multi-sample queries (#419). This is done by rolling back adding the _ClosingWrapper class. We will need a different approach for the queries than was previously attempted here.
Fixing issue with cohort queries (#420)
Fixing issue with mutationtaster queries (#423)
Fixing problem with multi-variant update (#419)
v1.1.1
This is the first release candidate of the VarFish “Anthenea” release (v1). Importantly, the first stable release for v1 will be v1.2.0 (see Release Cycle Documentation for a full explanation of version semantics).
This release adds some more indices so the migrations might take some more time.
End-User Summary
Fixing problem with CNV import (#386)
Fixing problem with user annotation of nonexistent variants (#404)
Full Change List
Adding REST API for generating query shortcuts (#367)
Filter queries in REST API to selected case and not all by user
Fixing problem with CNV import (#386)
Adding index to improve beaconsite performance (#389)
Adding missing mdi iconset (#284)
Strip trailing slashes in beaconsite entrypoints (#388)
Documenting PAP setup (#393)
Adding more indices (#395)
Fixing discrepancy with REST API query shortcuts (#402)
v1.1.0
This is the first release candidate of the VarFish “Anthenea” release (v1). Importantly, the first stable release for v1 will be v1.2.0 (see Release Cycle Documentation for a full explanation of version semantics).
Breaking changes, see below.
End-User Summary
Fixing Kiosk mode of VarFish.
Fixing displaying of beacon information in results table.
Fixing broken flags & comments popup for structural variants.
Fixing broken search field.
Extended manual for bug report workflow.
Fixed recompute of variant stats of large small variant sets.
Added index for SmallVariant model filtering for case_id and set_id. This may take a while!
Allowing project owners and delegates to import cases via API (#207).
Fix for broken link-out into MutationTaster (#240).
Fixing SODAR Core template inconsistency (#150).
Imports via API now are only allowed for projects of type PROJECT (#237).
Fixing ensembl gene link-out to wrong genome build (#156).
Added section for developers in manual (#267).
Updating Clinvar export schema to 1.7 version (#226).
Migrated icons to iconify (#208).
Bumped chrome-driver version (#208).
VarFish now allows for the import of GRCh38 annotated variants. For this, GRCh38 background data must be imported. Kiosk mode does not support GRCh38 yet. This is a breaking change, new data and CLI must be used!
Added feature to select multiple rows in results to create same annotation (#259)
Added parameter to Docker entrypoint file to accept number of gunicorn workers
Extended documentation for how to update specific tables (#177)
Improving performance of project overview (#303)
Improving performance of case listing (#304)
Adding shortcut buttons to phenotype annotation (#289)
Fixing issue with multiple added variants (#283)
Implementing several usability improvements for clinvar submission editor (#286)
Make clinvar UI work with many annotations (#302)
Fixing CADD annotation (#319)
Adding mitochondrial inheritance to case phenotype annotation (#325)
Fix issue with variant annotation export (#328)
Allowing direct update of variant annotations and ACMG ratings on case annotations details (#344)
Fixing problem with ACMG classification where VUS-3 was given but should be LB-2 (#359)
Adding REST API for creating small variant queries (#332)
Fixing beaconsite queries with dots in the key id (#369)
Allowing joint queries of larger cohorts (#241)
Documenting Clinical Beacon v1 protocol
Improving performance for fetching result queries (#371)
Capping max. number of cases to query at once (#372)
Documenting release cycle and branch names
Add extra annotations, i.e. additional variant scores to the filtered variants (#242)
Fixing bug in project/cohort filter (#379)
Full Change List
- Resolving problem with varfish-kiosk.
Auto-creating user kiosk_user when running in Kiosk mode.
Using custom middleware for kiosk user (#215).
Kiosk annotation now uses set -x flag if settings.DEBUG is true.
Mapping kiosk jobs to import queue.
Fixing displaying of beacon information in results table.
Fixing broken flags & comments popup for structural variants.
Fixing broken search field.
Extended manual for bug report workflow.
Fixed recompute of variant stats of large small variant sets.
Added index for SmallVariant model filtering for case_id and set_id. This may take a while!
Allowing project owners and delegates to import cases via API (#207).
Fix for broken link-out into MutationTaster (#240).
Fixing SODAR Core template inconsistency (#150).
Imports via API now are only allowed for projects of type PROJECT (#237).
Fixing ensembl gene link-out to wrong genome build (#156).
Added section for developers in manual (#267).
Updating Clinvar export schema to the latest 1.7 version (#226).
Migrated icons to iconify (#208).
Bumped chrome-driver version (#208).
Skipping codacy if token is not defined (#275).
Adjusting models and UI for supporting GRCh38 annotated cases. It is currently not possible to migrate a GRCh37 case to GRCh38.
Setting VARFISH_CADD_SUBMISSION_RELEASE is called VARFISH_CADD_SUBMISSION_VERSION now (breaking change).
import_info.tsv expected as in data release from 20210728 as built from varfish-db-downloader 1b03e97 or later.
Extending columns of Hgnc to upstream update.
Added feature to select multiple rows in results to create same annotation (#259)
Added parameter to Docker entrypoint file to accept number of gunicorn workers
Extended documentation for how to update specific tables (#177)
Improving performance of project overview (#303)
Improving performance of case listing (#304)
Adding shortcut buttons to phenotype annotation (#289)
Fixing issue with multiple added variants (#283)
Make clinvar UI work with many annotations by making it load them lazily for one case at a time (#302)
Implementing several usability improvements for clinvar submission editor (#286)
Adding CI builds for Python 3.10 in Github actions, bumping numpy/pandas dependencies. Dropping support for Python 3.7.
Fixing CADD annotation (#319)
Adding mitochondrial inheritance to case phenotype annotation (#325)
Fix issue with variant annotation export (#328)
Adding REST API versioning (#333)
Adding more postgres versions to CI (#337)
Make migrations compatible with Postgres 14 (#338)
DgvSvs and DgvGoldStandardSvs are two different data sources now
Adding deep linking into case details tab (#344)
Allowing direct update of variant annotations and ACMG ratings on case annotations details (#344)
Removing display_hgmd_public_membership (#363)
Fixing problem with ACMG classification where VUS-3 was given but should be LB-2 (#359)
Adding REST API for creating small variant queries (#332)
Upgrading sodar-core dependency to 0.10.10
Fixing beaconsite queries with dots in the key id (#369)
Allowing joint queries of larger cohorts (#241). This is achieved by performing fewer UNION queries (at most VARFISH_QUERY_MAX_UNION=20 at one time).
Documenting Clinical Beacon v1 protocol
Improving performance for fetching result queries (#371)
Fix to support sodar-core v0.10.10
Capping max. number of cases to query at once (#372)
Documenting release cycle and branch names
Checking commit message trailers (#323)
Add extra annotations to the filtered variants (#242)
Fixing bug in project/cohort filter (#379)
v0.23.9
End-User Summary
Bugfix release.
Full Change List
Fixing bugs that prevented properly running in production environment.
v0.23.8
End-User Summary
Added SAML Login possibility from sodar-core to varfish
Upgraded some icons and look and feel (via sodar-core).
Full Change List
Fixing bug that occurred when variants were annotated earlier by the user with the variant disappearing later on. This could be caused if the case is updated from singleton to trio later on.
Added sso urls to config/urls.py
Added SAML configuration to config/settings/base.py
Added necessary tools to the Dockerfile
Fix for missing PROJECTROLES_DISABLE_CATEGORIES variable in settings.
Upgrading sodar-core dependency. This implies that we now require Python 3.7 or later.
Upgrading various other packages including Django itself.
Docker images are now published via ghcr.io.
v0.23.7
IMPORTANT
This release contains a critical update.
Prior to this release, all small and structural variant tables were marked as UNLOGGED.
This was originally introduced to improve insert performance.
However, it turned out that stability is greatly decreased.
In the case of a PostgreSQL crash, these tables are emptied.
This change should have been rolled back much earlier but that rollback was buggy.
This release now includes a working and verified fix.
End-User Summary
Fixing stability issue with database schema.
Full Change List
Bump sodar-core to hotfix version. Fixes problem with remote permission synchronization.
Adding migration to mark all UNLOGGED tables back to LOGGED. This should have been reverted earlier but, because of a bug, it was not.
Fixing CI by calling sudo apt-get update once more.
v0.23.6
End-User Summary
Fixing problem with remote permission synchronization.
Full Change List
Bump sodar-core to hotfix version. Fixes problem with remote permission synchronization.
v0.23.5
End-User Summary
Adding back missing manual.
Fixing undefined variable bug.
Fixing result rows not colored anymore.
Fixing double CSS import.
Full Change List
Fixing problem with PROJECTROLES_ADMIN_OWNER being set to admin by default but the system user being root in the prebuilt databases. The value now defaults to root.
Adding back missing manual in Docker image.
Fixing problem with “stopwords” corpus of nltk not being present. This is now downloaded when building the Docker image.
Fixing undefined variable bug.
Fixing result rows not colored anymore.
Fixing double CSS import.
v0.23.4
End-User Summary
Fixing issue of database query in Clinvar Export feature where too large queries were created.
Fixing search feature.
Full Change List
Docker image now includes commits to the next tag so the versioneer version display makes sense.
Dockerfile entrypoint script uses timeout of 600s now for gunicorn workers.
Fixing issue of database query in Clinvar Export feature where too large queries were created and postgres ran out of stack memory.
Adding more Sentry integrations (redis, celery, sqlalchemy).
Fixing search feature.
v0.23.3
End-User Summary
Bug fix release.
Full Change List
Bug fix release where the clinvar submission Vue.js app was not built.
Fixing env file example for SENTRY_DSN.
v0.23.2
End-User Summary
Bug fix release.
Full Change List
Bug fix release where Javascript was missing.
v0.23.1
End-User Summary
Allowing to download all users annotation for whole project in one Excel/TSV file.
Improving variant annotation overview per case/project and allowing download.
Adding “not hom. alt.” filter setting.
Allowing users to easily copy case UUID by icon in case heading.
Fixing bug that made the user icon top right disappear.
Full Change List
Allowing to download all users annotation for whole project in one Excel/TSV file.
Using SQL Alchemy query infrastructure for per-case/project annotation feature.
Removing vendored JS/CSS, using CDN for development and download on Docker build instead.
Adding “not hom. alt.” filter setting.
Improving admin configuration documentation.
Extending admin tuning documentation.
Allowing users to easily copy case UUID by icon in case heading.
Fixing bug that made the user icon top right disappear when beaconsite was disabled.
Upgrade to sodar-core v0.9.1
v0.23.0
End-User Summary
Fixed occasionally breaking tests ProjectExportTest by sorting member list. This bug didn’t affect the correct output but wasn’t consistent in the order of samples.
Fixed above mentioned bug again by consolidating two distinct Meta classes in Case model.
Fixed bug in SV tests that became visible by above fix and created an additional variant that wasn’t intended.
Adapted core installation instructions in manual for latest data release and introduced use of VarFish API for import.
Allowing (VarFish admins) to import regulatory maps. Users can use these maps when analyzing SVs.
Adding “padding” field to SV filter form (regulatory tab).
Celerybeat tasks in variants app are now executing again.
Fixed check_installation management command. Index for dbsnp was missing.
Bumped chromedriver version to 87.
Fixed bug where file export was not possible when the number of resulting variants was < 10.
Fixed bug that made it impossible to properly sort by genotype in the results table.
Cases can now be annotated with phenotypes and diseases. To speed up annotation, all phenotypes of all previous queries are listed for copy and paste. SODAR can also be queried for phenotypes.
Properly sanitized output by Exomiser.
Rebuild of variant summary database table happens every Sunday at 2:22am.
Added celery queues maintenance and export.
Adding support for connecting two sites via the GA4GH Beacon protocol.
Adding link-out to “GenCC”.
Adding “submit to SPANR” feature.
Full Change List
Fixed occasionally breaking tests ProjectExportTest by sorting member list. This bug didn’t affect the correct output but wasn’t consistent in the order of samples. The reason for this is unknown but might be that the cases of a project are not always returned in the order in which they were created.
Fixed above mentioned bug again by consolidating two distinct Meta classes in Case model.
Fixed bug in SV tests that became visible by above fix and created an additional variant that wasn’t intended.
Adapted core installation instructions in manual for latest data release and introduced use of VarFish API for import.
Adding regmaps app for regulatory maps.
Allowing users to specify padding for regulatory elements.
Celerybeat tasks in variants app are now executing again. Issue was a wrong decorator.
Fixed check_installation management command. Index for dbsnp was missing.
Bumped chromedriver version to 87.
Fixed bug where file export was not possible when the number of resulting variants was < 10.
Fixed bug that made it impossible to properly sort by genotype in the results table.
Adding tests for upstream synchronization backend code.
Allowing users with the Contributor role to a project to annotate cases with phenotype and disease terms. They can obtain the phenotypes from all queries of all users for a case and also fetch them from SODAR.
Adding files for building Docker images and documenting Docker (Compose) deployment.
Properly sanitized output by Exomiser.
Rebuild of variant summary database table happens every Sunday at 2:22am.
Added celery queues maintenance and export.
Adding support for connecting two sites via the GA4GH Beacon protocol.
Making CADD version behind CADD REST API configurable.
Adding link-out to “GenCC”.
Adding “submit to SPANR” feature.
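The v0.23.0 notes above state that the variant summary table is rebuilt every Sunday at 2:22am. As a rough illustration of what that schedule means, the following standalone sketch computes the next run time from a given moment. The function name next_rebuild is hypothetical and not part of VarFish; the actual scheduling is handled by the periodic task setup mentioned in the notes, and this sketch ignores time zones.

```python
from datetime import datetime, timedelta


def next_rebuild(now: datetime) -> datetime:
    """Return the next Sunday 02:22 strictly after `now` (hypothetical helper)."""
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=2, minute=22, second=0, microsecond=0
    )
    # If we are already at or past this week's slot, move to next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_rebuild(datetime.now()))
```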
v0.22.1
End-User Summary
Bumping chromedriver version.
Fixed extra-annos import.
Full Change List
Bumping chromedriver version.
Fixed extra-annos import.
v0.22.0
End-User Summary
Fixed bug where some variant flags didn’t color the row in filtering results after reloading the page.
Fixed upload bug in VarFish Kiosk when vcf file was too small.
Blocking upload of VCF files with GRCh38/hg38/hg19 builds for VarFish Kiosk.
Support for displaying GATK-gCNV SVs.
Tracking global maintenance jobs with background jobs and displaying them to super user.
Adding “Submit to CADD” feature similar to “Submit to MutationDistiller”.
Increased default frequency setting of HelixMTdb max hom filter to 200 for strict and 400 for relaxed.
It is now possible to delete ACMG ratings by clearing the form and saving it.
Fixed bug when inheritance preset was wrongly selected when switching to variant in an index-only case.
Added hemizygous counts filter option to frequency filter form.
Added synonymous effect to be also selected when checking all coding/deep intronic preset.
Saving uploads pre-checking in kiosk mode to facilitate debugging.
Kiosk mode also accepts VCFs based on hg19.
VariantValidator output now displays three-letter representation of AA.
Documented new clinvar aggregation method and VarFish “point rating”.
Implemented new clinvar data display in variant detail.
Added feature to assemble cohorts from cases spanning multiple projects and filter for them in a project-like query.
Added column to results list indicating if a variant lies in a disease gene, i.e. a gene listed in OMIM.
Displaying warning if prioritization is not enabled when entering HPO terms.
Added possibility to import “extra annotations” for display along with the variants.
On sites deployed by BIH CUBI, we make the CADD, SpliceAI, MMSp, and dbscSNV scores available.
In prioritization mode, ORPHA and DECIPHER terms are now selectable.
Fixed bug of wrong order when sorting by LOEUF score.
Adding some UI documentation.
Fixed bug where case alignment stats were not properly imported.
Fixed bug where unfolding smallvariant details of a variant in a cohort that was not part of the base project caused a 404 error.
Fixed bug that prevented case import from API.
Increased speed of listing cases in case list view.
Fixed bug that prevented export of project-wide filter results as XLS file.
Adjusted genotype quality relaxed filter setting to 10.
Added column with family name to results table of joint filtration.
Added export of filter settings as JSON to structural variant filter form.
Varseak Splicing link-out also considers refseq transcript.
Fixed bug that occurred when sample statistics were available but sample was marked with having no genotype.
Adjusted genotype quality strict filter setting to 10.
Added possibility to export VCF file for cohorts.
Increased logging during sample variant statistics computation.
Using gnomAD exomes as initially selected frequency in results table.
Using CADD as initially selected score metric in prioritization form.
Fixed missing disease gene and mode of inheritance annotation in project/cohort filter results table.
Catching errors during Kiosk annotation step properly.
Fixed issues with file extension check in Kiosk mode during upload.
“1” is now registered as heterozygous and homozygous state in genotype filter.
Loading annotation and QC tabs in project cases list asynchronously.
Increased timeout for VariantValidator response to 30 seconds.
Digesting more VariantValidator responses.
Fixed bug where, when re-importing a case, the sample variant stats computation was performed on the member list of the old case. This could lead to an inconsistent state in which the stats were not available for newly added members, which led to a 500 error when displaying the case overview page.
Fixed missing QC plots in case detail view.
Fixed bug in case VCF export where a variant existing twice in the results was breaking the export.
Fixed log entries for file export when pathogenicity or phenotype scoring was activated.
Bumped Chrome Driver version to 84 to be compatible with gitlab CI.
CADD is now selected as default in pathogenicity scoring form (when available).
Added global maintenance commands to clear old kiosk cases, inactive variant sets and expired exported files.
Added SvAnnotationReleaseInfo model, information is filled during import and displayed in case detail view.
Fixed bug that left number of small variants empty when they actually existed.
Increased logging during case import.
Marked old style import as deprecated.
Fixed bug that prevented re-import of SVs.
Fixed bug where a re-import of genotypes was not possible when the same variant types weren’t present as in the initial import.
Fixed bug where imported state of CaseImportInfo was already set after importing the first variant set.
Integrated Genomics England PanelApp.
Added command to check selected indexes and data types in database.
Added columns to results table: cDNA effect, protein effect, effect text, distance to splicesite.
Made effect columns and distance to splicesite column hide-able.
Added warning to project/cohort query when a user tries to load previous results where not all variants are accessible.
Renamed all occurrences of whitelist to allowlist and of blacklist to blocklist (sticking to what google introduced in their products).
Fixed bug where cases were not deletable when using Chrome browser.
Harmonized computation for relatedness in project-wide QC and in case QC (thus showing the same results if project only contains one family).
Fixed failing case API re-import when user is not owner of previous import.
Added PROJECTROLES_EMAIL_ to config.
Avoiding variants with asterisk alternative alleles.
Full Change List
Fixed bug where some variant flags didn’t color the row in filtering results after reloading the page.
Fixed upload bug in VarFish Kiosk when the VCF file was too small and the file copy process didn’t flush the file completely, resulting in an only partly available header.
Blocking upload of VCF files with GRCh38/hg38/hg19 builds for VarFish Kiosk.
Bumping sodar-core dependency to v0.8.1.
Using new sodar-core REST API infrastructure.
Using sodar-core tokens app instead of local one.
Support for displaying GATK-gCNV SVs.
Fix of REST API-based import.
Tracking global maintenance jobs with background jobs.
Global background jobs are displayed with site plugin point via bgjobs.
Bumping Chromedriver to make CI work.
Adding “Submit to CADD” feature similar to “Submit to MutationDistiller”.
Increased default frequency setting of HelixMTdb max hom filter to 200 for strict and 400 for relaxed.
It is now possible to delete ACMG ratings by clearing the form and saving it.
Updated reference and contact information.
File upload in Kiosk mode now checks for VCF file without samples.
Fixed bug when inheritance preset was wrongly selected when switching to variant in an index-only case.
Added hemizygous counts filter option to frequency filter form.
Added synonymous effect to be also selected when checking all coding/deep intronic preset.
Saving uploads pre-checking in kiosk mode to facilitate debugging.
Kiosk mode also accepts VCFs based on hg19.
VariantValidator output now displays three-letter representation of AA.
Documented new clinvar aggregation method and VarFish “point rating”.
Implemented new clinvar data display in variant detail.
Case/project overview allows to download all annotated variants as a file now.
Querying for annotated variants on the case/project overview now uses the common query infrastructure.
Updating plotly to v0.54.5 (displays message on missing WebGL).
Added feature to assemble cohorts from cases spanning multiple projects and filter for them in a project-like query.
Added column to results list indicating if a variant lies in a disease gene, i.e. a gene listed in OMIM.
Displaying warning if prioritization is not enabled when entering HPO terms.
Added possibility to import “extra annotations” for display along with the variants.
On sites deployed by BIH CUBI, we make the CADD, SpliceAI, MMSp, and dbscSNV scores available.
In prioritization mode, ORPHA and DECIPHER terms are now selectable.
Fixed bug of wrong order when sorting by LOEUF score.
Adding some UI documentation.
Fixed bug where case alignment stats were not properly imported. Refactored case import in a sense that the new variant set gets activated when it is successfully imported.
Fixed bug where unfolding smallvariant details of a variant in a cohort that was not part of the base project caused a 404 error.
Fixed bug that prevented case import from API.
Increased speed of listing cases in case list view.
Fixed bug that prevented export of project-wide filter results as XLS file.
Adjusted genotype quality relaxed filter setting to 10.
Added column with family name to results table of joint filtration.
Added export of filter settings as JSON to structural variant filter form.
Varseak Splicing link-out also considers refseq transcript. This could lead to inconsistency when Varseak picked the wrong transcript to the HGVS information.
Fixed bug that occurred when sample statistics were available but sample was marked with having no genotype.
Adjusted genotype quality strict filter setting to 10.
Added possibility to export VCF file for cohorts.
Increased logging during sample variant statistics computation.
Using gnomAD exomes as initially selected frequency in results table.
Using CADD as initially selected score metric in prioritization form.
Fixed missing disease gene and mode of inheritance annotation in project/cohort filter results table.
Catching errors during Kiosk annotation step properly.
Fixed issues with file extension check in Kiosk mode during upload.
“1” is now registered as heterozygous and homozygous state in genotype filter.
Loading annotation and QC tabs in project cases list asynchronously.
Increased timeout for VariantValidator response to 30 seconds.
Digesting more VariantValidator responses, namely intergenic_variant_\d+ and validation_warning_\d+.
Fixed bug where, when re-importing a case, the sample variant stats computation was performed on the member list of the old case. This could lead to an inconsistent state in which the stats were not available for newly added members, which led to a 500 error when displaying the case overview page.
Fixed missing QC plots in case detail view.
Fixed bug in case VCF export where a variant existing twice in the results was breaking the export.
Fixed log entries for file export when pathogenicity or phenotype scoring was activated. The variants are sorted by score in this case which led to messy logging which was designed for logging when the chromosome changes.
Bumped Chrome Driver version to 84 to be compatible with gitlab CI.
CADD is now selected as default in pathogenicity scoring form (when available).
Added global maintenance commands to clear old kiosk cases, inactive variant sets and expired exported files.
Added SvAnnotationReleaseInfo model, information is filled during import and displayed in case detail view.
Fixed bug that left number of small variants empty when they actually existed. This happened when SNVs and SVs were imported at the same time.
Increased logging during case import.
Marked old style import as deprecated.
Fixed bug that prevented re-import of SVs by altering the unique constraint on the StructuralVariant table.
Fixed bug where a re-import of genotypes was not possible when the same variant types weren’t present as in the initial import. This was done by adding a state field to the VariantSetImportInfo model.
Fixed bug where imported state of CaseImportInfo was already set after importing the first variant set.
Integrated Genomics England PanelApp via their API.
Added command to check selected indexes and data types in database.
Added columns to results table: cDNA effect, protein effect, effect text, distance to splicesite.
Made effect columns and distance to splicesite column hide-able.
Added warning to project/cohort query when a user tries to load previous results where not all variants are accessible.
Renamed all occurrences of whitelist to allowlist and of blacklist to blocklist (sticking to what google introduced in their products).
Fixed bug where cases were not deletable when using Chrome browser.
Harmonized computation for relatedness in project-wide QC and in case QC (thus showing the same results if project only contains one family).
Fixed failing case API re-import when user is not owner of previous import. Now also all users with access to the project (except guests) can list the cases.
Added PROJECTROLES_EMAIL_ to config.
Avoiding variants with asterisk alternative alleles.
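The v0.22.0 change list above mentions that VariantValidator responses with keys of the form intergenic_variant_\d+ and validation_warning_\d+ are now digested. A minimal sketch of how such keys could be picked out of a response dictionary follows. Only the two patterns come from the change list; the function name select_extra_entries, the overall response shape, and the other keys in the example are assumptions made for illustration and are not taken from the VarFish code base or the VariantValidator API documentation.

```python
import re

# The two key patterns quoted in the change list, combined into one expression.
# fullmatch() is used below so that keys with trailing text are rejected.
EXTRA_KEY = re.compile(r"(intergenic_variant|validation_warning)_\d+")


def select_extra_entries(response: dict) -> dict:
    """Return only the entries whose key matches one of the two patterns."""
    return {key: value for key, value in response.items() if EXTRA_KEY.fullmatch(key)}


if __name__ == "__main__":
    # Hypothetical response; the non-matching keys are made up for this example.
    example = {
        "flag": "warning",
        "validation_warning_1": {"note": "placeholder"},
        "intergenic_variant_12": {"note": "placeholder"},
        "metadata": {},
    }
    print(sorted(select_extra_entries(example)))
```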
v0.21.0
End-User Summary
Added preset for mitochondrial filter settings.
Fixed bug where HPO name wasn’t displayed in textarea after reloading page.
Added possibility to enter OMIM terms in phenotype prioritization filter.
Added maximal exon distance field to Variants & Effects tab.
Adapted HelixMTdb filter settings, allowing to differentiate between hetero- and homoplasmy counts.
Increased default max collective background count in SV filter from 0 to 5.
Included lists of genomic regions, black and white genelists and reworked HPO list in table header as response for what was filtered for (if set).
Added molecular assessment flag for variant classification.
Fixed bug where activated mitochondrial frequency filter didn’t include variants that had no frequency database entry.
Added inheritance preset and quick preset for X recessive filter.
Removed VariantValidator link-out.
Now smallvariant comments, flags and ACMG are updating in the smallvariant details once submitted.
Deleting a case (only possible as root) runs now as background job.
Fixed bug in compound heterozygous filter with parents in pedigree but without genotype that resulted in variants in genes that didn’t match the pattern.
Bumped django version to 1.11.28 and sodar core version to bug fix commit.
Fixed bug where structural variant results were not displayed anymore after the molecular assessment flag was introduced.
Fixed bug where variant comments and flags popup was not shown in structural variant results after updating smallvariant details on the fly.
Made Download as File and Submit to MutationDistiller buttons more prominent.
Adapted preset settings for ClinVar Pathogenic setting.
Finalized mitochondrial presets.
Added identifier to results table and smallvariant details when mitochondrial variant is located in D-loop region in mtDB.
Fixed per-sample metrics in case variant control.
Made ACMG and Beacon popover disappear when clicking anywhere.
Fixed bug when a filter setting with multiple HPO terms resulted in only showing one HPO term after reloading the page.
Extended information when entering the filter page and no previous filter job existed.
Disabled relatedness plot for singletons.
Replaced tables in case QC with downloadable TSV files.
QC charts should now be displayed properly.
Consolidated flags, comments and ACMG rating into one table in the case detail view, with one table for small variants and one for structural variants.
Added VariantValidator link to submit to REST API.
Fixed alignment stats in project-wide QC.
Added more documentation throughout the UI.
Added option to toggle displaying of logs during filtration, by default they are hidden.
Fixed broken displaying of inhouse frequencies in variant detail view.
Added variant annotation list (comments, flags, ACMG ratings) to project-wide info page.
Row in filter results now turns gray when any flag is set (except bookmark flag; summary flag still colours in other colour).
Fixed bug where comments and flags in variant details weren’t updated when the variant details have been opened before.
Added QC TSV download and per-sample metrics table to project-wide QC.
Removed ExAC locus link in result list, added gnomAD link to gene.
Catching connection exceptions during file export with enabled pathogenicity and/or phenotype scoring.
Fixed project/case search that delivered search results for projects that the searching user had no access to (only search was affected, access was not granted).
Made case comments count change in real time.
Full Change List
Added preset for mitochondrial filter settings.
Fixed bug where HPO name wasn’t displayed in textarea after reloading page. HPO terms are now also checked for validity in textbox on the fly.
Added possibility to enter OMIM terms in phenotype prioritization filter. The same textbox as for HPO terms also accepts OMIM terms now.
Added maximal exon distance field to Variants & Effects tab.
(Hopefully) fixing importer bug (#524).
Adapted HelixMTdb filter settings, allowing to differentiate between hetero- and homoplasmy counts.
Fixed inactive filter button to switch from SV filter to small variant filter.
Increased default max collective background count in SV filter from 0 to 5.
Included lists of genomic regions, black and white genelists and reworked HPO list in table header as response for what was filtered for (if set).
Added molecular assessment flag for variant classification.
Fixed bug where activated mitochondrial frequency filter didn’t include variants that had no frequency database entry.
Added inheritance preset and quick preset for X recessive filter.
Removed VariantValidator link-out.
Now smallvariant comments, flags and ACMG are updating in the smallvariant details once submitted.
Deleting a case (only possible as root) runs now as background job.
Fixed bug in compound heterozygous filter with parents in pedigree but without genotype that resulted in variants in genes that didn’t match the pattern.
Bumped django version to 1.11.28 and sodar core version to bug fix commit.
Fixed bug where structural variant results were not displayed anymore after the molecular assessment flag was introduced.
Fixed bug where variant comments and flags popup was not shown in structural variant results after updating smallvariant details on the fly.
Made Download as File and Submit to MutationDistiller buttons more prominent.
Adapted preset settings for ClinVar Pathogenic setting.
Finalized mitochondrial presets.
Added identifier to results table and smallvariant details when mitochondrial variant is located in D-loop region in mtDB.
Fixed per-sample metrics in case variant control.
Made ACMG and Beacon popover disappear when clicking anywhere.
Fixed bug when a filter setting with multiple HPO terms resulted in only showing one HPO term after reloading the page.
Extended information when entering the filter page and no previous filter job existed.
Added lodash javascript to static.
Disabled relatedness plot for singletons.
Replaced tables in case QC with downloadable TSV files.
QC charts should now be displayed properly.
Consolidated flags, comments and ACMG rating into one table in the case detail view, with one table for small variants and one for structural variants.
Added VariantValidator link to submit to REST API.
Fixed alignment stats in project-wide QC.
Added more documentation throughout the UI.
Added option to toggle displaying of logs during filtration, by default they are hidden.
Fixed broken displaying of inhouse frequencies in variant detail view.
Added variant annotation list (comments, flags, ACMG ratings) to project-wide info page.
Row in filter results now turns gray when any flag is set (except bookmark flag; summary flag still colours in other colour).
Fixed bug where comments and flags in variant details weren’t updated when the variant details have been opened before.
Added QC TSV download and per-sample metrics table to project-wide QC.
Removed ExAC locus link in result list, added gnomAD link to gene.
Catching connection exceptions during file export with enabled pathogenicity and/or phenotype scoring.
Fixed project/case search that delivered search results for projects that the searching user had no access to (only search was affected, access was not granted).
Made case comments count change in real time.
v0.20.0
End-User Summary
Added count of annotations to case detail view in Variant Annotation tab.
De-novo quick preset now selects AA change, splicing (default) for sub-preset Impact, instead of all coding, deep intronic.
Added project-wide option to disable pedigree sex check.
Added button to case detail and case list to fix sex errors in pedigree for case or project-wide.
Added command import_cases_bulk for case bulk import, reading arguments from a JSON file.
Entering and suggesting HPO terms now requires at least 3 typed characters.
Fixed broken variant details page when an HPO id had no matching HPO name.
Fixed bug in joint filtration filter view where previous genomic regions were not properly restored in the form.
Fixed bug that led to an AJAX error in the filter view when previous filter results failed to load because the variants of a case were deleted in the meantime.
Entering the filter view is now only possible when there are variants and a variant set. When there are variants reported but no variant set, a warning in the form of a small red icon next to the number of variants is displayed, indicating an inconsistent state.
In case of errors, you can now give feedback in a form via Sentry.
Fixed bug that occurred during project file export with MutationTaster pathogenicity scoring when a variant appeared multiple times in the query string sent to MutationTaster.
Adding REST API for Cases.
Adding site app for API token management.
Added frequency databases for mitochondrial chromosome, providing frequency information in the small variant details.
Fixed periodic tasks (contained clean-up jobs) and fixed tests for periodic tasks.
Adding REST API for Cases and uploading cases.
Adding GA4GH beacon button to variant list row and details. Note that this must be activated in the user profile settings.
Added filter support to queries and to filter form for mitochondrial genome.
Full Change List
Added count of annotations to case detail view in Variant Annotation tab.
De-novo quick preset now selects AA change, splicing (default) for sub-preset Impact, instead of all coding, deep intronic.
Added project-wide option to disable pedigree sex check.
Added button to case detail and case list to fix sex errors in pedigree for case or project-wide.
Added command import_cases_bulk for case bulk import, reading arguments from a JSON file.
Entering and suggesting HPO terms now requires at least 3 typed characters. Also only sending the query if the HPO term string changed to reduce number of executed database queries.
Fixed broken variant details page when an HPO id had no matching HPO name. This happened when gathering HPO names, retrieving HPO id from Hpo database given the OMIM id and then the name from HpoName. The databases Hpo and HpoName don’t necessarily match via hpo_id, in this case because of an obsolete HPO id HP:0031988. Now reporting "unknown" for the name instead of None which broke the sorting routine.
Fixed bug in ProjectCasesFilterView where previous genomic regions were not properly restored in the form.
Fixed bug that led to an AJAX error in the filter view when previous filter results failed to load because the variants of a case were deleted in the meantime.
Entering the filter view is now only possible when there are variants and a variant set. When there are variants reported but no variant set, a warning in the form of a small red icon next to the number of variants is displayed, indicating an inconsistent state.
Using latest sentry SDK client.
Fixed bug that occurred during project file export with MutationTaster pathogenicity scoring when a variant appeared multiple times in the query string sent to MutationTaster.
Adding REST API for Cases.
Copying over token management app from Digestiflow.
Added frequency databases mtDB, HelixMTdb and MITOMAP for mitochondrial chromosome. Frequency information is provided in the small variant detail view.
Fixed periodic tasks (contained clean-up jobs) and fixed tests for periodic tasks.
Adding REST API for Case.
Extending importer app with API to upload annotated TSV files and models to support this.
Adding GA4GH beacon button to variant list row and details. Note that this must be activated in the user profile settings.
Added filter support to queries and to filter form for mitochondrial genome.
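The v0.20.0 change list above notes that a missing HPO name is now reported as "unknown" instead of None because None broke the sorting routine. The following sketch illustrates why, under the assumption that names are sorted as plain Python strings. The helper hpo_display_name and the lookup dictionary are hypothetical and are not taken from the VarFish code base.

```python
def hpo_display_name(hpo_id: str, names: dict) -> str:
    """Return the name for an HPO id, or "unknown" if there is none (hypothetical helper)."""
    name = names.get(hpo_id)
    return name if name is not None else "unknown"


# Hypothetical lookup table; HP:0031988 is the obsolete id named in the change list
# and deliberately has no entry here.
names = {"HP:0000118": "Phenotypic abnormality"}
terms = ["HP:0031988", "HP:0000118"]

# Sorting a mix of None and str raises TypeError in Python 3 ...
try:
    sorted(names.get(term) for term in terms)
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

# ... while substituting a placeholder string keeps the list sortable.
labels = sorted(hpo_display_name(term, names) for term in terms)

if __name__ == "__main__":
    print(mixed_sort_failed, labels)
```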
v0.19.0
End-User Summary
Added inhouse frequency information to variant detail page.
Added link-out in locus dropdown menu in results table to VariantValidator.
Added filter-by-status dropdown menu to case overview page.
Added link-out to pubmed in NCBI gene RIF list in variant details view.
Fixing syncing project with upstream SODAR project.
Added controls to gnomad genomes and gnomad exomes frequencies in variant details view.
Adding more HiPhive variants.
Replacing old global presets with one preset per filter category.
Added recessive, homozygous recessive and denovo filter to genotype settings.
Entering HPO terms received a typeahead feature and the input is organized in tags/badges.
Import of background database now less memory intensive.
Added project-wide alignment statistics.
Added django_su to allow superusers to temporarily take on the identity of another user.
Fixed bug in which some variants in comphet mode only had one variant in results list.
Added user-definable, project-specific tags to be attached to a case. Enter them in the project settings, use them in the case details page.
Added alert fields for all ajax calls.
Removed (non function-disturbing) javascript error when pre-loaded HPO terms were decorated into tags.
Fixed coloring of rows when flags have been set.
Fixed dominant/denovo genotype preset.
Minor adjustments/renamings to presets.
Link-out to genomics england panelapp.
Fixed partly broken error decoration on hidden tabs on field input errors.
Added Kiosk mode.
Fixed bug when exporting a file with enabled pathogenicity scoring led to an error.
Entering filter form without previous settings now sets default settings correctly.
Switched to SODAR core v0.7.1
HPO terms are now pastable, especially from SODAR.
Some UI cleanup and refinements, adding shortcut links.
Large speed up for file export queries.
Fixed UI bug when selecting ClinVar only as flags.
Added link-out to variant when present in ClinVar.
Fixed broken SV filter button in smallvariant filter form.
Added link-out to case from import bg job detail page.
Added recessive quick presets setting.
Added functionality to delete small variants and structural variants of a case separately.
Fixed bug in which deleting a case didn’t delete the sodar core background jobs.
Old variants stats data is not displayed anymore in case QC overview when case is re-imported.
Full Change List
Added inhouse frequency information to variant detail page.
Added link-out in locus dropdown menu in results table to VariantValidator. To be able to construct the link, refseq_hgvs_c and refseq_transcript_id are also exported in query.
Added filter-by-status dropdown menu to case overview page. With this, the bootstrap addon bootstrap-select was added to the static folder.
Added link-out to pubmed in NCBI gene RIF list in variant details view. For this, NcbiGeneRif table was extended with a pubmed_ids field.
Fixing syncing project with upstream SODAR project.
Added controls to gnomad genomes and gnomad exomes frequencies in the database table by extending the fields. Added controls to frequency table in variant details view.
Improving HiPhive integration: adding human, human/mouse similarity search; using POST request to Exomiser to increase maximal number of genes.
Replacing old global presets with one preset per filter category.
Using ISA-tab for syncing with upstream project.
Added recessive, homozygous recessive and denovo filter to genotype settings. Homozygous recessive and denovo filter are JS code re-setting values in dropdown boxes. Recessive filter behaves as comp het filter UI-wise, but joins results of both homozygous and compound heterozygous filter internally.
Entering HPO terms received a typeahead feature and the input is organized in tags/badges.
Import of background database now less memory intensive by disabling autovacuum option during import and removing atomic transactions. Instead, tables are emptied by genome release in case of failure in import.
Added project-wide alignment statistics.
Added django_su to allow superusers to temporarily take on the identity of another user.
Fixed bug in which some variants in comphet mode only had one variant in results list. The hgmd query was able to create multiple entries for one variant which was reduced to one entry in the resulting list. To correct for that, the range query was fixed and the grouping in the lateral join was removed.
Added user-definable, project-specific tags to be attached to a case.
Added alert fields for all ajax calls.
Removed javascript error when pre-loaded HPO terms were decorated into tags.
Removed (non function-disturbing) javascript error when pre-loaded HPO terms were decorated into tags.
Fixed coloring of rows when flags have been set. When summary is not set but other flags, the row is colored in gray to represent a WIP state. Coloring happens now immediately and not only when page is re-loaded.
Fixed dominant/denovo genotype preset.
Minor adjustments/renamings to presets.
Link-out to genomics england panelapp.
Fixed partly broken error decoration on hidden tabs on field input errors.
Introduced bigint fields into postgres sequences counter for smallvariant, smallvariantquery_query_results and projectcasessmallvariantquery_query_results tables.
Added Kiosk mode.
Fixed bug when exporting a file with enabled pathogenicity scoring led to an error.
Entering filter form without previous settings now sets default settings correctly.
Switched to SODAR core v0.7.1
Changing default partition count to 16.
Allowing users to put a text on the login page.
Renaming partitioned SV tables, making logged again.
HPO terms are now pastable, especially from SODAR.
Some UI cleanup and refinements, adding shortcut links.
Large speed up for file export queries by adding indices and columns to HGNC and KnownGeneAA table.
Fixed UI bug when selecting ClinVar only as flags.
Added link-out to variant when present in ClinVar by adding the SCV field from the HGNC database to the query.
Fixed broken SV filter button in smallvariant filter form.
Added link-out to case from import bg job detail page.
Added recessive quick presets setting.
Added functionality to delete small variants and structural variants of a case separately.
Fixed bug in which deleting a case didn’t delete the sodar core background jobs.
Old variants stats data is not displayed anymore in case QC overview when case is re-imported.
v0.18.0
End-User Summary
Added caching for pathogenicity scores api results.
Added column to the project wide filter results table that displays the number of affected cases per gene.
Enabled pathogenicity scoring for project-wide filtration.
Added LOEUF gnomAD constraint column to results table.
Added link-out to MetaDome in results table.
Full Change List
Added new database tables CaddPathogenicityScoreCache, UmdPathogenicityScoreCache, MutationtasterPathogenicityScoreCache to cache pathogenicity scores api results.
Added column to the project wide filter results table that displays the number of affected cases per gene. I.e. the cases (not samples) that have a variant in a gene are counted and reported.
Enabled pathogenicity scoring for project-wide filtration. This introduced a new table ProjectCasesSmallVariantQueryVariantScores to store the scoring results for a query.
Added LOEUF gnomAD constraint column to results table.
Added link-out to MetaDome in results table.
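The per-gene count of affected cases described above can be illustrated with a minimal sketch. This is not the actual VarFish query code; the function name and the (case, sample, gene) tuple layout are assumptions made for this example. It counts each case at most once per gene, no matter how many samples or variants of that case hit the gene.

```python
from collections import defaultdict


def count_affected_cases_per_gene(hits):
    """Count distinct cases (not samples) that carry a variant in each gene.

    `hits` is an iterable of (case_name, sample_name, gene_symbol) tuples;
    this layout is hypothetical and chosen only for illustration.
    """
    cases_by_gene = defaultdict(set)
    for case_name, _sample_name, gene_symbol in hits:
        # A set keeps each case once, even if several samples carry a variant.
        cases_by_gene[gene_symbol].add(case_name)
    return {gene: len(cases) for gene, cases in cases_by_gene.items()}


hits = [
    ("family_1", "index", "BRCA1"),
    ("family_1", "mother", "BRCA1"),
    ("family_2", "index", "BRCA1"),
    ("family_2", "index", "TTN"),
]
counts = count_affected_cases_per_gene(hits)
```

Here two samples of family_1 carry a BRCA1 variant, but the family is counted only once for that gene.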
v0.17.6
End-User Summary
MutationTaster scoring now able to score InDels.
MutationTaster rank now displayed as numbers, not as stars, with -1 corresponding to an error during scoring.
Adding “closed uncertain” state.
Project-wide filtration allows for comp het filter for individual families.
Full Change List
MutationTaster scoring now able to score InDels.
MutationTaster rank now displayed as numbers, not as stars. Rank -1 and probability -1 correspond to error during MutationTaster ranking or empty results from MutationTaster.
Improving display and logging in alignment QC import.
Adding “closed uncertain” state.
Project-wide filtration allows for comp het filter for individual families.
v0.17.5
End-User Summary
BAM statistics (including target coverage information) can now be imported and displayed.
Mitochondrial variants can now be properly displayed.
Added “Delete Case” button and functionality to case overview, only visible for superusers.
Fixed error response when MutationDistiller submission wasn’t submitted with a single individual.
Now using 404 & 500 error page from sodar core.
Visual error response on tabs is now more prominent.
Included MutationTaster as additional pathogenicity score.
Included UMD-Predictor as additional pathogenicity score.
Project-wide filter now applicable when the project contains cases with no small variants (e.g. completely empty or only SVs).
Ignoring option “remove if in dbSNP” when “ClinVar membership required” is activated as every ClinVar entry has a dbSNP id.
Fixed indices on SmallVariantFlags and SmallVariantComment and introduced indices for ExacConstraints and GnomadConstraints that sped up large queries significantly.
Fixed issue where gene dropdown menu was overlayed by sticky top.
Adding progress bar on top of case list.
Improving case list and detail overview page layout and usability.
Upgrade of the SODAR-core library app, includes various improvements such as background job pagination and improvements to membership management.
Included tables for converting refseq and ensembl gene ids to gene symbols.
Added warning about missing UMD indel scoring.
Now sorting comments and flags in the case overview by chromosomal position.
Now sorting HPO terms in variant detail view alphabetically.
Improved pubmed linkout string.
Added EnsEMBL and ClinVar linkouts to gene dropdown menu in results list.
Added 3 more variant flags: no known disease association, variant does segregate, variant doesn’t segregate.
Compound heterozygous filter is now applicable to singletons and index patients with only one parent.
Extending the manual with SOPs and guidelines.
Full Change List
Adding code for importing, storing, and displaying BAM quality control values.
Fixing urls configuration bug preventing chrMT matches.
Added “Delete Case” button and functionality to case overview, only visible for superusers. Deletes record from Case and variants from SmallVariant, StructuralVariant and StructuralVariantGeneAnnotation associated with this case.
Fixed error response when MutationDistiller submission wasn’t submitted with a single individual. Error is now displayed via messages after reloading the filter page. All form errors that are raised during submission of file export or to MutationTaster are handled now this way.
Now using 404 & 500 error page from sodar core.
Visual error response on tabs is now more prominent.
Included MutationTaster as additional pathogenicity score.
Included UMD-Predictor as additional pathogenicity score.
Project-wide filter now applicable when the project contains cases with no small variants (e.g. completely empty or only SVs).
Ignoring option “remove if in dbSNP” when “ClinVar membership required” is activated as every ClinVar entry has a dbSNP id.
Fixed indices on SmallVariantFlags and SmallVariantComment and introduced indices for ExacConstraints and GnomadConstraints that sped up large queries significantly.
Fixed issue where gene dropdown menu was overlayed by sticky top.
Adding progress bar on top of case list.
Improving case list and detail overview page layout and usability.
Upgraded to SODAR core v0.7.0.
Included tables RefseqToGeneSymbol and EnsemblToGeneSymbol to convert gene ids to gene symbols to get a better coverage of gene symbols.
Added warning about missing UMD indel scoring.
Now sorting comments and flags in the case overview by chromosomal position. For this, a chromosome_no field was introduced in SmallVariantComments and SmallVariantFlags that is automatically filled when record is saved, derived from chromosome field.
Now sorting HPO terms in variant detail view alphabetically.
Improved pubmed linkout string.
Added EnsEMBL and ClinVar linkouts to gene dropdown menu in results list.
Added 3 more variant flags: no known disease association, variant does segregate, variant doesn’t segregate.
Compound heterozygous filter is now applicable to singletons and index patients with only one parent.
Extending the manual with SOPs and guidelines.
v0.17.4
End-User Summary
Fixed bug in exporting files when pathogenicity scoring is activated.
Added IGV button to small/structural comment list in case overview.
Adapted to new CADD REST API implementation.
Full Change List
Fixed function call to missing function in exporting files when pathogenicity scoring is activated.
Added IGV button to small/structural comment list in case overview.
Adapted to new CADD REST API implementation.
Adding generic info field to small variants and fields for distance to refseq/ensembl exons. The import is augmented such that the fields are filled with appropriate empty/null values when importing TSV files that don’t have this field yet.
v0.17.3
End-User Summary
Improving QC plot performance.
Displaying case statistics in project list.
Removed ClinVar view and added alternative column switch to smallvariant results table.
ClinVar settings were extended to allow filtering for origin somatic and germline.
When ClinVar membership is NOT required, variants that have origin somatic and no germline in ClinVar are removed.
Improved sorting of results table for gene and chromosomal position column.
Fixed bug where settings of the previous query weren’t restored for certain fields.
Fixed bug where ClinVar data could break rendering of results table template.
Improved speed of queries.
Invalid form data now more prominently placed.
Improved joining of HGNC information for refseq transcripts to not ignore border cases.
Max AD field in quality filter is now also applied to genotype 0/0.
Minor fixes in case overview comments/flags/acmg tables.
Fixed issue in SV results table where columns were missing when the genotype was missing.
Comments on variants are now editable and deletable, in the case detail view as well as the variant detail view.
Case comments are now editable.
Fixed pathogenicity and phenotype score column headings in results table.
Full Change List
Using "scattergl" for QC plots which leads to a speedup.
Making the large tables UNLOGGED to improve bulk insertion performance.
Displaying case statistics in project list.
Removed ClinVar view and added alternative column switch to smallvariant results table. All models, urls, views, queries and templates concerning ClinVar view were removed. SmallVariant queries now join ClinVar information and display them via switch in the UI.
ClinVar settings were extended to allow filtering for origin somatic and germline.
When ClinVar membership is NOT required, variants that have origin somatic and no germline in ClinVar, are removed.
Results table is now sortable by chromosome and position. And by gene column using the following keys in that given order: ACMG membership, HPO inheritance term, gene name. And by “sign. & rating” column using the following keys in that given order: significance, rating.
Fixed bug where settings of the previous query were overwritten by a JavaScript routine and appeared to be lost.
Fixed bug where unexpected ClinVar significance crashed the template tags.
Added index on human_entrez_id field to MgiMapping materialized view to speed up the join to the results table.
Invalid form data is now displayed as boxes rather than tooltips.
Joining of the HGNC information for RefSeq transcripts additionally directly via HGNC to improve results.
Max AD field in quality filter is now also applied to genotype 0/0.
Minor fixes in case overview comments/flags/acmg tables.
Fixed issue in SV results table where columns were missing when the genotype was missing.
Main JavaScript functionality transferred from HTML to static JS files.
Comments on variants are now editable and deletable, in the case detail view as well as the variant detail view.
Case comments are now editable.
Moved and consolidated further JS code from HTML to JS files.
Fixed pathogenicity and phenotype score column headings in results table.
v0.17.2
End-User Summary
Improving case list and case detail views.
Adjusting chrX het threshold for telling male/female apart.
Full Change List
Shuffling around case detail view a bit.
Adding icons for case status.
Adjusting chrX het threshold for telling male/female apart.
v0.17.1
End-User Summary
Syncing with upstream now also checks parents.
Fixing saving of ACMG rating.
Increasing maximal number of characters in gene whitelist to 1 million.
Fixing QC display issues for cases without variants.
Fixing UI error where tab wasn’t selectable after invalid data input.
Improving gene and variant detail display.
Adding installation manual.
Full Change List
Syncing with upstream now also checks parents.
Fixing template, form, and model for ACMG rating (adjust to using start/end/bin fields).
Increasing maximal number of characters in gene whitelist to 1 million.
Fixing QC display issues for cases without variants.
Fixing UI error where tab wasn’t selectable after invalid data input.
Improving gene and variant detail display.
Adding installation manual.
v0.17.0
End-User Summary
Fixing problems with link-out to varSEAK.
UI improvement for the compound heterozygous mode.
Fixing bug in genomic region filter form that took only the last character of chromosome names.
Fixing overflow bug in genotype and quality tab when presenting more individuals than would fit in the form.
Fixing genotype settings pre-selector dropdown that was trapped in parent container and possibly not entirely accessible.
Added editable notes and status fields to case detail view to enable the user to take a note/summarize the case.
Added support to add multiple comments by different users to a case in the case detail view.
Fixed bug where using genotype presets wasn’t fully executed while in comp. het. mode.
Fixed bug where the genomic region form wasn’t properly reconstructed when only a chromosome was given.
Properly sorting results now by chromosome in order as expected (numerical followed by X, Y, MT).
Included MGI mouse gene link-out in gene dropdown menu in result list.
Fixed bug where the filter button wasn’t disabled when the selected variant set wasn’t in state active.
Renamed index field in genotype dropdown to “c/h index” to indicate comp het mode.
Fixing bug in retrieving comments on structural variants.
Full Change List
URL-escaping hgvs_p to varSEAK.
Compound heterozygous mode is now activated via the GT field selection that offers an index entry for potential index patients. This is a UI/Javascript improvement and does not affect the code of the query except that setting an index enables the filter, contrary to before where there was an additional boolean field that enabled the mode.
Fixing regex bug in genomic region field of the filter form that took only the last character of a chromosome name. Therefore it affected regions with chromosome names with more than one character (e.g. ‘10’, ‘11’, …).
Fixing overflow bug in genotype and quality tab when presenting more individuals than would fit in the form.
Fixing genotype settings pre-selector dropdown that was trapped in parent container and possibly not entirely accessible.
Added editable notes and status fields to Case model to enable the user in the case detail view to take notes and assign a status to the case.
Fixed displaying of status in case detail view when it was never set.
Added model CaseComments to enable assigning comments to a case by different users in the case detail view.
Fixed bug where using genotype presets wasn’t fully executed while in comp. het. mode.
Fixed bug where the genomic region form wasn’t properly reconstructed when only a chromosome was given.
Sorting results now by the numerical representation of the chromosome.
Included MGI mouse gene link-out in gene dropdown menu in result list. This is accomplished by introducing new table MgiHomMouseHumanSequence and a condensing materialized view MgiMapping that maps entrez_id to MGI ID.
Removed annotation app.
Fixed bug where the filter button wasn’t disabled when the selected variant set wasn’t in state active.
Added management command rebuild_project_case_stats to rebuild stats of all cases of a given project.
Import of database tables now handles non-existing entries in a more logical way.
Making variant partition count come from environment variable (#368).
Renamed index field in genotype dropdown to “c/h index” to indicate comp het mode.
Fixed bug that replaced missing form fields in old queries with default settings.
Merged import_sv_dbs into import_tables manage command.
Fixing bug in retrieving comments on structural variants.
Fixing recomputation of variant stats that now properly handles json decoding.
Adding installation manual.
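The chromosome ordering mentioned in this release (numerical chromosomes followed by X, Y, MT) can be sketched as a sort key. This is an illustration only, not the VarFish implementation; the function name and the concrete numbers assigned to X, Y, and MT are assumptions.

```python
def chromosome_sort_key(chromosome):
    """Map a chromosome name to an integer so that sorting is numerical.

    Numbered chromosomes keep their number; X, Y, and MT follow as 23, 24,
    and 25. These three values are an assumption of this sketch.
    """
    name = chromosome[3:] if chromosome.lower().startswith("chr") else chromosome
    special = {"X": 23, "Y": 24, "MT": 25, "M": 25}
    if name.upper() in special:
        return special[name.upper()]
    return int(name)


rows = [("X", 100), ("10", 5), ("2", 7), ("MT", 1), ("1", 9), ("Y", 3)]
# Sort by chromosome first, then by position within the chromosome.
sorted_rows = sorted(rows, key=lambda row: (chromosome_sort_key(row[0]), row[1]))
```

Sorting the plain strings instead would place "10" before "2", which is the kind of ordering the numerical representation avoids.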
v0.16.1
End-User Summary
Cases with no variants or no associated variant set can’t be filtered anymore.
Full Change List
Cases with no variants or no associated variant set caused queries to return all variants. This bug was fixed by disabling the filter button (UI) or throwing an error (query) if the query is executed.
v0.16.0
End-User Summary
Genomic regions now also able to filter only by chromosome.
Added preset selector for genotypes, setting affected or unaffected individuals to the selected setting.
dbSNP ID in file export is now set to None instead of an empty field.
Fixed sorting issues with ranks and scores.
Added quality field to set MAX allelic depth (AD) for filtering variants (hom or ref). Default is unset, i.e. filtering behaviour as usual. Only quality setting that doesn’t require a value.
Added main navigation as dropdown menu for smaller screen sizes.
Added template settings for quality filter form to copy to each individual, or affected/unaffected.
Fixed bug that occurred during file export with activated gene prioritization.
Improved database connection to avoid occasional JSON field retrieval errors.
Full Change List
Genomic regions filter accepts now only chromosome as region, internally setting start/end positions to 0/INT_MAX values.
Structural variant databases are now imported in the same style as the small variant databases.
Removed model_support.py file from variants app.
Added preset selector for genotypes, setting affected or unaffected individuals to the selected setting.
dbSNP ID in file export is now set to None instead of an empty field.
Ranks in the results table are now displayed without the hash tag to make them properly sortable. Pathogenicity and phenotype scores in the results table now sort in a numerical order. Ranks and scores are now in separate fields.
Small variant filter now considers set id together with case id.
Removed remaining fixtures from test_submit_filter.py.
Quality filter now can filter variants for max allelic depth.
Added main navigation as dropdown menu for smaller screen sizes.
Added template settings for quality filter form to copy to each individual, or affected/unaffected.
Fixed function call of gene prioritization function in file export task causing file export to break when gene prioritization was activated.
Remove switching psycopg2 JSON (de)serializer during database query execution to avoid occasional JSON field retrieval errors. Instead, replace the JSON (de)serializers for sqlalchemy and leave it to psycopg2 to take care of this.
Increased length of Case.index field from 32 to 512 chars.
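The chromosome-only genomic region handling from this release (start/end set internally to 0/INT_MAX) can be sketched as follows. This is a hedged illustration, not the VarFish filter form code: the regular expression, the function name, and the value chosen for INT_MAX are assumptions. The pattern matches the complete chromosome name, so multi-character names such as "10" are kept whole.

```python
import re

INT_MAX = 2**31 - 1  # assumed upper bound; the constant used internally may differ

# Chromosome name (optionally prefixed with "chr"), optionally followed by
# ":start-end". The chrom group captures the complete name, e.g. "10".
REGION_RE = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?)"
    r"(?::(?P<start>[0-9]+)-(?P<end>[0-9]+))?$",
    re.IGNORECASE,
)


def parse_region(text):
    """Parse "CHROM" or "CHROM:START-END" into a (chrom, start, end) tuple."""
    match = REGION_RE.match(text.strip())
    if not match:
        raise ValueError(f"Invalid genomic region: {text!r}")
    chrom = match.group("chrom").upper()
    if match.group("start") is None:
        # Only a chromosome was given: cover it entirely.
        return chrom, 0, INT_MAX
    return chrom, int(match.group("start")), int(match.group("end"))
```

For example, `parse_region("10")` covers all of chromosome 10, while `parse_region("chr11:1000-2000")` restricts the filter to the given interval.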
v0.15.6
End-User Summary
Row colouring in results table for commented and flagged variants is now back again.
Full Change List
Removing Annotation model.
Fixed importer bug where info wasn’t imported when table was newly imported and --force flag was set.
Removed whitening of table rows from DataTables css to prevent it from overwriting our row colouring feature.
Doing dbSNP import now chromosome-wise to prevent import from timing out.
Removed old style fixtures from UI tests.
v0.15.5
End-User Summary
Displaying SV coordinates in detail box.
Displaying family errors in red in “rate of het. calls on chrX” plot.
Compound het query now allows index selection for all patients with parents, not only sibling of the index.
Full Change List
Displaying SV coordinates in detail box.
Fixing sex error generation (only using source name).
Fixing pedigree editor form to use int for sex & affected.
Compound het query now allows index selection for all patients with parents, not only sibling of the index.
v0.15.4
End-User Summary
ExAC constraints in results table are now displayed.
Constraints in results table now show consistently 3 floating points and are sortable.
Fixing QC plot display.
Fixing in-house counts in results table (filtering by them worked).
Fixing filtration with members that have no genotype.
Fixing SV length display.
Adjusting filter presets.
Fixing filtration for in-house filter.
Changing display to per-transcript effects to table.
Index patient for compound heterozygous query is now selectable.
Fixed bug where clinvar report queries didn’t select for the case.
Full Change List
Increased SmallVariant table partitioning to modulo 1024.
ExAC constraints are now joined via ensembl gene id to results table.
Constraints in results table now show consistently 3 floating points and are sortable.
ExAC constraints are now consistent with variant details and in results table.
Various fixes to QC plot display, some to JS, some to Python/Django views code.
Clinvar pathogenic genes materialized view gets updated when there is new data imported in one of the dependent tables.
Making prefetch filter load inhouse counts.
Fixing filtration with members that have no genotype.
Adding back fetching of SV length to queries.
First adjustments of filter presets for NAMSE analyses.
Fixing coalescing when filtering with in-house filter.
Changing display to per-transcript effects to table.
Extended tests to cover missing in-house filter records for existing variants.
Index patient for compound heterozygous query can be selected. Only patients that share the same parents as the original index patients are selectable in addition.
After reworking the database query structure, clinvar report queries didn’t select for the case.
v0.15.3
Bug-fix release.
End-User Summary
none
Full Change List
Fixing bug in recomputing small and structural variant counts on importing.
v0.15.2
End-User Summary
Fixed broken genomic region filter.
Making gene information in SV results consistent with display in small variant results.
--force parameter for import_tables now works on all tables.
Resulting table is now sortable.
Fixed broken EnsEMBL link-out.
Added OMIM gene information to gene card in variant details view.
Refactored small variant database queries.
Adding case and donor counts to project list.
QC plots are now loaded asynchronously. This should improve page loading time for the case and project overview pages.
Adding inheritance mode information to results table.
Admins/superusers can now update case information and pedigrees.
Projects can now synchronise (check) with upstream SODAR sites, only admins/superusers can trigger this.
Adapting SmallVariants and SmallVariant DBs to new start-end coordinates and UCSC binning.
Fixed frequency table in SmallVariant details that had wrong names assigned to columns and total values were not present.
Added pLI score to variant details constraint information.
Added constraints information column with selector to results table.
Full Change List
Increased view test coverage to 100%.
Unification of gene information display between SVs and small variants.
Fixed bug that wrongly parsed genomic regions and resulted in filter reporting nothing while active.
Small fix to small variant import.
Extended --force parameter for import_tables command to be applied to all tables.
Fixed bug in creating materialized view that prevented setting up database when applying migrations from scratch.
Added datatables library to add sorting feature to resulting table.
Fixed broken EnsEMBL link-out.
Added conversion table RefseqToEnsembl (complementing EnsemblToRefseq). Now used in ExAC/gnomAD constraint information when refseq transcript database is selected.
Gene card in variant details view now show OMIM gene information, i.e. when an OMIM entry is marked as gene in Mim2geneMedgen table.
“All transcript” annotations now come from Jannovar REST web service instead of the Annotation model.
Refactored small variant database queries. The database queries now make full use of lateral joins to keep the nesting flat. The code generation part no longer uses the mixin structure, which was opaque and error-prone.
Bumping sodar_core dependency to v0.6.1.
Showing case and donor counts in project listing.
Showing site-wide statistics via siteinfo app.
Adding missing release column to KnownGeneAA table + adapting queries accordingly.
Cleaning up and refactoring QC plotting code.
Separating plotting JS and data generation Python code.
Load data asynchronously.
Now displaying inheritance mode information for results, based on HPO terms for inheritance and hgnc information.
Not importing Annotation data any more.
Adding view for updating a case.
Implementing “sync with upstream SODAR site” for projects based on background jobs.
Removing bgjobs app in favour of the one from SODAR-core.
Removing containing_bins columns.
Removing svs tests_fixtures.py.
SmallVariants and SmallVariant DBs now contain start and end columns, replacing position. This is for internal queries only, the outside representation for SmallVariants is still via position. An additional column bin for the UCSC binning was included.
Frequency table in SmallVariant details had wrong names assigned to columns and total values were not present. The values in the columns were 1 column behind their names, and thus the last column had a name that should have had missing values. These missing values were also a bug in that total was not reported (i.e. af or het without population).
Constraints information in variant details now shows also pLI score.
Now joining constraints information to results table and added selector to display source/metric in one column.
Fixed: Ensembl transcript ids in SmallVariant list were truncated because of too short database field.
Importing SVs and small variants is done in a background job now.
Small variant and SV tables are now partitioned (by case ID). This should speedup import as indices are smaller and also each partition can be written to independently.
import_tables improvements: can now use threads to import multiple tables at once, uses SQL Alchemy instead of Django ORM based deletion.
Refining celery configuration now, assuming queues “import”, “query”, and “default”.
Removing some redundant indices on frequencies and dbsnp.
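The partitioning of the variant tables by case ID can be pictured with a small sketch. It assumes that a case is assigned to a partition by taking the case ID modulo the partition count, and the function name is hypothetical. How the database itself maps rows to partitions is not shown here.

```python
def partition_for_case(case_id, partition_count):
    """Return the index of the partition that would hold rows of `case_id`.

    Assumes a plain modulo assignment; this is an illustration, not the
    actual table definition.
    """
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    return case_id % partition_count


# All variants of one case land in the same partition, so an import for that
# case only touches the (smaller) indices of a single partition.
same_case = {partition_for_case(42, 16) for _ in range(3)}
```

Because every row of a case maps to one partition, imports for different cases can often write to different partitions independently.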
v0.15.1
A bug fix release for SV filtration (fixing overlaps).
End-User Summary
Fixed conservation bug (was shown only in 2/3 of all cases).
Showing small and structural variant count for each case.
Improving layout of case list (adding sorting and filtering).
Improved render speed of case list.
Fixing problem with interval overlaps for structural variant queries.
Full Change List
Increased test coverage to 100% for small variant model support tests.
Fixed bug in displaying conservation track for all bases in an AA base triplet. Only two of three bases were decorated with the conservation track information.
Fixed bug that Clinvar report didn’t support compound heterozygous queries anymore.
Variant view tests are now running on factory boy.
Adding tests of SV-related code.
Also interpreting phased diploid genotypes.
Improving layout of case list (adding sorting and filtering).
Improved render speed of case list.
Fixing UCSC binning overlap queries.
Adding “For research use only” to login screen.
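For readers unfamiliar with UCSC binning, which the overlap fix in this release concerns, here is a minimal sketch of the standard scheme as used by the UCSC Genome Browser. It is not the VarFish query code, and it assumes 0-based, half-open coordinates; the coordinate convention used internally may differ.

```python
# Constants of the standard UCSC binning scheme: five levels, where the
# smallest bins span 128 kb (2**17 bases) and each level is 8 times larger.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17
BIN_NEXT_SHIFT = 3


def bin_from_range(start, end):
    """Return the smallest bin that fully contains the range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"Range {start}-{end} does not fit the binning scheme")


def overlapping_bins(start, end):
    """Return all bins that may hold features overlapping [start, end)."""
    result = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        # On every level, collect the bins touched by the query range.
        result.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return result
```

An overlap query then restricts candidates to rows whose bin is in `overlapping_bins(start, end)` and applies the exact interval comparison on top, since bin membership alone only narrows down the candidates.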
v0.15.0
The most important change is the change of colors: Green now means benign and red means pathogenic.
End-User Summary
Renamed Human Splice Finder to Human Splicing Finder.
Added “1” and “0” genotype for “variant”, “reference”, and “non-reference” genotype.
Added support for WGS CNV calling results to SV filtration.
Simplifying variant selection for SVs as diploid calls unreliable (it’s better to distinguish only variant/reference).
Changing color meaning: green now means benign/artifact and red means pathogenic/good candidate.
Adding link-out to varsome
Adding support for ACMG criteria annotation
SV filtration: do not set max count in background by default
SV filtration: display of call properties of XHMM and SV2
Full Change List
Allow import for more than one genotypes/feature effects for structural variants.
Starting to base fixture creation on factory boy.
Renamed Human Splice Finder to Human Splicing Finder.
Added “1” and “0” genotype for “variant”, “reference”, and “non-reference” genotype.
Added support for WGS CNV calling results to SV filtration.
Simplifying selection of variants for SVs. Further, also allowing for phased haplotypes (irrelevant in practice until we start interpreting the GATK HC haplotypes in annotator).
Changing color meaning: green now means benign/artifact and red means pathogenic/good candidate.
Adding link-out to varsome
Adding support for ACMG criteria annotation
Model support tests now running on factory boy.
SV filtration: do not set max count in background by default
SV filtration: display of call properties of XHMM and SV2
v0.14.8
Multiple steps towards v0.15.0 milestone.
End-User Summary
Adding link-out to the UMD Predictor (requires users to configure a UMD Predictor API Token).
Adding user settings feature.
Improving link-out to PubMed.
Adding gene display on case overview for flags and comments.
Added warning icon to results table indicating significant differences in frequency databases.
Added command to rebuild variant summary materialized view, rebuild_variant_summary.
Added ExAC and gnomAD constraint information to variant details gene card.
Displaying allelic balance in genotype hover and variant detail fold-out.
Full Change List
Added elapsed time display to import_case.
Speedup deletion of old data using SQL Alchemy for import_case.
Added indices to hgnc, mim2genemedgen, acmg and hgmd tables.
Added command to rebuild variant summary materialized view, rebuild_variant_summary.
Adding link-out to PubMed with gene symbol and phenotype term names.
Improving existing link-out to Entrez Gene if the Entrez gene ID is known.
Adding user settings through latest SODAR-core feature.
Adding ImportInfo to django admin.
Adding “New Features” button to the top navigation bar.
Adding link-out to the UMD Predictor (requires users to configure a UMD Predictor API Token).
Overlapping gene IDs now displayed for flags and comments on the case overview/detail view.
Added warning icon to results table when a frequency in a non-selected frequency table is > 0.1. Or if hom count is > 50. For inhouse it is only hom > 50 as there is no frequency.
Added ExAC and gnomAD constraint information to variant details gene card. Two new tables were added, GnomadConstraint and ExacConstraint.
Displaying allelic balance in genotype hover and variant detail fold-out.
Removing unique constraint on SmallVariant.
Fixing case update in the case of the variants being referenced from query results.
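The rule behind the frequency warning icon in this release can be sketched as below. The thresholds (frequency > 0.1, hom count > 50) come from the change list; everything else is an assumption made for illustration: the function name, the dictionary layout, the database names, and the reading that both checks apply only to non-selected databases.

```python
FREQUENCY_THRESHOLD = 0.1
HOM_COUNT_THRESHOLD = 50


def needs_frequency_warning(databases, selected):
    """Return True if a non-selected database suggests a common variant.

    `databases` maps a database name to a dict with an optional "frequency"
    (absent for in-house data, which only has counts) and a "hom" count.
    The names and the layout are hypothetical.
    """
    for name, values in databases.items():
        if name in selected:
            continue
        frequency = values.get("frequency")
        if frequency is not None and frequency > FREQUENCY_THRESHOLD:
            return True
        if values.get("hom", 0) > HOM_COUNT_THRESHOLD:
            return True
    return False


databases = {
    "exac": {"frequency": 0.001, "hom": 0},
    "gnomad_genomes": {"frequency": 0.2, "hom": 3},
    "inhouse": {"hom": 10},
}
warn = needs_frequency_warning(databases, selected={"exac"})
```

In this example the variant passes the selected ExAC filter but is frequent in a database the user did not select, which is the situation the icon is meant to flag.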
v0.14.7
End-User Summary
Bug fix release.
Full Change List
Fixed bug that inhouse frequencies were not joined to resulting table.
Removed restriction that didn’t allow pasting into number fields.
v0.14.6
End-User Summary
Adding experimental filtration of SVs.
Added names to OMIM IDs in variant detail view.
Added input check for chromosomal region filter.
User gets informed about database versions during annotation and in VarFish.
Added ClinVar information about gene and variant to variant detail view.
Added selector for preset gene filter lists (HLA, MUC, ACMG).
Added comments and flags to variant details view.
Fixed bug that transcripts in variant details view were from RefSeq when EnsEMBL was selected.
Added icon to variant when RefSeq and EnsEMBL effect prediction differ.
Adjusted ranking of genes such that equal scores get the same rank assigned.
Full Change List
Adding initial support for filtration of SVs and SV databases.
Added names to OMIM IDs in variant detail view.
Added input check for chromosomal region filter.
Made ImportInfo table not unique for release info.
Made annotation release info available in case overview.
Made import release info available in site app accessible from user menu.
Added materialized view to gather information about pathogenic and likely pathogenic variants in ClinVar. This information is displayed in the gene card of the detail view.
Added ClinVar information about variant to variant detail view.
Added selector to gene white/blacklist filter, adding common gene lists (HLA, MUC, ACMG) to the filter field.
Added comments and flags to variant details view.
Fixed bug that transcripts in variant details view were from RefSeq when EnsEMBL was selected.
Added icon to variant when RefSeq and EnsEMBL effect prediction for the most pathogenic transcript (in SmallVariant) differ.
Adjusted ranking of genes such that equal scores in two genes get the same rank assigned. In case of the pathogenicity and joint score the highest variant score in a gene represents the gene score. The next ranking gene is assigned not the next larger integer but the rank is increased by the number of genes with the same rank.
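The adjusted gene ranking described above corresponds to what is commonly called competition ranking (1, 2, 2, 4). A minimal sketch follows; the function names and input layout are assumptions, and this is not the actual VarFish code.

```python
def gene_score(variant_scores):
    """The highest variant score in a gene represents the gene score."""
    return max(variant_scores)


def rank_genes(gene_scores):
    """Assign ranks so that genes with equal scores share the same rank.

    `gene_scores` maps a gene symbol to its score (higher is better). After
    a tie, the next rank is increased by the number of tied genes.
    """
    ordered = sorted(gene_scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    current_rank = 0
    for position, (gene, score) in enumerate(ordered, start=1):
        if score != previous_score:
            # A new score starts at its 1-based position in the sorted list.
            current_rank = position
            previous_score = score
        ranks[gene] = current_rank
    return ranks


scores = {
    "GENE_A": gene_score([30.0, 12.5]),
    "GENE_B": gene_score([25.0]),
    "GENE_C": gene_score([25.0, 3.0]),
    "GENE_D": gene_score([10.0]),
}
ranks = rank_genes(scores)
```

GENE_B and GENE_C tie on rank 2, so GENE_D receives rank 4 rather than 3.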
v0.14.5
End-User Summary
Bug fix release.
Full Change List
Fixed bug that made query slow when black/whitelist filter was used.
v0.14.4
End-User Summary
Fixed bug in comp het filter.
Fixed bug in displaying correct previous joint filter query.
Fixed bug in displaying not all HPO terms.
Added OMIM terms to variant detail view.
Fixed bug in variant detail view displaying all het counts as zero.
Fixed colouring of variant effect badges in variant detail view’s transcript information.
Full Change List
Fixed bug in comp. het. filter that was caused by downstream inhouse filter.
Fixed bug that selected previous joint filter query of the user, independent of the project.
Fixed bug in displaying not all HPO terms.
Added OMIM terms to variant detail view.
Fixed bug that the het properties of the frequencies models were not returned when converted to dict.
Removing old templates.
Fixed colouring of variant effect badges in variant detail view’s transcript information.
v0.14.3
End-User Summary
Fixed bug in displaying gene info with refseq ID.
Fixed bug in displaying correct number of rows in joint query.
User interface error response improved.
Fixed “too many connections” error.
Added ACMG annotation.
Full Change List
Fixed bug in gene info with refseq ID and symbol in list is now also “rescued”.
Fixed bug in displaying correct number of rows in joint query.
Improved error response when non-existing genes are entered in white/blacklist.
Using direct database calls instead of connections to prevent connection leaking.
New table Acmg added that is joined in main query. A small icon in results indicates existence in ACMG.
v0.14.2
End-User Summary
Added strategy to display missing gene symbols
Allow importing into importinfo table without importing data.
Added misc option to hide colouring of flagged variant rows.
Improved effect filter form.
Extended gene link-outs.
Fixed bug in pheno/patho rank computation.
Improved UI responses during requests.
Full Change List
Added new table with mapping Entrez ID to HGNC ID to improve finding of gene symbols.
Allow importing of meta information of tables that have no data but are used in microservices.
Added misc option that hides colouring of flagged variant rows and also the bookmark icons.
Added checkbox group ‘nonsense’ to effect filter form to group-(un)select certain variant effects.
Added gene link-out to Human Protein Atlas.
Fixed incrementor for rank computation of phenotype and pathogenicity score ranks.
Better UI responses with extended logging during asynchronous calls.
Project overview now provides link to full cases list.
Added option to display only variants without dbSNP membership.
Adapted to SODAR Core 0.5.0
Fixed length of allowed characters for db info table name.
v0.14.1
End-User Summary
Bug fix release
Full Change List
Fixing bug in the case that no HPO term with an HpoName entry is entered.
v0.14.0
End-User Summary
Added prioritization by pathogenicity using CADD.
Added support to filter genomic regions.
Added support for querying for counts within the VarFish database.
Fixed bug that displayed variants in comphet query results twice.
Improved UI response.
Added HPO terms to variant detail view.
Full Change List
Added additional field to specify multiple genomic regions to restrict query.
Fixed mixed up sex display in genotype filter tab.
Extended SmallVariant model to have counts for hom. ref. etc.
Adding SmallVariantSummary materialized view and supporting SQL Alchemy query infrastructure.
Adding form and view infrastructure for querying against in-house database.
Fixed bug in comphet query that executed the query on the results again during fetching, which displayed variants twice.
Proper error response in asynchronous queries when server is not reachable.
Fixed broken tooltip information in results table.
Resubmitting a file export job now remembers the file type, if changed.
Added integration with in-house CADD REST API (https://github.com/bihealth/cadd-rest-api) similar to Exomiser REST API integration.
Added HPO terms to variant detail view and queried HPO terms are added to results table header.
Added tests for filter jobs, including mocks for CADD and Exomiser requests.
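As an illustration of the genomic region field mentioned in this change list, the sketch below validates and parses several regions from one text field. The accepted syntax (whitespace-separated entries of the form chrom:start-end, optional "chr" prefix, optional thousands separators) is an assumption for the example; the format actually accepted by VarFish is not specified here.

```python
import re

# Assumed syntax: optional "chr" prefix, contig name, then start-end.
REGION_RE = re.compile(r"^(?:chr)?(\w+):([\d,]+)-([\d,]+)$")


def parse_regions(text):
    """Parse whitespace-separated regions into (chrom, start, end) tuples.

    Raises ValueError on the first entry that does not match the
    assumed syntax or whose start lies after its end.
    """
    regions = []
    for token in text.split():
        match = REGION_RE.match(token)
        if match is None:
            raise ValueError(f"invalid region: {token!r}")
        chrom = match.group(1)
        # Strip thousands separators before converting to integers.
        start = int(match.group(2).replace(",", ""))
        end = int(match.group(3).replace(",", ""))
        if start > end:
            raise ValueError(f"start after end in region: {token!r}")
        regions.append((chrom, start, end))
    return regions


print(parse_regions("chr1:1,000-2,000 X:5-10"))
```

Rejecting malformed input before the query is built keeps a typing mistake from silently producing an empty result set.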
v0.13.0
End-User Summary
Adding initial version of phenotype-based prioritization using the Exomiser REST Prioritiser API.
Full Change List
Adding missing field for exon loss variant to form.
Comments in view class adjusted.
Added HPO to disease name mapping.
Phenotype match scores are added to the file downloads as well.
Sorting of variants by phenotype match added.
Added annotation of variants with phenotyping variant score.
Added tab to the form for entering HPO term IDs.
Adding settings for enabling configuring REST API URL through environment variables.
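Reading a service URL from the environment, as the last entry describes, might look like the sketch below. The variable name EXAMPLE_EXOMISER_API_URL and the helper function are hypothetical and chosen only for illustration; they are not the names used in the VarFish settings.

```python
import os


def get_prioritiser_api_url(environ=os.environ):
    """Return the configured REST API URL, or None if it is not set.

    EXAMPLE_EXOMISER_API_URL is a made-up variable name for this sketch.
    """
    url = environ.get("EXAMPLE_EXOMISER_API_URL", "").strip()
    # Drop a trailing slash so callers can append paths consistently;
    # an empty value means the integration is not configured.
    return url.rstrip("/") or None


print(get_prioritiser_api_url({"EXAMPLE_EXOMISER_API_URL": "http://localhost:8085/"}))
```

Returning None for an unset variable lets calling code disable the prioritization feature instead of failing at request time.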
v0.12.2
End-User Summary
Internal import fixes.
Full Change List
Case updating only removes variant and genotype info instead of replacing case.
Allowing import of gzipped db-info files.
v0.12.1
Bugfix release.
End-User Summary
Fix in clinvar job detail view.
Full Change List
Clinvar job detail view was partially broken and job resubmitting didn’t work.
v0.12.0
User experience improvement, tests extended.
End-User Summary
Filtering jobs can now be aborted.
Proper visual error response in forms.
Tests for all views completed.
Variant details now use full table space.
Clinvar report jobs are now using AJAX as well and are running in background.
Full Change List
Filtering jobs now run as background jobs and can be aborted.
Invalid fields and affiliated tabs are now marked with a red border.
Deleted empty files from apps.
Tests for all views completed.
Bugfix in rendering download results files for ProjectCases.
Bugfix in template for job detail view.
Bugfix in listing background jobs for a case.
Variant details do not load anymore when detail view is closed.
Variant details now use full table space.
Flags and comments do not depend on EnsEMBL gene id anymore. All traces were removed, including the database column.
Clinvar jobs now have their own background job model. They also use the AJAX query state machine to control job submission and canceling.
Now using sodar_core v0.4.5
Warning appears when Microsoft Internet Explorer is detected.
v0.11.8
Case importer command improved.
End-User Summary
Case import command registers database version that was used during annotation.
Full Change List
Case import also imports annotation release infos into new table.
Import information now also recognizes the genomebuild.
Tests for case importer.
Fixed bug that didn’t distinguish gzipped from plain text import files.
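One common way to distinguish gzipped from plain-text files is to check the two-byte gzip magic number rather than trusting the file extension. The sketch below illustrates that general technique; it is not taken from the VarFish importer, and the function name is hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzipped(path):
    """Open path for reading text, decompressing if it is gzipped."""
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == GZIP_MAGIC
    if is_gzipped:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

A caller can then iterate over lines from open_maybe_gzipped(path) in the same way for both kinds of input.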
v0.11.7
Bugfix release.
End-User Summary
Fixed yet another bug in setting SmallVariantFlags.
Full Change List
Fixing bug that variant flags are displayed no matter the case.
v0.11.6
Bugfix release.
End-User Summary
Fixed another bug in setting SmallVariantFlags.
Full Change List
Fixed bug that under certain conditions reported two variants at the same position as none and failed flag updating.
v0.11.5
Bugfix release.
End-User Summary
Database import is now a Django manage command.
Fixed bug in loading last query results.
Fixed bug in setting SmallVariantFlags.
Full Change List
Databases import is now a Django manage command and import commands are removed from the Makefile. Instead of one command for each database, a single command imports all databases stated in a config file.
Fixed bug that displayed last query of user without considering case.
Fixed bug that under certain conditions reported two variants at the same position as none and failed flag updating.
v0.11.4
This is a quick release to fix a bug in retrieving the results from a filter job. This was caused by the celery worker in the production system configuration.
End-User Summary
Zooming in QC plot is now supported.
Fixing bug in delivering filter results.
Full Change List
Replacing Chart.js components by plotly. This has the major advantage that zooming into charts is now supported. Further, users can now enable and disable plotting of certain data points by clicking. This is hugely useful for debugging meta data.
Allow skipping Selenium tests
Fixing bug with celery worker for submitting filter jobs affecting production system.
v0.11.3
This release improves the user experience by pushing filter jobs to the background and loading their results asynchronously.
End-User Summary
Push filter jobs to the background and provide them via AJAX to not block the UI during execution
Storing of filter query results
Load previous filter query results upon filter form page entry
Full Change List
Adapted to SODAR core version 0.4.2
Unified several empty forms
Adapted database query for loading previous results
Unified filter form templates
Fixed bug in accessing dict without checking availability of key.
Removed two view tests that have to be replaced in the future for ajax request.
Fixed bug in displaying time in background job list overview + ordering by timestamp
Pushing filter job to background
Loading filter results via AJAX (single case and joint project)
Loading of previous filter results when entering the filter form
v0.11.2
This is a bug fix release.
End-User Summary
Removed an internal restriction that prevented data import.
Full Change List
Making id fields for SmallVariant and Annotation into big integers.
The importer now supports gzip-ed files.
v0.11.1
Fixing frequency display, including gnomAD genomes.
v0.11.0
This release adds more textual information about genes to the database and displays it.
End-User Summary
Adding gene summaries and reference-into-function from NCBI Gene database.
Full Change List
Adding models NcbiGeneInfo and NcbiGeneInfo in geneinfo app.
Displaying this information in the gene details page.
v0.10.0
Accumulation of previous updates. The main new feature is the improved variant details card below variant rows.
End-User Summary
Fixing variant detail cards below results row.
Adding row numbers in more places.
Full Change List
Rendering variant details cards on the server instead of filling them out in JS.
v0.9.6
This release fixes project-roles synchronization from SODAR site.
Fixing celery setup; syncing projects and roles regularly now.
v0.9.5
Small additions, fixing MutationDistiller integration.
Adding link-out to loci in ENSEMBL, gnomAD, and ExAC.
Adding link-out for Polyphen 2, Human Splicing Finder, and varSEAK Splicing.
Project-wide variant recreation registers started state now correctly.
Fixing URL for MutationDistiller Links.
Using HTTPS links for ENSEMBL and MutationTaster.
v0.9.4
Yet another bug fix release.
Adding missing 5’ UTR fields to forms.
Adding command for rebuilding project stats.
Changing display color of relatedness (red indicates error).
Computing cohort statistics in a transaction. This should get rid of possible inconsistencies.
v0.9.3
This is a bug fix release.
Removing restriction on single comment per variant.
Improving display of sex errors.
v0.9.2
This is a bugfix release.
Fixing error in displaying variants statistics for empty project.
Improving relationship error display.
Putting “sibling-sibling” instead of “parent-child” where it belongs.
Fixing problem with MutationDistiller submission.
Fixing ClinVar form.
Adding gene link-out to HGMD.
v0.9.1
This release fixes some bugs introduced in v0.9.0.
Full Change List
Adding missing dependency on django_redis.
Fixing counting in project-wide statistics computation.
Fixing references to pedigree_relatedness.
Fixing sex display in template: the sex error message showed “male” where “female” should be.
Fixing sex assignment in sex scatter plot.
v0.9.0
This release adds project-wide statistics and variant querying.
End-User Summary
You can now see project-wide case QC statistics plots on your project’s Case List.
You can now perform project-wide queries to your variants and also export them to TSV and Excel files.
Full Change List
Added models for storing project-wide statistics, job code for creating this, views for viewing etc.
Adjusting the existing plot and model code to accommodate for this.
Refactoring filtration form class into composition from multiple mixins.
Refactoring small variant query model to use abstract base class and add query model for project-wide queries.
Implementing download as tabular data for project-wide filtration.
Improving index structure for project-wide queries with gene white-lists.
v0.8.0
This release adds variant statistics and quality control features.
End-User Summary
Gathering an extended set of statistics for each individual in a case.
Inconsistencies within the pedigree and between pedigree and variant information are displayed throughout the UI.
Several statistics and quality control plots are displayed on the case details page.
Full Change List
Adding var_qc_stats module with analysis algorithms similar to (Pedersen and Quinlan, 2017).
Adding models for gathering per-sample and per-sample-pair statistics.
Display statistics results on case detail page in tables and plots.
Highlighting of consistency and sanity check errors throughout the views.
Importer computes statistics for new cases, migration adds them to existing cases.
v0.7.0
This release has one main feature: it adds support for submitting variants to MutationDistiller.
End-User Summary
Added support for submitting variants to MutationDistiller from the Variant Filtration Form.
Added “Full Exome” filter preset for including all variants passing genotype filter.
Greatly sped up VCF export.
Full Change List
Adding “Full Exome” filter preset.
Adding support for submitting filtration results to MutationDistiller.
Pinning redis, cf. https://github.com/celery/celery/issues/5175
Pinning celery, cf. https://github.com/celery/celery/issues/4878
Refactoring query building to a mixin-based architecture to make the code more reusable.
Adding ExportVcfFileFilterQuery for faster VCF export.
v0.6.3
A bugfix release.
End-User Summary
Fixing bug that caused the clinvar report to fail when restoring previous query.
Full Change List
Making sure returning to clinvar report works again.
Enabling SODAR-core adminalerts app.
Including authors and changelog in manual.
v0.6.2
A bugfix release.
End-User Summary
Fixing search bug with upper/lower case normalization.
Fixed bug with whitelist/blacklist when restoring settings.
Extended documentation, added screenshots.
Previous flag state is now properly written to the timeline.
v0.6.1
End-User Summary
Adding forgotten help link to title bar.
v0.6.0
End-User Summary
Various smaller bug fixes and user interface improvements.
Adding summary flag for colouring result lines.
Allow filtering variants by flags.
Integrating flags etc. also into downloadable TSV/Excel files.
Adding new annotation: HGMD public via ENSEMBL.
Adding comments and flags now appears in the timeline.
VarFish stores your previous settings automatically and restores them on the next form view.
Full Change List
Allowing Javascript to access CSRF token, enables AJAX in production.
SmallVariant records are now also identified by the ensembl_gene_id. This fixes an annotation error.
Adding flag_summary to SmallVariantFlags for giving an overall summary.
Extending filtration form to filter by flags.
Added new app hgmd for HGMD_PUBLIC data from ENSEMBL.
Adding make black to Makefile.
Changed default frequencies.
Improving integration of comments and flags with the timeline app.
Also properly integrating import of cases etc. with timeline app.
Added SmallVariantQuery model and integrated it for automatically storing form queries and restoring them.
v0.5.0
End-User Summary
This is a major upgrade in terms of features and usability. Please note that this is a “dot zero” release; we will fix broken things in a timely manner.
Major changes include:
The “AD” form field was split into one for het. and one for hom. variants.
Clinvar entries are now properly displayed.
Enabling filtering for clinvar membership and pathogenicity.
Fixing file export.
Allowing to mark variants with flags and add comments to them.
Adding clinvar-centric report.
Filtration now also works for pedigrees containing samples without genotypes.
Adding functionality to search for samples.
Full Change List
Adding support for filtering presence in Clinvar. The user has to enable the filter and can then select the
Fixing pedigree display in filter form
Splitting “${person}_ad” field into “*_ad_het” and “*_ad_hom”, also adjusting tests etc.
Fixing clinvar queries (was a +/-1 error).
Adding more comprehensive tests for views and query.
Fixing bug in file_export module caused by not adjusting to SQL Alchemy filter querying.
Added various tests and fixed smaller bugs.
Adding VariantSmallComment and VariantFlags models for user annotation of variants.
Adding clinvar-centric support for easily screening variants for relevant Clinvar entries.
The importer now also writes "has_gt_fields" key to Pedigree lines.
The templates, views, and query generation now also heed "has_gt_fields" field.
Adding migration that automatically adds the "has_gt_fields".
Adding back display of search bar.
Integrating search functionality for variants app.
Self-hosting CSS, JS, etc. now.
Adding search_tokens to Case with lower-case IDs.
v0.4.0
End-User Summary
This is the first release made available to the public. Major features include
Categories and projects as well as access control assignments are taken from the main SODAR site. Organizing projects and users is done in the main SODAR site.
Variant filtration can be done on a large number of attributes. This includes a specialized compound recessive filter.
Filtration results can be converted into TSV/XLSX files for opening in Excel or VCF for further processing.
Full Change List
Sodar-core integration for user and project management
Download of filter results in TSV, VCF or EXCEL file format
SQLAlchemy replaces raw query generation for filter queries
Heterozygous database entries of frequency databases are now properties of the model
UI improvements
Updated and completed database query tests
Refinement of indices and queries improves filter query performance
Simplifying import from gts TSV, vars TSV, and PED file in one go
Glossary
- AAB
Alternate Allele Balance, computed as min(AD/DP, 1 - AD/DP), e.g., 3/10 reads have an AAB of 0.3, as do 7/10 reads.
- ACGS
Association for Clinical Genomic Science
- ACMG
American College of Medical Genetics and Genomics
- AD
Alternative Depth, number of reads showing alternative allele.
- ClinVar
A database of variants with their clinical annotation.
- CADD
Combined Annotation Dependent Depletion, a variant pathogenicity score available from https://cadd.gs.washington.edu
- DP
Depth of coverage, number of reads covering a position.
- ENSEMBL
TODO
- Entrez
TODO
- Exomiser
TODO
- IGV
Integrative Genomics Viewer
- HiPhive
TODO
- HTS
High-Throughput Sequencing
- MEDLINE
The most relevant bibliographic database for the life sciences.
- MutationDistiller
A variant pathogenicity tool available at https://mutationdistiller.org
- MutationTaster
A variant pathogenicity tool available at https://mutationtaster.org
- NCBI
TODO
- OMIM
Online Mendelian Inheritance in Man
- Phenix
TODO
- Phive
TODO
- PubMed
A free search engine primarily accessing the MEDLINE database of references
- QC
Quality Control
- SNV
Single Nucleotide Variant
- SOP
Standard Operating Procedure
- UCSC
University of California, Santa Cruz; hosting the very popular UCSC genome browser
- UMD Predictor
A variant pathogenicity prediction tool available at https://umd-predictor.eu
- Varsome
A commercial website/product that aggregates information about a variant and allows the public annotation of variants; available at https://www.varsome.com
- WES
Whole Exome Sequencing
- WGS
Whole Genome Sequencing
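As a worked illustration of the AAB entry in this glossary, the formula min(AD/DP, 1 - AD/DP) can be written as a small function. The function name and the input validation are additions for the example, not part of VarFish.

```python
import math


def alternate_allele_balance(ad, dp):
    """Compute AAB = min(AD/DP, 1 - AD/DP).

    ad is the number of reads showing the alternative allele and dp is
    the total depth of coverage at the position.
    """
    if dp <= 0:
        raise ValueError("dp must be positive")
    if not 0 <= ad <= dp:
        raise ValueError("ad must be between 0 and dp")
    fraction = ad / dp
    return min(fraction, 1 - fraction)


# Both 3/10 and 7/10 alternative reads give an AAB of 0.3, up to
# floating-point rounding, matching the glossary example.
print(math.isclose(alternate_allele_balance(3, 10), 0.3))
print(math.isclose(alternate_allele_balance(7, 10), 0.3))
```

Because the value is symmetric around 0.5, it measures how far a call deviates from a balanced heterozygous site regardless of which allele dominates.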