VarFish User Manual

VarFish is a system for the filtration of variants. Currently, the main focus is on small/sequence variants called from high-throughput sequencing data (in contrast to structural variants).

The VarFish home screen.

The VarFish global “home” screen showing the demo project with the Corpas family quartet.

Important

VarFish is software for research use only.

Project Access Control

In VarFish, access to data is organized in Projects. Projects can be grouped into (possibly nested) Categories. Users are assigned Roles in projects and get access to the project’s data through these role assignments.

Projects can either be local or come from a central SODAR site. In the first case, the project owners can change the users assigned to a project through the VarFish site itself. In the second case, user assignment is performed in the central SODAR site.

Project Details

When selecting a project, you are directed to its Details view. Here, you can see its README information, an overview of the other VarFish components, the project timeline, and background jobs.

Note

The overview page will only display the five most recent entries in each category. You can reach the full information using the corresponding buttons in the left navigation bar.

Most importantly, use the Cases link to see the full list of cases in a project.

Details view for the demo project.

This figure shows the project details of the demo project. On the left, you can see the navigation bar to the different apps active for the project. On the right, you can see the project overview showing up to the five most recent entries only.

VarFishApp Overview

See the latest imported cases.

Project Timeline

Keep track of recent project settings.

Background Jobs App Overview

Background job status, e.g., for file exports.

User Menu

The user / profile menu gives access to site-wide pages/views (called site-wide apps). Currently, the following ones are accessible (see also the picture below).

_images/user_menu.png
User Name

Displays the name of the currently logged-in user and the account name.

Import Release Info

Display information about the background database releases.

API Tokens

Manage API (application programming interface) tokens for programmers.

User Profile

Configure your settings in the User Profile view.

Log Out

Log out of SODAR and redirect to the login screen.

API Token Management

This page allows you to manage API tokens. This feature is interesting if you want to use software (or develop some yourself) that interfaces with SODAR programmatically, or if you want to use the API import feature of VarFish to easily import your cases.

_images/api_tokens.png

You can create API tokens with the Create Token button. Each token can be deleted through the little cog button towards the right of the token list. In the token list, you can see the time of creation, the expiry time, and the first 8 characters of the key.

Please note that after creating the token, you will only be able to see the first 8 characters again (for reidentification). For security reasons, the token itself is stored only after being passed through a one-way hash function. It is possible to check whether a given token matches the one in the database, but it is not possible to retrieve a lost token. Instead, you would discard the old one from VarFish and create a new one. Note that the token is completely independent from any token that you might obtain from a CUBI-developed or hosted web app (in particular, it is separate from any SODAR API token).

Please also note that if you create and use an API token then, currently, whoever bears your token has the same permissions to the SODAR system through the API as your user. Limiting the scope of a token is on the list of future features, but currently this has not been implemented.

On creation, you can choose the number of hours for which the token should be valid. Using an expiry time of 0 will make the token never expire.
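The behaviour described above (only a prefix and a one-way hash are kept, and an expiry of 0 hours means "never") can be illustrated with a minimal sketch. This is not the VarFish implementation: the use of SHA-256, the record layout, and the function names `create_token` and `token_is_valid` are assumptions made purely for illustration.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import Tuple


def create_token(ttl_hours: int, now: datetime) -> Tuple[str, dict]:
    """Return the plain token (shown once) and the record kept server-side."""
    token = secrets.token_hex(32)
    record = {
        # The first 8 characters are kept for re-identification in the list.
        "prefix": token[:8],
        # Only a one-way digest is stored; the token cannot be recovered.
        "digest": hashlib.sha256(token.encode()).hexdigest(),
        # An expiry time of 0 hours means the token never expires.
        "expires": None if ttl_hours == 0 else now + timedelta(hours=ttl_hours),
    }
    return token, record


def token_is_valid(candidate: str, record: dict, now: datetime) -> bool:
    """Check a presented token against the stored digest and expiry."""
    digest = hashlib.sha256(candidate.encode()).hexdigest()
    if not hmac.compare_digest(digest, record["digest"]):
        return False
    return record["expires"] is None or now < record["expires"]
```

Because only the digest is stored, a given token can be checked but a lost token cannot be retrieved, which is why a new token has to be created in that case.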

User Profile

The user profile screen displays information about your account to you. Further, you can change global settings by using the Update link in the Settings box.

_images/user_profile.png

Currently, you can adjust the following settings:

UMD Predictor API Token

To enable variant pathogenicity scoring with the UMD predictor, add your API token here. For more information see umd-predictor.eu.

GA4GH Beacon Network Widget

Opt in to displaying the Global Alliance for Genomics and Health Beacon Network Widget. This allows you to query the beacon network for variants that you see in your cases.

Changelog seen in version

This value stores the VarFish version in which you last clicked the New Features! button. In a future version, this setting will be hidden from normal users.

Display project UUID copying link

Whether or not to display a little icon next to the project name for easy copying of the project UUID to the clipboard.

Import Release Info

This screen displays the release version information of the background databases that have been imported into VarFish. These are databases such as gnomAD or OMIM that are used for enriching variant annotation when displaying variant results.

_images/import_release_info.png

Please note that the cases are annotated independently of the VarFish web server. The databases used to annotate individual case VCFs can be found in the case overview.

IGV Configuration

For each variant in the result set, VarFish provides you with an IGV link that will show the locus of the variant in the IGV browser window. For this, you must have IGV running locally and properly configured.

To configure IGV, open the preferences window by first clicking the View menu entry and then the Preferences menu entry. Select the Advanced tab. There, make sure that the Enable port checkbox is ticked and the port value is set to 60151. Finally, click OK to save your changes.

The settings are also illustrated in the following figure.

_images/igv_proxy_settings.png
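To check from a script whether your local IGV accepts commands on the configured port, a sketch along the following lines can help. It assumes that IGV answers HTTP-style requests of the form `/goto?locus=...` on port 60151; this URL form is an assumption about IGV's port interface, and the links generated by VarFish may look different. The helper names are hypothetical.

```python
import http.client
from urllib.parse import urlencode
from urllib.request import urlopen

IGV_PORT = 60151  # port value set in the IGV preferences described above


def igv_goto_url(chrom: str, pos: int, port: int = IGV_PORT) -> str:
    """Build a local URL asking IGV to jump to a single position."""
    query = urlencode({"locus": f"{chrom}:{pos}"}, safe=":")
    return f"http://127.0.0.1:{port}/goto?{query}"


def igv_goto(chrom: str, pos: int, timeout: float = 2.0) -> bool:
    """Return True if IGV answered the request, False if it is unreachable."""
    try:
        with urlopen(igv_goto_url(chrom, pos), timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # IGV is not running, the port is disabled, or the reply was not HTTP.
        return False
```

If `igv_goto` returns False although IGV is running, the Enable port checkbox and the port value are the first things to verify.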

VarFish Kiosk Tutorial

This is the tutorial for the VarFish Kiosk mode. It walks you through uploading a VCF into VarFish Kiosk and analyzing it using the filtration and prioritization features of VarFish. Finally, you download the result as an Excel file.

Note

The VarFish Kiosk Mode

VarFish can be run in two modes of operation: (1) The “classic” mode is available on site-specific installations and has all the features, including multi-case projects and collaboration between multiple users. (2) The “kiosk” mode is available centrally at https://varfish-kiosk.bihealth.org. Here, you can upload your cases for analysis; all data will be discarded after 2 weeks.

In the case of any questions or problems, don’t hesitate to contact Manuel Holtgrewe.

Download Example Data

Please first download the pfeiffer-singleton.vcf.gz file. This is the exome data published into the public domain by Manuel Corpas as the corpasome with a spike-in of the variant FGFR2:p.Glu566Gly which is associated with Pfeiffer syndrome in ClinVar. The example data is taken from the Exomiser project.

Data Upload

Next, navigate to https://varfish-kiosk.bihealth.org/ and you will be presented with the following screen.

_images/kiosk_tutorial_file_upload.png

Select the previously downloaded pfeiffer-singleton.vcf.gz as VCF File and click Submit. Optionally, you can upload a pedigree information file with the PED File field or give its text content in the text field PED Text below. The example VCF file only contains one sample so this is not necessary here.

Note

PED Files

If you have a VCF file with multiple samples and do not specify a PED file, then no family information will be available subsequently. The PED file format is as follows. Each line gives the information of one pedigree member and has the following fields (separated by spaces): family ID (if uncertain, put FAM), name of person, name of father (0 if founder), name of mother (0 if founder), specification of sex (1 for male, 2 for female, 0 if unknown), specification of disease (1 for unaffected, 2 for affected, 0 if unknown). The sample names must match the names in the VCF file! For example, a trio (with a male affected child and unaffected parents) could look as follows:

FAM index  father mother 1 2
FAM father 0      0      1 1
FAM mother 0      0      2 1
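If you want to check a PED file before uploading it, the six-column format described above is easy to parse. The following is a minimal sketch; the names `PedMember`, `parse_ped`, and `check_parents` are hypothetical and not part of VarFish.

```python
from typing import List, NamedTuple


class PedMember(NamedTuple):
    family: str
    name: str
    father: str  # "0" if founder
    mother: str  # "0" if founder
    sex: int  # 1 male, 2 female, 0 unknown
    disease: int  # 1 unaffected, 2 affected, 0 unknown


def parse_ped(text: str) -> List[PedMember]:
    """Parse whitespace-separated PED lines, skipping empty lines."""
    members = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(fields)}")
        family, name, father, mother, sex, disease = fields
        members.append(PedMember(family, name, father, mother, int(sex), int(disease)))
    return members


def check_parents(members: List[PedMember]) -> List[str]:
    """Report parents that are missing from the pedigree or have the wrong sex."""
    by_name = {m.name: m for m in members}
    problems = []
    for m in members:
        for parent, expected_sex, role in ((m.father, 1, "father"), (m.mother, 2, "mother")):
            if parent == "0":
                continue  # founder, nothing to check
            if parent not in by_name:
                problems.append(f"{m.name}: {role} {parent} not in pedigree")
            elif by_name[parent].sex != expected_sex:
                problems.append(f"{m.name}: {role} {parent} has unexpected sex")
    return problems
```

For the trio shown above, `check_parents(parse_ped(text))` returns an empty list.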

After hitting Submit, your file will be uploaded, sorted, and annotated.

_images/kiosk_tutorial_upload_process.png

In the case of failure, an error message will be displayed. Otherwise, you will be redirected to the case screen.

Note

Data Security in VarFish Kiosk

VarFish Kiosk is a login-less service. This allows you to get started with VarFish quickly but there is (currently) no way to display all of your uploaded cases etc. Instead, you have to copy and save the address of your case after upload to retrieve it. The URL is virtually impossible to guess. You can simply pass on a case that you uploaded by sending the address in an email. Similarly, you have to be careful not to publish the case address as anyone with the case URL can access the case. Data on varfish-kiosk.bihealth.org will be removed after two weeks and measures are in place to block users found trying to guess case URLs.

Depending on your local legislation and the consent of your data, VarFish Kiosk might not be suitable to analyze your clinical data.

Case Overview

In the case screen, you can find information about your case. Once you annotate variants with flags or text, this information will be displayed here as well.

_images/kiosk_tutorial_case_overview.png
  • Overview shows an overall overview of your case.

  • Quality Control shows quality control measures derived for your case, similar to the Peddy method.

    • Relatedness allows you to validate whether the members of your pedigree are related as stated.

    • Rate of het. calls on chrX allows you to do a rough check of biological sex based on variant calls on the X chromosome.

    • Depth and heterozygosity gives insight into the coverage and ratio of heterozygous variant calls.

    • Variant types shows variant counts by variant type.

    • Variant effects shows a histogram of variants by predicted molecular effect.

    • Indel sizes shows the distribution of the sizes of indels in your data.

  • Variant Annotation shows your manual annotation of variants in your case.

After quality control, you can cut straight to the chase and click Filter Variants on the top right. This will bring you to the variant filtration screen.

Variant Filtration

It is best to start out with a Quick Preset. Let us assume a dominant mode of inheritance for our case. Click Load Presets –> dominant, which will select values that are a good starting point:

  • The maximal allowed population frequency will be set to ~0.2%.

  • Variant quality restrictions are set to relatively strict values.

  • Variants are limited to those where an amino acid change or change in splicing is predicted.

Click through the Frequency tab and the entries below More… to inspect the different filter options. You can quickly adjust the settings for individual categories such as Frequency or Impact by using the dropdown of the corresponding category. Once you perform such a change, the corresponding settings pane is displayed and you can see the effect of your action or perform further fine-adjustments.

Once you are happy with your selection (we recommend that you go back to defaults for dominant mode of inheritance with Load Presets –> dominant), click Filter & Display to start querying.

Note

Query Speed

The time a query takes to complete is proportional to the number of returned variants. It is thus recommended to start with relatively strict filter settings and screen the resulting variants. If you are unhappy with the results, then relax the settings to obtain more results. In our hands, this proved to be the most time-efficient way.

_images/kiosk_tutorial_filtration_results.png

After some patience, you will be shown your resulting list of variants.

Note

Result Record Count

Note that by default the number of records to display is limited to 200. You can adjust this at More… –> Miscellaneous, but this comes with longer query times and places a heavier burden on your browser.

Below we show the anatomy of a result line:

_images/kiosk_tutorial_filtration_results_detail.png
  1. Click to expand for more details about the variant.

  2. Click the flag or comment symbol to flag the variant or add comments. Flagged or commented variants are marked with filled out symbols. The gray field next to those symbols opens the ACMG criteria form and will be filled with a number and color response.

  3. The first symbol in this group of three symbols marks if the variant is seen in dbSNP. The second symbol marks if the variant is seen in ClinVar, while the third symbol marks if the variant is seen in HGMD.

  4. The starting position of this variant.

  5. Reference and alternative allele of this variant.

  6. Frequency, number of homozygotes, and pLI score from ExAC (by default). This can be changed to another frequency database such as gnomAD or 1000G at the top of the results list.

  7. The gene name along with a dropdown menu for link-outs to several services for more information about the gene.

  8. A red doctor symbol right next to the gene name indicates whether this gene is listed in the ACMG incidental findings list.

  9. The protein effect for this variant.

  10. The genotype for each variant and member of the pedigree.

  11. Look up this variant in MutationTaster (MT), jump to the position in your IGV browser, or query other services for this variant.

Variant Prioritization

With our filter settings, we got 126 variants from the query. Of course, it is not feasible to review all of these variants. Instead, it is state of the art to obtain pathogenicity prediction scores for one's variants (e.g., using CADD or MutationTaster) and also to compare the phenotypes of the gene that a variant affects to the phenotypes of your patient.

Note

Query Performance, Again

Pathogenicity and (in a less pronounced fashion) phenotype similarity computation will increase your query times. Try to first filter without scores and then activate the prioritization on not more than a few hundred resulting variants.

_images/kiosk_tutorial_prioritization.png

Click Prioritization to show the prioritization options. Next, enable variant pathogenicity prioritization and switch it to CADD. Then, enable phenotype prioritization and select HiPhive (human only). We don’t have real patient information for the spiked-in variant but the HPO website tells us that Pfeiffer syndrome includes the following phenotypes: HP:0004440; HP:0003196; HP:0000244; HP:0000218. Just copy and paste these HPO terms into the HPO Terms field.
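When you keep HPO term lists in your own notes or scripts, it can be handy to validate them before pasting. The sketch below splits a term list and checks that each entry has the form `HP:` followed by seven digits. Which separators the HPO Terms field itself accepts is not stated here, so the separators handled below (semicolons, commas, whitespace) are an assumption, and `parse_hpo_terms` is a hypothetical helper.

```python
import re
from typing import List

HPO_PATTERN = re.compile(r"HP:\d{7}")


def parse_hpo_terms(text: str) -> List[str]:
    """Split a term list and validate each entry, keeping the input order."""
    terms = [t for t in re.split(r"[;,\s]+", text) if t]
    invalid = [t for t in terms if not HPO_PATTERN.fullmatch(t)]
    if invalid:
        raise ValueError(f"not valid HPO term IDs: {invalid}")
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(terms))
```

For the Pfeiffer syndrome terms listed above, the function returns the four term IDs in the order given.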

Finally, again hit Filter & Display to run the query with prioritization enabled. After waiting a few seconds, you will see the results and the spiked-in variant should be on the top.

_images/kiosk_tutorial_prioritization_results.png

We now go on to flag it as the final causative variant with good phenotype match…

_images/kiosk_tutorial_flagging.png

… and also perform an assessment of the variant following the ACMG guidelines.

_images/kiosk_tutorial_acmg.png

After flagging, commenting and assigning an ACMG rating, the resulting row will be highlighted.

_images/kiosk_tutorial_variant_highlighted.png

To get an overview of your flagged and commented variants for the whole case, go to the case overview by clicking Back to Case and then switch to the Variant Annotation tab.

_images/kiosk_tutorial_case_annotated.png

Finally, we go to the case overview by switching back to the Overview tab and mark the case as solved.

_images/kiosk_tutorial_closing_case.png

Export Results As Excel File

To export your results as an Excel file, go to the filter form again. Instead of clicking Filter & Display, click the arrow right next to it. This will open a dropdown menu. Selecting Download as File will start the export and redirect you to the status page of the export process.

_images/kiosk_tutorial_download_xlsx.png

The export will take a moment. The page does not refresh automatically, so please click Refresh page every once in a while. The process logs are displayed at the end of the page.

_images/kiosk_tutorial_download_xlsx_process.png

Once the export has finished, you will be offered a link to download the resulting file.

_images/kiosk_tutorial_download_xlsx_result.png

Closing Remarks

This is the end of this tutorial.

  • A good next step is to try this again with the following quartet VCF file which is again based on the public Corpasome data having the Pfeiffer variant spiked into one of the children as a de novo variant. You can use the following pedigree information:

    FAM index   father mother 1 2
    FAM sibling father mother 1 1
    FAM father  0      0      1 1
    FAM mother  0      0      2 1
    

    After uploading the data and selecting Load Presets –> dominant, identifying the variant should be quick.

  • Another good next step is going through this manual. You can navigate using the links on the left.

  • While VarFish Kiosk is nice for ad-hoc analysis of single VCF files, we recommend that sites anticipating a higher throughput perform a dedicated installation of VarFish Classic. This documentation also contains instructions for the installation, but this will require fast server hardware and knowledge of Linux server administration.

Variants & Cases

The variants are assigned to Cases. Use the Cases link on the left to see all cases in a project. Then, click on the case name to go to the case’s detail view.

Case Detail View

On the case detail view, you can see the following information:

Details

Case detail information such as creation date, case name, and names of individuals.

Pedigree

The full pedigree information, including whether variants are present for each individual (i.e., whether the individual was sequenced).

Flagged Variants

The variants flagged in the individual.

Comment Variants

The variants that were commented in the individual.

Background Jobs Overview

List of background jobs for this case, e.g., for file export generation.

The details view for a case.

The case details view for the demo case. Note the details on the different aspects of the case and in particular the Filter Case and ClinVar Report buttons on the top right.

Case Detail View Actions

On the top right, you can see the following button:

Filter Case

This takes you to the Variant Filtration view. Here you can filter the case’s variants by a multitude of criteria including genotype, call quality, and variant effect.

Variant Statistics & QC

VarFish provides you with integrated tools for quality control (QC) of your variant calls. When importing your cases, VarFish will compute statistics about the variants in your cases and check them against your pedigrees.

Sex & Relation QC

The first consistency check performed is whether all individuals used as father or mother in your pedigree have the appropriate sex. In the case of any issue, little red icons are displayed in your case listings and pedigree displays.

Sex and relationship problems displayed in case list.

Example for sex and relationship problems displayed in the case list. The little “venus-mars” icon indicates a problem with sex assignment, the little “people” icon indicates a problem with relationship.

The second check that is performed is computing the ratio of het./hom. calls on the X chromosome outside of the pseudoautosomal regions. This ratio should be small for males and large for females. Male samples whose ratio is above 1.0 and female samples whose ratio is below 1.0 are flagged as erroneous. In the case of problems, little red icons are displayed in the same way as with the incorrect parent-sex assignment described above.
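The chrX check can be sketched as follows, assuming that the het. and hom. call counts outside the pseudoautosomal regions have already been determined. The function names are hypothetical, and the handling of samples without homozygous calls is an assumption of this sketch.

```python
from typing import Optional

THRESHOLD = 1.0  # het./hom. ratio threshold given in the text


def chrx_het_hom_ratio(n_het: int, n_hom: int) -> float:
    """Ratio of heterozygous to homozygous calls on chrX (outside the PARs)."""
    if n_hom == 0:
        # Assumption: without homozygous calls, treat the ratio as unbounded.
        return float("inf") if n_het > 0 else 0.0
    return n_het / n_hom


def sex_is_consistent(pedigree_sex: str, ratio: float,
                      threshold: float = THRESHOLD) -> Optional[bool]:
    """True/False for male/female samples, None if the sex is unknown."""
    if pedigree_sex == "male":
        return ratio <= threshold  # flagged only if above the threshold
    if pedigree_sex == "female":
        return ratio >= threshold  # flagged only if below the threshold
    return None
```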

The third check that is performed is looking at the relationship of your parent-child and sibling-sibling pairs in each pedigree. A relationship ratio is computed as well as the IBS0 value according to Pedersen & Quinlan (2018). The relationship ratio is higher for closely related individuals (about 0.5 for parent-child and sibling-sibling pairs). The IBS0 value is the number of variant sites at which the two individuals do not share any allele. This value should be close to 0 for parent-child relations and also small for siblings.
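To make the IBS0 definition concrete, here is a minimal sketch that counts sites without a shared allele for one pair of samples. It assumes biallelic sites with genotypes encoded as alternative allele counts (0, 1, 2) and `None` for no-calls; this encoding and the function name are assumptions, and the relatedness coefficient itself is not computed here.

```python
from typing import Optional, Sequence


def ibs0(gt_a: Sequence[Optional[int]], gt_b: Sequence[Optional[int]]) -> int:
    """Count sites where two samples share no allele.

    At a biallelic site this is the case exactly when one sample is
    homozygous reference (0) and the other homozygous alternative (2).
    """
    count = 0
    for a, b in zip(gt_a, gt_b):
        if a is None or b is None:
            continue  # skip no-calls
        if {a, b} == {0, 2}:
            count += 1
    return count
```

A parent and child share one allele at every site, so calls following Mendelian inheritance never contribute to this count for such a pair.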

QC Plots

Further, the case details view displays six plots helpful for variant quality control.

Example for the six statistics & QC plots.

The six statistics and QC plots described in this section.

Relatedness vs. IBS0

For each sample pair in your pedigree, this plot shows the relatedness coefficient vs. the IBS0. Parent-child relationships should cluster at the top-left. The sibling-sibling relationships should follow a bit further towards the right. Unrelated individuals (e.g., parents in non-consanguineous families) should display on the lower right.

Rate of het. calls on chrX

This plot displays the rate of heterozygous over homozygous variants on the X chromosome outside of the pseudoautosomal regions. This count is displayed for samples classified as male, female, and unknown in the pedigree. Values falling on the wrong side of the threshold of 1.0 described in Sex & Relation QC are colored red.

Depth and Heterozygosity

This plot displays the fraction of heterozygous calls vs. the median depth. Depth outliers are colored blue while ratio outliers are colored red. Values are counted as outliers if they are more than 3 inter-quartile ranges from the median. Keep this in mind when interpreting these plots.
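The outlier rule (more than 3 inter-quartile ranges from the median) can be sketched as below. The text does not say which quartile method is used, so the default method of Python's `statistics.quantiles` is an assumption here, as is the function name.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], n_iqr: float = 3.0) -> List[bool]:
    """Flag values lying more than n_iqr inter-quartile ranges from the median."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    median = statistics.median(values)
    return [abs(v - median) > n_iqr * iqr for v in values]
```

Note that with only a handful of samples the quartiles are poorly determined, which is one reason to interpret flagged outliers with care.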

Variant Type Histogram

For each sample, the number of called on-exome SNVs, indels and MNVs is displayed. Note that some variant callers such as the widely used GATK tools do not call MNVs but break them up into individual SNVs. Thus the MNV count will be 0 in many cases.

Variant Effect Histogram

For many relevant variant effect classes, the absolute frequency in on-exome variants is displayed in this histogram for each sample.

Indel Size Histogram

The number of bases deleted (negative) and inserted (positive) from 1 to 20 is displayed in this histogram for each sample.

QC Metrics

Variant Filtration

This view allows you to filter variants by a number of criteria. Further, you can trigger an export of the variants with your current criteria to a downloadable VCF, Excel, or TSV file.

You can open the variant filtration view for each case by first navigating to the case’s detail page and then clicking the Filter Case button on the top right.

On the top of the page, you can see the Variant Filtration Form for setting the parameters for creating your filtration. Below, the results will be displayed after submitting the form.

Note

VarFish will store every query that you make. When loading the filtration form, your previous form settings will be restored and a notification will be displayed to notify you of this.

Note

The implementation of the variant filter in VarFish is monolithic as we use the data from the user submitted form to compile a single, rather large, SQL query from it. This enables us to have a very efficient (in terms of computing time and resources) filtering step. The downside of this is that we can’t track how many variants are actually filtered out by which filter setting.

Variant Filtration Form

Note

As in many places, VarFish offers in-place online help: Move your mouse cursor over any item to display its tooltip description (if it has any).

The form has the following components. Note that some form tabs will be hidden below the More… tab depending on your screen size.

  • Genotype tab

  • Frequency tab

  • Variants & Effects tab

  • Quality tab

  • Gene Lists tab

  • Flags & Comments tab

  • ClinVar & HGMD tab

  • Configure Downloads tab

  • Miscellaneous tab

  • Filter Import Export tab

  • Load Presets button

  • RefSeq / ENSEMBL switch

  • Filter & Display button
    • The little triangle on the right gives access to the Download as File and Submit to MutationDistiller menu entries.

Genotype

The Genotype form tab on the Variant Filtration form.

In this tab, the individuals of your pedigree are displayed with their name, father and mother, sex, and disease state.

Here, you can configure the genotype pattern that you want to query for. The Genotype column contains select fields for each of your pedigree individuals. The value meanings are:

any (default)

Any genotype is allowed.

0/0

The genotype of this individual should be reference.

0/1

The genotype of this individual should be heterozygous.

1/1

The genotype of this individual should be homozygous alternative.

variant

The genotype of this individual should be heterozygous OR homozygous alternative.

non-variant

The genotype of this individual should be reference or no-call (./.).

non-reference

The genotype of this individual should be heterozygous OR homozygous alternative OR no-call (./.).
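The meaning of the select field values can be summarized as a lookup from each value to the set of genotypes it allows. The following sketch is only an illustration of the list above, not the VarFish query code; it handles biallelic genotype strings and the helper names are hypothetical.

```python
from typing import Dict, Optional, Set

# Allowed genotypes per select field value; None means "no restriction".
GENOTYPE_CHOICES: Dict[str, Optional[Set[str]]] = {
    "any": None,
    "0/0": {"0/0"},
    "0/1": {"0/1"},
    "1/1": {"1/1"},
    "variant": {"0/1", "1/1"},
    "non-variant": {"0/0", "./."},
    "non-reference": {"0/1", "1/1", "./."},
}


def normalize(genotype: str) -> str:
    """Treat phased and unphased calls alike, e.g. '1|0' becomes '0/1'."""
    return "/".join(sorted(genotype.replace("|", "/").split("/")))


def genotype_matches(choice: str, genotype: str) -> bool:
    allowed = GENOTYPE_CHOICES[choice]
    return allowed is None or normalize(genotype) in allowed


def pattern_matches(pattern: Dict[str, str], genotypes: Dict[str, str]) -> bool:
    """Check the chosen value of every pedigree member against one variant."""
    return all(genotype_matches(choice, genotypes[name])
               for name, choice in pattern.items())
```

For example, a de novo pattern for a trio could be expressed as `{"index": "0/1", "father": "0/0", "mother": "0/0"}`.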

Further, you can check the enable comp. het. mode checkbox. In this case, the values of the Genotype column’s select fields are ignored. Instead, the list of variants will be filtered as follows:

  1. All variants are filtered according to the remaining tabs of the filtration form (all except Genotype).

  2. Two sets of variants are created:
    1. A paternal set with variants that are in heterozygous state in both the index and the father and which are reference in the mother.

    2. A maternal set with variants that are in heterozygous state in both the index and the mother and which are reference in the father.

  3. For each gene occurring in either set, the number of variants is counted, leading to a paternal count and a maternal count for each gene.

  4. Only those genes where both the paternal and the maternal count are above zero are kept.

  5. All variants from the two sets that lie in the kept genes are reported. This can include genes where the paternal or maternal count is above one.
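The compound heterozygous steps above can be sketched as follows for variants that already passed the remaining filters (step 1). The input layout (dicts with a `gene` entry and a `gt` mapping from sample name to genotype) and the function name are assumptions for illustration; VarFish itself performs this filtering in its database query.

```python
from collections import defaultdict
from typing import Dict, List


def comp_het_candidates(variants: List[dict], index: str,
                        father: str, mother: str) -> List[dict]:
    """Return variants in genes that have both a paternal and a maternal hit."""
    paternal: Dict[str, List[dict]] = defaultdict(list)
    maternal: Dict[str, List[dict]] = defaultdict(list)
    for var in variants:
        gt = var["gt"]
        if gt[index] != "0/1":
            continue  # the index must be heterozygous in both sets
        if gt[father] == "0/1" and gt[mother] == "0/0":
            paternal[var["gene"]].append(var)
        elif gt[mother] == "0/1" and gt[father] == "0/0":
            maternal[var["gene"]].append(var)
    result = []
    # Keep only genes where both counts are above zero.
    for gene in sorted(paternal.keys() & maternal.keys()):
        result.extend(paternal[gene] + maternal[gene])
    return result
```

A gene with two paternal and one maternal variant would contribute all three variants to the result.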

Note

The compound heterozygous mode currently only works if you have a full trio in your data set (father/mother/child). Further, only the genotypes of these three individuals will be considered in the filtration.

Frequency

The Frequency form tab on the Variant Filtration form.

Here you can filter variants by their relative frequency in variation databases or by how often they occur in heterozygous or homozygous state. The population databases are 1000 Genomes Phase 3, ExAC, gnomAD exomes, and gnomAD genomes. You switch a population on or off for consideration with the little checkbox on the left.

The column Homozygous count limits the maximal number of occurrences of a variant in homozygous state for each database. For example, when setting 10 for 1000 Genomes, all variants occurring 11 times or more often in homozygous state in the 1000 Genomes dataset will be excluded. The Heterozygous count field works the same way but for the number of occurrences in heterozygous state.

The Frequency field works as follows. Here, you specify the maximal frequency in any sub population of the given database. For example, setting 0.01 for ExAC, you will exclude all variants occuring with a higher frequency than 1% in any sub population, e.g., if the variant has 2% in the African ExAC samples and 0.1% in the European samples, then it will be excluded.

In all homozygous/heterozygous/frequency fields, you can disable the corresponding filter by leaving the field empty.
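For a single population database, the three limits can be sketched like this. The input layout (`freq_by_pop`, `hom`, `het`) and the function name are hypothetical; `None` stands for an empty form field, which disables the corresponding filter.

```python
from typing import Optional


def passes_frequency(entry: dict, max_freq: Optional[float] = None,
                     max_hom: Optional[int] = None,
                     max_het: Optional[int] = None) -> bool:
    """Apply the frequency and count limits of one database to one variant."""
    if max_freq is not None:
        # The highest frequency in any sub population is what counts.
        if max(entry["freq_by_pop"].values(), default=0.0) > max_freq:
            return False
    if max_hom is not None and entry["hom"] > max_hom:
        return False
    if max_het is not None and entry["het"] > max_het:
        return False
    return True
```

With the ExAC example above (2% in African and 0.1% in European samples, limit 0.01), the function returns False because the maximum over sub populations exceeds the limit.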

Variants & Effects

The Variants & Effects form tab on the Variant Filtration form.

This tab allows for the fine-granular selection of variants based on the variant effects.

The Variant Types section allows you to select whether to include SNVs (single nucleotide variants, e.g., A>C), Indels (insertions or deletions, e.g., AC>T, A>CT, ACT>GG), or MNVs (multi-nucleotide variants where reference and alternative allele have the same number of bases and more than one base is affected, e.g., CC>TT, CCC>TTT).
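The three variant types follow directly from the lengths of the reference and alternative alleles. A minimal sketch (the function name is hypothetical):

```python
def variant_type(ref: str, alt: str) -> str:
    """Classify a variant as 'snv', 'mnv', or 'indel' by allele lengths."""
    if len(ref) != len(alt):
        return "indel"  # e.g. AC>T, A>CT, ACT>GG
    if len(ref) == 1:
        return "snv"  # e.g. A>C
    return "mnv"  # same length, more than one base, e.g. CC>TT
```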

The Transcript Type section allows you to select whether to include coding and/or non-coding variants.

In the Detailed Effects section, you can perform selection of variants on the finest level of granularity. The Effect Groups allow you to quickly select and unselect fields from the Detailed Effects section.

Quality

The Quality form tab on the Variant Filtration form.

This tab allows you to set quality thresholds on the genotype calls on a per-sample level. Further, you control how calls not passing the threshold in individuals are treated.

min DP het.

Minimal coverage of heterozygous variants to pass the quality filter.

min DP hom.

Minimal coverage of homozygous variants to pass the quality filter.

min AB

Minimal allelic balance. This setting is applied to heterozygous variant calls only. Given a variant with total coverage c and a reads supporting the alternative allele, the allelic balance AB is defined as a/c. A well-balanced variant has an allelic balance that is not too far from 0.5. To pass the quality filter, the allelic balance must be: min AB <= AB <= 1 - min AB.

min GQ

Minimal (Phred-scaled) genotype quality for variants to pass the quality filter.

min AD

Minimal number of reads supporting the alternative allele to pass the quality filter.

The “on FAIL” column determines the action to take for variants that don’t pass the quality filter:

drop variant

The whole variant is removed from the result if the quality filter fails in this individual. This makes a low-quality call in the particular sample remove the variant even if the quality is high in other individuals.

ignore

The quality filter is ignored for the particular sample.

no-call

The variant in this individual is counted as “no-call” in the Genotype filter settings.
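The per-sample quality thresholds and the “on FAIL” actions can be sketched as below. The dictionary keys (`dp`, `ad`, `gq`, and the threshold names) and the function names are assumptions for illustration; only the rules themselves are taken from the descriptions above.

```python
from typing import Optional


def call_passes(call: dict, is_het: bool, thresholds: dict) -> bool:
    """Check one genotype call against the quality thresholds."""
    dp, ad, gq = call["dp"], call["ad"], call["gq"]
    min_dp = thresholds["min_dp_het"] if is_het else thresholds["min_dp_hom"]
    if dp < min_dp or gq < thresholds["min_gq"] or ad < thresholds["min_ad"]:
        return False
    if is_het:
        # Allelic balance is only checked for heterozygous calls.
        ab = ad / dp if dp else 0.0
        min_ab = thresholds["min_ab"]
        if not (min_ab <= ab <= 1 - min_ab):
            return False
    return True


def apply_on_fail(genotype: str, passed: bool, on_fail: str) -> Optional[str]:
    """Return the genotype to use, or None if the whole variant is dropped."""
    if passed or on_fail == "ignore":
        return genotype
    if on_fail == "no-call":
        return "./."
    if on_fail == "drop variant":
        return None
    raise ValueError(f"unknown on FAIL action: {on_fail}")
```

With the “no-call” action, the returned `./.` is then what the Genotype filter settings see for this individual.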

Gene Lists

The Gene Lists form tab on the Variant Filtration form.

Enter any Entrez gene ID, ENSEMBL gene ID, or HGNC/HUGO gene symbol in the Gene Blocklist field to remove variants in this gene from the result list. If a variant affects more than one gene, blocklisting one of these genes will not remove the variant's entries for the other genes.

Similarly, enter any Entrez gene ID, ENSEMBL gene ID, or HGNC/HUGO gene symbol into the Gene Allowlist field to limit variants to those in the allow-listed genes. Leave the allowlist empty to not apply any allow-listing.
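The interplay of the two lists can be sketched as follows. The sketch assumes one result row per variant and gene and matches on a single identifier type only; both are simplifications, and the function name is hypothetical.

```python
from typing import Iterable, List


def apply_gene_lists(rows: List[dict], allowlist: Iterable[str] = (),
                     blocklist: Iterable[str] = ()) -> List[dict]:
    """Filter (variant, gene) rows by blocklist and optional allowlist."""
    allow, block = set(allowlist), set(blocklist)
    kept = []
    for row in rows:
        if row["gene"] in block:
            continue  # removes only the row for the blocklisted gene
        if allow and row["gene"] not in allow:
            continue  # an empty allowlist applies no restriction
        kept.append(row)
    return kept
```

A variant that affects two genes yields two rows here, so blocklisting one gene leaves the row for the other gene in place.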

Flags & Comments

The Flags & Comments form tab on the Variant Filtration form.

Here you can filter your variants based on the user-provided flags.

ClinVar & HGMD

The ClinVar & HGMD form tab on the Variant Filtration form.

You can use this to require membership in ClinVar and HGMD Public. When requiring ClinVar membership, you can limit the reported variants to those with a particular pathogenicity.

Note that the HGMD Public data is taken from the ENSEMBL browser and is several years behind the current HGMD Public and Professional versions.

Configure Downloads

The Configure Downloads form tab on the Variant Filtration form.

These fields allow you to configure how your file downloads are created. You can select the file type to use for the export (Excel, TSV, or VCF).

Further, you can select the individuals to include. This is useful for generating single-individual VCF files if you want to use a tool that does not support multi-sample VCF files.

Also, you can select whether you want to export your flags and comments.

Miscellaneous

The Miscellaneous form tab on the Variant Filtration form.

Here you can select a row limit on the online variant display.

This limit will not be applied to your file downloads.

Filter Import Export

The Filter Import Export form tab on the Variant Filtration form.

Here you find the configuration stored in JSON format. While the format is machine and not human-oriented, it allows you to save your current form settings in a text file and restore them later.

Click the Download JSON button to download a text file with the value of the text area above. Clicking the JSON >> Settings button applies the changes from the text area to the form. The text area is automatically updated to reflect the current form settings when you change any form field.
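If you keep downloaded settings files around, you can also inspect or combine them outside of VarFish. The sketch below treats the settings as a flat JSON object; the actual keys used by VarFish are not listed here, so any key names you see in examples of this sketch are hypothetical, as are the function names.

```python
import json


def dump_settings(settings: dict) -> str:
    """Serialize form settings to a stable, readable JSON text."""
    return json.dumps(settings, indent=2, sort_keys=True)


def parse_settings(text: str) -> dict:
    """Parse JSON text and make sure it is an object of settings."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("settings must be a JSON object")
    return data


def apply_settings(current: dict, text: str) -> dict:
    """Return a copy of the current settings updated from the JSON text."""
    updated = dict(current)
    updated.update(parse_settings(text))
    return updated
```

For example, `apply_settings(current, saved_text)` overrides only the entries present in the saved text and leaves `current` itself unchanged.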

Load Presets

Here you find shortcuts to several presets. Note that these are “factory” defaults at the moment. Currently, it is not possible to create your own presets. This will be possible in a future version.

RefSeq / ENSEMBL switch

Use this to choose between RefSeq and ENSEMBL transcripts when filtering for variant effects.

Filter & Display Button

Use this button to perform a new query with the current form settings and display the results below.

Download as File

When clicking on the little triangle next to the Filter & Display button, you can select the Download as File menu item. This will start a background job on the server to create a downloadable file from your current form settings. Note that the values from the Configure Downloads tab will be used for configuring the exported files while the row limit from the Miscellaneous tab will not be applied.

Note

VCF exports are meant for exporting whole exomes from VarFish (thousands of rows). In contrast, Excel and TSV exports are meant for exporting exomes filtered to “interesting” variant sets (up to hundreds of rows).

VCF export is much faster than Excel and TSV export. For performance reasons, filtration of VCF file exports is limited to the basics. Filtration by genotype, frequency, variant effect, etc. and by gene allow-/blocklist works, as does basic ClinVar membership. Filtration for HGMD Public membership, ClinVar details, user comments, and flags is not applied to VCF exports.

Exports to TSV and Excel use the same filters as displayed when clicking on Filter & Display.
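
The difference between the export types can be summarised as a small lookup. This is only a restatement of the note above as a sketch; the names are hypothetical and the set of basic filters is not exhaustive (the note says “etc.”).

```python
# Filter groups as named in the note above (the basic list is not exhaustive).
BASIC_FILTERS = {
    "genotype",
    "frequency",
    "variant effect",
    "gene allow-/blocklist",
    "ClinVar membership",
}
EXTENDED_FILTERS = {
    "HGMD Public membership",
    "ClinVar details",
    "user comments",
    "flags",
}


def filters_applied(file_type: str) -> set:
    """Return the filter groups that take effect for an export file type."""
    file_type = file_type.lower()
    if file_type == "vcf":
        return set(BASIC_FILTERS)  # VCF export: basic filters only
    if file_type in ("tsv", "excel"):
        return BASIC_FILTERS | EXTENDED_FILTERS  # same as Filter & Display
    raise ValueError(f"unknown export file type: {file_type!r}")


print(sorted(filters_applied("vcf")))
```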

Submit to MutationDistiller

Also, the little triangle next to the Filter & Display button gives you access to the Submit to MutationDistiller action. This is similar to generating a downloadable VCF file. However, clicking the button will submit the data to MutationDistiller after you confirm this once again in a popup window.

Here are the actions to create the recommended settings for submitting to MutationDistiller:

  • Select the appropriate Genotype configuration that you want to submit to MutationDistiller.

  • Note that MutationDistiller only supports single-sample VCF files at the moment. Go to the Configure Downloads tab via More … ‣ Configure Downloads and unselect all but the one individual that is to be exported.

  • Load presets for pulling all variants from the original VCF file via Load Presets ‣ Full Exome.

  • Click the little triangle next to Filter & Display, then click Submit to MutationDistiller.

  • A confirmation popup appears. Read the text carefully and then confirm the submission.

  • This will create a background job that first generates a VCF file with all selected variants and then submits this file to MutationDistiller.

Note

The MutationDistiller submission uses the same feature as the VarFish VCF export. Thus, the limitations described in Download as File apply.

Submit to SPANR

Also, the little triangle next to the Filter & Display button gives you access to the Submit to SPANR action. This is similar to submitting to MutationDistiller as described above. Clicking the button will submit the data to SPANR after you confirm this once again in a popup window.

Variant Filtration Results

The filtration results display.

After form submission, the results are displayed below the form.

Filtration Results Header

The header contains a Frequencies switch that allows you to select the database used for displaying population frequencies. Further, it shows the number of displayed records and the total number of result records. Lastly, it displays the transcript data source used.

Warning

Always monitor the number of displayed vs. total records. You might have to adjust the number of displayed rows so you don’t miss any variants!

Result Rows

The result rows consist of the following elements:

  • Clicking the right-pointing arrow will show you more details on your variant below the result row.

  • The little bookmark sign indicates whether the variant has been flagged (filled if flags are present). The summary flag status is also indicated by the row color. Click on the bookmark sign to adjust the flags for this variant.

  • The little speech bubble indicates whether there are any comments for this variant (filled if comments are present).

  • The little database icon (three disks) indicates dbSNP membership of the variant (dark if present in dbSNP, very light if not). Click on the icon to go to its dbSNP entry.

  • The little hospital icon indicates ClinVar membership (again dark if present in ClinVar, very light if not).

  • The little circle indicates membership in HGMD Public (see ClinVar & HGMD for information about HGMD Public age).

  • The following columns indicate the variant position, reference and alternative bases.

  • This is followed by the frequency display from the population database selected in the header.

  • The next column shows the gene symbol, clicking on the little triangle next to it allows you to see the variant in various databases.

  • The variant effect on the protein level in HGVS notation. Moving the cursor over this field will show a textual explanation of the effect.

  • The next columns show the genotypes in the individuals. Moving the cursor over this field will show the genotype quality and number of reference and alternative reads.

  • The MT button will query MutationTaster for this variant.

  • The IGV button opens the selected locus in IGV if you have it open in the background with View ‣ Preferences ‣ Advanced ‣ Enable port activated and the port set to 60151.

  • Clicking the little triangle next to IGV allows you to open the variant locus in various other genome browsers.
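
IGV listens for simple HTTP commands on the port mentioned above. As a minimal sketch of what a “go to locus” request can look like, the code below builds a URL for IGV's goto command; the function name is hypothetical, and whether VarFish issues exactly this request is an assumption. The code only constructs the URL and does not contact IGV.

```python
from urllib.parse import urlencode

IGV_PORT = 60151  # port configured in the IGV preferences


def igv_goto_url(chrom: str, start: int, end: int,
                 host: str = "127.0.0.1", port: int = IGV_PORT) -> str:
    """Build a URL asking a locally running IGV to jump to a region."""
    locus = f"{chrom}:{start}-{end}"
    return f"http://{host}:{port}/goto?{urlencode({'locus': locus})}"


print(igv_goto_url("chr1", 100, 200))
```

Opening such a URL in a browser while IGV is running is a quick way to check that the port is enabled.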

Project-Wide Queries and Stats

Project-Wide Statistics

You can also view joint statistics for all cases within a project. For this, open a project’s case list (open the project detail view, then click the Cases icon in the left bar).

Here, the project-wide variant statistics will be displayed above your cases if they have been generated already. If you want to (re-)generate them, use the Recompute Project-Wide Stats button on the top right. This will create a background job for the recomputation (it might take quite some time). After the job is complete, the updated data will be displayed on the case list.

Project-Wide Queries

Further, you can perform queries across all cases in your project. For this, navigate to the project’s case list. Then, click the Joint Filtration button on the top right.

The form that opens is very similar to the one described in Variant Filtration with the following differences:

  • All members of all cases in your project will appear.

  • Instead of having one row for each variant and one genotype column for each sample, you have one row for each variant and sample and one column with genotype information only. There is an additional column that gives the name of the sample that the row is for.

  • The TSV and Excel file download generation creates similarly-structured tables.

  • VCF export is currently not supported.
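
The change in table layout described in the list above can be pictured as a “wide to long” conversion. The sketch below uses made-up column and sample names and is not taken from the VarFish code; it only illustrates how one row per variant becomes one row per variant and sample.

```python
def wide_to_long(rows, samples):
    """Turn one row per variant into one row per variant and sample."""
    long_rows = []
    for row in rows:
        # Everything that is not a per-sample genotype column describes the variant.
        variant = {key: value for key, value in row.items() if key not in samples}
        for sample in samples:
            long_rows.append({**variant, "sample": sample, "genotype": row[sample]})
    return long_rows


# Hypothetical single-case result with one genotype column per sample.
wide = [{"position": "1:12345", "ref": "A", "alt": "G", "index": "0/1", "father": "0/0"}]
for line in wide_to_long(wide, ["index", "father"]):
    print(line)
```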

Variant Annotation

Variant Comments & Flags

Creating Comments & Flags

The flag and comment marker next to result rows.

The flag marker (little bookmark) and comment marker (little text bubble) are shown for each result row. They are filled when flags have been set or a comment has been submitted for the variant.

Use the little bookmark-shaped or text bubble icon next to each variant to open the “flagging / comment” window. Check the desired flags and/or enter your comment text in the text box below. Click Save to create a new comment and/or flags.

The Flags & Comments form tab on the Variant Filtration form.

When clicking the flag/comment markers, the “Flags & Comments” popup opens. Select the flags that you want to apply and/or enter a comment in the text box and then click the Save button. The Summary label also determines the color of the result row (green, yellow, red, or no coloring). Selecting no Summary but any other flag will highlight the result row in gray.

The filled flag and comment marker next to result rows.

The flag and comment marker are now filled.

Exporting Comments & Flags

You can export comments and flags together with your variants into an Excel file.

Viewing Comments & Flags

Comments and flags in variant details

Comments and flags are displayed when expanding the variant details.

The comments and flags for a variant are displayed in the variant details. For this, click the arrow at the beginning of a resulting row. The comments and flags are displayed in the box in the top right of the expanded variant details.

Alternatively, you can also view your comments and flags in the case details overview as described below in the “Viewing Annotations” section.

ACMG Rating

The ACMG marker next to result rows.

The ACMG marker (little gray box with a dash in the middle) is shown for each result row. It is filled with the ACMG rating and a corresponding coloring when the ACMG rating has been set for the variant.

Use the little gray box next to each variant to open the “ACMG Rating” window. Check the desired classifications and click Save to create an ACMG rating. The actual class is automatically computed. You can override the computation and set your own class by entering a number in the Class override box.

The ACMG Rating form tab on the Variant Filtration form.

When clicking the ACMG marker, the “ACMG Rating” popup opens. Select the classes that you want to apply and then click the Save button. The actual class is automatically computed. You can override the computation and set your own class by entering a number in the Class override box.

The filled ACMG marker next to result rows.

The ACMG rating marker is now filled.

Viewing Annotations

You can get a complete list of all the comments, flags and ACMG ratings for a case in the case details view. For this, go back to the case detail page and click on Variant Annotations.

Annotations in the case overview.

You can see all variant flags, comments and ACMG ratings in the case details view.

Databases

This section gives information about the integrated databases and tools and the ones that are linked out to. Further, it provides some pointers on how to extend VarFish’s database and tool collection.

Integrated Databases and Tools

The following databases are integrated into VarFish, meaning that their contents are available from within VarFish itself.

Category           Database
-----------------  ----------------------------------------------------------------
Frequency          gnomAD, ExAC, 1000 Genomes, mtDB, helixMTdb, MITOMAP
Clinical           ClinVar, HGMD Public
Variant Database   dbSNP
Variant Tools      VariantValidator
Phenotype          HPO, OMIM, MGI Mapping
Gene Description   HGNC, NCBI Gene Summary, NCBI GeneRIF, ACMG Recommendations Gene, HPO
Pathways           KEGG
Constraint Scores  gnomAD pLI/LOEUF, ExAC pLI
Conservation       UCSC 100 Vertebrates

Adding and Updating Databases and Tools

We invite users to contribute to VarFish databases and tools (and, of course, to VarFish itself) through our GitHub project and issue tracker at https://github.com/bihealth/varfish-server or by emailing us directly. In this section, we summarise the process of extending the database and tool selection. However, as this is a very large topic, we suggest that users contact us with their suggestions by email or through the GitHub issue tracker to get more information. We will also be happy to work with users to find the best way of integrating new tools and databases.

Database Modifications

Updating databases is more complicated. Overall, the steps are as follows:

  • The data must be downloaded and converted into TSV (tab-separated values) file(s). For this, we are maintaining a Snakemake workflow on GitHub at https://github.com/bihealth/varfish-db-downloader.

  • The VarFish source code must be modified to

    • create a new Django model class to manage the database table(s) for the new database,

    • create importer code for loading the data into the database,

    • adjust the code for the user interface to display the data (or use it in a different fashion),

    • (potentially) adjust the query generation code to incorporate the new database in the queries,

    Also, the documentation has to be adjusted.

We strongly recommend that users contact us for support with this.

Installation

This chapter describes how to install the VarFish core components and their requirements. The audience of this chapter are those who want to install VarFish on their own infrastructure.

Since v0.22.1 (about February 2021), the recommended way of installing VarFish is using Docker Compose. Docker Compose allows you to describe the programs/services that are required to run VarFish as a set of Docker containers. Docker containers allow packaging the whole runtime environment of complex software packages in a transparent and efficient manner.

For the following, knowledge of Linux administration and exposure to Docker is required. Deeper knowledge of Docker and Docker Compose is of great help when debugging. If you have problems, please open an issue in our Issue Tracker or send an email to cubi-helpdesk@bihealth.de. Please note that VarFish is academic software and we try to provide support on a best-effort basis.

You can find a quickstart-style manual in the varfish-docker-compose README.

Note that this will only perform installation of VarFish and related services with data (re)distributed by the VarFish authors. See Extra Services for installing extra services such as annotation with CADD scores.

Prerequisites

  • Hardware:
    • Memory: 64 GB of RAM

    • CPU: 16 cores

    • Disk: 600+ GB of free and fast disk space
      • ~500 GB for the initial database (on compression-enabled ZFS it will consume only 167 GB)

      • on installation: ~100 GB for data package file

      • per exome: ~200MB

      • a few (~5) GB for the Docker images

  • Operating System:
    • a modern Linux that is supported by Docker.

    • outgoing HTTPS connections to the internet are allowed to download data and Docker images

    • server ports 80 and 443 are open and free on the host that VarFish will run on

  • Software: Git, Docker, and Docker Compose

Tuning database servers is an art of its own and you can have a look at the section Performance Tuning for getting started.
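
The disk figures from the prerequisites can be combined into a rough capacity estimate. This is a back-of-the-envelope sketch using only the numbers listed above (without ZFS compression); the function name is hypothetical, and real usage will vary with your data.

```python
def estimate_disk_gb(n_exomes: int,
                     database_gb: float = 500.0,
                     data_package_gb: float = 100.0,
                     images_gb: float = 5.0,
                     per_exome_mb: float = 200.0) -> float:
    """Estimate the disk space in GB needed for an installation with n_exomes exomes."""
    fixed = database_gb + data_package_gb + images_gb
    return fixed + n_exomes * per_exome_mb / 1000.0


for n in (0, 100, 1000):
    print(f"{n:5d} exomes: about {estimate_disk_gb(n):.0f} GB")
```

The data package file is only needed during installation, so data_package_gb can be set to 0 when planning for steady-state usage.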

Install with Docker Compose

This section assumes that you have installed the prerequisites Git, Docker and Docker Compose. So the following three commands should work.

$ git version
git version 1.8.3.1
$ docker-compose --version
docker-compose version 1.28.2, build 67630359
$ docker version
Client: Docker Engine - Community
 Version:           20.10.3
[...]

First, we will obtain a checkout of varfish-docker-compose. This repository contains the docker-compose.yml and configuration files. On execution, about ten Docker containers will be spun up, each running a part of the services that are required to run VarFish. These include the Postgres database (that does the heavy lifting), Redis for caching, Jannovar for full functional effect annotation, Exomiser for variant prioritisation, queue workers for performing database queries and similar tasks, and the VarFish web server itself. But this will come later.

$ git clone https://github.com/bihealth/varfish-docker-compose.git
$ cd varfish-docker-compose

Next, download and extract the VarFish site data archive which contains everything you need to get started (the download is ~100GB of data). This will create the volumes directory (500GB of data, ZFS compression yields us 167GB disk usage). Replace grch37 with grch38 in the command below if you want to use the GRCh38 release. We currently only provide prebuilt databases for either GRCh37 or GRCh38.

$ wget --no-check-certificate https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-site-data-v1-20210728-grch37.tar.gz{,.sha256}
$ sha256sum --check varfish-site-data-v1-20210728-grch37.tar.gz.sha256
$ tar xf varfish-site-data-v1-20210728-grch37.tar.gz
$ ls volumes
exomiser  jannovar  minio  postgres  redis  traefik
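
On systems without the sha256sum tool, the same check can be done with Python's standard library. The following is a minimal sketch with hypothetical function names; it assumes the .sha256 file has the usual sha256sum layout (hex digest, whitespace, file name) and that the archive lies next to it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_path) -> bool:
    """Check the file named in a sha256sum-style checksum file."""
    checksum_path = Path(checksum_path)
    expected, name = checksum_path.read_text().split(maxsplit=1)
    # sha256sum marks binary mode with a leading '*' before the file name.
    target = checksum_path.parent / name.strip().lstrip("*")
    return sha256_of(target) == expected.lower()
```

For the archive above, verify_checksum_file("varfish-site-data-v1-20210728-grch37.tar.gz.sha256") should return True if the download is intact.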

The next step is to create an installation-specific configuration file .env as a copy of env.example. You will have to at least set the DJANGO_SECRET_KEY variable to something random (a bash one-liner for this is tr -dc A-Za-z0-9 </dev/urandom | head -c 64 ; echo '').

$ cp env.example .env
$ $EDITOR .env
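
If you prefer not to rely on the shell for generating the random key, an equivalent value can be produced with Python's secrets module. This is a small sketch with a hypothetical function name; like the one-liner, it draws 64 characters from letters and digits.

```python
import secrets
import string


def make_secret_key(length: int = 64) -> str:
    """Return a random alphanumeric string suitable as a secret key."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Print a line that can be pasted into the .env file.
print(f"DJANGO_SECRET_KEY={make_secret_key()}")
```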

You can now bring up the site with Docker Compose. The site will come up at your server and listen on ports 80 and 443 (make sure that the ports are open). You can access it at https://<your-host>/ in your web browser. The command below creates a lot of output and does not return you to your shell. You can stop the servers with Ctrl-C.

$ docker-compose up

You can also let Docker Compose run the containers in the background:

$ docker-compose up -d
Starting compose_exomiser-rest-prioritiser_1 ... done
Starting compose_jannovar_1                  ... done
Starting compose_traefik_1                   ... done
Starting compose_varfish-web_1               ... done
Starting compose_postgres_1                  ... done
Starting compose_redis_1                     ... done
Starting compose_minio_1                     ... done
Starting compose_varfish-celeryd-query_1     ... done
Starting compose_varfish-celeryd-default_1   ... done
Starting compose_varfish-celeryd-import_1    ... done
Starting compose_varfish-celerybeat_1        ... done

You can check that everything is running (the versions might be different in your installation):

$ docker ps
3ec78fb9f12c   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   17 hours ago   Up 31 seconds   8080/tcp                                   compose_varfish-celeryd-import_1
313afb611ab1   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   17 hours ago   Up 30 seconds   8080/tcp                                   compose_varfish-celerybeat_1
4d865726e83b   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   17 hours ago   Up 31 seconds   8080/tcp                                   compose_varfish-celeryd-query_1
a5f90232c4da   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   17 hours ago   Up 31 seconds   8080/tcp                                   compose_varfish-celeryd-default_1
96cec7caebe4   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   17 hours ago   Up 33 seconds   8080/tcp                                   compose_varfish-web_1
8d1f310c9b48   postgres:12                                                 "docker-entrypoint.s…"   17 hours ago   Up 32 seconds   5432/tcp                                   compose_postgres_1
8f12e16e20cd   minio/minio                                                 "/usr/bin/docker-ent…"   17 hours ago   Up 32 seconds   9000/tcp                                   compose_minio_1
03e877ac11db   quay.io/biocontainers/jannovar-cli:0.33--0                  "jannovar -Xmx6G -Xm…"   17 hours ago   Up 33 seconds                                              compose_jannovar_1
6af09b819e59   traefik:v2.3.1                                              "/entrypoint.sh --pr…"   17 hours ago   Up 33 seconds   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   compose_traefik_1
514cb4386224   redis:6                                                     "docker-entrypoint.s…"   19 hours ago   Up 32 seconds   6379/tcp                                   compose_redis_1
5678b9e6797b   quay.io/biocontainers/exomiser-rest-prioritiser:12.1.0--1   "exomiser-rest-prior…"   19 hours ago   Up 34 seconds                                              compose_exomiser-rest-prioritiser_1

In the case of any error please report it to us via the Issue Tracker or email to cubi-helpdesk@bihealth.de. Please include the full output as a text file attachment.

Actually, your VarFish website is now ready to be used. Visit the website at https://<your-host>/ and login with the account root and password changeme.

There will be a warning about self-signed certificates; see TLS / SSL Configuration on how to deal with this. You can change the password in the Django Admin (available from the menu with the little user icon on the top right). You can also use the Django Administration interface to create new user records.

You will observe that the database came with some demo data sets of public IGSR data that are ready for exploration.

Updating the Database

First, the tables that are to be updated should be generated. For this, follow the instructions in the VarFish DB Downloader repository.

At this point you should have a folder structure available that resembles:

varfish-db-downloader/
    GRCh37/
        <table_group>/
            <version>/
                <table>.tsv
                <table>.release_info
    GRCh38/
        [...]
    noref/
        [...]
    import_versions.tsv
    [...]

If the HPO and OMIM tables are supposed to be updated, it would look like this:

varfish-db-downloader/
    noref/
        hpo/
            20220126/
                Hpo.release_info
                Hpo.tsv
                HpoName.release_info
                HpoName.tsv
        mim2gene/
            20220126/
                Mim2geneMedgen.release_info
                Mim2geneMedgen.tsv
    import_versions.tsv
    [...]
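
Before copying the folder to the server, it can be worth checking that every table file has its release information next to it. The sketch below is not part of VarFish; the function name is hypothetical, and it only assumes the <release>/<table_group>/<version>/<table>.tsv layout shown above.

```python
from pathlib import Path


def find_tables(root):
    """List (relative TSV path, has_release_info) for each table below root."""
    results = []
    # Layout: <genome release or noref>/<table_group>/<version>/<table>.tsv
    for tsv in sorted(Path(root).glob("*/*/*/*.tsv")):
        release_info = tsv.with_suffix(".release_info")
        results.append((tsv.relative_to(root), release_info.exists()))
    return results


if __name__ == "__main__":
    for path, complete in find_tables("varfish-db-downloader"):
        print("ok     " if complete else "MISSING", path)
```

The top-level import_versions.tsv is not matched by the pattern because it does not sit three directories deep.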

Copy this structure onto the machine where Docker Compose is running. Take Docker Compose down (this will shut down your VarFish instance!):

$ cd varfish-docker-compose  # make sure to be in the docker compose folder
$ docker-compose down

Modify the docker-compose.yml file by finding the following entry:

varfish-web:
  image: ghcr.io/bihealth/varfish-server:VERSION
  env_file:
    - .env
  networks:
    - varfish
  restart: unless-stopped
  volumes:
    - "/root/varfish-server-background-db-20210728:/data:ro"
  [...]

And add another volume that maps your directory into the container:

volumes:
  - "/root/varfish-server-background-db-20210728:/data:ro"
  - type: bind
    source: varfish-db-downloader/
    target: /data-db-downloader
    read_only: true

Start docker compose again:

$ docker-compose up

Once done, attach to your container:

$ docker exec -it varfish-docker-compose_varfish-web_1 bash -i

Switch to the application directory and start the import:

varfish-web-container$ cd /usr/src/app
varfish-web-container$ python manage.py import_tables --tables-path /data-db-downloader

The output of the command should look something like this:

Disabling autovacuum on all tables...
Hpo -- Importing Hpo 2022/01/26 (, source: /data-db-downloader/noref/hpo/20220126/Hpo.tsv) ...
Mim2geneMedgen -- Importing Mim2geneMedgen 2022/01/26 (, source: /data-db-downloader/noref/mim2gene/20220126/Mim2geneMedgen.tsv) ...
Hpo -- Removing old Hpo results.
Mim2geneMedgen -- Removing old Mim2geneMedgen results.
Mim2geneMedgen -- Importing new Mim2geneMedgen data
Hpo -- Importing new Hpo data
Mim2geneMedgen -- Finished importing Mim2geneMedgen 2022/01/26 (Mim2geneMedgen.tsv)
Hpo -- Finished importing Hpo 2022/01/26 (Hpo.tsv)
HpoName -- Importing HpoName 2022/01/26 (, source: /data-db-downloader/noref/hpo/20220126/HpoName.tsv) ...
HpoName -- Removing old HpoName results.
HpoName -- Importing new HpoName data
HpoName -- Finished importing HpoName 2022/01/26 (HpoName.tsv)
Enabling autovacuum on all tables...

To verify the import, switch to the VarFish web interface, open the user menu in the top right corner and select the Import Release Info entry. The updated tables should show the latest version.

Extra Services

This section describes the installation of extra services.

Install Scoring with CADD

This section describes how to enable the scoring of variants with CADD using the CADD-scripts provided by the CADD authors. Note well that CADD-scripts is only free for non-commercial users as expressed in the CADD-scripts license. The installation is described for using a VarFish Docker Compose based installation.

First, create a directory volumes/cadd-rest-api inside the varfish-docker-compose directory and download an updated version of the install script.

$ cd varfish-docker-compose
$ mkdir -p volumes/cadd-rest-api/db
$ curl https://raw.githubusercontent.com/kircherlab/CADD-scripts/7502f47/install.sh \
    > volumes/cadd-rest-api/install.sh

Next, download the appropriate files using the install.sh script you just downloaded. The script will ask you for some decisions and the corresponding lines are highlighted below.

$ docker run -it -e CADD=/opt/miniconda3/share/cadd-scripts-1.6-0 \
    -v $PWD/volumes/cadd-rest-api:/data bihealth/cadd-rest-api:0.3.1-0 \
    bash /data/install.sh -b
Using kircherlab.bihealth.org as download server
CADD-v1.6 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health 2013-
2020. All rights reserved.

The following questions will quide you through selecting the files and dependencies needed for CADD.
After this, you will see an overview of the selected files before the download and installation starts.
Please note, that for successfully running CADD locally, you will need the conda environment and at least one set of
annotations.

Do you want to install the virtual environments with all CADD dependencies via conda? (y)/n n
Do you want to install CADD v1.6 for GRCh37/hg19? (y)/n y
Do you want to install CADD v1.6 for GRCh38/hg38? (y)/n n
Do you want to load annotations (Annotations can also be downloaded manually from the website)? (y)/n y
Do you want to load prescored variants (Makes SNV calling faster. Can also be loaded/installed later.)? y/(n) y
Do you want to load prescored variants for scoring with annotations (Warning: These files are very big)? y/(n) y
Do you want to load prescored variants for scoring without annotations? y/(n) y
Do you also want to load prescored InDels? We provide scores for well known InDels from sources like ClinVar, gnomAD/TOPMed etc. y/(n) y

The following will be loaded: (disk space occupied)
 - Download CADD annotations for GRCh37-v1.6 (121 GB)
 - Download prescored SNV inclusive annotations for GRCh37-v1.6 (248 GB)
 - Download prescored InDels inclusive annotations for GRCh37-v1.6 (3.4 GB)
 - Download prescored SNV (without annotations) for GRCh37-v1.6 (78 GB)
 - Download prescored InDels (without annotations) for GRCh37-v1.6 (0.6 GB)
Please make sure you have enough disk space available.
Ready to continue? (y)/n y
Starting installation. This will take some time.
[...]
Connecting to kircherlab.bihealth.org (kircherlab.bihealth.org)|141.80.169.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61 [application/x-gzip]
Saving to: ‘InDels_inclAnno.tsv.gz.tbi.md5’

InDels_inclAnno.tsv.gz.tbi.md5             100%[======================================================================================>]      61  --.-KB/s    in 0s
2021-03-08 18:55:10 (19.9 MB/s) - ‘InDels_inclAnno.tsv.gz.tbi.md5’ saved [61/61]

InDels_inclAnno.tsv.gz: OK
InDels_inclAnno.tsv.gz.tbi: OK

Then, update the .env file by uncommenting the lines that configure the variant prioritization with CADD in VarFish (use the contents of the .env file as the lines below might not be completely up to date).

# Extra: CADD REST API *****************************************************

# Uncomment the following lines to enable variant prioritization using the
# CADD score.  See the VarFish Server manual for installation instructions,
# in particular how to download the required data.
VARFISH_ENABLE_CADD=1
VARFISH_CADD_REST_API_URL=http://cadd-rest-api:8080
VARFISH_CADD_MAX_VARS=5000

Also, uncomment the lines in the docker-compose.yml file for the cadd-rest-api-server and cadd-rest-api-celeryd containers (the following listing is redacted, the docker-compose.yml file is up to date).

# Uncomment the following lines to enable the CADD REST API server that
# is used for variant prioritization using the CADD score.  We need both
# the server and the CADD-based worker.
cadd-rest-api-server:
  image: bihealth/cadd-rest-api:0.3.1-0
  env_file: cadd-rest-api.env
  command: ["wsgi"]
  # [...]

# You have to provide multiple cadd-rest-api-celeryd-worker container if
# you want to handle more than one query at a time.
cadd-rest-api-celeryd-worker-1:
[...]
cadd-rest-api-celeryd-worker-3:
  image: bihealth/cadd-rest-api:0.3.2-0
  env_file: cadd-rest-api.env
  command: ["celeryd"]
  networks: [varfish]
  restart: unless-stopped
  volumes:
    - "./volumes/cadd-rest-api/data/annotations:/opt/miniconda3/share/cadd-scripts-1.6-0/data/annotations:ro"
    - "./volumes/cadd-rest-api/data/prescored:/opt/miniconda3/share/cadd-scripts-1.6-0/data/prescored:ro"
    - "./volumes/cadd-rest-api/db:/data/db:rw"

Finally, restart your Docker container cluster including the new containers with docker-compose down && docker-compose up -d.

System Configuration

This section describes how to configure the varfish-docker-compose setup. When running with the varfish-docker-compose files and the provided database files, VarFish comes preconfigured with sensible default settings and also contains some example datasets to try out. There are a few things that you might want to tweak. Please note that there might be more settings that you can change when exploring the VarFish source code but right now their use is not supported for external users.

VarFish & Docker Compose

The recommended (and supported) way to deploy VarFish is using Docker Compose. The VarFish server and its components are not installed on the system itself; rather, a number of Docker containers with fixed Docker images are run and work together. The base docker-compose.yml file starts a fully functional VarFish server. Docker Compose supports using so-called override files.

Basically, the mechanism works by providing a docker-compose.override.yml file that is automatically read at startup when running docker-compose up. This file is listed in .gitignore, so it is not part of the varfish-docker-compose repository but is created in each checkout (e.g., manually or using a configuration management tool such as Ansible). On startup, Docker Compose first reads the base docker-compose.yml file. It then reads the override file (if it exists) and recursively merges both YAML files, with the override file taking precedence over the base file. Note that the recursive merging is done on YAML dicts only; lists are not merged recursively, and how a list-valued option is combined depends on the option. The mechanism is described in detail in the official documentation.
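
The idea of recursive merging can be illustrated with plain Python dictionaries. The following is a deliberately simplified model, not Docker Compose's implementation: dicts are merged key by key and any other value (including a list) from the override replaces the base value, whereas Docker Compose itself applies option-specific rules. The function name and the example values are hypothetical.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Recursively merge override into base; non-dict values are replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # descend into dicts
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists: override wins
    return merged


base = {"services": {"traefik": {"image": "traefik:v2.3.1", "command": ["--a"]}}}
override = {"services": {"traefik": {"command": ["--b"]}}}
print(merge_config(base, override))
```

In this model the image entry from the base file survives while the command list is taken from the override.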

We provide the following files that you can use/combine into the local docker-compose.override.yml file of your installation.

  • docker-compose.override.yml-cert – use TLS encryption with your own certificate from your favourite certificate provider (by default an automatically generated self-signed certificate will be used by traefik, the reverse proxy).

  • docker-compose.override.yml-letsencrypt – use letsencrypt to obtain a certificate.

  • docker-compose.override.yml-cadd – spawn Docker containers for allowing pathogenicity annotation of your variants with CADD.

The overall process is to copy any of the *.override.yml-* files to docker-compose.override.yml and adjust it to your needs (e.g., by merging it with another such file).

Note that you could also explicitly provide multiple override files but we do not consider this further. For more information on the override mechanism see the official documentation.

The following sections describe the possible adjustments with Docker Compose override files.

TLS / SSL Configuration

The varfish-docker-compose setup uses traefik as a reverse proxy and must be reconfigured if you want to change the default behaviour of using self-signed certificates.

Use the contents of docker-compose.override.yml-cert for providing your own certificate. You have to put the server certificate and key into config/traefik/tls/server.crt and server.key and then restart the traefik container. Make sure to provide the full certificate chain if needed (e.g., for DFN issued certificates).

If your site is reachable from the internet then you can also use the contents of docker-compose.override.yml-letsencrypt which will use [letsencrypt](https://letsencrypt.org/) to obtain the certificates. Make sure to adjust the line with --certificatesresolvers.le.acme.email= to your email address. Note well that if you make your site reachable from the internet then you should be aware of the implications. VarFish is MIT licensed software which means that it comes “without any warranty of any kind”, see the LICENSE file for details.

After changing the configuration, restart the site (e.g., with docker-compose down && docker-compose up -d if it is running in detached mode).

LDAP Configuration

VarFish can be configured to use up to two upstream LDAP servers (e.g., OpenLDAP or Microsoft Active Directory). For this, you have to set the following environment variables in the file .env in your varfish-docker-compose checkout and restart the site. The variables are given with their default values.

ENABLE_LDAP=0

Enable primary LDAP authentication server (values: 0, 1).

AUTH_LDAP_SERVER_URI=

URI for primary LDAP server (e.g., ldap://ldap.example.com:port or ldaps://...).

AUTH_LDAP_BIND_DN=

Distinguished name (DN) to use for binding to the LDAP server.

AUTH_LDAP_BIND_PASSWORD=

Password to use for binding to the LDAP server.

AUTH_LDAP_USER_SEARCH_BASE=

DN to use for the search base, e.g., DC=com,DC=example,DC=ldap

AUTH_LDAP_USERNAME_DOMAIN=

Domain to use for user names, e.g., with EXAMPLE, users from this domain can log in with user@EXAMPLE.

AUTH_LDAP_DOMAIN_PRINTABLE=${AUTH_LDAP_USERNAME_DOMAIN}

Domain used for printing the user name.

If you have the first LDAP configured then you can also enable the second one and configure it.

ENABLE_LDAP_SECONDARY=0

Enable secondary LDAP authentication server (values: 0, 1).

The remaining variable names are derived from the ones of the primary server but using the prefix AUTH_LDAP2 instead of AUTH_LDAP.
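As an illustration of this naming rule, the following Python sketch derives the secondary variable names from the primary ones listed above. It only demonstrates the prefix substitution described in the text; the helper name secondary_name is hypothetical and not part of VarFish.

```python
# Primary LDAP variables as listed in this section.
PRIMARY_VARIABLES = [
    "AUTH_LDAP_SERVER_URI",
    "AUTH_LDAP_BIND_DN",
    "AUTH_LDAP_BIND_PASSWORD",
    "AUTH_LDAP_USER_SEARCH_BASE",
    "AUTH_LDAP_USERNAME_DOMAIN",
    "AUTH_LDAP_DOMAIN_PRINTABLE",
]


def secondary_name(primary):
    """Replace the AUTH_LDAP prefix with AUTH_LDAP2 (hypothetical helper)."""
    prefix = "AUTH_LDAP_"
    if not primary.startswith(prefix):
        raise ValueError(f"not a primary LDAP variable: {primary}")
    return "AUTH_LDAP2_" + primary[len(prefix):]


for name in PRIMARY_VARIABLES:
    print(f"{name} -> {secondary_name(name)}")
```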

SAML Configuration

Besides LDAP configuration, it is also possible to authenticate with existing SAML 2.0 ID Providers (e.g. Keycloak). Since varfish is built on top of sodar core, you can also refer to the sodar-core documentation for further help in configuring the ID Providers.

To enable SAML authentication with your ID Provider, a few steps are necessary. First, add a SAML Client for your ID Provider of choice. The sodar-core documentation features examples for Keycloak. Make sure you have assertion signing turned on and allow redirects to your varfish site. The SAML processing URL should be set to the externally visible address of your varfish deployment, e.g. https://varfish.example.com/saml2_auth/acs/.

Next, you need to obtain your metadata.xml as well as the signing certificate and key file from the ID Provider. Make sure you convert these keys to standard OpenSSL format before starting your varfish instance (you can find more details here). If you deploy varfish without docker, you can pass the file paths of your metadata.xml and key pair directly. Otherwise, make sure that you have put them into a single folder and added the corresponding folder to your docker-compose.yml (or to your docker-compose.override.yml), as in the following snippet.

varfish-web:
  ...
  volumes:
    - "/path/to/my/secrets:/secrets:ro"

Then, define at least the following variables in your docker-compose .env file (or as environment variables when running the server natively).

ENABLE_SAML

[Default 0] Enable [1] or Disable [0] SAML authentication

SAML_CLIENT_ENTITY_ID

The SAML client ID set in the ID Provider config (e.g. “varfish”)

SAML_CLIENT_ENTITY_URL

The externally visible URL of your varfish deployment

SAML_CLIENT_METADATA_FILE

The path to the metadata.xml file retrieved from your ID Provider. If you deploy using docker, this must be a path inside the container.

SAML_CLIENT_IDP

The url to your IDP. In case of keycloak it can look something like https://keycloak.example.com/auth/realms/<my_varfish_realm>

SAML_CLIENT_KEY_FILE

Path to the SAML signing key for the client.

SAML_CLIENT_CERT_FILE

Path to the SAML certificate for the client.

SAML_CLIENT_XMLSEC1

[Default /usr/bin/xmlsec1] Path to the xmlsec executable.

By default, the SAML attributes map is configured to work with Keycloak as SAML Auth provider. If you are using a different ID Provider, or different settings you also need to adjust the SAML_ATTRIBUTES_MAP option.

SAML_ATTRIBUTES_MAP

A dictionary identifying the SAML claims needed to retrieve user information. You need to set at least email, username, first_name and last_name. Example: SAML_ATTRIBUTES_MAP="email=email,username=uid,first_name=firstName,last_name=name"
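Before restarting the site, it can help to check that the value you intend to set has the expected shape. The following Python sketch parses a value in the key=claim,key=claim form shown in the example and checks for the four required keys. It is an independent sanity check under that format assumption, not the parsing code used by VarFish; the function name is hypothetical.

```python
REQUIRED_KEYS = {"email", "username", "first_name", "last_name"}


def parse_attributes_map(value):
    """Parse 'key=claim,key=claim' into a dict and check the required keys."""
    mapping = {}
    for item in value.split(","):
        key, sep, claim = item.strip().partition("=")
        if not sep or not key or not claim:
            raise ValueError(f"malformed entry: {item!r}")
        mapping[key] = claim
    missing = REQUIRED_KEYS - mapping.keys()
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))
    return mapping


example = parse_attributes_map(
    "email=email,username=uid,first_name=firstName,last_name=name"
)
print(example)
```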

To set initial user permissions on first login, you can use the following options:

SAML_NEW_USER_GROUPS

Comma separated list of groups for a new user to join.

SAML_NEW_USER_ACTIVE_STATUS

[Default True] Whether a new user is considered active.

SAML_NEW_USER_STAFF_STATUS

[Default True] New users get the staff status.

SAML_NEW_USER_SUPERUSER_STATUS

[Default False] New users are marked as superusers (we advise leaving this disabled).

If you encounter any trouble with this rather involved procedure, feel free to take a look at the discussion forums on GitHub and open a thread.

Sending of Emails

You can configure VarFish to send out emails, e.g., when permissions are granted to users.

PROJECTROLES_SEND_EMAIL=0

Enable sending of emails.

EMAIL_SENDER=

String to use for the sender, e.g., noreply@varfish.example.com.

EMAIL_SUBJECT_PREFIX=

Prefix to use for email subjects, e.g., [VarFish].

EMAIL_URL=

URL to the SMTP server to use, e.g., smtp://user:password@mail.example.com:1234.

External Postgres Server

In some setups, it might make sense to run your own Postgres server. The most common use case would be that you want to run VarFish in a setting where fast disks are not available (virtual machines or in a “cloud” setting). You might still have a dedicated, fast Postgres server running (or available as a service from your cloud provider). In this case, you can configure the database connection settings as follows.

DATABASE_URL=postgresql://postgres:password@postgres/varfish

Adjust to the credentials, server, and database name that you want to use.
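To double-check which parts of such a URL hold the credentials, server, and database name, you can split it with Python's standard library. This is only a sketch for inspecting the value; the function name is hypothetical and VarFish itself may interpret the URL with different tooling.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a database URL into its components (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None if no port is given in the URL
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url("postgresql://postgres:password@postgres/varfish")
print(info)
```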

The default settings do not make for secure settings in the general case. However, Docker Compose will create a private network that is only available to the Docker containers. In the default docker-compose setup, postgres server is thus not exposed to the outside and only reachable by the VarFish web server and queue workers.

Miscellaneous Configuration

VARFISH_LOGIN_PAGE_TEXT

Text to display on the login page.

FIELD_ENCRYPTION_KEY

Key to use for encrypting secrets in the database (such as saved public keys for the Beacon Site feature). You can generate such a key with the following command: python -c 'import os, base64; print(base64.urlsafe_b64encode(os.urandom(32)).decode())'.
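The same key generation can be written as a short script that also checks the result. This is a minimal sketch of the approach from the command above; the function name is hypothetical.

```python
import base64
import os


def generate_field_encryption_key():
    """Return 32 random bytes as URL-safe base64 text."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


key = generate_field_encryption_key()
# Decoding the key again must yield exactly 32 bytes.
assert len(base64.urlsafe_b64decode(key)) == 32
print(key)
```

The printed value is plain text (without a bytes prefix) and can be pasted into the .env file as is.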

VARFISH_QUERY_MAX_UNION

Maximal number of cases to query for at the same time for joint queries. Default is 20.

Sentry Configuration

Sentry is a service for monitoring web apps. Their open source version can be installed on premises. You can configure Sentry support as follows.

ENABLE_SENTRY=0

Enable Sentry support.

SENTRY_DSN=

A sentry DSN to report to. See Sentry documentation for details.

HGMD Professional Documentation

Users can enable a gene and variant wise link-out to HGMD professional as follows.

VARFISH_ENABLE_HGMD_PRO_LINKOUT=0

Enable HGMD Professional link-out.

VARFISH_HGMD_PRO_LINKOUT_URL_PREFIX=https://my.qiagendigitalinsights.com/bbp/view/hgmd/pro/

Configure the URL prefix for HGMD Professional link-outs.

System and Docker (Compose) Tweaks

A number of customizations of the installation can be done using Docker or Docker Compose. Other customizations have to be done on the system level. This section lists those that the authors are aware of, but network-related settings in particular can be done on many levels.

Using Non-Default HTTP(S) Ports

If you want to use non-standard HTTP and HTTPS ports (defaults are 80 and 443) then you can tweak this in the traefik container section. You have to adjust two parts, below we give them separately with full YAML “key” paths.

services:
  traefik:
    ports:
      - "80:80"
      - "443:443"

To listen on ports 8080 and 8443 instead, your override file should have:

services:
  traefik:
    ports:
      - "8080:80"
      - "8443:443"

Also, you have to adjust the command line arguments to traefik for the web (HTTP) and websecure (HTTPS) entrypoints.

services:
  traefik:
    command:
      # ...
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"

Use the following in your override file.

services:
  traefik:
    command:
      # ...
      - "--entrypoints.web.address=:8080"
      - "--entrypoints.websecure.address=:8443"

Based on the docker-compose.yml file alone, your docker-compose.override.yml file should contain the following lines. You will have to adjust the file accordingly if you want to use a custom static certificate or letsencrypt by incorporating the contents of the provided example docker-compose.override.yml-* files.

services:
  traefik:
    ports:
      - "8080:80"
      - "8443:443"
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
      - "--entrypoints.web.http.redirections.entrypoint.permanent=true"
      - "--entrypoints.websecure.address=:443"

Then, restart by calling docker-compose up -d in the directory with the docker-compose.yml file.

Listening on Specific IPs

By default, the traefik container will listen on all IPs and interfaces of the host machine.

You can change this by prefixing the ports list with the IPs to listen on. The settings to adjust here are:

services:
  traefik:
    ports:
      - "80:80"
      - "443:443"

And they need to be overwritten as follows in your override file.

services:
  traefik:
    ports:
      - "10.0.0.1:80:80"
      - "10.0.0.1:443:443"

More details can be found in the corresponding section of the Docker Compose manual. Of course, you can combine this with adjusting the ports, e.g., to 10.0.0.1:8080:80 etc.
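The structure of such a ports entry, an optional IP followed by the host port and the container port, can be illustrated with a small Python sketch. It only covers the IPv4 short forms used in this section and is not a full parser for the Docker Compose ports syntax; the function name is hypothetical.

```python
def parse_port_mapping(mapping):
    """Split '[IP:]HOST_PORT:CONTAINER_PORT' into its parts (IPv4 only)."""
    parts = mapping.split(":")
    if len(parts) == 2:
        host_ip = None  # no IP given: listen on all interfaces
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported port mapping: {mapping!r}")
    return host_ip, int(host_port), int(container_port)


for entry in ["80:80", "10.0.0.1:443:443", "10.0.0.1:8080:80"]:
    print(entry, "->", parse_port_mapping(entry))
```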

Limit Incoming Traffic

In some settings you might want to limit incoming traffic to certain networks / IP ranges. In principle, this is possible by adjusting the Traefik load balancer/reverse proxy. However, we recommend using the firewall of your operating system or your overall network for this purpose. Consult the corresponding manual (e.g., of firewalld for CentOS/Red Hat or of ufw for Debian/Ubuntu) for instructions. We remark that in most cases it is better to perform an actual separation of networks and place each (virtual) machine into one network only.

Understanding Volumes

The volumes sub directory of the varfish-docker-compose directory contains the data for the containers. These are as follows.

cadd-rest-api

Databases for variant annotation with CADD (large).

exomiser

Databases for variant prioritization (medium).

jannovar

Transcript databases for annotation (small).

minio

Storage for files uploaded from client via REST API (big).

postgres

PostgreSQL databases (very big).

redis

Storage for the work queues (small).

traefik

Configuration and certificates for load balancer (very small).

In principle, you can put these on different storage systems (e.g., some over the network and some on directly attached disks). The main motivation is that fast storage is expensive. Putting the small and medium sized directories on slower, cheaper storage will save little or nothing, since they take up little space. At the same time, access to the redis and exomiser directories should be fast. The postgres directory is accessed most heavily and should be on storage as fast as you can afford. cadd-rest-api should also be on fast storage, but it is accessed almost exclusively read-only. You can put the minio folder on slower storage to shave off some storage costs from your VarFish installation.

To summarize:

  • You can put minio on cheaper storage.

  • As for cadd-rest-api, you can probably get away with putting it on cheaper storage.

  • Put everything else, in particular postgres, on storage as fast as you can afford.

As described in the section Performance Tuning, the authors recommend using an advanced file system such as ZFS on multiple SSDs for large, fast storage and enabling compression. You will get excellent performance and can expect storage savings of 50%.

Beacon Site (Experimental)

Experimental support for the GA4GH Beacon protocol.

VARFISH_ENABLE_BEACON_SITE=

Whether or not to enable experimental beacon site support.

Undocumented Configuration

The following points remain to be implemented with Docker Compose and documented.

  • Kiosk Mode

  • Updating Extras Data

Ingesting Variants

This section describes how to ingest data into VarFish, that is:

  1. annotating variants and preparing them for import into VarFish

  2. actually importing them into VarFish.

All of the steps below assume that you are running the Linux operating system. They might also work on macOS, but this is currently unsupported.

Variant Annotation

In order to import a VCF file with SNVs and small indels, the file has to be prepared for import into VarFish server. This is done using the VarFish Annotator software.

Installing the Annotator

The VarFish Annotator is written in Java and you can find the JAR on the varfish-annotator GitHub releases page. However, it is recommended to install it via bioconda. For this, you first have to install bioconda as described in their manual. Please ensure that you have the channels conda-forge, bioconda, and defaults set in the correct order as described in the bioconda installation manual. A common pitfall is forgetting the channel setup, which then causes the installation of varfish-annotator to fail.

The next step is to install the varfish-annotator-cli package or create a conda environment with it.

# EITHER
$ conda install -y varfish-annotator-cli==0.14.0
# OR
$ conda create -y -n varfish-annotator varfish-annotator-cli==0.14.0
$ conda activate varfish-annotator

As a side remark, you might consider installing mamba first and then using mamba install and mamba create instead of conda install and conda create.

Obtaining the Annotator Data

The downloaded archive has a size of ~10 GB while the extracted data has a size of ~55 GB.

$ GENOME=grch37      # alternatively use grch38
$ RELEASE=20210728
$ mkdir varfish-annotator-$RELEASE-$GENOME
$ cd varfish-annotator-$RELEASE-$GENOME
$ wget --no-check-certificate \
    https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz{,.sha256} \
    https://file-public.cubi.bihealth.org/transient/varfish/anthenea/jannovar-db-$RELEASE-$GENOME.tar.gz{,.sha256}
$ sha256sum --check varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz.sha256
varfish-annotator-db-20210728-grch37.h2.db.gz: OK
$ sha256sum --check jannovar-db-$RELEASE-$GENOME.tar.gz.sha256
jannovar-db-20210728-grch37.tar.gz: OK
$ gzip -d varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz
$ tar xf jannovar-db-$RELEASE-$GENOME.tar.gz
$ rm jannovar-db-$RELEASE-$GENOME.tar.gz{,.sha256} \
    varfish-annotator-db-$RELEASE-$GENOME.h2.db.gz.sha256
$ mv jannovar-db-$RELEASE-$GENOME/* .
$ rmdir jannovar-db-$RELEASE-$GENOME
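If sha256sum is not available on your system, the checksum verification from the listing above can be sketched in Python. The sketch assumes the usual sha256sum file layout (hex digest, whitespace, file name, one entry per line) and that the data files sit next to the checksum file; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_path):
    """Return {file name: True/False} for each entry of a .sha256 file."""
    checksum_path = Path(checksum_path)
    results = {}
    for line in checksum_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(None, 1)
        name = name.strip().lstrip("*")  # "*" marks binary mode in sha256sum
        actual = sha256_of(checksum_path.parent / name)
        results[name] = actual == expected.lower()
    return results
```

For example, verify_checksum_file("jannovar-db-20210728-grch37.tar.gz.sha256") would return a dict with one entry for the archive, mapped to True if the digest matches.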

Annotating VCF Files

First, obtain some test data for annotation and later import into VarFish Server.

# use $GENOME and $RELEASE from above
$ wget --no-check-certificate \
    https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-test-data-v1-20211125.tar.gz{,.sha256}
$ sha256sum --check varfish-test-data-v1-20211125.tar.gz.sha256
varfish-test-data-v1-20211125.tar.gz: OK
$ tar -xf varfish-test-data-v1-20211125.tar.gz
varfish-test-data-v1-20211125/
...
varfish-test-data-v1-20211125/GRCh37/vcf/HG00107-N1-DNA1-WES1/bwa.gatk_hc.HG00107-N1-DNA1-WES1.vcf.gz
...

Annotating Small Variant VCFs

Next, you can use the varfish-annotator command. You must provide a bgzip-compressed VCF file INPUT.vcf.gz.

# Use the path to the FASTA file that you used for alignment,
# e.g., hs37fa.fa or hs38.fa.
$ REFERENCE=path/to/hs37fa.fa
# use $GENOME and $RELEASE from above
$ varfish-annotator \
    -XX:MaxHeapSize=10g \
    -XX:+UseConcMarkSweepGC \
    annotate \
    --db-path varfish-annotator-20210728-$GENOME/varfish-annotator-db-$RELEASE-$GENOME.h2.db \
    --ensembl-ser-path varfish-annotator-20210728-$GENOME/ensembl*.ser \
    --refseq-ser-path varfish-annotator-20210728-$GENOME/refseq_curated*.ser \
    --ref-path $REFERENCE \
    --input-vcf "INPUT.vcf.gz" \
    --release "$GENOME" \
    --output-db-info "FAM_name.db-info.tsv" \
    --output-gts "FAM_name.gts.tsv" \
    --case-id "FAM_name"

Let us dissect this call. The first three lines of the call contain the call to the wrapper script and some arguments for the java binary to allow for enough memory when running.

$ varfish-annotator \
    -XX:MaxHeapSize=10g \
    -XX:+UseConcMarkSweepGC \

The next lines use the annotate sub command and provide the paths to the database files needed for annotation. The .h2.db file contains information from variant databases such as gnomAD and ClinVar. The .ser files are transcript databases used by the Jannovar library. The .fa file is the genome reference file used. While only release GRCh37/hg19 is supported, using a file with UCSC-style chromosome names having chr prefixes would also work.

    annotate \
    --db-path varfish-annotator-20210728-$GENOME/varfish-annotator-db-$RELEASE-$GENOME.h2.db \
    --ensembl-ser-path varfish-annotator-20210728-$GENOME/ensembl*.ser \
    --refseq-ser-path varfish-annotator-20210728-$GENOME/refseq_curated*.ser \
    --ref-path $REFERENCE \

The following lines provide the path to the input VCF file, specify the release name (must be GRCh37) and the name of the case as written out. This could be the name of the index patient, for example.

    --input-vcf "INPUT.vcf.gz" \
    --release "GRCh37" \
    --case-id "index" \

The last lines specify the output files for the database information and the annotated genotypes.

    --output-db-info "FAM_name.db-info.tsv" \
    --output-gts "FAM_name.gts.tsv"

After the program terminates, you should create gzip files for the created TSV files and md5 sum files for them.

$ gzip -c FAM_name.db-info.tsv >FAM_name.db-info.tsv.gz
$ md5sum FAM_name.db-info.tsv.gz >FAM_name.db-info.tsv.gz.md5
$ gzip -c FAM_name.gts.tsv >FAM_name.gts.tsv.gz
$ md5sum FAM_name.gts.tsv.gz >FAM_name.gts.tsv.gz.md5
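If you prefer to script this step, the gzip and md5sum calls above can be approximated in Python. This is a sketch, assuming that the two-space md5sum output layout is what the import expects; the function name is hypothetical. Like gzip -c, it keeps the original TSV file.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def compress_and_checksum(tsv_path):
    """Write FILE.gz and FILE.gz.md5 next to FILE; return both paths."""
    tsv_path = Path(tsv_path)
    gz_path = tsv_path.with_name(tsv_path.name + ".gz")
    with open(tsv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    md5 = hashlib.md5()
    with open(gz_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            md5.update(chunk)

    md5_path = gz_path.with_name(gz_path.name + ".md5")
    # Same layout as the output of `md5sum FILE`: digest, two spaces, name.
    md5_path.write_text(f"{md5.hexdigest()}  {gz_path.name}\n")
    return gz_path, md5_path
```

Calling compress_and_checksum("FAM_name.gts.tsv") would create FAM_name.gts.tsv.gz and FAM_name.gts.tsv.gz.md5 in the same directory.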

The next step is to import these files into VarFish server. For this, a PLINK PED file has to be provided. This is a tab-separated values (TSV) file with the following columns:

  1. family name

  2. individual name

  3. father name or 0 for founder

  4. mother name or 0 for founder

  5. sex of individual, 1 for male, 2 for female, 0 if unknown

  6. disease state of individual, 1 for unaffected, 2 for affected, 0 if unknown

For example, a trio would look as follows:

FAM_index   index       father  mother  2       2
FAM_index   father      0       0       1       1
FAM_index   mother      0       0       2       1

while a singleton could look as follows:

FAM_index   index       0       0       2       1

Note that you have to link family individuals with pseudo entries that have no corresponding entry in the VCF file. For example, if you have genotypes for two siblings but none for the parents:

FAM_index   sister      father  mother  2       2
FAM_index   brother     father  mother  1       2
FAM_index   father      0       0       1       1
FAM_index   mother      0       0       2       1
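A quick consistency check of a PED file before the import can catch missing pseudo entries. The following Python sketch reads the six columns described above and reports parents that are referenced but have no line of their own. It splits on any whitespace for convenience and is not the validation performed by VarFish; the function names are hypothetical.

```python
def parse_ped(text):
    """Parse PED text into a list of dicts, one per individual."""
    members = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"line {number}: expected 6 columns, got {len(fields)}")
        family, name, father, mother, sex, disease = fields
        members.append(
            {"family": family, "name": name, "father": father,
             "mother": mother, "sex": sex, "disease": disease}
        )
    return members


def missing_parents(members):
    """Return (child, parent) pairs where the parent has no own PED line."""
    names = {member["name"] for member in members}
    return [
        (member["name"], parent)
        for member in members
        for parent in (member["father"], member["mother"])
        if parent != "0" and parent not in names
    ]


siblings = """\
FAM_index sister father mother 2 2
FAM_index brother father mother 1 2
FAM_index father 0 0 1 1
FAM_index mother 0 0 2 1
"""
print(missing_parents(parse_ped(siblings)))
```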

Annotating Structural Variant VCFs

Structural variants can be annotated as follows.

# use $GENOME and $RELEASE from above
$ varfish-annotator \
    annotate-svs \
    -XX:MaxHeapSize=10g \
    -XX:+UseConcMarkSweepGC \
    \
    --default-sv-method="YOURCALLERvVERSION" \
    --release $GENOME \
    \
    --db-path varfish-annotator-20210728-$GENOME/varfish-annotator-db-$RELEASE-$GENOME.h2.db \
    --ensembl-ser-path varfish-annotator-20210728-$GENOME/ensembl*.ser \
    --refseq-ser-path varfish-annotator-20210728-$GENOME/refseq_curated*.ser \
    \
    --input-vcf FAM_sv_calls.vcf.gz \
    --output-db-info FAM_sv_calls.db-info.tsv \
    --output-gts FAM_sv_calls.gts.tsv \
    --output-feature-effects FAM_sv_calls.feature-effects.tsv

Note

varfish-annotator annotate-svs will write out the INFO/SVMETHOD column to the output file. If this value is empty then the value from --default-sv-method will be used. You must either provide INFO/SVMETHOD or --default-sv-method. Otherwise, you will get errors in the import step (visible in the case import background task view).

You can use the following shell snippet for adding INFO/SVMETHOD to your VCF file properly. Replace YOURCALLERvVERSION with the value that you want to provide to VarFish.

cat >$TMPDIR/header.txt <<"EOF"
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
EOF

bcftools annotate \
    --header-lines $TMPDIR/header.txt \
    INPUT.vcf.gz \
| awk -F $'\t' '
    BEGIN { OFS = FS; }
    /^#/ { print $0; }
    /^[^#]/ { $8 = $8 ";SVMETHOD=YOURCALLERvVERSION"; print $0; }
    ' \
| bgzip -c \
> OUTPUT.vcf.gz
tabix -f OUTPUT.vcf.gz
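The awk step of the snippet above can also be expressed in Python, for example when you post-process VCF lines in an existing script. This sketch handles a single line at a time; unlike the awk version it replaces an empty INFO value (".") instead of appending to it, which is a deliberate deviation. The function name is hypothetical, and the header line and bgzip/tabix steps still have to be handled as shown above.

```python
def add_svmethod(vcf_line, method):
    """Append SVMETHOD=<method> to the INFO column of one VCF record line."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines are passed through unchanged
    fields = vcf_line.rstrip("\n").split("\t")
    entry = "SVMETHOD=" + method
    info = fields[7]  # INFO is the eighth VCF column
    fields[7] = entry if info in (".", "") else info + ";" + entry
    return "\t".join(fields) + "\n"


record = "1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=200\tGT\t0/1\n"
print(add_svmethod(record, "YOURCALLERvVERSION"), end="")
```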

Again, you have to compress the output TSV files with gzip and compute MD5 sums.

$ gzip -c FAM_sv_calls.db-info.tsv >FAM_sv_calls.db-info.tsv.gz
$ md5sum FAM_sv_calls.db-info.tsv.gz >FAM_sv_calls.db-info.tsv.gz.md5
$ gzip -c FAM_sv_calls.gts.tsv >FAM_sv_calls.gts.tsv.gz
$ md5sum FAM_sv_calls.gts.tsv.gz >FAM_sv_calls.gts.tsv.gz.md5
$ gzip -c FAM_sv_calls.feature-effects.tsv >FAM_sv_calls.feature-effects.tsv.gz
$ md5sum FAM_sv_calls.feature-effects.tsv.gz >FAM_sv_calls.feature-effects.tsv.gz.md5

Variant Import

As a prerequisite you need to install the VarFish command line interface (CLI) Python app varfish-cli. You can install it from PyPI with pip install varfish-cli or from Bioconda with conda install varfish-cli.

Second, you need to create a new API token as described in API Token Management. Then, set up your VarFish CLI configuration file ~/.varfishrc.toml as:

[global]
varfish_server_url = "https://varfish.example.com/"
varfish_api_token = "XXX"

Now you can import the data that you annotated above. You will also find some example files in the test-data directory.

For the import you will also need the project UUID. You can get this from the URLs in VarFish that list project properties. The figure below shows this for the background job list but this also works for the project details view.
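If you copy such a URL from your browser, the UUID can be extracted with a short Python sketch. The URL path used below is a hypothetical example and the function name is not part of varfish-cli; only the UUID format itself is standard.

```python
import re
import uuid

UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def find_project_uuid(url):
    """Return the first UUID found in the URL in canonical lowercase form."""
    match = UUID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no UUID found in {url!r}")
    return str(uuid.UUID(match.group(0)))


# Hypothetical URL layout; only the UUID part matters here.
example_url = "https://varfish.example.com/project/94777783-8797-429c-870d-c12bec2dd6ea"
print(find_project_uuid(example_url))
```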

$ varfish-cli --no-verify-ssl case create-import-info --resubmit \
    94777783-8797-429c-870d-c12bec2dd6ea \
    test-data/tsv/HG00102-N1-DNA1-WES1/*.{tsv.gz,ped}

When executing the import as shown above, you have to specify:

  • a pedigree file with suffix .ped,

  • a genotype annotation file as generated by varfish-annotator ending in .gts.tsv.gz,

  • a database info file as generated by varfish-annotator ending in .db-info.tsv.gz.

Optionally, you can also specify a TSV file with BAM quality control metrics ending in .bam-qc.tsv.gz. Currently, the format is not yet properly documented, but documentation and supporting tools are forthcoming.

If you want to import structural variants for your case, then you simply submit the output files from the SV annotation step together with the .feature-effects.tsv.gz and .gts.tsv.gz files from the small variant annotation step.

Running the import command through VarFish CLI will create a background import job as shown below. Once the job is done, the created or updated case will appear in the case list.

_images/admin_import.png

Case Quality Control

You can provide an optional TSV file with case quality control data. The file name should end in .bam-qc.tsv.gz and the file should be accompanied by an MD5 file. The format is a bit peculiar and will be documented better in the future.

The TSV file has three columns and starts with the header.

case_id     set_id      bam_stats

It is then followed by exactly one line where the first two fields have to have the value of a dot (.). The last field is then a PostgreSQL-encoded JSON dict with the per-sample quality control information. You can obtain the PostgreSQL encoding by replacing each string delimiter (") with three of them (""").
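The layout described above can be produced with a few lines of Python. This is a minimal sketch that follows the description in this section (header, one data line, quotes tripled); the function name is hypothetical and the result still has to be compressed with gzip and accompanied by an MD5 file.

```python
import json


def bam_qc_tsv(qc_by_sample):
    """Return the text of a .bam-qc.tsv file for the given per-sample dict."""
    # Encode as JSON, then triple every double quote as described above.
    encoded = json.dumps(qc_by_sample).replace('"', '"""')
    header = "\t".join(["case_id", "set_id", "bam_stats"])
    row = "\t".join([".", ".", encoded])
    return header + "\n" + row + "\n"


# Tiny illustrative input; real files contain the full per-sample QC dict.
text = bam_qc_tsv({"index": {"summary": {"mean coverage": 206.69}}})
print(text, end="")
```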

The format of the JSON file is formally defined in varfish-server case QC info.

Briefly, the keys of the top level dict are the sample names as in the case that you upload. On the second level:

bamstats

The keys/values from the output of the samtools stats command.

min_cov_target

Coverage histogram per target (the smallest coverage per target/exon counts for the whole target). You provide the start of each bin, usually starting at "0", in increments of 10, up to "200". The keys are the bin lower bounds, the values are of JSON/JavaScript number type, so floating point numbers.

min_cov_base

The same information as min_cov_target but considering coverage base-wise and not target-wise.

summary

A summary of the target information.

idxstats

A per-chromosome count of mapped and unmapped reads as returned by the samtools idxstats command.
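The coverage histograms can be computed from per-target (or per-base) minimal coverage values. The sketch below assumes, judging only from the decreasing example values in this section, that each value is the percentage of targets whose minimal coverage is at least the bin's lower bound. Please verify this interpretation against the formal definition before relying on it; the function name is hypothetical.

```python
def coverage_histogram(min_coverages, step=10, maximum=200):
    """Map each bin lower bound to the percentage of values >= that bound."""
    if not min_coverages:
        raise ValueError("need at least one coverage value")
    total = len(min_coverages)
    histogram = {}
    for lower in range(0, maximum + 1, step):
        count = sum(1 for coverage in min_coverages if coverage >= lower)
        # Keys are strings, values are floating point percentages.
        histogram[str(lower)] = round(100.0 * count / total, 2)
    return histogram


# Three hypothetical targets with minimal coverages 5, 15, and 250.
example = coverage_histogram([5, 15, 250])
print(example["0"], example["10"], example["200"])
```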

Below is an excerpt of a real-world JSON QC file, showing the entries for the first sample.

{
  "index": {
    "bamstats": {
      "raw total sequences": 154189250,
      "filtered sequences": 0,
      "sequences": 154189250,
      "is sorted": 1,
      "1st fragments": 77094625,
      "last fragments": 77094625,
      "reads mapped": 153919815,
      "reads mapped and paired": 153863370,
      "reads unmapped": 269435,
      "reads properly paired": 153071356,
      "reads paired": 154189250,
      "reads duplicated": 7273644,
      "reads MQ0": 2701485,
      "reads QC failed": 0,
      "non-primary alignments": 129724,
      "total length": 19427845500,
      "total first fragment length": 9713922750,
      "total last fragment length": 9713922750,
      "bases mapped": 19393896690,
      "bases mapped (cigar)": 19238950186,
      "bases trimmed": 0,
      "bases duplicated": 916479144,
      "mismatches": 61093079,
      "error rate": 0.003175489,
      "average length": 126,
      "average first fragment length": 126,
      "average last fragment length": 126,
      "maximum length": 126,
      "maximum first fragment length": 126,
      "maximum last fragment length": 126,
      "average quality": 35,
      "insert size average": 192.6,
      "insert size standard deviation": 54.3,
      "inward oriented pairs": 73269191,
      "outward oriented pairs": 3391556,
      "pairs with other orientation": 12579,
      "pairs on different chromosomes": 258359,
      "percentage of properly paired reads (%)": 99.3
    },
    "min_cov_target": {
      "0": 100,
      "10": 87.59,
      "190": 12.31,
      "200": 10.74
    },
    "min_cov_base": {
      "0": 100,
      "10": 95.89,
      "190": 46.55,
      "200": 43.88
    },
    "summary": {
      "mean coverage": 206.69,
      "target count": 232447,
      "total target size": 57464133
    },
    "idxstats": {
      "1": {
        "mapped": 14553406,
        "unmapped": 5166
      },
      "MT": {
        "mapped": 10058,
        "unmapped": 7
      },
      "*": {
        "mapped": 0,
        "unmapped": 212990
      }
    }
  },
  "father": {
    "bamstats": {

Performance Tuning

This chapter describes how to optimize the performance of VarFish and its components. Mainly, this amounts to optimizing the hardware and software of the PostgreSQL server used by VarFish. The audience of this chapter are those who have installed VarFish on their own infrastructure.

Selecting Hardware

Hardware selection is the most critical point. The sizing of CPU and RAM is not so critical for VarFish. 16 CPU cores and 64 GB of RAM should be good to start with, while more will not hurt and is not that expensive these days. The focus should be on using a server with fast disk I/O.

From the authors' experience the ideal build consists of

  • multiple SSDs,

  • host bus adapter (as opposed to a RAID controller),

  • using a ZFS setup.

The SSDs offer overall good throughput and excellent random I/O performance in particular. They should appear as block devices (e.g., sda) to the operating system such that ZFS can use them properly. You will find that there is some discussion on the best setup of ZFS. We have found ten SSDs in a single raidz2 pool with enabled compression (default) on the file system to offer excellent performance. Further, up to two disks can fail without loss of data.

Of course, you can also use a classic hardware RAID controller. We would advise against storing data on a SAN system and always recommend local disks (aka direct storage). While VarFish will run fine in a virtual machine, you have to take good care that disk access is fast. In particular, the QCOW driver of KVM is known to offer bad performance.

Configuration Tuning

The varfish-docker-compose repository contains a postgresql.conf file with pre-tuned database settings. When using Docker Compose for your VarFish site you will get this configuration automatically. This should be good enough for most instances.

Below are some proposals for starting points on tuning configuration. Please consult the Postgres configuration documentation on all settings. You will also find many resources on Postgres performance tuning on the internet using your favourite search engine.

ZFS optimization. In the case that you store your database files on a ZFS file system you can try setting the full_page_writes setting to off. This will improve the write performance and according to various sources ZFS file systems are “torn page resilient” which prevents data loss.

full_page_writes = off  # only do this on ZFS (!)

SSD optimization. If you are using SSDs then you can adjust the value of random_page_cost. This value helps the Postgres query planner to estimate the cost of random vs. sequential data access. For SSDs, you can set this to 1.1:

random_page_cost = 1.1  # optimized for SSD

Placing Tables and Indices

In principle, you can use the tablespace feature of PostgreSQL to move certain tables and indices to different storage classes. The following tables and their indices are large and read-only after the initial import.

conservation_knowngeneaa
dbsnp_dbsnp
frequencies_*
extra_annos_*

Moving them to cheaper storage with higher latency than the rest of the data might be feasible if you are hard-pressed for saving storage. The authors have not tried this and would be very interested in experience reports.

Reference Times

For reference, here are some timings for importing the background database on different hardware.

Reference background data import times

Data             VarFish     Postgres  Storage            File System  Time
20210728-grch37  v0.23.9+42  12.9      25xSSD RBD 16.2.7  XFS          13.5h
20210728-grch38  v0.23.9+42  12.9      25xSSD RBD 16.2.7  XFS          15h

And some times for importing exome cases. Note that you can import multiple cases at the same time.

Exome case import times

Data           VarFish     Postgres  Storage            File System  Time
WES singleton  v0.23.9+42  12.9      25xSSD RBD 16.2.7  XFS          2-3 min
WES trio       v0.23.9+42  12.9      25xSSD RBD 16.2.7  XFS          5-10 min

Upgrade VarFish Installation

This section contains upgrade instructions for upgrading your VarFish Server installation using VarFish Docker Compose.

Problem with Data Release 20210728 and GRCh37

The data release 20210728 has a problem with the GRCh37 extra annotations. If you can, use the updated 20210728b site data release. If you already have an instance with 20210728 background data, you can use the following data file.

Download and extract the file and mount it as /data inside the varfish-web container. You can then apply the patch to your database with the following command.

$ docker exec -it varfish-docker-compose_varfish-web_1 python /usr/src/app/manage.py \
    import_tables --tables-path /data --truncate --force

You can find out more details, give feedback, and ask for help in this GitHub discussion.

v0.23.0 to v1.2.0

This includes all versions in between, v0.23.1, …, v1.2.0.

Summary

These are minor bug fix releases with small added features. You should be able to upgrade by just updating your varfish-docker-compose repository clone and calling docker-compose up -d.

v0.23.1 to v0.23.2

Summary

This is a minor bug fix release that improved the deployment of the VarFish Demo and Kiosk sites. You should be able to upgrade by just updating your varfish-docker-compose repository clone and calling docker-compose up -d.

v0.22.1 to v0.23.0

Summary

  • The Docker Compose installer now provides support for setting up CADD score annotation via cadd-rest-api.

  • The environment variable FIELD_ENCRYPTION_KEY should be set up properly by the user.

  • Two new celery queues are needed: maintenance and export.

  • To enable the new and optional feature for uploading variants to SPANR you have to set the environment variable VARFISH_ENABLE_SPANR_SUBMISSION to 1.

Detailed Instructions

Docker Compose: cadd-rest-api

Update your varfish-docker-compose installation with the changes from the GitHub repository without installing cadd-rest-api. This will give you commented-out lines for running one cadd-rest-api-server and multiple cadd-rest-api-celeryd-worker-? containers. To enable them, follow the instructions in Install Scoring with CADD.

Additional Celery Queues

After updating your varfish-docker-compose.yml file, ensure that you have the two additional containers varfish-celeryd-maintenance and varfish-celeryd-export. These run the background jobs for maintenance tasks and for exporting results. They will be started when running docker-compose up.

Environment Variable: FIELD_ENCRYPTION_KEY

Set the environment variable in the .env file as documented in Miscellaneous Configuration. The default value is also stored in the public repository and thus not very secure.
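As a minimal sketch of how a fresh key could be generated, the snippet below produces 32 random bytes in URL-safe base64 encoding. That this is the format VarFish expects is an assumption here; the authoritative description is the one in Miscellaneous Configuration. The function name generate_field_encryption_key is hypothetical.

```python
import base64
import os


def generate_field_encryption_key():
    """Return 32 random bytes as URL-safe base64 text (assumed key format)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Prints a line that could be pasted into the .env file.
    print(f"FIELD_ENCRYPTION_KEY={generate_field_encryption_key()}")
```

Keep the generated value secret and stable: data encrypted with one key cannot be read after the key has been replaced.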

PAP Configuration

This section describes the setup of VarFish behind a PAP (packet filter, application gateway, packet filter) structure.

VarFish stores human genetic data, which is by its very nature highly privacy sensitive. Administrators will thus want to set up VarFish in protected institution networks that are not accessible from the outside world. However, certain data exchange is generally desired, such as connecting two or more VarFish instances with the clinical beacon protocol.

PAP Structure

In such cases, the German agency for information security (BSI) recommends the P-A-P structure (link to 2021 edition of their recommendation). The following figure illustrates the structure.

_images/pap-structure.png

Overview of VarFish server behind P-A-P structure.

The structure is as follows:

  • A demilitarized zone (DMZ) network is set up to contain an application gateway. In the case of HTTP(S), this is a reverse proxy.

  • Incoming traffic from the internet into the gateway passes through a packet filter (in other words: a firewall).

  • Outgoing traffic from the gateway passes through another packet filter before it reaches the destination server in the protected network.

The reasoning behind the structure is explained in the NET 3.2 document linked to above. In the following section, we will explain the technical implementation.

Firewall and Network Setup

The German specification NET.3.2.A16 is as follows:

NET.3.2.A16 Aufbau einer “P-A-P” Struktur (S) Eine “Paketfilter - Application-Level-Gateway - Paketfilter”-(P-A-P)-Struktur SOLLTE eingesetzt werden. Sie MUSS aus mehreren Komponenten mit jeweils dafür geeigneter Hard- und Software bestehen. Für die wichtigsten verwendeten Protokolle SOLLTEN Sicherheitsproxies auf Anwendungsschicht vorhanden sein. Für andere Dienste SOLLTEN zumindest generische Sicherheitsproxies für TCP und UDP genutzt werden. Die Sicherheitsproxies SOLLTEN zudem innerhalb einer abgesicherten Laufzeitumgebung des Betriebssystems ablaufen.

Which translates into English roughly as follows:

NET.3.2.A16 Creating a “P-A-P” Structure (S) A “packet filter - application level gateway - packet filter”-(P-A-P)-Structure SHOULD be used. It MUST consist of multiple components with appropriate hardware and software. For the most important protocols, security proxies SHOULD exist on the application layer. For other services, at least generic security proxies for TCP and UDP SHOULD be used. The security proxies SHOULD run inside a secured runtime environment of the operating system.

A possible implementation looks as follows:

  • The VarFish server runs in the internal network with IP 10.0.10.10.

  • Create a separate VLAN for the PAP structure and use a /30 (or lower) CIDR prefix. Only place proxy services there, ideally only one.

    • Example: use 1.2.3.0/30 with IP gateway 1.2.3.1 and application gateway server 1.2.3.2.

  • Configure the firewall to allow incoming traffic via HTTPS (TCP/443) to 1.2.3.2 only.

  • Allow outgoing traffic from 1.2.3.2 via the packet filter to 10.0.10.10 via HTTPS (TCP/443) only.
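The reason a /30 prefix suits the "ideally only one proxy" rule can be checked with a short sketch using Python's standard ipaddress module. The example network 1.2.3.0/30 is the one from the list above; the variable names are illustrative only.

```python
import ipaddress

# The example PAP network from the list above.
pap_network = ipaddress.ip_network("1.2.3.0/30")

# A /30 holds four addresses: network address, two hosts, broadcast address.
usable_hosts = [str(host) for host in pap_network.hosts()]

if __name__ == "__main__":
    print(pap_network.num_addresses)  # total addresses in the network
    print(usable_hosts)  # the IP gateway and the application gateway
```

With only two usable addresses, one for the IP gateway and one for the application gateway, no further machine can be placed in this VLAN by accident.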

The following section describes how to set up a Linux Docker container with the Traefik reverse proxy. To the authors’ best understanding, this fulfills all of the required and optional rules for P-A-P by BSI.

Traefik Reverse Proxy Setup

Traefik is a versatile reverse proxy (and load balancer). It works well with Docker but configuring it can be a bit daunting for beginners. The following describes a straightforward and minimal setup.

Preparation:

  1. Install a modern Linux distribution on the gateway server (1.2.3.2 from above).

  2. On the server, install Docker following the official instructions.

  3. Also install Docker Compose following the official instructions.

  4. Set up public DNS (e.g., varfish-ext.example.com) to point to 1.2.3.2 and ensure that public resolvers can resolve it (e.g., Google DNS at 8.8.8.8).

  5. We assume that your internal VarFish instance is available as varfish-int.example.com and that it is set up with a valid TLS certificate.

  6. Collect the public IPs of the hosts on the internet that you want to be able to access your VarFish instance. These might be cluster IPs if the remote servers are behind NAT. In the example below we use the sub network 2.3.4.0/28 and IP 3.4.5.6 as valid sources.

First, create some directories with the following commands:

# mkdir -p /etc/reverse-proxy
# mkdir -p /etc/reverse-proxy/var/traefik
# mkdir -p /etc/reverse-proxy/etc/traefik
# mkdir -p /etc/reverse-proxy/etc/traefik/conf.d

Now, create the file /etc/reverse-proxy/docker-compose.yaml as follows.

/etc/reverse-proxy/docker-compose.yaml
version: "2"

services:
  traefik:
    image: traefik:latest
    restart: always
    ports:
      - "443:443"
    networks:
      - web
    volumes:
      - ./var/traefik:/var/traefik:rw
      - ./etc/traefik:/etc/traefik:ro
    container_name: traefik

networks:
  web:

This will create a new container named traefik with the latest version of Traefik. The container goes into its own network and port 443 is exposed. The container can read /etc/reverse-proxy/etc/traefik as /etc/traefik via a bind mount and can read and write /etc/reverse-proxy/var/traefik as /var/traefik. The former will contain the configuration; the latter will be used for storing the Let's Encrypt certificate generation state.

Next, create /etc/reverse-proxy/etc/traefik/traefik.yaml and /etc/reverse-proxy/etc/traefik/conf.d/dynamic_config.yaml.

/etc/reverse-proxy/etc/traefik/traefik.yaml
entryPoints:
  websecure:
    address: ":443"

providers:
  file:
    directory: /etc/traefik/conf.d
  docker:
    exposedByDefault: false

certificatesResolvers:
  le:
    acme:
      email: youremail@example.com
      storage: /var/traefik/acme.json
      tlsChallenge: true

This will set up Traefik to obtain its certificate from Let's Encrypt.

Note

Regarding use of “legacy” technical language. Please note that the term ipwhitelist below is part of the traefik configuration syntax. We will update our documentation once updated terms are available.

/etc/reverse-proxy/etc/traefik/conf.d/dynamic_config.yaml
# (1) TLS store
tls:
  stores:
    default: {}

http:
  # (2) set routing source for reverse proxy
  routers:
    varfish:
      middlewares:
        - varfish-add-prefix
        - varfish-ip-allowlist
      entryPoints:
        - websecure
      service: varfish
      rule: "Host(`varfish-ext.example.com`)"
      tls:
        certresolver: le
  # (3) routing destination for the reverse proxy
  services:
    varfish:
      loadBalancer:
        servers:
          - url: "https://varfish-int.example.com"

  middlewares:
    # (4) expose only beaconsite endpoint
    varfish-add-prefix:
      addprefix:
        prefix: "/beaconsite/endpoint"
    varfish-ip-allowlist:
      ipwhitelist:
        sourcerange: "2.3.4.0/28,3.4.5.6"

This will set up the

  1. TLS store for the certificates

  2. routing source and

  3. routing destination for the reverse proxy

  4. automatically add /beaconsite/endpoint prefix so only the beaconsite endpoint is exposed, and

  5. restrict access to the given source sites.
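To make the combined effect of the two middlewares concrete, the following sketch mimics them in plain Python. It is an illustration of the intended behavior only, not Traefik's implementation; the function names parse_source_range and route are hypothetical, while the source range and prefix are the example values from the configuration above.

```python
import ipaddress

SOURCE_RANGE = "2.3.4.0/28,3.4.5.6"  # example value from the configuration
PREFIX = "/beaconsite/endpoint"  # example value from the configuration


def parse_source_range(value):
    """Turn the comma-separated range string into network objects."""
    return [ipaddress.ip_network(item.strip()) for item in value.split(",")]


def route(client_ip, path):
    """Return the path forwarded to VarFish, or None if the client is rejected."""
    address = ipaddress.ip_address(client_ip)
    allowed = any(address in net for net in parse_source_range(SOURCE_RANGE))
    if not allowed:
        return None
    # Every accepted request is forced below the beaconsite endpoint.
    return PREFIX + path


if __name__ == "__main__":
    print(route("2.3.4.5", "/query"))  # client inside 2.3.4.0/28
    print(route("8.8.8.8", "/query"))  # client outside the allowed sources
```

Because the prefix is always prepended, a remote site cannot reach any other VarFish URL through the proxy, even from an allowed address.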

You can now start up the reverse proxy:

# cd /etc/reverse-proxy
# docker-compose up -d

You can inspect the logs by using docker logs --tail=100 --follow traefik. You can increase the log verbosity by placing the following block on top of traefik.yaml.

log:
  level: DEBUG

Data Backups

This section describes how to create data backups in VarFish. The assumption is that you are running VarFish in the recommended way via Docker Compose.

All valuable state is kept in the VarFish PostgreSQL database. VarFish provides a convenient way to call the PostgreSQL tool pg_dump.

You can call it in the following way when VarFish is running under Docker Compose and the postgres container is running as well.

# docker exec -it varfish-docker-compose_varfish-web_1 \
    python /usr/src/app/manage.py pg_dump --mode=MODE

This will execute python /usr/src/app/manage.py pg_dump --mode=MODE in the docker container that is running the VarFish web server.

You can use one of the following dump modes.

full

This will perform a full data dump including all background data.

backup-large

This will exclude the huge background data tables, e.g., dbSNP and gnomAD.

backup-small

This will also exclude all imported variant data. The assumption is that you have a separate backup of the imported TSV files or can easily regenerate them from the VCF files that you still have.

Here is an example of how to create a compressed “small” dump file named varfish-${day_of_week}.sql.gz such that you get a rotating daily dump.

# docker exec -i varfish-docker-compose_varfish-web_1 \
    python /usr/src/app/manage.py pg_dump --mode=backup-small \
  | gzip -c \
  > varfish-$(date +%a).sql.gz
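If the rotation is driven from a script rather than typed by hand, the file name and the command line can be assembled as in the sketch below. The helper names dump_filename and dump_command are hypothetical; the container name, the manage.py path, and the three modes are taken from the text above. The sketch only builds the values and does not run Docker.

```python
import datetime

DUMP_MODES = ("full", "backup-large", "backup-small")
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def dump_filename(day):
    """Return the rotating file name for the given date."""
    # date.weekday() counts from Monday = 0, independent of the locale.
    return f"varfish-{WEEKDAYS[day.weekday()]}.sql.gz"


def dump_command(mode, container="varfish-docker-compose_varfish-web_1"):
    """Return the docker exec argument list for the given dump mode."""
    if mode not in DUMP_MODES:
        raise ValueError(f"unknown dump mode: {mode!r}")
    return [
        "docker", "exec", container,
        "python", "/usr/src/app/manage.py", "pg_dump", f"--mode={mode}",
    ]


if __name__ == "__main__":
    print(dump_filename(datetime.date.today()))
    print(" ".join(dump_command("backup-small")))
```

With seven possible file names, each dump overwrites the one from the same weekday of the previous week, which gives a one-week history without extra cleanup.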

Introduction

This part describes strategies and procedures for the filtration of germline variant cases using the VarFish platform. It is meant as an addition to the standard VarFish manual in that it does not explain the individual VarFish functions in detail. Instead, it provides detailed instructions on how to filter variants in germline cases and proposes values and thresholds for the filter settings.

Intended Audience

The intended reader has a good understanding of human/medical genetics, of high-throughput sequencing variant analysis (whole genome sequencing or targeted/exome sequencing), and of the resulting variant types. Further, the reader is interested in clinical genetics and the identification of pathogenic variants in Mendelian (monogenic) disorders. The reader comes from a clinical diagnostics or research setting (or both). Thus, the overall aim is not to fundamentally educate in the application of high-throughput sequencing in a clinical setting. Rather, this part provides instructions on how to use VarFish for this application.

Structure of the SOPs

The term SOP (standard operating procedure) is meant here as a best effort to create reproducible approaches for causative variant identification in a research setting. The SOPs contained herein can serve as a starting point for creating actual clinical SOPs with adjustments to the clinical and laboratory setting. Of course, they should also be refined for the reader’s actual laboratory setting when used in a research setting.

Generally, all SOPs have the sections Aims/Scope, Results, Steps, and Thresholds. They document

  • the considered scope (and what is out of scope),

  • the expected result (and thus provide some guideline of what to check against),

  • the individual steps (in such brevity that each SOP fits on 1-2, ideally 1, page), and

  • finally the thresholds used in the individual steps with some reasoning (the thresholds are the largest reason for the second page).

References and Disclaimer

We expect the reader to be familiar with the relevant literature, including the following guidelines:

  • Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., … & Voelkerding, K. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in medicine, 17(5), 405.

  • Ellard, S., Baple, E. L., Owens, M., Eccles, D. M., Abbs, S., Deans, Z. C., … & McMullan, D. J. (2017). ACGS best practice guidelines for variant classification 2017. ACGS Guidelines.

We close this introduction by emphasizing that VarFish is for research use only and by quoting the disclaimer of the ACMG guidelines that apply to the VarFish manual and the following SOPs in spirit as well.

These ACMG Standards and Guidelines were developed primarily as an educational resource for clinical laboratory geneticists to help them provide quality clinical laboratory services. Adherence to these standards and guidelines is voluntary and does not necessarily assure a successful medical outcome. These Standards and Guidelines should not be considered inclusive of all proper procedures and tests or exclusive of other procedures and tests that are reasonably directed to obtaining the same results. In determining the propriety of any specific procedure or test, the clinical laboratory geneticist should apply his or her own professional judgment to the specific circumstances presented by the individual patient or specimen. Clinical laboratory geneticists are encouraged to document in the patient’s record the rationale for the use of a particular procedure or test, whether or not it is in conformance with these Standards and Guidelines. They also are advised to take notice of the date any particular guideline was adopted and to consider other relevant medical and scientific information that becomes available after that date. It also would be prudent to consider whether intellectual property interests may restrict the performance of certain tests and other procedures.

Supporting SOPs

This appendix contains SOPs that do not directly deal with variant filtration but support the workflow of causative variant identification in Mendelian diseases from high-throughput sequencing data.

SOP: Quality Control

Aims and Scope

This SOP explains how to use VarFish to gauge the quality of the exome at hand. For this, VarFish provides technical metrics such as exon depth of coverage as well as metrics that allow inference about the donor and thus the detection of sample swaps.

Result

The result of this step is to indicate whether the sequencing results can be trusted in terms of consistency with pedigree and sex meta data and in terms of quality (depth of coverage and percentage of duplicated reads).

Steps

Consider the section “Alignment Quality Control”.

  1. The table “Target Coverage” indicates the percentage of targets in each sample that have a coverage of at least 10x, 20x, etc. The detection of het. variants below 10x is not reliable; 20x or more is recommended. However, note that if a target has a coverage above 10x in all but one position, where it falls to 9x, the target counts as not having at least 10x coverage. The thresholds below have worked well for the authors using recent technologies.

  2. The table “Stats” gives some overall sequencing metrics. The value of “Duplicates” should be compared against the thresholds below. The values of “Pairs”, “Average Insert Sizes”, and “SD Insert Size” are mostly of informative value. They are useful for detecting outliers in the context of multiple samples of the same study.

Consider the figures in “QC Plots”.

  1. The plot “Relatedness vs. IBS0” is only informative for families. The relatedness coefficient (RC) should be around 1.0 for parent-child relations, 0.5 for siblings and decreases further with lower relatedness. The IBS0 value is around 0.0 for parent-child relations and increases with lower relatedness. The RC between monozygotic twins and technical replicates of the same sample is expected to be around 2.0. Parent-child relations and sibling-sibling relations should be in the top left of the plot. Unless the parents are consanguineous, they should be in the lower-right corner of the plot. Unexpected RC values indicate possible sample swaps or discordance between the samples’ genetics and the pedigree from the meta data.

  2. The plot “Rate of het. calls on chrX” allows inference of the genetic sex of the sample. This ratio is expected to be well below 0.5 for male individuals and well above 0.5 (actually around 1.0-2.0) for female individuals. Unexpected ratios indicate a sample swap and the corresponding points will be shown in red.

In the case of unexpected relatedness, samples must be checked for sample swap. Unexpected inferred sex can be caused by incorrect meta data (e.g., for fetuses) and can also help resolve cases of unexpected relationships (e.g., child/parent swaps). Samples with technical quality metrics violating thresholds are candidates for being repeated.
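The sex check from the second plot can be expressed as a small rule. The sketch below is an illustration only and not VarFish's implementation: the function names are hypothetical, the 0.5 boundary comes from the text above, and the margin of 0.1 around it is an arbitrary assumption standing in for "well below" and "well above".

```python
def infer_sex(chrx_het_ratio, boundary=0.5, margin=0.1):
    """Infer the genetic sex from the rate of het. calls on chrX.

    The margin is an assumed tolerance; values inside it are not called.
    """
    if abs(chrx_het_ratio - boundary) <= margin:
        return "ambiguous"
    return "male" if chrx_het_ratio < boundary else "female"


def is_sex_consistent(chrx_het_ratio, pedigree_sex):
    """Return False if the inferred sex contradicts the pedigree meta data."""
    inferred = infer_sex(chrx_het_ratio)
    return inferred == "ambiguous" or inferred == pedigree_sex


if __name__ == "__main__":
    print(infer_sex(0.1))
    print(infer_sex(1.5))
    print(is_sex_consistent(1.5, "male"))
```

An inconsistent result is a prompt to check the meta data and to look for a sample swap, not a conclusion by itself.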

Thresholds

Thresholds always depend on overall sequencing depth and technology. Based on our experience with recent technologies (Agilent SureSelect Human All Exome V6 on Illumina NextSeq 500 or HiSeq 4000 machines in 2018/2019) we propose the following thresholds. We recommend adjusting them to your setting depending on technology and previous experience.

Metric        Good / Green  Acceptable / Yellow  Below Standards / Red
10x coverage  ≥ 98%         ≥ 95%                < 95%
20x coverage  ≥ 98%         ≥ 95%                < 95%
Duplicates    ≤ 10%         ≤ 20%                > 20%
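When many samples have to be checked, the proposed thresholds can be applied programmatically. The following minimal sketch encodes the table above; the function names grade_coverage and grade_duplicates are hypothetical and the percentages are expected as plain numbers (for example, 97.2 for 97.2%).

```python
def grade_coverage(percent_of_targets):
    """Grade the percentage of targets reaching 10x or 20x coverage."""
    if percent_of_targets >= 98:
        return "green"
    if percent_of_targets >= 95:
        return "yellow"
    return "red"


def grade_duplicates(percent_duplicates):
    """Grade the percentage of duplicated reads."""
    if percent_duplicates <= 10:
        return "green"
    if percent_duplicates <= 20:
        return "yellow"
    return "red"


if __name__ == "__main__":
    sample = {"10x": 99.1, "20x": 96.4, "duplicates": 23.0}
    print(grade_coverage(sample["10x"]))
    print(grade_coverage(sample["20x"]))
    print(grade_duplicates(sample["duplicates"]))
```

Samples with any red grade are the candidates for being repeated; the cut-offs should be adapted to the local technology as described above.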

SOP: Database & Literature Research

Aims and Scope

The aim of this section is to highlight the most important databases that are either integrated into VarFish or that VarFish links out to. The list is not comprehensive and we refer the reader to the ACMG guidelines.

Result

Steps

SOP: Pathogenicity Score Interpretation

Aims and Scope

The aim of this section is to provide guidelines for the interpretation of variant pathogenicity scores. Please refer to the original scoring methods’ publications for authoritative information.

Result

For each scored variant, an understanding of how likely it is that the variant has a pathogenic biomedical effect.

Steps

  1. VarFish uses the PHRED-scaled CADD score. The CADD authors recommend a cutoff of 15 (“somewhere between 10 and 20, maybe 15”). As a frame of reference: a CADD score of 10 corresponds to the top 10% of CADD-scored SNVs, 15 to the top 3.1%, 20 to the top 1%, and 30 to the top 0.1%.

  2. MutationTaster provides a classification into one of four possible types: disease causing automatic - known to be disease causing, polymorphism automatic - known to be benign, disease causing - predicted to be deleterious, polymorphism - predicted to be benign. Additionally, a probability for the prediction’s correctness by a Bayes classifier is given. The variants annotated with automatic can be generally trusted. The other predictions’ reliability can be gauged by the Bayes classifier probability. The probabilities themselves are difficult to interpret, they are best set into relation to each other.

  3. UMD Predictor can only be used for scoring SNVs. The scores range from 0 to 100 and the authors give the following thresholds in their original publication: “(i) <50 polymorphism; (ii) 50–64 probable polymorphism; (iii) 65–74 probably pathogenic mutation; and (iv) >74 pathogenic mutation.”
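The percentages quoted for the CADD score in step 1 follow directly from the PHRED scaling, in which a score s corresponds to the top fraction 10^(-s/10). A minimal sketch, with a hypothetical function name:

```python
def cadd_phred_to_top_fraction(phred_score):
    """Return the fraction of scored SNVs ranked at or above this PHRED score."""
    return 10 ** (-phred_score / 10)


if __name__ == "__main__":
    for score in (10, 15, 20, 30):
        percent = cadd_phred_to_top_fraction(score) * 100
        print(f"CADD {score}: top {percent:.2f}%")
```

Each increase by 10 thus narrows the selection by a factor of ten, which is worth keeping in mind when moving a cutoff.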

Thresholds

The following thresholds/grading of variants can be used for grading pathogenicity scores. Note that pathogenicity scores are extremely useful for sorting/ranking variants in the prioritization step. However, any cutoff and assignment of a pathogenicity will have false positives and false negatives.

score           benign                    likely benign  likely pathogenic  pathogenic
CADD            <10                       ≥10, <15       ≥15, <20           ≥20
MutationTaster  polymorphism (automatic)  -              -                  disease causing (automatic)
UMD Predictor   <50                       ≥50, <65       ≥65, <75           ≥75
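For the two numeric scores, the grading can be written as a lookup over sorted cut-offs. The sketch below is one possible encoding of the thresholds above; the names CUTOFFS, GRADES, and grade are hypothetical, and MutationTaster is left out because its classification is categorical.

```python
import bisect

GRADES = ("benign", "likely benign", "likely pathogenic", "pathogenic")

# Lower bounds of the second, third, and fourth grade for each score.
CUTOFFS = {
    "CADD": (10, 15, 20),
    "UMD Predictor": (50, 65, 75),
}


def grade(score_name, value):
    """Map a numeric score to one of the four grades."""
    # bisect_right counts how many cut-offs are less than or equal to value.
    return GRADES[bisect.bisect_right(CUTOFFS[score_name], value)]


if __name__ == "__main__":
    print(grade("CADD", 17.3))
    print(grade("UMD Predictor", 80))
```

As noted above, any such cut-off produces false positives and false negatives, so the grades are better suited for ranking than for excluding variants.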

SOP: Phenotype Score Interpretation

Aims and Scope

The aim of this section is to provide guidelines for the interpretation of phenotype match scores. Please refer to the original scoring methods’ publications for authoritative information.

Result

For each scored set of genes, an understanding of the individual scores.

Steps

Generally, the phenotype scores are computed for each gene and compare the phenotypes given for the affected individual and the phenotypes linked to the gene. Thus, they depend on a good clinical annotation of the case and the curation of the gene-to-phenotype database. VarFish uses the Exomiser software for implementing the Phenix, Phive, and HiPhive scores.

  1. The Phenix score is built from phenotypes of known human disease genes based on a concept called information content. Thus, only already known disease genes will obtain a non-zero score.

    An important caveat is that Phenix will normalize the scores with respect to the genes from the filtered variant list. Thus, a change in filter parameters and subsequently in the list of genes in the query will change the score of a given gene.

  2. The Phive score also incorporates mouse phenotypes by linking human and mouse physiology and homologous genes. Thus, it can be used to find new disease genes in human if the gene’s mouse homologue has a proper phenotype annotation.

    TODO: also normalized relatively?

  3. The HiPhive score extends the Phive idea with zebrafish and protein-protein interaction networks. It is the most powerful of the Phenix/Phive/HiPhive family in that new disease genes can be identified from mouse, fish, and also by a link via protein interactions. However, it also allows for relatively indirect links, for which it might be more complex to follow up and prove the etiology.

    TODO: also normalized relatively?

Overall, the phenotype prioritization scores are extremely useful for ranking genes by matches to the clinical phenotype annotation of the individual. However, they cannot be interpreted meaningfully on their own and are only meaningful when compared for the same list of genes.

Variant Filtration SOPs

This chapter contains SOPs directly related to the filtration, prioritization, and interpretation of variants. The first SOPs cover the filtration of variants for singleton and trio exomes in various modes of inheritance. Different case structures (e.g., siblings or only one parent present) can be handled with adjusted trio SOPs. This is followed by SOPs for assessing variants for pathogenicity and suitability as candidate variants.

SOP: Filtering Singletons for Autosomal Variants

Aims and Scope

The aim of this SOP is the filtration of singleton data for variants on the autosomal chromosomes. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.

Filtration for variants on the X chromosomes is described in SOP: Filtering Singletons for X-chromosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

de novo  dominant  hom. rec.  comp. rec.
0-80     100-500   0-30       TODO

Steps

  1. Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).

  2. Configure the Genotype according to the table below.

    setting   de novo  dominant  hom. rec.  comp. rec.
    presets   De Novo  Strict    Recessive  Recessive
    genotype  0/1      0/1       1/1        c/h index

    • For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode.

  3. Click Filter & Display.

  4. Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed [1].

  5. Handle unexpectedly high or low numbers of variants.

    • In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2.

    • Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).

    • The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.

[1] Check the First N of M records label above the results table and, if necessary, adjust the Result row limit setting that you can find in the More … ‣ Miscellaneous tab.

Thresholds

SOP: Filtering Singletons for X-chromosomal Variants

Aims and Scope

The aim of this SOP is the filtration of singleton data for variants on the X chromosome. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.

Filtration for variants on the autosomes is described in SOP: Filtering Singletons for Autosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

X de novo  X dominant  X hom. rec.  X comp. rec.
TODO       TODO        TODO         TODO

Steps

Note

The following needs work by a geneticist, also in terms of practicability.

  1. Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).

  2. Configure the Genotype according to the table below.

    setting       X de novo  X dominant  X hom. rec.  X comp. rec.
    presets       De Novo    Strict      Recessive    Recessive
    genotype (M)  1/1        1/1         N/A          N/A
    genotype (F)  0/1        0/1         1/1          c/h index

    • The genotype of the index is chosen based on its sex (male M, female F).

    • For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the daughter enables the comp. het. mode.

  3. Enter chrX into the field Gene Lists & Regions ‣ Genomic Region.

  4. Click Filter & Display.

  5. Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed [1].

  6. Handle unexpectedly high or low numbers of variants.

    • In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2.

    • Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).

    • The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.

Thresholds

SOP: Filtering Trios for Autosomal Variants

Aims and Scope

The aim of this SOP is the filtration of trio data for variants on the autosomal chromosomes. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.

Filtration for variants on the X chromosomes is described in SOP: Filtering Trios for X-chromosomal variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

de novo  dominant  hom. rec.  comp. rec.
0-3      50-150    2-75       2-20

Steps

  1. Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).

  2. Configure the Genotype according to the table below.

    setting             de novo   dominant  hom. rec.  comp. rec.
    presets             Strict    Strict    Recessive  Recessive
    genotype (index)    0/1       0/1       1/1        c/h index
    genotype (parents)  0/0, 0/0  0/0, 0/1  0/1, 0/1   -

    • For dominant mode of inheritance, set the genotypes of the affected parent to 0/1 and the unaffected parent to 0/0.

    • For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode and the parents’ genotype does have to be selected.

  3. Click Filter & Display.

  4. Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed [1].

  5. Handle unexpectedly high or low numbers of variants.

    • Too many de novo and too few variants in the other modes of inheritance can be an indicator of issues with the sample relatedness (cf. SOP: Quality Control).

    • In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2. In the case of too few de novo variants, try setting the max AD setting of the parents to 2.

    • Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).

    • The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.
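The genotype settings for the trio can be read as patterns that a variant's genotypes must match. The following sketch encodes the three single-variant modes from the table in step 2 for illustration; it is not how VarFish implements its queries, and the names TRIO_PATTERNS and matches_mode are hypothetical. The compound recessive mode is omitted because it requires looking at pairs of variants in the same gene.

```python
# Expected genotypes per mode of inheritance: index, then the two parents.
TRIO_PATTERNS = {
    "de novo": ("0/1", ("0/0", "0/0")),
    "dominant": ("0/1", ("0/0", "0/1")),
    "hom. rec.": ("1/1", ("0/1", "0/1")),
}


def matches_mode(mode, index, father, mother):
    """Return True if the trio genotypes fit the given mode of inheritance."""
    expected_index, expected_parents = TRIO_PATTERNS[mode]
    # Sorting ignores which parent carries which genotype, so in the
    # dominant mode either parent may be the affected one.
    return index == expected_index and sorted((father, mother)) == sorted(
        expected_parents
    )


if __name__ == "__main__":
    print(matches_mode("de novo", "0/1", "0/0", "0/0"))
    print(matches_mode("dominant", "0/1", "0/1", "0/0"))
    print(matches_mode("hom. rec.", "1/1", "0/1", "0/0"))
```

In practice the affected parent is known from the pedigree, so the dominant pattern would be fixed to that parent rather than accepted in either order.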

Thresholds

TODO

SOP: Filtering Trios for X-chromosomal variants

Aims and Scope

The aim of this SOP is the filtration of trio data for variants on the X chromosome. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for X-linked de novo, dominant, and recessive variants.

Filtration for variants on the autosomes is described in SOP: Filtering Trios for Autosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

X de novo    X dominant    X hom. rec.    X comp. rec.
TODO         TODO          TODO           TODO

Steps

Note

The following needs work by a geneticist, also in terms of practicability.

  1. Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).

  2. Configure the Genotype according to the table below.

    setting      X de novo    X dominant    X hom. rec.    X comp. rec.
    presets      Strict       Strict        Recessive      Recessive
    genotype
    index (M)    1/1          1/1           N/A            c/h index
    index (F)    0/1          0/1           1/1            c/h index
    mother       0/0          0/1 or 0/0    0/1
    father       0/0          1/1 or 0/0    1/1

    • The genotype of the index is chosen based on its sex (male M, female F).

    • For dominant mode of inheritance, set the genotypes of the affected parent to variant (0/1 or 1/1 according to the table) and of the unaffected to 0/0.

    • For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode and the parents’ genotypes do not have to be selected.

  3. Enter chrX into the field Gene Lists & Regions ‣ Genomic Region.

  4. Click Filter & Display.

  5. Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed (check the First N of M records label above the results table, and if necessary adjust the Result row limit setting that you can find in the More … ‣ Miscellaneous tab).

  6. Handle unexpectedly high or low numbers of variants.

    • Too many de novo and too few variants in the other modes of inheritance can be an indicator of issues with the sample relatedness (cf. SOP: Quality Control).

    • In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2. In the case of too few de novo variants, try setting the max AD setting of the parents to 2.

    • Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).

    • The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.

Thresholds

SOP: Prioritization with Phenotype and Pathogenicity Scores

Aims and Scope

The aim of this SOP is to use scores for prioritizing a list of candidate variants. Phenotype scores can be used for ranking variants by their affected gene’s match to the patient’s phenotypes. Pathogenicity scores can be used for estimating the impact of a variant.

The filtration of variants is described in the SOPs above. For guidelines on interpreting the scores see SOP: Phenotype Score Interpretation and SOP: Pathogenicity Score Interpretation.

Result

The result is a list of variants annotated with phenotype and/or pathogenicity scores that can be used for sorting and ranking variants. Further, by putting thresholds on the largest rank to consider or thresholds on the scores, the list of variants to be assessed can be shortened.

Steps

  1. Open the More … ‣ Prioritization tab.

  2. For using phenotype-based prioritization

    • tick the Enable phenotype-based prioritization box,

    • select an appropriate prioritization Algorithms, and

    • enter (or paste) the HPO terms into the HPO Terms field.

  3. For using variant pathogenicity prioritization

    • tick the Enable variant pathogenicity-based prioritization box, and

    • select the scoring method to use [2].

  4. Click Filter & Display to trigger the filtration.

    • Also check that all query result records are displayed [1]. The row limit is applied to the variants before they are sent for prioritization. If the limit N is smaller than the query result size M, you will therefore not see the N top-ranking records but a ranking of an arbitrary selection of N records.

  5. Click on the score and rank heading below the phenotype, pathogenicity, and/or pheno. & patho. columns to sort the table by phenotype, pathogenicity, or a combination of both scores.

  6. Consider the top variants by one of the sorting methods from above, stop based on the rank or score:

    • Rank: Consider the top N (e.g., N = 20) variants only.

      • If you are in a time-limited setting, you should pick the number N in advance of your study to get reproducible results in terms of diagnostic yield.

    • Score: (Note that the distribution of the different scores varies significantly).

      • Consider the top-scoring variants until the score drops by a factor of 2 from one variant to the next.

      • Consider the top-scoring variants until the score drops below a threshold T.

See SOP: Phenotype Score Interpretation and SOP: Pathogenicity Score Interpretation for more information on score interpretation.
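The stopping rules from step 6 can be sketched in a few lines of Python. This is only an illustration and not part of VarFish: the function name select_top_variants and its parameters are hypothetical, and the sketch assumes non-negative scores that are already sorted from best to worst.

```python
from typing import List, Optional


def select_top_variants(
    scores: List[float],
    max_rank: Optional[int] = None,
    min_score: Optional[float] = None,
    max_drop_factor: Optional[float] = None,
) -> List[float]:
    """Return the leading scores that pass all enabled stopping rules.

    ``scores`` must already be sorted from best (highest) to worst.
    """
    selected: List[float] = []
    for rank, score in enumerate(scores, start=1):
        # Rank rule: consider the top N variants only.
        if max_rank is not None and rank > max_rank:
            break
        # Threshold rule: stop once the score drops below T.
        if min_score is not None and score < min_score:
            break
        # Drop rule: stop when the score falls by the given factor
        # (e.g., 2) from one variant to the next.
        if (
            max_drop_factor is not None
            and selected
            and score * max_drop_factor <= selected[-1]
        ):
            break
        selected.append(score)
    return selected


scores = [0.9, 0.8, 0.75, 0.3, 0.29]
print(select_top_variants(scores, max_rank=2))
print(select_top_variants(scores, max_drop_factor=2))
```

Because the distributions of the different scores vary, suitable values for the threshold and the drop factor have to be chosen per scoring method.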

[2] To use the UMD Predictor score, you have to obtain an API token from https://umd-predictor.eu/ and enter it in VarFish in your user profile. You can reach the user profile by clicking on the person icon on the top left, then User Profile ‣ Settings ‣ Update ‣ UMD Predictor API Token. Note that UMD Predictor can only score SNVs.

Thresholds

SOP: Variant Assessment

Aims and Scope

This SOP describes how to assess variants with the information integrated into VarFish. Clicking the little “>” on the left of the result table folds out the details of the given variant.

Result

The result is a better understanding of the variant and gene.

Steps

Note

The following needs refinement. Actually, it does not read like an SOP but rather like an extended manual.

  1. Consider the Gene information box.

    • The Name, Gene Family, and NCBI Summary give a first impression of the gene, its molecular function, and its implication in diseases. Genes with a missing or very short NCBI Summary are often not well characterized, and such genes are hard to link to diseases.

    • ClinVar for Gene gives the number of pathogenic and likely pathogenic variants in the gene and shows how often the gene has been implicated in disease in ClinVar.

    • HPO Terms displays all HPO terms associated with a gene and, if present, the annotated modes of inheritance of diseases linked to this gene.

    • OMIM Phenotypes gives the OMIM diseases linked to the gene.

    • Gene RIFs displays short “reference into function” notes on PubMed articles that report on the gene.

    • Constraints shows gene constraint scores from ExAC and gnomAD for this gene.

    • The remaining fields provide link-outs into NCBI Entrez, ENSEMBL, and OMIM.

  2. The ClinVar for Variant table shows ClinVar annotations for the given variant, if any.

  3. The Frequency Details table provides detailed information about the frequency of the variant in different populations given in the different population databases.

  4. The Transcript Information table shows the impact of the variant on all transcripts of the gene.

  5. The Genotype and Call Infos table provides detailed information about the variant call.

  6. The UCSC 100 Vertebrate Conservation box shows the alignment of the corresponding amino acid in the UCSC 100 vertebrate alignment (the evolutionary distance to human decreases from left to right), if available. This information can be used for getting a feeling for how conserved the location is in the gene.

REST API Overview

VarFish provides a growing set of REST APIs. You can find a Python library for accessing the API and a command line interface in varfish-cli.

Note

This documentation section is under development.

Using the API

Usage of the REST API is detailed in this section. Basic knowledge of HTTP APIs is assumed.

Authentication

The API supports authentication through Knox authentication tokens as well as logging in using your SODAR username and password. Tokens are the recommended method for security purposes.

For token access, first retrieve your token using the API Tokens site app on the VarFish web UI. Note that you can only see the token once when creating it.

Add the token in the Authorization header of your HTTP request as follows:

Authorization: token 90c2483172515bc8f6d52fd608e5031db3fcdc06d5a83b24bec1688f39b72bcd

Versioning

The VarFish REST API uses accept header versioning. While specifying the desired API version in your HTTP requests is optional, it is strongly recommended. This ensures you will get the appropriate return data and avoid running into unexpected incompatibility issues.

To enable versioning, add the Accept header to your request with the following media type and version syntax. Replace the version number with your expected version.

Accept: application/vnd.bihealth.varfish+json; version=x.y.z

Specific sections of the SODAR API may require their own accept header. See the exact header requirement in the respective documentation on each section of the API.
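The authentication and versioning headers described above can be combined in a single request. The following is a minimal sketch using only the Python standard library. The helper name build_request, the host name, and the token are hypothetical; the media type and version are the ones given for the Case & Query API in this manual and may differ for your VarFish version.

```python
import urllib.request

# Media type as listed in the "Case & Query API" section of this manual.
VARFISH_MEDIA_TYPE = "application/vnd.bihealth.varfish+json"


def build_request(base_url: str, path: str, token: str, version: str) -> urllib.request.Request:
    """Create a request carrying the Authorization and Accept headers."""
    headers = {
        # Knox token authentication, as shown in the Authentication section.
        "Authorization": f"token {token}",
        # Accept header versioning.
        "Accept": f"{VARFISH_MEDIA_TYPE}; version={version}",
    }
    return urllib.request.Request(base_url.rstrip("/") + path, headers=headers)


# Hypothetical server and token, for illustration only.
request = build_request(
    "https://varfish.example.com/",
    "/variants/api/query-case/status/00000000-0000-0000-0000-000000000000",
    token="your-api-token",
    version="0.23.9",
)
print(request.full_url)
print(request.get_header("Accept"))
```

The request object is only constructed here; passing it to urllib.request.urlopen would send it to the server.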

Model Access and Permissions

Objects in SODAR API views are accessed through their sodar_uuid field.

In the REST API documentation, “UUID” refers to the sodar_uuid field of each model unless otherwise noted.

For permissions the API uses the same rules which are in effect in the SODAR GUI. That means you need to have appropriate project access for each operation.

Return Data

The return data for each request will be a JSON document unless otherwise specified.

If return data is not specified in the documentation of an API view, it will return the appropriate HTTP status code along with an optional detail JSON field upon a successfully processed request.

Project Management API

The REST API for project access and management operations is described in this section.

API Views

The project management API is provided by the SODAR Core package. The documentation for the REST API views can be found in the SODAR Core Documentation.

Versioning

For accept header versioning, the following media type and version are expected in the current VarFish version:

Accept: application/vnd.bihealth.sodar-core+json; version=0.10.7

Case Import API

The REST API for case import functionality is documented in this section.

API Views

Note

This is currently not implemented.

Versioning

For accept header versioning, the following media type and version are expected in the current VarFish version:

Accept: application/vnd.bihealth.varfish+json; version=0.23.9

Case & Query API

The REST API for case access and queries is described in this section. Cases are not managed directly but through the Case Import API.

Versioning

For accept header versioning, the following media type and version are expected in the current VarFish version:

Accept: application/vnd.bihealth.varfish+json; version=0.23.9

Return Data

The return data for each request will be a JSON document unless otherwise specified.

If return data is not specified in the documentation of an API view, it will return the appropriate HTTP status code along with an optional detail JSON field upon a successfully processed request.

For creation views, the sodar_uuid of the created object is returned along with other object fields.

Query Settings

The query settings follow a JSON Schema (see Case Query Schema V1).

API Views

class variants.views_api.CaseListApiView(**kwargs)

List all cases in the current project.

URL: /variants/api/case/{project.sodar_uuid}/

Methods: GET

Returns: List of case details (see CaseRetrieveApiView)

class variants.views_api.CaseRetrieveApiView(**kwargs)

Retrieve details of the specified case.

URL: /variants/api/case/{project.sodar_uuid}/{case.sodar_uuid}/

Methods: GET

Returns:

  • date_created - creation timestamp (ISO 8601 str)

  • date_modified - modification timestamp (ISO 8601 str)

  • index - index sample name (str)

  • name - case name (str)

  • notes - any notes related to case (str or null)

  • num_small_vars - number of small variants (int or null)

  • num_svs - number of structural variants (int or null)

  • pedigree - list of dict representing pedigree entries, dict have keys

    • sex - PLINK-PED encoded biological sample sex (int, 0-unknown, 1-male, 2-female)

    • father - father sample name (str)

    • mother - mother sample name (str)

    • name - current sample’s name (str)

    • affected - PLINK-PED encoded affected state (int, 0-unknown, 1-unaffected, 2-affected)

    • has_gt_entries - whether sample has genotype entries (boolean)

  • project - UUID of owning project (str)

  • release - genome build (str, one of ["GRCh37", "GRCh38"])

  • sodar_uuid - case UUID (str)

  • status - status of case (str, one of "initial", "active", "closed-unsolved", "closed-uncertain", "closed-solved")

  • tags - list of str tags
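As an illustration of the return data described above, the following sketch decodes the PLINK-PED codes in the pedigree of a case. The helper name describe_pedigree and the example values are made up; in particular, using "0" for a missing parent follows the PLINK-PED convention and is an assumption about what the API returns.

```python
# PLINK-PED codes as documented for CaseRetrieveApiView.
SEX = {0: "unknown", 1: "male", 2: "female"}
AFFECTED = {0: "unknown", 1: "unaffected", 2: "affected"}


def describe_pedigree(case: dict) -> list:
    """Return one human-readable line per pedigree member of a case."""
    lines = []
    for member in case["pedigree"]:
        role = "index" if member["name"] == case["index"] else "relative"
        sex = SEX[member["sex"]]
        affected = AFFECTED[member["affected"]]
        lines.append(f"{member['name']}: {role}, {sex}, {affected}")
    return lines


# Hypothetical, abbreviated case record (only the fields used above).
example_case = {
    "name": "example-trio",
    "index": "child",
    "pedigree": [
        {"name": "child", "father": "father", "mother": "mother",
         "sex": 1, "affected": 2, "has_gt_entries": True},
        {"name": "father", "father": "0", "mother": "0",
         "sex": 1, "affected": 1, "has_gt_entries": True},
        {"name": "mother", "father": "0", "mother": "0",
         "sex": 2, "affected": 1, "has_gt_entries": True},
    ],
}

for line in describe_pedigree(example_case):
    print(line)
```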

class variants.views_api.SmallVariantQueryListApiView(**kwargs)

List small variant queries for the given Case.

URL: /variants/api/query-case/list/{case.sodar_uuid}

Methods: GET

Parameters:

  • page - specify page to return (default/first is 1)

  • page_size - number of elements per page (default is 10, maximum is 100)

Returns:

  • count - number of total elements (int)

  • next - URL to next page (str or null)

  • previous - URL to previous page (str or null)

  • results - list of case small variant query details (see SmallVariantQuery)
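Paginated responses of this shape can be consumed by following the next field until it is null. The sketch below is one possible way to do so. The name iter_results is hypothetical, and the function that actually performs the HTTP GET and decodes the JSON document is passed in as get_json so that the loop itself stays independent of any HTTP library.

```python
from typing import Callable, Dict, Iterator


def iter_results(first_url: str, get_json: Callable[[str], Dict]) -> Iterator[dict]:
    """Yield all entries of a paginated listing, page by page.

    ``get_json`` takes a URL and returns the decoded JSON document with
    the keys ``count``, ``next``, ``previous``, and ``results``.
    """
    url = first_url
    while url:
        page = get_json(url)
        yield from page["results"]
        # ``next`` is null (None) on the last page, which ends the loop.
        url = page["next"]


# Stand-in for the server: two hypothetical pages keyed by URL.
fake_pages = {
    "/page1": {"count": 3, "next": "/page2", "previous": None,
               "results": [{"name": "q1"}, {"name": "q2"}]},
    "/page2": {"count": 3, "next": None, "previous": "/page1",
               "results": [{"name": "q3"}]},
}

for entry in iter_results("/page1", fake_pages.__getitem__):
    print(entry["name"])
```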

class variants.views_api.SmallVariantQueryCreateApiView(**kwargs)

Create new small variant query for the given case.

URL: /variants/api/query-case/create/{case.sodar_uuid}

Methods: POST

Parameters:

  • form_id: query settings form (str, use "variants.small_variant_filter_form")

  • form_version: query settings version (int, only valid: 1)

  • query_settings: the query settings (dict, cf. Case Query Schema V1)

  • name: optional name of the query (str, defaults to None)

  • public: whether or not this query (and its settings) is public (bool, defaults to False)

Returns:

JSON serialization of case small variant query details (see SmallVariantQuery)
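The parameters listed above could be assembled as follows. This is a sketch under the assumption that the view accepts a JSON-encoded request body; the helper name build_query_payload is hypothetical, and the query settings shown are deliberately incomplete.

```python
import json
from typing import Optional


def build_query_payload(
    query_settings: dict,
    name: Optional[str] = None,
    public: bool = False,
) -> bytes:
    """Serialize the parameters of SmallVariantQueryCreateApiView to JSON."""
    payload = {
        # Fixed values as documented for the create view.
        "form_id": "variants.small_variant_filter_form",
        "form_version": 1,
        # Must follow Case Query Schema V1.
        "query_settings": query_settings,
        "name": name,
        "public": public,
    }
    return json.dumps(payload).encode("utf-8")


# Incomplete settings, for illustration only.
body = build_query_payload({"database": "refseq"}, name="my first query")
print(body.decode("utf-8"))
```

In practice, the query_settings value would typically be taken from the response of the settings shortcut view rather than written by hand.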

class variants.views_api.SmallVariantQueryRetrieveApiView(**kwargs)

Retrieve small variant query details for the given query.

URL: /variants/api/query-case/retrieve/{query.sodar_uuid}

Methods: GET

Parameters:

None

Returns:

JSON serialization of case small variant query details (see SmallVariantQuery)

class variants.views_api.SmallVariantQueryStatusApiView(**kwargs)

Returns the status of the small variant query.

URL: /variants/api/query-case/status/{query.sodar_uuid}

Methods: GET

Parameters:

None

Returns:

dict with one key status (str)

class variants.views_api.SmallVariantQueryUpdateApiView(**kwargs)

Update small variant query for the given query.

URL: /variants/api/query-case/update/{query.sodar_uuid}

Methods: PUT, PATCH

Parameters:

  • name: new name attribute of the query

  • public: whether or not to make this query public

Returns:

JSON serialization of updated case small variant query details (see SmallVariantQuery)

class variants.views_api.SmallVariantQueryFetchResultsApiView(*args, **kwargs)

Fetch results for small variant query.

Will return an HTTP 400 if the results are not ready yet.

URL: /variants/api/query-case/results/{query.sodar_uuid}

Methods: GET

Parameters:

  • page - specify page to return (default/first is 1)

  • page_size - number of elements per page (default is 10, maximum is 100)

Returns:

  • count - number of total elements (int)

  • next - URL to next page (str or null)

  • previous - URL to previous page (str or null)

  • results - list of results (dict)
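Because the results view answers with HTTP 400 while the query is still running, a client has to retry until the results are available. The following sketch shows one way to do that. The name wait_for_results is hypothetical, the HTTP call itself is injected as fetch, and treating HTTP 200 as the success status is an assumption that this manual does not state explicitly.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_results(
    fetch: Callable[[], Tuple[int, Optional[dict]]],
    max_tries: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call ``fetch`` until it reports that the results are ready.

    ``fetch`` returns a pair of HTTP status code and decoded JSON body.
    """
    for attempt in range(max_tries):
        status_code, body = fetch()
        if status_code == 200:  # assumed success status
            return body
        if status_code != 400:  # 400 means "not ready yet"
            raise RuntimeError(f"unexpected HTTP status {status_code}")
        if attempt + 1 < max_tries:
            sleep(delay_seconds)
    raise TimeoutError(f"results not ready after {max_tries} attempts")


# Stand-in for the server: not ready twice, then an empty result page.
responses = iter([
    (400, None),
    (400, None),
    (200, {"count": 0, "next": None, "previous": None, "results": []}),
])
page = wait_for_results(lambda: next(responses), sleep=lambda seconds: None)
print(page["count"])
```

Alternatively, the status view can be polled before fetching the results; the possible status values are not listed in this section.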

class variants.views_api.SmallVariantQuerySettingsShortcutApiView(**kwargs)

Generate query settings for a given case by certain shortcuts.

URL: /variants/api/query-case/settings-shortcut/{case.sodar_uuid}

Methods: GET

Parameters:

  • database - the database to query, one of "refseq" (default) and "ensembl"

  • quick_preset - overall preset selection using the presets below, valid values are

    • defaults - applies presets that are recommended for starting out without a specific hypothesis

    • de_novo - applies presets that are recommended for starting out when the hypothesis is dominant inheritance with de novo variants

    • dominant - applies presets that are recommended for starting out when the hypothesis is dominant inheritance (but not with de novo variants)

    • homozygous_recessive - applies presets that are recommended for starting out when the hypothesis is recessive with homozygous variants

    • compound_heterozygous - applies presets that are recommended for starting out when the hypothesis is recessive with compound heterozygous variants

    • recessive - applies presets that are recommended for starting out when the hypothesis is recessive mode of inheritance

    • x_recessive - applies presets that are recommended for starting out when the hypothesis is X recessive mode of inheritance

    • clinvar_pathogenic - apply presets that are recommended for screening variants for known pathogenic variants present in ClinVar

    • mitochondrial - apply presets recommended for starting out to filter for mitochondrial mode of inheritance

    • whole_exomes - apply presets that return all variants of the case, regardless of frequency, quality etc.

  • inheritance - preset selection for mode of inheritance, valid values are

    • any - no particular constraint on inheritance (default)

    • dominant - allow variants compatible with dominant mode of inheritance (includes de novo variants)

    • homozygous_recessive - allow variants compatible with homozygous recessive mode of inheritance

    • compound_heterozygous - allow variants compatible with compound heterozygous recessive mode of inheritance

    • recessive - allow variants compatible with recessive mode of inheritance of a disease/trait (includes both homozygous and compound heterozygous recessive)

    • x_recessive - allow variants compatible with X-recessive mode of inheritance of a disease/trait

    • mitochondrial - mitochondrial inheritance (also applicable for “clinvar pathogenic”)

    • custom - indicates custom settings such that none of the above inheritance settings applies

  • frequency - preset selection for frequencies, valid values are

    • dominant_super_strict - apply thresholds considered “very strict” in a dominant disease context

    • dominant_strict - apply thresholds considered “strict” in a dominant disease context (default)

    • dominant_relaxed - apply thresholds considered “relaxed” in a dominant disease context

    • recessive_strict - apply thresholds considered “strict” in a recessive disease context

    • recessive_relaxed - apply thresholds considered “relaxed” in a recessive disease context

    • custom - indicates custom settings such that none of the above frequency settings applies

  • impact - preset selection for molecular impact values, valid values are

    • null_variant - allow variants that are predicted to be null variants

    • aa_change_splicing - allow variants that are predicted to change the amino acid of the gene’s protein and also splicing variants

    • all_coding_deep_intronic - allow all coding variants and also deeply intronic ones

    • whole_transcript - allow variants from the whole transcript (exonic/intronic)

    • any_impact - allow any predicted molecular impact

    • custom - indicates custom settings such that none of the above impact settings applies

  • quality - preset selection for variant call quality values, valid values are

    • super_strict - very strict quality settings

    • strict - strict quality settings, used as the default

    • relaxed - relaxed quality settings

    • any - ignore quality, all variants pass filter

    • custom - indicates custom settings such that none of the above quality settings applies

  • chromosomes - preset selection for selecting chromosomes/regions/genes allow/block lists, valid values are

    • whole_genome - the default settings selecting the whole genome

    • autosomes - select the variants lying on the autosomes only

    • x_chromosome - select variants on the X chromosome only

    • y_chromosome - select variants on the Y chromosome only

    • mt_chromosome - select variants on the mitochondrial chromosome only

    • custom - indicates custom settings such that none of the above chromosomes presets applies

  • flags_etc - preset selection for “flags etc.” section, valid values are

    • defaults - the defaults also used in the user interface

    • clinvar_only - select variants present in Clinvar only

    • user_flagged - select user-flagged variants only

    • custom - indicates custom settings such that none of the above flags etc. presets apply

Returns:

  • presets - a dict with the following keys; this mirrors back the quick presets and further presets selected in the parameters

    • quick_presets - one of the quick_presets preset values from above

    • inheritance - one of the inheritance preset values from above

    • frequency - one of the frequency preset values from above

    • impact - one of the impact preset values from above

    • quality - one of the quality preset values from above

    • chromosomes - one of the chromosomes preset values from above

    • flags_etc - one of the flags_etc preset values from above

  • query_settings - a dict with the query settings ready to be used for the given case; this will follow Case Query Schema V1.
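The URL for this view can be built as sketched below. The helper name settings_shortcut_url is hypothetical, and passing the parameters in the query string is an assumption based on the view being a GET endpoint. Only the parameter names are checked; the preset values are not validated here.

```python
from urllib.parse import urlencode

# Parameter names as documented for the settings shortcut view.
ALLOWED_PARAMETERS = {
    "database", "quick_preset", "inheritance", "frequency",
    "impact", "quality", "chromosomes", "flags_etc",
}


def settings_shortcut_url(case_uuid: str, **presets: str) -> str:
    """Return the path (with query string) of the settings shortcut view."""
    unknown = set(presets) - ALLOWED_PARAMETERS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    path = f"/variants/api/query-case/settings-shortcut/{case_uuid}"
    # Sort for a stable, reproducible query string.
    query = urlencode(sorted(presets.items()))
    return f"{path}?{query}" if query else path


# Hypothetical case UUID, for illustration only.
print(settings_shortcut_url(
    "00000000-0000-0000-0000-000000000000",
    quick_preset="de_novo",
    database="refseq",
))
```

The query_settings value of the response can then be used as the query_settings parameter when creating a query.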

JSON Schema

This section contains the JSON schemas used in the VarFish Server API.

Case Query Schema V1

varfish-server case query settings

https://raw.githubusercontent.com/bihealth/varfish-server/main/variants/schemas/case-query-v1.json

Single case query settings for varfish-server

type

object

properties

  • database

The transcript database to use

You can select between either using refseq or ensembl transcripts, defaults to refseq

type

string

examples

refseq

ensembl

default

refseq

  • effects

The effects schema

An explanation about the purpose of this instance.

type

array

examples

missense_variant

stop_gained

stop_lost

default

items

type

string

enum

3_prime_UTR_exon_variant, 3_prime_UTR_intron_variant, 5_prime_UTR_exon_variant, 5_prime_UTR_intron_variant, coding_transcript_intron_variant, complex_substitution, direct_tandem_duplication, disruptive_inframe_deletion, disruptive_inframe_insertion, downstream_gene_variant, exon_loss_variant, feature_truncation, frameshift_elongation, frameshift_truncation, frameshift_variant, inframe_deletion, inframe_insertion, intergenic_variant, internal_feature_elongation, missense_variant, mnv, non_coding_transcript_exon_variant, non_coding_transcript_intron_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, start_lost, stop_gained, stop_lost, stop_retained_variant, structural_variant, synonymous_variant, transcript_ablation, upstream_gene_variant

additionalItems

False

uniqueItems

True

  • exac_enabled

Whether to enable ExAC frequency filter

Set to true to enable ExAC frequency filter

type

boolean

examples

True

False

default

False

  • exac_frequency

anyOf

type

null

Maximal allele frequency in ExAC

When exac_enabled then only variants with an allele frequency of exac_frequency or below will pass the filter, use null for not applying threshold

type

number

examples

0.05

maximum

0.05

minimum

0

  • exac_heterozygous

anyOf

type

null

Maximal heterozygous state count in ExAC

When exac_enabled then only variants with at most exac_heterozygous variants in heterozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • exac_homozygous

anyOf

type

null

Maximal homozygous state count in ExAC

When exac_enabled then only variants with at most exac_homozygous variants in homozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • exac_hemizygous

anyOf

type

null

Maximal hemizygous state count in ExAC

When exac_enabled then only variants with at most exac_hemizygous variants in hemizygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

minimum

0

  • gnomad_exomes_enabled

Whether to enable gnomAD exomes frequency filter

Set to true to enable gnomAD exomes frequency filter

type

boolean

examples

True

False

default

False

  • gnomad_exomes_frequency

anyOf

type

null

Maximal allele frequency in gnomAD exomes

When gnomad_exomes_enabled then only variants with an allele frequency of gnomad_exomes_frequency or below will pass the filter, use null for not applying threshold

type

number

examples

0.05

maximum

0.05

minimum

0

  • gnomad_exomes_heterozygous

anyOf

type

null

Maximal heterozygous state count in gnomAD exomes

When gnomad_exomes_enabled then only variants with at most gnomad_exomes_heterozygous variants in heterozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • gnomad_exomes_homozygous

anyOf

type

null

Maximal homozygous state count in gnomAD exomes

When gnomad_exomes_enabled then only variants with at most gnomad_exomes_homozygous variants in homozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • gnomad_exomes_hemizygous

anyOf

type

null

Maximal hemizygous state count in gnomAD exomes

When gnomad_exomes_enabled then only variants with at most gnomad_exomes_hemizygous variants in hemizygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

minimum

0

  • gnomad_genomes_enabled

Whether to enable gnomAD genomes frequency filter

Set to true to enable gnomAD genomes frequency filter

type

boolean

examples

True

False

default

False

  • gnomad_genomes_frequency

anyOf

type

null

Maximal allele frequency in gnomAD genomes

When gnomad_genomes_enabled then only variants with an allele frequency of gnomad_genomes_frequency or below will pass the filter, use null for not applying threshold

type

number

examples

0.05

maximum

0.05

minimum

0

  • gnomad_genomes_heterozygous

anyOf

type

null

Maximal heterozygous state count in gnomAD genomes

When gnomad_genomes_enabled then only variants with at most gnomad_genomes_heterozygous variants in heterozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • gnomad_genomes_homozygous

anyOf

type

null

Maximal homozygous state count in gnomAD genomes

When gnomad_genomes_enabled then only variants with at most gnomad_genomes_homozygous variants in homozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • gnomad_genomes_hemizygous

anyOf

type

null

Maximal hemizygous state count in gnomAD genomes

When gnomad_genomes_enabled then only variants with at most gnomad_genomes_hemizygous variants in hemizygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

minimum

0

  • thousand_genomes_enabled

Whether to enable thousand genomes frequency filter

Set to true to enable thousand genomes frequency filter

type

boolean

examples

True

False

default

False

  • thousand_genomes_frequency

anyOf

type

null

Maximal allele frequency in thousand genomes

When thousand_genomes_enabled then only variants with an allele frequency of thousand_genomes_frequency or below will pass the filter, use null for not applying threshold

type

number

examples

0.05

maximum

0.05

minimum

0

  • thousand_genomes_heterozygous

anyOf

type

null

Maximal heterozygous state count in thousand genomes

When thousand_genomes_enabled then only variants with at most thousand_genomes_heterozygous variants in heterozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • thousand_genomes_homozygous

anyOf

type

null

Maximal homozygous state count in thousand genomes

When thousand_genomes_enabled then only variants with at most thousand_genomes_homozygous variants in homozygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

default

0

  • thousand_genomes_hemizygous

anyOf

type

null

Maximal hemizygous state count in thousand genomes

When thousand_genomes_enabled then only variants with at most thousand_genomes_hemizygous variants in hemizygous state will pass the filter, use null for not applying threshold

type

integer

examples

1

10

minimum

0
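The ExAC, gnomAD exomes, gnomAD genomes, and thousand genomes properties above share the same naming pattern. The sketch below generates such a fragment of the query settings and checks the documented frequency bounds. The helper name population_filter is hypothetical, and rejecting negative heterozygous and homozygous counts is an assumption that is stricter than what the schema states for these two properties.

```python
from typing import Dict, Optional, Union

POPULATION_DATABASES = ("exac", "gnomad_exomes", "gnomad_genomes", "thousand_genomes")
MAX_FREQUENCY = 0.05  # documented maximum of the *_frequency properties

SettingValue = Union[bool, int, float, None]


def population_filter(
    database: str,
    frequency: Optional[float] = None,
    heterozygous: Optional[int] = None,
    homozygous: Optional[int] = None,
    hemizygous: Optional[int] = None,
) -> Dict[str, SettingValue]:
    """Build the query settings fragment for one population database.

    ``None`` is serialized as JSON null, i.e., no threshold is applied.
    """
    if database not in POPULATION_DATABASES:
        raise ValueError(f"unknown population database: {database!r}")
    if frequency is not None and not 0 <= frequency <= MAX_FREQUENCY:
        raise ValueError(f"frequency must be between 0 and {MAX_FREQUENCY}")
    counts = {
        "heterozygous": heterozygous,
        "homozygous": homozygous,
        "hemizygous": hemizygous,
    }
    for label, count in counts.items():
        if count is not None and count < 0:
            raise ValueError(f"{label} count must not be negative")
    settings: Dict[str, SettingValue] = {
        f"{database}_enabled": True,
        f"{database}_frequency": frequency,
    }
    for label, count in counts.items():
        settings[f"{database}_{label}"] = count
    return settings


print(population_filter("gnomad_exomes", frequency=0.01, homozygous=0))
```

Fragments for several databases can be merged into one query settings dict with dict.update.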

  • inhouse_enabled

Whether to enable in-house frequency filter

Set to true to enable in-house frequency filter

type

boolean

examples

True

False

default

False

  • inhouse_carriers

anyOf

type

null

Maximal carrier count in in-house database

When inhouse_enabled then only variants with at most inhouse_carriers carriers in the in-house database will pass the filter, use null for not applying threshold

type

integer

examples

20

minimum

0

  • inhouse_heterozygous

anyOf

type

null

Maximal heterozygous state count in in-house database

When inhouse_enabled then only variants with at most inhouse_heterozygous variants in heterozygous state will pass the filter, use null for not applying threshold

type

integer

examples

10

minimum

0

  • inhouse_homozygous

anyOf

type

null

Maximal homozygous state count in in-house database

When inhouse_enabled then only variants with at most inhouse_homozygous variants in homozygous state will pass the filter, use null for not applying threshold

type

integer

examples

10

minimum

0

  • inhouse_hemizygous

anyOf

type

null

Maximal hemizygous state count in in-house database

When inhouse_enabled then only variants with at most inhouse_hemizygous variants in hemizygous state will pass the filter, use null for not applying threshold

type

integer

examples

10

minimum

0

  • mtdb_enabled

Whether to enable mtdb frequency filter

Set to true to enable mtdb frequency filter

type

boolean

examples

True

False

default

False

  • mtdb_count

anyOf

type

null

Maximal number/absolute frequency of carriers in mtdb

When mtdb_enabled then only variants with at most mtdb_count carriers will pass the filter, use null for not applying threshold

type

integer

examples

1

10

minimum

0

  • mtdb_frequency

anyOf

type

null

Maximal relative frequency of carriers in mtdb

When mtdb_enabled then only variants with a fraction (between 0 and 0.05) of at most mtdb_frequency will pass the filter, use null for not applying threshold

type

number

examples

0.05

maximum

0.05

minimum

0

  • helixmtdb_enabled

Whether to enable helixmtdb frequency filter

Set to true to enable helixmtdb frequency filter

type

boolean

examples

True

False

default

False

  • helixmtdb_frequency

anyOf

type

null

Maximal carrier frequency in helixmtdb

When helixmtdb_enabled then only variants with a carrier frequency of helixmtdb_frequency or below will pass the filter, use null for not applying threshold

type

number

examples

0.001

0.05

maximum

1

minimum

0

  • helixmtdb_het_count

anyOf

type

null

Maximal heteroplasmy count in helixmtdb

When helixmtdb_enabled then only variants with a number of carriers in heteroplasmic state of helixmtdb_het_count or below will pass the filter, use null for not applying threshold

type

integer

examples

1

10

minimum

0

  • helixmtdb_hom_count

anyOf

type

null

Maximal homoplasmy count in helixmtdb

When helixmtdb_enabled then only variants with a number of carriers in homoplasmic state of helixmtdb_hom_count or below will pass the filter, use null for not applying threshold

type

integer

examples

1

10

minimum

0

  • mitomap_enabled

Whether to enable the mitomap carrier filter

Set to true to enable mitomap carrier filter

type

boolean

examples

True

False

default

False

  • mitomap_count

anyOf

type

null

Maximal number of carriers in mitomap

When mitomap_enabled then only variants with at most mitomap_count carriers will pass the filter, use null for not applying threshold

type

integer

examples

10

minimum

0

  • mitomap_frequency

anyOf

type

null

Maximal relative frequency in mitomap

When mitomap_enabled then only variants with a relative frequency (between 0 and 1) of at most mitomap_frequency will pass the filter, use null for not applying threshold

type

number

examples

0.001

0.05

maximum

1

minimum

0

  • transcripts_coding

Include variants on coding transcripts

When enabled then variants whose most pathogenic effect is on a coding transcript are included

type

boolean

examples

True

False

default

False

  • transcripts_noncoding

Include variants on non-coding transcripts

When enabled then variants whose most pathogenic effect is on a non-coding transcript are included

type

boolean

examples

True

False

default

True

  • var_type_snv

Include SNV variants

When set to true then include single-nucleotide variants in the results

type

boolean

examples

True

False

default

True

  • var_type_indel

Include indel variants

When set to true then include insertion and deletion variants (e.g., CGA>C and C>CGA) in the results

type

boolean

examples

True

False

default

True

  • var_type_mnv

Include MNV variants

When set to true then include multi-nucleotide variants (e.g., CG>TT) in the results

type

boolean

examples

True

False

default

True

  • max_exon_dist

anyOf

type

null

The largest distance to exons

When set then only variants with at most max_exon_dist to the next exon are included, leave unset to not filter based on this

type

integer

examples

1

10

minimum

0

  • flag_simple_empty

Include variants marked with no flag

When set (default) then variants that have no simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_bookmarked

Include variants marked with “bookmarked” flag

When set (default) then variants that have the “bookmarked” simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_candidate

Include variants marked with “candidate” flag

When set (default) then variants that have the “candidate” simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_doesnt_segregate

Include variants marked with “does not segregate” flag

When set (default) then variants that have the “does not segregate” simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_final_causative

Include variants marked with “final causative” flag

When set (default) then variants that have the “final causative” simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_for_validation

Include variants marked with “for validation” flag

When set (default) then variants that have the “for validation” simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_no_disease_association

Include variants marked with “no disease association” flag

When set (default) then variants that have the “no disease association” simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_segregates

Include variants marked with “segregates” flag

When set (default) then variants that have the “segregates” simple flag set are included in the result

type

boolean

examples

True

False

default

True

  • flag_molecular_empty

Include variants that have the “molecular” flag unset

When set (default) then variants that have the “molecular” flag unset are included in the result

type

boolean

examples

True

False

default

True

  • flag_molecular_negative

Include variants marked with “molecular” flag set to “negative”

When set (default) then variants that have the “molecular” flag set to “negative” are included in the result

type

boolean

examples

True

False

default

True

  • flag_molecular_positive

Include variants marked with “molecular” flag set to “positive”

When set (default) then variants that have the “molecular” flag set to “positive” are included in the result

type

boolean

examples

True

False

default

True

  • flag_molecular_uncertain

Include variants marked with “molecular” flag set to “uncertain”

When set (default) then variants that have the “molecular” flag set to “uncertain” are included in the result

type

boolean

examples

True

False

default

True

  • flag_phenotype_match_empty

Include variants that have the “phenotype match” flag unset

When set (default) then variants that have the “phenotype match” flag unset are included in the result

type

boolean

examples

True

False

default

True

  • flag_phenotype_match_negative

Include variants marked with “phenotype match” flag set to “negative”

When set (default) then variants that have the “phenotype match” flag set to “negative” are included in the result

type

boolean

examples

True

False

default

True

  • flag_phenotype_match_positive

Include variants marked with “phenotype match” flag set to “positive”

When set (default) then variants that have the “phenotype match” flag set to “positive” are included in the result

type

boolean

examples

True

False

default

True

  • flag_phenotype_match_uncertain

Include variants marked with “phenotype match” flag set to “uncertain”

When set (default) then variants that have the “phenotype match” flag set to “uncertain” are included in the result

type

boolean

examples

True

False

default

True

  • flag_summary_empty

Include variants that have the “summary” flag unset

When set (default) then variants that have the “summary” flag unset are included in the result

type

boolean

examples

True

False

default

True

  • flag_summary_negative

Include variants marked with “summary” flag set to “negative”

When set (default) then variants that have the “summary” flag set to “negative” are included in the result

type

boolean

examples

True

False

default

True

  • flag_summary_positive

Include variants marked with “summary” flag set to “positive”

When set (default) then variants that have the “summary” flag set to “positive” are included in the result

type

boolean

examples

True

False

default

True

  • flag_summary_uncertain

Include variants marked with “summary” flag set to “uncertain”

When set (default) then variants that have the “summary” flag set to “uncertain” are included in the result

type

boolean

examples

True

False

default

True

  • flag_validation_empty

Include variants that have the “validation” flag unset

When set (default) then variants that have the “validation” flag unset are included in the result

type

boolean

examples

True

False

default

True

  • flag_validation_negative

Include variants marked with “validation” flag set to “negative”

When set (default) then variants that have the “validation” flag set to “negative” are included in the result

type

boolean

examples

True

False

default

True

  • flag_validation_positive

Include variants marked with “validation” flag set to “positive”

When set (default) then variants that have the “validation” flag set to “positive” are included in the result

type

boolean

examples

True

False

default

True

  • flag_validation_uncertain

Include variants marked with “validation” flag set to “uncertain”

When set (default) then variants that have the “validation” flag set to “uncertain” are included in the result

type

boolean

examples

True

False

default

True

  • flag_visual_empty

Include variants that have the “visual” flag unset

When set (default) then variants that have the “visual” flag unset are included in the result

type

boolean

examples

True

False

default

True

  • flag_visual_negative

Include variants marked with “visual” flag set to “negative”

When set (default) then variants that have the “visual” flag set to “negative” are included in the result

type

boolean

examples

True

False

default

True

  • flag_visual_positive

Include variants marked with “visual” flag set to “positive”

When set (default) then variants that have the “visual” flag set to “positive” are included in the result

type

boolean

examples

True

False

default

True

  • flag_visual_uncertain

Include variants marked with “visual” flag set to “uncertain”

When set (default) then variants that have the “visual” flag set to “uncertain” are included in the result

type

boolean

examples

True

False

default

True

  • gene_allowlist

List of genes to restrict the resulting variants to

List of gene symbols, entrez gene identifiers, or ENSEMBL gene identifiers to limit variants for (for a variant affecting multiple genes, the combinations of the variants and genes will be reported independently), leave empty to apply no such filter

type

array

examples

TTN

default

items

type

string

pattern

^([a-zA-Z0-9_-]+)$

additionalItems

True

  • gene_blocklist

List of genes to exclude from the result

List of gene symbols, entrez gene identifiers, or ENSEMBL gene identifiers to exclude variants for (for a variant affecting multiple genes, the combinations of the variants and genes will be reported independently), leave empty to apply no such filter

type

array

examples

TTN

default

items

type

string

pattern

^([a-zA-Z0-9_-]+)$

additionalItems

True

  • remove_if_in_dbsnp

Remove variant if it exists in local copy of dbSNP

Set to true to exclude variants that are present in dbSNP from the result set

type

boolean

examples

True

False

default

False

  • require_in_clinvar

Restrict variants to those in local copy of Clinvar

Set to true to restrict variants to those present in local copy of Clinvar

type

boolean

examples

True

False

default

False

  • clinvar_paranoid_mode

Weaken weight of ‘criteria provided’ in variant assessment

When set, then variant assessments with and without assertion are interpreted as equally important. By default, they are not and those with assertion override the others.

type

boolean

examples

True

False

default

False

  • clinvar_include_benign

Whether to include variants marked as benign in local Clinvar copy if ``require_in_clinvar``

Set to true (default) to make variants pass the filter that are marked as benign in the local Clinvar copy, set to false to make them not pass the filter

type

boolean

examples

True

False

default

True

  • clinvar_include_pathogenic

Whether to include variants marked as pathogenic in local Clinvar copy if ``require_in_clinvar``

Set to true (default) to make variants pass the filter that are marked as pathogenic in the local Clinvar copy, set to false to make them not pass the filter

type

boolean

examples

True

False

default

True

  • clinvar_include_likely_benign

Whether to include variants marked as likely benign in local Clinvar copy if ``require_in_clinvar``

Set to true (default) to make variants pass the filter that are marked as likely benign in the local Clinvar copy, set to false to make them not pass the filter

type

boolean

examples

True

False

default

True

  • clinvar_include_likely_pathogenic

Whether to include variants marked as likely pathogenic in local Clinvar copy if ``require_in_clinvar``

Set to true (default) to make variants pass the filter that are marked as likely pathogenic in the local Clinvar copy, set to false to make them not pass the filter

type

boolean

examples

True

False

default

True

  • clinvar_include_uncertain_significance

Whether to include variants marked as uncertain significance in local Clinvar copy if ``require_in_clinvar``

Set to true (default) to make variants pass the filter that are marked as of uncertain significance in the local Clinvar copy, set to false to make them not pass the filter

type

boolean

examples

True

False

default

True

  • genomic_region

List of genomic regions to limit the query to

When set then only variants contained in or overlapping with the given genomic regions pass the filter, leave empty to apply no region filter

type

array

examples

chr1:100,000,00-110,00,00

chrY

X

Y

default

items

type

string

pattern

^[a-zA-Z0-9]+(:(\d+(,\d+)*)-(\d+(,\d+)*))?$

  • patho_enabled

Enable pathogenicity annotation

Set to true to enable annotation with pathogenicity, requires setting a value for patho_score

type

boolean

examples

True

False

default

False

  • patho_score

anyOf

type

null

The pathogenicity score to use for annotating variants

Select pathogenicity score to use if patho_enabled. Must be one of the pathogenicity scores enabled in the VarFish server instance (depends on the installation)

type

string

examples

cadd

mutationtaster

  • prio_enabled

Enable phenotype-based prioritization of variants

Select

type

boolean

examples

True

False

default

False

  • prio_algorithm

anyOf

type

null

The phenotype-based prioritization algorithm to use for prioritizing variants

Select algorithm to use if prio_enabled. Must be one of the algorithms enabled in the VarFish server instance (depends on the installation)

type

string

examples

phenix

hiphive

hiphive-human

hiphive-mouse

  • prio_hpo_terms

anyOf

type

null

The prio_hpo_terms schema

An explanation about the purpose of this instance.

type

array

examples

default

items

type

string

pattern

HP:\d+

additionalItems

True

  • require_in_hgmd_public

The require_in_hgmd_public schema

An explanation about the purpose of this instance.

type

boolean

examples

False

default

False

  • recessive_mode

anyOf

type

null

Enable and select the biallelic recessive inheritance filter

Use “compound-recessive” to restrict to variants compatible with compound recessive mode of inheritance and “recessive” to restrict to compatibility with either compound or homozygous recessive mode of inheritance. Use recessive_index to select the index for recessive inheritance

type

string

enum

recessive, compound-recessive

  • recessive_index

anyOf

type

null

Select the recessive index

Set to the identifier of the recessive index

type

string

examples

CHILD-NAME

  • denovo_index

anyOf

type

null

Select the denovo index

Set to the identifier of the de novo index

type

string

examples

CHILD-NAME

  • quality

Quality filter threshold

Set quality thresholds for each individual. The keys are the individual names and the values follow the schema defined below

type

object

examples

SAMPLE

dp_het

10

dp_hom

5

ab

0.3

gq

20

ad

3

ad_max

200

fail

drop-variant

FATHER

gq

40

fail

ignore

MOTHER

gq

40

fail

ignore

CHILD

gq

40

fail

drop-variant

patternProperties

  • .*

type

object

properties

  • dp_het

Minimal total depth of coverage for heterozygous variants

If set then exclude variants with lower total depth of coverage in sample’s genotype call for heterozygous variants

type

integer

minimum

0

default

0

  • dp_hom

Minimal total depth of coverage for homozygous and hemizygous variants

If set then exclude variants with lower total depth of coverage in sample’s genotype call for homozygous variants

type

integer

minimum

0

default

0

  • ab

Minimal allelic balance for heterozygous variants

If set then exclude variants with lower allelic balance in sample’s genotype call

type

number

maximum

1

minimum

0

default

0

  • gq

Minimal genotype call quality

If set then exclude variants with lower genotype quality in sample’s genotype call

type

integer

minimum

0

default

0

  • ad

Minimal number of reads supporting the alternative allele

If set then exclude variants with lower depth of coverage on alternate allele in sample’s genotype call

type

integer

minimum

0

default

0

  • ad_max

anyOf

type

null

Maximal alternate allele depth of coverage

If set then exclude variants with higher depth of coverage on alternate allele in sample’s genotype call

type

integer

minimum

0

  • fail

Action to perform when genotype filter threshold is not passed

Actions: ignore: ignore failure, drop-variant: drop whole variant (if ONE genotype in the variant fails filter), no-call: interpret as no-call

type

string

enum

ignore, drop-variant, no-call

default

ignore

additionalProperties

False

  • genotype

Genotype filter settings

Set genotype filter for each individual, must be given for each individual in query with genotype data

type

object

examples

SAMPLE

hom

FATHER

ref

MOTHER

ref

CHILD

het

patternProperties

  • .*

anyOf

type

null

type

string

enum

any, ref, het, hom, non-hom, variant, non-variant, non-reference

additionalProperties

False
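
As a minimal sketch of how the per-sample quality and genotype settings fit together, the following Python snippet builds both objects for a trio and checks them against the enumerations and ranges listed above. The helper name check_sample_settings is hypothetical and not part of VarFish; the sample values are taken from the examples in the schema.

```python
# Allowed values, copied from the enumerations in the schema above.
GENOTYPE_CHOICES = {
    "any", "ref", "het", "hom", "non-hom", "variant", "non-variant", "non-reference",
}
FAIL_ACTIONS = {"ignore", "drop-variant", "no-call"}
INTEGER_KEYS = {"dp_het", "dp_hom", "gq", "ad", "ad_max"}


def check_sample_settings(quality, genotype):
    """Return a list of problems found in per-sample quality and genotype settings.

    Hypothetical helper for illustration only, not a VarFish function.
    """
    problems = []
    for sample, choice in genotype.items():
        # null (None) is permitted for the genotype of a sample.
        if choice is not None and choice not in GENOTYPE_CHOICES:
            problems.append(f"{sample}: unknown genotype {choice!r}")
    for sample, thresholds in quality.items():
        for key, value in thresholds.items():
            if key == "ad_max" and value is None:
                continue  # ad_max may be null, meaning no upper limit
            if key in INTEGER_KEYS:
                if not isinstance(value, int) or value < 0:
                    problems.append(f"{sample}: {key} must be an integer >= 0")
            elif key == "ab":
                if not isinstance(value, (int, float)) or not 0 <= value <= 1:
                    problems.append(f"{sample}: ab must be between 0 and 1")
            elif key == "fail":
                if value not in FAIL_ACTIONS:
                    problems.append(f"{sample}: unknown fail action {value!r}")
            else:
                # The schema sets additionalProperties to False.
                problems.append(f"{sample}: unknown quality key {key!r}")
    return problems


quality = {
    "FATHER": {"gq": 40, "fail": "ignore"},
    "MOTHER": {"gq": 40, "fail": "ignore"},
    "CHILD": {"gq": 40, "fail": "drop-variant"},
}
genotype = {"FATHER": "ref", "MOTHER": "ref", "CHILD": "het"}
problems = check_sample_settings(quality, genotype)
print(problems)
```

A check like this only mirrors the constraints documented here; the server-side schema remains authoritative.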

Case QC Schema V1

varfish-server case QC info

https://raw.githubusercontent.com/bihealth/varfish-server/main/importer/schemas/case-qc-v1.json

Per case quality control information for varfish

type

object

patternProperties

  • .*

type

object

properties

  • bamstats

type

object

properties

  • raw total sequences

raw total sequences

type

integer

minimum

0

  • filtered sequences

filtered sequences

type

integer

minimum

0

  • sequences

sequences

type

integer

minimum

0

  • is sorted

is sorted

type

integer

minimum

0

  • 1st fragments

1st fragments

type

integer

minimum

0

  • last fragments

last fragments

type

integer

minimum

0

  • reads mapped

reads mapped

type

integer

minimum

0

  • reads mapped and paired

reads mapped and paired

type

integer

minimum

0

  • reads unmapped

reads unmapped

type

integer

minimum

0

  • reads properly paired

reads properly paired

type

integer

minimum

0

  • reads paired

reads paired

type

integer

minimum

0

  • reads duplicated

reads duplicated

type

integer

minimum

0

  • reads MQ0

reads MQ0

type

integer

minimum

0

  • reads QC failed

reads QC failed

type

integer

minimum

0

  • non-primary alignments

non-primary alignments

type

integer

minimum

0

  • total length

total length

type

integer

minimum

0

  • total first fragment length

total first fragment length

type

integer

minimum

0

  • total last fragment length

total last fragment length

type

integer

minimum

0

  • bases mapped

bases mapped

type

integer

minimum

0

  • bases mapped (cigar)

bases mapped (cigar)

type

integer

minimum

0

  • bases trimmed

bases trimmed

type

integer

minimum

0

  • bases duplicated

bases duplicated

type

integer

minimum

0

  • mismatches

mismatches

type

integer

minimum

0

  • error rate

error rate

error rate as fractions of 1

type

number

maximum

1

minimum

0

  • average length

average length

type

number

minimum

0

  • average first fragment length

average first fragment length

type

number

minimum

0

  • average last fragment length

average last fragment length

type

number

minimum

0

  • maximum length

maximum length

type

integer

minimum

0

  • maximum first fragment length

maximum first fragment length

type

integer

minimum

0

  • maximum last fragment length

maximum last fragment length

type

integer

minimum

0

  • average quality

average quality

type

number

minimum

0

  • insert size average

insert size average

type

number

minimum

0

  • insert size standard deviation

insert size standard deviation

type

number

minimum

0

  • inward oriented pairs

inward oriented pairs

type

integer

minimum

0

  • outward oriented pairs

outward oriented pairs

type

integer

minimum

0

  • pairs with other orientation

pairs with other orientation

type

integer

minimum

0

  • pairs on different chromosomes

pairs on different chromosomes

type

integer

minimum

0

  • percentage of properly paired reads (%)

percentage of properly paired reads (%)

type

number

maximum

100

minimum

0

  • min_cov_target

Minimal coverage percentage, counted per target

Considering all targets, histogram of distribution regarding “minimal coverage of…”, the smallest coverage on a target makes the whole target count at that value

type

object

patternProperties

  • \d+

Minimal coverage value histogram entry

type

number

examples

100

99.9

0

maximum

100

minimum

0

additionalProperties

False

  • min_cov_base

Minimal coverage percentage, counted per base

Considering all target bases, histogram of distribution regarding “minimal coverage of…”

type

object

patternProperties

  • \d+

Minimal coverage value histogram entry

type

number

examples

100

99.9

0

maximum

100

minimum

0

additionalProperties

False

  • summary

Coverage summary

type

object

properties

  • mean coverage

Mean on-target coverage

type

number

examples

0

100

minimum

0

  • target count

Total number of targets

type

integer

examples

0

100

minimum

0

  • total target size

Total target size in bp

type

integer

examples

0

100

minimum

0

additionalProperties

False

  • idxstats

type

object

patternProperties

  • .*

Read count for each chromosome

type

object

properties

  • mapped

Mapped read count

Number of mapped reads on chromosome

type

integer

examples

0

100

minimum

0

  • unmapped

Unmapped read count

Number of unmapped reads on chromosome (usually the mate maps)

type

integer

examples

0

100

minimum

0

additionalProperties

False
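
To illustrate the overall shape of a case QC object, here is a small Python sketch with one sample and a check of the coverage histograms. All numbers are invented for illustration, the sample name SAMPLE and the helper check_histogram are hypothetical, and only a few of the bamstats keys are shown.

```python
import json
import re

# Top-level keys are sample names; values invented for illustration only.
case_qc = {
    "SAMPLE": {
        "bamstats": {"sequences": 1000, "reads mapped": 990, "error rate": 0.005},
        "min_cov_target": {"0": 100.0, "10": 99.9, "20": 98.5},
        "min_cov_base": {"0": 100.0, "10": 99.95, "20": 99.0},
        "summary": {"mean coverage": 85.2, "target count": 100, "total target size": 25000},
        "idxstats": {
            "1": {"mapped": 500, "unmapped": 2},
            "MT": {"mapped": 10, "unmapped": 0},
        },
    }
}


def check_histogram(histogram):
    """Check that keys are coverage values and entries are percentages.

    This requires the whole key to consist of digits, which is slightly
    stricter than the schema's pattern.
    """
    for key, value in histogram.items():
        if not re.fullmatch(r"\d+", key):
            return False
        if not 0 <= value <= 100:
            return False
    return True


for sample, qc in case_qc.items():
    print(sample, check_histogram(qc["min_cov_target"]), check_histogram(qc["min_cov_base"]))

# The object serializes to JSON without loss.
serialized = json.dumps(case_qc)
```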

Clinical Beacon Protocol

This section describes the “Clinical Beacon” protocol version 1 (“Clinical Beacon v1”). It follows the GA4GH Beacon Protocol v1 (“Beacon v1”) in large parts with slight deviations. The end points and payloads are the same as in Beacon v1. However, we add two important features, as explained below.

  1. The client sends the current user in the X-Beacon-User header.

  2. The client has to sign the X-Beacon-User and Date HTTP headers using the Signing HTTP Messages IETF draft.

You can find a simple Python implementation of a standalone client on GitHub.

X-Beacon-User Header

The GA4GH Beacon v1 protocol is meant to be used in a “zero trust” environment and specifies that authentication is done using OAuth2. In an ideal world, sites running VarFish would be able to connect to local OpenID instances. In reality, many sites are located in clinical environments where Microsoft Active Directory is used for authentication and Microsoft Federated Services use SAML instead.

Further, VarFish sites connecting to each other will have real-world paper contracts for data exchange agreements and after signing such contracts they can trust each other. In the first version we thus decided not to implement zero trust concepts.

The client thus has to set the X-Beacon-User header to a string that uniquely identifies the querying user. The client decides whether to send interpretable user names or, for the sake of user data security, pseudonyms. This is left to the discretion of the implementing sites and contract partners. VarFish currently sends clear-text user names.

Date Header

This is a standard HTTP header that is mandatory in the Clinical Beacon v1 protocol.
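
As a minimal sketch, the two headers described above could be assembled in Python as follows. The function name beacon_headers and the user name jdoe are illustrative assumptions; the Date value is produced in the standard HTTP date format by email.utils.formatdate from the standard library.

```python
from email.utils import formatdate


def beacon_headers(user, timestamp=None):
    """Build the two headers that Clinical Beacon v1 requires to be signed.

    Hypothetical helper; `timestamp` is seconds since the epoch and
    defaults to the current time.
    """
    return {
        "Date": formatdate(timeval=timestamp, usegmt=True),
        "X-Beacon-User": user,
    }


headers = beacon_headers("jdoe", timestamp=1646407724)
for name, value in headers.items():
    print(f"{name}: {value}")
```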

Header Signing

We use the Signing HTTP Messages IETF draft for signing HTTP requests. The signature header will typically look as follows (without wrapping of course):

Signature keyId="org.bihealth.varfish",algorithm="rsa-sha512",headers="date x-beacon-user",\
signature="mxY7+9vizRbO7mUJVyvxXm3VgpYycQWNulrAafMOWJ29WYQYMf2i5PBPP3jYBhIGd/3zZ+x+mlQw8xEw\
M6UWvE3QRqzlzBE0ZHeWKgX4h11N1MhtXTnhXL9CL/VqbcgbBI9trkwB/xxaXhUOpvavA37J1ljrdTbXhghCHZ65hMi\
04fUnKKkFhuwOzZ6N5/amIuizc2JeDe73Pg+D5HA4AnE2bnCmf8AqhKLd434SdchcYAHqYTJaxBA2Pxngerg6oSenli\
rgukzrBdbdRpvnFFtQzZsQ56v9hS8cqF/phtl+isAT/dcwvO9/lCKaf3QE8YKCcQmDnPJiQLdtQ9mZKw==",\
created="1646407724"

Where

  • keyId is the ID of the key pair used for signing

  • algorithm is the algorithm that has been used for generating the signature

  • headers is the space-separated list of headers that are signed (must be date x-beacon-user)

  • signature is the Base 64 encoded signature.
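
The components listed above can be pulled apart with a few lines of Python. This is a sketch that assumes the simple comma-separated key="value" layout of the example header; the function name parse_signature_header is hypothetical and the signature value is a short placeholder rather than a real signature.

```python
import base64
import re


def parse_signature_header(value):
    """Split a 'Signature key="value",...' header value into a dict."""
    scheme, _, params = value.partition(" ")
    if scheme != "Signature":
        raise ValueError("not a Signature header")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


# Placeholder signature ("c2lnbmF0dXJl" is merely Base 64 for b"signature").
example = (
    'Signature keyId="org.bihealth.varfish",algorithm="rsa-sha512",'
    'headers="date x-beacon-user",signature="c2lnbmF0dXJl",created="1646407724"'
)
parsed = parse_signature_header(example)
signed_headers = parsed["headers"].split(" ")
signature_bytes = base64.b64decode(parsed["signature"])
print(parsed["keyId"], parsed["algorithm"], signed_headers)
```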

This leaves open the question of how to generate the key. We use standard RSA and ECDSA keys. VarFish supports the following algorithms:

  • rsa-sha256

  • rsa-sha512

  • ecdsa-sha256

The standalone client on GitHub provides examples for key generation.

Key exchange is simple because only the client's public key has to be shared. However, the server must register this public key before the client can make any query.
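
The text that gets signed can be sketched as follows, assuming the "name: value" line format of the Signing HTTP Messages draft with lower-cased header names joined by newlines. The function name build_signing_string is hypothetical, and whether VarFish builds the string in exactly this way should be checked against the standalone client. The cryptographic signing step itself is omitted because it needs a third-party library.

```python
def build_signing_string(headers, signed=("date", "x-beacon-user")):
    """Concatenate 'name: value' lines for the signed headers, in order.

    Assumption: header names are lower-cased and lines are joined with a
    single newline, without a trailing newline.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return "\n".join(f"{name}: {lowered[name]}" for name in signed)


signing_string = build_signing_string(
    {"Date": "Fri, 04 Mar 2022 15:28:44 GMT", "X-Beacon-User": "jdoe"}
)
print(signing_string)
```

The resulting bytes would then be signed with the client's private key and the result sent Base 64 encoded in the signature parameter.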

Final Remarks

Thus, the Clinical Beacon Protocol v1 is equal to the GA4GH Beacon Protocol v1 with the exception that:

  • sites are expected to have a certain level of trust as they share non-public data,

  • sites send a string with each query to identify the querying user, and

  • all queries are signed with public/private key pairs and each client first needs to register with each server by sending its public key.

As a final remark, API endpoints should of course be deployed behind HTTPS but that is out of scope here.

Installation

The VarFish installation for developers should be set up differently from the installation for production use.

The reason is that the installation for production use runs completely in a Docker environment. All containers are assigned to a Docker network that the host by default has no access to, except for the reverse proxy that gives access to the VarFish web interface.

The developer installation is intended not to carry the full VarFish database so that it is lightweight and fits on a laptop. We advise installing the services directly on the host rather than running them in Docker containers.

Install Postgres

Follow the instructions for your operating system to install Postgres. Make sure that the version is 12 (11 and 13 would also work). Ubuntu 20 already includes postgresql 12. In case of older Ubuntu versions, this would be:

sudo apt install postgresql-12

Install Redis

Redis is the broker that celery uses to manage the queues. Follow the instructions for your operating system to install Redis. For Ubuntu, this would be:

sudo apt install redis-server

Install miniconda

miniconda helps to set up encapsulated Python environments. This step is optional. You can also use pipenv, but in our experience, resolving the dependencies in pipenv is very slow.

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
$ source ~/miniconda3/bin/activate
$ conda init
$ conda create -n varfish python=3.8 pip
$ conda activate varfish

Clone git repository

Clone the VarFish Server repository and switch into the checkout.

$ git clone https://github.com/bihealth/varfish-server
$ cd varfish-server

Install Python Requirements

Some required packages have dependencies that are usually not preinstalled. Therefore, run

$ sudo apt install libsasl2-dev python-dev libldap2-dev libssl-dev

Now, with the conda/Python environment activated, install all the requirements.

$ for i in requirements/*; do pip install -r $i; done

Setup Database

Use the tool provided in utility/ to set up the database. The name for the database should be varfish (create new user: yes, name: varfish, password: varfish).

$ bash utility/setup_database.sh

Setup vue.js

Use the tool provided in utility/ to set up vue.js.

$ sudo bash utility/install_vue_dev.sh

Open an additional terminal and switch into the vue app directory. Then install the ClinVar Export vue app.

$ cd clinvar_export/vueapp
$ npm install

When finished, keep this terminal open to run the vue app.

$ npm run serve

Setup VarFish

First, create a .env file with the following content.

export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local

If you wish to enable structural variants, add the following line.

export VARFISH_ENABLE_SVS=1
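
To see how the DATABASE_URL value maps onto the database name, user, and password chosen during database setup, the URL can be split with Python's standard library. This is only an illustrative sketch and not how VarFish itself reads the setting.

```python
from urllib.parse import urlsplit

# Same value as in the .env file above.
url = urlsplit("postgres://varfish:varfish@127.0.0.1/varfish")

database = {
    "user": url.username,
    "password": url.password,
    "host": url.hostname,
    "port": url.port,  # None because the URL does not specify a port
    "name": url.path.lstrip("/"),
}
print(database)
```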

To create the tables in the VarFish database, run the migrate command. This step can take a few minutes.

$ python manage.py migrate

Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named root (the setting can be changed in the .env file with the PROJECTROLES_ADMIN_OWNER variable).

$ python manage.py createsuperuser

Last, download the icon sets for VarFish and make scripts, stylesheets and icons available.

$ python manage.py geticons -c bi cil fa-regular fa-solid gridicons octicon
$ python manage.py collectstatic

When done, open two terminals and start the VarFish server and the celery server.

terminal1$ make serve
terminal2$ make celery

Database Import

First, download the pre-built database files that we provide and unpack them. Please make sure that you have enough space available. The packed file consumes 31 GB. When unpacked, it consumes an additional 188 GB.

$ cd /plenty/space
$ wget https://file-public.bihealth.org/transient/varfish/varfish-server-background-db-20201006.tar.gz{,.sha256}
$ sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
$ tar xzvf varfish-server-background-db-20201006.tar.gz
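
If sha256sum is not available, the same integrity check can be sketched in Python. The function names sha256_of and verify are hypothetical; the sketch assumes that the .sha256 file has the usual sha256sum layout with the hex digest as its first field.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, checksum_path):
    """Compare a file against the first field of a sha256sum-style file."""
    with open(checksum_path) as handle:
        expected = handle.read().split()[0]
    return sha256_of(path) == expected


# Usage with the file names from the commands above:
# verify("varfish-server-background-db-20201006.tar.gz",
#        "varfish-server-background-db-20201006.tar.gz.sha256")
```

Reading in chunks keeps memory use low, which matters for an archive of this size.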

We recommend excluding the large databases: frequency tables, extra annotations and dbSNP. Also, keep in mind that importing the whole database takes >24h, depending on the speed of your HDD.

This is a list of the possible imports, sorted by size:

Component            Size  Exclude             Function
gnomAD_genomes       80G   highly recommended  frequency annotation
extra-annos          50G   highly recommended  diverse
dbSNP                32G   highly recommended  SNP annotation
thousand_genomes     6.5G  highly recommended  frequency annotation
gnomAD_exomes        6.0G  highly recommended  frequency annotation
knowngeneaa          4.5G  highly recommended  alignment annotation
clinvar              3.3G  highly recommended  pathogenicity classification
ExAC                 1.9G  highly recommended  frequency annotation
dbVar                573M  recommended         SNP annotation
gnomAD_SV            250M  recommended         SV frequency annotation
ncbi_gene            151M                      gene annotation
ensembl_regulatory   77M                       frequency annotation
DGV                  43M                       SV annotation
hpo                  22M                       phenotype information
hgnc                 15M                       gene annotation
gnomAD_constraints   13M                       frequency annotation
mgi                  10M                       mouse gene annotation
ensembltorefseq      8.3M                      identifier mapping
hgmd_public          5.0M                      gene annotation
ExAC_constraints     4.6M                      frequency annotation
refseqtoensembl      2.0M                      identifier mapping
ensembltogenesymbol  1.6M                      identifier mapping
ensembl_genes        1.2M                      gene annotation
HelixMTdb            1.2M                      MT frequency annotation
refseqtogenesymbol   1.1M                      identifier mapping
refseq_genes         804K                      gene annotation
mim2gene             764K                      phenotype information
MITOMAP              660K                      MT frequency annotation
kegg                 632K                      pathway annotation
mtDB                 336K                      MT frequency annotation
tads_hesc            108K                      domain annotation
tads_imr90           108K                      domain annotation
vista                104K                      orthologous region annotation
acmg                 16K                       disease gene annotation

You can find the import_versions.tsv file in the root folder of the package. This file determines which component (called table_group and represented as folder in the package) gets imported when the import command is issued. To exclude a table, simply comment out (#) or delete the line. Excluding tables that are not required for development can reduce time and space consumption. Also, the GRCh38 tables can be excluded.

A space-consumption-friendly version of the file would look like this:

build       table_group     version
GRCh37      acmg    v2.0
#GRCh37     clinvar 20200929
#GRCh37     dbSNP   b151
#GRCh37     dbVar   latest
GRCh37      DGV     2016
GRCh37      ensembl_genes   r96
GRCh37      ensembl_regulatory      latest
GRCh37      ensembltogenesymbol     latest
GRCh37      ensembltorefseq latest
GRCh37      ExAC_constraints        r0.3.1
#GRCh37     ExAC    r1
#GRCh37     extra-annos     20200704
GRCh37      gnomAD_constraints      v2.1.1
#GRCh37     gnomAD_exomes   r2.1
#GRCh37     gnomAD_genomes  r2.1
#GRCh37     gnomAD_SV       v2
GRCh37      HelixMTdb       20190926
GRCh37      hgmd_public     ensembl_r75
GRCh37      hgnc    latest
GRCh37      hpo     latest
GRCh37      kegg    april2011
#GRCh37     knowngeneaa     latest
GRCh37      mgi     latest
GRCh37      mim2gene        latest
GRCh37      MITOMAP 20200116
GRCh37      mtDB    latest
GRCh37      ncbi_gene       latest
GRCh37      refseq_genes    r105
GRCh37      refseqtoensembl latest
GRCh37      refseqtogenesymbol      latest
GRCh37      tads_hesc       dixon2012
GRCh37      tads_imr90      dixon2012
#GRCh37     thousand_genomes        phase3
GRCh37      vista   latest
#GRCh38     clinvar 20200929
#GRCh38     dbVar   latest
#GRCh38     DGV     2016
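
As a rough illustration of how commenting out lines takes effect, the following Python sketch lists the entries of such a file that remain enabled. It is not the parser used by import_tables; the helper name read_enabled_tables and the assumption that the columns are separated by whitespace are ours.

```python
from typing import Iterable, List, NamedTuple


class TableRelease(NamedTuple):
    build: str
    table_group: str
    version: str


def read_enabled_tables(lines: Iterable[str]) -> List[TableRelease]:
    """Return the entries that are not commented out (hypothetical helper)."""
    enabled = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and entries disabled with a leading '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip the header row.
        if fields == ["build", "table_group", "version"]:
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got: {raw!r}")
        enabled.append(TableRelease(*fields))
    return enabled


example = """\
build\ttable_group\tversion
GRCh37\tacmg\tv2.0
#GRCh37\tclinvar\t20200929
GRCh37\tDGV\t2016
#GRCh38\tdbVar\tlatest
"""

for entry in read_enabled_tables(example.splitlines()):
    print(entry.build, entry.table_group, entry.version)
```

Running the sketch on your own copy of the file before starting an import is a quick way to confirm which table groups are still selected.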

To perform the import, issue:

$ python manage.py import_tables --tables-path /plenty/space/varfish-server-background-db-20201006

Performing the import twice will automatically skip tables that are already imported. To re-import tables, add the --force parameter to the command:

$ python manage.py import_tables --tables-path /plenty/space/varfish-server-background-db-20201006 --force

Development

Working With Sodar Core

VarFish is based on the Sodar Core framework which has a developer manual itself. It is worth reading its development instructions. The following lists the most important topics:

Running Tests

Running the VarFish test suite is easy, but can take a long time to finish (>10 minutes).

$ make test

You can exclude time-consuming UI tests:

$ make test-noselenium

If you are working on only a few tests, it is better to run them directly. To specify them, follow the path to the test file, add the class name and the test function, all separated by a dot:

$ python manage.py test -v2 --settings=config.settings.test variants.tests.test_ui.TestVariantsCaseFilterView.test_variant_filter_case_multi_bookmark_one_variant

This would run a single UI test for the case filter view in the variants app.

Working With Git

This section briefly describes the workflow for contributing to VarFish. This is not a git tutorial and we expect basic knowledge. We recommend gitready for any questions regarding git. We do use git rebase a lot.

In general, we recommend working with git gui and gitk.

The first thing for you to do is to create a fork of our GitHub repository in your GitHub space. To do so, go to the VarFish repository and click on the Fork button in the top right.

Update Main

Pull with rebase on gitready

$ git pull --rebase

Create Working Branch

Always create your working branch from the latest main branch. Use the ticket number and description as name, following the format <ticket_number>-<ticket_title>, e.g.

$ git checkout -b 123-adding-useful-feature

Write A Sensible Commit Message

A commit message should only have 72 characters per line. As the first line is the representative, it should sum up everything the commit does. Leave a blank line and add three lines of github directives to reference the issue.

Fixed serious bug that prevented user from doing x.

Closes: #123
Related-Issue: #123
Projected-Results-Impact: none

Cleanup Before Pull Request

We suggest first squashing your commits and then rebasing onto the main branch.

Squash Multiple Commits (Or Use Amend)

Pull with rebase on gitready

We prefer to have only one commit per feature (most of the time there is only one feature per branch). When your branch is rebased on the main branch, do:

$ git rebase -i main

Alternatively, you can always use git commit --amend to modify your last commit. This allows you also to change your latest commit message.

Rebase To Main

Make sure your main is up-to-date. In your branch, do:

$ git checkout 123-adding-useful-feature
$ git rebase main

In case of conflicts, resolve them (search for the <<<<<<< markers in the conflicting files) and do:

$ git add conflicting.file
$ git rebase --continue

If unsure, abort the rebase:

$ git rebase --abort

Push To Origin

$ git push origin 123-adding-useful-feature

In case you squashed and/or rebased and already pushed the branch, you need to force the push:

$ git push -f origin 123-adding-useful-feature

Kiosk

The Kiosk mode in VarFish enables users to upload VCF files. This is not intended for production use as every upload will create its own project, so there is no way of organizing your cases properly. The mode serves only as a way for external users to try out VarFish.

Configuration

First, you need to download the VarFish annotator data (11Gb) and unpack it.

$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-{,transcripts-}20191129.tar.gz{,.sha256}
$ tar xzvf varfish-annotator-20191129.tar.gz
$ tar xzvf varfish-annotator-transcripts-20191129.tar.gz

If you want to enable Kiosk mode, add the following lines to the .env file.

export VARFISH_KIOSK_MODE=1
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFSEQ_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_refseq_curated.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_ENSEMBL_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_ensembl.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFERENCE_PATH=/path/to/unpacked/varfish-annotator-20191129/hs37d5.fa
export VARFISH_KIOSK_VARFISH_ANNOTATOR_DB_PATH=/path/to/unpacked/varfish-annotator-20191129/varfish-annotator-db-20191129.h2.db
export VARFISH_KIOSK_CONDA_PATH=/path/to/miniconda/bin/activate
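
Because a mistyped path in these variables is easy to miss, a small check before starting the server can help. The following Python sketch only reads the variables named above and tests whether the paths exist. The helper check_kiosk_env is hypothetical and not part of VarFish.

```python
import os
from typing import List, Mapping

# Variables from the .env snippet above that must point to existing paths.
REQUIRED_PATH_VARS = [
    "VARFISH_KIOSK_VARFISH_ANNOTATOR_REFSEQ_SER_PATH",
    "VARFISH_KIOSK_VARFISH_ANNOTATOR_ENSEMBL_SER_PATH",
    "VARFISH_KIOSK_VARFISH_ANNOTATOR_REFERENCE_PATH",
    "VARFISH_KIOSK_VARFISH_ANNOTATOR_DB_PATH",
    "VARFISH_KIOSK_CONDA_PATH",
]


def check_kiosk_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    if env.get("VARFISH_KIOSK_MODE") != "1":
        problems.append("VARFISH_KIOSK_MODE is not set to 1")
    for name in REQUIRED_PATH_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.exists(value):
            problems.append(f"{name} points to a missing path: {value}")
    return problems


if __name__ == "__main__":
    # Check the environment of the current shell.
    for problem in check_kiosk_env(os.environ):
        print(problem)
```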

Run

To run the kiosk mode, simply (re)start the web server and the celery server.

terminal1$ make serve
terminal2$ make celery

Templates (for Issues etc.)

We organize bug reports and feature requests in the GitHub issue tracker. Please choose the template that best fits what you want to report and fill out the questions to help us decide on how to approach the task.

Bug Reports

The template for bug reports has the following form (an up-to-date form is located in the Github issue tracker):

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
 - OS: [e.g. iOS]
 - Browser [e.g. chrome, safari]
 - Version [e.g. 22]

**Smartphone (please complete the following information):**
 - Device: [e.g. iPhone6]
 - OS: [e.g. iOS8.1]
 - Browser [e.g. stock browser, safari]
 - Version [e.g. 22]

**Additional context**
Add any other context about the problem here.

Root Cause Analysis

In the following, a root cause analysis (RCA) needs to be done. The ticket will get an answer with the title Root Cause Analysis and a thorough description of what might cause the bug.

Resolution Proposal

When the root cause is determined, a solution needs to be proposed, following this form:

**Resolution Proposal**
e.g. The component X needs to be changed to Y so Z is not executed when M occurs.

**Affected Components**
e.g. VarFish server

**Affected Modules/Files**
e.g. variants module or queries.py

**Required Architectural Changes**
e.g. Function F needs to be moved to X.

**Required Database Changes**
i.e. name any model that needs changing, to be added and will lead to a migration

**Backport Possible?**
e.g., "Yes" if this is a bug fix or small change and should be backported to the current stable version

**Resolution Sketch**
e.g. Change X in F. Then do Y.

Commits

Almost all commits should refer to a ticket in trailing parenthesis, e.g.

Resolve some issue (#NUMBER)

Trailing lines are required for each commit. You must either specify Related-Issue or No-Related-Issue. Examples:

Related-Issue: #123
No-Related-Issue: Short text reason

Further, each commit should be marked whether it is expected to change filtration results with Projected-Results-Impact. Allowed values are none or require-revalidation.

Projected-Results-Impact: none
Projected-Results-Impact: require-revalidation
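
The commit message rules above lend themselves to an automated check. The following Python sketch is one possible way to do this and is not a hook shipped with VarFish. The function name check_commit_message is hypothetical, and the 72-character limit and the blank second line are taken from the section on writing commit messages.

```python
import re
from typing import List

ALLOWED_IMPACTS = {"none", "require-revalidation"}


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing summary line"]
    problems = []
    for number, line in enumerate(lines, start=1):
        if len(line) > 72:
            problems.append(f"line {number} is longer than 72 characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Collect 'Key: value' trailer lines after the summary.
    trailers = {}
    for line in lines[2:]:
        match = re.match(r"^([A-Za-z-]+):\s*(.*)$", line)
        if match:
            trailers[match.group(1)] = match.group(2)
    if "Related-Issue" not in trailers and "No-Related-Issue" not in trailers:
        problems.append("need Related-Issue or No-Related-Issue")
    if trailers.get("Projected-Results-Impact") not in ALLOWED_IMPACTS:
        problems.append(
            "Projected-Results-Impact must be none or require-revalidation"
        )
    return problems


good = (
    "Resolve some issue (#123)\n"
    "\n"
    "Related-Issue: #123\n"
    "Projected-Results-Impact: none\n"
)
print(check_commit_message(good))
print(check_commit_message("Fix things"))
```

Such a function could be called from a commit-msg hook, but wiring it up is left to the individual developer.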

Fix & Pull Request

  1. Create new branch (name starts with issue number), e.g. 123-fix-for-issue

  2. Create pull request in “Draft” state

  3. Fix problem, ideally in a test-driven way, remove “Draft” state

Review & Merge

  1. Perform code review

  2. Ensure fix is documented in changelog (link to bug and PR #ids)

Feature Requests

A feature request follows the same workflow as a bug report (an up-to-date form is located in the GitHub issue tracker):

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.

Design

In the following, the design of the feature needs to be specified:

**Implementation Proposal**
e.g. The component X needs to be changed to Y so Z is not executed when M occurs.

**Affected Components**
e.g. VarFish server

**Affected Modules/Files**
e.g. variants module or queries.py

**Required Architectural Changes**
e.g. Function F needs to be moved to X.

**Implementation Sketch**
e.g. Change X in F. Then do Y.

Implement & Test

  1. Create feature branch, named starting with issue ID

  2. Perform implementation, ideally in a test-driven way

  3. Tests and documentation must be augmented/updated as well

Review & Merge

  1. Perform code review

  2. Ensure change is documented in changelog (link to feature issue and PR #ids)

Checklists

Releases

Prerequisites:

  • Have all issues done for the next milestone.

Tasks:

  1. Create ticket with the following template and assign it to the proper milestone.

    Release for version vVERSION
    
    - [ ] edit `HISTORY.rst` and ensure a proper section is added
    - [ ] edit `admin_upgrade.rst` to reflect the upgrade instructions
    - [ ] create a git tag `v.MAJOR.MINOR.PATCH` and `git push --tags`
    - [ ] create a "GitHub release" based on the tag with the text
    
          ```
          All details can be found in the `HISTORY.rst` file.
          ```
    
  2. Follow through the items.

Data & Software Validation

Prerequisites:

  • Have all background data imported into dedicated instances for validation. (Internally we use varfish-build-release-{37,38}.cubi.bihealth.org).

  • Create the varfish-site-data-X.tar.gz tarball with the database dump.

  • Have a token ready for the root user.

Tasks:

  1. Create a ticket with the following template.

    Validate data for:
    
    - **VarFish:** vMAJOR.MINOR.PATCH
    - **Site Data:** vVERSION (`sha256:CHECKSUM`)
    - **Genome Build:** GRCh37 or GRCh38
    
    Result Reports:
    
    PASTE HERE
    
  2. Use the varfish-wf-validation Snakemake workflow for running the validation.

  3. Paste the result reports into the tickets.

Docker & Data Builds

This section describes how to build the Docker images and also the VarFish site data tarballs. The intended audience are VarFish developers.

Build Docker Images

Building the image:

$ ./docker/build-docker.sh

By default, the latest tag is used. You can change this with:

$ GIT_TAG=v0.1.0 ./docker/build-docker.sh

Get varfish-docker-compose

The database is built in varfish-docker-compose.

$ git clone git@github.com:bihealth/varfish-docker-compose.git
$ cd varfish-docker-compose
$ ./init.sh

First-Time Container Startup

You have to start up the postgres container once to create the Postgres database. Once it has been initialized, shut it down with Ctrl-C.

$ docker-compose up postgres
<Ctrl-C>

Now copy over the postgresql.conf file that has been tuned for the VarFish use cases.

$ cp config/postgres/postgresql.conf volumes/postgres/data/postgresql.conf

Bring up the site again so we can build the database.

$ docker-compose up

Wait until varfish-web is up and running and all migrations have been applied, look for VARFISH MIGRATIONS END in the output of run-docker-compose-up.sh.

Pre-Build Postgres Database

Download static data

$ cd /plenty/space
$ wget https://file-public.bihealth.org/transient/varfish/anthenea/varfish-server-background-db-20201006.tar.gz{,.sha256}
$ sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
$ tar xzvf varfish-server-background-db-20201006.tar.gz
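
If sha256sum is not available on your system, the same check can be done in Python. The following is a minimal sketch using only the standard library. It assumes that the .sha256 file has the usual sha256sum layout, with the hex digest as the first whitespace-separated field. The helper names are ours.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Compare a file against the digest stored in '<path>.sha256'."""
    expected = Path(str(path) + ".sha256").read_text().split()[0].lower()
    return sha256_of(path) == expected
```

For the download above, the call would be verify("varfish-server-background-db-20201006.tar.gz") from within /plenty/space.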

Adjust the docker-compose.yml file such that /plenty/space is visible in the varfish-web container.

volumes:
    - "/plenty/space:/data"

Get the name of the running varfish-web container.

$ docker ps
CONTAINER ID   IMAGE                                                       COMMAND                  CREATED          STATUS              PORTS                                      NAMES
44be6ece102e   minio/minio                                                 "/usr/bin/docker-ent…"   11 minutes ago   Up About a minute   9000/tcp                                   varfish-docker-compose_minio_1
3b23113e5aa1   quay.io/biocontainers/exomiser-rest-prioritiser:12.1.0--1   "exomiser-rest-prior…"   11 minutes ago   Up About a minute                                              varfish-docker-compose_exomiser-rest-prioritiser_1
b8c49e8c24a6   quay.io/biocontainers/jannovar-cli:0.33--0                  "jannovar -Xmx6G -Xm…"   11 minutes ago   Up About a minute                                              varfish-docker-compose_jannovar_1
409a535b9951   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celerybeat_1
7eb7425c59e2   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celeryd-import_1
020811fde306   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celeryd-query_1
87b03ee0249b   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celeryd-default_1
7a3fdb337fae   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-web_1
9295a101570f   postgres:12                                                 "docker-entrypoint.s…"   12 minutes ago   Up About a minute   5432/tcp                                   varfish-docker-compose_postgres_1
1c4d6e235074   traefik:v2.3.1                                              "/entrypoint.sh --pr…"   12 minutes ago   Up About a minute   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   varfish-docker-compose_traefik_1
8d72fd096743   redis:6                                                     "docker-entrypoint.s…"   12 minutes ago   Up About a minute   6379/tcp                                   varfish-docker-compose_redis_1

Initialize the tables (while at least docker-compose up varfish-web postgres redis is running).

$ docker exec -it -w /usr/src/app varfish-docker-compose_varfish-web_1 python manage.py import_tables --tables-path /data --threads 8

Then, shut down docker-compose up, remove the volumes: entry for varfish-web, and create a tarball of the postgres database to have a clean copy.

Add Other Data

Copy the other required data for jannovar and exomiser. You can find the appropriate files to download on the Jannovar (via Zenodo) and Exomiser data download sites:

You should use the hg19 data for Exomiser for any genome release as we will only use the gene-to-phenotype prioritization, which is independent of the genome release.

The result should look similar to this:

# tree volumes/jannovar volumes/exomiser
volumes/jannovar
├── hg19_ensembl.ser
├── hg19_refseq_curated.ser
└── hg19_refseq.ser
volumes/exomiser
├── 1909_hg19
│   ├── 1909_hg19_clinvar_whitelist.tsv.gz
.   .   [..]
│   └── 1909_hg19_variants.mv.db
└── 1909_phenotype
    ├── 1909_phenotype.h2.db
    ├── phenix
    │   ├── 10.out
    .   .   [..]
    │   ├── ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt
    │   ├── hp.obo
    │   └── phenotype_annotation.tab
    └── rw_string_10.mv

3 directories, 55 files

Create a Superuser

While docker-compose up is running:

$ docker exec -it -w /usr/src/app varfish-docker-compose_varfish-web_1 python manage.py createsuperuser
Username: root
Email address:
Password: <changeme>
Password (again): <changeme>
Superuser created successfully.

Setup Initial Data

Create test category & project.

Obtain API key and configure varfish-cli.

Import some test data through the API.

$ varfish-cli --no-verify-ssl case create-import-info --resubmit \
    92f5d735-0967-4db2-a801-50fe96359f51 \
    $(find path/to/variant_export/work/*NA12878* -name '*.tsv.gz' -or -name '*.ped')

Create Data Tarballs

Now create the released data tarballs.

tar -cf - volumes | pigz -c > varfish-site-data-v1-20210728-grch37.tar.gz && sha256sum varfish-site-data-v1-20210728-grch37.tar.gz >varfish-site-data-v1-20210728-grch37.tar.gz.sha256 &
tar -cf - volumes | pigz -c > varfish-site-data-v1-20210728-grch38.tar.gz && sha256sum varfish-site-data-v1-20210728-grch38.tar.gz >varfish-site-data-v1-20210728-grch38.tar.gz.sha256 &
tar -cf - test-data | pigz -c > varfish-test-data-v1-20211125.tar.gz && sha256sum varfish-test-data-v1-20211125.tar.gz >varfish-test-data-v1-20211125.tar.gz.sha256

ClinVar Notes

This section contains notes regarding ClinVar and its integration into VarFish. It outlines issues with the interpretation of variants as well as their resolution in VarFish and the rationale for the taken decisions.

ClinVar entries have two major labels:

variant assertion

The assertion about the pathogenicity of a variant, e.g., likely benign or pathogenic.

review status

A grading of how well a variant is reviewed. This is shown as a star rating on the ClinVar website.

Some reference ClinVar records (RCV identifiers) refer to one submission (SCV identifiers). Multiple reference ClinVar records are summarised in variant ClinVar records (VCV identifiers).

Review Status Interpretation

The interpretation of the status of a ClinVar record can be challenging. This is caused by two points.

Overall, there are the following occurrences of ClinVar status values in ClinVar (June 4, 2020). Note that some only make sense together with the others (e.g., “no conflicts” only makes sense if there is more than one submission).

Count      ClinVar Status
12,342     conflicting interpretations
839,966    criteria provided
55,467     multiple submitters
71,858     no assertion criteria provided
17,068     no assertion provided
55,467     no conflicts
5,751      practice guideline
11,172     reviewed by expert panel
772,157    single submitter

In ClinVar, the star ratings are assigned as follows:

Stars    Description
none     no assertion criteria provided OR single submitter, no assertion provided
one      single submitter, criteria provided OR criteria provided & multiple submitters, conflicting interpretations
two      criteria provided, multiple submitters, no conflicts
three    reviewed by expert panel
four     practice guideline

In particular, the missing distinction between “no assertion criteria provided” and “no assertion provided” is misleading. It can also be misleading that records with assertion criteria override those without. In several records, good literature has been curated without assertion criteria, while many records from clinical testing companies have assertion criteria but no phenotype and have been prepared with less diligence than good research.

Merging of ClinVar Records

The algorithm for merging multiple records in ClinVar to display the VCV records is not public. Also, given the issues with ClinVar’s star rating from above, VarFish uses a modified display from ClinVar’s. Instead of ClinVar’s gold stars, VarFish assigns points.

Points   Condition
none     origin is somatic OR no assertion provided
one      single submitter OR multiple submitters, conflicting interpretations
two      multiple submitters, no conflicting interpretation
three    reviewed by expert panel OR practice guideline

Importantly, VarFish will still display all ClinVar records in the variant display and link out to ClinVar so the user can make their own assessment. The role of ClinVar in VarFish is to assist the user in quickly finding variants present in ClinVar and not to override the user in any way.

The rationale:

  • ClinVar entries for somatic variants and those without a variant assessment are of little interest.

  • Multiple submitters are better than one submitter, regardless of the assertion criteria. Requiring assertion criteria or expert panel status is good for ClinVar to foster submission of assertion criteria or applications for expert panels but less important for VarFish users.

  • Variants for practice guideline are less important for VarFish’s use case. Thus, collapsing them with “reviewed by expert panel” should not pose a problem.

VarFish merges ClinVar records based on the following algorithm.

  1. Generally, benign and likely benign are merged to likely benign/benign; the same applies to pathogenic and likely pathogenic. Records with uncertain significance are ignored in merging if there is at least one (likely) benign/pathogenic assessment.

  2. Records flagged with practice guideline or expert panel will be assigned three points and override any other assessment. Among three-point variants, practice guideline beats expert panel.

  3. In the case that there is only one record, that record’s assessment is used. Note that this will include RCV records in ClinVar that are already merged. Assign one point.

  4. In the case of two or more records:

    • Ignore uncertain significance records as outlined in (1).

    • If there are conflicting interpretations, mark the record as such.

    • Otherwise, merge likely and non-likely assertions and add no conflicting interpretation if more than one non-uncertain significance record.

    • Assign one point in case of conflicts and two points in case of consistency.

Further, each variant is annotated with an ACMG-style rating. In the case of a “likely X/X” assertion, ACMG:1.5 or ACMG:4.5 is assigned. In the case of conflicting assertions, an ACMG score of 3 is assigned, but the variant is flagged with a “C” to indicate conflicting interpretations. Note that neither uncertain vs. benign nor uncertain vs. pathogenic creates a conflict.

Examples

  1. INPUT
    • practice guideline, likely pathogenic

    • reviewed by expert panel, likely pathogenic

    • single submitter, pathogenic

    OUTPUT
    • reviewed by expert panel, likely pathogenic

    • three points; ACMG:4-LP

  2. INPUT
    • single submitter, pathogenic

    • multiple submitters, no conflict, likely pathogenic

    OUTPUT
    • multiple submitters, no conflict, likely pathogenic/pathogenic

    • two points; ACMG:4.5-LP-P

  3. INPUT
    • single submitter, pathogenic

    • single submitter, uncertain significance

    • single submitter, likely pathogenic

    OUTPUT
    • multiple submitters, no conflict, likely pathogenic/pathogenic

    • two points; ACMG:4.5-LP-P

  4. INPUT
    • single submitter, pathogenic

    • multiple submitters, uncertain significance

    OUTPUT
    • single submitter, likely pathogenic

    • one point; ACMG:4-LP

  5. INPUT
    • single submitter, pathogenic

    • multiple single submitters, likely benign

    OUTPUT
    • multiple submitters, conflicting interpretations, uncertain significance

    • one point; ACMG:3
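
The merging rules and the point assignment can be sketched in Python as follows. This is our reading of the description above and not the code used in VarFish. The names Record, Merged and merge_records are hypothetical. The sketch always reports the merged “likely X/X” label, so its labels can differ from the examples, and it does not model the ACMG-style rating or the “none” points category. It also does not treat conflicts among expert panel records specially.

```python
from typing import List, NamedTuple, Sequence, Tuple

BENIGN = {"benign", "likely benign"}
PATHOGENIC = {"pathogenic", "likely pathogenic"}
UNCERTAIN = "uncertain significance"
# Practice guideline is checked first, as it beats expert panel.
TOP_TIER = ("practice guideline", "reviewed by expert panel")


class Record(NamedTuple):
    review_status: str
    significance: str


class Merged(NamedTuple):
    points: int
    significance: str
    conflicting: bool


def _summarise(records: Sequence[Record]) -> Tuple[str, bool]:
    """Collapse assertions into one label and report whether they conflict."""
    values = {record.significance for record in records}
    has_benign = bool(values & BENIGN)
    has_pathogenic = bool(values & PATHOGENIC)
    if has_benign and has_pathogenic:
        return UNCERTAIN, True
    if has_benign:
        return "likely benign/benign", False
    if has_pathogenic:
        return "likely pathogenic/pathogenic", False
    return UNCERTAIN, False


def merge_records(records: List[Record]) -> Merged:
    if not records:
        raise ValueError("need at least one record")
    # Practice guideline / expert panel records override all others.
    for status in TOP_TIER:
        top = [r for r in records if r.review_status == status]
        if top:
            label, conflict = _summarise(top)
            return Merged(3, label, conflict)
    # Uncertain records are ignored if any other assertion exists.
    informative = [r for r in records if r.significance != UNCERTAIN]
    informative = informative or list(records)
    label, conflict = _summarise(informative)
    # One point for a single remaining record or a conflict, else two.
    if len(informative) == 1 or conflict:
        return Merged(1, label, conflict)
    return Merged(2, label, False)


example_5 = [
    Record("single submitter", "pathogenic"),
    Record("single submitter", "likely benign"),
]
print(merge_records(example_5))
```

For the inputs of the five examples above, the sketch assigns the same number of points as listed there.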

Contributors

In alphabetical order:

  • Dieter Beule

  • Felix Boschann

  • Nadja Ehmke

  • Manuel Holtgrewe

  • Oliver Stolpe

Release Cycle

This section documents the versioning and branching model of VarFish. Generally, we follow the idea of release cycles as also employed by Ceph.

There is a new stable release every year, targeting the month of April. Each stable release receives a name (e.g., “Anthenea”) and a major release number (e.g., 1, as “A” is the first letter of the alphabet).

Releases are named after starfish species.

Version numbers have three components, x.y.z. x identifies the release cycle (e.g., 1 for Anthenea). y identifies the release type:

  • x.0.z - development versions (the bleeding edge)

  • x.1.z - release candidates (for test users)

  • x.2.z - stable/bugfix releases (for the general public)
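
The version scheme can be decoded mechanically. The following Python sketch maps a version string to its release cycle and release type according to the list above. The function name describe_version is ours and the sketch is only an illustration.

```python
RELEASE_TYPES = {
    0: "development version",
    1: "release candidate",
    2: "stable/bugfix release",
}


def describe_version(version: str) -> str:
    """Map a version string such as 'v1.2.3' to its release cycle and type."""
    text = version[1:] if version.startswith("v") else version
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected x.y.z, got {version!r}")
    cycle, kind, patch = (int(part) for part in parts)
    if kind not in RELEASE_TYPES:
        raise ValueError(f"unknown release type y={kind} in {version!r}")
    return f"cycle {cycle}, {RELEASE_TYPES[kind]}, patch {patch}"


print(describe_version("v1.2.3"))  # cycle 1, stable/bugfix release, patch 3
print(describe_version("1.1.1"))   # cycle 1, release candidate, patch 1
```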

Stable Releases (x.2.z)

There will be a new stable release per year (“x”) with a small number of bug fixes and “trivial feature” releases (“z”). Stable releases will be supported for 14-16 months, so users have some time to upgrade.

Release Candidates (x.1.z)

We will start feature freezes roughly a month before the next stable releases. The release candidates are suitable for testing the

Development Versions (x.0.z)

These releases are suitable for sites that are involved in the development of VarFish themselves or that want to track the “bleeding edge” very closely. The main developing sites (currently Berlin, Bonn) deploy self-built Docker containers from the current development branch.

Release Names

Year   Version   Release Name   Species
2022   1.y.z     Anthenea       Anthenea aspera
2023   2.y.z     Bollonaster    Bollonaster pectinatus
2024   3.y.z     Culcita        Culcita coriacea
2025   4.y.z     Doraster       Doraster constellatus
2026   5.y.z     Euretaster     Euretaster cribrosus

Releases History

Starting with the 1.0.0 release.

History / Changelog

v1.2.3 (anthenea)

End-User Summary

  • Create single result row even for multiple clinvar entries (#565).

  • Adding warning in case of truncated display (#641).

  • Adding coordinate indices on HelixMtDb and Mitomap (#635).

  • Fixing clinvar pathogenic filter (#296).

  • Improving Clinvar record aggregation (#640).

  • Fixed wrong colored WIP result rows (#673).

  • Fixing ClinVar submission XML generation (#677).

  • Regular refresh ClinVar individual from Case (#158).

  • Fixing hemizygous count display in fold-outs (#646).

  • Fixing clinvar submission sex/gender update (#686).

  • Fixing issue with phenotype name in Clinvar (#689).

  • Changing ClinVar link-out to VCV entry instead of coordinates (#693)

  • Bugfix that allow clinvar export submission set deletion (#713).

  • Adding genepanels app for defining gene panels (#723).

  • Allow excluding cases from in-house database (#579).

  • Allow to upload per-case gene annotation (#575).

  • Adding varannos app (#747).

  • Adding ACMG v3.0 + v3.1 incidental findings to gene allowlist preset (#829).

  • Adding locus link-out to genoox Franklin (#748).

Full Change List

  • Create single result row even for multiple clinvar entries (#565).

  • Adding warning in case of truncated display (#641).

  • Adding coordinate indices on HelixMtDb and Mitomap (#635).

  • Fixing Docker builds (#660)

  • Fixing clinvar pathogenic filter (#296).

  • Improving Clinvar record aggregation (#640).

  • Fixed wrong colored WIP result rows (#673).

  • Fixing ClinVar submission XML generation (#677).

  • Regular refresh ClinVar individual from Case (#158).

  • Fixing hemizygous count display in fold-outs (#646).

  • Fixing clinvar submission sex/gender update (#686).

  • Fixing issue with phenotype name in Clinvar (#689).

  • Changing ClinVar link-out to VCV entry instead of coordinates (#693)

  • Adding unit tests for clinvar export Vue app (#692)

  • Move varfish export Vue app (#711)

  • Bugfix that allow clinvar export submission set deletion (#713).

  • Removing dependency on bootstrap-vue (#716)

  • Migrating clinvar export to Pinia/Vue3 (#720).

  • Adding genepanels app for defining gene panels (#723).

  • Allow excluding cases from in-house database (#579).

  • Allow to upload per-case gene annotation (#575).

  • Add missing directory in Dockerfile.

  • Adding varannos app (#747).

  • Adding ACMG v3.0 + v3.1 incidental findings to gene allowlist preset (#829).

  • Adding locus link-out to genoox Franklin (#748).

v1.2.2 (anthenea)

End-User Summary

  • Add Transcripts gnomAD constraints and clinvar reports in the export (#568).

  • Extra annotations in export completed and tested (#495).

  • Fixed bug where Exac and thousand genomes settings were not shown in frequency tab for GRCh37 (#597).

  • Form template reports error if genomebuild variable is not set (#607).

  • Added locus link-out for genoox Franklin (#748).

Full Change List

  • Extra annotations in export completed and tested (#495).

  • Fixing issue with sync-from-remote when no remote is defined (#570).

  • Fixed bug where Exac and thousand genomes settings were not shown in frequency tab for GRCh37 (#597).

  • Form template reports error if genomebuild variable is not set (#607).

  • Added locus link-out for genoox Franklin (#748).

v1.2.1 (anthenea)

End-User Summary

  • Starting with branch of stable version Anthenea (VarFish v1).

  • Documenting problem with extra annotations in the 20210728 data release (#450). Includes instructions on how to apply the patch to get 20210728b.

  • Removing problematic username modification behaviour on login page (#459).

  • Displaying login page text from settings again (#458).

  • Suppress “submit to CADD” and “submit to SPANR” buttons for multi-case form (#478). This has not been implemented so far.

  • Fixing paths in “Variant Ingest” documentation (#472).

  • Small extension of “Resolution proposal” template (#472).

  • Adjusting wrong release name to “anthenea” (#479).

  • Adding “show all variant carriers” feature (#470).

  • Properly display the clinvar annotations that we have in the database (#464).

  • Adjusting default frequency filters for “clinvar pathogenic” filter: remove all threshold (#464).

  • Adding note about difference with upstream Clinvar (#464).

  • Switching scoring to MutationTaster 85 interface, added back MT 85 link-out alongside MT 2021 link-out (#509).

  • Made flag filter and flag form nomenclature consistent (#297).

  • Fixed broken VariantValidator query (#523).

  • Fixed smallvariant flags filter query (#502).

  • Added flags segregates, doesnt_segregate and no_disease_association to file export (#502).

  • Adding feature to enable and configure link-out to HGMD (#576).

Full Change List

  • Starting with branch of stable version Anthenea (VarFish v1).

  • Documenting problem with extra annotations in the 20210728 data release (#450). Includes instructions on how to apply the patch to get 20210728b.

  • Removing problematic username modification behaviour on login page (#459).

  • Displaying login page text from settings again (#458).

  • Suppress “submit to CADD” and “submit to SPANR” buttons for multi-case form (#478). This has not been implemented so far.

  • Fixing paths in “Variant Ingest” documentation (#472).

  • Small extension of “Resolution proposal” template (#472).

  • Adjusting wrong release name to “anthenea” (#479).

  • Adding “show all variant carriers” feature (#470).

  • Properly display the clinvar annotations that we have in the database (#464).

  • Adjusting default frequency filters for “clinvar pathogenic” filter: remove all threshold (#464).

  • Adding note about difference with upstream Clinvar (#464).

  • Switching scoring to MutationTaster 85 interface, added back MT 85 link-out alongside MT 2021 link-out (#509).

  • Made flag filter and flag form nomenclature consistent (#297).

  • Fixed broken VariantValidator query (#523).

  • Fixed smallvariant flags filter query (#502).

  • Added flags segregates, doesnt_segregate and no_disease_association to file export (#502).

  • Converted not cooperative tooltip to standard title on Filter & Display button (#508).

  • Adding feature to enable and configure link-out to HGMD (#576).

v1.2.0

This is the first stable VarFish Server release. It is the same as v1.1.4.

v1.1.4

End-User Summary

Full Change List

  • Installing same postgres version as in docker-compose server (12).

v1.1.3

End-User Summary

  • Fixing problem with import info display for non-superusers (#431)

  • Schema and documentation for case QC info (#428)

  • Adding support for HGNC IDs in gene allow lists (#432)

  • PanelApp will now populate the gene allow list with HGNC gene IDs (#432)

Full Change List

  • Fixing problem with import info display for non-superusers (#431)

  • Schema and documentation for case QC info (#428)

  • Adding support for HGNC IDs in gene allow lists (#432)

  • PanelApp will now populate the gene allow list with HGNC gene IDs (#432)

  • Adding pg_dump admin command and documentation (#430)

v1.1.2

End-User Summary

  • Fixing bug in XLSX export (#417)

  • Fixing problem with multi-sample queries (#419)

  • Fixing issue with cohort queries (#420)

  • Fixing issue with mutationtaster queries (#423)

  • Fixing problem with multi-variant update (#419)

Full Change List

  • Fixing bug in corner case of multi variant annotation (#412)

  • Updating documentation for v1 release (#410)

  • Fixing issue with fa-solid:refresh icon (#409)

  • Fixing page titles (#409)

  • Fixing bug in XLSX export (#417)

  • Fixing problem with multi-sample queries (#419). This is done by rolling back adding the _ClosingWrapper class. We will need a different approach for the queries than was previously attempted here.

  • Fixing issue with cohort queries (#420)

  • Fixing issue with mutationtaster queries (#423)

  • Fixing problem with multi-variant update (#419)

v1.1.1

This is the first release candidate of the VarFish “Anthenea” release (v1). Importantly, the first stable release for v1 will be v1.2.0 (see Release Cycle Documentation for a full explanation of version semantics).

This release adds some more indices so the migrations might take some more time.

End-User Summary

  • Fixing problem with CNV import (#386)

  • Fixing problem with user annotation of nonexistent variants (#404)

Full Change List

  • Adding REST API for generating query shortcuts (#367)

  • Filter queries in REST API to selected case and not all by user

  • Fixing problem with CNV import (#386)

  • Adding index to improve beaconsite performance (#389)

  • Adding missing mdi iconset (#284)

  • Strip trailing slashes in beaconsite entrypoints (#388)

  • Documenting PAP setup (#393)

  • Adding more indices (#395)

  • Fixing discrepancy with REST API query shortcuts (#402)

v1.1.0

This is the first release candidate of the VarFish “Anthenea” release (v1). Importantly, the first stable release for v1 will be v1.2.0 (see Release Cycle Documentation for a full explanation of version semantics).

Breaking changes, see below.

End-User Summary

  • Fixing Kiosk mode of VarFish.

  • Fixing displaying of beacon information in results table.

  • Fixing broken flags & comments popup for structural variants.

  • Fixing broken search field.

  • Extended manual for bug report workflow.

  • Fixed recompute of variant stats of large small variant sets.

  • Added index for SmallVariant model filtering for case_id and set_id. This may take a while!

  • Allowing project owners and delegates to import cases via API (#207).

  • Fix for broken link-out into MutationTaster (#240).

  • Fixing SODAR Core template inconsistency (#150).

  • Imports via API now are only allowed for projects of type PROJECT (#237).

  • Fixing ensembl gene link-out to wrong genome build (#156).

  • Added section for developers in manual (#267).

  • Updating Clinvar export schema to 1.7 version (#226).

  • Migrated icons to iconify (#208).

  • Bumped chrome-driver version (#208).

  • VarFish now allows for the import of GRCh38 annotated variants. For this, GRCh38 background data must be imported. Kiosk mode does not support GRCh38 yet. This is a breaking change, new data and CLI must be used!

  • Added feature to select multiple rows in results to create same annotation (#259)

  • Added parameter to Docker entrypoint file to accept number of gunicorn workers

  • Extended documentation for how to update specific tables (#177)

  • Improving performance of project overview (#303)

  • Improving performance of case listing (#304)

  • Adding shortcut buttons to phenotype annotation (#289)

  • Fixing issue with multiple added variants (#283)

  • Implementing several usability improvements for clinvar submission editor (#286)

  • Make clinvar UI work with many annotations (#302)

  • Fixing CADD annotation (#319)

  • Adding mitochondrial inheritance to case phenotype annotation (#325)

  • Fix issue with variant annotation export (#328)

  • Allowing direct update of variant annotations and ACMG ratings on case annotations details (#344)

  • Fixing problem with ACMG classification where VUS-3 was given but should be LB-2 (#359)

  • Adding REST API for creating small variant queries (#332)

  • Fixing beaconsite queries with dots in the key id (#369)

  • Allowing joint queries of larger cohorts (#241)

  • Documenting Clinical Beacon v1 protocol

  • Improving performance for fetching result queries (#371)

  • Capping max. number of cases to query at once (#372)

  • Documenting release cycle and branch names

  • Add extra annotations, i.e. additional variant scores to the filtered variants (#242)

  • Fixing bug in project/cohort filter (#379)

Full Change List

  • Resolving problem with varfish-kiosk.
    • Auto-creating user kiosk_user when running in Kiosk mode.

    • Using custom middleware for kiosk user (#215).

  • Kiosk annotation now uses set -x flag if settings.DEBUG is true.

  • Mapping kiosk jobs to import queue.

  • Fixing displaying of beacon information in results table.

  • Fixing broken flags & comments popup for structural variants.

  • Fixing broken search field.

  • Extended manual for bug report workflow.

  • Fixed recompute of variant stats of large small variant sets.

  • Added index for SmallVariant model filtering for case_id and set_id. This may take a while!

  • Allowing project owners and delegates to import cases via API (#207).

  • Fix for broken link-out into MutationTaster (#240).

  • Fixing SODAR Core template inconsistency (#150).

  • Imports via API now are only allowed for projects of type PROJECT (#237).

  • Fixing ensembl gene link-out to wrong genome build (#156).

  • Added section for developers in manual (#267).

  • Updating Clinvar export schema to the latest 1.7 version (#226).

  • Migrated icons to iconify (#208).

  • Bumped chrome-driver version (#208).

  • Skipping codacy if token is not defined (#275).

  • Adjusting models and UI for supporting GRCh38 annotated cases. It is currently not possible to migrate a GRCh37 case to GRCh38.

  • Setting VARFISH_CADD_SUBMISSION_RELEASE is called VARFISH_CADD_SUBMISSION_VERSION now (breaking change).

  • import_info.tsv expected as in data release from 20210728 as built from varfish-db-downloader 1b03e97 or later.

  • Extending columns of Hgnc to upstream update.

  • Added feature to select multiple rows in results to create same annotation (#259)

  • Added parameter to Docker entrypoint file to accept number of gunicorn workers

  • Extended documentation for how to update specific tables (#177)

  • Improving performance of project overview (#303)

  • Improving performance of case listing (#304)

  • Adding shortcut buttons to phenotype annotation (#289)

  • Fixing issue with multiple added variants (#283)

  • Make clinvar UI work with many annotations by making it load them lazily for one case at a time (#302)

  • Implementing several usability improvements for clinvar submission editor (#286)

  • Adding CI builds for Python 3.10 in Github actions, bumping numpy/pandas dependencies. Dropping support for Python 3.7.

  • Fixing CADD annotation (#319)

  • Adding mitochondrial inheritance to case phenotype annotation (#325)

  • Fix issue with variant annotation export (#328)

  • Adding REST API versioning (#333)

  • Adding more postgres versions to CI (#337)

  • Make migrations compatible with Postgres 14 (#338)

  • DgvSvs and DgvGoldStandardSvs are two different data sources now

  • Adding deep linking into case details tab (#344)

  • Allowing direct update of variant annotations and ACMG ratings on case annotations details (#344)

  • Removing display_hgmd_public_membership (#363)

  • Fixing problem with ACMG classification where VUS-3 was given but should be LB-2 (#359)

  • Adding REST API for creating small variant queries (#332)

  • Upgrading sodar-core dependency to 0.10.10

  • Fixing beaconsite queries with dots in the key id (#369)

  • Allowing joint queries of larger cohorts (#241). This is achieved by performing fewer UNION queries (at most VARFISH_QUERY_MAX_UNION=20 at one time).

  • Documenting Clinical Beacon v1 protocol

  • Improving performance for fetching result queries (#371)

  • Fix to support sodar-core v0.10.10

  • Capping max. number of cases to query at once (#372)

  • Documenting release cycle and branch names

  • Checking commit message trailers (#323)

  • Add extra annotations to the filtered variants (#242)

  • Fixing bug in project/cohort filter (#379)
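
The entry on joint queries of larger cohorts (#241) mentions limiting the number of UNION queries performed at one time. A minimal sketch of that batching idea follows; it is not VarFish's actual implementation. The function names, the table name variants, and the SQL text are hypothetical; only the limit of 20 is taken from the change list.

```python
from typing import Iterator, List, Sequence

# Mirrors the VARFISH_QUERY_MAX_UNION=20 value named in the change list.
QUERY_MAX_UNION = 20


def chunk_case_ids(
    case_ids: Sequence[int], max_union: int = QUERY_MAX_UNION
) -> Iterator[List[int]]:
    """Yield consecutive batches of at most ``max_union`` case IDs."""
    if max_union < 1:
        raise ValueError("max_union must be positive")
    for start in range(0, len(case_ids), max_union):
        yield list(case_ids[start:start + max_union])


def build_union_queries(
    case_ids: Sequence[int], max_union: int = QUERY_MAX_UNION
) -> List[str]:
    """Build one UNION statement per batch (illustrative SQL, hypothetical table)."""
    queries = []
    for batch in chunk_case_ids(case_ids, max_union):
        selects = [
            f"SELECT * FROM variants WHERE case_id = {int(case_id)}"
            for case_id in batch
        ]
        queries.append(" UNION ".join(selects))
    return queries
```

With 45 cases and the default limit, this yields three statements covering 20, 20, and 5 cases; the results of the individual statements would then be combined by the caller.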

v0.23.9

End-User Summary

  • Bugfix release.

Full Change List

  • Fixing bugs that prevented properly running in production environment.

v0.23.8

End-User Summary

  • Added SAML Login possibility from sodar-core to varfish

  • Upgraded some icons and look and feel (via sodar-core).

Full Change List

  • Fixing bug that occurred when variants were annotated earlier by the user with the variant disappearing later on. This could be caused if the case is updated from singleton to trio later on.

  • Added sso urls to config/urls.py

  • Added SAML configuration to config/settings/base.py

  • Added necessary tools to the Dockerfile

  • Fix for missing PROJECTROLES_DISABLE_CATEGORIES variable in settings.

  • Upgrading sodar-core dependency. This implies that we now require Python 3.7 or later.

  • Upgrading various other packages including Django itself.

  • Docker images are now published via ghcr.io.

v0.23.7

IMPORTANT

This release contains a critical update. Prior to this release, all small and structural variant tables were marked as UNLOGGED. This was originally introduced to improve insert performance. However, it turned out that stability is greatly decreased: in the case of a PostgreSQL crash, these tables are emptied. This change should have been rolled back much earlier but that rollback was buggy. This release now includes a working and verified fix.

End-User Summary

  • Fixing stability issue with database schema.

Full Change List

  • Bump sodar-core to hotfix version. Fixes problem with remote permission synchronization.

  • Adding migration to mark all UNLOGGED tables back to LOGGED. This should have been reverted earlier but, because of a bug, it was not.

  • Fixing CI by calling sudo apt-get update once more.
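
The migration described above switches tables from UNLOGGED back to LOGGED. In PostgreSQL this is done with ALTER TABLE ... SET LOGGED. The following sketch only generates such statements and is not the migration shipped with VarFish; the function name and the table names in the usage example are hypothetical.

```python
from typing import Iterable, List


def set_logged_statements(table_names: Iterable[str]) -> List[str]:
    """Return one ALTER TABLE ... SET LOGGED statement per table name."""
    statements = []
    for name in table_names:
        # Quote the identifier and escape embedded double quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        statements.append(f"ALTER TABLE {quoted} SET LOGGED;")
    return statements


# Hypothetical table names, for illustration only.
for statement in set_logged_statements(
    ["variants_smallvariant", "svs_structuralvariant"]
):
    print(statement)
```

The generated statements could be reviewed and then run by a database administrator, for example through psql.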

v0.23.6

End-User Summary

  • Fixing problem with remote permission synchronization.

Full Change List

  • Bump sodar-core to hotfix version. Fixes problem with remote permission synchronization.

v0.23.5

End-User Summary

  • Adding back missing manual.

  • Fixing undefined variable bug.

  • Fixing result rows not colored anymore.

  • Fixing double CSS import.

Full Change List

  • Fixing problem with PROJECTROLES_ADMIN_OWNER being set to admin default but the system user being root in the prebuilt databases. The value now defaults to root.

  • Adding back missing manual in Docker image.

  • Fixing problem with “stopwords” corpus of nltk not being present. This is now downloaded when building the Docker image.

  • Fixing undefined variable bug.

  • Fixing result rows not colored anymore.

  • Fixing double CSS import.

v0.23.4

End-User Summary

  • Fixing issue of database query in Clinvar Export feature where too large queries were created.

  • Fixing search feature.

Full Change List

  • Docker image now includes commits to the next tag so the versioneer version display makes sense.

  • Dockerfile entrypoint script uses timeout of 600s now for gunicorn workers.

  • Fixing issue of database query in Clinvar Export feature where too large queries were created and postgres ran out of stack memory.

  • Adding more Sentry integrations (redis, celery, sqlalchemy).

  • Fixing search feature.

v0.23.3

End-User Summary

  • Bug fix release.

Full Change List

  • Bug fix release where the clinvar submission Vue.js app was not built.

  • Fixing env file example for SENTRY_DSN.

v0.23.2

End-User Summary

  • Bug fix release.

Full Change List

  • Bug fix release where Javascript was missing.

v0.23.1

End-User Summary

  • Allowing download of all user annotations for a whole project in one Excel/TSV file.

  • Improving variant annotation overview per case/project and allowing download.

  • Adding “not hom. alt.” filter setting.

  • Allowing users to easily copy case UUID by icon in case heading.

  • Fixing bug that made the user icon top right disappear.

Full Change List

  • Allowing download of all user annotations for a whole project in one Excel/TSV file.

  • Using SQLAlchemy query infrastructure for per-case/project annotation feature.

  • Removing vendored JS/CSS, using CDN for development and download on Docker build instead.

  • Adding “not hom. alt.” filter setting.

  • Improving admin configuration documentation.

  • Extending admin tuning documentation.

  • Allowing users to easily copy case UUID by icon in case heading.

  • Fixing bug that made the user icon top right disappear when beaconsite was disabled.

  • Upgrade to sodar-core v0.9.1

v0.23.0

End-User Summary

  • Fixed occasionally breaking tests ProjectExportTest by sorting member list. This bug didn’t affect the correct output but wasn’t consistent in the order of samples.

  • Fixed above mentioned bug again by consolidating two distinct Meta classes in Case model.

  • Fixed bug in SV tests that became visible through the above fix and created an additional variant that wasn’t intended.

  • Adapted core installation instructions in manual for latest data release and introduced use of VarFish API for import.

  • Allowing VarFish admins to import regulatory maps. Users can use these maps when analyzing SVs.

  • Adding “padding” field to SV filter form (regulatory tab).

  • Celerybeat tasks in variants app are now executing again.

  • Fixed check_installation management command. Index for dbsnp was missing.

  • Bumped chromedriver version to 87.

  • Fixed bug where file export was not possible when the number of resulting variants was < 10.

  • Fixed bug that made it impossible to properly sort by genotype in the results table.

  • Cases can now be annotated with phenotypes and diseases. To speed up annotation, all phenotypes of all previous queries are listed for copy and paste. SODAR can also be queried for phenotypes.

  • Properly sanitized output by Exomiser.

  • Rebuild of variant summary database table happens every Sunday at 2:22am.

  • Added celery queues maintenance and export.

  • Adding support for connecting two sites via the GA4GH Beacon protocol.

  • Adding link-out to “GenCC”.

  • Adding “submit to SPANR” feature.

Full Change List

  • Fixed occasionally breaking tests ProjectExportTest by sorting member list. This bug didn’t affect the correct output but wasn’t consistent in the order of samples. The reason for this is unknown, but it might be that the cases of a project are not always returned in the order in which they were created.

  • Fixed above mentioned bug again by consolidating two distinct Meta classes in Case model.

  • Fixed bug in SV tests that became visible through the above fix and created an additional variant that wasn’t intended.

  • Adapted core installation instructions in manual for latest data release and introduced use of VarFish API for import.

  • Adding regmaps app for regulatory maps.

  • Allowing users to specify padding for regulatory elements.

  • Celerybeat tasks in variants app are now executing again. Issue was a wrong decorator.

  • Fixed check_installation management command. Index for dbsnp was missing.

  • Bumped chromedriver version to 87.

  • Fixed bug where file export was not possible when the number of resulting variants was < 10.

  • Fixed bug that made it impossible to properly sort by genotype in the results table.

  • Adding tests for upstream synchronization backend code.

  • Allowing users with the Contributor role to a project to annotate cases with phenotype and disease terms. They can obtain the phenotypes from all queries of all users for a case and also fetch them from SODAR.

  • Adding files for building Docker images and documenting Docker (Compose) deployment.

  • Properly sanitized output by Exomiser.

  • Rebuild of variant summary database table happens every Sunday at 2:22am.

  • Added celery queues maintenance and export.

  • Adding support for connecting two sites via the GA4GH Beacon protocol.

  • Making CADD version behind CADD REST API configurable.

  • Adding link-out to “GenCC”.

  • Adding “submit to SPANR” feature.

v0.22.1

End-User Summary

  • Bumping chromedriver version.

  • Fixed extra-annos import.

Full Change List

  • Bumping chromedriver version.

  • Fixed extra-annos import.

v0.22.0

End-User Summary

  • Fixed bug where some variant flags didn’t color the row in filtering results after reloading the page.

  • Fixed upload bug in VarFish Kiosk when vcf file was too small.

  • Blocking upload of VCF files with GRCh38/hg38/hg19 builds for VarFish Kiosk.

  • Support for displaying GATK-gCNV SVs.

  • Tracking global maintenance jobs with background jobs and displaying them to super user.

  • Adding “Submit to CADD” feature similar to “Submit to MutationDistiller”.

  • Increased default frequency setting of HelixMTdb max hom filter to 200 for strict and 400 for relaxed.

  • It is now possible to delete ACMG ratings by clearing the form and saving it.

  • Fixed bug when inheritance preset was wrongly selected when switching to variant in an index-only case.

  • Added hemizygous counts filter option to frequency filter form.

  • Added synonymous effect to be also selected when checking all coding/deep intronic preset.

  • Saving uploads pre-checking in kiosk mode to facilitate debugging.

  • Kiosk mode also accepts VCFs based on hg19.

  • VariantValidator output now displays three-letter representation of AA.

  • Documented new clinvar aggregation method and VarFish “point rating”.

  • Implemented new clinvar data display in variant detail.

  • Added feature to assemble cohorts from cases spanning multiple projects and filter for them in a project-like query.

  • Added column to results list indicating if a variant lies in a disease gene, i.e. a gene listed in OMIM.

  • Displaying warning if prioritization is not enabled when entering HPO terms.

  • Added possibility to import “extra annotations” for display along with the variants.

  • On sites deployed by BIH CUBI, we make the CADD, SpliceAI, MMSp, and dbscSNV scores available.

  • In prioritization mode, ORPHA and DECIPHER terms are now selectable.

  • Fixed bug of wrong order when sorting by LOEUF score.

  • Adding some UI documentation.

  • Fixed bug where case alignment stats were not properly imported.

  • Fixed bug where unfolding smallvariant details of a variant in a cohort that was not part of the base project caused a 404 error.

  • Fixed bug that prevented case import from API.

  • Increased speed of listing cases in case list view.

  • Fixed bug that prevented export of project-wide filter results as XLS file.

  • Adjusted genotype quality relaxed filter setting to 10.

  • Added column with family name to results table of joint filtration.

  • Added export of filter settings as JSON to structural variant filter form.

  • Varseak Splicing link-out also considers refseq transcript.

  • Fixed bug that occurred when sample statistics were available but sample was marked with having no genotype.

  • Adjusted genotype quality strict filter setting to 10.

  • Added possibility to export VCF file for cohorts.

  • Increased logging during sample variant statistics computation.

  • Using gnomAD exomes as initially selected frequency in results table.

  • Using CADD as initially selected score metric in prioritization form.

  • Fixed missing disease gene and mode of inheritance annotation in project/cohort filter results table.

  • Catching errors during Kiosk annotation step properly.

  • Fixed issues with file extension check in Kiosk mode during upload.

  • “1” is now registered as heterozygous and homozygous state in genotype filter.

  • Loading annotation and QC tabs in project cases list asynchronously.

  • Increased timeout for VariantValidator response to 30 seconds.

  • Digesting more VariantValidator responses.

  • Fixed bug where when re-importing a case, the sample variants stats computation was performed on the member list of the old case. This could lead to an inconsistent state in which, when new members were added, the stats were not available for them. This led to a 500 error when displaying the case overview page.

  • Fixed missing QC plots in case detail view.

  • Fixed bug in case VCF export where a variant existing twice in the results was breaking the export.

  • Fixed log entries for file export when pathogenicity or phenotype scoring was activated.

  • Bumped Chrome Driver version to 84 to be compatible with gitlab CI.

  • CADD is now selected as default in pathogenicity scoring form (when available).

  • Added global maintenance commands to clear old kiosk cases, inactive variant sets and expired exported files.

  • Added SvAnnotationReleaseInfo model, information is filled during import and displayed in case detail view.

  • Fixed bug that left number of small variants empty when they actually existed.

  • Increased logging during case import.

  • Marked old style import as deprecated.

  • Fixed bug that prevented re-import of SVs.

  • Fixed bug where a re-import of genotypes was not possible when the same variant types weren’t present as in the initial import.

  • Fixed bug where imported state of CaseImportInfo was already set after importing the first variant set.

  • Integrated Genomics England PanelApp.

  • Added command to check selected indexes and data types in database.

  • Added columns to results table: cDNA effect, protein effect, effect text, distance to splicesite.

  • Made effect columns and distance to splicesite column hide-able.

  • Added warning to project/cohort query when a user tries to load previous results where not all variants are accessible.

  • Renamed all occurrences of whitelist to allowlist and of blacklist to blocklist (sticking to what google introduced in their products).

  • Fixed bug where cases were not deletable when using Chrome browser.

  • Harmonized computation for relatedness in project-wide QC and in case QC (thus showing the same results if project only contains one family).

  • Fixed failing case API re-import when user is not owner of previous import.

  • Added PROJECTROLES_EMAIL_ to config.

  • Avoiding variants with asterisk alternative alleles.

Full Change List

  • Fixed bug where some variant flags didn’t color the row in filtering results after reloading the page.

  • Fixed upload bug in VarFish Kiosk when vcf file was too small and the file copy process didn’t flush the file completely resulting in only a partly available header.

  • Blocking upload of VCF files with GRCh38/hg38/hg19 builds for VarFish Kiosk.

  • Bumping sodar-core dependency to v0.8.1.

  • Using new sodar-core REST API infrastructure.

  • Using sodar-core tokens app instead of local one.

  • Support for displaying GATK-gCNV SVs.

  • Fix of REST API-based import.

  • Tracking global maintenance jobs with background jobs.

  • Global background jobs are displayed with site plugin point via bgjobs.

  • Bumping Chromedriver to make CI work.

  • Adding “Submit to CADD” feature similar to “Submit to MutationDistiller”.

  • Increased default frequency setting of HelixMTdb max hom filter to 200 for strict and 400 for relaxed.

  • It is now possible to delete ACMG ratings by clearing the form and saving it.

  • Updated reference and contact information.

  • File upload in Kiosk mode now checks for VCF file without samples.

  • Fixed bug when inheritance preset was wrongly selected when switching to variant in an index-only case.

  • Added hemizygous counts filter option to frequency filter form.

  • Added synonymous effect to be also selected when checking all coding/deep intronic preset.

  • Saving uploads pre-checking in kiosk mode to facilitate debugging.

  • Kiosk mode also accepts VCFs based on hg19.

  • VariantValidator output now displays three-letter representation of AA.

  • Documented new clinvar aggregation method and VarFish “point rating”.

  • Implemented new clinvar data display in variant detail.

  • Case/project overview allows to download all annotated variants as a file now.

  • Querying for annotated variants on the case/project overview now uses the common query infrastructure.

  • Updating plotly to v0.54.5 (displays message on missing WebGL).

  • Added feature to assemble cohorts from cases spanning multiple projects and filter for them in a project-like query.

  • Added column to results list indicating if a variant lies in a disease gene, i.e. a gene listed in OMIM.

  • Displaying warning if prioritization is not enabled when entering HPO terms.

  • Added possibility to import “extra annotations” for display along with the variants.

  • On sites deployed by BIH CUBI, we make the CADD, SpliceAI, MMSp, and dbscSNV scores available.

  • In prioritization mode, ORPHA and DECIPHER terms are now selectable.

  • Fixed bug of wrong order when sorting by LOEUF score.

  • Adding some UI documentation.

  • Fixed bug where case alignment stats were not properly imported. Refactored case import in a sense that the new variant set gets activated when it is successfully imported.

  • Fixed bug where unfolding smallvariant details of a variant in a cohort that was not part of the base project caused a 404 error.

  • Fixed bug that prevented case import from API.

  • Increased speed of listing cases in case list view.

  • Fixed bug that prevented export of project-wide filter results as XLS file.

  • Adjusted genotype quality relaxed filter setting to 10.

  • Added column with family name to results table of joint filtration.

  • Added export of filter settings as JSON to structural variant filter form.

  • Varseak Splicing link-out also considers refseq transcript. Previously, inconsistencies could arise when Varseak picked a transcript that did not match the HGVS information.

  • Fixed bug that occurred when sample statistics were available but sample was marked with having no genotype.

  • Adjusted genotype quality strict filter setting to 10.

  • Added possibility to export VCF file for cohorts.

  • Increased logging during sample variant statistics computation.

  • Using gnomAD exomes as initially selected frequency in results table.

  • Using CADD as initially selected score metric in prioritization form.

  • Fixed missing disease gene and mode of inheritance annotation in project/cohort filter results table.

  • Catching errors during Kiosk annotation step properly.

  • Fixed issues with file extension check in Kiosk mode during upload.

  • “1” is now registered as heterozygous and homozygous state in genotype filter.

  • Loading annotation and QC tabs in project cases list asynchronously.

  • Increased timeout for VariantValidator response to 30 seconds.

  • Digesting more VariantValidator responses, namely intergenic_variant_\d+ and validation_warning_\d+.

  • Fixed bug where when re-importing a case, the sample variants stats computation was performed on the member list of the old case. This could lead to an inconsistent state in which, when new members were added, the stats were not available for them. This led to a 500 error when displaying the case overview page.

  • Fixed missing QC plots in case detail view.

  • Fixed bug in case VCF export where a variant existing twice in the results was breaking the export.

  • Fixed log entries for file export when pathogenicity or phenotype scoring was activated. In this case, the variants are sorted by score, which led to messy output because the logging was designed to write an entry whenever the chromosome changes.

  • Bumped Chrome Driver version to 84 to be compatible with gitlab CI.

  • CADD is now selected as default in pathogenicity scoring form (when available).

  • Added global maintenance commands to clear old kiosk cases, inactive variant sets and expired exported files.

  • Added SvAnnotationReleaseInfo model, information is filled during import and displayed in case detail view.

  • Fixed bug that left number of small variants empty when they actually existed. This happened when SNVs and SVs were imported at the same time.

  • Increased logging during case import.

  • Marked old style import as deprecated.

  • Fixed bug that prevented re-import of SVs by altering the unique constraint on the StructuralVariant table.

  • Fixed bug where a re-import of genotypes was not possible when the same variant types weren’t present as in the initial import. This was done by adding a state field to the VariantSetImportInfo model.

  • Fixed bug where imported state of CaseImportInfo was already set after importing the first variant set.

  • Integrated Genomics England PanelApp via their API.

  • Added command to check selected indexes and data types in database.

  • Added columns to results table: cDNA effect, protein effect, effect text, distance to splicesite.

  • Made effect columns and distance to splicesite column hide-able.

  • Added warning to project/cohort query when a user tries to load previous results where not all variants are accessible.

  • Renamed all occurrences of whitelist to allowlist and of blacklist to blocklist (sticking to what google introduced in their products).

  • Fixed bug where cases were not deletable when using Chrome browser.

  • Harmonized computation for relatedness in project-wide QC and in case QC (thus showing the same results if project only contains one family).

  • Fixed failing case API re-import when user is not owner of previous import. Now also all users with access to the project (except guests) can list the cases.

  • Added PROJECTROLES_EMAIL_ to config.

  • Avoiding variants with asterisk alternative alleles.
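
The entry on digesting more VariantValidator responses names two numbered key patterns, intergenic_variant_\d+ and validation_warning_\d+. As a rough illustration of matching such keys, and not the code used in VarFish, the sketch below filters a response dictionary by these patterns. The function name and the example response are hypothetical.

```python
import re
from typing import Any, Dict

# The two patterns come from the change list entry; the rest is illustrative.
NUMBERED_KEY = re.compile(r"^(intergenic_variant|validation_warning)_\d+$")


def numbered_entries(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries whose key matches one of the numbered patterns."""
    return {
        key: value for key, value in response.items() if NUMBERED_KEY.match(key)
    }


# Hypothetical response structure, for illustration only.
example = {
    "flag": "intergenic",
    "intergenic_variant_1": {"note": "first"},
    "validation_warning_12": {"note": "second"},
    "metadata": {},
}
print(sorted(numbered_entries(example)))
```

Anchoring the pattern with ^ and $ keeps keys that merely contain one of the prefixes, such as a key with a non-numeric suffix, from being picked up.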

v0.21.0

End-User Summary

  • Added preset for mitochondrial filter settings.

  • Fixed bug where HPO name wasn’t displayed in textarea after reloading page.

  • Added possibility to enter OMIM terms in phenotype prioritization filter.

  • Added maximal exon distance field to Variants & Effects tab.

  • Adapted HelixMTdb filter settings, allowing to differentiate between hetero- and homoplasmy counts.

  • Increased default max collective background count in SV filter from 0 to 5.

  • Included lists of genomic regions, black and white genelists and reworked HPO list in table header as response for what was filtered for (if set).

  • Added molecular assessment flag for variant classification.

  • Fixed bug where activated mitochondrial frequency filter didn’t include variants that had no frequency database entry.

  • Added inheritance preset and quick preset for X recessive filter.

  • Removed VariantValidator link-out.

  • Now smallvariant comments, flags and ACMG are updating in the smallvariant details once submitted.

  • Deleting a case (only possible as root) runs now as background job.

  • Fixed bug in compound heterozygous filter with parents in pedigree but without genotype that resulted in variants in genes that didn’t match the pattern.

  • Bumped django version to 1.11.28 and sodar core version to bug fix commit.

  • Fixed bug where structural variant results were no longer displayed after the molecular assessment flag was introduced.

  • Fixed bug where variant comments and flags popup was not shown in structural variant results after updating smallvariant details on the fly.

  • Made Download as File and Submit to MutationDistiller buttons more prominent.

  • Adapted preset settings for ClinVar Pathogenic setting.

  • Finalized mitochondrial presets.

  • Added identifier to results table and smallvariant details when mitochondrial variant is located in D-loop region in mtDB.

  • Fixed per-sample metrics in case variant control.

  • Made ACMG and Beacon popover disappear when clicking anywhere.

  • Fixed bug when a filter setting with multiple HPO terms resulted in only showing one HPO term after reloading the page.

  • Extended information when entering the filter page and no previous filter job existed.

  • Disabled relatedness plot for singletons.

  • Replaced tables in case QC with downloadable TSV files.

  • QC charts should now be displayed properly.

  • Consolidated flags, comments and ACMG rating into one table in the case detail view, with one table for small variants and one for structural variants.

  • Added VariantValidator link to submit to REST API.

  • Fixed alignment stats in project-wide QC.

  • Added more documentation throughout the UI.

  • Added option to toggle displaying of logs during filtration, by default they are hidden.

  • Fixed broken displaying of inhouse frequencies in variant detail view.

  • Added variant annotation list (comments, flags, ACMG ratings) to project-wide info page.

  • Row in filter results now turns gray when any flag is set (except bookmark flag; summary flag still colours in other colour).

  • Fixed bug where comments and flags in variant details weren’t updated when the variant details have been opened before.

  • Added QC TSV download and per-sample metrics table to project-wide QC.

  • Removed ExAC locus link in result list, added gnomAD link to gene.

  • Catching connection exceptions during file export with enabled pathogenicity and/or phenotype scoring.

  • Fixed project/case search that delivered search results for projects that the searching user had no access to (only search was affected, access was not granted).

  • Made case comments count change in real time.

Full Change List

  • Added preset for mitochondrial filter settings.

  • Fixed bug where HPO name wasn’t displayed in textarea after reloading page. HPO terms are now also checked for validity in textbox on the fly.

  • Added possibility to enter OMIM terms in phenotype prioritization filter. The same textbox as for HPO terms also accepts OMIM terms now.

  • Added maximal exon distance field to Variants & Effects tab.

  • (Hopefully) fixing importer bug (#524).

  • Adapted HelixMTdb filter settings to allow differentiating between hetero- and homoplasmy counts.

  • Fixed inactive filter button to switch from SV filter to small variant filter.

  • Increased default max collective background count in SV filter from 0 to 5.

  • Included lists of genomic regions, black and white genelists and reworked HPO list in table header as response for what was filtered for (if set).

  • Added molecular assessment flag for variant classification.

  • Fixed bug where activated mitochondrial frequency filter didn’t include variants that had no frequency database entry.

  • Added inheritance preset and quick preset for X recessive filter.

  • Removed VariantValidator link-out.

  • Smallvariant comments, flags and ACMG ratings now update in the smallvariant details once submitted.

  • Deleting a case (only possible as root) now runs as a background job.

  • Fixed bug in compound heterozygous filter with parents in pedigree but without genotype that resulted in variants in genes that didn’t match the pattern.

  • Bumped django version to 1.11.28 and sodar core version to bug fix commit.

  • Fixed bug where structural variant results were no longer displayed after the molecular assessment flag was introduced.

  • Fixed bug where variant comments and flags popup was not shown in structural variant results after updating smallvariant details on the fly.

  • Made Download as File and Submit to MutationDistiller buttons more prominent.

  • Adapted preset settings for ClinVar Pathogenic setting.

  • Finalized mitochondrial presets.

  • Added identifier to results table and smallvariant details when mitochondrial variant is located in D-loop region in mtDB.

  • Fixed per-sample metrics in case variant control.

  • Made ACMG and Beacon popover disappear when clicking anywhere.

  • Fixed bug when a filter setting with multiple HPO terms resulted in only showing one HPO term after reloading the page.

  • Extended information when entering the filter page and no previous filter job existed.

  • Added lodash javascript to static.

  • Disabled relatedness plot for singletons.

  • Replaced tables in case QC with downloadable TSV files.

  • QC charts should now be displayed properly.

  • Consolidated flags, comments and ACMG rating into one table in the case detail view, with one table for small variants and one for structural variants.

  • Added VariantValidator link to submit to REST API.

  • Fixed alignment stats in project-wide QC.

  • Added more documentation throughout the UI.

  • Added option to toggle displaying of logs during filtration, by default they are hidden.

  • Fixed broken displaying of inhouse frequencies in variant detail view.

  • Added variant annotation list (comments, flags, ACMG ratings) to project-wide info page.

  • Row in filter results now turns gray when any flag is set (except bookmark flag; summary flag still colours in other colour).

  • Fixed bug where comments and flags in variant details weren’t updated when the variant details have been opened before.

  • Added QC TSV download and per-sample metrics table to project-wide QC.

  • Removed ExAC locus link in result list, added gnomAD link to gene.

  • Catching connection exceptions during file export with enabled pathogenicity and/or phenotype scoring.

  • Fixed project/case search that delivered search results for projects that the searching user had no access to (only search was affected, access was not granted).

  • Made case comments count change in real time.

v0.20.0

End-User Summary

  • Added count of annotations to case detail view in Variant Annotation tab.

  • De-novo quick preset now selects AA change, splicing (default) for sub-preset Impact, instead of all coding, deep intronic.

  • Added project-wide option to disable pedigree sex check.

  • Added button to case detail and case list to fix sex errors in pedigree for case or project-wide.

  • Added command import_cases_bulk for case bulk import, reading arguments from a JSON file.

  • Entering and suggesting HPO terms now requires at least 3 typed characters.

  • Fixed broken variant details page when an HPO id had no matching HPO name.

  • Fixed bug in joint filtration filter view where previous genomic regions were not properly restored in the form.

  • Fixed bug that led to an AJAX error in the filter view when previous filter results failed to load because the variants of a case were deleted in the meantime.

  • Entering the filter view is now only possible when there are variants and a variant set. When variants are reported but there is no variant set, a warning in the form of a small red icon next to the number of variants is displayed, indicating an inconsistent state.

  • In case of errors, you can now give feedback in a form via Sentry.

  • Fixed bug that occurred during project file export with MutationTaster pathogenicity scoring when a variant appeared multiple times in the query string sent to MutationTaster.

  • Adding REST API for Cases.

  • Adding site app for API token management.

  • Added frequency databases for mitochondrial chromosome, providing frequency information in the small variant details.

  • Fixed periodic tasks (contained clean-up jobs) and fixed tests for periodic tasks.

  • Adding REST API for Cases and uploading cases.

  • Adding GA4GH beacon button to variant list row and details. Note that this must be activated in the user profile settings.

  • Added filter support to queries and to filter form for mitochondrial genome.

Full Change List

  • Added count of annotations to case detail view in Variant Annotation tab.

  • De-novo quick preset now selects AA change, splicing (default) for sub-preset Impact, instead of all coding, deep intronic.

  • Added project-wide option to disable pedigree sex check.

  • Added button to case detail and case list to fix sex errors in pedigree for case or project-wide.

  • Added command import_cases_bulk for case bulk import, reading arguments from a JSON file.

  • Entering and suggesting HPO terms now requires at least 3 typed characters. Also, the query is only sent if the HPO term string changed, to reduce the number of executed database queries.

  • Fixed broken variant details page when an HPO id had no matching HPO name. This happened when gathering HPO names, retrieving the HPO id from the Hpo database given the OMIM id and then the name from HpoName. The databases Hpo and HpoName don’t necessarily match via hpo_id, in this case because of an obsolete HPO id HP:0031988. Now reporting "unknown" for the name instead of None, which broke the sorting routine.

  • Fixed bug in ProjectCasesFilterView where previous genomic regions were not properly restored in the form.

  • Fixed bug that led to an AJAX error in the filter view when previous filter results failed to load because the variants of a case were deleted in the meantime.

  • Entering the filter view is now only possible when there are variants and a variant set. When variants are reported but there is no variant set, a warning in the form of a small red icon next to the number of variants is displayed, indicating an inconsistent state.

  • Using latest sentry SDK client.

  • Fixed bug that occurred during project file export with MutationTaster pathogenicity scoring when a variant appeared multiple times in the query string sent to MutationTaster.

  • Adding REST API for Cases.

  • Copying over token management app from Digestiflow.

  • Added frequency databases mtDB, HelixMTdb and MITOMAP for mitochondrial chromosome. Frequency information is provided in the small variant detail view.

  • Fixed periodic tasks (contained clean-up jobs) and fixed tests for periodic tasks.

  • Adding REST API for Case.

  • Extending importer app with API to upload annotated TSV files and models to support this.

  • Adding GA4GH beacon button to variant list row and details. Note that this must be activated in the user profile settings.

  • Added filter support to queries and to filter form for mitochondrial genome.
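
The HPO name fix in this release (reporting "unknown" instead of None) can be illustrated with a minimal Python sketch. The lookup table and the function name `hpo_name` are hypothetical and only stand in for the Hpo/HpoName lookup described above; this is not the VarFish implementation.

```python
# Hypothetical name lookup; HP:0031988 is deliberately absent to mimic
# an obsolete id that has no entry in the name table.
HPO_NAMES = {
    "HP:0000118": "Phenotypic abnormality",
    "HP:0001250": "Seizure",
}


def hpo_name(hpo_id: str) -> str:
    """Return the HPO name, or "unknown" when the id has no name entry."""
    return HPO_NAMES.get(hpo_id) or "unknown"


terms = ["HP:0001250", "HP:0031988", "HP:0000118"]

# Every sort key is a string, so sorting succeeds. With None as a key,
# Python 3 would raise TypeError when comparing None with a str.
sorted_terms = sorted(terms, key=hpo_name)
```

Using a string placeholder keeps the sort key type uniform, which is why the sorting routine no longer breaks.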

v0.19.0

End-User Summary

  • Added inhouse frequency information to variant detail page.

  • Added link-out in locus dropdown menu in results table to VariantValidator.

  • Added filter-by-status dropdown menu to case overview page.

  • Added link-out to pubmed in NCBI gene RIF list in variant details view.

  • Fixing syncing project with upstream SODAR project.

  • Added controls to gnomad genomes and gnomad exomes frequencies in variant details view.

  • Adding more HiPhive variants.

  • Replacing old global presets with one preset per filter category.

  • Added recessive, homozygous recessive and denovo filter to genotype settings.

  • Entering HPO terms received a typeahead feature and the input is organized in tags/badges.

  • Import of background database now less memory intensive.

  • Added project-wide alignment statistics.

  • Added django_su to allow superusers to temporarily take on the identity of another user.

  • Fixed bug in which some variants in comphet mode only had one variant in results list.

  • Added user-definable, project-specific tags to be attached to a case. Enter them in the project settings, use them in the case details page.

  • Added alert fields for all ajax calls.

  • Removed (non function-disturbing) javascript error when pre-loaded HPO terms were decorated into tags.

  • Fixed coloring of rows when flags have been set.

  • Fixed dominant/denovo genotype preset.

  • Minor adjustments/renamings to presets.

  • Link-out to genomics england panelapp.

  • Fixed partly broken error decoration on hidden tabs on field input errors.

  • Added Kiosk mode.

  • Fixed bug when exporting a file with enabled pathogenicity scoring led to an error.

  • Entering filter form without previous settings now sets default settings correctly.

  • Switched to SODAR core v0.7.1

  • HPO terms are now pastable, especially from SODAR.

  • Some UI cleanup and refinements, adding shortcut links.

  • Large speed up for file export queries.

  • Fixed UI bug when selecting ClinVar only as flags.

  • Added link-out to variant when present in ClinVar.

  • Fixed broken SV filter button in smallvariant filter form.

  • Added link-out to case from import bg job detail page.

  • Added recessive quick presets setting.

  • Added functionality to delete small variants and structural variants of a case separately.

  • Fixed bug in which deleting a case didn’t delete the sodar core background jobs.

  • Old variants stats data is not displayed anymore in case QC overview when case is re-imported.

Full Change List

  • Added inhouse frequency information to variant detail page.

  • Added link-out in locus dropdown menu in results table to VariantValidator. To be able to construct the link, refseq_hgvs_c and refseq_transcript_id are also exported in query.

  • Added filter-by-status dropdown menu to case overview page. With this, the bootstrap addon bootstrap-select was added to the static folder.

  • Added link-out to pubmed in NCBI gene RIF list in variant details view. For this, NcbiGeneRif table was extended with a pubmed_ids field.

  • Fixing syncing project with upstream SODAR project.

  • Added controls to gnomad genomes and gnomad exomes frequencies in the database table by extending the fields. Added controls to frequency table in variant details view.

  • Improving HiPhive integration:
    • Adding human, human/mouse similarity search.

    • Using POST request to Exomiser to increase maximal number of genes.

  • Replacing old global presets with one preset per filter category.

  • Using ISA-tab for syncing with upstream project.

  • Added recessive, homozygous recessive and denovo filter to genotype settings. Homozygous recessive and denovo filter are JS code re-setting values in dropdown boxes. Recessive filter behaves as comp het filter UI-wise, but joins results of both homozygous and compound heterozygous filter internally.

  • Entering HPO terms received a typeahead feature and the input is organized in tags/badges.

  • Import of background database now less memory intensive by disabling autovacuum option during import and removing atomic transactions. Instead, tables are emptied by genome release in case of failure in import.

  • Added project-wide alignment statistics.

  • Added django_su to allow superusers to temporarily take on the identity of another user.

  • Fixed bug in which some variants in comphet mode only had one variant in results list. The hgmd query was able to create multiple entries for one variant which was reduced to one entry in the resulting list. To correct for that, the range query was fixed and the grouping in the lateral join was removed.

  • Added user-definable, project-specific tags to be attached to a case.

  • Added alert fields for all ajax calls.

  • Removed javascript error when pre-loaded HPO terms were decorated into tags.

  • Removed (non function-disturbing) javascript error when pre-loaded HPO terms were decorated into tags.

  • Fixed coloring of rows when flags have been set. When summary is not set but other flags, the row is colored in gray to represent a WIP state. Coloring happens now immediately and not only when page is re-loaded.

  • Fixed dominant/denovo genotype preset.

  • Minor adjustments/renamings to presets.

  • Link-out to genomics england panelapp.

  • Fixed partly broken error decoration on hidden tabs on field input errors.

  • Introduced bigint fields into postgres sequences counter for smallvariant, smallvariantquery_query_results and projectcasessmallvariantquery_query_results tables.

  • Added Kiosk mode.

  • Fixed bug when exporting a file with enabled pathogenicity scoring led to an error.

  • Entering filter form without previous settings now sets default settings correctly.

  • Switched to SODAR core v0.7.1

  • Changing default partition count to 16.

  • Allowing users to put a text on the login page.

  • Renaming partitioned SV tables, making them logged again.

  • HPO terms are now pastable, especially from SODAR.

  • Some UI cleanup and refinements, adding shortcut links.

  • Large speed up for file export queries by adding indices and columns to HGNC and KnownGeneAA table.

  • Fixed UI bug when selecting ClinVar only as flags.

  • Added link-out to variant when present in ClinVar by adding the SCV field from the HGNC database to the query.

  • Fixed broken SV filter button in smallvariant filter form.

  • Added link-out to case from import bg job detail page.

  • Added recessive quick presets setting.

  • Added functionality to delete small variants and structural variants of a case separately.

  • Fixed bug in which deleting a case didn’t delete the sodar core background jobs.

  • Old variants stats data is not displayed anymore in case QC overview when case is re-imported.
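
The recessive filter described in this release joins the results of the homozygous and the compound heterozygous filter. The following is a strongly simplified Python sketch of that idea on in-memory tuples. The data, the function names, and the exact genotype rules are assumptions for illustration; VarFish itself runs database queries and this is not its implementation.

```python
from collections import defaultdict

# (gene, variant id, index genotype, father genotype, mother genotype);
# all values are illustrative.
VARIANTS = [
    ("GENE1", "v1", "0/1", "0/1", "0/0"),
    ("GENE1", "v2", "0/1", "0/0", "0/1"),
    ("GENE2", "v3", "1/1", "0/1", "0/1"),
    ("GENE3", "v4", "0/1", "0/1", "0/0"),
]


def homozygous_recessive(variants):
    """Variants homozygous in the index with both parents heterozygous."""
    return {
        vid
        for _gene, vid, index, father, mother in variants
        if index == "1/1" and father == "0/1" and mother == "0/1"
    }


def compound_heterozygous(variants):
    """Het variants in genes with one hit inherited from each parent."""
    by_gene = defaultdict(lambda: {"father": set(), "mother": set()})
    for gene, vid, index, father, mother in variants:
        if index != "0/1":
            continue
        if father == "0/1" and mother == "0/0":
            by_gene[gene]["father"].add(vid)
        elif mother == "0/1" and father == "0/0":
            by_gene[gene]["mother"].add(vid)
    result = set()
    for sides in by_gene.values():
        if sides["father"] and sides["mother"]:
            result |= sides["father"] | sides["mother"]
    return result


def recessive(variants):
    """Union of both result sets, as the recessive filter does internally."""
    return homozygous_recessive(variants) | compound_heterozygous(variants)
```

In this sketch, `v4` is dropped because GENE3 has only a paternal hit, while `v1`, `v2` and `v3` are kept.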

v0.18.0

End-User Summary

  • Added caching for pathogenicity scores api results.

  • Added column to the project wide filter results table that displays the number of affected cases per gene.

  • Enabled pathogenicity scoring for project-wide filtration.

  • Added LOEUF gnomAD constraint column to results table.

  • Added link-out to MetaDome in results table.

Full Change List

  • Added new database tables CaddPathogenicityScoreCache, UmdPathogenicityScoreCache, MutationtasterPathogenicityScoreCache to cache pathogenicity scores api results.

  • Added column to the project wide filter results table that displays the number of affected cases per gene. I.e. the cases (not samples) that have a variant in a gene are counted and reported.

  • Enabled pathogenicity scoring for project-wide filtration. This introduced a new table ProjectCasesSmallVariantQueryVariantScores to store the scoring results for a query.

  • Added LOEUF gnomAD constraint column to results table.

  • Added link-out to MetaDome in results table.
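
The "affected cases per gene" column counts cases, not samples. A minimal Python sketch of that counting rule follows; the row layout and the function name `affected_cases_per_gene` are hypothetical, and the actual column is computed by VarFish's project-wide query.

```python
from collections import defaultdict

# (case name, sample name, gene symbol) for each variant hit;
# all values are illustrative.
ROWS = [
    ("case-1", "index", "BRCA1"),
    ("case-1", "mother", "BRCA1"),
    ("case-2", "index", "BRCA1"),
    ("case-2", "index", "TTN"),
]


def affected_cases_per_gene(rows):
    """Count distinct cases (not samples) with a variant in each gene."""
    cases = defaultdict(set)
    for case, _sample, gene in rows:
        cases[gene].add(case)
    return {gene: len(names) for gene, names in cases.items()}


counts = affected_cases_per_gene(ROWS)
```

Although BRCA1 has three hits across samples here, only two distinct cases carry a variant in it, so it is counted twice.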

v0.17.6

End-User Summary

  • MutationTaster scoring now able to score InDels.

  • MutationTaster rank now displayed as numbers, not as stars, with -1 corresponding to an error during scoring.

  • Adding “closed uncertain” state.

  • Project-wide filtration allows for comp het filter for individual families.

Full Change List

  • MutationTaster scoring now able to score InDels.

  • MutationTaster rank now displayed as numbers, not as stars. Rank -1 and probability -1 correspond to error during MutationTaster ranking or empty results from MutationTaster.

  • Improving display and logging in alignment QC import.

  • Adding “closed uncertain” state.

  • Project-wide filtration allows for comp het filter for individual families.

v0.17.5

End-User Summary

  • BAM statistics (including target coverage information) can now be imported and displayed.

  • Mitochondrial variants can now be properly displayed.

  • Added Delete Case button and functionality to case overview, only visible for superusers.

  • Fixed error response when MutationDistiller submission wasn’t submitted with a single individual.

  • Now using 404 & 500 error page from sodar core.

  • Visual error response on tabs is now more prominent.

  • Included MutationTaster as additional pathogenicity score.

  • Included UMD-Predictor as additional pathogenicity score.

  • Project-wide filter now applicable when the project contains cases with no small variants (e.g. completely empty or only SVs).

  • Ignoring option remove if in dbSNP when ClinVar membership required is activated as every ClinVar entry has a dbSNP id.

  • Fixed indices on SmallVariantFlags and SmallVariantComment and introduced indices for ExacConstraints and GnomadConstraints that sped up large queries significantly.

  • Fixed issue where gene dropdown menu was overlaid by sticky top.

  • Adding progress bar on top of case list.

  • Improving case list and detail overview page layout and usability.

  • Upgrade of the SODAR-core library app, which includes various improvements such as background job pagination and improvements to membership management.

  • Included tables for converting refseq and ensembl gene ids to gene symbols.

  • Added warning about missing UMD indel scoring.

  • Now sorting comments and flags in the case overview by chromosomal position.

  • Now sorting HPO terms in variant detail view alphabetically.

  • Improved pubmed linkout string.

  • Added EnsEMBL and ClinVar linkouts to gene dropdown menu in results list.

  • Added 3 more variant flags: no known disease association, variant does segregate, variant doesn’t segregate.

  • Compound heterozygous filter is now applicable to singletons and index patients with only one parent.

  • Extending the manual with SOPs and guidelines.

Full Change List

  • Adding code for importing, storing, and displaying BAM quality control values.

  • Fixing urls configuration bug preventing chrMT matches.

  • Added Delete Case button and functionality to case overview, only visible for superusers. Deletes record from Case and variants from SmallVariant, StructuralVariant and StructuralVariantGeneAnnotation associated with this case.

  • Fixed error response when MutationDistiller submission wasn’t submitted with a single individual. Error is now displayed via messages after reloading the filter page. All form errors that are raised during submission of file export or to MutationTaster are handled now this way.

  • Now using 404 & 500 error page from sodar core.

  • Visual error response on tabs is now more prominent.

  • Included MutationTaster as additional pathogenicity score.

  • Included UMD-Predictor as additional pathogenicity score.

  • Project-wide filter now applicable when the project contains cases with no small variants (e.g. completely empty or only SVs).

  • Ignoring option remove if in dbSNP when ClinVar membership required is activated as every ClinVar entry has a dbSNP id.

  • Fixed indices on SmallVariantFlags and SmallVariantComment and introduced indices for ExacConstraints and GnomadConstraints that sped up large queries significantly.

  • Fixed issue where gene dropdown menu was overlaid by sticky top.

  • Adding progress bar on top of case list.

  • Improving case list and detail overview page layout and usability.

  • Upgraded to SODAR core v0.7.0.

  • Included tables RefseqToGeneSymbol and EnsemblToGeneSymbol to convert gene ids to gene symbols to get a better coverage of gene symbols.

  • Added warning about missing UMD indel scoring.

  • Now sorting comments and flags in the case overview by chromosomal position. For this, a chromosome_no field was introduced in SmallVariantComments and SmallVariantFlags that is automatically filled when record is saved, derived from chromosome field.

  • Now sorting HPO terms in variant detail view alphabetically.

  • Improved pubmed linkout string.

  • Added EnsEMBL and ClinVar linkouts to gene dropdown menu in results list.

  • Added 3 more variant flags: no known disease association, variant does segregate, variant doesn’t segregate.

  • Compound heterozygous filter is now applicable to singletons and index patients with only one parent.

  • Extending the manual with SOPs and guidelines.
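
The chromosome_no field mentioned above is derived from the chromosome name so that records can be sorted by chromosomal position. A minimal Python sketch of such a derivation is shown below. The numeric values chosen for X, Y and MT and the handling of a "chr" prefix are assumptions, not taken from the VarFish code.

```python
# Assumed numbering for the non-numeric chromosomes.
SPECIAL_CHROMOSOMES = {"X": 23, "Y": 24, "MT": 25, "M": 25}


def chromosome_no(chromosome: str) -> int:
    """Map a chromosome name such as "chr10", "X" or "MT" to an integer."""
    name = chromosome[3:] if chromosome.lower().startswith("chr") else chromosome
    name = name.upper()
    if name in SPECIAL_CHROMOSOMES:
        return SPECIAL_CHROMOSOMES[name]
    # Raises ValueError for names that are neither numeric nor X/Y/MT.
    return int(name)


names = ["MT", "10", "X", "2", "Y", "1"]
in_order = sorted(names, key=chromosome_no)
```

Sorting by the integer avoids the lexicographic order of plain strings, where "10" would come before "2".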

v0.17.4

End-User Summary

  • Fixed bug in exporting files when pathogenicity scoring is activated.

  • Added IGV button to small/structural comment list in case overview.

  • Adapted to new CADD REST API implementation.

Full Change List

  • Fixed function call to missing function in exporting files when pathogenicity scoring is activated.

  • Added IGV button to small/structural comment list in case overview.

  • Adapted to new CADD REST API implementation.

  • Adding generic info field to small variants and fields for distance to refseq/ensembl exons. The import is augmented such that the fields are filled with appropriate empty/null values when importing TSV files that don’t have this field yet.

v0.17.3

End-User Summary

  • Improving QC plot performance.

  • Displaying case statistics in project list.

  • Removed ClinVar view and added alternative column switch to smallvariant results table.

  • ClinVar settings were extended to allow filtering for origin somatic and germline.

  • When ClinVar membership is NOT required, variants that have origin somatic and no germline in ClinVar, are removed.

  • Improved sorting of results table for gene and chromosomal position column.

  • Fixed bug where settings of the previous query weren’t restored for certain fields.

  • Fixed bug where ClinVar data could break rendering of results table template.

  • Improved speed of queries.

  • Invalid form data now more prominently placed.

  • Improved joining of HGNC information for refseq transcripts so that border cases are not ignored.

  • Max AD field in quality filter is now also applied to genotype 0/0.

  • Minor fixes in case overview comments/flags/acmg tables.

  • Fixed issue in SV results table where columns were missing when the genotype was missing.

  • Comments on variants are now editable and deletable, in the case detail view as well as the variant detail view.

  • Case comments are now editable.

  • Fixed pathogenicity and phenotype score column headings in results table.

Full Change List

  • Using "scattergl" for QC plots which leads to a speedup.

  • Making the large tables UNLOGGED to improve bulk insertion performance.

  • Displaying case statistics in project list.

  • Removed ClinVar view and added alternative column switch to smallvariant results table. All models, urls, views, queries and templates concerning ClinVar view were removed. SmallVariant queries now join ClinVar information and display them via switch in the UI.

  • ClinVar settings were extended to allow filtering for origin somatic and germline.

  • When ClinVar membership is NOT required, variants that have origin somatic and no germline in ClinVar, are removed.

  • Results table is now sortable by chromosome and position. It is also sortable by the gene column, using the following keys in the given order: ACMG membership, HPO inheritance term, gene name. The sign. & rating column sorts by the following keys in the given order: significance, rating.

  • Fixed bug where settings of the previous query were overwritten by a JavaScript routine and appeared to be lost.

  • Fixed bug where unexpected ClinVar significance crashed the template tags.

  • Added index on human_entrez_id field to MgiMapping materialized view to speed up the join to the results table.

  • Invalid form data is now displayed as boxes rather than tooltips.

  • Joining of the HGNC information for RefSeq transcripts additionally directly via HGNC to improve results.

  • Max AD field in quality filter is now also applied to genotype 0/0.

  • Minor fixes in case overview comments/flags/acmg tables.

  • Fixed issue in SV results table where columns were missing when the genotype was missing.

  • Main JavaScript functionality transferred from HTML to static JS files.

  • Comments on variants are now editable and deletable, in the case detail view as well as the variant detail view.

  • Case comments are now editable.

  • Moved and consolidated further JS code from HTML to JS files.

  • Fixed pathogenicity and phenotype score column headings in results table.
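
The multi-key sorting of the gene column described in this release (ACMG membership, then HPO inheritance term, then gene name) corresponds to sorting by a tuple of keys. The Python sketch below only illustrates the technique. The field names, the example rows, and the choice to list ACMG genes first are assumptions; the actual sorting happens in the results table in the browser.

```python
# Illustrative result rows; field names are hypothetical.
ROWS = [
    {"gene": "TTN", "in_acmg": False, "hpo_inheritance": "AD"},
    {"gene": "BRCA1", "in_acmg": True, "hpo_inheritance": "AD"},
    {"gene": "CFTR", "in_acmg": False, "hpo_inheritance": "AR"},
    {"gene": "APOB", "in_acmg": True, "hpo_inheritance": "AD"},
]


def gene_sort_key(row):
    """Tuple key: ACMG membership, then inheritance term, then gene name."""
    # False sorts before True, so negate to list ACMG genes first (assumed).
    return (not row["in_acmg"], row["hpo_inheritance"], row["gene"])


ordered = [row["gene"] for row in sorted(ROWS, key=gene_sort_key)]
```

Tuples compare element by element, so later keys only matter when all earlier keys are equal.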

v0.17.2

End-User Summary

  • Improving case list and case detail views.

  • Adjusting chrX het threshold for telling male/female apart.

Full Change List

  • Shuffling around case detail view a bit.

  • Adding icons for case status.

  • Adjusting chrX het threshold for telling male/female apart.

v0.17.1

End-User Summary

  • Syncing with upstream now also checks parents.

  • Fixing saving of ACMG rating.

  • Increasing maximal number of characters in gene whitelist to 1 million.

  • Fixing QC display issues for cases without variants.

  • Fixing UI error where tab wasn’t selectable after invalid data input.

  • Improving gene and variant detail display.

  • Adding installation manual.

Full Change List

  • Syncing with upstream now also checks parents.

  • Fixing template, form, and model for ACMG rating (adjust to using start/end/bin fields).

  • Increasing maximal number of characters in gene whitelist to 1 million.

  • Fixing QC display issues for cases without variants.

  • Fixing UI error where tab wasn’t selectable after invalid data input.

  • Improving gene and variant detail display.

  • Adding installation manual.

v0.17.0

End-User Summary

  • Fixing problems with link-out to varSEAK.

  • UI improvement for the compound heterozygous mode.

  • Fixing bug in genomic region filter form that took only the last character of chromosome names.

  • Fixing overflow bug in genotype and quality tab when presenting more individuals than would fit in the form.

  • Fixing genotype settings pre-selector dropdown that was trapped in parent container and possibly not entirely accessible.

  • Added editable notes and status fields to case detail view to enable the user to take a note/summarize the case.

  • Added support to add multiple comments by different users to a case in the case detail view.

  • Fixed bug where using genotype presets wasn’t fully executed while in comp. het. mode.

  • Fixed bug where the genomic region form wasn’t properly reconstructed when only a chromosome was given.

  • Results are now properly sorted by chromosome in the expected order (numerical followed by X, Y, MT).

  • Included MGI mouse gene link-out in gene dropdown menu in result list.

  • Fixed bug where the filter button wasn’t disabled when the selected variant set wasn’t in state active.

  • Renamed index field in genotype dropdown to c/h index to indicate comp het mode.

  • Fixing bug in retrieving comments on structural variants.

Full Change List

  • URL-escaping hgvs_p to varSEAK.

  • Compound heterozygous mode is now activated via the GT field selection that offers an index entry for potential index patients. This is a UI/Javascript improvement and does not affect the code of the query except that setting an index enables the filter, contrary to before where there was an additional boolean field that enabled the mode.

  • Fixing regex bug in genomic region field of the filter form that took only the last character of a chromosome name. It therefore affected regions with chromosome names of more than one character (e.g. ‘10’, ‘11’, …).

  • Fixing overflow bug in genotype and quality tab when presenting more individuals than would fit in the form.

  • Fixing genotype settings pre-selector dropdown that was trapped in parent container and possibly not entirely accessible.

  • Added editable notes and status fields to Case model to enable the user in the case detail view to take notes and assign a status to the case.

  • Fixed displaying of status in case detail view when it was never set.

  • Added model CaseComments to enable assigning comments to a case by different users in the case detail view.

  • Fixed bug where using genotype presets wasn’t fully executed while in comp. het. mode.

  • Fixed bug where the genomic region form wasn’t properly reconstructed when only a chromosome was given.

  • Sorting results now by the numerical representation of the chromosome.

  • Included MGI mouse gene link-out in gene dropdown menu in result list. This is accomplished by introducing new table MgiHomMouseHumanSequence and a condensing materialized view MgiMapping that maps entrez_id to MGI ID.

  • Removed annotation app.

  • Fixed bug where the filter button wasn’t disabled when the selected variant set wasn’t in state active.

  • Added management command rebuild_project_case_stats to rebuild stats of all cases of a given project.

  • Import of database tables now handles non-existing entries in a more logical way.

  • Making variant partition count come from environment variable (#368).

  • Renamed index field in genotype dropdown to c/h index to indicate comp het mode.

  • Fixed bug that replaced missing form fields in old queries with default settings.

  • Merged import_sv_dbs into import_tables manage command.

  • Fixing bug in retrieving comments on structural variants.

  • Fixing recomputation of variant stats that now properly handles json decoding.

  • Adding installation manual.
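
The chromosome ordering mentioned in this release (numerical, followed by X, Y, MT) can be illustrated with a small sort key. This is only a sketch and not the VarFish implementation: the function name `chromosome_sort_key`, the handling of a `chr` prefix, and the placement of unknown contigs at the end are assumptions made for illustration.

```python
from typing import Tuple

# Assumed positions for the non-numeric chromosomes (hypothetical, for illustration).
_SPECIAL_ORDER = {"X": 23, "Y": 24, "MT": 25, "M": 25}


def chromosome_sort_key(chrom: str) -> Tuple[int, str]:
    """Return a key that sorts 1..22 numerically, then X, Y, MT, then anything else."""
    # Strip an optional "chr" prefix so that "chr10" and "10" sort identically.
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (int(name), "")
    if name.upper() in _SPECIAL_ORDER:
        return (_SPECIAL_ORDER[name.upper()], "")
    # Unknown contigs go last, ordered alphabetically among themselves.
    return (1000, name)


chromosomes = ["X", "10", "2", "MT", "1", "Y", "22"]
print(sorted(chromosomes, key=chromosome_sort_key))
```

Sorting by the plain string would place ‘10’ before ‘2’; converting numeric names to integers avoids that.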

v0.16.1

End-User Summary

  • Cases with no variants or no associated variant set can’t be filtered anymore.

Full Change List

  • Cases with no variants or no associated variant set caused queries to return all variants. This bug was fixed by disabling the filter button (UI) or throwing an error (query) if the query is executed.

v0.16.0

End-User Summary

  • Genomic regions now also able to filter only by chromosome.

  • Added preset selector for genotypes, setting affected or unaffected individuals to the selected setting.

  • dbSNP ID in file export is now set to None instead of an empty field.

  • Fixed sorting issues with ranks and scores.

  • Added quality field to set MAX allelic depth (AD) for filtering variants (hom or ref). Default is unset, i.e. filtering behaviour as usual. Only quality setting that doesn’t require a value.

  • Added main navigation as dropdown menu for smaller screen sizes.

  • Added template settings for quality filter form to copy to each individual, or affected/unaffected.

  • Fixed bug that occurred during file export with activated gene prioritization.

  • Improved database connection to avoid occasional JSON field retrieval errors.

Full Change List

  • Genomic regions filter now also accepts a chromosome alone as region, internally setting start/end positions to 0/INT_MAX values.

  • Structural variant databases are now imported in the same style as the small variant databases.

  • Removed model_support.py file from variants app.

  • Added preset selector for genotypes, setting affected or unaffected individuals to the selected setting.

  • dbSNP ID in file export is now set to None instead of an empty field.

  • Ranks in the results table are now displayed without the hash tag to make them properly sortable. Pathogenicity and phenotype scores in the results table now sort in a numerical order. Ranks and scores are now in separate fields.

  • Small variant filter now considers set id together with case id.

  • Removed remaining fixtures from test_submit_filter.py

  • Quality filter now can filter variants for max allelic depth.

  • Added main navigation as dropdown menu for smaller screen sizes.

  • Added template settings for quality filter form to copy to each individual, or affected/unaffected.

  • Fixed function call of gene prioritization function in file export task causing file export to break when gene prioritization was activated.

  • Remove switching psycopg2 JSON (de)serializer during database query execution to avoid occasional JSON field retrieval errors. Instead, replace the JSON (de)serializers for sqlalchemy and leave it to psycopg2 to take care of this.

  • Increased length of Case.index field from 32 to 512 chars.
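
The chromosome-only region handling described in this release can be sketched as follows. This is an illustrative parser, not the code used by VarFish: the name `parse_region`, the accepted syntax (`chrom` or `chrom:start-end`, optional `chr` prefix, optional thousands separators), and the concrete value of `INT_MAX` are assumptions.

```python
import re
from typing import Tuple

# Assumed upper bound; the constant actually used by VarFish may differ.
INT_MAX = 2**31 - 1

# Chromosome name of one or more characters, optionally followed by ":start-end".
REGION_RE = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9A-Za-z_.]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?$"
)


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse 'chrom' or 'chrom:start-end' into a (chrom, start, end) tuple."""
    match = REGION_RE.match(text.strip())
    if not match:
        raise ValueError(f"Invalid genomic region: {text!r}")
    chrom = match.group("chrom")
    if match.group("start") is None:
        # Only a chromosome was given: cover it completely.
        return (chrom, 0, INT_MAX)
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError(f"Region start is larger than end: {text!r}")
    return (chrom, start, end)


print(parse_region("10"))
print(parse_region("chr11:1,000-2,000"))
```

Matching the whole chromosome name with `+` rather than a single character is what keeps multi-character names such as ‘10’ or ‘MT’ intact.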

v0.15.6

End-User Summary

  • Row colouring in results table for commented and flagged variants is now back again.

Full Change List

  • Removing Annotation model.

  • Fixed importer bug where info wasn’t imported when table was newly imported and --force flag was set.

  • Removed whitening of table rows from DataTables css to prevent it from overwriting our row colouring feature.

  • Doing dbSNP import now chromosome-wise to prevent import from timing out.

  • Removed old style fixtures from UI tests.

v0.15.5

End-User Summary

  • Displaying SV coordinates in detail box.

  • Displaying family errors in red in “rate of het. calls on chrX” plot.

  • Compound het query now allows index selection for all patients with parents, not only sibling of the index.

Full Change List

  • Displaying SV coordinates in detail box.

  • Fixing sex error generation (only using source name).

  • Fixing pedigree editor form to use int for sex & affected.

  • Compound het query now allows index selection for all patients with parents, not only sibling of the index.

v0.15.4

End-User Summary

  • ExAC constraints in results table are now displayed.

  • Constraints in results table now consistently show three decimal places and are sortable.

  • Fixing QC plot display.

  • Fixing in-house counts in results table (filtering by them worked).

  • Fixing filtration with members that have no genotype.

  • Fixing SV length display.

  • Adjusting filter presets.

  • Fixing filtration for in-house filter.

  • Changing display of per-transcript effects to a table.

  • Index patient for compound heterozygous query is now selectable.

  • Fixed bug where clinvar report queries didn’t select for the case.

Full Change List

  • Increased SmallVariant table partitioning to modulo 1024.

  • ExAC constraints are now joined via ensembl gene id to results table.

  • Constraints in results table now consistently show three decimal places and are sortable.

  • ExAC constraints are now consistent with variant details and in results table.

  • Various fixes to QC plot display, some to JS, some to Python/Django views code.

  • Clinvar pathogenic genes materialized view gets updated when there is new data imported in one of the dependent tables.

  • Making prefetch filter load inhouse counts.

  • Fixing filtration with members that have no genotype.

  • Adding back fetching of SV length to queries.

  • First adjustments of filter presets for NAMSE analyses.

  • Fixing coalescing when filtering with in-house filter.

  • Changing display of per-transcript effects to a table.

  • Extended tests to cover missing in-house filter records for existing variants.

  • Index patient for compound heterozygous query can be selected. Only patients that share the same parents as the original index patients are selectable in addition.

  • After reworking the database query structure, clinvar report queries didn’t select for the case.

v0.15.3

Bug-fix release.

End-User Summary

  • none

Full Change List

  • fixing bug in recomputing small and structural variant counts on importing

v0.15.2

End-User Summary

  • Fixed broken genomic region filter.

  • Making gene information in SV results consistent with display in small variant results.

  • --force parameter for import_tables now works on all tables.

  • Resulting table is now sortable.

  • Fixed broken EnsEMBL link-out.

  • Added OMIM gene information to gene card in variant details view.

  • Refactored small variant database queries.

  • Adding case and donor counts to project list.

  • QC plots are now loaded asynchronously. This should improve page loading time for the case and project overview pages.

  • Adding inheritance mode information to results table.

  • Admins/superusers can now update case information and pedigrees.

  • Projects can now synchronise (check) with upstream SODAR sites, only admins/superusers can trigger this.

  • Adapting SmallVariants and SmallVariant DBs to new start-end coordinates and UCSC binning.

  • Fixed frequency table in SmallVariant details that had wrong names assigned to columns and total values were not present.

  • Added pLI score to variant details constraint information.

  • Added constraints information column with selector to results table.

Full Change List

  • Increased view test coverage to 100%.

  • Unification of gene information display between SVs and small variants.

  • Fixed bug that wrongly parsed genomic regions and resulted in filter reporting nothing while active.

  • Small fix to small variant import.

  • Extended --force parameter for import_tables command to be applied to all tables.

  • Fixed bug in creating materialized view that prevented setting up database when applying migrations from scratch.

  • Added datatables library to add sorting feature to resulting table.

  • Fixed broken EnsEMBL link-out.

  • Added conversion table RefseqToEnsembl (complementing EnsemblToRefseq). Now used in ExAC/gnomAD constraint information when refseq transcript database is selected.

  • Gene card in variant details view now shows OMIM gene information, i.e. when an OMIM entry is marked as gene in Mim2geneMedgen table.

  • “All transcript” annotations now come from Jannovar REST web service instead of the Annotation model.

  • Refactored small variant database queries. The database queries now make full use of lateral joins to keep the nesting flat. The code generation part no longer uses the mixin structure, which was opaque and error-prone.

  • Bumping sodar_core dependency to v0.6.1
    • Showing case and donor counts to project listing.

    • Showing site-wide statistics via siteinfo app.

  • Adding missing release column to KnownGeneAA table + adapting queries accordingly.

  • Cleaning up and refactoring QC plotting code.
    • Separating plotting JS and data generation Python code.

    • Load data asynchronously.

  • Now displaying inheritance mode information for results, based on HPO terms for inheritance and hgnc information.

  • Not importing Annotation data any more.

  • Adding view for updating a case.

  • Implementing “sync with upstream SODAR site” for projects based on background jobs.

  • Removing bgjobs app in favour of the one from SODAR-core.

  • Removing containing_bins columns.

  • Removing svs tests _fixtures.py.

  • SmallVariants and SmallVariant DBs now contain start and end columns, replacing position. This is for internal queries only; the outside representation for SmallVariants is still via position. An additional column bin for the UCSC binning was included.

  • Frequency table in SmallVariant details had wrong names assigned to columns and total values were not present. The values in the columns were one column behind their names, and thus the last column had a name but missing values. These missing values were also a bug, as the total was not reported (i.e. af or het without population).

  • Constraints information in variant details now shows also pLI score.

  • Now joining constraints information to results table and added selector to display source/metric in one column.

  • Fixed: Ensembl transcript ids in SmallVariant list were truncated because of too short database field.

  • Importing SVs and small variants is done in a background job now.

  • Small variant and SV tables are now partitioned (by case ID). This should speedup import as indices are smaller and also each partition can be written to independently.

  • import_tables improvements:
    • can now use threads to import multiple tables at once

    • uses SQL Alchemy instead of Django ORM based deletion

  • Refining celery configuration now, assuming queues “import”, “query”, and “default”.

  • Removing some redundant indices on frequencies and dbsnp.
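
The UCSC binning mentioned in this release assigns each interval to the smallest bin of a fixed hierarchy that fully contains it, so that overlap queries only need to look at a small set of bins. The sketch below shows the standard (non-extended) UCSC scheme, which covers positions up to 512 Mbp. It is not taken from the VarFish code base: the function name `ucsc_bin` and the assumption of 0-based, half-open coordinates are illustrative (for a 1-based inclusive interval, pass `start - 1` and `end`).

```python
# Offsets of the five bin levels, from the smallest (128 kbp) to the largest (512 Mbp).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # the smallest bins span 2**17 = 128 kbp
BIN_NEXT_SHIFT = 3    # each level is 2**3 = 8 times larger than the previous one


def ucsc_bin(start: int, end: int) -> int:
    """Return the UCSC bin for the 0-based, half-open interval [start, end)."""
    if start < 0 or end <= start:
        raise ValueError(f"Invalid interval: [{start}, {end})")
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # Both ends fall into the same bin on this level.
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"Interval [{start}, {end}) is out of range for standard binning")


print(ucsc_bin(0, 1))
print(ucsc_bin(2**17 - 1, 2**17 + 1))
```

An interval that crosses a boundary between two of the smallest bins is moved up to the next level, which is why the second call returns a bin from a coarser level than the first.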

v0.15.1

A bug fix release for SV filtration (fixing overlaps).

End-User Summary

  • Fixed conservation bug (was shown only in 2/3 of all cases).

  • Showing small and structural variant count for each case.

  • Improving layout of case list (adding sorting and filtering).

  • Improved render speed of case list.

  • Fixing problem with interval overlaps for structural variant queries.

Full Change List

  • Increased test coverage to 100% for small variant model support tests.

  • Fixed bug in displaying conservation track for all bases in an AA base triplet. Only two of three bases were decorated with the conservation track information.

  • Fixed bug that Clinvar report didn’t support compound heterozygous queries anymore.

  • Variant view tests are now running on factory boy.

  • Adding tests of SV-related code.

  • Also interpreting phased diploid genotypes.

  • Improving layout of case list (adding sorting and filtering).

  • Improved render speed of case list.

  • Fixing UCSC binning overlap queries.

  • Adding “For research use only” to login screen.

v0.15.0

The most important change is the change of colors: Green now means benign and red means pathogenic.

End-User Summary

  • Renamed Human Splice Finder to Human Splicing Finder.

  • Added “1” and “0” genotype for “variant”, “reference”, and “non-reference” genotype.

  • Added support for WGS CNV calling results to SV filtration.

  • Simplifying variant selection for SVs as diploid calls unreliable (it’s better to distinguish only variant/reference).

  • Changing color meaning: green now means benign/artifact and red means pathogenic/good candidate.

  • Adding link-out to varsome

  • Adding support for ACMG criteria annotation

  • SV filtration: do not set max count in background by default

  • SV filtration: display of call properties of XHMM and SV2

Full Change List

  • Allow import for more than one genotypes/feature effects for structural variants.

  • Starting to base fixture creation on factory boy.

  • Renamed Human Splice Finder to Human Splicing Finder.

  • Added “1” and “0” genotype for “variant”, “reference”, and “non-reference” genotype.

  • Added support for WGS CNV calling results to SV filtration.

  • Simplifying selection of variants for SVs. Further, also allowing for phased haplotypes (irrelevant in practice until we start interpreting the GATK HC haplotypes in annotator).

  • Changing color meaning: green now means benign/artifact and red means pathogenic/good candidate.

  • Adding link-out to varsome

  • Adding support for ACMG criteria annotation

  • Model support tests now running on factory boy.

  • SV filtration: do not set max count in background by default

  • SV filtration: display of call properties of XHMM and SV2

v0.14.8

Multiple steps towards v0.15.0 milestone.

End-User Summary

  • Adding link-out to the UMD Predictor (requires users to configure a UMD Predictor API Token).

  • Adding user settings feature.

  • Improving link-out to PubMed.

  • Adding gene display on case overview for flags and comments.

  • Added warning icon to results table indicating significant differences in frequency databases.

  • Added command to rebuild variant summary materialized view rebuild_variant_summary.

  • Added ExAC and gnomAD constraint information to variant details gene card.

  • Displaying allelic balance in genotype hover and variant detail fold-out.

Full Change List

  • Added elapsed time display to import_case

  • Speedup deletion of old data using SQL Alchemy for import_case.

  • Added indices to hgnc, mim2genemedgen, acmg and hgmd tables.

  • Added command to rebuild variant summary materialized view rebuild_variant_summary.

  • Adding link-out to PubMed with gene symbol and phenotype term names.

  • Improving existing link-out to Entrez Gene if the Entrez gene ID is known.

  • Adding user settings through latest SODAR-core feature.

  • Adding ImportInfo to django admin.

  • Adding “New Features” button to the top navigation bar.

  • Adding link-out to the UMD Predictor (requires users to configure a UMD Predictor API Token).

  • Overlapping gene IDs now displayed for flags and comments on the case overview/detail view.

  • Added warning icon to results table when a frequency in a non-selected frequency table is > 0.1, or if the hom count is > 50. For inhouse it is only hom > 50 as there is no frequency.

  • Added ExAC and gnomAD constraint information to variant details gene card. Two new tables were added, GnomadConstraint and ExacConstraint.

  • Displaying allelic balance in genotype hover and variant detail fold-out.

  • Removing unique constraint on SmallVariant.

  • Fixing case update in the case of the variants being referenced from query results.
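
The rule behind the frequency warning icon in this release can be sketched as a small predicate. This is not the VarFish implementation: the names `FrequencyRecord` and `needs_frequency_warning` are hypothetical, and the sketch assumes that both thresholds are checked only for frequency tables that are not selected in the filter, which the change note does not state explicitly.

```python
from typing import Dict, NamedTuple, Optional, Set

FREQUENCY_THRESHOLD = 0.1   # from the change note
HOMOZYGOUS_THRESHOLD = 50   # from the change note


class FrequencyRecord(NamedTuple):
    """Hypothetical per-database summary for one variant."""

    frequency: Optional[float]  # None where no frequency exists (e.g. in-house)
    homozygous: int


def needs_frequency_warning(
    records: Dict[str, FrequencyRecord], selected_tables: Set[str]
) -> bool:
    """Return True if any non-selected table exceeds one of the thresholds."""
    for table, record in records.items():
        if table in selected_tables:
            # Assumption: selected tables are already handled by the filter itself.
            continue
        if record.frequency is not None and record.frequency > FREQUENCY_THRESHOLD:
            return True
        if record.homozygous > HOMOZYGOUS_THRESHOLD:
            return True
    return False


records = {
    "exac": FrequencyRecord(frequency=0.2, homozygous=3),
    "inhouse": FrequencyRecord(frequency=None, homozygous=10),
}
print(needs_frequency_warning(records, selected_tables={"gnomad_exomes"}))
```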

v0.14.7

End-User Summary

  • Bug fix release.

Full Change List

  • Fixed bug that inhouse frequencies were not joined to resulting table.

  • Removed restriction that didn’t allow pasting into number fields.

v0.14.6

End-User Summary

  • Adding experimental filtration of SVs.

  • Added names to OMIM IDs in variant detail view.

  • Added input check for chromosomal region filter.

  • User gets informed about database versions during annotation and in VarFish.

  • Added ClinVar information about gene and variant to variant detail view.

  • Added selector for preset gene filter lists (HLA, MUC, ACMG).

  • Added comments and flags to variant details view.

  • Fixed bug that transcripts in variant details view were from RefSeq when EnsEMBL was selected.

  • Added icon to variant when RefSeq and EnsEMBL effect prediction differ.

  • Adjusted ranking of genes such that equal scores get the same rank assigned.

Full Change List

  • Adding initial support for filtration of SVs and SV databases.

  • Added names to OMIM IDs in variant detail view.

  • Added input check for chromosomal region filter.

  • Made ImportInfo table not unique for release info.

  • Made annotation release info available in case overview.

  • Made import release info available in site app accessible from user menu.

  • Added materialized view to gather information about pathogenic and likely pathogenic variants in ClinVar. This information is displayed in the gene card of the detail view.

  • Added ClinVar information about variant to variant detail view.

  • Added selector to gene white/blacklist filter, adding common gene lists (HLA, MUC, ACMG) to the filter field.

  • Added comments and flags to variant details view.

  • Fixed bug that transcripts in variant details view were from RefSeq when EnsEMBL was selected.

  • Added icon to variant when RefSeq and EnsEMBL effect prediction for the most pathogenic transcript (in SmallVariant) differ.

  • Adjusted ranking of genes such that equal scores in two genes get the same rank assigned. In case of the pathogenicity and joint score the highest variant score in a gene represents the gene score. The next ranking gene is assigned not the next larger integer but the rank is increased by the number of genes with the same rank.
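
The tie-aware gene ranking described in this release corresponds to what is often called competition ranking (1, 1, 3, …). The following is a minimal sketch under that reading, not the VarFish code: the function name `rank_genes` and the input format (pairs of gene symbol and variant score) are assumptions.

```python
from typing import Dict, Iterable, Tuple


def rank_genes(variant_scores: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Rank genes by their highest variant score; equal scores share a rank."""
    # The gene score is the highest score of any variant in that gene.
    gene_scores: Dict[str, float] = {}
    for gene, score in variant_scores:
        gene_scores[gene] = max(score, gene_scores.get(gene, float("-inf")))

    ranks: Dict[str, int] = {}
    previous_score = None
    current_rank = 0
    ordered = sorted(gene_scores.items(), key=lambda item: item[1], reverse=True)
    for position, (gene, score) in enumerate(ordered, start=1):
        if score != previous_score:
            # A new score starts at its position, skipping ranks used up by ties.
            current_rank = position
            previous_score = score
        ranks[gene] = current_rank
    return ranks


print(rank_genes([("A", 0.9), ("A", 0.5), ("B", 0.9), ("C", 0.7)]))
```

In this example, genes A and B share rank 1 and gene C receives rank 3, not rank 2.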

v0.14.5

End-User Summary

  • Bug fix release.

Full Change List

  • Fixed bug that made query slow when black/whitelist filter was used.

v0.14.4

End-User Summary

  • Fixed bug in comp het filter.

  • Fixed bug in displaying correct previous joint filter query.

  • Fixed bug in displaying not all HPO terms.

  • Added OMIM terms to variant detail view.

  • Fixed bug in variant detail view displaying all het counts as zero.

  • Fixed colouring of variant effect badges in variant detail view’s transcript information.

Full Change List

  • Fixed bug in comp. het. filter that was caused by downstream inhouse filter.

  • Fixed bug that selected previous joint filter query of the user, independent of the project.

  • Fixed bug in displaying not all HPO terms.

  • Added OMIM terms to variant detail view.

  • Fixed bug that the het properties of the frequencies models were not returned when converted to dict.

  • Removing old templates.

  • Fixed colouring of variant effect badges in variant detail view’s transcript information.

v0.14.3

End-User Summary

  • Fixed bug in displaying gene info with refseq ID.

  • Fixed bug in displaying correct number of rows in joint query.

  • User interface error response improved.

  • Fixed “too many connections” error.

  • Added ACMG annotation.

Full Change List

  • Fixed bug in gene info with refseq ID and symbol in list is now also “rescued”.

  • Fixed bug in displaying correct number of rows in joint query.

  • Improved error response when non-existing genes are entered in white/blacklist.

  • Using direct database calls instead of connections to prevent connection leaking.

  • New table Acmg added that is joined in main query. A small icon in results indicates existence in ACMG.

v0.14.2

End-User Summary

  • Added strategy to display missing gene symbols

  • Allow importing into importinfo table without importing data.

  • Added misc option to hide colouring of flagged variant rows.

  • Improved effect filter form.

  • Extended gene link-outs.

  • Fixed bug in pheno/patho rank computation.

  • Improved UI responses during requests.

Full Change List

  • Added new table with mapping Entrez ID to HGNC ID to improve finding of gene symbols.

  • Allow importing of meta information of tables that have no data but are used in microservices.

  • Added misc option that hides colouring of flagged variant rows and also the bookmark icons.

  • Added checkbox group ‘nonsense’ to effect filter form to group-(un)select certain variant effects.

  • Added gene link-out to Human Protein Atlas.

  • Fixed incrementor for rank computation of phenotype and pathogenicity score ranks.

  • Better UI responses with extended logging during asynchronous calls.

  • Project overview now provides link to full cases list.

  • Added option to display only variants without dbSNP membership.

  • Adapted to SODAR Core 0.5.0

  • Fixed length of allowed characters for db info table name.

v0.14.1

End-User Summary

  • Bug fix release

Full Change List

  • Fixing bug in the case that no HPO term with an HpoName entry is entered.

v0.14.0

End-User Summary

  • Added prioritization by pathogenicity using CADD.

  • Added support to filter genomic regions.

  • Added support for querying for counts within the VarFish database.

  • Fixed bug that displayed variants in comphet query results twice.

  • Improved UI response.

  • Added HPO terms to variant detail view.

Full Change List

  • Added additional field to specify multiple genomic regions to restrict query.

  • Fixed mixed up sex display in genotype filter tab.

  • Extended SmallVariant model to have counts for hom. ref. etc. counts.

  • Adding SmallVariantSummary materialized view and supporting SQL Alchemy query infrastructure.

  • Adding form and view infrastructure for querying against in-house database.

  • Fixed bug in comphet query that executed the query on the results again during fetching, which displayed variants twice.

  • Proper error response in asynchronous queries when server is not reachable.

  • Fixed broken tooltip information in results table.

  • Resubmitting a file export job now remembers the file type, if changed.

  • Added integration with in-house CADD REST API (https://github.com/bihealth/cadd-rest-api) similar to Exomiser REST API integration.

  • Added HPO terms to variant detail view and queried HPO terms are added to results table header.

  • Added tests for filter jobs, including mocks for CADD and Exomiser requests.

v0.13.0

End-User Summary

Adding initial version of phenotype-based prioritization using the Exomiser REST Prioritiser API.

Full Change List

  • Adding missing field for exon loss variant to form.

  • Comments in view class adjusted.

  • Added HPO to disease name mapping.

  • Phenotype match scores are added to the file downloads as well.

  • Sorting of variants by phenotype match added.

  • Added annotation of variants with phenotyping variant score.

  • Added tab to the form for entering HPO term IDs.

  • Adding settings for enabling configuring REST API URL through environment variables.

v0.12.2

End-User Summary

Internal import fixes.

Full Change List

  • Case updating only removes variant and genotype info instead of replacing case.

  • Allowing import of gzipped db-info files.

v0.12.1

Bugfix release.

End-User Summary

  • Fix in clinvar job detail view.

Full Change List

  • Clinvar job detail view was partially broken and job resubmitting didn’t work.

v0.12.0

User experience improvement, tests extended.

End-User Summary

  • Filtering jobs can now be aborted.

  • Proper visual error response in forms.

  • Tests for all views completed.

  • Variant details now use full table space.

  • Clinvar report jobs are now using AJAX as well and are running in background.

Full Change List

  • Filtering jobs now run as background jobs and can be aborted.

  • Invalid fields and affiliated tabs are now marked with a red border.

  • Deleted empty files from apps.

  • Tests for all views completed.

  • Bugfix in rendering download results files for ProjectCases.

  • Bugfix in template for job detail view.

  • Bugfix in listing background jobs for a case.

  • Variant details do not load anymore when detail view is closed.

  • Variant details now use full table space.

  • Flags and comments do not depend on EnsEMBL gene id anymore. All traces were removed, including the database column.

  • Clinvar jobs now have their own background job model. They also use the AJAX query state machine to control job submission and canceling.

  • Now using sodar_core v0.4.5

  • Warning appears when Microsoft Internet Explorer is detected.

v0.11.8

Case importer command improved.

End-User Summary

  • Case import command registers database version that was used during annotation.

Full Change List

  • Case import also imports annotation release infos into new table.

  • Import information now also recognizes the genomebuild.

  • Tests for case importer.

  • Fixed bug that didn’t distinguish gzipped from plain text import files.

v0.11.7

Bugfix release.

End-User Summary

  • Fixed yet another bug in setting SmallVariantFlags.

Full Change List

  • Fixing bug that variant flags are displayed no matter the case.

v0.11.6

Bugfix release.

End-User Summary

  • Fixed another bug in setting SmallVariantFlags.

Full Change List

  • Fixed bug that under certain conditions reported two variants at the same position as none and failed flag updating.

v0.11.5

Bugfix release.

End-User Summary

  • Databases import now as Django manage command.

  • Fixed bug in loading last query results.

  • Fixed bug in setting SmallVariantFlags.

Full Change List

  • Databases import is now a Django manage command and import commands are removed from the Makefile. Instead of one command for each database, a single command imports all databases stated in a config file.

  • Fixed bug that displayed last query of user without considering case.

  • Fixed bug that under certain conditions reported two variants at the same position as none and failed flag updating.

v0.11.4

This is a quick release to fix a bug in retrieving the results from a filter job. This was caused by the celery worker in the production system configuration.

End-User Summary

  • Zooming in QC plot is now supported.

  • Fixing bug in delivering filter results.

Full Change List

  • Replacing Chart.js components by plotly. This has the major advantage that zooming into charts is now supported. Further, users can now enable and disable plotting of certain data points by clicking. This is hugely useful for debugging meta data.

  • Allow skipping Selenium tests

  • Fixing bug with celery worker for submitting filter jobs affecting production system.

v0.11.3

This release improves the user experience by pushing filter jobs to the background and load them asynchronously.

End-User Summary

  • Push filter jobs to the background and provide them via AJAX to not block the UI during execution

  • Storing of filter query results

  • Load previous filter query results upon filter form page entry

Full Change List

  • Adapted to SODAR core version 0.4.2

  • Unified several empty forms

  • Adapted database query for loading previous results

  • Unified filter form templates

  • Fixed bug in accessing dict without checking availability of key.

  • Removed two view tests that have to be replaced in the future for ajax request.

  • Fixed bug in displaying time in background job list overview + ordering by timestamp

  • Pushing filter job to background

  • Loading filter results via AJAX (single case and joint project)

  • Loading of previous filter results when entering the filter form

v0.11.2

This is a bug fix release.

End-User Summary

  • Removed an internal restriction that prevented data import.

Full Change List

  • Making id fields for SmallVariant and Annotation into big integers.

  • The importer now supports gzip-ed files.

v0.11.1

  • Fixing frequency display, including gnomAD genomes.

v0.11.0

This release adds more textual information about genes to the database and displays it.

End-User Summary

  • Adding gene summaries and reference-into-function from NCBI Gene database.

Full Change List

  • Adding models NcbiGeneInfo and NcbiGeneRif in geneinfo app.

  • Displaying this information in the gene details page.

v0.10.0

Accumulation of previous updates. The main new feature is the improved variant details card below variant rows.

End-User Summary

  • Fixing variant detail cards below results row.

  • Adding row numbers in more places.

Full Change List

  • Rendering variant details cards on the server instead of filling them out in JS.

v0.9.6

This release fixes project-roles synchronization from SODAR site.

  • Fixing celery setup; syncing projects and roles regularly now.

v0.9.5

Small additions, fixing MutationDistiller integration.

  • Adding link-out to loci in Ensembl, gnomAD, and ExAC.

  • Adding link-out for Polyphen 2, Human Splicing Finder, and varSEAK Splicing.

  • Project-wide variant recreation registers started state now correctly.

  • Fixing URL for MutationDistiller Links.

  • Using HTTPS links for ENSEMBL and MutationTaster.

v0.9.4

Yet another bug fix release.

  • Adding missing 5’ UTR fields to forms.

  • Adding command for rebuilding project stats.

  • Changing display color of relatedness (red indicates error).

  • Computing cohort statistics in a transaction. This should get rid of possible inconsistencies.

v0.9.3

This is a bug fix release.

  • Removing restriction on single comment per variant.

  • Improving display of sex errors.

v0.9.2

This is a bugfix release.

  • Fixing error in displaying variants statistics for empty project.

  • Improving relationship error display.

  • Putting “sibling-sibling” instead of “parent-child” where it belongs.

  • Fixing problem with MutationDistiller submission.

  • Fixing ClinVar form.

  • Adding gene link-out to HGMD.

v0.9.1

This release fixes some bugs introduced in v0.9.0.

Full Change List

  • Adding missing dependency on django_redis.

  • Fixing counting in project-wide statistics computation.

  • Fixing references to pedigree_relatedness.

  • Fixing sex display in template, sex error message “male” where “female” should be.

  • Fixing sex assignment in sex scatter plot.

v0.9.0

This release adds project-wide statistics and variant querying.

End-User Summary

  • You can now see project-wide case QC statistics plots on your project’s Case List.

  • You can now perform project-wide queries to your variants and also export them to TSV and Excel files.

Full Change List

  • Added models for storing project-wide statistics, job code for creating this, views for viewing etc.

  • Adjusting the existing plot and model code to accommodate for this.

  • Refactoring filtration form class into composition from multiple mixins.

  • Refactoring small variant query model to use abstract base class and add query model for project-wide queries.

  • Implementing download as tabular data for project-wide filtration.

  • Improving index structure for project-wide queries with gene white-lists.

v0.8.0

This release adds variant statistics and quality control features.

End-User Summary

  • Gathering an extended set of statistics for each individual in a case.

  • Inconsistencies within the pedigree and between pedigree and variant information are now displayed throughout the UI.

  • Several statistics and quality control plots are displayed on the case details page.

Full Change List

  • Adding var_qc_stats module with analysis algorithms similar to (Pedersen and Quinlan, 2017).

  • Adding models for gathering per-sample and per-sample-pair statistics.

  • Displaying statistics results on case detail page in tables and plots.

  • Highlighting of consistency and sanity check errors throughout the views.

  • Importer computes statistics for new cases, migration adds them to existing cases.

v0.7.0

This release has one main feature: it adds support for submitting variants to MutationDistiller.

End-User Summary

  • Added support for submitting variants to MutationDistiller from the Variant Filtration Form.

  • Added “Full Exome” filter preset for including all variants passing genotype filter.

  • Greatly sped up VCF export.

Full Change List

v0.6.3

A bugfix release.

End-User Summary

  • Fixing bug that caused the clinvar report to fail when restoring previous query.

Full Change List

  • Making sure returning to clinvar report works again.

  • Enabling SODAR-core adminalerts app.

  • Including authors and changelog in manual.

v0.6.2

A bugfix release.

End-User Summary

  • Fixing search bug with upper/lower case normalization.

  • Fixed bug with whitelist/blacklist when restoring settings.

  • Extended documentation, added screenshots.

  • Previous flag state is now properly written to the timeline.

v0.6.1

End-User Summary

  • Adding forgotten help link to title bar.

v0.6.0

End-User Summary

  • Various smaller bug fixes and user interface improvements.

  • Adding summary flag for colouring result lines.

  • Allow filtering variants by flags.

  • Integrating flags etc. also into downloadable TSV/Excel files.

  • Adding new annotation: HGMD public via ENSEMBL.

  • Adding comments and flags now appears in the timeline.

  • VarFish stores your previous settings automatically and restores them on the next form view.

Full Change List

  • Allowing Javascript to access CSRF token, enables AJAX in production.

  • SmallVariants are now also identified by the ensembl_gene_id. This fixes an annotation error.

  • Adding flag_summary to SmallVariantFlags for giving an overall summary.

  • Extending filtration form to filter by flags.

  • Added new app hgmd for HGMD_PUBLIC data from ENSEMBL.

  • Adding make black to Makefile.

  • Changed default frequencies.

  • Improving integration of comments and flags with the timeline app.

  • Also properly integrating import of cases etc. with timeline app.

  • Added SmallVariantQuery model and integrated it for automatically storing form queries and restoring them.

v0.5.0

End-User Summary

This is a major upgrade in terms of features and usability. Please note that this is a “dot zero” release; we will fix broken things in a timely manner.

Major changes include:

  • The “AD” form field was split into one for het. and one for hom. variants.

  • Clinvar entries are now properly displayed.

  • Enabling filtering for clinvar membership and pathogenicity.

  • Fixing file export.

  • Allowing to mark variants with flags and add comments to them.

  • Adding clinvar-centric report.

  • Filtration now also works for pedigrees containing samples without genotypes.

  • Adding functionality to search for samples.

Full Change List

  • Adding support for filtering presence in Clinvar. The user has to enable the filter and can then select the

  • Fixing pedigree display in filter form

  • Splitting “${person}_ad” field into “*_ad_het” and “*_ad_hom”, also adjusting tests etc.

  • Fixing clinvar queries (was a +/-1 error)

  • Adding more comprehensive tests for views and query.

  • Fixing bug in file_export module caused by not adjusting to SQLAlchemy filter querying.

  • Added various tests and fixed smaller bugs.

  • Adding VariantSmallComment and VariantFlags models for user annotation of variants.

  • Adding clinvar-centric support for easily screening variants for relevant Clinvar entries.

  • The importer now also writes "has_gt_fields" key to Pedigree lines.

  • The templates, views, and query generation now also heed "has_gt_fields" field.

  • Adding migration that automatically adds the "has_gt_fields".

  • Adding back display of search bar.

  • Integrating search functionality for variants app.

  • Self-hosting CSS, JS, etc. now.

  • Adding search_tokens to Case with lower-case IDs.

v0.4.0

End-User Summary

This is the first release made available to the public. Major features include

  • Categories and projects as well as access control assignments are taken from the main SODAR site. Organizing projects and users is done in the main SODAR site.

  • Variant filtration can be done on a large number of attributes. This includes a specialized compound recessive filter.

  • Filtration results can be converted into TSV/XLSX files for opening in Excel or VCF for further processing.

Full Change List

  • SODAR-core integration for user and project management

  • Download of filter results in TSV, VCF or EXCEL file format

  • SQLAlchemy replaces raw query generation for filter queries

  • Heterozygous database entries of frequency databases are now properties of the model

  • UI improvements

  • Updated and completed database query tests

  • Refinement of indices and queries improves filter query performance

  • Simplifying import from gts TSV, vars TSV, and PED file in one go

Glossary

AAB

Alternate Allele Balance, computed as min(AD/DP, 1 - AD / DP), e.g., 3/10 reads have an AAB of 0.3, as do 7/10 reads.
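The formula above can be sketched in a few lines of Python. This is only an illustration of the definition, not VarFish's own code: the function name alternate_allele_balance and its input checks are assumptions made for this example.

```python
import math


def alternate_allele_balance(ad: int, dp: int) -> float:
    """Return min(AD/DP, 1 - AD/DP) for a single genotype call.

    ad: number of reads showing the alternative allele (AD)
    dp: number of reads covering the position (DP)

    The input checks below are an assumption of this sketch; the
    glossary definition itself does not say how to treat such values.
    """
    if dp <= 0:
        raise ValueError("DP must be positive")
    if not 0 <= ad <= dp:
        raise ValueError("AD must lie between 0 and DP")
    fraction = ad / dp
    return min(fraction, 1 - fraction)


# The two cases from the glossary entry give the same balance
# (compared with a tolerance because of floating-point rounding).
assert math.isclose(alternate_allele_balance(3, 10), 0.3)
assert math.isclose(alternate_allele_balance(7, 10), 0.3)
```

By construction the value lies between 0 and 0.5; it reaches 0.5 when exactly half of the reads show the alternative allele.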

ACGS

Association for Clinical Genomic Science

ACMG

American College of Medical Genetics and Genomics

AD

Alternative Depth, number of reads showing alternative allele.

ClinVar

A database of variants with their clinical annotation.

CADD

Combined Annotation Dependent Depletion, a variant pathogenicity score available from https://cadd.gs.washington.edu

DP

Depth of coverage, number of reads covering a position.

ENSEMBL

TODO

Entrez

TODO

Exomiser

TODO

IGV

Integrative Genomics Viewer

HiPhive

TODO

HTS

High-Throughput Sequencing

MEDLINE

The most relevant bibliographic database for the life sciences.

MutationDistiller

A variant pathogenicity tool available at https://mutationdistiller.org

MutationTaster

A variant pathogenicity tool available at https://mutationtaster.org

NCBI

TODO

OMIM

Online Mendelian Inheritance in Man

Phenix

TODO

Phive

TODO

PubMed

A free search engine primarily accessing the MEDLINE database of references

QC

Quality Control

SNV

Single Nucleotide Variant

SOP

Standard Operating Procedure

UCSC

University of California, Santa Cruz; hosting the very popular UCSC genome browser

UMD Predictor

A variant pathogenicity prediction tool available at https://umd-predictor.eu

Varsome

A commercial website/product that aggregates information about a variant and allows the public annotation of variants; available at https://www.varsome.com

WES

Whole Exome Sequencing

WGS

Whole Genome Sequencing

Indices and tables