Download FILTUS

FILTUS TUTORIAL: EXOME ANALYSIS

BACKGROUND

This exercise will take you through the downstream analysis of the exome of a real patient. The patient is a child with severe epilepsy; both parents are healthy. The disorder has an unknown cause, but is believed to be monogenic and recessive. Furthermore, it turns out that the parents are first cousins, suggesting that autozygosity mapping may be helpful in this case.

The exome of the child has been sequenced, and the resulting variants have been annotated with Annovar. The annotation adds information about each variant, including which gene it lies in, how it affects the protein (synonymous, non-synonymous, stopgain, etc.), allele frequencies, and predictions of its effect on protein function.

For legal/ethical reasons, all gene names, transcript names, variant identifiers and variant positions are irreversibly masked or changed. However, care has been taken to preserve dependencies between the variants, so that the answers you find are almost identical to those of the actual analysis of this patient.

DOWNLOAD FILTUS

Windows: Go to http://folk.uio.no/magnusv/GenetiskTeori and download "FiltusExercise.zip". Save it somewhere on your computer and unzip it. After unzipping, start FILTUS by double clicking “FILTUS104.exe” inside the Filtus1.0.4 folder.

Mac: Open a terminal window and run the command "pip install filtus". If this completes successfully, you can then start the program by executing "filtus". The files needed for the exercise can be downloaded from http://folk.uio.no/magnusv/GenetiskTeori/Mac. If the installation does not work, ask for help.

EXERCISE FILES

This tutorial/exercise analyzes the variant file “exome1.csv”, which contains all variants resulting from the sequencing of the patient’s exome. You should find the file in the FiltusExercise folder. You should also see the filter configuration file “HQ.fconfig”, which you will need towards the end of the tutorial.
TUTORIAL

To get a first impression of what the file looks like, open “exome1.csv” in a plain text editor (e.g. Notepad).

a) What is the column separator? Do the columns have headers? What are the first few columns?

Plain text editors are not well suited for exome analysis. Instead, you should now load the file into FILTUS. Do this by opening FILTUS and choosing "Load variant files (simple)" in the File menu. Take a moment to study the input settings dialog, but leave everything unchanged (FILTUS is good at guessing the file format) and press “Use for all files”.

b) How many variants does FILTUS report in the “Unfiltered summaries” box? What is the gene count?

c) Double click on the entry in "Unfiltered summaries" to display the variants. Locate the first line with an exonic variant, and find out:
   i. Which gene is the variant in? Which chromosome and base position?
   ii. What sort of variant is it? How does it affect the protein?
   iii. What is the variant's frequency in the 1000 Genomes database ("1000g2010nov_ALL")?
   [Hint: Triple clicking on a line will highlight it, making it easier to keep track when scrolling.]

The columns REF and ALT contain the reference allele and the alternative allele for each variant. The observed genotype can then be read off from the GT column: 0/1 means REF/ALT (i.e. heterozygous), while 1/1 means ALT/ALT (i.e. homozygous for the alternative allele). This is part of the widely used VCF format for variant files, which you can read more about at http://www.1000genomes.org if you want.

d) Consider again the first exonic variant. Is it heterozygous or homozygous? What is the observed genotype? For experts: How many sequencing reads had the REF allele vs. the ALT allele? [Hint: This is in the AD column.]

To get an overview of the file's structure and contents, it is always a good idea to summarize some of the key columns. For this we use the "Summarize column" function in the View menu.

e) Make a summary of the FILTER column.
How many variants, as a raw count and as a percentage, have PASS in this column? (These are the variants that passed all quality filters in the variant calling process.)

f) Enter "FILTER - equal to - PASS" as a column filter and press "Apply filter". The number in the "Filtered summaries" box should agree with your answer above. This filter should remain in place throughout the rest of the analysis.

g) Make a summary of the Func column, and make sure you know roughly what the different categories mean. What percentage of the variants are exonic? [Bonus question: Why are so many of the variants not exonic?]

The "Summarize column" function also allows us to do a quick-and-dirty check of the patient’s gender:

h) Add a column filter keeping only the variants on the X chromosome, and then make a summary of the GT column. Use the result to deduce whether the patient is a boy or a girl. (Remove the chromosome filter before you continue.)

The next steps give an example of how we can filter the variants down to a small set of interesting variants. What makes a variant interesting depends on the context, but in this case we focus on rare variants that have a big effect on the gene product.

i) We are primarily interested in variants that are either exonic or affect splicing. Add the column filter "Func - starts with - exon OR splic" to remove everything else. How many variants/genes remain?

j) Make a summary of the ExonicFunc column and familiarize yourself with the different categories. (NB: Splice site variants have empty entries in this column.)

A loss-of-function (LoF) variant is a variant that disrupts the protein, for instance by introducing a premature stop codon or through an indel (insertion or deletion) that shifts the reading frame.

k) Reduce to LoF variants by applying a suitable column filter. Hint: Use “AND” or “OR” to combine phrases.
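For readers who want to see what the column summaries and column filters in steps e) to i) amount to, the logic can be sketched in a few lines of Python. This is only an illustration of the idea, not how FILTUS is implemented. The rows are made up and are not taken from “exome1.csv”, and the column name "CHROM" is a hypothetical stand-in for whatever the chromosome column is called in the file.

```python
from collections import Counter

# Made-up rows for illustration only; they are NOT taken from exome1.csv.
# "CHROM" is a hypothetical name for the chromosome column.
variants = [
    {"CHROM": "1", "Func": "exonic", "FILTER": "PASS", "GT": "0/1"},
    {"CHROM": "1", "Func": "intronic", "FILTER": "PASS", "GT": "1/1"},
    {"CHROM": "2", "Func": "exonic;splicing", "FILTER": "LowQual", "GT": "0/1"},
    {"CHROM": "X", "Func": "splicing", "FILTER": "PASS", "GT": "1/1"},
    {"CHROM": "X", "Func": "UTR3", "FILTER": "PASS", "GT": "1/1"},
]


def summarize(rows, column):
    """Map each distinct value in a column to (count, percentage)."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}


def starts_with_any(rows, column, prefixes):
    """Keep rows whose entry starts with any of the given prefixes
    (the idea behind "Func - starts with - exon OR splic")."""
    return [row for row in rows if row[column].startswith(tuple(prefixes))]


# Step e): summary of the FILTER column.
print(summarize(variants, "FILTER"))

# Step f): "FILTER - equal to - PASS".
passed = [row for row in variants if row["FILTER"] == "PASS"]

# Step i): exonic or splicing variants among those that passed.
coding = starts_with_any(passed, "Func", ["exon", "splic"])
print(len(passed), len(coding))

# Step h): genotype tally restricted to the X chromosome.
x_genotypes = Counter(row["GT"] for row in passed if row["CHROM"] == "X")
print(x_genotypes)
```

Each filter works on the output of the previous one, which mirrors how the PASS filter stays in place while further filters are added in the tutorial.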
l) Also remove variants with a frequency higher than 1% in the 1000 Genomes database or the ESP5400 database (the "1000g2010nov_ALL" and "ESP5400_ALL" columns). How many variants/genes remain?

NB: For the filters in the last step it is vital to tick the KIM box ("keep if missing"). An empty entry in a database column means that the variant is not reported in the database; we certainly want to keep those!

We have now reduced the original haystack to a set of high-quality, very rare (or novel) variants with a damaging effect on the protein. Since the condition is thought to be recessive, we look for genes containing at least two of the remaining variants (compound heterozygous model), or one of them in homozygous state.

m) In the "Gene sharing" window, type "1" in the "Affected" field, choose "Recessive c/h" as the model, and press "Analyze". How many genes turn up? Right click on the gene names to inspect the variants in each gene (or all of them at the same time). Do any of them seem less likely than others to be pathogenic?

Recall that the parents of our patient are first cousins. This suggests that the causal variant lies in an autozygous stretch of the genome, i.e. a long homozygous region where both haplotypes originate from the same great-grandparent. Restricting our search to these regions will hopefully reduce the number of genes we have to investigate. To identify these autozygous regions we use the AutEx algorithm implemented in FILTUS. We go through the process step by step below.

Before you start, click “Save current filter configuration” in the Filters menu, and save to a file that you can reload later. You can name the file whatever you like, but it is referred to as “rare_LOF.fconfig” below.

n) For AutEx to work well, it is important to remove as many erroneous variants as possible. In the Filters menu choose “Load filter configuration” and load the “HQ.fconfig” file. Do you agree that these filters are sensible?
Press “Apply”.

o) Open the AutEx dialog from the Analysis menu. Set the parental relation to cousins and run the program using “1000g2010nov_ALL” as the frequency column. How many regions are found? You can set the “Minimum segment size” to e.g. 1 cM and 20 variants to get rid of the worst noise.

p) Make a plot showing the autozygosity on chromosome 7, and save it.

q) After closing the AutEx window, save the identified regions by clicking on “Save main window content” in the File menu. Name the file something like “sample1_autozygous_regions.txt”.

r) Now load the filters you saved earlier (“rare_LOF.fconfig”), and add the file from the last step as a “Restrict to regions” filter. Press “Apply filters”, and then repeat the “Gene sharing” analysis. How many genes are you left with now?

s) Discuss possible ways to proceed at this point. Can you think of resources we haven't used in this exercise that could help us eliminate further variants? [Keywords to get you started: Technical artefacts, family members, online resources, phenotype.]

Epilogue: The gene hiding behind the code GENE5661 is in fact KCTD7, a known epilepsy gene matching the phenotype of our patient. Furthermore, the parents were shown to carry one copy each of the LoF variant, confirming the recessive inheritance. In the end it was concluded that this variant caused the patient's disorder.
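The frequency filter of step l), with its "keep if missing" option, and the recessive gene search of step m) can be sketched in Python as follows. This is a minimal illustration of the logic described in the tutorial, not the FILTUS implementation. The rows and gene names are made up, and the column name "Gene" is an assumption; the two frequency column names and the 1% threshold come from the tutorial text.

```python
from collections import defaultdict

# Made-up variants for illustration; gene names are placeholders and
# "Gene" is an assumed column name. An empty string is a missing entry.
variants = [
    {"Gene": "GENE_A", "GT": "1/1", "1000g2010nov_ALL": "", "ESP5400_ALL": ""},
    {"Gene": "GENE_B", "GT": "0/1", "1000g2010nov_ALL": "0.002", "ESP5400_ALL": ""},
    {"Gene": "GENE_B", "GT": "0/1", "1000g2010nov_ALL": "", "ESP5400_ALL": "0.005"},
    {"Gene": "GENE_C", "GT": "0/1", "1000g2010nov_ALL": "", "ESP5400_ALL": ""},
    {"Gene": "GENE_D", "GT": "1/1", "1000g2010nov_ALL": "0.15", "ESP5400_ALL": "0.12"},
]


def at_most(row, column, threshold, keep_if_missing=True):
    """True if the frequency is at most the threshold.
    Missing entries are kept or dropped according to keep_if_missing."""
    value = row[column].strip()
    if value == "":
        return keep_if_missing
    return float(value) <= threshold


def rare_variants(rows, keep_if_missing=True):
    """Keep variants with frequency <= 1% in both databases."""
    return [
        row for row in rows
        if at_most(row, "1000g2010nov_ALL", 0.01, keep_if_missing)
        and at_most(row, "ESP5400_ALL", 0.01, keep_if_missing)
    ]


def recessive_candidates(rows):
    """Genes with a homozygous variant (1/1) or at least two variants."""
    genotypes_by_gene = defaultdict(list)
    for row in rows:
        genotypes_by_gene[row["Gene"]].append(row["GT"])
    return sorted(
        gene for gene, gts in genotypes_by_gene.items()
        if "1/1" in gts or len(gts) >= 2
    )


print(recessive_candidates(rare_variants(variants)))
print(recessive_candidates(rare_variants(variants, keep_if_missing=False)))
```

With "keep if missing" switched off, every variant absent from either database is discarded, which removes exactly the novel variants the analysis is looking for. Note also that two heterozygous variants in the same gene are only candidates for compound heterozygosity: without phase information (for example from the parents), they may lie on the same haplotype.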
