FAQ - Using IPA
Description: FAQs about using IPA
To assist you with using the features of IPA, here are answers to some frequently
asked questions.
Core, Tox, and Metabolomics Analysis FAQs
What file formats can I use to upload my data into IPA?
IPA accepts Excel 1997-2003 files for data upload. Please see Data Upload
Workflow for more details.
What species identifiers are accepted for analysis by IPA?
IPA supports upload of identifiers from the following species:
Arabidopsis thaliana
Bovine
Caenorhabditis elegans
Chicken
Chimp
Danio rerio
Dog
Drosophila melanogaster
Human
Mouse
Rat
Rhesus Monkey
Saccharomyces cerevisiae
Schizosaccharomyces pombe
IPA also accepts chemical identifiers. For more information, see Data Upload
Definitions.
Why does IPA map my input molecules?
Mapping takes the identifiers from your dataset file and compares them to all of
the molecules in the Ingenuity Knowledge Base. This process unambiguously
identifies the input molecules and ensures that the correct molecules are
considered for the analysis.
Can datasets containing multiple identifier types be mapped?
Yes. For more details, please see the Data Upload workflow.
Why don’t all of the molecules in my dataset map to the Ingenuity
Knowledge Base?
This could be due to one of several reasons:
1. The gene ID does not correspond to a known gene product. For example, most
ESTs are not found in the Ingenuity Knowledge Base (exception: ESTs that have
a corresponding Entrez Gene identifier are found in the Ingenuity Knowledge
Base).
2. There are insufficient Findings in the literature regarding this molecule.
3. Findings for this molecule have not been entered in the Ingenuity Knowledge
Base.
4. A gene/protein ID corresponds to several loci or more than one gene. Such
identifiers are left unmapped in the application due to the ambiguity of the identity.
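The mapping behavior described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the `toy_kb` dictionary, the identifier names, and the `map_identifiers` function are hypothetical and do not represent IPA's actual implementation or the contents of the Ingenuity Knowledge Base.

```python
def map_identifiers(input_ids, knowledge_base):
    """Split input identifiers into mapped and unmapped groups.

    knowledge_base is a hypothetical dict: identifier -> set of molecule names.
    An identifier maps only if it resolves to exactly one molecule.
    """
    mapped = {}
    unmapped = {}
    for identifier in input_ids:
        molecules = knowledge_base.get(identifier, set())
        if len(molecules) == 1:
            mapped[identifier] = next(iter(molecules))
        elif not molecules:
            # No known gene product or no Findings entered for this identifier.
            unmapped[identifier] = "not found"
        else:
            # Several loci or genes: left unmapped because the identity is ambiguous.
            unmapped[identifier] = "ambiguous"
    return mapped, unmapped


# Hypothetical toy data, for illustration only.
toy_kb = {
    "ID_A": {"GENE1"},
    "ID_B": {"GENE2", "GENE3"},
}
mapped, unmapped = map_identifiers(["ID_A", "ID_B", "ID_C"], toy_kb)
```

In this toy run, `ID_A` maps to a single molecule, `ID_B` is left unmapped as ambiguous, and `ID_C` is left unmapped because it is not found.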
I re-ran an analysis and now some of my identifiers that used to map do
not. Why?
We can no longer map those identifiers unambiguously. This can happen when an
identifier maps to several genes and we have no clear basis for favoring one
mapping over another. Alternatively, an identifier that we previously used to
disambiguate the mapping may have been deprecated. Lastly, in some cases the
vendor may have deprecated the identifier itself.
What are Network Eligible molecules? What are Functions/Pathways
Eligible molecules?
Network Eligible Molecules are molecules from your dataset that meet the
following criteria:
1. They have been designated as being of interest (e.g. by the values in the
Expression Value, Absent and Override columns).
2. They interact with other molecules in the Ingenuity Knowledge Base.
Functions/Pathways Eligible molecules are molecules from your dataset file that
meet the following criteria:
1. They have been designated as being of interest (e.g. by the values in the
Expression Value, Absent, and Override columns).
2. They have at least one functional annotation or disease association in the
Ingenuity Knowledge Base.
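As a rough sketch, the two sets of criteria above could be expressed as follows. The `Molecule` fields and the rule in `is_of_interest` are assumptions made for illustration: the sketch assumes that Absent takes precedence over Override and that the cutoff is compared against the absolute expression value. IPA's actual rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    expression_value: float
    absent: bool = False            # flagged in the Absent column
    override: bool = False          # flagged with "X" in the Override column
    has_interactions: bool = False  # interacts with other Knowledge Base molecules
    has_annotations: bool = False   # has a functional annotation or disease association


def is_of_interest(m, cutoff):
    # Assumed precedence: Absent excludes, then Override includes, then the cutoff decides.
    if m.absent:
        return False
    if m.override:
        return True
    return abs(m.expression_value) >= cutoff


def network_eligible(m, cutoff):
    return is_of_interest(m, cutoff) and m.has_interactions


def functions_pathways_eligible(m, cutoff):
    return is_of_interest(m, cutoff) and m.has_annotations


# Hypothetical dataset rows, for illustration only.
molecules = [
    Molecule("GENE_A", 2.5, has_interactions=True, has_annotations=True),
    Molecule("GENE_B", 0.3, override=True, has_interactions=True),
    Molecule("GENE_C", 3.0, absent=True, has_interactions=True, has_annotations=True),
    Molecule("GENE_D", -2.1, has_annotations=True),
]
cutoff = 1.5
net_names = [m.name for m in molecules if network_eligible(m, cutoff)]
fp_names = [m.name for m in molecules if functions_pathways_eligible(m, cutoff)]
```

Here `GENE_D` passes the cutoff but has no interactions, so it is Functions/Pathways Eligible without being Network Eligible.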
Why aren’t all the IDs in my dataset that meet the expression value cutoff
selected as Network Eligible molecules?
Only those molecules that have demonstrated relationships to other genes,
proteins or endogenous chemicals can be integrated into the analysis. In other
words, a molecule that is not known to have a relationship with any other
molecule cannot be incorporated into a network. In addition, microarrays
frequently include multiple Gene IDs that correspond to the same gene and
although these would all be mapped, they refer to a single gene. Finally, in some
cases this information may not reside in the Ingenuity Knowledge Base at this
time.
How may the number of Network Eligible molecules be increased?
The number of Network Eligible molecules may be increased by:
1. Expression Value Cutoff: Change the cutoff value to include more molecules.
2. Override: Annotate more molecules with an "X" in the Override column of the
input file.
3. Focus On: Include both upregulated and downregulated molecules instead of
only one or the other.
4. Absent: Specify fewer molecules that meet the cutoff as absent.
5. Additional IDs: Add additional identifiers to your input file that meet the cutoff
or override criteria.
What if my molecule-of-interest is not in the Ingenuity Knowledge Base?
IPA may provide other information about molecules that are not Network Eligible
or Functions/Pathways Eligible such as subcellular localization, tissue expression
and protein family membership. This information is available for all mapped
identifiers and is contained in the Gene View (for genes) or Chemical View (for
chemicals). Molecules that are not eligible for network generation or that are not
yet incorporated in the Ingenuity Knowledge Base may be added as custom
nodes to networks and pathways by using the Add Molecule feature.
Is there a cutoff for the number or size of networks generated in an
analysis?
In IPA, there are network size parameters that enable you to select the number of
molecules per network, and the number of networks that are returned.
You have the flexibility to build larger networks, consolidating key molecular
events and highlighting central regulatory molecules. You can also build smaller
networks if you prefer to home in on key events.
Within the Create Analysis page (Core, Tox, and Metabolomics Analysis),
you can select the size and number of networks you would like
generated for a dataset.
Networks containing 35, 70, or 140 molecules can now be
created.
The network generation algorithm will pull in molecules from your
dataset and the Ingenuity Knowledge Base based on molecular
relationships until it reaches the network size limit you have
specified.
Depending on the dataset, not all networks generated will have
the maximum number of molecules.
You can specify how many networks should be returned for an analysis.
If you choose to create networks with 35 molecules, you may
generate 10, 25, or 50 networks.
Note: 25 networks is the default setting for networks with 35
molecules.
If you choose to create networks with 70 molecules, you may
generate 10 or 25 networks.
Note: 10 networks is the default setting for networks with 70
molecules.
If you choose to create networks with 140 molecules, you may
generate 10 or 25 networks.
Note: 10 networks is the default setting for networks with 140
molecules.
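The size and count options listed above can be summarized as a small lookup table. The sketch below only restates those options in code. The names `NETWORK_OPTIONS` and `resolve_network_settings` are hypothetical and are not part of IPA.

```python
# Allowed number of networks and default, keyed by molecules per network.
NETWORK_OPTIONS = {
    35: {"allowed_counts": (10, 25, 50), "default": 25},
    70: {"allowed_counts": (10, 25), "default": 10},
    140: {"allowed_counts": (10, 25), "default": 10},
}


def resolve_network_settings(molecules_per_network, number_of_networks=None):
    """Return a valid (size, count) pair, falling back to the default count."""
    if molecules_per_network not in NETWORK_OPTIONS:
        raise ValueError("Network size must be 35, 70, or 140 molecules")
    options = NETWORK_OPTIONS[molecules_per_network]
    if number_of_networks is None:
        number_of_networks = options["default"]
    if number_of_networks not in options["allowed_counts"]:
        raise ValueError(
            f"{number_of_networks} networks is not an option for "
            f"{molecules_per_network}-molecule networks"
        )
    return molecules_per_network, number_of_networks
```

For example, `resolve_network_settings(35)` falls back to the default of 25 networks, while requesting 50 networks of 140 molecules raises an error because that combination is not offered.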
Are networks pre-computed and compared to the dataset or are they
generated de novo based on the input data?
Networks are generated de novo and are dependent upon the input data. For
example, by setting the cutoff value higher you can focus on networks which are
centered around the most differentially regulated genes in your dataset.
Network Generation Algorithm FAQs
What are the steps in the Network Generation Algorithm?
1. The user designates molecules of interest on the Create Analysis page before
running the analysis. Molecules of interest which interact with each other and
molecules in the Ingenuity Knowledge Base are identified as Network Eligible
Molecules. Network Eligible Molecules serve as "seeds" for generating networks.
2. Network Eligible Molecules are combined into networks that maximize their
specific connectivity, which is their interconnectedness with each other relative to
all molecules they are connected to in the Ingenuity Knowledge Base.
3. Additional molecules from the Ingenuity Knowledge Base are used to
specifically connect two or more smaller networks by merging them into a larger
one. Networks can be built with 35, 70, or 140 molecules each to keep them to a
usable size. (Note: You may select to include endogenous chemicals in networks
on the Create Analysis page. If this is deselected, only genes, RNA, or proteins
will be used for Network Generation.)
4. Networks are scored based on the number of Network Eligible Molecules they
contain. The higher the score, the lower the probability of finding the observed
number of Network Eligible Molecules in a given network by random chance.
For more details, see the IPA Network Generation Algorithm whitepaper.
How does the Network Generation Algorithm work if I have uploaded a list
of molecules without expression values?
IPA considers all Network Eligible Molecules on your list to be of equal
importance when generating networks for gene lists. Network Eligible Molecules
are uploaded genes that have interactions with other molecules in the Ingenuity
Knowledge Base. See "What are the steps in the Network Generation Algorithm?".
Are all relationships displayed in a network for a particular set of
molecules?
Networks show relevant relationships as specified by the Analysis Components
settings in the Create Analysis page when the analysis was run. The relationships
that are displayed are direct interactions (two molecules that make physical
contact with each other such as binding or phosphorylation) and indirect
interactions (do not require physical contact between the two molecules, such as
signaling events).
Some Findings present in the Ingenuity Knowledge Base are not used in the
network generation process such as localization, expression, and mutant
information. You may often find these additional relationships helpful and
biologically important. There are several ways to view additional relationships:
1. You can view the full complement of direct and indirect interactions for a gene
by double-clicking it to see its Node View summary and then clicking the
Neighborhood Explorer link. See the Mutant Information section of the Node View
for interactions involving functionally mutant forms of the gene.
2. You may add interactions to a network by clicking the Build button and using
the Grow (to add new molecules) or Connect tools after selecting molecules of
interest. Alternatively, you can add custom interactions by clicking the Draw
button and using the Add Relationship feature.
3. You may select multiple networks of interest in the Networks tab and then click
the Merge Networks button to combine them into one network which adds and
highlights all interactions between genes in different local networks.
How does IPA use my expression values in the Network Generation
Algorithm?
If you set a cutoff value in the Create Analysis page, IPA compares the
expression values of your genes to it to identify the Network Eligible Molecules.
Expression values are also used along with specific connectivity to prioritize
addition of molecules that are not Network Eligible into networks.
What is the "Score" for a network, how is it calculated, and how should I
interpret this?
The score is a numerical value used to rank networks according to their degree of
relevance to the Network Eligible Molecules in your dataset. The score takes into
account the number of Network Eligible Molecules in the network and its size, as
well as the total number of Network Eligible Molecules analyzed and the total
number of molecules in the Ingenuity Knowledge Base that could potentially be
included in networks. In the Networks view, networks are ordered according to
their score, with the highest scoring network displayed at the top of the page.
The network Score is based on the hypergeometric distribution and is calculated
with the right-tailed Fisher's Exact Test. The score is the negative base-10
logarithm of the test result: Score = -log10(Fisher's Exact Test result).
For example, suppose that a network of 35 molecules has a Fisher's Exact
Test result of 1x10^-6. The network's Score = -log10(1x10^-6) = 6.
This can be interpreted as, "There is a 1 in a million chance of getting a network
containing at least the same number of Network Eligible molecules by chance
when randomly picking 35 genes that can be in networks from the Ingenuity
Knowledge Base".
The score is not an indication of the quality or biological relevance of the network;
it simply calculates the approximate "fit" between each network and your Network
Eligible Molecules.
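The score calculation can be sketched with the hypergeometric right tail, which is what a right-tailed Fisher's Exact Test computes for this kind of count. The mapping of the four quantities to function parameters is our reading of the description above, and the numbers in the example call are made up for illustration. This is not IPA's code.

```python
import math


def right_tail_p(total_pool, total_eligible, network_size, eligible_in_network):
    """P(at least eligible_in_network eligible molecules) when drawing
    network_size molecules at random, without replacement, from total_pool,
    of which total_eligible are Network Eligible."""
    upper = min(network_size, total_eligible)
    favorable = sum(
        math.comb(total_eligible, i)
        * math.comb(total_pool - total_eligible, network_size - i)
        for i in range(eligible_in_network, upper + 1)
    )
    return favorable / math.comb(total_pool, network_size)


def network_score(total_pool, total_eligible, network_size, eligible_in_network):
    # Score = -log10(p), so p = 1x10^-6 gives a score of 6.
    p = right_tail_p(total_pool, total_eligible, network_size, eligible_in_network)
    return -math.log10(p)


# Hypothetical numbers: 10,000 molecules that could be in networks, 200 of them
# Network Eligible, and a 35-molecule network containing 10 eligible molecules.
example_score = network_score(10_000, 200, 35, 10)
```

With the other quantities fixed, a network that contains more Network Eligible Molecules yields a smaller p-value and therefore a higher score.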
How do "hub" molecules affect the Network Generation Algorithm?
The network generation algorithm optimizes for specific connectivity, so when a
"hub" molecule is included in a network it connects a higher fraction of Network
Eligible molecules relative to all genes the "hub" gene is connected to.
Biologically, many such "hub" genes often exist in multiple protein complexes and
this is represented in networks as many genes connected to the "hub" gene
rather than many protein complex nodes.
If you see a "hub" molecule that is not a Network Eligible Molecule and believe it
is unlikely to be present or active in your biological context, we recommend
flagging the gene as Absent in your input file, which will cause the algorithm to
exclude it. Additionally, "hub" molecules often have many indirect molecular
signaling effects, so running analyses with direct interactions only also makes
hub genes less likely to appear, because they are then included only where they
directly physically interact with Network Eligible molecules.
One way to see the specific connectivity significance of a hub molecule is to add
it to a new MyPathway. Use the Grow function with the same relationship types
(e.g. direct and indirect interactions or direct only) and molecule types as in the
analysis to get the total number of molecules. Then overlay the expression values
from the analysis to determine the number of nodes that are Network Eligible
molecules.
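The manual check described above amounts to computing a simple fraction: the number of a hub's neighbors that are Network Eligible, divided by the hub's total number of neighbors. The sketch below shows that arithmetic. The function name and the example molecule names are hypothetical, and the exact quantity the algorithm optimizes may be defined differently.

```python
def hub_specific_connectivity(neighbors, network_eligible):
    """Fraction of a hub's neighbors that are Network Eligible.

    neighbors: molecules returned by growing out from the hub.
    network_eligible: Network Eligible molecules from the analysis.
    """
    neighbors = set(neighbors)
    if not neighbors:
        return 0.0
    eligible_neighbors = neighbors & set(network_eligible)
    return len(eligible_neighbors) / len(neighbors)


# Hypothetical example: the hub has four neighbors, two of them Network Eligible.
fraction = hub_specific_connectivity(
    ["MOL_A", "MOL_B", "MOL_C", "MOL_D"],
    ["MOL_A", "MOL_B", "MOL_Z"],
)
```

A hub with very many neighbors and few Network Eligible ones scores low on this measure, which is consistent with the algorithm favoring hubs that are specifically connected to your molecules.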
Why do I get significant network scores when I submit a random list of
molecules into IPA?
The purpose of IPA's network generation algorithm is to find networks of highly
connected Network Eligible molecules. If you submit molecules chosen at
random, IPA will still do its best to bring as many Network Eligible Molecules into
a single network as possible, because it assumes that these molecules have
some interest to you. If the number of Network Eligible Molecules is large,
resulting networks can often receive a high score because of the generally high
interconnectivity of molecules in the Ingenuity Knowledge Base.
When looking at the List of Networks generated for a random list of molecules,
the highest-scoring network typically has a lower score than that of a non-random list.
Additionally, the distribution of network scores (e.g. when comparing sorted order)
typically is lower and falls off more sharply for random networks relative to actual
data (see the algorithm whitepaper).
It's important to keep the biological context in mind when evaluating the networks
since the goal of the algorithm is to come up with the best hypotheses it can
about how the Network Eligible Molecules may be interacting biologically, using
the principle of specific connectivity. The algorithm assumes there are some
biological commonalities and attempts to identify and highlight them for you. The
score and the processes associated with each network are intended to help you
identify the most striking and relevant networks given your biological context.
The network score is not intended to prove that a particular network represents
what is happening in a biological system. It only indicates that the network is rare
relative to all hypotheses the algorithm could come up with.
References
1.) For more information on the IPA Network Generation Algorithm, see the IPA
Network Generation Algorithm whitepaper.
2.) For a peer-reviewed explanation of the IPA Network Generation Algorithm,
see:
Calvano SE, Xiao W, Richards DR, Felciano RM, Baker HV, Cho RJ, Chen RO,
Brownstein BH, Cobb JP, Tschoeke SK, Miller-Graziano C, Moldawer LL,
Mindrinos MN, Davis RW, Tompkins RG, Lowry SF; Inflammation and Host Response
to Injury Large Scale Collaborative Research Program. "A network-based analysis
of systemic inflammation in humans". Nature. 2005 Oct 13;437(7061):1032-7.
PMID: 16136080.