|
1
|
|
|
2
|
|
|
3
|
|
|
4
|
- Nucleic acid hybridization/antibody-based methodology for expression or mapping studies
- High-throughput screening method
- Utilizes very large collections of substrate-immobilized DNA or proteins
- Large-scale, concurrent survey of
- Gene/protein expression changes
- Gene expression/protein profiling
- Genomic variation
- Single nucleotide polymorphism detection/haplotype analysis (HapMap)
- Requires bioinformatic approaches for data management and analysis
|
|
5
|
- Molecular description of a cell (system) at ultra high resolution
- Transcriptome Mapping
- Molecular Finger Printing
- Genetic Networks
- Transcriptional dependencies
- Inferential Pathway Analysis
- Signal transduction targets
- Systems biology tool
|
|
6
|
- Probe = spotted DNA
- Target = labeled sample
- Feature = Probe
- Pitch = center to center spacing of probes
- Feature density = total number of probes
- Probes are not the same as genes
- Unigene ID = EST to EST association
- Entrez ID = sequence associated with genomic locus
- GO ID = functionally annotated genes
|
|
7
|
|
|
8
|
|
|
9
|
- Arrayer
- Substrates
- Membranes (33P)
- Glass (Fluorescence)
- Beads (Fluorescence)
- Probe sets
- Oligonucleotides
- EST/cDNA
- Scanner
- RNA
|
|
10
|
- Substrate linked Synthesis (Glass, beads)
- Affymetrix, Nimblegen (fluorescence)
- Light directed synthesis
- oligonucleotides
- Agilent (fluorescence)
- Spatially directed fluidics based synthesis
- oligonucleotides
- Illumina, Luminex (fluorescence)
- Bead linked synthesis
- oligonucleotides
- Contact Printing (glass, membranes)
- cDNA (fluorescence, 33P)
- Oligonucleotides (fluorescence, 33P)
- Proteins (antibodies, fluorescence)
|
|
11
|
|
|
12
|
- Oligos: designed based on sequence data
- ORFs: PCR primers designed based on sequence data, often using tailed primers
- cDNA: clones from various clone collections, PCR amplified and purified
|
|
13
|
- Short oligonucleotides
- 25-mer (Affymetrix)
- 50-mer (Nimblegen)
- 40-1300k DNA oligos on a ~2.5 cm² glass surface
- Expression arrays
- Human, Mouse, Rat, Yeast, E. coli, Drosophila, C. elegans, Dog, Soybean, Plasmodium/Anopheles, Pseudomonas, Arabidopsis, Zebrafish, Xenopus, etc.
- DNA analysis arrays
- Resequencing, SNP analysis, LOH
- Custom arrays
|
|
14
|
- The manufacturing of GeneChip® probe arrays combines photolithography and combinatorial chemistry. See: http://www.Affymetrix.com/Technology.html
|
|
15
|
|
|
16
|
- Inkjet technology: long (60-mer) oligonucleotide arrays
- Arrays include
- Expression arrays (human, mouse, rat, Arabidopsis, rice, Magnaporthe and yeast)
- Promoter arrays (human and mouse)
- Custom arrays (rapid turnaround, 8.4K or 22K feature sizes, up to 8 fields per slide)
|
|
17
|
|
|
18
|
|
|
19
|
|
|
20
|
|
|
21
|
|
|
22
|
- 50 base gene-specific Probe linked to 23 base Address
- Hybridized to labeled nucleic acid made from total RNA
- Each bead is coated with the same probe oligo (hundreds of thousands of copies)
- ~30 copies of each bead type per array
|
|
23
|
|
|
24
|
|
|
25
|
|
|
26
|
|
|
27
|
|
|
28
|
|
|
29
|
|
|
30
|
|
|
31
|
- Arrays
- Feature Consistency
- Spotting Volume
- Pin characteristics
- Substrate
- Homogeneity
- DNA binding capacity
- Environment
- Dust
- Humidity
- Aerosols
- Light
- Temperature
- Oxidants
- RNA
- Purity → efficiency of cDNA synthesis
- Integrity → length of FSR transcript
- Labeling
|
|
32
|
|
|
33
|
|
|
34
|
- Experimental Design
- What is the biological question, i.e. what comparisons should be made?
- What is the type of biological comparisons?
- How is sample complexity controlled?
- How many biological replicates are required?
- Data Analysis
- How will differentially expressed genes be identified?
- How will errors be estimated?
|
|
35
|
- Most integrated and optimized:
- Commercial Software
- SAS, SPSS, S-Plus (general)
- Spotfire, GeneSight, GeneSpring (specific)
- Custom
- TM4, BAMarray, Powerarray, ClusFavor etc. (specific)
- Main Issues
- Proprietary
- Hidden data handling
- Most versatile, most recent and transparent data handling:
- Open Source
- BioConductor/R
- SAM
- SPH/EB-arrays
- LIMMA & Tcl GUI
- R/MANOVA & JAVA GUI
- D-Chip & C++ GUI
- Main Issues
- Data import
- File format
- Requires programmer’s support
|
|
36
|
|
|
37
|
|
|
38
|
|
|
39
|
|
|
40
|
- How to remove systematic biases
|
|
41
|
|
|
42
|
- None
- DNA vs Substrate
- No Imputation/Offset
- Local
- Negative Signal Intensities likely
- Imputation/Offset required
- Global
- Negative Signal Intensities likely
- Imputation/Offset required
- Moving Minimum
- 3x3 spot average background
- Negative Signal Intensities likely
- Imputation/Offset required
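The trade-off listed above can be sketched in a few lines. This is a minimal illustration, not the method of any particular package: the function names, the use of the median as the global background level, and the offset rule (shift everything so the smallest signal equals a chosen floor) are all assumptions made for the example. The moving-minimum variant is not shown.

```python
from statistics import median

def subtract_local(foreground, background):
    """Local correction: each spot's own background is subtracted."""
    return [f - b for f, b in zip(foreground, background)]

def subtract_global(foreground, background):
    """Global correction: one background level (here the median, an
    assumption) is subtracted from every spot."""
    level = median(background)
    return [f - level for f in foreground]

def apply_offset(signals, floor=1.0):
    """Shift all signals by one constant so that the smallest equals
    `floor`; no shift is applied if every signal is already >= floor."""
    shift = max(0.0, floor - min(signals))
    return [s + shift for s in signals]

# Hypothetical intensities for three spots.
foreground = [120, 80, 300]
background = [100, 95, 110]

local = subtract_local(foreground, background)    # second spot goes negative
global_ = subtract_global(foreground, background)
usable = apply_offset(local)                      # all values are now positive
```

The negative value for the second spot is why an offset or imputation step is needed before taking logarithms.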
|
|
43
|
|
|
44
|
|
|
45
|
- Intra Array
- Intensity dependent
- Model based Stat Tests (MAANOVA)
|
|
46
|
- Intra/Inter Array
- Factor based
- Removal of High Intensity Outliers
- Standard Stat Tests (SAM, SPH, LIMMA)
|
|
47
|
- No data adjustment, however sophisticated or painstaking, can convert low-quality data into high-quality data
- Data adjustment always removes a part of the biology
- Use it as sparingly as possible!
|
|
48
|
- How to select differentially expressed genes
|
|
49
|
- Degree of Regulation
- Small changes
- Less reproducible
- Few genes @ Standard Significance threshold
- More replicates required
- Large Changes
- More reproducible
- Many genes @ Standard Significance threshold
- Fewer replicates required
- But:
- Effect size is gene dependent (Transcriptome Survey)
- Data preparation and Statistical Analysis
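The link between effect size and replicate number can be made concrete with a textbook sample-size approximation. The sketch below assumes a two-group comparison of log2 expression values, a normal approximation, and a known per-gene standard deviation; the function name and the default alpha and power are illustrative choices, not values from the slides.

```python
from math import ceil
from statistics import NormalDist

def replicates_per_group(log2_fold_change, sd, alpha=0.05, power=0.8):
    """Approximate replicates per group needed to detect a given log2
    fold change with a two-sided test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sd / log2_fold_change) ** 2
    return ceil(n)

# Same hypothetical variability (sd = 0.5), two different effect sizes.
large_change = replicates_per_group(1.0, 0.5)   # 2-fold change
small_change = replicates_per_group(0.5, 0.5)   # ~1.4-fold change
```

Halving the effect size roughly quadruples the required replicates. Because the effect size differs from gene to gene, any single replicate number is a compromise across the transcriptome.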
|
|
50
|
|
|
51
|
- 11: ratios from A vs B comparison; replicate 1
- 12: ratios from A vs B comparison; replicate 2
- 13: ratios from B vs A comparison; replicate 3
- 15: ratios from B vs A comparison; replicate 4
- Concordance coeff.: 0.947-0.961
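The slides do not say which concordance coefficient was computed. As a sketch, assuming it is Lin's concordance correlation coefficient on log ratios, replicate agreement could be checked as follows. The dye-swapped (B vs A) log ratios are negated first so that all replicates point in the same direction; the data values are made up for illustration.

```python
from statistics import fmean

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two replicates
    (population-style moments, i.e. division by n)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical log2 ratios for four genes.
a_vs_b = [1.0, -0.5, 2.0, 0.0]        # A vs B replicate
b_vs_a = [-1.0, 0.5, -2.0, 0.0]       # dye-swapped B vs A replicate

flipped = [-v for v in b_vs_a]        # re-orient the dye swap to A vs B
agreement = concordance_ccc(a_vs_b, flipped)
```

Unlike a plain correlation, this coefficient also penalizes a constant shift between replicates, which is why it is a reasonable check for residual dye bias.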
|
|
52
|
|
|
53
|
|
|
54
|
|
|
55
|
|
|
56
|
|
|
57
|
|
|
58
|
|
|
59
|
|
|
60
|
|
|
61
|
|
|
62
|
|
|
63
|
- When diffg and its standard error are both small, the test statistic can still be big (and p then becomes very small), so gene g is more likely to be declared a DE gene.
- This method is biased in favor of selecting genes with a small diffg and a small standard error.
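One way to see this bias, assuming the statistic in question is an ordinary two-sample t-statistic, is to compare a gene with a tiny but very consistent difference against a gene with a large but noisier difference. The sketch also shows the SAM-style remedy of adding a small constant `s0` to the denominator. The expression values and the choice `s0 = 0.1` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def t_like(group_a, group_b, s0=0.0):
    """Pooled-variance two-sample t-statistic; s0 > 0 gives a SAM-style
    statistic with a fudge constant added to the standard error."""
    na, nb = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    pooled = ((na - 1) * variance(group_a) +
              (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = sqrt(pooled * (1 / na + 1 / nb))
    return diff / (se + s0)

# Hypothetical log expression values, three replicates per condition.
quiet_a, quiet_b = [0.10, 0.11, 0.10], [0.05, 0.05, 0.06]   # tiny change
strong_a, strong_b = [2.0, 3.0, 2.5], [1.0, 1.6, 0.9]       # large change

plain_quiet = t_like(quiet_a, quiet_b)
plain_strong = t_like(strong_a, strong_b)
sam_quiet = t_like(quiet_a, quiet_b, s0=0.1)
sam_strong = t_like(strong_a, strong_b, s0=0.1)
```

With the plain statistic the "quiet" gene outranks the gene with the much larger change; with the added constant the ranking reverses.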
|
|
64
|
|
|
65
|
|
|
66
|
|
|
67
|
- T-tests (SAM)
- High accuracy data
- Minimal preprocessing artifacts
- Level1 Model (LIMMA, MAANOVA)
- Gene specific
- Yobs = f(Ytrue)
- Yobs = g(Yarray) + g(Ydye) + g(Ybatch) + … + g(Ytreatment)
- Level2 Model (SPH, EBayes)
- Population specific
- Yobs =f(Yup) + f(Ydown) + f(Ynot)
- Borrow and share information appropriately for better estimates
|
|
68
|
- Ability to model various sources of variability:
- Level 1 model: Gene Specific Model
- Model the observed log intensities as a function of the unknown true log
intensity.
- detailed modeling of experimental variability: within array, between
array, estimation of gene specific variability …
- Level 2 model: Population Average Model
- All unknown quantities are given prior distributions
- Building of all these features into a common model
- Ability to borrow and share information in appropriate ways to get
better estimates
|
|
69
|
|
|
70
|
- By computing the Analysis of Variance (ANOVA), we can mathematically
estimate the different sources of variation and systematically detect
treatment effects in the data.
- The real question is: which genes are differentially expressed between
the samples? In our framework, we ask which variety-by-gene (VG) effects
are statistically significant.
- Because the model itself accounts for these sources of variation, certain preprocessing steps can be omitted.
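For orientation, one commonly cited form of the two-colour microarray ANOVA model is written out below. The slides do not show the exact model used, so the choice of terms and the index names are assumptions: i indexes arrays, j dyes, k varieties (treatments) and g genes.

```latex
y_{ijkg} = \mu + A_i + D_j + V_k + G_g
         + (AG)_{ig} + (DG)_{jg} + (VG)_{kg} + \epsilon_{ijkg}
```

Here y is the log intensity, A, D, V and G are the overall array, dye, variety and gene effects, (AG) and (DG) capture spot and gene-specific dye effects, and the (VG) terms are the variety-by-gene interactions whose significance is tested.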
|
|
71
|
|
|
72
|
- F1g measures the gth gene treatment effect using the gene-specific variance estimator.
- F3g measures the gth gene treatment effect using the pooled variance estimator.
- F2g measures the gth gene treatment effect using both the gene-specific variance estimator and the pooled variance estimator with equal weight.
- Fsg measures the gth gene treatment effect using a shrinkage variance estimator.
- Fs is the most robust and usually the most powerful (Cui, Churchill et al.)
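The difference between the first three statistics lies only in the denominator. The sketch below assumes a simple one-way layout (one treatment factor, no array or dye terms) and takes the pooled variance as the plain average of the gene-specific variances; both are simplifications for illustration. Fs is left out because the slides do not give the shrinkage estimator.

```python
from statistics import fmean

def treatment_ms_and_variance(groups):
    """Return (treatment mean square, gene-specific error variance)
    for one gene; `groups` holds one list of values per treatment."""
    values = [v for g in groups for v in g]
    grand = fmean(values)
    k, n = len(groups), len(values)
    ms_trt = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups) / (k - 1)
    var_g = sum((v - fmean(g)) ** 2 for g in groups for v in g) / (n - k)
    return ms_trt, var_g

def f_statistics(genes):
    """Compute F1, F2 and F3 for every gene in a {name: groups} dict.
    Assumes every gene has a non-zero error variance."""
    parts = {name: treatment_ms_and_variance(g) for name, g in genes.items()}
    pooled = fmean(var for _, var in parts.values())   # shared across genes
    return {
        name: {
            "F1": ms / var,                   # gene-specific variance
            "F2": ms / ((var + pooled) / 2),  # equal-weight average
            "F3": ms / pooled,                # pooled variance
        }
        for name, (ms, var) in parts.items()
    }

# Hypothetical data: two genes, two treatments, three replicates each.
genes = {
    "quiet": [[0.10, 0.11, 0.10], [0.05, 0.05, 0.06]],
    "strong": [[2.0, 3.0, 2.5], [1.0, 1.6, 0.9]],
}
result = f_statistics(genes)
```

In this toy data F1 ranks the low-variance "quiet" gene first, F3 ranks the "strong" gene first, and F2 always falls between the two.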
|
|
73
|
|
|
74
|
- A ‘volcano’ plot provides a graphical summary of the simultaneous
results from all four F-tests.
- On the plot, the y-axis value is -log10(P-value) for the F1 test. The
x-axis value is proportional to the fold changes.
- A horizontal line represents the significance threshold of the F1 test.
- Blue dots: EE genes
- Green dots: F3
- Orange dots: Fs
- Red dots: F2
- (In the example graph, F2 tests were not run.)
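The coordinates of such a plot are simple to compute. As a minimal sketch, assuming the x-axis is the log2 fold change (one common choice for an axis "proportional to the fold changes") and the threshold is a fixed alpha, each gene maps to a point as follows; the function names and the default alpha are illustrative.

```python
from math import log2, log10

def volcano_point(fold_change, p_value):
    """Map one gene to (x, y) = (log2 fold change, -log10 p-value)."""
    return log2(fold_change), -log10(p_value)

def above_threshold(p_value, alpha=0.05):
    """True if the gene lies above the horizontal significance line."""
    return -log10(p_value) > -log10(alpha)

# Hypothetical gene: 4-fold up-regulated with p = 0.001.
x, y = volcano_point(4.0, 0.001)
```

Genes far from zero on the x-axis and high on the y-axis are both strongly changed and statistically significant.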
|
|
75
|
- Limited use of Preprocessing techniques
- Per gene estimation of factors contributing to variance
- permits random effects
- Systematic detection of treatment effects
- But: parametric model
|
|
76
|
- Similar to MAANOVA
- Linear model of factors (treatment, dye, etc.)
- Factors are independent (no interaction)
- Requires a Contrast Matrix
- Uses either:
- Log-odds (B) statistics (large B → DE gene)
- Bg = log(odds ratio) = log(odds of gene g being DE versus EE)
- Moderated t-statistics (large T → DE gene)
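The two statistics can be written out as follows. This follows the standard empirical Bayes formulation behind LIMMA; the symbols are not taken from the slides. Here $\hat\beta_g$ is the estimated contrast for gene g, $s_g^2$ its residual variance on $d_g$ degrees of freedom, $v_g$ the unscaled variance factor from the design, and $s_0^2$, $d_0$ the prior variance and prior degrees of freedom estimated from all genes.

```latex
B_g = \log \frac{P(\text{gene } g \text{ is DE} \mid \text{data})}
                {P(\text{gene } g \text{ is EE} \mid \text{data})}
\qquad
\tilde{t}_g = \frac{\hat\beta_g}{\tilde{s}_g \sqrt{v_g}},
\qquad
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}
```

The moderated variance $\tilde{s}_g^2$ pulls each gene's variance toward the common prior value, which protects against genes that look significant only because their estimated variance happens to be very small.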
|
|
77
|
|
|
78
|
- gene g is equivalently expressed (EE)
- gene g is differentially expressed (DE)
|
|
79
|
|
|
80
|
|
|
81
|
|
|
82
|
|
|
83
|
|
|
84
|
|
|
85
|
|
|
86
|
|
|
87
|
|
|
88
|
|