Figure 1.1 Typical phases involved in a data mining process model. 2
Figure 2.1 An example of the alignment of five biological sequences. Here "-" denotes the gap inserted between different residues. 13
Figure 3.1 Overview of the Motif-All algorithm. In the first phase, it finds frequent motifs from P to reduce the number of candidate motifs. In the second phase, it performs the significance testing procedure to report all statistically significant motifs to the user. 22
Figure 3.2 Overview of the C-Motif algorithm. The algorithm generates and tests candidate phosphorylation motifs in a breadth-first manner, where the support and the statistical significance values are evaluated simultaneously. 23
Figure 3.3 The calculation of conditional significance in C-Motif. In the figure, Sig(m, P(mi), N(mi)) denotes the new significance value of m on its ith submotif induced data sets. 23
Figure 4.1 An illustration of the training data construction methods for non-kinase-specific phosphorylation site prediction. Here the shadowed part denotes the set of phosphorylated proteins and the unshadowed area represents the set of unphosphorylated proteins. 30
Figure 4.2 An illustration of the training data construction methods for kinase-specific phosphorylation site prediction. The proteins are divided into three parts: (I) the set of proteins that are phosphorylated by the target kinase, (II) the set of proteins that are phosphorylated by the other kinases, and (III) the set of unphosphorylated proteins. 31
Figure 4.3 An illustration of the basic idea of the active learning procedure for phosphorylation site prediction. (a) The SVM classifier (solid line) generated from the original training data. (b) The new SVM classifier (dashed line) built from the enlarged training data. The enlarged training data are composed of the initial training data and a new labeled sample. 33
Figure 4.4 An overview of the PHOSFER method. The training data are constructed with peptides from both soybean and other organisms, in which different training peptides have different weights. The classifier (e.g., random forest) is built on the training data set to predict the phosphorylation status of remaining S/T/Y residues in the soybean organism. 34
Figure 5.1 The protein identification process. In shotgun proteomics, the protein identification procedure has two main steps: peptide identification and protein inference. 40
Figure 5.2 An overview of the BagReg method. It is composed of three major steps: feature extraction, prediction model construction, and prediction result combination. In feature extraction, the BagReg method generates five features that are highly correlated with the presence probabilities of proteins. In prediction model construction, five classification models are built and applied to predict the presence probability of proteins, respectively. In prediction result combination, the presence probabilities from different classification models are combined to obtain a consensus probability. 41
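The combination step in Figure 5.2 merges the five per-model presence probabilities into one consensus value. The caption does not fix the combination rule, so the sketch below uses simple averaging as an illustrative assumption (function and variable names are hypothetical):

```python
def combine_presence_probabilities(model_probs):
    """Combine per-model presence probabilities into a consensus score.

    model_probs: dict mapping a protein identifier to a list of
    probabilities, one per classification model (five in the BagReg
    setting).  Plain averaging is used here purely for illustration;
    the actual combination rule may differ.
    """
    return {protein: sum(probs) / len(probs)
            for protein, probs in model_probs.items()}
```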
Figure 5.3 The feature extraction process. Five features are extracted from the original input data for each protein: the number of matched peptides (MP), the number of unique peptides (UP), the number of matched spectra (MS), the maximal score of matched peptides (MSP), and the average score of matched peptides (AMP). 42
Figure 5.4 A single learning process. Each separate learning process accomplishes a typical supervised learning procedure. The model construction phase involves constructing the training set and learning the classification model. The prediction phase predicts the presence probabilities of all candidate proteins with the classifier obtained in the previous phase. 43
Figure 5.5 The basic idea of ProteinLasso. ProteinLasso formulates the protein inference problem as a minimization problem, where yi is the peptide probability, Di represents the vector of peptide detectabilities for the ith peptide, xj denotes the unknown protein probability of the jth protein, and λ is a user-specified parameter. This optimization problem is the well-known Lasso regression problem in statistics and data mining. 44
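Using the notation of the Figure 5.5 caption, the standard Lasso form of this minimization can be written as follows (the box constraint on the protein probabilities is an assumption added for clarity; since the $x_j$ are non-negative, the Lasso penalty $\lambda \|\mathbf{x}\|_1$ reduces to a plain sum):

$$
\min_{\mathbf{x}} \; \sum_i \left( y_i - \mathbf{D}_i^{\top} \mathbf{x} \right)^2 + \lambda \sum_j x_j,
\qquad 0 \le x_j \le 1,
$$

where $y_i$ is the probability of the $i$th peptide, $\mathbf{D}_i$ its detectability vector, $x_j$ the unknown probability of the $j$th protein, and $\lambda$ the user-specified regularization parameter.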
Figure 5.6 The target-decoy strategy for evaluating protein inference results. The MS/MS spectra are searched against the target-decoy database, and the identified proteins are sorted according to their scores or probabilities. The false discovery rate at a threshold can be estimated as the ratio of the number of decoy matches to that of target matches. 45
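The FDR estimate described in the Figure 5.6 caption is a simple counting rule. A minimal sketch, assuming each identification carries a score and a decoy flag from the concatenated target-decoy search (names are illustrative):

```python
def estimate_fdr(matches, threshold):
    """Estimate the FDR at a score threshold via the target-decoy strategy.

    matches: iterable of (score, is_decoy) pairs obtained by searching
    the MS/MS spectra against a concatenated target-decoy database.
    Returns #decoy / #target among matches scoring at or above the
    threshold, following the ratio described in the caption.
    """
    n_target = sum(1 for score, is_decoy in matches
                   if score >= threshold and not is_decoy)
    n_decoy = sum(1 for score, is_decoy in matches
                  if score >= threshold and is_decoy)
    return n_decoy / n_target if n_target else 0.0
```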
Figure 5.7 An overview of the decoy-free FDR estimation algorithm. 46
Figure 5.8 The correct and incorrect procedure for assessing the performance of protein inference algorithms. In model selection, we cannot use any ground truth information that should only be visible in the model assessment stage. Otherwise, we may overestimate the actual performance of inference algorithms. 47
Figure 6.1 A typical AP-MS workflow for constructing a PPI network. A typical AP-MS study performs a set of experiments on bait proteins of interest, with the goal of identifying their interaction partners. In each experiment, a bait protein is first tagged and expressed in the cell. Then, the bait protein and its potential interaction partners (prey proteins) are affinity purified. The resulting proteins (both bait and prey proteins) are digested into peptides and passed to a tandem mass spectrometer for analysis. Peptides are identified from the MS/MS spectra with peptide identification algorithms, and proteins are inferred from the identified peptides with protein inference algorithms. In addition, a label-free quantification method such as spectral counting is typically used to estimate the protein abundance in each experiment. The pull-down bait-prey data from all AP-MS runs are used to filter contaminants and construct the PPI network. 52
Figure 6.2 A sample AP-MS data set with six purifications. 54
Figure 6.3 The PPI network constructed from the sample data. Here DC is used as the correlation measure and the score threshold is 0.5, that is, a protein pair is considered to be a true interaction if the DC score is above 0.5. In the figure, the width of the edge that connects two proteins is proportional to the corresponding DC score. 55
Figure 6.4 An illustration of a database-free method for validating the interaction prediction results. Under the null hypothesis that each bait-prey capture is a random event, simulated data sets are generated such that they are comparable to the original one. Then, an empirical p-value can be calculated for each protein pair, representing the probability that its original interaction score would occur in the random data sets by chance. Finally, the false discovery rate is calculated from these p-values. 58
Figure 7.1 An example bait-prey graph. In this figure, each Bi (i = 1, 2, 3, 4) denotes a bait protein and each Pi (i = 1, 2, 3, 4, 5, 6) represents a prey protein. The score that measures interaction strength between a bait-prey pair is provided as well. 63
Figure 7.2 Three maximal bicliques are identified. Among these three bicliques, C1 and C2 are reliable and only C1 is finally reported as a protein-complex core. 63
Figure 7.3 The final protein complex by including both the protein complex core C1 and an attachment B3. 64
Figure 8.1 A typical data analysis pipeline for biomarker discovery from mass spectrometry data. In this workflow, there are three preprocessing steps: feature extraction, feature alignment, and feature transformation. After preprocessing the raw data, feature selection techniques are employed to identify a subset of features as the biomarker. 70
Figure 8.2 An illustration of feature transformation based on protein-protein interaction (PPI) information. The PPI information is used to find groups of correlated features in terms of proteins. These identified feature groups are transformed into a set of new features for biomarker...