Suggested TMT processing pipeline for PNNL DMS *phospho* data for MoTrPAC project. Peptide level.
Source:vignettes/tmt_pipeline_for_PNNL_DMS_phos_pep_MoTrPAC.Rmd
tmt_pipeline_for_PNNL_DMS_phos_pep_MoTrPAC.Rmd
PlexedPiper
is the main library for the pipeline.
MSnID
package is a flexible way of handling MS/MS
identifications. It handles MS/MS data filtering part.
library(PlexedPiper)
library(PNNL.DMS.utils)
library(MSnID)
library(dplyr)
library(Biostrings)
Data source
This vignette demonstrates processing usign PNNL’s DMS as a data
source. PNNL DMS is based on MS SQL Server. Typically data is organized
into data packages according to experiments. Data package that contains
phospho data mirroring PlexedPiperTestData
is number
3626
. Smaller version of the data package, suggested for
testing purposes is 3625
.
data_package_num <- 3625
# if there is no connection to PNNL DMS, vignette is not compiled
if(!is_PNNL_DMS_connection_successful()){
message("There is no connection to PNNL DMS. This code in this vignette won't be evaluated.")
knitr::opts_chunk$set(eval = FALSE)
}
Prepare MS/MS IDs
Read the MS-GF+ output
First step is to determine MS-GF+ jobs using the data package number.
This simply reads parsed to text output of MS-GF+ search engine. The
text files are collated together and the resulting
data.frame
used to create MSnID object.
msnid <- read_msgf_data_from_DMS(data_package_num)
show(msnid)
AScore
Phospho dataset involve Ascore jobs for improving phospho-site
localization. There should be one Ascore job per data package. The
fetched object is a data.frame
that links datasets, scans
and original PTM localization to newly suggested locations. Importantly
it contains AScore
column that signifies the confidence of
PTM assingment. AScore > 17
is considered confident.
ascore <- read_AScore_results_from_DMS(data_package_num)
msnid <- best_PTM_location_by_ascore(msnid, ascore)
Remove non-phospho. Need to be sure that phospho symbol is *!
msnid <- apply_filter(msnid, "grepl(\"\\\\*\", peptide)")
Inference of parsimonius set
msnid <- infer_parsimonious_accessions(msnid)
Mapping sites to protein sequence
Mapping sites to protein sequence. Call creates number of columns
describing mapping of the site/s onto the protein sequences. The most
important for the user is SiteID
.
Prepare FASTA to make sure entry names in FASTA file match
MSnID
accessions. The plan is to make this conversion
automatic. Note, this type of ID extraction will ignore non-RefSeq
entries such as contaminants.
fst_dms_pth <- path_to_FASTA_used_by_DMS(data_package_num)
fst <- readAAStringSet(fst_dms_pth)
names(fst) <- gsub(" .*", "", names(fst)) # make names match accessions
Mapping main call.
msnid <- map_mod_sites(msnid, fst,
accession_col = "accession",
peptide_mod_col = "peptide",
mod_char = "*",
site_delimiter = "lower")
head(psms(msnid))
Prepare MASIC reporter ion intensities
Fetching and preparing reporter intensities based on MASIC ouput.
masic_data <- read_masic_data_from_DMS(data_package_num,
interference_score = T)
masic_data <- filter_masic_data(masic_data, 0.5, 0)
Fetch study design tables
study_design_tables <- read_study_design_from_DMS(data_package_num)
fractions <- study_design_tables$fractions
samples <- study_design_tables$samples
references <- study_design_tables$references
Create cross-tab
Aggregation at RefSeq / peptide level.
msnid <- apply_filter(msnid, "!isDecoy")
# aggregation_level <- c("SiteID")
aggregation_level <- c("accession","peptide")
quant_cross_tab <- create_crosstab(msnid,
masic_data,
aggregation_level,
fractions, samples, references)
dim(quant_cross_tab)
head(quant_cross_tab)
Post-processing. Linking with AScores
x_ascore <-
psms(msnid) %>%
group_by(accession, peptide, SiteID) %>%
summarise(maxAScore = max(maxAScore))
x <- quant_cross_tab %>%
as.data.frame() %>%
tibble::rownames_to_column("ID") %>%
mutate(RefSeq = sub("(.*)@(.\\..*\\..)","\\1",ID),
peptide = sub("(.*)@(.\\..*\\..)","\\2",ID)) %>%
dplyr::select(-ID) %>%
inner_join(x_ascore) %>%
mutate(ConfidentSite = maxAScore > 17,
peptide = gsub(".\\.(.*)\\..","\\1",peptide, perl=T),
peptide = gsub("(\\w)\\*","\\L\\1",peptide, perl=T),
SiteID = sub("-", "_", SiteID),
ptm_peptide = paste0(SiteID, "-", peptide)) %>%
dplyr::rename(ptm_id = SiteID,
ascore = maxAScore,
confidence = ConfidentSite) %>%
dplyr::select(-c(accession, RefSeq)) %>%
dplyr::select(ptm_peptide, ptm_id, peptide, ascore, confidence, everything())
head(x)