Normalize Phosphoproteomics Data and Perform Differential Analysis with `limma`

This function filters, imputes, and median-scales raw phosphoproteomics intensity data using `PhosR` functions, then runs a differential phosphorylation analysis with the `limma` package. Optionally, it generates a volcano plot to visualize the results.

Usage

get_norm_phos(
  data,
  alpha = 0.5,
  beta = 0.7,
  plot = c("no", "volcano_plot"),
  value = c("p_value", "adj_p_value")
)

Arguments

data

A data frame containing phosphoproteomics quantification data. It is expected to have columns:

`protein_group`: Character, representing protein identifiers.
`amino_acid`: Character, representing the phosphorylated amino acid (e.g., "S", "T", "Y").
`site`: Numeric, representing the position of the phosphorylation site.
`modification_sites`: Character, a unique identifier for each phosphosite (e.g., "protein_S123"). This will be used as the row names for the output table.
Quantification columns, one per sample, named with a sample identifier followed by the suffix "_control" or "_treat" (e.g., "p1_control", "p7_control", "p5_treat", "p6_treat").
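
The expected layout can be verified before calling the function. The sketch below is illustrative only: `check_phos_input` is a hypothetical helper, not part of the package, and it assumes that sample columns are identified purely by the "_control" and "_treat" suffixes.

```r
# Hypothetical helper (not part of the package): checks that a data frame
# has the annotation and sample columns described above.
check_phos_input <- function(data) {
  required <- c("protein_group", "amino_acid", "site", "modification_sites")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # Assumption: sample columns end in "_control" or "_treat"
  control_cols <- grep("_control$", names(data), value = TRUE)
  treat_cols <- grep("_treat$", names(data), value = TRUE)
  if (length(control_cols) == 0 || length(treat_cols) == 0) {
    stop("Need at least one '_control' and one '_treat' column")
  }

  # Identifiers that end up as row names have to be unique
  if (anyDuplicated(data$modification_sites) > 0) {
    stop("`modification_sites` must be unique")
  }

  list(control = control_cols, treat = treat_cols)
}

# Toy input with one sample per condition
toy <- data.frame(
  protein_group = c("A", "B"),
  amino_acid = c("S", "T"),
  site = c(12, 34),
  modification_sites = c("A_S12", "B_T34"),
  p1_control = c(10.1, 9.8),
  p2_treat = c(11.3, 8.9)
)
cols <- check_phos_input(toy)
```

Running a check like this first turns a malformed input into a readable error instead of a failure deep inside the filtering or model-fitting steps.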

alpha

Numeric between 0 and 1, passed to `PhosR::selectGrps` as the filtering threshold: the minimum proportion of quantified (non-missing) values a phosphosite must have within a treatment group to be retained. Default is `0.5`.

beta

Numeric between 0 and 1, passed to `PhosR::scImpute`: the minimum proportion of quantified values a phosphosite must have within a condition for its remaining missing values in that condition to be imputed. Default is `0.7`.
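
As far as the `PhosR` documentation describes them, the `percent` arguments of `selectGrps` and `scImpute` are fractions of quantified (non-missing) values per treatment group. The base R sketch below only illustrates what such a fraction looks like on a toy matrix; it is not the `PhosR` implementation, and the object names (`mat`, `grps`, `quantified_prop`, `keep`) are made up for the example.

```r
# Toy intensity matrix: rows are phosphosites, columns are samples
mat <- matrix(
  c(20.1, NA,   19.8, 21.0, 20.7, 20.9,
    NA,   NA,   18.2, 17.9, NA,   NA,
    22.3, 22.1, 22.0, NA,   23.1, 22.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("site_1", "site_2", "site_3"),
    c("p1_control", "p2_control", "p3_control",
      "p4_treat", "p5_treat", "p6_treat")
  )
)

# Derive the condition label from the column suffix (assumed naming scheme)
grps <- sub("^.*_", "", colnames(mat))

# Fraction of non-missing values per site within each condition
quantified_prop <- sapply(unique(grps), function(g) {
  rowMeans(!is.na(mat[, grps == g, drop = FALSE]))
})

# Sites with at least 50% quantified values in at least one condition
keep <- apply(quantified_prop >= 0.5, 1, any)
```

In this toy matrix, `site_2` has only one of three values quantified in each condition, so it falls below a 0.5 threshold in both groups.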

plot

A character string specifying whether to generate a plot.

"no": No plot is generated; only the differential phosphorylation table is returned.
"volcano_plot": Generates a volcano plot to visualize differential phosphorylation.

value

A character string indicating which p-value to use for the y-axis of the volcano plot when `plot = "volcano_plot"`.

"p_value": Uses nominal p-values (`P.Value`) for the y-axis.
"adj_p_value": Uses adjusted p-values (`adj.P.Val`) for the y-axis.

This parameter is ignored if `plot = "no"`.
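
A volcano plot conventionally places the negative log10 of the chosen p-value on the y-axis. The sketch below shows one way the `value` argument could map to a `limma` result column; the `switch()` mapping and the mock table `tab` are assumptions made for illustration, not the function's confirmed internals.

```r
# Mock limma-style result table (values invented for illustration)
tab <- data.frame(
  logFC = c(1.8, -0.2, -2.1),
  P.Value = c(0.001, 0.40, 0.0005),
  adj.P.Val = c(0.003, 0.40, 0.0015)
)

# Assumed mapping from the `value` argument to a result column
value <- "adj_p_value"
p_col <- switch(value,
  p_value = "P.Value",
  adj_p_value = "adj.P.Val",
  stop("`value` must be 'p_value' or 'adj_p_value'")
)

# y-axis coordinate used in a conventional volcano plot
y <- -log10(tab[[p_col]])
```

Because adjusted p-values are never smaller than the nominal ones, choosing "adj_p_value" compresses the points toward the bottom of the plot.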

Value

If `plot = "no"`, returns a `data.frame` (tibble) containing the results of the `limma` differential analysis (e.g., `logFC`, `P.Value`, `adj.P.Val`). If `plot = "volcano_plot"`, returns a `list` containing:

`table`: The `data.frame` of differential phosphorylation results.
`plot`: A `ggplot` object of the generated volcano plot.
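
Downstream, the result table is typically filtered by fold change and adjusted p-value. A minimal sketch on a mock table follows; the cutoffs (absolute `logFC` above 1, `adj.P.Val` below 0.05) are common conventions rather than values prescribed by this function, and `res` merely stands in for the returned table.

```r
# Mock result table standing in for the function's output (values invented)
res <- data.frame(
  modification_sites = c("A_S12", "B_T34", "C_Y56", "D_S78"),
  logFC = c(1.8, -0.2, -2.1, 1.2),
  P.Value = c(0.001, 0.40, 0.0005, 0.03),
  adj.P.Val = c(0.004, 0.40, 0.002, 0.06)
)

# Keep sites passing both the effect-size and significance cutoffs
sig <- res[abs(res$logFC) > 1 & res$adj.P.Val < 0.05, ]

# Label the direction of change and sort by adjusted p-value
sig$direction <- ifelse(sig$logFC > 0, "up", "down")
sig <- sig[order(sig$adj.P.Val), ]
```

When `plot = "volcano_plot"`, the same filtering applies to the `table` element of the returned list.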

Examples

if (FALSE) { # \dontrun{
# Create a data frame that mimics omics data
# The column names must contain "control" and "treat"
set.seed(123) # make the simulated intensities reproducible
data <- tibble::tibble(
        modification_sites = paste0("site_", 1:10),
        protein_group = rep(LETTERS[1:5], 2),
        amino_acid = c(rep("S", 5), rep("T", 5)),
        site = 1:10,
        PTM.Group = paste0("group_", 1:10),
        p1_control = rnorm(10, mean = 100, sd = 10),
        p2_control = rnorm(10, mean = 95, sd = 15),
        p3_control = rnorm(10, mean = 105, sd = 12),
        p4_treat = c(rnorm(5, mean = 150, sd = 20), rnorm(5, mean = 50, sd = 10)),
        p5_treat = c(rnorm(5, mean = 145, sd = 18), rnorm(5, mean = 55, sd = 11)),
        p6_treat = c(rnorm(5, mean = 160, sd = 22), rnorm(5, mean = 60, sd = 13))
)

# Run the function to get the normalized data table
# We set plot = "no" to suppress the volcano plot
results_table <- get_norm_phos(
        data = data,
        plot = "no"
)

# Print the head of the results table
print(head(results_table))

# Run the function to get the results and the volcano plot
# We set plot = "volcano_plot" and value = "adj_p_value"
results_with_plot <- get_norm_phos(
        data = data,
        plot = "volcano_plot",
        value = "adj_p_value"
)

# The function returns a list, so we can access the plot and table separately
results_table <- results_with_plot$table
volcano_plot <- results_with_plot$plot

# Print the plot
print(volcano_plot)

# To save the plot to a file
# ggplot2::ggsave("volcano_plot.png", plot = volcano_plot, width = 8, height = 6, dpi = 300)
} # }