
This function calculates the absolute counts and percentages of phosphorylation sites (Serine (S), Threonine (T), Tyrosine (Y)) for each sample, based on a user-defined phosphorylation site probability cutoff. It can then optionally generate bar plots to visualize these counts or percentages across different sample groups.

Usage

get_sty(
  data,
  plot = c("no", "count", "percentage"),
  sample_name,
  group,
  ptm_site_prob_col,
  ptm_site_prob_val = 0.75,
  ptm_site_amino_acid,
  selected_group = NULL
)

Arguments

data

A data frame containing phosphoproteomics data. This data frame is expected to have columns for sample names, phosphorylation site probabilities, amino acid type of the phosphorylation site, and a grouping variable.

plot

A character string specifying the type of plot to generate.

  • "no": No plot is generated; only the summary table is returned.

  • "count": Generates bar plots showing the absolute counts of S, T, and Y phosphosites for each group, faceted by amino acid type.

  • "percentage": Generates bar plots showing the percentage distribution of S, T, and Y phosphosites within each group, faceted by amino acid type.

sample_name

<[`data-masking`][rlang::args_data_masking]> The unquoted name of the column in `data` that contains unique sample identifiers (e.g., `SampleID`).

group

<[`data-masking`][rlang::args_data_masking]> The unquoted name of the column in `data` that defines the experimental groups (e.g., "control", "treat").

ptm_site_prob_col

<[`data-masking`][rlang::args_data_masking]> The unquoted name of the column in `data` that contains phosphorylation site probabilities (e.g., `Localization_Probabilities`).

ptm_site_prob_val

Numeric, a cutoff value for `ptm_site_prob_col`. Only sites with a probability greater than or equal to this value will be included in the analysis. Default is `0.75`.
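The inclusive cutoff described above can be illustrated with a minimal base R sketch. The toy data frame `sites` and the variable `cutoff` are hypothetical, and this is not the internal code of `get_sty()`; it only shows that a site whose probability equals the cutoff is retained.

```r
# Hypothetical toy data; column names mirror those used in the Examples section.
sites <- data.frame(
  SampleID = c("S1", "S1", "S1", "S2"),
  Localization_Probabilities = c(0.74, 0.75, 0.90, 0.60),
  Amino_Acid = c("S", "T", "Y", "S")
)

cutoff <- 0.75

# Keep sites with probability greater than or equal to the cutoff,
# so the site at exactly 0.75 is kept and the one at 0.74 is dropped.
kept <- sites[sites$Localization_Probabilities >= cutoff, ]
kept
```

Raising `cutoff` makes the filter stricter and usually lowers the per-sample totals.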

ptm_site_amino_acid

<[`data-masking`][rlang::args_data_masking]> The unquoted name of the column in `data` that contains the amino acid type of the phosphorylation site (e.g., `Amino_Acid`). Values are expected to be "S", "T", or "Y".

selected_group

A character vector specifying which groups from the `group` column should be included in the plots. If omitted or `NULL`, all groups will be plotted. This parameter is ignored if `plot = "no"`.
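The `NULL` default described above can be pictured with a small sketch. The helper `keep_groups()` and the table `groups_tbl` are hypothetical names introduced for illustration; how `get_sty()` applies `selected_group` internally is an assumption here.

```r
# Hypothetical per-group table; not produced by get_sty().
groups_tbl <- data.frame(
  Group = c("Control", "Treatment", "Vehicle"),
  S = c(16, 19, 7)
)

# Hypothetical helper: NULL means "keep every group",
# otherwise keep only rows whose group was requested.
keep_groups <- function(tbl, selected_group = NULL) {
  if (is.null(selected_group)) {
    return(tbl)
  }
  tbl[tbl$Group %in% selected_group, , drop = FALSE]
}

keep_groups(groups_tbl)                              # all three groups
keep_groups(groups_tbl, c("Control", "Treatment"))   # two groups
```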

Value

If `plot = "no"`, returns a `data.frame` (tibble) summarizing the counts and percentages of S, T, Y phosphosites per sample. If `plot = "count"` or `plot = "percentage"`, returns a `list` containing:

  • `summary_table`: The `data.frame` of S/T/Y counts and percentages.

  • `plot`: A `ggplot` object of the generated bar plot.
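To make the shape of the summary table concrete, the following base R sketch builds a table with the same column names (`S`, `T`, `Y`, `STY`, `pct_S`, `pct_T`, `pct_Y`) from hypothetical toy data. It assumes that each percentage is the residue count divided by the per-sample total, which is consistent with the example output below, but it is not the package's own implementation.

```r
# Hypothetical sites that already passed the probability cutoff.
sites <- data.frame(
  SampleID = c("A", "A", "A", "A", "B", "B"),
  Amino_Acid = c("S", "S", "T", "Y", "S", "T")
)

# Fix the factor levels so a residue absent from a sample is counted as 0.
sites$Amino_Acid <- factor(sites$Amino_Acid, levels = c("S", "T", "Y"))

# Sample-by-residue counts and row-wise percentages.
counts <- table(sites$SampleID, sites$Amino_Acid)
pct <- 100 * prop.table(counts, margin = 1)

summary_sketch <- data.frame(
  SampleID = rownames(counts),
  S = as.vector(counts[, "S"]),
  T = as.vector(counts[, "T"]),
  Y = as.vector(counts[, "Y"]),
  STY = as.vector(rowSums(counts)),
  pct_S = as.vector(pct[, "S"]),
  pct_T = as.vector(pct[, "T"]),
  pct_Y = as.vector(pct[, "Y"])
)
summary_sketch
```

In this toy input, sample `B` has no tyrosine sites, so its `Y` and `pct_Y` entries are 0 rather than missing.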

Examples

# Create data for demonstration
set.seed(123)
phospho_data <- data.frame(
  SampleID = rep(c("S1_ctrl", "S2_ctrl", "S3_treat", "S4_treat"), each = 25),
  Group = rep(c("Control", "Control", "Treatment", "Treatment"), each = 25),
  Protein = paste0("Prot", 1:100),
  Localization_Probabilities = runif(100, 0.5, 0.99),
  Amino_Acid = sample(c("S", "T", "Y"), 100, replace = TRUE, prob = c(0.7, 0.2, 0.1))
)

# Case 1: Calculate S/T/Y distribution and return data only
sty_counts_df <- get_sty(
  data = phospho_data,
  plot = "no",
  sample_name = SampleID,
  group = Group,
  ptm_site_prob_col = Localization_Probabilities,
  ptm_site_prob_val = 0.75,
  ptm_site_amino_acid = Amino_Acid
)
print(sty_counts_df)
#> # A tibble: 4 × 9
#>   SampleID Group         S     Y     T   STY pct_S pct_T pct_Y
#>   <chr>    <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 S1_ctrl  Control      12     4     0    16  75     0    25  
#> 2 S2_ctrl  Control       4     1     4     9  44.4  44.4  11.1
#> 3 S3_treat Treatment    10     0     2    12  83.3  16.7   0  
#> 4 S4_treat Treatment     9     1     0    10  90     0    10  

# Case 2: Calculate S/T/Y distribution and generate count plots
# Define selected groups for plotting
selected_groups_for_plot <- c("Control", "Treatment")

sty_counts_plot <- get_sty(
  data = phospho_data,
  plot = "count",
  sample_name = SampleID,
  group = Group,
  ptm_site_prob_col = Localization_Probabilities,
  ptm_site_prob_val = 0.75,
  ptm_site_amino_acid = Amino_Acid,
  selected_group = selected_groups_for_plot
)
print(sty_counts_plot$summary_table)
#> # A tibble: 4 × 9
#>   SampleID Group         S     Y     T   STY pct_S pct_T pct_Y
#>   <chr>    <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 S1_ctrl  Control      12     4     0    16  75     0    25  
#> 2 S2_ctrl  Control       4     1     4     9  44.4  44.4  11.1
#> 3 S3_treat Treatment    10     0     2    12  83.3  16.7   0  
#> 4 S4_treat Treatment     9     1     0    10  90     0    10  
sty_counts_plot$plot # Display the count plot


# Case 3: Calculate S/T/Y distribution and generate percentage plots
sty_percentage_plot <- get_sty(
  data = phospho_data,
  plot = "percentage",
  sample_name = SampleID,
  group = Group,
  ptm_site_prob_col = Localization_Probabilities,
  ptm_site_prob_val = 0.75,
  ptm_site_amino_acid = Amino_Acid,
  selected_group = selected_groups_for_plot
)
print(sty_percentage_plot$summary_table)
#> # A tibble: 4 × 9
#>   SampleID Group         S     Y     T   STY pct_S pct_T pct_Y
#>   <chr>    <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 S1_ctrl  Control      12     4     0    16  75     0    25  
#> 2 S2_ctrl  Control       4     1     4     9  44.4  44.4  11.1
#> 3 S3_treat Treatment    10     0     2    12  83.3  16.7   0  
#> 4 S4_treat Treatment     9     1     0    10  90     0    10  
sty_percentage_plot$plot # Display the percentage plot