
Count Protein Groups Based on a Probability Threshold
Source: R/get_count_protein.R (documented in get_count_protein.Rd)
This function filters a data frame by a probability threshold and counts the unique protein groups that pass it. Each sample is assigned to a "Group" extracted from its `Sample` name. The function counts the unique protein groups in each sample, averages these counts within each group, and can optionally generate a bar plot of the results.
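The counting logic described above can be sketched in base R. This is a hypothetical re-implementation for illustration, not the package's code: the name `count_protein_sketch` is made up, it returns a plain `data.frame` rather than a `tibble`, and it assumes the group name is everything after the first underscore in `Sample`.

```r
# Hypothetical sketch of the documented steps; not the package's implementation.
count_protein_sketch <- function(data, prob_threshold) {
  # Keep rows whose Probability is at or above the threshold (inclusive)
  kept <- data[data$Probability >= prob_threshold, , drop = FALSE]

  # Assumption: the group name is everything after the first "_" in Sample
  kept$Group <- sub("^[^_]*_", "", kept$Sample)

  # Count distinct protein groups within each sample
  per_sample <- aggregate(
    Protein_group ~ Sample + Group,
    data = kept,
    FUN = function(x) length(unique(x))
  )
  names(per_sample)[names(per_sample) == "Protein_group"] <- "Number_of_protein_group"

  # Average the per-sample counts within each group
  aggregate(Number_of_protein_group ~ Group, data = per_sample, FUN = mean)
}
```

One design consequence of this sketch: a sample with no rows at or above the threshold disappears from the table instead of contributing a count of zero to its group's mean. With the example data from the Examples section and `prob_threshold = 0.75`, samples p3 and p5 drop out this way. Whether the package behaves the same is not stated in this page.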
Usage
get_count_protein(data, prob_threshold, plot = c("bar_plot", "no"))
Arguments
- data
A data frame containing the data. It must have the columns `Sample`, `Protein_group`, and `Probability`. Values in `Sample` are expected in the format 'sampleID_groupName' (e.g., 'p1_control', 'p3_dose-1'); the groupName part defines the group the sample belongs to.
- prob_threshold
A numeric value between 0 and 1 giving the probability cutoff. Only rows (e.g., phosphosites) whose `Probability` is greater than or equal to this threshold are counted.
- plot
A character string specifying whether to generate a plot: `"bar_plot"` returns the summary table together with a bar plot, and `"no"` returns only the summary table. Defaults to `"bar_plot"`.
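To make the 'sampleID_groupName' convention for `Sample` concrete, the split into sample ID and group name can be sketched with base R regular expressions. This assumes the sample ID itself contains no underscore, so everything after the first `_` is the group name; the package's actual parsing rule is not spelled out on this page and may differ, for example for group names that contain underscores.

```r
# Assumption: split at the first "_"; hyphens in group names are preserved.
samples   <- c("p1_control", "p3_dose-1", "p5_dose-2")
sample_id <- sub("_.*$", "", samples)     # "p1" "p3" "p5"
group     <- sub("^[^_]*_", "", samples)  # "control" "dose-1" "dose-2"
```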
Value
If `plot` is `"no"`, the function returns a `tibble` with the mean `Number_of_protein_group` for each `Group`. If `plot` is `"bar_plot"`, it returns a `list` with two elements:
- `table`: the summary `tibble` with the mean count per group.
- `plot`: a `ggplot` object of the bar plot, ready to print.
The function stops with an error if any required column is missing or if `prob_threshold` is not a valid number.
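The error conditions above, together with the `plot = c("bar_plot", "no")` default shown under Usage, can be sketched as a small argument checker. The helper `check_count_protein_args` is hypothetical, its messages are made up, and the "between 0 and 1" range check is taken from the `prob_threshold` description; the package's real checks may differ. It also shows the common R idiom in which `match.arg()` returns the first choice, `"bar_plot"`, when `plot` is left at its default.

```r
# Hypothetical checks mirroring the documented error conditions.
check_count_protein_args <- function(data, prob_threshold,
                                     plot = c("bar_plot", "no")) {
  required <- c("Sample", "Protein_group", "Probability")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # A single, non-missing number in [0, 1]
  if (!is.numeric(prob_threshold) || length(prob_threshold) != 1 ||
      is.na(prob_threshold) || prob_threshold < 0 || prob_threshold > 1) {
    stop("`prob_threshold` must be a single number between 0 and 1.")
  }
  # Returns "bar_plot" for the default vector; errors on unknown values
  match.arg(plot)
}
```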
Details
The generated plot shows the mean count for each "Group" as a bar and includes jittered points representing the individual sample counts.
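A plot of this kind (mean bar per group plus jittered per-sample points) can be sketched with ggplot2. The per-sample counts below are made-up illustrative values, and the styling is an assumption; the package's own plot may use different geoms, colours, or labels.

```r
library(ggplot2)

# Made-up per-sample counts, for illustration only
per_sample <- data.frame(
  Sample = c("p1_control", "p2_control", "p3_dose-1", "p4_dose-1"),
  Group = c("control", "control", "dose-1", "dose-1"),
  Number_of_protein_group = c(120, 134, 98, 110)
)

p <- ggplot(per_sample, aes(x = Group, y = Number_of_protein_group)) +
  # Bar height = mean of the per-sample counts in each group
  stat_summary(fun = mean, geom = "bar", fill = "grey70") +
  # One jittered point per sample; height = 0 keeps y values exact
  geom_jitter(width = 0.15, height = 0) +
  labs(x = "Group", y = "Number of protein groups")
```

Setting `height = 0` in `geom_jitter()` spreads the points only horizontally, so each point still sits at its sample's exact count.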
Examples
if (FALSE) { # \dontrun{
# Create a data frame for demonstration
data <- data.frame(
  Sample = c("p1_control", "p2_control", "p3_dose-1", "p4_dose-1", "p5_dose-2", "p6_dose-2"),
  Protein_group = c("Protein_1", "Protein_2", "Protein_1",
                    "Protein_1", "Protein_3", "Protein_1"),
  Probability = c(0.95, 0.82, 0.65, 0.91, 0.73, 0.99)
)
# Get the summary table and the bar plot for a probability threshold of 0.75
results <- get_count_protein(data, prob_threshold = 0.75, plot = "bar_plot")
print(results$table)
print(results$plot)
# Get only the summary table for a probability threshold of 0.9
summary_table <- get_count_protein(data, prob_threshold = 0.9, plot = "no")
print(summary_table)
} # }