Run a variety of probe and sample filters in tandem, then plot the results,
by specifying all of your options at once instead of running each part of methylcheck piecemeal.
This is analogous to using the methylcheck CLI, but for notebooks and scripts.
- df: (required)
- data as a DataFrame of beta values (or DataFrame of m_values)
- sample names in columns and probes in rows
- verbose: (True/False)
- default: False – shows extra info about processing if True
- silent: (True/False)
- default: False – suppresses all warnings/info
- filters out probes on sex-chromosomes
- filters out Illumina control probes
- filters out the most probes (sex-linked, control, and all sketchy-listed probes from papers)
- exclude: (list of strings; shorthand references to papers whose sketchy-probe lists should be excluded)
- If the array is 450K the publications may include:
'Chen2013' 'Price2013' 'Zhou2016' 'Naeem2014' 'DacaRoszak2015'
- If the array is EPIC the publications may include:
- or these reasons:
'Polymorphism' 'CrossHybridization' 'BaseColorChange' 'RepeatSequenceElements'
- or use
- to do maximum filtering, including all of these papers’ lists.
- plot: (list of strings)
- ['mean_beta_plot', 'beta_density_plot', 'cumulative_sum_beta_distribution', 'beta_mds_plot', 'all']
- If 'all', then all of these plots will be generated. If omitted, no plots are created.
- save_plots: (True|False)
- default: False
- export: (True|False)
- default: False – will export the filtered df as a pkl file if True
- This pipeline cannot also apply the array-level methylcheck.run_qc() function, because that function relies on additional probe information that may not be present. Everything in this pipeline operates on a DataFrame of beta or m-values for a set of samples.
- a filtered dataframe object
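
A minimal sketch of how the options above might be put together in a notebook or script. It assumes the function documented here is exposed as methylcheck.run_pipeline and accepts these options as keyword arguments; that name and calling convention are assumptions, not confirmed by the text above. The helpers make_toy_betas, build_options, and run_filtered, and the probe and sample names, are hypothetical and exist only for illustration. The toy frame only shows the expected layout (probes in rows, samples in columns); it is not passed to the pipeline.

```python
import numpy as np
import pandas as pd

# Plot names copied from the `plot` option described above.
KNOWN_PLOTS = {
    "mean_beta_plot",
    "beta_density_plot",
    "cumulative_sum_beta_distribution",
    "beta_mds_plot",
    "all",
}


def make_toy_betas(n_probes=5, n_samples=3, seed=0):
    """Build a toy beta-value frame: probes in rows, samples in columns."""
    rng = np.random.default_rng(seed)
    probes = [f"cg{i:08d}" for i in range(n_probes)]      # made-up probe IDs
    samples = [f"sample_{j}" for j in range(n_samples)]   # made-up sample names
    values = rng.uniform(0.0, 1.0, size=(n_probes, n_samples))
    return pd.DataFrame(values, index=probes, columns=samples)


def build_options(exclude=None, plot=None, **flags):
    """Hypothetical helper: collect options and sanity-check the plot names."""
    unknown = set(plot or []) - KNOWN_PLOTS
    if unknown:
        raise ValueError(f"unknown plot names: {sorted(unknown)}")
    options = dict(flags)
    if exclude:
        options["exclude"] = list(exclude)
    if plot:
        options["plot"] = list(plot)
    return options


def run_filtered(df, options):
    """Call the pipeline (assumed entry point: methylcheck.run_pipeline).

    methylcheck is imported lazily so the rest of this sketch runs without it.
    """
    import methylcheck
    return methylcheck.run_pipeline(df, **options)


toy = make_toy_betas()
options = build_options(
    exclude=["Chen2013", "Polymorphism"],  # shorthand names listed above
    plot=["beta_density_plot"],
    verbose=True,
    save_plots=False,
    export=False,
)

# With a real 450K beta-value DataFrame `df` (not the toy frame above):
# filtered = run_filtered(df, options)
```

Collecting the options in one dict keeps the call in a single place, which mirrors the "all options at once" intent of the pipeline; the plot-name check simply catches typos before any processing starts.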