methylcheck.run_pipeline(df, **kwargs)

Run a variety of probe and sample filters in tandem, then plot the results, by specifying all of your options at once instead of running each part of methylcheck piecemeal.

This is analogous to using the methylcheck CLI, but for notebooks and scripts.

df: (required)
  • data as a DataFrame of beta values (or DataFrame of m_values)
  • sample names in columns and probes in rows
verbose: (True/False)
default: False – shows extra info about processing if True
silent: (True/False)
default: False – suppresses all warnings/info
exclude_sex:
filters out probes on sex chromosomes
exclude_control:
filters out Illumina control probes
exclude_all:
filters out the most probes (sex-linked, control, and all sketchy-listed probes from papers)

exclude: (list of strings, shorthand references to papers with sketchy probes to exclude)

If the array is 450K the publications may include:
'Chen2013' 'Price2013' 'Zhou2016' 'Naeem2014' 'DacaRoszak2015'
If the array is EPIC the publications may include:
'Zhou2016' 'McCartney2016'
or these reasons:
'Polymorphism' 'CrossHybridization' 'BaseColorChange' 'RepeatSequenceElements'
or use 'exclude_all':
to do maximum filtering, including all of these papers’ lists.
plot: (list of strings)
['mean_beta_plot', 'beta_density_plot', 'cumulative_sum_beta_distribution', 'beta_mds_plot', 'all']
If 'all', then all of these plots will be generated. If omitted, no plots are created.
save_plots: (True|False)
default: False
export: (True|False)
default: False – will export the filtered df as a pkl file if True
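The option values listed above are plain strings, so a typo in an `exclude` or `plot` entry is easy to miss. The following is a minimal sketch of a pre-flight check, not part of methylcheck: `unknown_excludes` and `expand_plots` are hypothetical helpers, and the name tables are copied from the parameter descriptions above rather than read from the library. It does not cover the 'exclude_all' option, and it only mirrors the documented behavior of 'all'.

```python
# Hypothetical helpers (not part of methylcheck). The names below are copied
# from the parameter descriptions in this document.
PUBLICATIONS = {
    "450K": {"Chen2013", "Price2013", "Zhou2016", "Naeem2014", "DacaRoszak2015"},
    "EPIC": {"Zhou2016", "McCartney2016"},
}
REASONS = {"Polymorphism", "CrossHybridization", "BaseColorChange", "RepeatSequenceElements"}
PLOTS = [
    "mean_beta_plot",
    "beta_density_plot",
    "cumulative_sum_beta_distribution",
    "beta_mds_plot",
]


def unknown_excludes(array_type, exclude):
    """Return the entries of `exclude` that are neither a listed publication
    for this array type nor a listed reason."""
    allowed = PUBLICATIONS[array_type] | REASONS
    return [name for name in exclude if name not in allowed]


def expand_plots(plot):
    """Mirror the documented rule: 'all' means every named plot,
    and an omitted or empty list means no plots."""
    if not plot:
        return []
    if "all" in plot:
        return list(PLOTS)
    return list(plot)


# 'Chen2013' is listed for 450K arrays only, so it is flagged for EPIC.
print(unknown_excludes("EPIC", ["Zhou2016", "Chen2013", "Polymorphism"]))
print(expand_plots(["all"]))
```

Running a check like this before calling the pipeline surfaces misspelled names early, under the assumption that the lists above match the installed methylcheck version.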
Note: this pipeline cannot also apply the array-level methylcheck.run_qc() function, because that function relies on additional probe information that may not be present. Everything in this pipeline applies to a DataFrame of beta values or m_values for a set of samples.
Returns: a filtered DataFrame object
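As a rough usage sketch, the options described here can be collected in a dictionary and passed as keyword arguments. The helper names `pipeline_options` and `filter_and_plot` are hypothetical; only `methylcheck.run_pipeline` and the keyword names come from this page, and the specific values chosen are illustrative.

```python
def pipeline_options():
    """Collect keyword options for run_pipeline in one place.
    The keys are the parameter names documented on this page."""
    return {
        "verbose": True,                          # show extra processing info
        "exclude": ["Zhou2016", "Polymorphism"],  # one publication, one reason
        "plot": ["beta_density_plot"],            # omit to create no plots
        "save_plots": False,
        "export": False,                          # True would write a pkl file
    }


def filter_and_plot(df):
    """Run the pipeline on a DataFrame of beta values (probes in rows,
    sample names in columns) and return the filtered DataFrame."""
    # Imported lazily so this sketch can be loaded without methylcheck installed.
    import methylcheck

    return methylcheck.run_pipeline(df, **pipeline_options())
```

With a suitable DataFrame `df` in hand, `filtered = filter_and_plot(df)` would then hold the filtered result described above.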