methylcheck.beta_mds_plot

methylcheck.beta_mds_plot(df, filter_stdev=1.5, verbose=False, save=False, silent=False, multi_params={'draw_box': True}, plot_removed=False, nafill='quick', poobah=None, palette=None, labels=None, extend_poobah_range=True, plot=True)

Performs multidimensional scaling on a dataframe of samples

Arguments

df:
dataframe of beta values for a batch of samples (rows are probes; columns are samples)
filter_stdev:
a value (unit: standard deviations), typically between 0 and 3, that sets how far a sample may fall from the batch average and still be retained. With the default value of 1.5, all samples whose MDS-transformed beta values are within +/- 1.5 standard deviations of the batch average are retained in the data returned.
plot_removed:
if True, displays a plot of the beta distributions of the samples that were removed by MDS filtering. Ignored if silent=True.
nafill: (‘quick’ | ‘impute’)
most samples will contain missing values where probes failed the signal-noise detection in methylprep. The default, ‘quick’, is the fastest method: it fills each missing value from an adjacent sample’s value for the same probe. Alternatively, ‘impute’ fills each missing value with the average of all samples for that probe, which is much slower.
poobah:
path to poobah_values.pkl file. Default is None. If supplied, dots are color-coded according to the percent of failed probes for each sample, as a second dimension of QC on the plot. Does not filter or otherwise affect the output dataframe returned.
palette:
Optional - Specify a matplotlib/seaborn palette name, such as ‘CMRmap_r’, ‘coolwarm’, or ‘nipy_spectral’. Default is ‘twilight’.
labels:
pass in a dictionary that maps sample names (as found in the df columns) to a number or string identifying the group each sample belongs to. Use this to color-code the samples against a known classification scheme, such as cell type, and observe whether the MDS clustering pattern aligns with it. This feature is not compatible with poobah or multi_params.
extend_poobah_range:
True means 7 colors appear covering 0-30%. False means 5 colors and 0-20%. Default is True.
multi_params:
a dict, passed into this function from a multi-compare-MDS wrapper function, containing: {return_plot_obj=True, fig=None, ax=None, draw_box=False, xy_lim=None, color_num=0, PSF=1.2}, where PSF is the plot scale factor (margin beyond points to display).
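
The nafill and filter_stdev descriptions above can be sketched with plain pandas and numpy. This is a minimal illustration under stated assumptions, not methylcheck's actual code: the names `betas`, `quick`, `impute`, `within_stdev_mask`, and `coords` are hypothetical, the 'quick' fill is only one plausible reading of "adjacent sample's probe values", and the retention rule assumes the +/- filter_stdev window is applied to each MDS axis separately.

```python
import numpy as np
import pandas as pd

# Toy beta values: rows are probes, columns are samples (same layout as `df`).
betas = pd.DataFrame(
    {"s1": [0.1, np.nan, 0.9], "s2": [0.2, 0.5, np.nan], "s3": [np.nan, 0.7, 0.8]},
    index=["cg01", "cg02", "cg03"],
)

# 'quick'-style fill (assumption): copy the value from a neighbouring sample
# column for the same probe, looking left first and then right.
quick = betas.ffill(axis=1).bfill(axis=1)

# 'impute'-style fill: replace each NaN with that probe's mean across samples.
impute = betas.apply(lambda probe: probe.fillna(probe.mean()), axis=1)


def within_stdev_mask(coords, filter_stdev=1.5):
    """Return a boolean mask of samples inside the +/- filter_stdev window.

    coords is an (n_samples, 2) array of MDS coordinates. A sample is kept
    only if it lies within the window on both axes (an assumption).
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    spread = coords.std(axis=0)
    lower = center - filter_stdev * spread
    upper = center + filter_stdev * spread
    return np.all((coords >= lower) & (coords <= upper), axis=1)


# Four tightly clustered samples and one obvious outlier.
coords = np.array(
    [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.05, 0.0], [5.0, 5.0]]
)
mask = within_stdev_mask(coords, filter_stdev=1.5)
```

In this toy data the last sample falls outside the window on both axes, so `mask` keeps the first four samples and drops the fifth. Lowering `filter_stdev` narrows the window and removes more samples.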

Options

verbose:
If True, provides additional messages
silent:
  • If running from the command line in an automated process, you can run in silent mode to suppress any user interaction.
  • In this case, whatever filter_stdev you assign is the final value, and the file will be processed with that parameter.
plot: (default True)
  • If plot is False, plots (images) are not generated or shown on screen.
  • .png files are still saved if save == True.
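
For an automated pipeline, the options above might be combined as in the following sketch. It uses only keyword arguments that appear in the signature at the top of this page; the wrapper name `filter_batch_silently` is hypothetical, and methylcheck is imported inside the function so that the sketch can be loaded on a machine where the package is not installed.

```python
def filter_batch_silently(df, filter_stdev=1.5):
    """Run the MDS filter with no prompts and no on-screen plots.

    df is a dataframe of beta values (rows are probes, columns are samples).
    """
    import methylcheck  # lazy import: only needed when the function is called

    return methylcheck.beta_mds_plot(
        df,
        filter_stdev=filter_stdev,  # final cutoff; silent mode never asks again
        silent=True,                # suppress any user interaction
        plot=False,                 # do not render plots on screen
        save=True,                  # .png files are still written
        verbose=False,
    )
```

Calling `filter_batch_silently(betas)` on a beta-value dataframe should then return the filtered dataframe described under Returns.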

Returns

Returns a filtered dataframe. If return_plot_obj (set via multi_params) is True, it returns the plot, for making overlays in methylize.

Requires

pandas, numpy, matplotlib.pyplot, sklearn.manifold.MDS

Notes

This removes a probe from consideration for ALL samples in the batch if any sample contains NaN (a missing value) for that probe.
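
The effect described in this note can be reproduced with pandas alone. The snippet below is a sketch of the behavior, not the function's internal code; `betas`, `complete`, and `dropped` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Rows are probes, columns are samples; one probe is missing in one sample.
betas = pd.DataFrame(
    {"s1": [0.1, 0.5, 0.9], "s2": [0.2, np.nan, 0.8], "s3": [0.3, 0.6, 0.7]},
    index=["cg01", "cg02", "cg03"],
)

# Drop every probe (row) that has a NaN in at least one sample.
complete = betas.dropna(axis=0, how="any")

# Probes that were removed for all samples because of a single missing value.
dropped = betas.index.difference(complete.index)
```

Here `cg02` is removed for all three samples even though only `s2` is missing it, so a single low-quality sample can shrink the probe set used for the whole batch.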