Combine (or segment) datasets for multidimensional scaling (MDS) analysis.
Use this function on multiple DataFrames to combine datasets, or to visualize parts of the same dataset in separate colors. It is a wrapper around methylcheck.beta_mds_plot(): it applies multidimensional scaling to cluster samples with similar patterns of probe values and to identify possible outlier samples (and exclude them). With it you can:
- combine datasets;
- run MDS;
- see how each dataset (or subset) overlaps with the others on a plot;
- exclude outlier samples based on a composite cutoff box (the average bounds of the component datasets);
- calculate the percentage of the data excluded from the group.
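The first steps above (combine datasets, run MDS, and remember which dataset each point came from so a plot can color them separately) can be sketched as follows. This is not methylcheck's implementation: it swaps in classical (Torgerson) MDS written with NumPy, and it assumes each DataFrame holds samples as rows and probes as columns. The function names `classical_mds` and `combined_mds` are hypothetical.

```python
import numpy as np
import pandas as pd


def classical_mds(values: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed rows so pairwise Euclidean distances are preserved."""
    # Squared Euclidean distance between every pair of samples (rows).
    sq_norms = (values ** 2).sum(axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * values @ values.T, 0.0)
    # Double-center the squared distances: B = -1/2 * J @ D2 @ J.
    n = d2.shape[0]
    j = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * j @ d2 @ j
    # The top eigenvectors, scaled by sqrt(eigenvalue), are the MDS coordinates.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


def combined_mds(frames: dict) -> pd.DataFrame:
    """Stack labelled (samples x probes) frames, run MDS, and keep a dataset label per sample."""
    # Hypothetical layout: rows are samples, columns are probes.
    # join="inner" keeps only the probes present in every dataset.
    combined = pd.concat(frames, join="inner")
    # Drop probes with any missing value so distances are defined.
    combined = combined.dropna(axis=1)
    coords = classical_mds(combined.to_numpy(dtype=float))
    return pd.DataFrame(
        {
            "dataset": combined.index.get_level_values(0).to_numpy(),
            "x": coords[:, 0],
            "y": coords[:, 1],
        },
        index=combined.index.get_level_values(1),
    )


# Usage with two small synthetic batches sharing the same probe names.
rng = np.random.default_rng(0)
probes = [f"probe_{i}" for i in range(50)]
batch_a = pd.DataFrame(rng.random((5, 50)), index=[f"A{i}" for i in range(5)], columns=probes)
batch_b = pd.DataFrame(rng.random((4, 50)), index=[f"B{i}" for i in range(4)], columns=probes)
coords = combined_mds({"batch_a": batch_a, "batch_b": batch_b})
```

The `dataset` column is what lets a plot draw each dataset (or subset) in its own color while all samples share one MDS projection.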
Inputs:
- Pass in any number of pandas DataFrames, and they will be combined into one MDS plot.
- Alternatively, you may pass in a list of file paths as strings, and the function will attempt to load these files as pickles. They must be pickles of pandas DataFrames containing beta values or M-values.
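Accepting either DataFrames or a list of pickle paths, as described above, might look like this minimal sketch. The helper name `load_inputs` is hypothetical, and its type checks are assumptions about reasonable validation rather than methylcheck's documented behavior; `pandas.read_pickle` is the standard pandas loader for pickled objects.

```python
from pathlib import Path

import pandas as pd


def load_inputs(*inputs) -> list:
    """Hypothetical helper: accept DataFrames and/or pickle paths, return a list of DataFrames."""
    frames = []
    for item in inputs:
        if isinstance(item, (list, tuple)):
            # A list of file paths (or frames) is flattened, so both calling styles work.
            frames.extend(load_inputs(*item))
        elif isinstance(item, pd.DataFrame):
            frames.append(item)
        elif isinstance(item, (str, Path)):
            obj = pd.read_pickle(item)
            # The pickle must hold a DataFrame (of beta values or M-values).
            if not isinstance(obj, pd.DataFrame):
                raise TypeError(f"{item} does not contain a pandas DataFrame")
            frames.append(obj)
        else:
            raise TypeError(f"unsupported input type: {type(item).__name__}")
    return frames
```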
Optional keyword arguments:
silent: (default False)
- If True (automated processing mode), suppresses most output and never prompts the user. Silent mode processes the data but does not show the plot.
save: (default False)
- If True, saves the plot to disk as a PNG file.
verbose: (default False)
- If True, prints extra debugging information to the screen or logger.
- Sets how broadly samples are retained, in units of standard deviations (default 1.5). Increasing this value widens the cutoff, so fewer samples are removed as outliers.
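The composite cutoff box is described only as "the average bounds of the component data sets", so the sketch below rests on an interpretation, not on methylcheck's confirmed code: for each dataset it computes mean ± k standard deviations on each MDS axis, averages those bounds across datasets, and excludes any sample outside the resulting box. The function name, the `n_stdev` argument, and the `dataset`/`x`/`y` column layout are hypothetical. It illustrates why a larger standard-deviation multiplier removes fewer samples.

```python
import numpy as np
import pandas as pd


def composite_box_filter(coords: pd.DataFrame, n_stdev: float = 1.5):
    """Drop samples outside the averaged per-dataset mean +/- n_stdev * std box.

    Hypothetical input layout: one row per sample, a "dataset" label column,
    and MDS coordinates in columns "x" and "y". A single-sample dataset has a
    NaN standard deviation, so its bounds are skipped when averaging.
    """
    axes = ["x", "y"]
    grouped = coords.groupby("dataset")[axes]
    means, stds = grouped.mean(), grouped.std()
    # Average each dataset's lower and upper bound into one composite box per axis.
    lower = (means - n_stdev * stds).mean()
    upper = (means + n_stdev * stds).mean()
    # A sample is kept only if it lies inside the box on every axis.
    inside = ((coords[axes] >= lower) & (coords[axes] <= upper)).all(axis=1)
    percent_excluded = 100.0 * float((~inside).sum()) / len(coords)
    return coords[inside], percent_excluded


# Usage: the same coordinates filtered with a narrow and a wide cutoff.
rng = np.random.default_rng(1)
coords = pd.DataFrame(
    {
        "dataset": ["a"] * 20 + ["b"] * 20,
        "x": rng.normal(size=40),
        "y": rng.normal(size=40),
    }
)
kept_narrow, pct_narrow = composite_box_filter(coords, n_stdev=1.0)
kept_wide, pct_wide = composite_box_filter(coords, n_stdev=3.0)
```

Under this reading, the box for `n_stdev=3.0` contains the box for `n_stdev=1.0`, so `pct_wide` can never exceed `pct_narrow`.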
Returns:
- a DataFrame of the transformed samples.