methylcheck.combine_mds

methylcheck.combine_mds(*args, **kwargs)

Combine (or segment) datasets for multidimensional scaling (MDS) analysis.

Use this function on multiple dataframes to combine datasets, or to visualize parts of the same dataset in separate colors. It wraps methylcheck.beta_mds_plot() and applies multidimensional scaling to cluster similar samples based on patterns in probe values, and to identify (and exclude) possible outlier samples.
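A minimal usage sketch, using only the keyword arguments documented on this page. The synthetic data, the probe and sample names, and the helper `run_combined_mds` are illustrative assumptions and not part of methylcheck. The layout (probes in rows, samples in columns) is also an assumption, since this page does not state the expected orientation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical probe IDs shared by both batches.
probes = [f"cg{i:08d}" for i in range(200)]

# Two synthetic batches of beta values in [0, 1).
# Assumed layout: rows are probes, columns are samples.
batch_a = pd.DataFrame(
    rng.uniform(0, 1, size=(200, 6)),
    index=probes,
    columns=[f"A_{i}" for i in range(6)],
)
batch_b = pd.DataFrame(
    rng.uniform(0, 1, size=(200, 4)),
    index=probes,
    columns=[f"B_{i}" for i in range(4)],
)


def run_combined_mds(*frames, filter_stdev=1.5):
    """Hypothetical helper: pass any number of dataframes to combine_mds."""
    # Imported lazily so the data setup above runs without methylcheck installed.
    import methylcheck

    return methylcheck.combine_mds(
        *frames,
        silent=True,    # no prompts, no plot shown
        save=False,     # do not write a PNG to disk
        verbose=False,  # no extra debug output
        filter_stdev=filter_stdev,
    )
```

With methylcheck installed, `run_combined_mds(batch_a, batch_b)` should return the dataframe of transformed samples described below.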

  • combine datasets,
  • run MDS,
  • see how each dataset (or subset) overlaps with the others on a plot,
  • exclude outlier samples based on a composite cutoff box (the average of the bounds of the component datasets),
  • calculate the percent of data excluded from the group.
  • *args:
    • pass in any number of pandas dataframes, and it will combine them into one MDS plot.
    • alternatively, you may pass in a list of filepaths as strings, and it will attempt to load these files as pickles. The files must be pickles of pandas dataframes containing beta values or m-values.

  • silent: (default False)
    (automated processing mode) if True, suppresses most output and avoids prompting the user for anything. Silent mode processes the data but doesn’t show the plot.
  • save: (default False)
    if True, saves the plot png to disk.
  • verbose: (default False)
    if True, prints extra debug information to screen or logger.
  • filter_stdev: (default 1.5)
    how broadly to retain samples, in units of standard deviations. Increasing this number widens the cutoff, so fewer outlier samples are removed.
  • returns a dataframe of transformed samples
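
The composite cutoff box and the effect of filter_stdev can be illustrated with a small NumPy sketch. This is not methylcheck's implementation. It assumes that each dataset's box is its mean plus or minus filter_stdev standard deviations on each MDS axis, and that the composite box is the average of those per-dataset bounds. All function names and the synthetic coordinates are hypothetical.

```python
import numpy as np


def dataset_box(coords, filter_stdev=1.5):
    """Per-axis (lower, upper) bounds: mean -/+ filter_stdev * std."""
    center = coords.mean(axis=0)
    spread = coords.std(axis=0)
    return center - filter_stdev * spread, center + filter_stdev * spread


def composite_box(datasets, filter_stdev=1.5):
    """Average the lower and upper bounds of every dataset's box."""
    boxes = [dataset_box(c, filter_stdev) for c in datasets]
    lower = np.mean([b[0] for b in boxes], axis=0)
    upper = np.mean([b[1] for b in boxes], axis=0)
    return lower, upper


def percent_excluded(coords, lower, upper):
    """Percent of samples falling outside the box on any axis."""
    inside = np.all((coords >= lower) & (coords <= upper), axis=1)
    return 100.0 * (1.0 - inside.mean())


# Synthetic 2-D coordinates standing in for MDS output of two datasets.
rng = np.random.default_rng(0)
set_a = rng.normal(0.0, 1.0, size=(50, 2))
set_b = rng.normal(0.5, 1.0, size=(40, 2))
set_b[0] = [10.0, 10.0]  # one obvious outlier

combined = np.vstack([set_a, set_b])

lower, upper = composite_box([set_a, set_b], filter_stdev=1.5)
pct_default = percent_excluded(combined, lower, upper)

wide_lower, wide_upper = composite_box([set_a, set_b], filter_stdev=3.0)
pct_wide = percent_excluded(combined, wide_lower, wide_upper)

print(f"excluded at 1.5 stdev: {pct_default:.1f}%")
print(f"excluded at 3.0 stdev: {pct_wide:.1f}%")
```

Under these assumptions, a larger filter_stdev produces a box that contains the smaller one, so the percent excluded can only stay the same or drop.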