- functions support .parquet files as input (produced by methylprep pipeline v1.7.0 or higher)
- methylcheck.load() includes a ‘control’ format that returns a dictionary of dataframes, keyed to sample names.
- consistent flexible function inputs: Any function that accepted a filepath or dataframe as first parameter
now accepts either of these interchangeably, except for loading functions and Report classes; these cannot handle dataframe inputs:
- improved .load speed
- get_sex bug fixes; supports plots, returning figure, returning labels, or returning predicted sex dataframe
- add testing via github actions
- updated documentation
- added support for sample sheets with the legacy Illumina [Header] … [Data] format. This requires
methylprepbe installed for the controls report to run now.
- .load gives clearer error when loading beta values from CSVs (‘beta_csv’) if probe names are not unique, and returns a list of series for each sample when indeces fail to merge (pandas.concat)
- .beta_mds_plot() can now suppress the interactive portion and still display plots, using
plotis a new kwarg, and defaults to
silentmode would suppress both prompts and plot display. Change in behavior:
silentmode will not disable plotting. Must also include
- Fixed bug in
- Added more detailed error message on
.load; it cannot load and merge two meth/unmeth dataframes with redundant probe names.
ReportPDFaccepts ‘poobah_colormap’ kwarg to feed in beta_mds_plot colormap.
ReportPDFcustom tables: You can insert your custom table on the first page by specifying ‘order_after’ == None.
palettecan now be any matplotlib colormap name. Defaults to ‘magma’ if not specified. The palette is only used to color-code poobah failure rates, if the poobah file path is specified.
extend_poobah_range: Default (True) shows 7 colors for poobah failure rates. If False, will show only 5.
- Reading IDATs loading bar didn’t work correctly, showed up after loading.
- Fixed error/logging messages:
- exclude_sex_control_probes() said 916 control probes were removed, then said “it appears your sample had no control probes”
- Erroneous message about missing values in poobah file: “color coding may be inaccurate.”
- Filtering probes info message said there were N samples when it meant probes.
- methylprep.download.build_composite_dataset() Process time was negative.
- Target Removal and Staining graphs in plot_controls() had unreadable X-axis sample names. Labels are suppressed when showing more than 30 samples.
- methylcheck.detect_array() sometimes returned array types in wrong case. All functions expect lowercase array types now.
- resolves exclude_sex_control_probes bugs.
- run_qc() and get_sex() did not recognize poobah_values.pkl on MacOS when using “~” in the filepath.
- methylcheck.problem_probe_reasons() lists probes matching any/all criteria when passing in no arguments, as documented
- get_sex() understands samplesheet ‘m’ and ‘f’ when not capitalized now.
- Load_both: always returns dataframe with probes in rows now, like .load() does.
- plot_M_vs_U now loads the noob_meth_values.pkl files if noob=True and files are found; otherwise it uses whatever meth/unmeth data is available.
- Methylcheck.qc_plot.qc_signal_intensity returns a dictionary of data about good/bad samples based on signal intensity. Previously it was only returning this if ‘plot’ was False.
- controls_report() bug fixed: methylprep was producing samplesheet meta data pickles that contained Sample_ID twice, because the GEO series_matrix files had this data appear twice. This broke the report, but this case is caught and avoided now. controls_report() will recognize a wider array of samplesheet filenames now; anything with ‘samplesheet’ or ‘meta_data’ in the filename.
- added ‘methylcheck report’ CLI option to create a ReportPDF
- updated documentation
- minor bug fixes in read_geo()
- qc_plot() now handles mouse probe type differently
- handles importing from multiple pandas versions correctly
- read_geo can open series_matrix.txt files now
- fixed big where csv data_files were not included in pypi
- Improved ReportPDF custom tables option
- if fields are too long, it will truncate them or auto scale the font size smaller to fit on page.
- added GCT score to controls_report() used in the ReportPDF class.
- ReportPDF changes
- uses noob_meth/unmeth instead of raw, uncorrected meth/unmeth values for GCT and U vs M plot
- inverted poobah table to report percent passing (instead of failing) probes per sample
- this changed input from ‘poobah_max_percent’ (default 5%) to ‘poobah_min_percent’, (default 80%)
- M_vs_U not included by default, because redundant with qc_signal_intensity
- M_vs_U compare=True now labels each sample and has legend, so you can see effect of NOOB+dye correction on batch
- added poobah color-coding to MDS plot
- get_sex improved plotting
- will read poobah data and size sample points according to percent of failed probes
- save plots, or return fig, and more options now
- Added a controls_report() function that creates a spreadsheet summary of control probe performance.
- New unit test coverage. Note that because methylprep v1.4.0 changes processing, the results will change slightly
- to match
minfi, with nonlinear-dye-bias correction and infer-type-I-probe-switching.
- changed org name from FoxoBioScience to FoxoTech
- Illumina Mouse Array Support
- Complete rewrite of documentation
- qc_signal_intensity and plot_M_vs_U have additional options, including superimposing poobah (percent probe failures per sample) on the plot coloring.
- .load will work on control_probes.pkl and mouse_probes.pkl files (with alt structure: dictionary of dataframe)
- .sample_plot uses “best” legend positioning now, because it was not fitting on screen with prev settings.
- get_sex() function returns a dataframe that also includes percent of X and Y probes that failed p-value-probe detection, as an indication of whether the predicted sex is reliable.
- better unit test coverage of predictions, load, load_both, and container_to_pkl functions
- fixed bug in load( ‘meth_df’)
- fixed bug in detect_array() where it labeled EPIC+ as EPIC
- minor fixes to load() and read_geo()
- exclude_probes() accepts problem_probes criteria as alternate way to specify probes.
- Exclude probes from a df of samples. Use list_problem_probes() to obtain a list of probes (or pass in the names of ‘Criteria’ from problem probes), then pass that in as a probe_list along with the dataframe of beta values (array)
- load_processed now has a –no_filter=False option that will remove probes that failed p-value detection, if passing in beta_values.pkl and poobah_values.pkl files.
- load() now handles gzipped files the same way (so .pkl.gz or .csv.gz OK as file or folder inputs)
- seaborn v0.10 –> v0.11 deprecrated the distplot() function that was used heavily. So now this employs kdeplot() in its place, with similar results.
- exposed more beta_density_plot parameters, so it can be used to make a QC plot (highlighting one or several samples within a larger batch, and graying out the others in the plot).
- improved read_geo() function, for downloading GEO methylation data sets and parsing meta_data from projects.
- changed org name from life-epigenetics to FoxoBioScience on Github.
- qc_plot bug fixes -99
- -99 bug in negative controls fixed
- tweaking custom-tables in ReportPDF
- ReportPDF.run_qc() supports on_lambda, and any functions that require .methylprep_manifest_files can be set to look for manifests in /tmp using on_lambda=True
- sklearn now optional for MDS
- adds kwargs to functions for silent processing returning figure objects, and a report_pdf class that can run QC and generate a PDF report.
- added version
- p-value probe detection
- hdbscan clustering functions
- more QC methods testing
- more tests, smart about df orientation, and re-organized files
- added read_geo() for processed datafiles, and unit tests for it. Works with txt,csv,xlsx,pkl files
- read_geo() docs
- debugged filters.list_problem_probes:
- updated the docs to have correct spelling for refs/reasons.
- added a function that lets you see more detail on the probes and reasons/pubs criteria
- added more genome studio QC functions,
- improved .load function (but not consolidated through methyl-suite yet)
- function .assign() for manually categorizing samples
- unit testing on the predict.sex function
- get_sex() prediction
- consolidated data loading for functions and uses fastest option