methylcheck.get_sex

methylcheck.get_sex(data_source, array_type=None, verbose=False, plot=False, save=False, on_lambda=False, median_cutoff=-2, include_probe_failure_percent=True, poobah_cutoff=20, custom_label=None, return_fig=False, return_labels=False)

This will calculate and predict the sex of each sample.

The “data_source” can be any one of:
path – to a folder with csv data that contains processed sample data
path – to a folder with the ‘meth_values.pkl’ and ‘unmeth_values.pkl’ dataframes
path – to a folder also containing samplesheet pkl and poobah_values.pkl, if you want to compare predicted sex with actual sex.
data_containers – object created from methylprep.run_pipeline() or methylcheck.load(path, ‘meth’)
tuple of (meth, unmeth) dataframes
array_type (string)
enum: {‘27k’, ‘450k’, ‘epic’, ‘epic+’, ‘mouse’}. If not specified, it will load the data from data_source and determine the array type for you.
median_cutoff (default is -2)
The minimum difference in the medians of X and Y probe copy numbers to assign male or female (copied from the minfi sex prediction function).
include_probe_failure_percent:
True: includes poobah percent per sample as column in the output table and on the plot. Note: you must supply a ‘path’ as data_source to include poobah in plots.
poobah_cutoff
The maximum percent of sample probes that can fail before the sample fails. Default is 20 (percent). Has no effect if include_probe_failure_percent is False.
plot
True: creates a plot, with option to save as image or return_fig.
save
True: saves the plot, if plot is True
return_fig
If True, returns a pyplot figure instead of a dataframe. Default is False. Note: return_fig will not show a plot on screen.
return_labels: (requires plot == True)
When poobah_cutoff is used, the figure marks samples only with short labels (A-Z, 1…N) to keep the plot readable. To find out which sample_ids these labels correspond to, rerun the function with return_labels=True: it will skip plotting and just return a dictionary with sample_ids and these labels, which you can embed in a PDF report if you like.
custom_label:
Option to provide a dictionary with keys as sample_ids and values as labels to apply to samples, e.g. to add more data about samples to the multi-dimensional QC plot.
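
The median_cutoff rule can be illustrated with a small sketch. It assumes the minfi-style approach, where a probe's copy number is log2(meth + unmeth) and a sample is called female when the Y median minus the X median falls below the cutoff. The function name and the intensity values are made up for illustration; this is not methylcheck's actual implementation.

```python
import math
from statistics import median


def predict_sex_sketch(x_meth, x_unmeth, y_meth, y_unmeth, median_cutoff=-2):
    """Illustrative minfi-style rule (hypothetical helper, not methylcheck code)."""
    # Copy number per probe: log2 of the total (meth + unmeth) intensity.
    x_cn = [math.log2(m + u) for m, u in zip(x_meth, x_unmeth)]
    y_cn = [math.log2(m + u) for m, u in zip(y_meth, y_unmeth)]
    # A strongly negative difference means very little Y signal relative to X.
    diff = median(y_cn) - median(x_cn)
    return ("F" if diff < median_cutoff else "M"), diff


# Made-up intensities: weak Y signal in the first sample, comparable X and Y in the second.
female_like = predict_sex_sketch([2048] * 3, [2048] * 3, [128] * 3, [128] * 3)
male_like = predict_sex_sketch([1024] * 3, [1024] * 3, [1024] * 3, [1024] * 3)
print(female_like)  # ('F', -4.0)
print(male_like)    # ('M', 0.0)
```

Lowering median_cutoff (for example to -3) makes the rule stricter about calling a sample female; raising it does the opposite.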

While providing a filepath is the easiest way, you can instead pass in a data_containers object (a list of data_containers containing raw meth/unmeth values). This object is produced by methylprep.run_pipeline, or by using methylcheck.load(filepath, format=‘meth’), and lets you customize the import if your files were not prepared using methylprep (non-standard CSV columns, for example).
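
As a rough sketch of the non-path route, the helper below loads the two pickled dataframes itself and passes them as a (meth, unmeth) tuple. The helper name is hypothetical, and it assumes the files use the ‘meth_values.pkl’ and ‘unmeth_values.pkl’ names listed above. The imports sit inside the function so that pandas and methylcheck are only needed when it is called.

```python
from pathlib import Path


def sex_from_pickles(folder):
    """Hypothetical helper: call get_sex with a (meth, unmeth) tuple of dataframes."""
    # Deferred imports: only required when the function actually runs.
    import pandas as pd
    import methylcheck

    folder = Path(folder)
    meth = pd.read_pickle(folder / "meth_values.pkl")
    unmeth = pd.read_pickle(folder / "unmeth_values.pkl")
    # Probe failure percentages need a path as data_source, so they are switched off here.
    return methylcheck.get_sex((meth, unmeth), include_probe_failure_percent=False)
```

Calling sex_from_pickles("path/to/processed_data") should then return the prediction table, assuming the folder holds both files.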

If a poobah_values.pkl file can be found in path, the dataframe returned will also include percent of probes for X and Y chromosomes that failed quality control, and warn the user if any did. This feature won’t work if a containers object or tuple of dataframes is passed in, instead of a path.
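
To make the per-chromosome failure percentage concrete, here is a minimal sketch of the arithmetic. It assumes a probe "fails" when its detection p-value exceeds 0.05; that threshold, the helper name, and the p-values are illustrative assumptions, not values taken from methylcheck.

```python
def percent_failed(pvalues, threshold=0.05):
    """Percent of probes whose detection p-value exceeds `threshold` (assumed cutoff)."""
    failed = sum(1 for p in pvalues if p > threshold)
    return 100.0 * failed / len(pvalues)


# Made-up detection p-values for one sample, grouped by chromosome.
sample = {
    "X": [0.001, 0.002, 0.2, 0.003, 0.001],
    "Y": [0.30, 0.45, 0.001, 0.60, 0.25, 0.70, 0.55, 0.40, 0.35, 0.50],
}
report = {chrom: percent_failed(p) for chrom, p in sample.items()}
print(report)  # {'X': 20.0, 'Y': 90.0}
```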

Note: ~90% of Y probes should fail if the sample is female. That chromosome is missing.
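
A typical path-based workflow might look like the sketch below: one call for the table and plot, then a second call with return_labels=True to recover the label-to-sample mapping. Only parameters from the signature above are used; the wrapper name and the folder are placeholders, and the return types are as described in this page rather than verified here.

```python
def sex_report(folder):
    """Hypothetical wrapper around two get_sex calls on the same folder."""
    import methylcheck  # deferred so the module is only needed when called

    # First pass: prediction table, plus a saved plot that includes probe failure info.
    table = methylcheck.get_sex(
        folder,
        include_probe_failure_percent=True,
        poobah_cutoff=20,
        plot=True,
        save=True,
    )
    # Second pass: skip plotting and fetch the dictionary of sample_ids and plot labels.
    labels = methylcheck.get_sex(folder, plot=True, return_labels=True)
    return table, labels
```

The folder should contain poobah_values.pkl (and the samplesheet pkl, if you want predicted sex compared with actual sex).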