The chess-hic command line tool

chess pairs

This command automatically creates a BEDPE file in the format required by chess sim. Use it when you want to search a whole genome, or a set of chromosomes, for regions whose chromatin conformation differs between two samples.

The window size should not be smaller than 20x the bin size of the data that will be compared with chess sim. We recommend using window sizes of at least 100x the bin size of the data.
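As a quick sanity check before running chess pairs, the rule of thumb above can be written as a small Python helper. The function check_window_size and its return labels are hypothetical; chess does not provide this function.

```python
def check_window_size(window_bp: int, bin_size_bp: int) -> str:
    """Classify a window size against the 20x minimum and the 100x recommendation.

    Hypothetical helper, not part of chess.
    """
    if window_bp <= 0 or bin_size_bp <= 0:
        raise ValueError("window and bin size must be positive")
    ratio = window_bp / bin_size_bp
    if ratio < 20:
        return "too small"      # below the hard 20x minimum
    if ratio < 100:
        return "allowed"        # usable, but below the recommended 100x
    return "recommended"        # at least 100x the bin size


# With 25 kb bins: 20x = 500 kb, 100x = 2.5 Mb.
print(check_window_size(400_000, 25_000))    # too small
print(check_window_size(1_000_000, 25_000))  # allowed
print(check_window_size(3_000_000, 25_000))  # recommended
```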

usage: chess pairs [-h] [--file-input] [--chromosome CHROMOSOME]
                   genome window step output

Positional Arguments

genome UCSC genome identifier (as recognized by pybedtools), or path to a tab-separated chrom sizes file with columns <chromosome name> <chromosome size>. The path is used only if no UCSC entry with that name is found, or if --file-input is specified.
window Window size in base pairs
step Step size in base pairs
output Path to output file

Named Arguments

--file-input

If set, skip the pybedtools lookup of a UCSC entry for the genome argument and treat it as a path to a chrom sizes file.

Default: False

--chromosome Produce window pairs only for the specified chromosome
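To illustrate what a whole-genome pairs file looks like, the Python sketch below slides a window of size window along each chromosome in increments of step and pairs every window with the identical region in the query. This layout is an assumption made for illustration, not the chess implementation: the function window_pairs is hypothetical, the last incomplete window of each chromosome is dropped, and the name and strand values are placeholders.

```python
from typing import Dict, Iterator, Optional, Tuple

BedpeRow = Tuple[str, int, int, str, int, int, str, str, str, str]


def window_pairs(chrom_sizes: Dict[str, int], window: int, step: int,
                 chromosome: Optional[str] = None) -> Iterator[BedpeRow]:
    """Yield 10-column BEDPE rows pairing each window with itself.

    Hypothetical sketch. Assumptions: windows running past the chromosome
    end are skipped, name is a running ID, strands are "+".
    """
    pair_id = 0
    for chrom, size in chrom_sizes.items():
        if chromosome is not None and chrom != chromosome:
            continue  # mimic --chromosome: restrict output to one chromosome
        for start in range(0, size - window + 1, step):
            end = start + window
            # score is a "." placeholder, as allowed for chess sim
            yield (chrom, start, end, chrom, start, end,
                   str(pair_id), ".", "+", "+")
            pair_id += 1


sizes = {"chr1": 5_000_000, "chr2": 3_000_000}
for row in window_pairs(sizes, window=2_000_000, step=1_000_000):
    print("\t".join(map(str, row)))
```

Pairing each window with itself matches the common use case of comparing the same genomic coordinates in two samples; for inter-species comparisons, the query columns would instead point to the syntenic region.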

chess sim

This command runs the comparisons between two sets of chromatin contact data.

Please note that the regions passed in the pairs file cannot be smaller than 20x the bin size of the data. We recommend using regions spanning at least 100x the bin size of the data.

usage: chess sim [-h] [--reference-regions REFERENCE_REGIONS]
                 [--query-regions QUERY_REGIONS]
                 [--background-regions BACKGROUND_REGIONS] [--background-query]
                 [-p THREADS] [--keep-unmappable-bins]
                 [--mappability-cutoff MAPPABILITY_CUTOFF]
                 [-r RELATIVE_WINDOWSIZE] [-a ABSOLUTE_WINDOWSIZE] [--oe-input]
                 [--limit-background]
                 reference_contacts query_contacts pairs out

Positional Arguments

reference_contacts
 Balanced contact matrix for the reference sample in one of the following formats: fanc .hic, juicer .hic@<resolution>, cooler .cool@<resolution> or .mcool@<resolution>, sparse format (each line: <row region index> <column region index> <matrix value>). If the file is in sparse format, the corresponding regions BED file needs to be passed via --reference-regions.
query_contacts Balanced contact matrix for the query sample in one of the following formats: fanc .hic, juicer .hic@<resolution>, cooler .cool@<resolution> or .mcool@<resolution>, sparse format (each line: <row region index> <column region index> <matrix value>). If the file is in sparse format, the corresponding regions BED file needs to be passed via --query-regions.
pairs Region pairs to compare, in 10-column BEDPE format as defined in the BEDTools docs: https://bedtools.readthedocs.io/en/latest/content/general-usage.html, with chrom1, start1, etc. corresponding to the reference and chrom2, start2, etc. to the query. The “score” column is not used but should be present (it can be “.” or another placeholder value). All other columns are required.
out Path to output file.
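A minimal Python sketch of checking one pairs line against the 10-column layout described above. The column names follow the BEDTools BEDPE definition (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2); the parse_pair helper and its extra check that each start is smaller than its end are assumptions, not validation that chess is documented to perform.

```python
from typing import Dict, Union

BEDPE_COLUMNS = ("chrom1", "start1", "end1", "chrom2", "start2", "end2",
                 "name", "score", "strand1", "strand2")


def parse_pair(line: str) -> Dict[str, Union[str, int]]:
    """Split one tab-separated BEDPE line into named fields (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(BEDPE_COLUMNS):
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    record: Dict[str, Union[str, int]] = dict(zip(BEDPE_COLUMNS, fields))
    for key in ("start1", "end1", "start2", "end2"):
        record[key] = int(record[key])
    # chrom1/start1/end1 describe the reference region, the *2 columns the query.
    # Assumed sanity check: both regions must have positive length.
    if not (record["start1"] < record["end1"] and record["start2"] < record["end2"]):
        raise ValueError("start must be smaller than end")
    return record


pair = parse_pair("chr2\t1000000\t3000000\tchr2\t1000000\t3000000\tp1\t.\t+\t+")
print(pair["chrom1"], pair["end1"] - pair["start1"], pair["score"])
# chr2 2000000 .
```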

Named Arguments

--reference-regions
 BED file (no header) listing one region per row of the provided reference matrix.
--query-regions
 BED file (no header) listing one region per row of the provided query matrix.
--background-regions
 BED file with regions to be used for background calculations. If provided, CHESS will generate Z-scores and P-values for similarities.
--background-query

Use every region of the query genome with the same size as the reference region as background. This is useful, for example, for inter-species comparisons.

Default: False

-p

Number of cores to use.

Default: 1

--keep-unmappable-bins

Disable deletion of unmappable bins.

Default: False

--mappability-cutoff

Maximum allowed fraction of unmappable bins. Matrices with a higher fraction of unmappable bins are skipped; unmappable bins are removed from the matrices that pass the filter.

Default: 0.1

-r, --relative-windowsize

Window size for the win_size parameter of the SSIM function, given as a fraction of the matrix size.

Default: 1

-a, --absolute-windowsize
 Window size in bins for the win_size parameter of the SSIM function. Overrides -r.
--oe-input

Use if the input contacts have already been observed/expected transformed.

Default: False

--limit-background

Restrict background computation to the syntenic / paired chromosome as indicated in the pairs file.

Default: False
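To make the interplay of --mappability-cutoff, -r and -a concrete, here is a Python sketch of the per-region preprocessing these options describe. It is not the chess code and the helper names are hypothetical. It assumes that an unmappable bin is a row that is entirely NaN or zero, and that the SSIM window must be an odd integer of at least 3 (as required, for example, by scikit-image's structural_similarity).

```python
from typing import Optional

import numpy as np


def drop_unmappable(matrix: np.ndarray, cutoff: float = 0.1) -> Optional[np.ndarray]:
    """Return the matrix without unmappable bins, or None if too many are missing.

    Assumption: a bin is unmappable if its whole row is NaN or zero.
    """
    values = np.nan_to_num(matrix, nan=0.0)
    unmappable = ~values.any(axis=1)
    if unmappable.mean() > cutoff:
        return None  # region skipped, as described for --mappability-cutoff
    keep = ~unmappable
    return matrix[np.ix_(keep, keep)]  # delete the rows and columns of bad bins


def ssim_win_size(n_bins: int, relative: float = 1.0,
                  absolute: Optional[int] = None) -> int:
    """Derive an SSIM window size; an absolute value overrides the relative one."""
    size = absolute if absolute is not None else int(relative * n_bins)
    size = min(size, n_bins)
    if size % 2 == 0:
        size -= 1  # assumed: the SSIM window must be odd
    return max(size, 3)


rng = np.random.default_rng(0)
m = rng.random((40, 40))
m[[3, 17], :] = 0  # two empty bins: 2 / 40 = 5% unmappable
m[:, [3, 17]] = 0
filtered = drop_unmappable(m, cutoff=0.1)
print(filtered.shape)                                 # (38, 38)
print(ssim_win_size(filtered.shape[0]))               # 37
print(ssim_win_size(filtered.shape[0], absolute=10))  # 9
```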

chess extract

This command extracts features from a set of input regions.

Please note that the input parameters have to be fine-tuned depending on the size of the analyzed regions and the target features. For now, this requires some experimentation by the user; we plan to release a guide in the future.

This command writes two files into the output directory: gained_features.tsv and lost_features.tsv, containing the features gained and lost, respectively, in the query matrix compared to the reference. These files describe the position and shape of each feature. The first value of each row is the region ID (the same as in the chess sim output) and the second is a feature ID. The next four values are the corners of the rectangle that contains the feature in the region matrix (xmin, xmax, ymin and ymax). The remaining columns contain the contact values of the part of the region matrix covered by the feature, taken from the query matrix (gained_features.tsv) or the reference matrix (lost_features.tsv).
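A Python sketch for reading gained_features.tsv or lost_features.tsv following the column layout described above. It assumes tab separation, no header line, and that the trailing contact values are plain numbers, one per column; the Feature class and read_features helper are hypothetical, so verify the assumptions against an actual output file.

```python
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Feature:
    region_id: str       # same region ID as in the chess sim output
    feature_id: str
    xmin: int            # corners of the enclosing rectangle in the region matrix
    xmax: int
    ymin: int
    ymax: int
    values: List[float]  # contact values covered by the feature


def read_features(handle: TextIO) -> List[Feature]:
    """Parse feature rows (assumed: tab-separated, no header)."""
    features = []
    for line in handle:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        xmin, xmax, ymin, ymax = (int(float(c)) for c in cols[2:6])
        features.append(Feature(cols[0], cols[1], xmin, xmax, ymin, ymax,
                                [float(c) for c in cols[6:]]))
    return features


example = io.StringIO("7\t0\t10\t14\t22\t25\t0.8\t1.2\t0.9\n")
feature = read_features(example)[0]
print(feature.region_id, feature.xmin, feature.ymax, len(feature.values))
# 7 10 25 3
```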

usage: chess extract [-h] [--reference-regions REFERENCE_REGIONS]
                     [--query-regions QUERY_REGIONS] [--windowsize WINDOWSIZE]
                     [--sigma-spatial SIGMA_SPATIAL]
                     [--size-medianfilter SIZE_MEDIANFILTER]
                     [--closing-square CLOSING_SQUARE]
                     pairs reference_contacts query_contacts out

Positional Arguments

pairs Region pairs that have been identified to contain structural differences, in 10-column BEDPE format as defined in the BEDTools docs: https://bedtools.readthedocs.io/en/latest/content/general-usage.html, with chrom1, start1, etc. corresponding to the reference and chrom2, start2, etc. to the query. The “score” column is not used but should be present (it can be “.” or another placeholder value). All other columns are required.
reference_contacts
 Balanced contact matrix for the reference sample in one of the following formats: fanc .hic, juicer .hic@<resolution>, cooler .cool@<resolution> or .mcool@<resolution>, sparse format (each line: <row region index> <column region index> <matrix value>). If the file is in sparse format, the corresponding regions BED file needs to be passed via --reference-regions.
query_contacts Balanced contact matrix for the query sample in one of the following formats: fanc .hic, juicer .hic@<resolution>, cooler .cool@<resolution> or .mcool@<resolution>, sparse format (each line: <row region index> <column region index> <matrix value>). If the file is in sparse format, the corresponding regions BED file needs to be passed via --query-regions.
out Path to output directory.
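For the sparse input format, the following Python sketch shows how the triplet file and the regions BED file fit together: the number of BED lines fixes the matrix dimension, and each <row region index> <column region index> <matrix value> line fills one entry. The load_sparse helper is hypothetical, and it assumes 0-based indices and a file that stores only one triangle of a symmetric matrix (so each entry is mirrored); check both assumptions against your data.

```python
import io
from typing import TextIO

import numpy as np


def load_sparse(matrix_file: TextIO, regions_file: TextIO) -> np.ndarray:
    """Build a dense symmetric matrix from sparse triplets and a regions BED."""
    # One BED line (no header) per matrix row/column.
    n_regions = sum(1 for line in regions_file if line.strip())
    matrix = np.zeros((n_regions, n_regions))
    for line in matrix_file:
        if not line.strip():
            continue
        row, col, value = line.split()
        i, j = int(row), int(col)  # assumed: 0-based region indices
        matrix[i, j] = float(value)
        matrix[j, i] = float(value)  # assumed: only one triangle is stored
    return matrix


regions = io.StringIO("chr1\t0\t25000\nchr1\t25000\t50000\nchr1\t50000\t75000\n")
triplets = io.StringIO("0 0 5.0\n0 1 2.5\n1 2 1.0\n")
print(load_sparse(triplets, regions))
```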

Named Arguments

--reference-regions
 BED file (no header) listing one region per row of the provided reference matrix.
--query-regions
 BED file (no header) listing one region per row of the provided query matrix.
--windowsize

Size of the window used to average bins according to their spatial closeness and their radiometric similarity. The default is a 3 x 3 bin window. Larger values average bins with larger differences.

Default: 3

--sigma-spatial

Standard deviation of the Gaussian function of the Euclidean distance between two bins. Larger values average bins with larger differences.

Default: 3

--size-medianfilter

Size of the window used to scan and smooth the bins it contains. Higher values smooth larger structures, while smaller values preserve subtle signals (e.g. loops).

Default: 9

--closing-square

Side length of the square used to remove noise and fill in structures. Larger values enclose larger structures and remove punctate or loop-like structures.

Default: 8
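The descriptions of --size-medianfilter and --closing-square match two standard image-processing operations: a median filter over a square window and a morphological closing (dilation followed by erosion) with a square. The NumPy sketch below applies both to toy matrices to show their effect. It is an illustration of these generic operations under the assumption that chess uses them in this form, not the chess implementation; it omits the averaging controlled by --windowsize and --sigma-spatial, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(image: np.ndarray, size: int) -> np.ndarray:
    """Median over a size x size window around each bin (edge-padded)."""
    before = size // 2
    after = size - 1 - before
    padded = np.pad(image, ((before, after), (before, after)), mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


def binary_closing(mask: np.ndarray, side: int) -> np.ndarray:
    """Dilation followed by erosion with a side x side square."""
    before = side // 2
    after = side - 1 - before
    mask = mask.astype(bool)
    # Dilation: a bin is set if any bin under the square is set.
    padded = np.pad(mask, ((before, after), (before, after)), constant_values=False)
    dilated = sliding_window_view(padded, (side, side)).any(axis=(-2, -1))
    # Erosion with the reflected square (matters for even sides), so that
    # closing never removes bins that were set in the input.
    padded = np.pad(dilated, ((after, before), (after, before)), constant_values=True)
    return sliding_window_view(padded, (side, side)).all(axis=(-2, -1))


spike = np.zeros((9, 9))
spike[4, 4] = 10.0  # isolated noisy bin
print(median_filter(spike, 3).max())  # 0.0, the spike is smoothed away

mask = np.zeros((12, 12), dtype=bool)
mask[2:10, 2:10] = True
mask[5, 5] = False  # one-bin hole inside a structure
closed = binary_closing(mask, 3)
print(bool(closed[5, 5]), int(closed.sum()))  # True 64, the hole is filled
```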

chess crosscorrelate

This command clusters extracted features by their topology.

usage: chess crosscorrelate [-h] extracted_file pairs outdir

Positional Arguments

extracted_file Output of the extract subcommand.
pairs Region pairs that have been identified to contain structural differences, in 10-column BEDPE format as defined in the BEDTools docs: https://bedtools.readthedocs.io/en/latest/content/general-usage.html, with chrom1, start1, etc. corresponding to the reference and chrom2, start2, etc. to the query. The “score” column is not used but should be present (it can be “.” or another placeholder value). All other columns are required.
outdir Path to output directory.
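This page does not describe how chess crosscorrelate compares features. Purely as an illustration of the idea suggested by the subcommand name, the Python sketch below scores two equally sized feature matrices with a zero-lag normalized cross-correlation (the Pearson correlation of their values), where 1 means the same pattern up to scale and offset. Both the metric and the normalized_cross_correlation helper are assumptions; the actual subcommand may use a different similarity measure and clustering procedure.

```python
import numpy as np


def normalized_cross_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-lag normalized cross-correlation (Pearson r) of two matrices."""
    if a.shape != b.shape:
        raise ValueError("feature matrices must have the same shape")
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    if denom == 0:
        return 0.0  # a constant matrix has no pattern to compare
    return float((a_centered * b_centered).sum() / denom)


vertical = np.zeros((5, 5))
vertical[:, 2] = 1.0           # stripe-like feature
horizontal = vertical.T.copy()  # same stripe, rotated
print(normalized_cross_correlation(vertical, 2 * vertical + 1))  # same pattern
print(normalized_cross_correlation(vertical, horizontal))        # different pattern
```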