RnaChipIntegrator

Integrative analyses of RNA-Seq with ChIP-Seq data

About

RnaChipIntegrator is a utility that performs integrated analyses of RNA-Seq and ChIP-Seq data, identifying the nearest ChIP peaks to each transcript, and vice versa. It can also be used with any set of genomic features (e.g. canonical genes or CpG islands) or with other expression data (e.g. microarrays).
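To illustrate what "nearest ChIP peaks to each transcript" can mean in practice, here is a minimal Python sketch. It is not the RnaChipIntegrator implementation: the function names, the choice of a single reference position per transcript, and the distance definition (zero if the position lies inside the peak, otherwise the distance to the closer peak edge) are all assumptions made for illustration.

```python
def distance_to_peak(position, peak_start, peak_end):
    """Distance from a position to a peak (hypothetical definition).

    Returns 0 if the position lies inside the peak, otherwise the
    distance to the closer of the two peak edges.
    """
    if peak_start <= position <= peak_end:
        return 0
    return min(abs(position - peak_start), abs(position - peak_end))


def nearest_peaks(position, peaks, max_distance=None):
    """Rank peaks on one chromosome by distance from a reference position.

    peaks is a list of (start, end) tuples; max_distance, if given,
    discards peaks further away than that cutoff.
    """
    ranked = sorted((distance_to_peak(position, s, e), (s, e)) for s, e in peaks)
    if max_distance is not None:
        ranked = [item for item in ranked if item[0] <= max_distance]
    return ranked


# Made-up coordinates, all assumed to be on the same chromosome
example_peaks = [(100, 200), (5000, 5001), (900, 950)]
for distance, peak in nearest_peaks(1000, example_peaks, max_distance=1000):
    print(distance, peak)
```

The same ranking can be run in the other direction (nearest transcripts to each peak) by swapping the roles of the two inputs.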

Download

Download the latest version:

See the change log for the most recent changes, and the downloads page for all available versions.

This release addresses a bug that was discovered in the previous version of the code (v0.3.3), which meant that peaks within transcripts or promoter regions on the negative strand might sometimes be incorrectly flagged as not within those regions.

This bug did not affect the reporting of the peaks and transcripts.

Installation

A number of installation options are available:

  1. You can use Python's setuptools to install the RnaChipIntegrator code. To do this, download the archive (either .tar.gz or .zip), unpack it and install as follows:

    $ cd RnaChipIntegrator-0.4.0
    $ sudo python setup.py install
  2. Alternatively if you're using pip (see www.pip-installer.org/en/latest/index.html) to manage your Python installations then you can install as follows:

    $ sudo pip install RnaChipIntegrator.tar.gz
  3. If you don't want to do a system install then you can just unpack the archive and run the Python code directly using:

    $ /path/to/RnaChipIntegrator.py [options] ...

    (Note you will need to add the '.py' extension to the program name when running the examples below.)

See the INSTALL document for more detailed information about how to install and uninstall the utility.

Usage

General usage syntax is:

$ RnaChipIntegrator [options] <rna-data-file> <chip-data-file>

Use -h to produce a list of options and descriptions of their functions:

$ RnaChipIntegrator -h

Specific examples using example data included in the downloads:

$ RnaChipIntegrator --window=130000 ExpressionData.txt ChIP_edges.txt
$ RnaChipIntegrator --window=130000 ExpressionData.txt ChIP_peaks.txt

See the manual for more detailed information about the analyses.

Expression Data

The expression data (RNA-Seq) file must be a tab-delimited file with 5 columns of data for each genomic feature (one per line):

ID  chr  start  end  strand

ID is a name which is used to identify the genomic feature in the output, chr is the chromosome name, start and end define the limits of the feature, and strand must be either + or -. Genes in between a reported gene and the ChIP site are also reported.

Optionally there can be a 6th column, indicating whether the gene was differentially expressed (= 1) or not (= 0).

(See the example ExpressionData.txt file.)
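As a way of checking a file against this layout before running the program, the following Python sketch reads tab-delimited lines and validates the 5 (or 6) columns described above. It is an illustrative helper, not part of RnaChipIntegrator: the Feature record, the read_features function and the decision to skip blank lines are assumptions.

```python
import csv
from collections import namedtuple

# Hypothetical record type; fields follow the column order described above
Feature = namedtuple("Feature", "id chrom start end strand diff_expressed")


def read_features(lines):
    """Yield Feature records from an iterable of tab-delimited lines."""
    for lineno, row in enumerate(csv.reader(lines, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines (an assumption of this sketch)
        if len(row) not in (5, 6):
            raise ValueError("line %d: expected 5 or 6 columns, got %d"
                             % (lineno, len(row)))
        feature_id, chrom, start, end, strand = row[:5]
        if strand not in ("+", "-"):
            raise ValueError("line %d: strand must be + or -" % lineno)
        flag = None  # no 6th column: differential expression unknown
        if len(row) == 6:
            if row[5] not in ("0", "1"):
                raise ValueError("line %d: 6th column must be 0 or 1" % lineno)
            flag = (row[5] == "1")
        yield Feature(feature_id, chrom, int(start), int(end), strand, flag)


# Made-up example lines; a real file would be passed as an open file object
example = ["gene1\tchr1\t1000\t2000\t+\t1\n",
           "gene2\tchr1\t4000\t5000\t-\n"]
for feature in read_features(example):
    print(feature)
```

An open file can be passed directly, e.g. read_features(open("ExpressionData.txt")), assuming the file has no header line.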

ChIP-Seq Data

The ChIP-Seq data file must be a tab-delimited file with 3 columns of data for each ChIP peak (one per line):

chr  start  end

chr is the chromosome name (must match those in the expression data file), and start and end define the ChIP peaks - these can either be summits (in which case end - start = 1), or regions (with start and end indicating the extent).

(See the example ChIP_edges.txt and ChIP_peaks.txt files.)

Note that different analyses will be selected depending on whether the ChIP peaks are defined as summits or regions.
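The summit/region distinction follows directly from the end - start = 1 rule above, and can be checked with a short Python sketch such as the one below. The function names are hypothetical, and how RnaChipIntegrator itself treats a file that mixes summits and regions is not stated here, so the sketch only reports which kinds are present.

```python
def peak_kind(start, end):
    """Return 'summit' if end - start == 1, otherwise 'region'."""
    return "summit" if end - start == 1 else "region"


def peak_kinds_in(lines):
    """Return the set of peak kinds found in tab-delimited chr/start/end lines."""
    kinds = set()
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines (an assumption of this sketch)
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError("line %d: expected 3 columns, got %d"
                             % (lineno, len(fields)))
        chrom, start, end = fields
        kinds.add(peak_kind(int(start), int(end)))
    return kinds


# Made-up example lines: one summit and one region
print(peak_kinds_in(["chr1\t1500\t1501\n", "chr1\t3000\t3400\n"]))
```

A result containing both kinds suggests the file mixes the two peak definitions, which is worth checking before running an analysis.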

Dependencies

RnaChipIntegrator was written for Python 2.7, but has also been used with Python 2.4. It uses the xlwt, xlrd and xlutils Python libraries to write its XLS output files. If these are not available then the program will still run but won't produce an XLS file.
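The "run anyway, but skip the XLS file" behaviour is a common optional-dependency pattern. The Python sketch below shows one way it can be written; it is an assumption about the general approach, not the program's actual code, and write_xls is a hypothetical helper. Only xlwt is checked here, using its Workbook, add_sheet, write and save calls.

```python
try:
    import xlwt  # optional: only needed for XLS output
except ImportError:
    xlwt = None


def xls_output_available():
    """Return True if the xlwt library could be imported."""
    return xlwt is not None


def write_xls(path, rows, sheet_name="results"):
    """Write rows (lists of cell values) to an XLS file if xlwt is present.

    Returns True if the file was written, False if xlwt is unavailable.
    """
    if xlwt is None:
        return False
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet(sheet_name)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)
    workbook.save(path)
    return True


if not xls_output_available():
    print("xlwt not found: XLS output will be skipped")
```

Callers can then always produce their other output and treat a False return from write_xls as "no spreadsheet this time".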

License

Artistic Licence 2.0

Authors

Ian Donaldson, Leo Zeef and Peter Briggs, University of Manchester, Faculty of Life Sciences Bioinformatics Core Facility

Contact

Peter Briggs (peter.briggs [at] manchester.ac.uk)

For developers

See the source code on GitHub: https://github.com/fls-bioinformatics-core/RnaChipIntegrator

You can clone the project with Git by running:

$ git clone git://github.com/fls-bioinformatics-core/RnaChipIntegrator