1. Overview:
SpotTSS predicts Transcription Start Sites (TSSs) in plants. It uses six different structural features to find potential TSS candidates and filters the best candidates by their core motif profiles. Finally, it clusters the predictions within a defined distance based on their confidence scores and reports the predicted TSSs.


2. System requirements

SpotTSS runs on 32-bit or 64-bit machines with a Linux operating system. It requires Python 2.6 or higher and R 2.13 or higher. You also need the Python libraries Biopython, rpy, numpy, scipy, Matplotlib and python-cluster, and the e1071 library for R. For a single processor, at least 1 GB of system memory is recommended when using a 1000 bp chunk size.
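
Before installing, it can help to confirm that the Python libraries listed above are importable. The snippet below is only a sketch: the import names in REQUIRED_MODULES are assumptions (an import name can differ from the package name, e.g. Biopython is normally imported as "Bio"), and the e1071 library for R is not checked here.

```python
# Map of package name -> assumed import name. These import names are
# assumptions and may need adjusting for your installation.
REQUIRED_MODULES = {
    "Biopython": "Bio",
    "rpy": "rpy",
    "numpy": "numpy",
    "scipy": "scipy",
    "Matplotlib": "matplotlib",
    "python-cluster": "cluster",
}


def find_missing(modules):
    """Return the sorted package names whose module cannot be imported."""
    missing = []
    for package, module in modules.items():
        try:
            __import__(module)
        except ImportError:
            missing.append(package)
    return sorted(missing)
```

Calling find_missing(REQUIRED_MODULES) returns an empty list when every library can be imported, and otherwise the names of the packages that still need to be installed.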



3. Installation:

Download the SpotTSS package:
./SpotTSS_v1.tar.gz

Extract the archive and enter the new directory with the following commands:

$ tar xvzf SpotTSS_v1.tar.gz
$ cd SpotTSS_v1

Done! At this point all the scripts are in the current directory.

4. Using SpotTSS

4.1 Build a model

To build a model, the BuildTSSPredictionModel.py script needs two types of datasets. The “TSS” dataset contains 300 bp sequences (frames) in which the TSS lies within positions 145-155 of each frame, whereas the “nonTSS” dataset contains randomly chosen promoter, coding and intronic sequences that do not overlap any annotated TSS. We have provided the “TSS” and “nonTSS” datasets that we used to build the model “model_1” as “all_tss_frames.fasta” and “all_nontss_frames.fasta”.

Usage: BuildTSSPredictionModel.py -t tss_frames -n no_tss_frames -m model_name

Options:
  -h, --help            show this help message and exit
  -t TSS, --tss=TSS     fasta file of TSS frames
  -n NOTSS, --notss=NOTSS
                        fasta file of non-TSS frames
  -m MODEL, --model=MODEL
                        name of model file to build
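
If you want to prepare your own “TSS” dataset, each frame has to place the TSS inside the 145-155 bp window described above. The following is a minimal sketch of one way to cut such a frame from a longer sequence. It is not part of SpotTSS; the function name and the choice of position 150 as the TSS offset are assumptions made for illustration.

```python
FRAME_LENGTH = 300  # frame length stated in this section
TSS_OFFSET = 150    # assumed 1-based TSS position in the frame (inside 145-155)


def extract_frame(sequence, tss_position):
    """Return a 300 bp frame with the TSS at TSS_OFFSET, or None if it does not fit.

    tss_position is the 1-based coordinate of the TSS in `sequence`.
    """
    # 0-based start such that the TSS lands on 1-based frame position TSS_OFFSET
    start = tss_position - TSS_OFFSET
    end = start + FRAME_LENGTH
    if start < 0 or end > len(sequence):
        return None  # not enough flanking sequence around the TSS
    return sequence[start:end]
```

Frames that return None lack enough flanking sequence on one side and would have to be skipped or cut from a longer region.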

4.2 Motif Score generation

The “motif_score.py” script scans for five selected core motifs within confined ranges relative to the TSS and writes the scores to a text file. All promoters must have the TSS at the same position, and each promoter must be at least 200 bp long. The provided “all_loci.fasta” file contains the 850 loci sequences we used as training data; in these sequences the TSS is located at position 1001.

Usage: motif_score.py -i inputfile -o outputfile -p position

Options:
  -h, --help            show this help message and exit
  -i INFILE, --infile=INFILE
                        Input file
  -o OUTFILE, --outfile=OUTFILE
                        Output file
  -p POSITION, --position=POSITION
                        position of the TSS, all sequences must have TSS in the
                        same position and the promoter length should be
                        >=200bp
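
The two input constraints above (a minimum length of 200 bp and a TSS position shared by all sequences) can be checked before running the script. The sketch below uses a small hand-written FASTA reader so that it has no dependencies; the function names are hypothetical and the checks reflect only the constraints stated in this section, not the internal behaviour of motif_score.py.

```python
MIN_PROMOTER_LENGTH = 200  # minimum promoter length stated in this section


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_invalid_records(lines, tss_position):
    """Return (identifier, reason) pairs for records that break the constraints."""
    problems = []
    for name, seq in read_fasta(lines):
        if len(seq) < MIN_PROMOTER_LENGTH:
            problems.append((name, "shorter than %d bp" % MIN_PROMOTER_LENGTH))
        elif not 1 <= tss_position <= len(seq):
            problems.append((name, "TSS position outside sequence"))
    return problems
```

For the provided training data, opening “all_loci.fasta” and passing the file handle together with position 1001 should return an empty list if every sequence satisfies both constraints.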

4.3 Predict TSS

To predict TSSs on a given sequence, the ApplyTSSPredictionModel.py script needs a model. We have provided “model_1”, a model that has been tested on various plant species. ApplyTSSPredictionModel.py also uses a list of motif scores for percentile ranking. We provide these scores in a file named “score_01.txt”, or you can generate your own score list as described in section 4.2. Predicting TSSs with multiple features is processor-intensive, so we suggest using the -p option for larger sequences. Without this option the script runs on a single processor.

Usage: ApplyTSSPredictionModel.py -i inputfile -o outputfile -m model -p processes -s strand -r Motif_score_file

Options:
  -h, --help            show this help message and exit
  -i INFILE, --infile=INFILE
                        Input file
  -o OUTFILE, --outfile=OUTFILE
                        Output file
  -m MODEL, --model=MODEL
                        Name of the model file
  -p PROCESSES, --processes=PROCESSES
                        Number of processors to be used
  -s STRAND, --strand=STRAND
                        Strand of the given sequences to scan; 1 for forward
                        strand and 2 for reverse strand
  -r RANK, --rank=RANK  Motif Score file for ranking
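
When scanning many sequences or both strands, it can be convenient to assemble the command line in a small helper. The sketch below only builds the argument list from the options documented above; the helper name, the "python" interpreter name and the choice to always pass -s are assumptions, and the script is assumed to be in the current directory.

```python
def build_predict_command(infile, outfile, model, rank_file,
                          strand=1, processes=None, python="python"):
    """Return the argument list for one ApplyTSSPredictionModel.py run."""
    if strand not in (1, 2):
        raise ValueError("strand must be 1 (forward) or 2 (reverse)")
    command = [
        python, "ApplyTSSPredictionModel.py",
        "-i", infile,
        "-o", outfile,
        "-m", model,
        "-s", str(strand),
        "-r", rank_file,
    ]
    # -p is optional; without it the script runs on a single processor
    if processes is not None:
        command += ["-p", str(processes)]
    return command
```

The returned list can be passed to subprocess.call. For example, build_predict_command("input.fasta", "predictions.txt", "model_1", "score_01.txt", strand=2, processes=4) describes a reverse-strand scan on four processors, where “input.fasta” and “predictions.txt” are placeholder file names.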

4.4 Output

SpotTSS outputs a tab-delimited text file as follows:

Seq_ID    TSS_position    Strand    Score
Chr1    100683    +    0.71765
Chr1    100613    +    0.95395
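
Because the output is plain tab-delimited text, it is easy to post-process. The following sketch reads the four columns shown above and keeps predictions at or above a score cutoff; the function names and the cutoff are hypothetical and not part of SpotTSS.

```python
def parse_predictions(lines):
    """Yield (seq_id, position, strand, score) tuples from SpotTSS output lines."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4 or fields[0] == "Seq_ID":
            continue  # skip the header line and malformed lines
        seq_id, position, strand, score = fields
        yield seq_id, int(position), strand, float(score)


def filter_by_score(predictions, min_score):
    """Return predictions with score >= min_score, sorted by sequence and position."""
    kept = [p for p in predictions if p[3] >= min_score]
    return sorted(kept, key=lambda p: (p[0], p[1]))
```

Applied to the example above with a cutoff of 0.9, only the prediction at position 100613 on Chr1 would be kept.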