De novo profile generator

Synthetic Profile Builder

SPBuild is a de novo profile generator based on a long short-term memory (LSTM) network, a deep learning architecture. It takes a single sequence in FASTA format as its sole input and generates a position-specific scoring matrix (PSSM).

SPBuild standalone


Benchmark dataset

SCOP20 test dataset

SCOP20 learning dataset

Building the program requires about 20 GB of memory.

$ tar zxvf spbuild-20171107.tar.gz
$ cd spbuild/code/spbuild/cc
$ ./configure
$ make
$ make install

SPBuild takes a FASTA-format file containing an amino acid sequence as input and writes the PSSM in mtx and/or asnt format.

$ spbuild -i <fasta> [-m <mtx output file>] [-a <asnt output file>]
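
As a minimal sketch of a typical run, the following assumes that spbuild is on your PATH after make install. The file names query.fasta, query.mtx, and query.asnt and the example sequence are hypothetical placeholders. The single-record check is an assumption based on the description of the input as one sequence, not a documented requirement of the program.

```shell
# Write a hypothetical single-sequence FASTA file (placeholder sequence).
cat > query.fasta <<'EOF'
>query1 hypothetical example sequence
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF

# Count FASTA records (header lines starting with '>').
n_records=$(grep -c '^>' query.fasta)

if [ "$n_records" -ne 1 ]; then
    echo "expected exactly one sequence, found $n_records" >&2
elif command -v spbuild >/dev/null 2>&1; then
    # Request both output formats, using the options from the usage line.
    spbuild -i query.fasta -m query.mtx -a query.asnt
else
    echo "spbuild not found on PATH; skipping" >&2
fi
```

Both -m and -a are optional in the usage line, so either one can be dropped if only one output format is needed.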

The .asnt output file can be used as the input PSSM for a PSI-BLAST search.

$ psiblast -in_pssm <asnt output file> -db <database file>
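
A slightly fuller invocation might look like the sketch below. The file name query.asnt, the database name mydb, the output name query.psiblast.txt, and the E-value threshold of 0.001 are all hypothetical choices for illustration; -evalue and -out are standard BLAST+ options, but suitable values depend on your search.

```shell
pssm=query.asnt            # hypothetical: PSSM file written by spbuild -a
db=mydb                    # hypothetical: a formatted BLAST protein database
out=query.psiblast.txt     # hypothetical: report file name

# Build the command as an argument list so paths with spaces stay intact.
set -- psiblast -in_pssm "$pssm" -db "$db" -evalue 0.001 -out "$out"

if [ -f "$pssm" ] && command -v psiblast >/dev/null 2>&1; then
    "$@"
else
    # Show the command instead of running it when inputs or tools are missing.
    echo "would run: $*" >&2
fi
```

Because the search starts from a PSSM, no query sequence file is passed to psiblast here.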

Yamada KD and Kinoshita K, De novo profile generation based on sequence context specificity with the long short-term memory network, bioRxiv, 240515, 2017