Abstract
About 14% of the protein sequences in the SwissProt database contain repetitive regions, viz., tandem repeats, multiple copies of motifs/profiles, and multiple copies of domains. Eukaryotic proteins are more likely to contain repeats than bacterial and archaeal proteins. Here we focus only on tandem repeats. A tandem repeat can be defined as a contiguous repeat pattern of two or more copies, which may be exact or approximate. Many proteins with such repeats are involved in functions such as transcription, translation and protein-protein interaction, and proteins with tandem repeats are implicated in various neurodegenerative diseases such as Huntington's disease. These proteins are also found among sequences that are poorly conserved in evolution. We have developed a database of tandem repeats in protein sequences, the Protein Tandem Repeats DataBase (PTRDB). Its data were extracted using PEPPER, our in-house tool for identifying PEPtide PEriodic Repeats, and it is built on SwissProt (version 51.7). PTRDB contains 3145 proteins with 4713 tandem repeats; 77.74% are found in Eukaryota, 16.63% in Bacteria and 0.67% in Archaea. About 5% of the proteins in the database are associated with disease. The database is classified by organism and can be queried by attributes such as SwissProt ID, organism name, PDB ID, repeat pattern, repeat length, keyword and copy number. For each tandem repeat it gives detailed information: SwissProt ID (hyperlinked to SwissProt for the FASTA sequence), organism name, PDB ID (hyperlinked to PDB), a one-line description of the protein, gene name, protein name, taxonomy ID (hyperlinked to EBI), family information (hyperlinked to Pfam), a description of the associated disease, OMIM ID, the scoring matrix and gap extension used for the repeat, the repeat pattern with its alignment score, copy number, repeat length, start point, end point, the alignment of the repeat pattern with the repeat region, and secondary structure information for the repeat region.
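The definition of a tandem repeat given above (two or more contiguous copies of a pattern) can be illustrated with a minimal Python sketch that finds exact tandem repeats only. This is not PEPPER's algorithm, which the abstract does not describe; the names `find_exact_tandem_repeats`, `TandemRepeat`, `is_primitive`, `max_period` and `min_copies` are hypothetical. For each period it scans for maximal runs where a residue equals the residue one period later, and it reports the repeat pattern, copy number and 1-based start and end points, loosely mirroring some of the attributes PTRDB lists.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    # Hypothetical record type; field names are illustrative, not PTRDB's schema.
    pattern: str      # the repeat unit
    copy_number: int  # number of whole, exact copies
    start: int        # 1-based start position in the sequence
    end: int          # 1-based inclusive end of the last whole copy


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repetition of a shorter string (e.g. 'QQ' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_repeats(seq: str, max_period: int = 10, min_copies: int = 2):
    """Report maximal runs of exact, contiguous copies of a primitive unit.

    Assumption: only exact copies are detected; approximate copies are ignored.
    """
    repeats = []
    n = len(seq)
    for p in range(1, max_period + 1):
        k = 0
        while k + p < n:
            if seq[k] != seq[k + p]:
                k += 1
                continue
            start = k
            # Extend while each residue matches the one a full period later.
            while k + p < n and seq[k] == seq[k + p]:
                k += 1
            run_len = (k - start) + p  # residues covered by the periodic run
            copies = run_len // p
            unit = seq[start:start + p]
            # Skip non-primitive units such as 'QQ', which duplicate a shorter period.
            if copies >= min_copies and is_primitive(unit):
                repeats.append(TandemRepeat(unit, copies, start + 1, start + copies * p))
    return repeats


if __name__ == "__main__":
    # Toy sequence with a poly-Q stretch followed by an AG repeat.
    for rep in find_exact_tandem_repeats("MQQQQQQAGAGAGK"):
        print(rep.pattern, rep.copy_number, rep.start, rep.end)
    # prints "Q 6 2 7" then "AG 3 8 13"
```

Detecting approximate copies would need an alignment-based comparison, for example with a scoring matrix and gap penalties of the kind PTRDB reports for each repeat, instead of exact character matching.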
As tandem protein repeats are found in increasing numbers across proteomes, this database will become increasingly important for proteome comparison.