Scientific Computing Xpositions

MySQL Implementation for a Database of Protein Structures: SupSQL

Abstract

Computational biologists are increasingly interested in combining structural and evolutionary information about protein sequences. Historically, databases have been built to archive either structural data or evolutionary trees, but to date there have been few attempts to integrate the two. Here we propose a MySQL database implementation (SupSQL) that explicitly incorporates both structural and evolutionary information. The database will include the class of structure (globular or transmembrane), quaternary structure, genomic sequence, introns, taxonomy, and protein family. All of these data are readily available online but are scattered across several databases, such as UniProtKB, Pfam, the RCSB Protein Data Bank, PDBTM, EMBL, and NCBI Taxonomy. The proposed implementation will let us combine all of this knowledge in a single database, and because it uses a standardized language, SQL, it will be easy to access and share with interested parties. With the information contained in this database, we will have a systematic method for selecting proteins that match given criteria, rather than relying on ad hoc selections based on past experience or biased knowledge.
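To make the idea of criteria-based selection concrete, the following is a minimal sketch of what such a query could look like. It is not the SupSQL schema: the table names, column names, and the `select_proteins` helper are hypothetical, the rows are made-up placeholders rather than real database entries, and Python's built-in SQLite module stands in for a MySQL server so that the example is self-contained.

```python
import sqlite3

# Hypothetical two-table layout; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxonomy (
    taxon_id INTEGER PRIMARY KEY,   -- would hold a taxonomy identifier
    species  TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id      INTEGER PRIMARY KEY,
    accession       TEXT NOT NULL UNIQUE,  -- would hold a sequence accession
    structure_class TEXT NOT NULL
        CHECK (structure_class IN ('globular', 'transmembrane')),
    quaternary      TEXT,                  -- e.g. 'monomer', 'homodimer'
    intron_count    INTEGER,
    family          TEXT,                  -- would hold a protein family identifier
    taxon_id        INTEGER REFERENCES taxonomy(taxon_id)
);
""")

# Placeholder rows for illustration only; none of these are real entries.
conn.executemany(
    "INSERT INTO taxonomy (taxon_id, species) VALUES (?, ?)",
    [(1, "Example species A"), (2, "Example species B")],
)
conn.executemany(
    "INSERT INTO protein (accession, structure_class, quaternary,"
    " intron_count, family, taxon_id) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("EX0001", "globular", "monomer", 0, "FAM_A", 1),
        ("EX0002", "transmembrane", "homodimer", 3, "FAM_B", 1),
        ("EX0003", "transmembrane", "monomer", 1, "FAM_B", 2),
        ("EX0004", "transmembrane", "homotrimer", 5, "FAM_C", 2),
    ],
)


def select_proteins(conn, structure_class, min_introns):
    """Return accessions matching a structure class and a minimum intron count."""
    rows = conn.execute(
        """
        SELECT p.accession
        FROM protein AS p
        JOIN taxonomy AS t ON t.taxon_id = p.taxon_id
        WHERE p.structure_class = ? AND p.intron_count >= ?
        ORDER BY p.accession
        """,
        (structure_class, min_introns),
    ).fetchall()
    return [row[0] for row in rows]


# Select transmembrane proteins with at least two introns.
print(select_proteins(conn, "transmembrane", 2))
```

The point of the sketch is that the selection criteria live in the `WHERE` clause, so the same query can be rerun or shared unchanged; a real schema would likely normalize families and quaternary structure into their own tables.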