The Cross-Beta DB is a database that gathers naturally occurring, cross-beta-forming amyloids in one place.
Every entry in it is supported by experimental evidence of cross-beta structure formation, according to the criteria described later.
The main purpose of the database is to provide data for training and benchmarking of new amyloid prediction models (see
Cross-Beta predictor).
It also includes data about experimental conditions and other information for general use. All entries of the
database can be downloaded individually or as a group. The benchmark set and the other database versions used can be downloaded in the
"Download" section.
A full description of all the variables in the database, as included when downloading one or more entries, is available
here.
For more information about how to use the database interface and its features, see the
"Tutorial" section.
Due to a shift in environmental conditions or other factors, certain soluble proteins undergo aggregation,
resulting in the formation of clumps of amyloid fibrils. Understanding this phenomenon is of paramount
importance because of its association with various diseases, including Alzheimer's disease, and because of the increasingly
abundant data on its functional roles. Numerous studies have demonstrated that the propensity to form amyloids
is encoded in the amino acid sequence, and this finding paved the way for the development of several computational
predictors of amyloidogenicity. The ultimate objective of computational methods is to accurately predict the
formation of disease-related and functionally relevant amyloids that occur in vivo. These amyloid fibrils are
known to adopt a very specific “cross-beta structure” formed by protein regions longer than about 15 residues. Remarkably,
despite the significance of naturally occurring amyloids, there had been a lack of datasets specifically dedicated
to them. Hence, we built Cross-Beta DB, a database composed of cross-beta amyloids formed under natural conditions.
This database is expected to be indispensable for benchmarking amyloid predictors. Moreover, as machine learning is
demonstrating its high potential in various fields, we used Cross-Beta DB to train several such algorithms. Benchmarking
them revealed that Cross-Beta RF Predictor, developed on the basis of the random forest algorithm, demonstrates
the best performance. The benchmark results also demonstrate superior performance of Cross-Beta RF Predictor over
other existing methods, raising expectations for improved prediction of naturally occurring amyloids.
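
To illustrate what benchmarking a predictor against a labelled set can involve, here is a minimal Python sketch that scores a predictor's output against known amyloid / non-amyloid labels. It is not the procedure used for Cross-Beta RF Predictor: the function names, the 0.5 decision threshold, the choice of metrics (sensitivity, specificity, Matthews correlation coefficient) and the toy data are all assumptions made for illustration.

```python
import math


def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for binary labels (1 = amyloid)."""
    tp = fp = tn = fn = 0
    for label, predicted in zip(labels, predictions):
        if label and predicted:
            tp += 1
        elif label and not predicted:
            fn += 1
        elif not label and predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def benchmark(labels, scores, threshold=0.5):
    """Turn predictor scores into calls at `threshold` and compute summary metrics.

    `threshold` is a hypothetical cut-off; a real predictor defines its own.
    """
    predictions = [score >= threshold for score in scores]
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Toy data (invented): 1 = experimentally confirmed cross-beta former, 0 = negative.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.6]

if __name__ == "__main__":
    print(benchmark(toy_labels, toy_scores))
```

Running the same function on the output of several predictors over one shared benchmark set gives directly comparable numbers, which is the kind of comparison the benchmark described above relies on.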