Any text can be entered in the search bar, which results in a free-text search. The query kinase cancer, for example, is translated into a search for the binding site entries that contain the (sub-)terms kinase and cancer anywhere in their descriptions. The white space character between the terms is interpreted as the logical AND operator.
Alternatively, a query can be made specific by providing the name of the column in which to perform the search and the value for this column. For example, pdb_id=1got retrieves exactly one protein from the database.
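The two query styles described above can be illustrated with a minimal Python sketch. It is not the server's actual implementation: the function names, the dictionary representation of an entry, and the case-insensitive matching are all assumptions made for illustration.

```python
def parse_query(query):
    """Split a search-bar query into free-text terms and column=value filters."""
    free_text, filters = [], []
    for token in query.split():  # white space separates terms (logical AND)
        column, sep, value = token.partition("=")
        if sep and column and value:
            filters.append((column, value))
        else:
            free_text.append(token)
    return free_text, filters


def matches(entry, query):
    """Return True if a (hypothetical) entry dict satisfies every term of the query."""
    free_text, filters = parse_query(query)
    description = entry.get("description", "").lower()
    # Free-text terms are matched as sub-terms anywhere in the description.
    text_ok = all(term.lower() in description for term in free_text)
    # Column filters must match the column value exactly (case-insensitive here).
    filters_ok = all(
        str(entry.get(column, "")).lower() == value.lower()
        for column, value in filters
    )
    return text_ok and filters_ok


# Hypothetical entry used only to exercise the sketch.
entry = {"pdb_id": "1got", "description": "Tyrosine kinase implicated in cancer"}
print(matches(entry, "kinase cancer"))
print(matches(entry, "pdb_id=1got"))
```

The sketch covers only free-text terms and the column=value form; comparison and list queries are not handled.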
Valid column names are:
The following examples show how to use the query expression language in the search bar to retrieve the binding sites of interest from the database:
leave empty
bs_id=1
bs_id>1 (note: our method does not reliably determine if a site is allosteric)
bsite_rest=small.substrate.competitive
bsite_rest=small.cofactor.competitive
bs_id=[1,2]
min_beta_factor>90
compound_score>90
compound_score>50 compound_score<91 (note: the space character between the terms is interpreted as a logical AND)
bsite_rest=water cons>0.9 num_occ>10 (note: these waters can be important for protein function or structural stability)

Binding sites were predicted using the ProBiS-Fold method for the whole AlphaFold DB. Here, we showcase a few examples of potential usage, advantages and properties of the predicted binding sites.
A binding site grid is then generated for each ligand cluster by sampling hexagonal close-packed points spaced 0.2 Å apart that fall within the radius of any of the ligands’ atoms but do not overlap with any of the protein atoms. A binding site grid thus follows the contours of the molecular surface of the biological assembly and the space occupied by the predicted ligands up to 8 Å from the protein surface. Binding site centroids are then calculated as grid points sampled at approximately 3 Å intervals. Each centroid is assigned a radius of about 4 Å, and each binding site is represented by a set of overlapping centroids with radii that closely follow its contours.
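The grid sampling described above can be sketched in plain Python as follows. This is a minimal illustration, not the ProBiS-Fold code: the function names, the ligand_radius and protein_radius values, and the bounding-box strategy are assumptions. Only the hexagonal close-packed layout and the 0.2 Å default spacing come from the text.

```python
import math


def hcp_points(lo, hi, spacing):
    """Yield hexagonal close-packed lattice points covering the box [lo, hi]."""
    r = spacing / 2.0
    dy = math.sqrt(3.0) * r                # distance between rows in a layer
    dz = 2.0 * math.sqrt(6.0) / 3.0 * r    # distance between layers
    ni = int((hi[0] - lo[0]) / spacing) + 1
    nj = int((hi[1] - lo[1]) / dy) + 1
    nk = int((hi[2] - lo[2]) / dz) + 1
    for k in range(nk):
        for j in range(nj):
            for i in range(ni):
                x = lo[0] + r * (2 * i + (j + k) % 2)
                y = lo[1] + dy * (j + (k % 2) / 3.0)
                z = lo[2] + dz * k
                yield (x, y, z)


def binding_site_grid(ligand_atoms, protein_atoms, spacing=0.2,
                      ligand_radius=3.0, protein_radius=2.0):
    """Keep lattice points near any ligand atom that do not overlap the protein.

    ligand_radius and protein_radius are assumed values chosen for illustration.
    """
    lo = tuple(min(a[d] for a in ligand_atoms) - ligand_radius for d in range(3))
    hi = tuple(max(a[d] for a in ligand_atoms) + ligand_radius for d in range(3))
    grid = []
    for p in hcp_points(lo, hi, spacing):
        near_ligand = any(math.dist(p, a) <= ligand_radius for a in ligand_atoms)
        overlaps = any(math.dist(p, a) < protein_radius for a in protein_atoms)
        if near_ligand and not overlaps:
            grid.append(p)
    return grid


# Toy input: one ligand atom and one protein atom; a coarse spacing keeps it fast.
grid = binding_site_grid([(0.0, 0.0, 0.0)], [(2.5, 0.0, 0.0)], spacing=1.0)
print(len(grid))
```

With the 0.2 Å spacing from the text, a single atom already yields tens of thousands of points, so a real implementation would likely use a spatial index rather than the brute-force distance checks shown here.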
Primary binding sites are those with rank equal to 1 and typically correspond to the main binding site in a protein. Secondary binding sites are those with rank>1 according to our binding site prioritization score. Ligand binding to a secondary site that is also an allosteric site can lead to a conformational change within the orthosteric binding site, thus modulating the protein’s activity.[5,6] As such, secondary sites are important in proteins as they often serve as natural control loops, such as feedback from downstream products of enzymes, while also being crucial in cell signaling. The secondary binding sites in our database can readily be used to identify previously unknown binding locations and, subsequently, to design drugs acting on previously untargeted binding sites. Such drugs may exhibit novel and unique scaffolds while still acting on the same target protein as the existing drugs that target orthosteric sites.
The criterion for assuming that a ligand of a protein can be transposed into a binding site on a query protein is the similarity between the binding site of the originating protein and the binding site of the query protein. Ligands are therefore transposed from similar proteins only if their binding sites are sufficiently similar to the binding site(s) on the query protein. Sufficiency for transposition is determined separately for each ligand type using a Z-score metric.
Cofactor ligands and cofactor binding sites are identified based on the list of known cofactors extended with a few more. The coordinate files for each cofactor are obtained from the PubChem database, and all PDB ligands that are very similar to the cofactors in this list are considered cofactors themselves. The cofactors that we consider are the following:
This is the list of known monosaccharides that are used to determine if a PDB ligand is a part of a glycan.
Biologically relevant metal ions and conserved water molecules are determined by counting the number of times an ion or water molecule is found in similar binding sites at the same or a similar position, according to the methodology described in the ProBiS H2O approach. Biologically relevant ions are identified based on the candidate ions (see Figure 3 in the accompanying paper) and an additional filter that requires them to belong to clusters of at least 10 members. Those ions that do not meet both criteria are considered artifacts and classified as buffer. Similarly, water is labeled as conserved water if it belongs to clusters with >10 members; otherwise it is considered to bind nonspecifically.
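The two classification rules above can be sketched as follows. The function names and returned labels are hypothetical, and the candidate ion set passed in the example is a placeholder: the actual candidate ions are those listed in Figure 3 of the accompanying paper.

```python
def classify_ion(ion_id, cluster_size, candidate_ions, min_cluster_size=10):
    """Label an ion as biologically relevant only if it meets both criteria."""
    if ion_id in candidate_ions and cluster_size >= min_cluster_size:
        return "biologically relevant"
    return "buffer"  # considered an artifact


def classify_water(cluster_size, threshold=10):
    """Label a water molecule as conserved if its cluster has >10 members."""
    return "conserved" if cluster_size > threshold else "nonspecific"


# Placeholder candidate set, for illustration only.
candidates = {"ZN", "MG"}
print(classify_ion("ZN", 12, candidates))
print(classify_water(10))
```

The thresholds follow the text literally: at least 10 cluster members for ions, and more than 10 for water.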
Z-scores are assigned by the ProBiS-ligands approach to each pairwise protein superimposition and measure the local structural similarity of the superimposed protein patches, where higher Z-scores indicate higher structural similarity of the compared binding sites.[4] For compounds, cofactors, glycans, and water molecules the superimpositions with Z-score ≥ 2.5 are used, while for metal ions this threshold is set to ≥ 2.0. Further, three different cases are distinguished for transposition: if a ligand originates from a non-representative protein within the same sequence cluster (Step 1, see our paper) as the query protein chain, then the rotational-translational matrix obtained in Step 2 is applied to the ligand’s coordinates to transpose them into the coordinate frame of the query protein chain; if the ligand originates from a representative protein of another cluster, then the rotational-translational matrix obtained in Step 3 is used; finally, if the ligand is from a non-representative protein from another sequence cluster, then both the corresponding matrices from Step 2 and Step 3 are applied to the ligand’s coordinates to transpose the ligand into the binding site of the query protein.
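The thresholds and the three transposition cases above can be sketched in Python. This is an illustration under stated assumptions, not the ProBiS-ligands implementation: the function names, the (rotation, translation) tuple representation, and the order in which the Step 2 and Step 3 matrices are composed in the third case (Step 2 first, then Step 3) are assumptions.

```python
# Z-score thresholds per ligand type, as given in the text.
Z_THRESHOLD = {"compound": 2.5, "cofactor": 2.5, "glycan": 2.5,
               "water": 2.5, "ion": 2.0}


def passes_threshold(ligand_type, z_score):
    """Return True if a superimposition is similar enough for this ligand type."""
    return z_score >= Z_THRESHOLD[ligand_type]


def apply_transform(transform, point):
    """Apply a rigid transformation (3x3 rotation, translation) to one point."""
    rotation, translation = transform
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def transpose_ligand(coords, same_cluster, from_representative,
                     step2=None, step3=None):
    """Move ligand coordinates into the coordinate frame of the query chain."""
    if same_cluster:
        transforms = [step2]          # non-representative, same cluster
    elif from_representative:
        transforms = [step3]          # representative of another cluster
    else:
        transforms = [step2, step3]   # assumed order: Step 2, then Step 3
    if any(t is None for t in transforms):
        raise ValueError("missing transformation for this case")
    for transform in transforms:
        coords = [apply_transform(transform, p) for p in coords]
    return coords


# Toy transformations: a shift along x, and a 90-degree rotation about z.
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
shift_x = (identity, (1, 0, 0))
rot_z = (((0, -1, 0), (1, 0, 0), (0, 0, 1)), (0, 0, 0))
print(transpose_ligand([(1, 0, 0)], False, False, step2=shift_x, step3=rot_z))
```

A ligand would be transposed only when passes_threshold returns True for the corresponding superimposition.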
This is an updated and extended list of non-specific binders given as PDB Chemical IDs based on the list of non-specific binders available here.
12P, 144, 15P, 16D, 16P, 1BO, 1PE, 1PG, 1PS, ACA, ACE, ACN, ACT, ACY, AE3, AE4, AGC, AZI, B3P, B7G, BCN, BE7, BEN, BEQ, BEZ, BGC, BMA, BNG, BOG, BTB, BTC, BU1, BU2, BU3, C10, C15, C8E, CAC, CBM, CBX, CCN, CE1, CIT, CM, CM5, CN, CPS, CRY, CXE, CYN, CYS, D10, DDQ, DHD, DIA, DIO, DMF, DMS, DMU, DMX, DOD, DOX, DPR, DR6, DTT, DXE, DXG, EDO, EEE, EGL, EOH, EPE, ETE, ETF, FCL, FCY, FMT, FRU, GBL, GCD, GLC, GLO, GLY, GOL, GPX, HEZ, HTG, HTO, ICI, ICT, IDT, IOH, IPA, IPH, JEF, LAK, LAT, LBT, LDA, LMT, M2M, MA4, MAN, ME2, MES, MG8, MHA, MLI, MOH, MPD, MPO, MRD, MRY, MTL, MXE, N8E, NDG, NH4, NHE, NO3, O4B, OTE, P15, P33, P3G, P4C, P4G, P6G, PDO, PE3, PE4, PE5, PE7, PE8, PEG, PEU, PG0, PG4, PG5, PG6, PGE, PGF, PGO, PGQ, PGR, PIG, PIN, PO4, POL, SAL, SBT, SCN, SDS, SO4, SOR, SPD, SPK, SPM, SUC, SUL, SYL, TAR, TAU, TBU, TEP, TLA, TMA, TOE, TRE, TRS, TRT, UMQ, UNK, URE, VO4, XPE, XYP, AL, CS, BR, CL, F, IOD, PB, LI, HG, K, RB, AG, NA, SR, YT3, Y1, XE