The chemical language CurlySMILES provides a format for mapping a molecular sketch that contains both molecular-graph-based and boxed parts into a linear notation. A boxed part, called a boxdyl, typically consists of a metaterm that categorizes the structural part or defines its role; the metaterm is framed by a rectangular, oval or otherwise shaped line. In a CurlySMILES notation, the metaterm is assigned as a value to the annotation dictionary key box, for example:
This example illustrates the encoding of the modified growth hormone ARX201, as shown in the sketch. The framed term Growth hormone is the boxdyl. The structure of the potential drug ARX201 can formally be divided into three parts: (1) the growth hormone, (2) an oxime-functionalized derivative of an unnatural amino acid, and (3) a polyethylene glycol (PEG) chain attached to that derivative. Parts (2) and (3) take center stage here because they are critical to the drug's functionality: the drug would fail if PEG were attached to another amino acid, where it would interfere with the hormone's normal activity. To capture this concept, the CurlySMILES notation encodes these two parts (an alkoxyamine-functionalized PEG conjugated with the acetyl group of the unnatural amino acid p-acetylphenylalanine) in detail, while collapsing the complex growth hormone part into a boxdyl.
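As a rough illustration of the key-value mechanism, in which a metaterm is stored under the annotation key box, the following Python sketch extracts box values from a notation string. It assumes a simplified annotation syntax (curly-brace blocks holding semicolon-separated key=value pairs) that may not match the full CurlySMILES grammar. The function names parse_annotations and boxdyls, and the toy input string, are hypothetical; the toy string is not the ARX201 notation.

```python
import re

# ASSUMPTION: simplified, hypothetical annotation syntax for illustration only.
# An annotation is a curly-brace block of semicolon-separated key=value pairs.
ANNOTATION = re.compile(r"\{([^{}]*)\}")


def parse_annotations(notation: str) -> list[dict[str, str]]:
    """Return one dictionary per curly-brace annotation block in `notation`."""
    dicts = []
    for body in ANNOTATION.findall(notation):
        entries = {}
        for pair in body.split(";"):
            key, sep, value = pair.partition("=")
            if sep:  # skip fragments that are not key=value pairs
                entries[key.strip()] = value.strip()
        dicts.append(entries)
    return dicts


def boxdyls(notation: str) -> list[str]:
    """Collect the metaterms assigned to the 'box' key, i.e. the boxdyls."""
    return [d["box"] for d in parse_annotations(notation) if "box" in d]


if __name__ == "__main__":
    # Hypothetical toy string (not the real ARX201 notation): a boxed part
    # annotated in front of a small amino-acid-like fragment.
    toy = "{box=Growth hormone}NC(C)C(=O)O"
    print(boxdyls(toy))  # ['Growth hormone']
```

Treating annotations as plain dictionaries keeps the molecular-graph part of the string untouched, so a graph-based parser can ignore the boxed parts while a drawing tool reads only the box entries.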
Keywords: cheminformatics, supramolecular drawings, linear notation, molecular graph, super-concepts, metaterms