Monday, February 28, 2011

When birds are called boids

When a large number of birds assemble in flight and synchronize their movements, they together display flock behavior. Similar dynamics can be observed in the shape and movement of a herd of land animals, a school of fish or a swarm of insects. Going beyond living agents, solid particles and liquid droplets may exhibit swarm behavior as a cloud. Therefore, it is not surprising that computer simulations of such systems are performed and studied to gain some insight into how such large groups of individual actors manage to orchestrate precise aggregate motion [1,2].

Craig Reynolds wrote a computer program to simulate and graphically display “flocking objects.” Inspired by bird flocks, he coined the term boid to generically denote such objects. In a footnote he explains that boid means bird-like object, derived from “bird-oid” [1]. And Brian Hayes gives us an update on what is going on in boidland [2].

Should the flocks, herds, schools and swarms exceed our computational resources, then cloud computing is waiting at the horizon for distributed boid simulation.

References
[1] Craig W. Reynolds: Flocks, Herds, and Schools: A Distributed Behavioral Model. Computer Graphics 1987, 21 (4), pp. 25-34.

[2] Brian Hayes: Flights of Fancy. American Scientist January-February 2011, 99 (1), pp. 10-14.

Sunday, February 27, 2011

The term “dark matter” in biophysics and molecular biology

The term dark matter is used as a nickname for something we do not directly see, but infer from some observable effects. Best known is dark matter in astrophysics and cosmology, where it is hypothesized in order to keep measurement results in harmony with current theory. Too far away, anyway? What about dark matter at the center of each cell in your own body and in the bodies of other evolving creatures? In a recent article on the dynamics and “densification” of chromatin, the complex of DNA and proteins (including histones) that forms chromosomes, Gregory Babbitt refers to the dark matter of the human genome: the vast areas of noncoding regulatory DNA.

DNA contains blueprints (genetic instructions encoded in nucleotide sequences) for constructing other parts of the cell. Recent research suggests that the sequence of DNA also contains the information on how to fold and package itself into chromatin. A technique called chromosome conformation capture helps to shine some light into those “dark corners” of the DNA. Results provide evidence that the chromosome does not assemble itself as a disordered spaghetti globule, but as a globule mathematically organized in subglobules of subsubglobules. Evolution (of chromatin, at least) is becoming less a matter of darkness and more a subject of topology.

Reference
Gregory A. Babbitt: Chromatin Evolving. American Scientist January-February 2011, 99 (1), pp. 48-55.

Tuesday, February 22, 2011

Dropping the hyphen from the adjective “radio-active”

“To hyphenate or not to hyphenate” [1] is a frequently occurring question in physics and chemistry. For example, the term band gap can be found in the scientific literature printed in one word, bandgap, or in hyphenated form as band-gap.

What about the term radio-active? It first occurred in Pierre and Marie Curie's publication on July 18, 1898, in the Comptes Rendus de l'Académie des Sciences with the title “On a new radio-active substance contained in pitchblende,” announcing the isolation (discovery) of the element polonium. The Curies dropped the hyphen the following year [2]. Although the hyphenated form still appeared for some time, for example in treatises by Frederick Soddy and Ernest Rutherford, the non-hyphenated words radioactive and radioactivity eventually became the standard forms.

If you happen to use a hyphen within these terms today, you should not be surprised when people assume that you are talking about something else, such as a radio station or electronic music: the German band Kraftwerk released a concept album with the title “Radio-Activity” (Radio-Aktivität in German) in 1975 [3].

References
[1] Karen Rieser: To Hyphenate or Not to Hyphenate. May 11, 2007 [http://www.suite101.com/content/to-hyphenate-or-not-to-hyphenate-a21048].
[2] Jean-Pierre Adloff: A Short History of Polonium and Radium. Chemistry International January-February 2011, 33 (1), pp. 20-23 [http://www.iupac.org/publications/ci/2011/3301/5_adloff.html].
[3] Jason Ankeny: Radio-Activity, Review [http://www.allmusic.com/album/r11205].

Monday, February 21, 2011

An acronym in cheminformatics: OPAM for operational annotation marker

OPAM stands for operational annotation marker. An OPAM is used in CurlySMILES notations to encode coordination compounds and molecules with structural repeat units. Further, compound classes and diverse sets of molecules are encodable by applying the OPAM format. The adjective operational means formally constructable or generic. For instance, a macromolecule (polymer) is encoded by a single repeating monomer unit, annotated in such a way that a human reader or a machine is informed of how to connect monomer units together to build the macromolecular chain. The following example shows a CurlySMILES notation for poly(dimethylsiloxane) (PDMS), [-Si(CH3)2O-]n:

[Si]{-}(C)(C)O{+n}

The OPAM +n instructs the reader (or the parsing software) to repeat the monomer unit n times by connecting the O-atom of one unit to the open bond at the Si atom of the next unit. Another illustration has been given with poly(3-hexylthiophene). Ring molecules that are based on a structural repeat unit can be encoded with OPAM +r. Metal-ligand systems such as complexes with ambidentate ligands can be encoded with OPAM +L. Sets of molecules can be encoded with OPAM +R, +X and +Y, for example, classes of alkyl-substituted compounds with +R.
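The head-to-tail repetition described for +n can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name expand_repeat_unit is hypothetical and not part of any CurlySMILES software, the open bond {-} is assumed to sit at the first atom and the {+n} marker at the last atom (as in the PDMS example), and the two chain ends are simply left unspecified.

```python
def expand_repeat_unit(notation: str, n: int) -> str:
    """Expand a '+n' repeat-unit notation into a plain SMILES chain of n units.

    Assumptions (not taken from the CurlySMILES specification): the first
    atom carries the open-bond annotation {-}, the last atom carries {+n},
    and chain ends are not capped.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not notation.endswith("{+n}"):
        raise ValueError("expected a notation ending in the OPAM {+n}")
    # Drop the trailing {+n} and the open-bond annotation {-}.
    unit = notation[: -len("{+n}")].replace("{-}", "", 1)
    # The last atom of one unit bonds to the first atom of the next unit,
    # which in a linear notation is plain string concatenation.
    return unit * n


print(expand_repeat_unit("[Si]{-}(C)(C)O{+n}", 3))
# [Si](C)(C)O[Si](C)(C)O[Si](C)(C)O
```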

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: CurlySMILES: molecular detail annotation [www.axeleratio.com/csm/proj/operatann.htm].

Saturday, February 19, 2011

An acronym in cheminformatics: MDAM for molecular detail annotation marker

MDAM stands for molecular detail annotation marker. An MDAM is used in CurlySMILES notations to encode atomic, substructural and molecular details. An MDAM is a pair of two characters. The first character is an exclamation mark indicating that attention is drawn to some special molecular feature. The MDAM !r, for example, indicates an annotation with detailed information for a particular ring in the molecular graph, as illustrated with the CurlySMILES notation for the fluorotropylium ion ([C7H6F]+):

Fc1cccccc1{!re=+}

The MDAM annotation is anchored at the right-most atomic ring node of the SMILES notation. The annotation dictionary entry e=+ specifies the formal ring charge for this heptagonally planar, aromatic cation with a delocalized charge. Notice that the rules of the SMILES language require a formal charge to be localized at an atomic node. In this case, four different notations (distinguished by formal charge placement at C-atom 1, 2, 3, or 4) are possible, and they remain distinguishable even after applying the CANGEN algorithm, which is typically used to derive a unique notation when a single chemical species is considered. The MDAM !r annotation of the CurlySMILES language provides a way to represent the one species in mind (here, the one with the delocalized ring charge), unless the goal is to intentionally represent distinguishable resonance structures individually.

In addition to the MDAM !r, the current version of CurlySMILES includes the markers !a, !p, !m, !H, and !I to encode details of an atom, pair of atoms, multiplet of atoms, hydrogen-bonding atom and an otherwise (for example, van der Waals) interacting atom, respectively.

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: CurlySMILES: molecular detail annotation [www.axeleratio.com/csm/proj/moldetailann.htm].

Friday, February 18, 2011

An acronym in cheminformatics: GEAM for group environment annotation marker

GEAM stands for group environment annotation marker. In chemistry and molecular graph theory the word group means subgraph or substructure. A group is a formal part of a molecule: a formal building block. In those cases where it coincides with an observable chemical (sub)species, it is typically called a fragment.

The word group is best known from its occurrence in the term group contribution method or group contribution model (GCM), referring to the concept of group additivity that rationalizes certain molecular properties as an additive function of group contributions (incremental values associated with particular groups).

GCMs typically define their own notations to specify groups. The chemical language CurlySMILES provides an independent approach to encode groups: a substructure is encoded like the structure of a whole molecule in SMILES. Then, each substructure-defining open bond is encoded as an annotation to the atomic node at which the open bond occurs. For example, the cyclobutanecarbonyl group of corresponding acyl halides can be encoded in CurlySMILES as follows:

C1CCC1C{-X}=O

The GEAM -X indicates the possible group environment: a halogen atom (F, Cl, Br or I). To indicate an environment of alkyl instead of halide groups, one would use the notation C1CCC1C{-R}=O for the cyclobutylcarbonyl group. Restriction of the environment to alkyl groups with 4 to 7 C atoms is done with the notation C1CCC1C{-Rn=4-7}=O. Further examples of group environment notations, including terminal groups as well as multivalent groups, are available.
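To make the meaning of the -X environment concrete, the following Python sketch enumerates the four acyl halides covered by the group notation above. It is a minimal illustration, not a CurlySMILES implementation: the helper enumerate_halides is a hypothetical name, and the code assumes that exactly one {-X} annotation is present and that it can be replaced by a SMILES branch at the annotated atom.

```python
# Halogen atoms named in the text as the -X environment.
HALOGENS = ("F", "Cl", "Br", "I")


def enumerate_halides(group_notation: str) -> list[str]:
    """Replace the single {-X} annotation by each halogen as a SMILES branch."""
    marker = "{-X}"
    if group_notation.count(marker) != 1:
        raise ValueError("expected exactly one {-X} annotation")
    # The open bond becomes a one-atom branch at the annotated node.
    return [group_notation.replace(marker, f"({halogen})") for halogen in HALOGENS]


for smiles in enumerate_halides("C1CCC1C{-X}=O"):
    print(smiles)
# C1CCC1C(F)=O
# C1CCC1C(Cl)=O
# C1CCC1C(Br)=O
# C1CCC1C(I)=O
```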

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: CurlySMILES: group environment annotation [www.axeleratio.com/csm/proj/grpenvann.htm].

Wednesday, February 16, 2011

An acronym in cheminformatics: MIAM for miscellaneous interest annotation marker

MIAM stands for miscellaneous interest annotation marker. A MIAM is a pair of two characters that mark the beginning of a component annotation in a CurlySMILES notation [1,2].
The MIAM format supports encoding of chemical compounds and materials by including diverse attributes and structural details or modifications. The following notation demonstrates encoding of a neodymium-doped material, which, for example, is used as a quantum memory crystal [3]:

{*Y2SiO5}{cr}{IMa=Nd}

The component code {*Y2SiO5} includes the SFN (stoichiometric formula notation) for Y2SiO5. The first annotation, {cr}, specifies the material as a crystal via the SSAM cr. The second annotation starts with the MIAM IM for impurity, followed by the annotation dictionary entry representing the atomic symbol, Nd, of the impurity (dopant).

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: CurlySMILES: miscellaneous interest annotations [www.axeleratio.com/csm/proj/miscelinterann.htm].
[3] C. Clausen, I. Usmani, F. Bussières, N. Sangouard, M. Afzelius, H. de Riedmatten and N. Gisin: Quantum storage of photonic entanglement in a crystal. Nature 2011, 469 (7331), pp. 508-511. DOI: 10.1038/nature09662.

An acronym in cheminformatics: SSAM for state and shape annotation marker

SSAM stands for state and shape annotation marker. An SSAM is a pair of two lowercase letters that mark the beginning of a component annotation in a CurlySMILES notation [1,2].
SSAM-annotations specify chemical compounds and complex materials by features such as physical state and nanostructure type. The following notation with two SSAM annotations, encoding a two-dimensional grain of a graphene atomic patchwork [3], illustrates the annotation format:

[C]{alall=graphene}{grdim=2}

The SMILES notation [C] represents carbon in SQC. The first annotation starts with SSAM al defining an atomic layer followed by the annotation dictionary entry which specifies the carbon allotrope graphene. The second annotation describes the 2D-grain, using gr as SSAM.
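The structure of such an annotation, a two-character marker followed by a dictionary entry, can be sketched in Python as follows. The function split_annotation is a hypothetical helper, and the sketch assumes at most one key=value entry per annotation, as in the examples of this post; how several entries would be separated is not covered here.

```python
def split_annotation(body: str) -> tuple[str, dict[str, str]]:
    """Split the text between curly braces into (marker, dictionary entries).

    Assumption: the marker is always the first two characters and is
    followed by at most one key=value entry.
    """
    if len(body) < 2:
        raise ValueError("an annotation needs a two-character marker")
    marker, rest = body[:2], body[2:]
    entries: dict[str, str] = {}
    if rest:
        key, separator, value = rest.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value after the marker, got {rest!r}")
        entries[key] = value
    return marker, entries


print(split_annotation("alall=graphene"))  # ('al', {'all': 'graphene'})
print(split_annotation("grdim=2"))         # ('gr', {'dim': '2'})
```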

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: CurlySMILES: state and shape annotations [www.axeleratio.com/csm/proj/stateshapeann.htm].
[3] P. Y. Huang, C. S. Ruiz-Vargas, A. M. van der Zande, W. S. Whitney, M. P. Levendorf, J. W. Kevek, S. Garg, J. S. Alden, C. J. Hustedt, Ye Zhu, J. Park, P. L. McEuen and D. A. Muller: Grains and grain boundaries in single-layer graphene atomic patchwork quilts. Nature 2011, 469 (7330), pp. 389-392. DOI: 10.1038/nature09718.

Tuesday, February 15, 2011

An acronym in cheminformatics: CAA for component-anchored annotation

CAA stands for component-anchored annotation. A component is a part of a CurlySMILES notation, separated from other parts via dot. A notation component represents a structural component of a chemical compound or a supramolecular architecture.

The following CAA types are defined in CurlySMILES:
Along with the atom-anchored annotations (AAAs), CAAs provide a rich format to represent chemical structures in a multitude of extra-molecular environments and within various, customer-oriented contexts.

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: The CurlySMILES Project [www.axeleratio.com/csm/proj/main.htm].

An acronym in cheminformatics: AAA for atom-anchored annotation

Triple A: in cheminformatics this letter repetition stands for atom-anchored annotation (AAA). An AAA is an attribute (color) of a node in the hydrogen-suppressed molecular graph. In a CurlySMILES notation, such nodes are encoded as ANCs (SQCs, in particular) and can be followed by one or more AAAs to specify atomic or molecular details or to modify the molecular graph representation. The following AAA types are defined in CurlySMILES:
The open format of the CurlySMILES language allows the addition of further, novel categories. Suggestions are always welcome!

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: The CurlySMILES Project [www.axeleratio.com/csm/proj/main.htm].

An acronym in cheminformatics: SQC for square bracket code

SQC stands for square bracket code; precisely, for square bracket atomic code. An SQC is a special type of an atomic node code (ANC).

In a SMILES and CurlySMILES notation, an SQC encodes a node of the hydrogen-suppressed molecular graph. An SQC is mandatory for any non-hydrogen atom that does not belong to the organic subset or that has a hydrogen count which differs from the implicit hydrogen attachment. The implicit attachment assumes that hydrogen atoms make up the remainder of an atom's lowest normal valence consistent with the explicit bond specification: 3, 4, 3, 2, 1 for B, C, N, O, and the halogen atoms, respectively, 3 or 5 for phosphorus and 2, 4, or 6 for aliphatic sulfur atoms. Isotopically labelled atoms and nodes with formal charge specification always have to be SQC-encoded. In CurlySMILES, atomic-wildcard nodes and atoms with an incident quadruple or unspecified bond also have to be SQC-encoded.

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: The CurlySMILES Project [www.axeleratio.com/csm/proj/main.htm].

An acronym in cheminformatics: ANC for atomic node code

An atomic node code (ANC) is part of a SMILES or CurlySMILES notation. The latter are linear notations that encode a molecular structure based on the associated hydrogen-suppressed molecular graph. Such a linear notation is a sequence of ANCs and may additionally contain special symbols to represent bonds and indicate branching as well as ring formation. Notations are further enhanced via CurlySMILES by inserting annotations after particular ANCs to add detailed molecular information or to modify it.

A typical ANC represents a non-hydrogen atom along with the attached hydrogen atoms and can include an isotopic label and/or a formal charge. For example, the notation [13CH4] consists of a single ANC encoding methane-13C.

The CurlySMILES notation SCC{R}(N)C(=O)O for the amino acid (R)-2-amino-3-sulfanylpropanoic acid (L-cysteine) consists of seven ANCs. The third ANC, representing the asymmetric C-atom of the molecule, is followed by a stereodescriptor annotation to specify the particular enantiomer.
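Counting the ANCs in these notations can be reproduced with a short Python sketch. It is a simplified tokenizer written for this illustration (the names strip_annotations and atomic_node_codes are hypothetical): it recognizes square-bracket codes and the organic-subset symbols only, and it skips everything inside curly braces.

```python
import re

# Square-bracket codes first, then two-letter halogens, then one-letter
# aliphatic and aromatic organic-subset symbols.
ANC_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def strip_annotations(notation: str) -> str:
    """Remove all curly-brace annotations, including nested ones."""
    plain, depth = [], 0
    for char in notation:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing brace")
        elif depth == 0:
            plain.append(char)
    if depth != 0:
        raise ValueError("unbalanced opening brace")
    return "".join(plain)


def atomic_node_codes(notation: str) -> list[str]:
    """Return the ANCs of a notation in the order in which they occur."""
    return ANC_PATTERN.findall(strip_annotations(notation))


codes = atomic_node_codes("SCC{R}(N)C(=O)O")
print(codes, len(codes))             # ['S', 'C', 'C', 'N', 'C', 'O', 'O'] 7
print(atomic_node_codes("[13CH4]"))  # ['[13CH4]']
```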

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: The CurlySMILES Project [www.axeleratio.com/csm/proj/main.htm].

Monday, February 7, 2011

An acronym in cheminformatics: SFN for stoichiometric formula notation

SFN stands for stoichiometric formula notation [1-3]. An SFN describes a chemical compound in terms of its composition based on atoms and characteristic groups of atoms (structural subunits). For example, the mineral genthelvite, Zn4(Be3Si3O12)S, can be encoded as Zn4(Be3Si3O12)S. This SFN matches the displayed formula, only excluding its markup. An SFN is a document-neutral linear notation.
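Because an SFN is a plain linear string, its composition can be read out mechanically. The Python sketch below is a minimal illustration under stated assumptions: the function sfn_composition is a hypothetical helper, and it handles only element symbols, parentheses and integer counts, not charges, hydrates or non-integer stoichiometry.

```python
import re
from collections import Counter


def sfn_composition(sfn: str) -> dict[str, int]:
    """Count the atoms of each element in a simple formula string."""
    tokens = re.findall(r"[A-Z][a-z]?|\d+|[()]", sfn)
    if "".join(tokens) != sfn:
        raise ValueError(f"unsupported characters in {sfn!r}")
    stack = [Counter()]  # one counter per open parenthesis level
    i = 0
    while i < len(tokens):
        token = tokens[i]
        i += 1
        if token == "(":
            stack.append(Counter())
            continue
        if token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced closing parenthesis")
            unit = stack.pop()           # finished group of atoms
        elif token.isdigit():
            raise ValueError("count without a preceding element or group")
        else:
            unit = Counter({token: 1})   # single element symbol
        factor = 1
        if i < len(tokens) and tokens[i].isdigit():
            factor = int(tokens[i])      # optional count after element or group
            i += 1
        for element, count in unit.items():
            stack[-1][element] += count * factor
    if len(stack) != 1:
        raise ValueError("unbalanced opening parenthesis")
    return dict(stack[0])


print(sfn_composition("Zn4(Be3Si3O12)S"))
# {'Zn': 4, 'Be': 3, 'Si': 3, 'O': 12, 'S': 1}
```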

The versatility of SFN strings is appreciated when used with the CurlySMILES language, which provides a format to specify materials with respect to phase, nanostructure, doping and other characteristics; for example: {*TiO2}{tfphn=rutile} encodes a titanium dioxide thin film with TiO2 in the rutile phase, {*Fe3O4}{np}{dp} encodes dispersed magnetite nanoparticles, and {*ZnO}{IMa=F} encodes fluorine-doped zinc oxide.

SFNs also occur in composite and nanocomposite notations such as {/{*WS2}/{*WO3}}{np}, encoding tungsten-based WS2/WO3 nanoparticles.

SFNs are further used in combination with SMILES notations to encode functionalized material surfaces, particles and clusters, such as a thiophenolate capped cadmium sulfide nanoparticle: c1ccccc1S{-|c={*CdS}{np}}.

Keywords: chemical materials encoding, empirical formula, molecular formula, structural formula, brutto formula, summary chemical formula, nanostructures, functionalized material surfaces

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: The CurlySMILES Project: Stoichiometric Formula Notation (SFN) [www.axeleratio.com/csm/proj/sfn.htm].

[3] Examples E5, E16, E17, E18 and E20 in the additional file of [1], entitled CurlySMILES encoding examples [www.jcheminf.com/content/supplementary/1758-2946-3-1-s1.pdf].

Thursday, February 3, 2011

An abbreviation in cheminformatics: CurlySMILES for curly-braces enhanced smart material input line entry specification

CurlySMILES stands for curly-braces enhanced smart material input line entry specification. This is a chemical language for interlinked, coordinated, assembled and adsorbed molecules in supramolecular structures and diverse nano-scale environments [1-3].

CurlySMILES modifies and extends the well-known chemical line notation system SMILES (notice that the occurrence of the acronym SMILES within the word CurlySMILES is not accidental) and accommodates its own format for non-molecular materials that are commonly denoted by composition of atoms or structural parts rather than complete atom connectivity. Connectivities to extra-molecular entities such as a nanoparticle surface are achieved via annotations, enclosed in curly braces and inserted into a SMILES notation.
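How curly-brace annotations sit inside a SMILES string can be shown with a short Python sketch that separates the two. The function split_annotations is a hypothetical helper written for this illustration; it only tracks brace nesting and records where each top-level annotation was attached, without interpreting the annotation content.

```python
def split_annotations(notation: str) -> tuple[str, list[tuple[int, str]]]:
    """Separate the text outside curly braces from the top-level annotations.

    Returns the plain string and a list of (position, body) pairs, where
    position is the length of the plain string at the point of attachment.
    Nested braces stay inside the body of their enclosing annotation.
    """
    plain: list[str] = []
    annotations: list[tuple[int, str]] = []
    depth, start = 0, 0
    for index, char in enumerate(notation):
        if char == "{":
            if depth == 0:
                start = index + 1
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing brace")
            if depth == 0:
                annotations.append((len(plain), notation[start:index]))
        elif depth == 0:
            plain.append(char)
    if depth != 0:
        raise ValueError("unbalanced opening brace")
    return "".join(plain), annotations


print(split_annotations("c1ccccc1S{-|c={*CdS}{np}}"))
# ('c1ccccc1S', [(9, '-|c={*CdS}{np}')])
```

Keeping nested braces inside the body leaves the decision of how to interpret an annotation to a later step.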

Keywords: molecular graphs, supramolecular connectivity, supramolecular data exchange, complex structures, nanostructures, chemical materials encoding

References
[1] Axel Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. Journal of Cheminformatics 2011, 3:1.
DOI: 10.1186/1758-2946-3-1.
[2] Axel Drefahl: The CurlySMILES Project [www.axeleratio.com/csm/proj/main.htm].
[3] Axel Drefahl: CurlySMILES, a chemical language [www.axeleratio.com/csm/proj/language.htm].

A portmanteau in cheminformatics: CANGEN combining CANonicalization and molecular graph GENeration

CANGEN is a two-stage algorithm in computational molecular graph theory, which converts an arbitrarily entered SMILES notation of a chemical structure into a unique one. The first stage is a canonicalization procedure that labels each atomic node of the molecular graph such that a canonical order for the nodes is derived. The second stage, then, generates the unique linear notation of the graph by starting with the lowest labeled atomic node.
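The idea behind the first stage, ranking atoms by graph invariants that do not depend on the input order, can be sketched in Python. This is a deliberately simplified rank refinement and not the published CANGEN procedure: the function refine_ranks is a hypothetical name, the starting invariant is just (degree, element symbol), ties between symmetry-equivalent atoms are not broken, and the second stage (notation generation) is omitted.

```python
def refine_ranks(atoms: list[str], bonds: list[tuple[int, int]]) -> list[int]:
    """Rank atoms by iteratively refining an order-independent invariant."""
    neighbors: dict[int, list[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def to_ranks(keys: list[tuple]) -> list[int]:
        # Equal keys share a rank; ranks start at 1.
        order = {key: rank for rank, key in enumerate(sorted(set(keys)), start=1)}
        return [order[key] for key in keys]

    # Initial invariant: number of neighbors and element symbol.
    ranks = to_ranks([(len(neighbors[i]), atoms[i]) for i in range(len(atoms))])
    while True:
        # Refine each rank with the sorted ranks of the neighboring atoms.
        keys = [
            (ranks[i], tuple(sorted(ranks[j] for j in neighbors[i])))
            for i in range(len(atoms))
        ]
        new_ranks = to_ranks(keys)
        if len(set(new_ranks)) == len(set(ranks)):
            return new_ranks  # no further distinction was gained
        ranks = new_ranks


# Ethanol entered as CCO and as OCC: each atom gets the same rank either way.
print(refine_ranks(["C", "C", "O"], [(0, 1), (1, 2)]))  # [1, 3, 2]
print(refine_ranks(["O", "C", "C"], [(0, 1), (1, 2)]))  # [2, 3, 1]
```

In propane the two terminal carbon atoms receive the same rank, which is the kind of tie a full canonicalization has to break before a unique notation can be generated.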

A CANGEN-derived notation of a molecular structure is an efficient search key to locate information for the encoded structure in a database or via the Internet, while at the same time carrying the structural information along in the key itself.

Keywords: molecular graphs, molecular connectivity, molecular data exchange, disambiguation, semantic web

Reference
D. Weininger, A. Weininger and J. L. Weininger: SMILES 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
DOI: 10.1021/ci00062a008.

An acronym in cheminformatics: SMILES for simplified molecular input line entry system

SMILES stands for simplified molecular input line entry system [1]. SMILES is a chemical notation system with a small chemical grammar for the encoding of molecular structures based on the principles of molecular graph theory.

SMILES is a user-friendly chemical language that allows input of molecular structures in molecular editors, databases, search engines and property estimation tools. For example, ChemSpider [2] accepts SMILES entries (also: nomenclature-based names, registry number or InChI). Looking for aspirin (2-(acetyloxy)benzoic acid)? Type the SMILES notation CC(=O)Oc1ccccc1C(=O)O into the search field. Notice that the six carbon atoms of the aromatic benzene ring have been entered in lower case to identify them as aromatic-ring members. Also, hydrogen atoms have not been explicitly specified, since their occurrence is deduced based on valence rules.

What is “hiding” behind the notation FC12C3(F)C4(F)C1(F)C5(F)C4(F)C3(F)C25F?
Correct, it is perfluorocubane (1,2,3,4,5,6,7,8-octafluorocubane). ChemSpider finds it either way; it is your choice whether to type the SMILES notation or a name.

Computers “love” SMILES since they can automatically derive a connection table from the linear notation code, which is essential to draw the associated structure or to calculate molecular descriptors. A name such as 2-(acetyloxy)benzoic acid would require automatic structure interpretation on a much higher level, which is not so easy for a computer. And giving aspirin will cause a computer headache, speaking in human terms.
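How a connection table falls out of the linear notation can be illustrated with a small Python sketch. It is a minimal parser written for this post, not a complete SMILES implementation: the function connection_table is a hypothetical name, and it supports only organic-subset and bracket atoms, the bond symbols - = # :, branches and single-digit ring closures. Hydrogen atoms stay implicit, and an unspecified bond is recorded as an empty string (meaning single or aromatic).

```python
import re

TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFIbcnops]|[-=#:]|[()]|\d")
BOND_SYMBOLS = {"-", "=", "#", ":"}


def connection_table(smiles: str) -> tuple[list[str], list[tuple[int, int, str]]]:
    """Return (atoms, bonds); each bond is (atom index, atom index, symbol)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES feature in {smiles!r}")
    atoms: list[str] = []
    bonds: list[tuple[int, int, str]] = []
    previous = None       # atom to which the next atom will be bonded
    branch_points = []    # atoms to return to when a branch closes
    open_rings = {}       # ring digit -> (atom index, bond symbol)
    pending_bond = ""     # explicit bond symbol waiting for its second atom
    for token in tokens:
        if token == "(":
            branch_points.append(previous)
        elif token == ")":
            previous = branch_points.pop()
        elif token in BOND_SYMBOLS:
            pending_bond = token
        elif token.isdigit():
            if token in open_rings:      # second occurrence closes the ring
                start, symbol = open_rings.pop(token)
                bonds.append((start, previous, pending_bond or symbol))
            else:                        # first occurrence opens the ring
                open_rings[token] = (previous, pending_bond)
            pending_bond = ""
        else:                            # an atom token
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, pending_bond))
            pending_bond = ""
            previous = index
    if open_rings or branch_points:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds


atoms, bonds = connection_table("CC(=O)Oc1ccccc1C(=O)O")
print(len(atoms), len(bonds))  # 13 13
```

For the perfluorocubane notation from this post, the same function returns 16 atoms and 20 bonds: twelve C-C bonds of the cube plus eight C-F bonds.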

Keywords: molecular graphs, molecular connectivity, molecular data exchange

References
[1] David Weininger: SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
DOI: 10.1021/ci00057a005.
[2] ChemSpider starting guide: How do you find compounds? [www.chemspider.com/GettingStarted.aspx].