The CurlySMILES Project: Stoichiometric Formula Notation (SFN)

Maintained by Axel Drefahl
CurlySMILES is a chemical language for communicating structures based on molecular graphs as well as structures that cannot be characterized by discrete molecules. In the latter case, the chemical literature often uses an empirical formula to describe the composition via atomic symbols.
The stoichiometric formula notation (SFN) applies the empirical formula syntax, but does not require alphabetical ordering of the atomic symbols. Symbols may occur in any order, and the same symbol may appear more than once at different positions. Further, symbols may be grouped by enclosing them in parentheses. An integer (different from 1) follows the closing parenthesis to give the stoichiometry of the group within the denoted compound. The following examples illustrate SFN encoding:

Cu5Zn8 Cu5Zn8, pentacopper octazincide
MnO(OH) MnO(OH), manganite (mineral)
Cu2Pb5(SO4)3(CO3)(OH)6 Cu2Pb5(SO4)3(CO3)(OH)6, caledonite (mineral)
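
The grouping rule above can be illustrated with a small parser. This is a minimal sketch, not part of any CurlySMILES tooling: the function name sfn_composition is hypothetical, element symbols are assumed to be one uppercase letter optionally followed by one lowercase letter, and charges and isotope labels are not handled.

```python
import re
from collections import Counter

# One token per match: element symbol, "(", ")", or an integer.
_TOKEN = re.compile(r"([A-Z][a-z]?)|(\()|(\))|(\d+)")

def sfn_composition(sfn: str) -> Counter:
    """Return element counts for a neutral, unlabelled SFN string (hypothetical helper)."""
    stack = [Counter()]   # one Counter per open parenthesis level
    last = None           # what a following integer multiplies
    pos = 0
    while pos < len(sfn):
        m = _TOKEN.match(sfn, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}: {sfn[pos]!r}")
        symbol, opened, closed, number = m.groups()
        if symbol:
            stack[-1][symbol] += 1
            last = Counter({symbol: 1})
        elif opened:
            stack.append(Counter())
            last = None
        elif closed:
            if len(stack) == 1:
                raise ValueError("unbalanced closing parenthesis")
            group = stack.pop()
            stack[-1].update(group)   # Counter.update adds counts
            last = group
        else:
            n = int(number)
            if last is None or n < 1:
                raise ValueError(f"misplaced or invalid integer at position {pos}")
            # One copy was already counted, so add the remaining n - 1 copies.
            for element, count in last.items():
                stack[-1][element] += count * (n - 1)
            last = None
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unbalanced opening parenthesis")
    return stack[0]
```

Because counts are accumulated per symbol, repeated symbols such as the two O atoms in MnO(OH) are summed regardless of where they occur in the string.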

An SFN of an ionic species includes a charge notation (n+) or (n-), where n ≥ 1. The charge notation is placed at the end of the SFN string:

O2Cl2(1+) O2Cl2+, (dioxygen dichloride)(1+) cation
HMo6O19(1-) HMo6O19-, hydrogen(nonadecaoxidohexamolybdate)(1-) anion
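
As a sketch of how the trailing charge notation might be separated from the rest of an SFN string, the following hypothetical helper split_charge (an illustrative name, not an existing API) returns the formula part and the signed charge, assuming the charge always has the form (n+) or (n-) at the very end:

```python
import re

# Charge notation "(n+)" or "(n-)" anchored at the end of the string.
_CHARGE = re.compile(r"\((\d+)([+-])\)$")

def split_charge(sfn: str) -> tuple[str, int]:
    """Split an SFN string into (formula, charge); charge is 0 if none is given."""
    m = _CHARGE.search(sfn)
    if m is None:
        return sfn, 0
    n = int(m.group(1))
    if n < 1:
        raise ValueError("charge magnitude must be at least 1")
    charge = n if m.group(2) == "+" else -n
    return sfn[:m.start()], charge
```

A parenthesized group such as (OH) at the end of a neutral formula is not mistaken for a charge, because the pattern requires digits followed by a sign.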

An isotopic label precedes the atomic symbol and is introduced by a ^ character. Example notations for isotopically labelled species are:

^16O2(2-) [16O]22-, labelled dioxidanediide anion
^32P2^16O5 [32P]2[16O]5, labelled phosphorus pentoxide
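
The isotope label syntax can be read with a short tokenizer. The sketch below is an assumption-laden illustration: labelled_atoms is a hypothetical name, the input is assumed to contain no parenthesized groups, and a trailing charge notation, if present, is simply stripped before tokenizing.

```python
import re

# Optional "^<mass number>", then an element symbol, then an optional count.
_ATOM = re.compile(r"(?:\^(\d+))?([A-Z][a-z]?)(\d*)")
_CHARGE = re.compile(r"\(\d+[+-]\)$")

def labelled_atoms(sfn: str) -> list:
    """Return (mass_number or None, symbol, count) triples for a group-free SFN."""
    core = _CHARGE.sub("", sfn)   # drop a trailing (n+) or (n-), if any
    atoms = []
    pos = 0
    while pos < len(core):
        m = _ATOM.match(core, pos)
        if m is None:
            raise ValueError(f"cannot parse {core[pos:]!r}")
        mass, symbol, count = m.groups()
        atoms.append((int(mass) if mass else None, symbol, int(count) if count else 1))
        pos = m.end()
    return atoms
```

Unlabelled atoms are reported with None as the mass number, so labelled and unlabelled symbols can be mixed in one string.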

SFN integration with a CurlySMILES notation. To become a component of a CurlySMILES notation, an SFN is placed between {* and }. An appended annotation may further specify the shape and phase of the target material. For example, a thin film of titanium dioxide in the rutile modification is encoded as

{*TiO2}{tfphn=rutile} rutile thin film

CurlySMILES allows SMILES components and SFN components to be combined in one notation:

[K+].[K+].{*Sn5(2-)} dipotassium pentastannide(2-)
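
Since an SFN component is delimited by {* and }, it can be located in a CurlySMILES string with a simple pattern. The following is a minimal sketch; sfn_components is a hypothetical helper, and it assumes that an SFN itself never contains curly braces.

```python
import re

# An SFN component: "{*", the SFN text without braces, then "}".
_SFN_COMPONENT = re.compile(r"\{\*([^{}]+)\}")

def sfn_components(notation: str) -> list:
    """Return the SFN strings of all {*...} components in a CurlySMILES notation."""
    return _SFN_COMPONENT.findall(notation)
```

Annotations such as {tfphn=rutile} are skipped because they do not begin with {*.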

An SFN can also appear inside an annotation as the value of the key sfn. The following notation encodes a phenyl group that is covalently linked to the surface of cadmium selenide (CdSe) material:

c1ccccc1{-|sfn=CdSe} phenyl group grafted to CdSe surface
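
A value stored under the key sfn could be pulled out of an annotation as sketched below. The helper name annotation_sfn_values is hypothetical, and the sketch assumes that the value ends at the closing brace or at a semicolon; the semicolon as a separator is an assumption made here for illustration, not something stated above.

```python
import re

# Inside one {...} annotation, capture the text after "sfn=" up to ";" or "}".
_SFN_KEY = re.compile(r"\{[^{}]*?\bsfn=([^;{}]+)[^{}]*\}")

def annotation_sfn_values(notation: str) -> list:
    """Return all values of the key sfn found in annotations of a notation."""
    return _SFN_KEY.findall(notation)
```

The extracted value is itself an SFN string, so it follows the same rules for symbols, groups, and charges as a standalone SFN.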

