An atomic node code (ANC) encodes a single node in the
hydrogen-suppressed molecular graph. A node consists of
one non-hydrogen atom and its adjacent hydrogen atoms.
The SMILES and CurlySMILES languages use the same format:
a node is represented either by the bare atomic symbol or by the
square bracket atomic code (SQC).
SQC encoding is the default. Only symbols of elements
that belong to the so-called organic subset may be written
without brackets, and only if the number of attached hydrogens
conforms to the lowest normal valence consistent with
the explicit bonds. The organic subset comprises
B, C, N, O, P, S, F, Cl, Br, and I.
In the absence of brackets, the attached hydrogens are
implied. For example, the notations C and
P represent methane and phosphine,
respectively. Their corresponding SQC-based notations are
[CH4] and [PH3].
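The implied-hydrogen rule can be illustrated with a minimal Python sketch. The valence table below is an assumption taken from the commonly cited SMILES convention, not from this page, and the names `NORMAL_VALENCES`, `implied_hydrogens`, and `to_sqc` are hypothetical helpers introduced only for illustration.

```python
# Normal valences of the organic-subset elements.
# Assumption: values follow the commonly cited SMILES convention.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implied_hydrogens(symbol, bond_order_sum=0):
    """Return the hydrogen count implied for an unbracketed atom.

    bond_order_sum is the sum of the orders of the explicit bonds.
    """
    if symbol not in NORMAL_VALENCES:
        raise ValueError(f"{symbol} is not in the organic subset; use brackets")
    # Pick the lowest normal valence that accommodates the explicit bonds.
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # explicit bonds already exceed the highest normal valence


def to_sqc(symbol, bond_order_sum=0):
    """Write the equivalent square bracket code for a bare symbol."""
    count = implied_hydrogens(symbol, bond_order_sum)
    if count == 0:
        hydrogens = ""
    elif count == 1:
        hydrogens = "H"
    else:
        hydrogens = f"H{count}"
    return f"[{symbol}{hydrogens}]"


print(to_sqc("C"))  # isolated carbon: methane
print(to_sqc("P"))  # isolated phosphorus: phosphine
```

Under these assumptions, an isolated C or P expands to the bracketed forms given above, while a symbol outside the organic subset is rejected because it has no unbracketed form.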
Silane and arsine, in contrast, must always be
encoded as [SiH4] and
[AsH3], since Si and As do not belong to
the organic subset. Trichlorosilane can be encoded as
Cl[SiH](Cl)Cl, using
the SQC only for the silicon atom, which does not belong to the organic subset.
The notation [Cl][SiH]([Cl])[Cl],
however, is equally valid.
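As a rough illustration of how the two trichlorosilane notations describe the same nodes, the following Python sketch splits a notation into its atomic node codes. The pattern and the function names `atomic_node_codes` and `heavy_atoms` are assumptions made for this example; the sketch covers only bracketed codes and uppercase organic-subset symbols, and ignores bonds, branches, ring closures, and any other SMILES features.

```python
import re

# Bracketed codes are tried first; two-letter halogens come before
# single-letter symbols so that "Cl" is not read as "C".
ANC_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]")

# Element symbol at the start of a bracketed code, after optional isotope digits.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?)")


def atomic_node_codes(notation):
    """Return the atomic node codes of a notation, in order of appearance."""
    return ANC_PATTERN.findall(notation)


def heavy_atoms(notation):
    """Return the non-hydrogen element symbol of each node."""
    symbols = []
    for code in atomic_node_codes(notation):
        if code.startswith("["):
            symbols.append(BRACKET_SYMBOL.match(code).group(1))
        else:
            symbols.append(code)
    return symbols


print(atomic_node_codes("Cl[SiH](Cl)Cl"))
print(atomic_node_codes("[Cl][SiH]([Cl])[Cl]"))
print(heavy_atoms("Cl[SiH](Cl)Cl") == heavy_atoms("[Cl][SiH]([Cl])[Cl]"))
```

Both notations yield four nodes with the same sequence of heavy atoms; only the choice between bare symbol and SQC differs for the chlorine atoms.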
Isotopically labelled atoms and
formally charged atoms must always be written
in SQC notation.
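To show why isotope labels and charges need the brackets, here is a minimal Python sketch that reads these fields from an SQC. The field order (isotope, element symbol, hydrogen count, charge) is an assumption based on the usual SMILES bracket-atom layout, which this page does not spell out; the name `parse_sqc` is hypothetical, and the sketch deliberately omits chirality marks and repeated-sign charges such as `++`.

```python
import re

# Assumed layout: [isotope? symbol hydrogens? charge?]
SQC_PATTERN = re.compile(
    r"\[(?P<isotope>\d+)?"      # optional mass number
    r"(?P<symbol>[A-Z][a-z]?)"  # element symbol
    r"(?P<hydrogens>H\d*)?"     # optional attached hydrogens
    r"(?P<charge>[+-]\d*)?\]"   # optional formal charge
)


def parse_sqc(code):
    """Split a square bracket atomic code into its fields."""
    match = SQC_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"unsupported square bracket code: {code}")

    isotope = int(match["isotope"]) if match["isotope"] else None

    hydrogens = match["hydrogens"]
    if hydrogens is None:
        h_count = 0
    elif hydrogens == "H":
        h_count = 1
    else:
        h_count = int(hydrogens[1:])

    charge_text = match["charge"]
    if charge_text is None:
        charge = 0
    else:
        magnitude = int(charge_text[1:]) if charge_text[1:] else 1
        charge = magnitude if charge_text[0] == "+" else -magnitude

    return {
        "isotope": isotope,
        "symbol": match["symbol"],
        "hydrogens": h_count,
        "charge": charge,
    }


print(parse_sqc("[13CH4]"))  # carbon-13 labelled methane
print(parse_sqc("[NH4+]"))   # ammonium cation
```

A bare symbol such as C has no place to carry a mass number or a sign, which is why such atoms cannot use the unbracketed form.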
In a CurlySMILES notation,
an ANC may be followed by an
atom-anchored annotation (AAA)
such as a stereodescriptor,
structural unit annotation,
group environment annotation,
molecular detail annotation,
or operational annotation.
References
[1] D. Weininger: SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36; doi: 10.1021/ci00057a005.
[2] A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1; doi: 10.1186/1758-2946-3-1.