A square-bracket atomic code (SQC) is a special type of atomic node code (ANC).
In SMILES and CurlySMILES notation, an SQC encodes a node of the hydrogen-suppressed
molecular graph. An SQC is mandatory for any non-hydrogen atom that does not belong
to the organic subset, or whose hydrogen count differs from the implicit hydrogen count.
The implicit count assumes that hydrogen atoms make up the remainder of the atom's
lowest normal valence consistent with the explicit bonds [1]: 3, 4, 3, 2, and 1 for
B, C, N, O, and the halogens, respectively, 3 or 5 for phosphorus, and 2, 4, or 6
for aliphatic sulfur. In the following examples, the germanium atom, which does not
belong to the organic subset, and the aromatic nitrogen atom, whose hydrogen count
differs from the implicit one, require SQC encoding:
CC[GeH2]CC | Ge node in diethylgermane
[nH]1cccc1 | N node in aromatic 1H-pyrrole
Isotopically labelled atoms and nodes with a formal charge specification
are always SQC-encoded:
C=[N+]=[N-] | formally charged N atoms in diazomethane
CCOC(=[17O])[17O]C | O-ethyl-17O-methyl[17O2]carbonate
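To make the layout of an SQC concrete, the hypothetical helper `format_sqc` below assembles a bracket code from an optional isotope mass number, the element symbol, a hydrogen count, and a formal charge, in that order. It is a sketch under stated assumptions: inputs are plain integers, chirality and atom-class fields are omitted, and charges above magnitude 1 are written with a digit, matching the `[Re+3]` form used in this entry.

```python
from typing import Optional


def format_sqc(symbol: str, isotope: Optional[int] = None,
               h_count: int = 0, charge: int = 0) -> str:
    """Assemble [isotope symbol Hcount charge]; chirality and atom classes are omitted."""
    parts = ["["]
    if isotope is not None:
        parts.append(str(isotope))  # the mass number precedes the element symbol
    parts.append(symbol)
    if h_count == 1:
        parts.append("H")
    elif h_count > 1:
        parts.append(f"H{h_count}")
    if charge:
        sign = "+" if charge > 0 else "-"
        magnitude = abs(charge)
        parts.append(sign if magnitude == 1 else f"{sign}{magnitude}")
    parts.append("]")
    return "".join(parts)


def always_sqc(isotope: Optional[int], charge: int) -> bool:
    """Isotopically labelled or formally charged atoms are always bracketed."""
    return isotope is not None or charge != 0


print(format_sqc("O", isotope=17))  # [17O], as in CCOC(=[17O])[17O]C
print(format_sqc("N", charge=1))    # [N+], as in C=[N+]=[N-]
print(format_sqc("Re", charge=3))   # [Re+3]
```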
In CurlySMILES, atomic-wildcard nodes and atoms with
an incident quadruple or unspecified bond have to be SQC-encoded [2]:
CC[*H2]CC | C, Si, and Ge will match, for example
[Cl-][Re](Cl)(Cl)(Cl)$[Re](Cl)(Cl)(Cl)[Cl-] | [Re2Cl8]2- anion with Re-Re quadruple bond
[NH3]~[B](F)(F)F | adduct of ammonia and boron trifluoride
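The CurlySMILES rule above can be checked mechanically. The function `sqc_violations` below is a hypothetical sketch, not a CurlySMILES reference parser. It tokenizes only a simplified SMILES subset and skips `{...}` annotations without interpreting them. It then reports bare wildcard atoms and any unbracketed atom on either end of a `$` (quadruple) or `~` (unspecified) bond, including ring-closure bonds.

```python
import re

# Simplified tokenizer for the plain-SMILES part of a CurlySMILES string
# (an illustrative assumption; it does not cover the full grammar).
_TOKEN = re.compile(
    r"\[[^\]]*\]"         # square-bracket atomic code
    r"|Br|Cl|[BCNOPSFI]"  # aliphatic organic-subset atoms
    r"|[bcnops]|\*"       # aromatic atoms and the bare wildcard
    r"|[-=#$~:/\\]"       # bond symbols; $ = quadruple, ~ = unspecified
    r"|[().]|%\d\d|\d"    # branches, disconnection, ring-closure labels
)
_BONDS = {"-", "=", "#", "$", "~", ":", "/", "\\"}
_NEEDS_SQC = {"$", "~"}


def _skip_annotation(s: str, i: int) -> int:
    """Return the index just past the balanced {...} annotation starting at s[i]."""
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "{":
            depth += 1
        elif s[j] == "}":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced curly-brace annotation")


def sqc_violations(curly: str) -> list[str]:
    """List wildcard atoms and atoms on $ or ~ bonds that are not bracketed."""
    atoms = []       # atom tokens in order of appearance
    flagged = set()  # indices of atoms that must be SQC-encoded
    prev = None      # index of the atom the next bond attaches to
    branches = []    # saved attachment points for "(" ... ")"
    rings = {}       # ring label -> (atom index, bond symbol at opening)
    bond = None      # bond symbol waiting for its second atom
    i = 0
    while i < len(curly):
        if curly[i] == "{":  # skip CurlySMILES annotations entirely
            i = _skip_annotation(curly, i)
            continue
        m = _TOKEN.match(curly, i)
        if m is None:
            raise ValueError(f"unsupported character {curly[i]!r} at {i}")
        tok, i = m.group(), m.end()
        if tok == "(":
            branches.append(prev)
        elif tok == ")":
            prev = branches.pop()
        elif tok == ".":
            prev = None
        elif tok in _BONDS:
            bond = tok
        elif tok[0].isdigit() or tok[0] == "%":  # ring-closure label
            if tok in rings:
                j, opening_bond = rings.pop(tok)
                if (bond or opening_bond) in _NEEDS_SQC:
                    flagged.update({j, prev})
            else:
                rings[tok] = (prev, bond)
            bond = None
        else:  # an atom token
            idx = len(atoms)
            atoms.append(tok)
            if "*" in tok:
                flagged.add(idx)
            if prev is not None and bond in _NEEDS_SQC:
                flagged.update({prev, idx})
            prev, bond = idx, None
    return [atoms[k] for k in sorted(flagged) if not atoms[k].startswith("[")]


print(sqc_violations("[NH3]~[B](F)(F)F"))  # [] : both atoms on the ~ bond are bracketed
print(sqc_violations("N~B(F)(F)F"))        # ['N', 'B']
```

All three example encodings in the table pass this check; dropping the brackets around either atom of the `$` or `~` bond makes it fail.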
Note that [Re+3]{+Lc=[Cl-]{4}}$[Re+3]{+Lc=[Cl-]{4}} is an alternative encoding
of [Re2Cl8]2- that correctly represents the topological equivalence of the eight
Cl atoms: it uses the CurlySMILES OPAM annotation to encode the chloride anions as
ligands of what is, in this case, a dinuclear cluster.
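As a quick check that both strings describe the same dianion, the formal charges can be summed. This assumes, as the count of eight Cl atoms implies, that `[Cl-]{4}` places four chloride ligands on each Re centre:

```latex
\begin{aligned}
q_{\text{first}} &= \underbrace{2\,(-1)}_{[\mathrm{Cl}^-]}
  + \underbrace{6 \cdot 0}_{\mathrm{Cl}}
  + \underbrace{2 \cdot 0}_{[\mathrm{Re}]} = -2 \\
q_{\text{OPAM}} &= \underbrace{2\,(+3)}_{[\mathrm{Re}^{3+}]}
  + \underbrace{2 \cdot 4 \cdot (-1)}_{[\mathrm{Cl}^-]\{4\}} = -2
\end{aligned}
```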
References
[1] D. Weininger: SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36; doi: 10.1021/ci00057a005.
[2] A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1; doi: 10.1186/1758-2946-3-1.