File Formats Wiki

Simplified molecular input line entry specification

The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.

Graph-based definition

In terms of a graph-based computational procedure, a SMILES string is obtained by printing the atom symbols (nodes) encountered in a depth-first traversal of a chemical graph. The graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where a cycle has been broken, the same numeric label is written after both atoms of the broken bond to show that they are connected. Parentheses indicate points of branching in the tree.
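The traversal can be sketched in Python. This is a minimal illustration, not a canonical SMILES writer: the function name `to_smiles`, its input format (a list of element symbols plus a list of atom-index pairs), starting at atom 0, and treating every bond as single are all assumptions made for the example. Ring-closure numbers are handed out in order and never reused.

```python
from collections import defaultdict


def to_smiles(symbols, bonds):
    """Sketch: SMILES for a connected, hydrogen-suppressed graph.

    symbols: element symbol per atom (organic subset, e.g. "C", "O").
    bonds:   (i, j) atom-index pairs; every bond is treated as single.
    Traversal starts at atom 0; ring labels are never reused.
    """
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    visited = set()
    seen_edges = set()
    children = defaultdict(list)  # spanning-tree edges: parent -> children
    closures = []                 # bonds broken to remove cycles

    def dfs(u):
        visited.add(u)
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if v in visited:
                closures.append((v, u))  # back edge: break the cycle here
            else:
                children[u].append(v)    # tree edge
                dfs(v)

    ring_labels = defaultdict(list)  # atom -> ring-closure numbers on it

    def label_text(n):
        return str(n) if n < 10 else f"%{n}"

    def emit(u):
        # Symbol first, then its ring-closure numbers, then its subtrees.
        out = symbols[u] + "".join(label_text(n) for n in ring_labels[u])
        kids = children[u]
        for c in kids[:-1]:
            out += "(" + emit(c) + ")"  # every child but the last is a branch
        if kids:
            out += emit(kids[-1])       # the last child continues the chain
        return out

    dfs(0)
    # Each broken bond gets one number, written on both of its atoms.
    for n, (a, b) in enumerate(closures, start=1):
        ring_labels[a].append(n)
        ring_labels[b].append(n)
    return emit(0)


print(to_smiles(["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]))
print(to_smiles(["C", "C", "O", "C"], [(0, 1), (1, 2), (1, 3)]))
```

The two calls print C1CCCCC1 and CC(O)C. The output depends on atom numbering and on the starting atom, so one molecule can yield several equally valid strings.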

Specification

A SMILES string consists of ASCII characters with no spaces.

Atoms

Atoms are represented by their element symbols, in general enclosed in square brackets. For example, mercury is [Hg]. The elements B, C, N, O, P, S, F, Cl, Br and I (the "organic subset") may be written without brackets when the number of attached hydrogens conforms to the lowest normal valence consistent with the explicit bonds; those hydrogen atoms are then omitted and inferred. For example, hydrogen chloride (HCl) is just Cl, and ammonia is just N (the valence of nitrogen is 3, so three hydrogen atoms are inferred).

This implicit-hydrogen rule applies only to organic-subset atoms written without brackets. For comparison, S refers to hydrogen sulfide (H2S; two hydrogen atoms are inferred), while [S] refers to a single sulfur atom with no attached hydrogens.
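The inference rule amounts to a small lookup. In this sketch the valence table is assumed to follow the OpenSMILES convention (N and P may also be pentavalent, S tetra- or hexavalent), the helper name `implicit_hydrogens` is hypothetical, and aromatic atoms are not handled.

```python
# Normal valences of the organic subset, lowest first (assumed values,
# following the OpenSMILES convention).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum, bracketed=False):
    """Number of hydrogens inferred for one atom.

    bond_order_sum: total order of the explicit bonds to the atom
    (a double bond counts 2). Aromatic atoms are not handled.
    """
    if bracketed:
        return 0  # inside brackets, hydrogens are never inferred
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum  # lowest valence that fits
    return 0  # explicit bonds exceed every normal valence
```

For example, implicit_hydrogens("S", 0) gives 2 (H2S), while implicit_hydrogens("S", 0, bracketed=True) gives 0, matching [S]. The middle carbon of CCC has two single bonds, so implicit_hydrogens("C", 2) gives 2.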

Aromatic atoms

Atoms in aromatic rings are specified in lowercase. Example:

  • n1ccccc1 - pyridine

Charges

Within brackets, all attached hydrogen atoms and any formal charge must be written explicitly. The number of attached hydrogen atoms is shown by the symbol H followed by an optional digit (H alone means one). Similarly, a formal charge is shown by + or -, followed by an optional digit. If no charge is given, it is zero. Repeated + or - signs are equivalent to a single sign followed by the count; for example, [Fe++] is the same as [Fe+2]. Examples:

  • [H+] - proton (H+)
  • [NH4+] - ammonium (NH4+)
  • [C-]#N - cyanide (CN-)
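A bracket atom can be decoded with a regular expression. The sketch below is deliberately narrow: it accepts only an optional mass number, an element symbol, an optional H count and an optional charge, and rejects any other bracket contents. The names `BRACKET_ATOM` and `parse_bracket_atom` are hypothetical.

```python
import re

# [ mass? symbol Hcount? charge? ]  (narrow, assumed subset of the syntax)
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|se|as|[bcnops])"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>\+\++|--+|[+-]\d*)?"
    r"\]"
)


def parse_bracket_atom(text):
    """Return isotope, symbol, hydrogen count and charge of a bracket atom."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported bracket atom: {text!r}")
    isotope = int(m["isotope"]) if m["isotope"] else None
    h = m["hcount"]
    hcount = 0 if h is None else int(h[1:] or 1)  # bare "H" means one
    c = m["charge"]
    if c is None:
        charge = 0                                # no charge given
    else:
        sign = 1 if c[0] == "+" else -1
        if len(c) > 1 and c[1].isdigit():
            magnitude = int(c[1:])                # "+2" style
        else:
            magnitude = len(c)                    # "+", "++", "---" style
        charge = sign * magnitude
    return {"isotope": isotope, "symbol": m["symbol"],
            "hcount": hcount, "charge": charge}
```

With this sketch, parse_bracket_atom("[Fe++]") and parse_bracket_atom("[Fe+2]") both report a charge of 2, and "[NH4+]" yields symbol N, four hydrogens and charge 1.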

Bonds

Single, double, triple, and aromatic bonds are represented by -, =, #, and :, respectively. When no bond symbol is written between adjacent atoms in the string, the bond is assumed to be single, or aromatic if both atoms are aromatic. Example:

  • O=C=O - carbon dioxide (CO2)
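The default-bond rule can be expressed as a function of the two atom symbols. Here atoms are passed as bare symbols such as "C" or "c", the name `bond_order` is hypothetical, and the value 1.5 for an aromatic bond is a common modelling convention assumed for the sketch, not something the notation defines.

```python
# Bond symbols and their orders; 1.5 for aromatic is an assumed convention.
BOND_ORDERS = {"-": 1.0, "=": 2.0, "#": 3.0, ":": 1.5}


def bond_order(left, right, symbol=None):
    """Order of the bond between two adjacent atoms in the string.

    left, right: bare atom symbols such as "C", "Cl" or "c".
    symbol: the bond character written between them, if any.
    """
    if symbol is not None:
        return BOND_ORDERS[symbol]
    # No bond character: aromatic between two lowercase (aromatic) atoms,
    # otherwise single.
    if left[0].islower() and right[0].islower():
        return BOND_ORDERS[":"]
    return BOND_ORDERS["-"]
```

For O=C=O both bonds are explicit, so bond_order("O", "C", "=") returns 2.0; in a string such as cc the implicit bond comes out as 1.5.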

Branches

Branches from any atom in the sequence may be specified using parentheses, and may be nested. Examples:

  • CC(=O)O - acetic acid (CH3COOH)
  • CC(O)C - isopropyl alcohol (2-propanol)
  • CC(C)C(=O)O - isobutyric acid
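Branch handling needs only a stack: an opening parenthesis remembers the current atom, and a closing parenthesis returns to it, which also makes nesting work. The following sketch (the function `parse_branches` and its tokenizer are hypothetical) covers unbracketed organic-subset atoms, bond symbols and parentheses only. It returns the atoms and a list of bonds, with None marking an implicit bond.

```python
import re

# Hypothetical minimal tokenizer: unbracketed organic-subset atoms
# (two-letter symbols tried first), bond symbols, and parentheses.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:()]")


def parse_branches(smiles):
    """Return (atoms, bonds); each bond is (i, j, symbol or None)."""
    atoms, bonds = [], []
    stack = []      # atoms to return to when a branch closes
    prev = None     # atom that the next atom will bond to
    pending = None  # explicit bond symbol waiting for the next atom
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character {smiles[pos]!r} at {pos}")
        tok, pos = m.group(), m.end()
        if tok == "(":
            stack.append(prev)      # the branch starts from the current atom
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            prev = stack.pop()      # resume the chain at the branch point
        elif tok in ("-", "=", "#", ":"):
            pending = tok
        else:
            atoms.append(tok)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))  # None = implicit bond
            prev, pending = idx, None
    if stack:
        raise ValueError("unclosed '('")
    return atoms, bonds
```

For acetic acid, parse_branches("CC(=O)O") gives the atoms C, C, O, O and bonds (0, 1, None), (1, 2, "=") and (1, 3, None): both oxygens hang off the second carbon.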

Cyclic structures

Rings are specified by breaking one bond in each ring and marking the two atoms at the ends of the broken bond with the same ring-closure number. Example:

  • C1CCCCC1 - cyclohexane. Six carbon atoms are written as a chain, and the first and sixth atoms, both labeled 1, are bonded to each other to close the ring.

An atom may carry more than one ring-closure number. For example, C12 means that the carbon atom has ring-closure number 1 and ring-closure number 2 (not ring-closure number 12).

A ring-closure number can be reused once the second atom carrying it has been written, which keeps the number of labels needed small. When a two-digit ring-closure number is still required, it must be preceded by a percent sign (%). For example, C%12 is a carbon atom with ring-closure number 12.
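Ring-closure numbers can be paired using a dictionary of currently open labels. In this sketch the hypothetical `ring_closures` function counts atoms (bracketed or organic subset), reads single digits and %nn labels, ignores every other character, and returns one (first atom, second atom, label) triple per ring bond. Deleting a label from the dictionary when it closes is what allows it to be reused.

```python
import re

# Atoms (bracketed or organic subset), ring labels (%nn or a digit), and
# any other single character (bonds, parentheses, dots), which is skipped.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFIbcnops]|%\d\d|\d|.")


def ring_closures(smiles):
    """Return (first_atom, second_atom, label) for every ring-closure pair."""
    open_rings = {}  # label -> index of the atom that opened it
    pairs = []
    atom = -1        # index of the most recent atom
    for tok in TOKEN.findall(smiles):
        if tok.isdigit() or (tok.startswith("%") and len(tok) == 3):
            label = int(tok.lstrip("%"))
            if label in open_rings:
                # Second occurrence closes the ring; the label is free again.
                pairs.append((open_rings.pop(label), atom, label))
            else:
                open_rings[label] = atom
        elif tok[0] == "[" or tok[0].isalpha():
            atom += 1
    if open_rings:
        raise ValueError(f"unclosed ring labels: {sorted(open_rings)}")
    return pairs
```

For example, ring_closures("C12CCCCC1CCCC2") returns [(0, 5, 1), (0, 9, 2)]: the first atom opens two rings at once. In "C1CC1C1CC1" the label 1 is used twice, for two separate rings.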

Disconnected structures

The components of a disconnected structure, such as the ions of a salt, are separated by a dot (.). For example, [Na+].[Cl-] is sodium chloride.

Isotopes

An isotope is specified by writing its mass number immediately before the element symbol inside the brackets. Example:

  • [12C] - carbon-12

Stereochemistry

Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F is trans-1,2-difluoroethene, while F/C=C\F is cis-1,2-difluoroethene.
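For the simple case in these examples, with one directional bond on each side of the double bond, matching symbols (/ and /, or \ and \) place the two substituents on opposite sides (trans), and differing symbols place them on the same side (cis). The hypothetical helper below handles only strings of the form X/C=C/Y.

```python
import re

# Hypothetical helper: only the pattern X/C=C/Y, with one directional
# bond ("/" or "\") on each side of the double bond.
PATTERN = re.compile(r"(\w+)([/\\])C=C([/\\])(\w+)")


def double_bond_geometry(smiles):
    """Return "cis" or "trans" for a string like F/C=C\\F."""
    m = PATTERN.fullmatch(smiles)
    if m is None:
        raise ValueError("expected the form X/C=C/Y")
    before, after = m.group(2), m.group(3)
    # Matching symbols put the substituents on opposite sides.
    return "trans" if before == after else "cis"
```

Note that in Python source the backslash must be escaped, so cis-difluoroethene is written as the literal "F/C=C\\F".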

References

This page uses CC-BY-SA content from Wikipedia (authors).
