Phylogenetic tree
About phylogenetic trees
A phylogenetic tree shows the inferred evolutionary history of a group of species, genes, or sequences. Depending on what the branch lengths represent, it is drawn as a phylogram, cladogram, or chronogram (see §4). Internal nodes represent hypothetical common ancestors; tips represent observed taxa; branch lengths encode evolutionary distance or divergence time. Evolutionary biologists, molecular ecologists, and clinical microbiologists use phylogenetic trees to reconstruct the history of life, track pathogen outbreaks, and understand how gene families evolved.
Schematex accepts trees in Newick format — the universal interchange standard used by PAUP*, IQ-TREE, RAxML, BEAST, and virtually every phylogenetics program — extended with NHX annotations for bootstrap values and clade metadata. An indentation-based DSL is also supported for hand-authored trees. This page documents what the parser accepts today.
1. Your first phylogenetic tree
The smallest useful tree: four taxa, two clades.
Three rules cover 80% of usage:
- Start with `phylo`, optionally followed by a quoted title and bracket props.
- Provide the tree topology in `newick:` format: the standard Newick string, quoted, on one line. The trailing `;` is optional.
- Optionally define clade highlight groups and a scale label below the `newick:` line.

Comments must start with `#` on their own line. Inline trailing comments are not supported.
2. Input formats
2.1 Newick format
Newick is the primary input. Typical forms (the formal grammar is in §10):

```
(A,B,(C,D));                                    # topology only
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);                # with branch lengths
((A:0.1,B:0.2):0.05[&&NHX:B=98],(C,D):0.08);    # NHX bootstrap
('Homo sapiens':0.1,'Mus musculus':0.2);        # quoted names with spaces
```

The trailing `#` notes are annotations for this page, not part of the Newick string; in a document the string goes, quoted, on a `newick:` line. Branch lengths follow the node name after a colon. Internal node support values can appear as plain brackets `[95]` or as NHX `[&&NHX:B=95]`.
Newick rules the parser accepts:
| Feature | Syntax | Notes |
|---|---|---|
| Leaf name | A, Homo_sapiens | No spaces — use _ or quote |
| Quoted leaf name | 'Homo sapiens' | Single quotes; '' is a literal quote inside |
| Branch length | :0.035 after name | Float; optional |
| Internal node name | (A,B)ancestor | After closing ) |
| Bootstrap (plain) | (A,B)[95] | Integer or float in brackets |
| Bootstrap (NHX) | (A,B)[&&NHX:B=95] | B= field; other NHX fields stored but not rendered |
| Semicolon | ; at end | Optional — parser strips it |
| Polytomy | (A,B,C) | More than 2 children |
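To make the rules in the table concrete, here is a minimal recursive-descent sketch in Python. It is not the Schematex parser (that lives in `parser.ts`); the `Node` class and `parse_newick` function are hypothetical names. It assumes no whitespace between tokens, and it accepts a branch length and a bracket annotation in either order, which is a choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse one compact Newick string into a Node tree (illustrative only)."""
    s = text.strip().rstrip(";")  # the trailing semicolon is optional
    pos = 0

    def read_name() -> str:
        nonlocal pos
        if s[pos:pos + 1] == "'":  # quoted name; '' is a literal quote
            pos += 1
            out = []
            while pos < len(s):
                if s[pos] == "'":
                    if s[pos + 1:pos + 2] == "'":
                        out.append("'")
                        pos += 2
                        continue
                    pos += 1  # closing quote
                    break
                out.append(s[pos])
                pos += 1
            return "".join(out)
        start = pos
        while pos < len(s) and s[pos] not in "(),:;[] \t":
            pos += 1
        return s[start:pos]

    def read_annotations(node: Node) -> None:
        nonlocal pos
        # Assumption of this sketch: ":length" and "[...]" may come in either order.
        while pos < len(s) and s[pos] in ":[":
            if s[pos] == ":":
                pos += 1
                start = pos
                while pos < len(s) and s[pos] not in "(),;[":
                    pos += 1
                node.length = float(s[start:pos])
            else:
                end = s.index("]", pos)
                body = s[pos + 1:end]
                pos = end + 1
                if body.startswith("&&NHX:"):
                    for pair in body[len("&&NHX:"):].split(":"):
                        key, _, value = pair.partition("=")
                        if key == "B":  # only the bootstrap field is kept here
                            node.support = float(value)
                else:
                    node.support = float(body)  # plain bootstrap, e.g. [95]

    def read_subtree() -> Node:
        nonlocal pos
        node = Node()
        if s[pos:pos + 1] == "(":
            pos += 1
            node.children.append(read_subtree())
            while s[pos:pos + 1] == ",":
                pos += 1
                node.children.append(read_subtree())
            if s[pos:pos + 1] != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1
        node.name = read_name()  # leaf name, or internal name after ')'
        read_annotations(node)
        return node

    root = read_subtree()
    if pos != len(s):
        raise ValueError(f"unexpected text at offset {pos}")
    return root
```

Calling `parse_newick("(A,B,C)anc[95];")` yields a root named `anc` with three children, which is how a polytomy with a plain bootstrap value would look in this representation.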
2.2 Indent DSL
For hand-written or small trees, Schematex offers an indentation-based alternative that is easier to read and edit than raw Newick:
Indent DSL rules:
| Syntax | Meaning |
|---|---|
| `Name: length` | Leaf node with branch length |
| `: length` | Unnamed internal node with branch length |
| `Name` | Leaf node, no branch length (cladogram) |
| `Name [N]` | Node with support value N |
| Deeper indent | Child of the node above at a shallower indent |
| `# line` | Comment, ignored |
The first line that ends with : and has no spaces triggers indent-tree mode (e.g. root:). The name before the colon becomes the root label; all indented lines below become its children.
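The trigger rule and the parent-by-indentation rule can be sketched in a few lines of Python. This is an illustration of the rules as documented, not the parser's actual code: `indent_root_label` and `parse_indent_tree` are hypothetical names, nodes are plain dicts, and support values in `[N]` are deliberately not handled.

```python
from typing import List, Optional


def indent_root_label(line: str) -> Optional[str]:
    """Return the root label if the line would trigger indent-tree mode, else None."""
    text = line.rstrip("\r\n")
    if text.endswith(":") and " " not in text:
        return text[:-1]
    return None


def parse_indent_tree(lines: List[str]) -> dict:
    """Build a nested dict tree from an indent block (sketch; ignores [N] support)."""
    label = indent_root_label(lines[0])
    if label is None:
        raise ValueError("first line does not trigger indent-tree mode")
    root = {"name": label, "length": None, "children": []}
    stack = [(-1, root)]  # (indent width, node); the root sits above every indent
    for raw in lines[1:]:
        body = raw.strip()
        if not body or body.startswith("#"):
            continue  # blank lines and comment lines are skipped
        depth = len(raw) - len(raw.lstrip(" "))
        name, colon, length = body.partition(":")
        node = {
            "name": name.strip(),
            "length": float(length) if colon else None,
            "children": [],
        }
        # Pop until the top of the stack is a strictly shallower line: the parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root
```

With this reading, `indent_root_label("My root:")` returns `None` because of the space, so no indent tree would be started.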
3. Layout
Set the layout in the header brackets: phylo "Title" [layout: rectangular].
| Layout | Value | Description |
|---|---|---|
| Rectangular | rectangular | Default. L-shaped branches; root on left, tips on right |
| Slanted | slanted | Diagonal lines from parent to child; more compact |
| Circular | circular | Root at center, tips around the circumference |
| Unrooted | unrooted | Equal-angle radial; emphasizes distance, not ancestry |
[unrooted] as a bare flag is equivalent to [layout: unrooted].
Circular — root at center, tips fanning outward. Most visually striking for many-taxa trees with clade highlights.
Rectangular — L-shaped branches; root on the left, tips on the right. The classic phylogram form for published figures.
Slanted — diagonal lines from parent to child; more compact than rectangular, same left-to-right reading direction.
Unrooted — equal-angle radial layout; de-emphasizes the root, emphasizes pairwise distance between all taxa.
4. Mode
Set with [mode: …] in the header (or in a style [mode: …] line).
| Mode | Value | Branch length meaning |
|---|---|---|
| Phylogram | phylogram | Default. Proportional to evolutionary distance (substitutions/site) |
| Cladogram | cladogram | Ignored — tips align; only topology matters |
| Chronogram | chronogram | Proportional to divergence time; all tips align to "present" |
| Dendrogram | dendrogram | Branch length is merge height — the distance at which two clusters join |
Chronogram requires branch lengths in units of time plus [mrsd: "YYYY"] (most-recent sampling date) in the header so the renderer can align tips to present.
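One plausible reading of why `mrsd` is needed, sketched in Python: the tip farthest from the root is pinned to the most-recent sampling date, and every other tip is dated by how much shorter its root-to-tip path is. Whether the Schematex renderer computes dates exactly this way is an assumption; the tuple-based tree shape and the function names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# (leaf name or list of children, branch length leading to this node)
Tree = Tuple[Union[str, List["Tree"]], float]


def root_to_tip(tree: Tree, depth: float = 0.0) -> Dict[str, float]:
    """Return each tip's summed branch length from the root."""
    label, length = tree
    here = depth + length
    if isinstance(label, str):
        return {label: here}
    depths: Dict[str, float] = {}
    for child in label:
        depths.update(root_to_tip(child, here))
    return depths


def tip_dates(tree: Tree, mrsd: float) -> Dict[str, float]:
    """Date every tip, pinning the deepest tip to the most-recent sampling date."""
    depths = root_to_tip(tree)
    deepest = max(depths.values())
    return {tip: mrsd - (deepest - d) for tip, d in depths.items()}


# Tips at depths 0.8, 0.7 and 0.8 time units from the root.
example: Tree = ([([("Alpha", 0.5), ("Delta", 0.4)], 0.3), ("Omicron", 0.8)], 0.0)
print(tip_dates(example, 2023.0))
```

Under this reading, tips with shorter root-to-tip paths land slightly before the `mrsd` date rather than exactly on it.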
```
phylo "SARS-CoV-2 variants" [mode: chronogram, mrsd: "2023"]
newick: "((Alpha:0.5,Delta:0.4):0.3,Omicron:0.8);"
scale "years"
```

Dendrogram: the standard output of hierarchical agglomerative clustering, not evolution. Each internal node is placed at its merge height (the cophenetic distance at which its two child clusters fuse), all leaves align at a common baseline, and the branches are rectangular elbow connectors. A height axis is drawn so you can read off the distance at which any two leaves first share a cluster. Reach for this mode when the same Newick/indent tree describes a clustering result (gene-expression clusters, sample similarity, survey-response groups) rather than a phylogeny.
Add a `cut <value>` line to slice the tree at a chosen height: every maximal subtree whose merge height falls below the threshold becomes one flat cluster, each colored distinctly, and a dashed threshold line is drawn across the tree at that height. This is the dendrogram equivalent of `fcluster` in SciPy with `criterion="distance"`: it turns a continuous tree into a discrete set of groups.
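The slicing rule can be sketched in Python as a short recursion. This is an illustration under stated assumptions, not the renderer's code: `Cluster`, `leaves`, and `cut` are hypothetical names, merge heights are assumed to grow from leaves to root, and "below the threshold" is read as strictly less than.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    height: float = 0.0   # merge height; 0 for a leaf
    name: str = ""        # leaf label; empty for internal nodes
    children: List["Cluster"] = field(default_factory=list)


def leaves(node: Cluster) -> List[str]:
    """Collect leaf names under a node, left to right."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]


def cut(node: Cluster, threshold: float) -> List[List[str]]:
    """Return flat clusters: the largest subtrees merging below the threshold."""
    if not node.children or node.height < threshold:
        return [leaves(node)]
    return [group for child in node.children for group in cut(child, threshold)]
```

A threshold above the root's merge height returns one cluster holding every leaf; a threshold below the lowest merge returns each leaf as its own cluster.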
Omit cut to show the bare dendrogram with no flat-cluster coloring:
5. Clade highlighting
A clade line marks a monophyletic group with a color, an optional label, and an optional highlight mode.
```
clade ID = (member1, member2, ...) [color: "#hex", label: "text", highlight: mode]
```

| Prop | Values | Effect |
|---|---|---|
| `color:` | hex string e.g. `"#1E88E5"` | Branch and/or background color |
| `label:` | quoted string | Clade label shown at right margin |
| `highlight:` | `branch`, `background`, `both` | `branch` colors lines; `background` shades the region; `both` does both |
Members are tip (leaf) IDs from the Newick string. The renderer computes the MRCA of the listed tips and highlights the entire subtree rooted there.
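The MRCA step can be sketched in Python on a topology written as nested tuples of tip names. The representation and the names `tips` and `mrca` are hypothetical, chosen for illustration; the renderer's own data structures may differ.

```python
from typing import Optional, Set, Tuple, Union

Topology = Union[str, Tuple["Topology", ...]]


def tips(node: Topology) -> Set[str]:
    """Return the set of tip names under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def mrca(node: Topology, members: Set[str]) -> Optional[Topology]:
    """Return the smallest subtree containing every member, or None if one is missing."""
    if not members or not members <= tips(node):
        return None
    if not isinstance(node, str):
        for child in node:
            if members <= tips(child):
                return mrca(child, members)  # one child holds them all: descend
    return node  # members are split across children, so this node is the MRCA


tree = ((("A", "B"), "C"), "D")
print(tips(mrca(tree, {"A", "C"})))
```

Note that listing only `A` and `C` still pulls in `B`, because the highlight covers the whole subtree rooted at the MRCA, not just the listed tips.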
6. Scale bar and outgroup
Scale bar: scale "label" — adds a bar at the bottom. The label describes the unit (e.g. "substitutions/site", "Mya"). Omit for cladogram mode where branch lengths have no meaning.
Outgroup: outgroup: taxonId — records the outgroup for documentation; the renderer may use it to visually mark the outgroup taxon.
```
phylo "Vertebrates"
newick: "((Human:0.1,Chimp:0.08):0.03,Lamprey:0.8);"
outgroup: Lamprey
scale "substitutions/site"
```

7. Header props reference
All options go inside […] on the phylo header line, or in a style […] line anywhere in the body.
| Prop | Values | Default | Effect |
|---|---|---|---|
| `layout:` | `rectangular`, `slanted`, `circular`, `unrooted` | `rectangular` | Tree layout |
| `mode:` | `phylogram`, `cladogram`, `chronogram`, `dendrogram` | `phylogram` | Branch length semantics |
| `unrooted` | (flag) | — | Equivalent to `layout: unrooted` |
| `branch-width:` | number | 1.5 | Stroke width of branches |
| `openAngle:` | number (degrees) | 0 | Fan gap for circular layout (0 = full 360°) |
| `mrsd:` | quoted year string | — | Most-recent sampling date for chronograms |
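As a rough illustration of how the header line and the defaults in this table fit together, the Python sketch below splits a `phylo` header into a title and a props dictionary. The regular expression, `DEFAULTS`, and `parse_header` are hypothetical; values are kept as strings, and a quoted value containing a comma would be split incorrectly by this simplified approach.

```python
import re
from typing import Dict, Optional, Tuple

# Defaults taken from the table above; props without a default are simply absent.
DEFAULTS = {
    "layout": "rectangular",
    "mode": "phylogram",
    "branch-width": "1.5",
    "openAngle": "0",
}

HEADER = re.compile(r'^phylo(?:\s+"([^"]*)")?(?:\s+\[(.*)\])?\s*$')


def parse_header(line: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split a header line into (title, props), applying the documented defaults."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("document must start with 'phylo'")
    title, raw = match.group(1), match.group(2)
    props = dict(DEFAULTS)
    for item in (raw or "").split(","):
        item = item.strip()
        if not item:
            continue
        key, colon, value = item.partition(":")
        if not colon:
            if key == "unrooted":  # bare flag, same as layout: unrooted
                props["layout"] = "unrooted"
        else:
            props[key.strip()] = value.strip().strip('"')
    return title, props


print(parse_header('phylo "SARS-CoV-2 variants" [mode: chronogram, mrsd: "2023"]'))
```

Starting from a copy of the defaults keeps the "omitted prop" and "explicit default" cases identical downstream.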
8. Labels & comments
- Title: `phylo "Tree of Life"`, first line only.
- Scale label: `scale "substitutions/site"`, one per document.
- Clade label: `[label: "Primates"]` inside a `clade` line.
- Comments: `#` at the start of a line (after leading whitespace). Inline trailing comments are not supported.
9. Common mistakes
| You wrote | Parser says | Fix |
|---|---|---|
| `newick: (A,B,C);` (unquoted) | `PhyloParseError: Phylo document must start with 'phylo'` | Quote the Newick string: `newick: "(A,B,C);"` |
| Tip name with a space: `Homo sapiens:0.1` | Parsed as `Homo`; a space terminates an unquoted name | Use underscore (`Homo_sapiens`) or single-quote (`'Homo sapiens'`) |
| Leaf ID in `clade` doesn't match Newick name | Clade silently has 0 members; no highlight | Copy names exactly as they appear in the Newick string |
| `clade X = (A, B)` with no `newick:` or indent tree | `PhyloParseError: No tree definition found` | Add a `newick:` line or an indent tree block |
| `mode: chronogram` with no branch lengths | Renderer treats all lengths as 0; tips overlap at root | Add `:length` to every edge in the Newick string |
| `root:` line not detected | If the `root:` line has a space in the name (e.g. `My root:`) the indent tree is not triggered | Use a single-word root label or `root:` |
| Newick with internal node names: `(A,B)ancestor:0.5` | Parses fine; `ancestor` is the internal node label | Supported; internal names appear on internal nodes |
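Because a mismatched clade member fails silently, a small pre-flight check can help when generating documents programmatically. The Python sketch below pulls tip names out of a Newick string with a regular expression (a tip is taken to be the name that directly follows `(` or `,`) and reports clade members that do not match. The helper names are hypothetical, and the regex is a simplification rather than a full Newick parser.

```python
import re
from typing import Iterable, List

# A tip name directly follows "(" or ","; internal node names follow ")" instead.
TIP = re.compile(r"[(,]\s*('(?:[^']|'')*'|[^\s(),:;\[\]']+)")


def tip_names(newick: str) -> List[str]:
    """Return tip names in order of appearance, with quotes removed."""
    names = []
    for token in TIP.findall(newick):
        if token.startswith("'"):
            token = token[1:-1].replace("''", "'")
        names.append(token)
    return names


def unknown_members(newick: str, members: Iterable[str]) -> List[str]:
    """Return clade members that match no tip name exactly (case-sensitive)."""
    known = set(tip_names(newick))
    return [m for m in members if m not in known]


newick = "((Human:0.1,Chimp:0.08):0.03,Lamprey:0.8);"
print(unknown_members(newick, ["Human", "chimp"]))
```

An empty result means every listed member was found; anything else is a name to correct before rendering.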
10. Grammar (EBNF)
```
document      = header (blank | comment | newick-line | scale-line
                | outgroup-line | clade-line | style-line | cut-line | indent-tree)*
header        = "phylo" ( WS quoted-string )? ( WS "[" props "]" )? NEWLINE
quoted-string = '"' any-char-but-quote* '"'
newick-line   = "newick:" WS quoted-newick NEWLINE
scale-line    = "scale" ( WS quoted-string )? NEWLINE
outgroup-line = "outgroup:" WS id NEWLINE
cut-line      = "cut" WS number NEWLINE   // dendrogram mode: flat-cluster threshold height
clade-line    = "clade" WS id WS "=" WS "(" id ("," id)* ")"
                ( WS "[" clade-props "]" )? NEWLINE
style-line    = "style" WS "[" props "]" NEWLINE

// Indent tree: triggered by a line ending in ":" with no spaces
indent-tree   = root-line indent-node*
root-line     = id ":" NEWLINE
indent-node   = INDENT ( id ":" WS number | ":" WS number | id ) ( WS "[" number "]" )? NEWLINE

props         = prop ("," prop)*
prop          = "layout:" layout-value
              | "mode:" mode-value
              | "unrooted"
              | "branch-width:" number
              | "openAngle:" number
              | "mrsd:" quoted-string
clade-props   = clade-prop ("," clade-prop)*
clade-prop    = "color:" quoted-string
              | "label:" quoted-string
              | "highlight:" ( "branch" | "background" | "both" )
layout-value  = "rectangular" | "slanted" | "circular" | "unrooted"
mode-value    = "phylogram" | "cladogram" | "chronogram" | "dendrogram"

// Newick grammar (embedded, parsed separately)
newick        = subtree ";"?
subtree       = leaf | internal
internal      = "(" subtree ("," subtree)* ")" name? nhx? length?
leaf          = name nhx? length?
name          = unquoted-name | "'" single-quoted "'"
length        = ":" number
nhx           = "[" number "]"   // plain bootstrap
              | "[&&NHX:" nhx-pair (":" nhx-pair)* "]"
nhx-pair      = key "=" value

id            = [a-zA-Z] [a-zA-Z0-9_-]*
number        = /[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
comment       = INDENT "#" any NEWLINE
```

Authoritative source: `src/diagrams/phylo/parser.ts`. If this diverges from the parser, the parser wins; please open an issue.
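The `id` and `number` terminals translate directly into regular expressions, which makes it easy to check a token before writing it into a document. The Python sketch below copies the two patterns from the grammar; the helper names `is_id` and `is_number` are hypothetical.

```python
import re

# Patterns copied from the id and number productions in the grammar.
ID = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")
NUMBER = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_id(token: str) -> bool:
    """True if the whole token matches the grammar's id production."""
    return ID.fullmatch(token) is not None


def is_number(token: str) -> bool:
    """True if the whole token matches the grammar's number production."""
    return NUMBER.fullmatch(token) is not None


for token in ["Homo_sapiens", "9lives", "0.035", ".5"]:
    print(token, is_id(token), is_number(token))
```

Read literally, the `number` pattern requires digits on both sides of a decimal point, so `.5` and `5.` would not match even though many languages accept them as floats.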
11. Standard compliance
Schematex phylogenetic trees follow the Newick format specification (as documented in the PHYLIP package) for the core tree serialization, and the NHX (New Hampshire Extended) convention for bootstrap support values. The B= field in NHX brackets is the only NHX field rendered visually today; all other fields are parsed and stored but not displayed.
What is implemented today:
- ✅ Newick topology, branch lengths, quoted names, polytomies
- ✅ Bootstrap values: plain `[95]` and NHX `[&&NHX:B=95]`
- ✅ Rectangular, slanted, circular, and unrooted layouts
- ✅ Phylogram, cladogram, chronogram, and dendrogram modes
- ✅ Clade highlighting (branch color, background shading, both)
- ✅ Scale bar
- ✅ Indent DSL alternative
- ⏳ Multi-tree documents (forest); see §13
- ⏳ Time-calibrated axis for chronograms (geological scale)
- ⏳ Per-tip icons or images
- ⏳ NHX fields beyond `B=` (species, taxonomy, duplication events)
References:
- Felsenstein, J. (1986). The Newick tree format. PHYLIP documentation.
- Zmasek, C.M. & Eddy, S.R. (2001). ATV: Display and manipulation of annotated phylogenetic trees. Bioinformatics, 17(4), 383–384. (NHX specification)
- Felsenstein, J. (2004). Inferring Phylogenies. Sinauer Associates.
12. Related examples
13. Roadmap
Planned — not yet parseable. Do not use these in generated DSL today; the parser will reject or ignore them.
- Multi-tree documents: a `phylo` file with more than one `newick:` block (e.g. gene trees vs species tree).
- Geological time axis for chronograms: an epoch-labelled X axis (Cenozoic / Mesozoic etc.) instead of a plain numeric scale.
- Per-tip metadata: attaching traits or colored markers to individual tips without declaring a full clade (e.g. `tip Ecoli [color: "#F00", shape: star]`).
- NHX fields beyond bootstrap: rendering species (`S=`), duplication (`D=`), and transfer events (`Tr=`) as branch symbols.
- Tanglegram: two trees displayed side by side with connecting lines linking corresponding tips.
If you need any of these sooner, track them in the GitHub issues.