Schematex

Phylogenetic tree

About phylogenetic trees

A phylogenetic tree (drawn as a phylogram when branch lengths are to scale, or as a cladogram when only the topology matters) shows the inferred evolutionary history of a group of species, genes, or sequences. Internal nodes represent hypothetical common ancestors; tips represent observed taxa; branch lengths encode evolutionary distance or divergence time. Evolutionary biologists, molecular ecologists, and clinical microbiologists use phylogenetic trees to reconstruct the history of life, track pathogen outbreaks, and understand how gene families evolved.

Schematex accepts trees in Newick format, the de facto interchange format used by PAUP*, IQ-TREE, RAxML, BEAST, and most other phylogenetics programs, extended with NHX annotations for bootstrap values and clade metadata. An indentation-based DSL is also supported for hand-authored trees. This page documents what the parser accepts today.

[Preview: "Vertebrate Evolution". Phylogenetic tree with 12 taxa, phylogram mode, circular layout. Clade labels: Primates, Rodents, Carnivora, Cetacea. Tips: Human, Chimp, Gorilla, Mouse, Rat, Dog, Cat, Tiger, Whale, Dolphin, Salmon, Zebrafish. Scale bar: 0.2 substitutions/site.]

1. Your first phylogenetic tree

The smallest useful tree: four taxa, two clades.

[Preview: "Vertebrates". Phylogenetic tree with 4 taxa, phylogram mode, rectangular layout. Tips: Human, Chimp, Dog, Cat. Scale bar: 0.1.]

Three rules cover most everyday usage:

  1. Start with phylo, optionally followed by a quoted title and bracket props.
  2. Provide the tree topology on a newick: line: a standard Newick string, quoted, on one line. The trailing ; is optional.
  3. Optionally define clade highlight groups and a scale label below the newick line.

Comments must start with # on their own line. Inline trailing comments are not supported.


2. Input formats

2.1 Newick format

Newick is the primary input. Typical forms are shown here; the full grammar is in §10:

(A,B,(C,D));                         # topology only
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);    # with branch lengths
((A:0.1,B:0.2):0.05[&&NHX:B=98],(C,D):0.08);  # NHX bootstrap
('Homo sapiens':0.1,'Mus musculus':0.2);        # quoted names with spaces

Branch lengths follow the node name after a colon. Internal node support values can appear as plain brackets [95] or as NHX [&&NHX:B=95].

[Preview: "Newick examples". Phylogenetic tree with 4 taxa, phylogram mode, rectangular layout. Tips: A, B, C, D. Support values: 98, 87. Scale bar: 0.1.]

Newick rules the parser accepts:

Feature             Syntax              Notes
------------------  ------------------  --------------------------------------------------
Leaf name           A, Homo_sapiens     No spaces; use _ or quote the name
Quoted leaf name    'Homo sapiens'      Single quotes; '' is a literal quote inside
Branch length       :0.035 after name   Float; optional
Internal node name  (A,B)ancestor       After closing )
Bootstrap (plain)   (A,B)[95]           Integer or float in brackets
Bootstrap (NHX)     (A,B)[&&NHX:B=95]   B= field; other NHX fields stored but not rendered
Semicolon           ; at end            Optional; parser strips it
Polytomy            (A,B,C)             More than 2 children
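
To make the rules in this table concrete, here is a minimal recursive-descent sketch in Python of a reader for the same Newick subset. It is an illustration only, not the Schematex parser (which lives in src/diagrams/phylo/parser.ts). The Node class and parse_newick function are hypothetical names, and the sketch accepts the support bracket on either side of the branch length, which may be more lenient than the real parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None    # branch length after ":"
    support: Optional[float] = None   # plain [95] or NHX B= value
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    s = text.strip()
    if s.endswith(";"):               # trailing semicolon is optional
        s = s[:-1]
    pos = 0

    def skip_ws() -> None:
        nonlocal pos
        while pos < len(s) and s[pos].isspace():
            pos += 1

    def parse_name() -> str:
        nonlocal pos
        if pos < len(s) and s[pos] == "'":        # quoted name
            pos += 1
            out = []
            while pos < len(s):
                if s[pos] == "'":
                    if s[pos + 1:pos + 2] == "'":  # '' is a literal quote
                        out.append("'")
                        pos += 2
                        continue
                    pos += 1
                    return "".join(out)
                out.append(s[pos])
                pos += 1
            raise ValueError("unterminated quoted name")
        start = pos                                # unquoted: no spaces allowed
        while pos < len(s) and s[pos] not in "(),:[]; \t":
            pos += 1
        return s[start:pos]

    def parse_annotations(node: Node) -> None:
        nonlocal pos
        while pos < len(s) and s[pos] in ":[":
            if s[pos] == ":":                      # branch length
                pos += 1
                start = pos
                while pos < len(s) and s[pos] not in "(),:[];":
                    pos += 1
                node.length = float(s[start:pos])
            else:                                  # [95] or [&&NHX:B=95]
                end = s.index("]", pos)
                body = s[pos + 1:end]
                pos = end + 1
                if body.startswith("&&NHX"):
                    for pair in body.split(":")[1:]:
                        key, _, value = pair.partition("=")
                        if key == "B":             # other fields are skipped here
                            node.support = float(value)
                else:
                    node.support = float(body)
        skip_ws()

    def parse_subtree() -> Node:
        nonlocal pos
        skip_ws()
        node = Node()
        if pos < len(s) and s[pos] == "(":         # internal node, 2+ children
            pos += 1
            node.children.append(parse_subtree())
            while pos < len(s) and s[pos] == ",":
                pos += 1
                node.children.append(parse_subtree())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1
        node.name = parse_name()
        parse_annotations(node)
        return node

    root = parse_subtree()
    if pos != len(s):
        raise ValueError(f"unexpected text at offset {pos}")
    return root
```

Calling parse_newick("(A,B,C)") yields a root with three children (a polytomy), and an unquoted name that contains a space stops at the space, so the sketch raises an error on the text that follows it.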

2.2 Indent DSL

For hand-written or small trees, Schematex offers an indentation-based alternative that is easier to read and edit than raw Newick:

[Preview: "Vertebrates (indent DSL)". Phylogenetic tree with 1 taxon, phylogram mode, rectangular layout. Node shown: root. Scale bar: 0.2 substitutions/site.]

Indent DSL rules:

Syntax          Meaning
--------------  ------------------------------------------------
Name: length    Leaf node with branch length
: length        Unnamed internal node with branch length
Name            Leaf node, no branch length (cladogram)
Name [N]        Node with support value N
Deeper indent   Child of the node above at a shallower indent
# line          Comment, ignored

The first line that ends with : and has no spaces triggers indent-tree mode (e.g. root:). The name before the colon becomes the root label; all indented lines below become its children.
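
One way to see how the indent rules map onto Newick is to convert one into the other. The Python sketch below does that under stated assumptions: indentation uses spaces only, any line that has deeper lines beneath it is treated as an internal node, and the support value is emitted before the branch length. The function name indent_to_newick is hypothetical, and the real parser may resolve edge cases differently.

```python
import re

LINE = re.compile(
    r"^(?P<name>[^:\[\s]*)\s*"                # optional node name
    r"(?::\s*(?P<length>[-+0-9.eE]+)?)?\s*"   # optional ": length" (bare ":" on the root line)
    r"(?:\[(?P<support>[-+0-9.eE]+)\])?$"     # optional "[N]" support value
)


def indent_to_newick(text: str) -> str:
    root = None
    stack = []  # (indent, node) pairs for the currently open ancestors
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue                           # blank line or comment
        match = LINE.match(stripped)
        if match is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        indent = len(raw) - len(raw.lstrip(" "))
        node = {
            "name": match["name"],
            "length": match["length"],
            "support": match["support"],
            "children": [],
        }
        # Close every open node that is not shallower than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one root line")
        stack.append((indent, node))
    if root is None:
        raise ValueError("no tree definition found")

    def emit(node) -> str:
        out = ""
        if node["children"]:
            out = "(" + ",".join(emit(c) for c in node["children"]) + ")"
        out += node["name"]
        if node["support"] is not None:
            out += f"[{node['support']}]"
        if node["length"] is not None:
            out += f":{node['length']}"
        return out

    return emit(root) + ";"
```

For example, a root: line followed by an unnamed ": 0.03" node holding Human: 0.1 and Chimp: 0.08, plus a sibling Lamprey: 0.8, converts to ((Human:0.1,Chimp:0.08):0.03,Lamprey:0.8)root;.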


3. Layout

Set the layout in the header brackets: phylo "Title" [layout: rectangular].

Layout       Value        Description
-----------  -----------  --------------------------------------------------------
Rectangular  rectangular  Default. L-shaped branches; root on left, tips on right
Slanted      slanted      Diagonal lines from parent to child; more compact
Circular     circular     Root at center, tips around the circumference
Unrooted     unrooted     Equal-angle radial; emphasizes distance, not ancestry

[unrooted] as a bare flag is equivalent to [layout: unrooted].

Circular — root at center, tips fanning outward. Most visually striking for many-taxa trees with clade highlights.

[Preview: "Vertebrates — circular". Phylogenetic tree with 8 taxa, phylogram mode, circular layout. Clade labels: Primates, Carnivora. Tips: Human, Chimp, Gorilla, Dog, Cat, Wolf, Salmon, Zebrafish. Scale bar: 0.2.]

Rectangular — L-shaped branches; root on the left, tips on the right. The classic phylogram form for published figures.

[Preview: "Bacterial Diversity". Phylogenetic tree with 10 taxa, phylogram mode, rectangular layout. Tips: Ecoli, Salmonella, Vibrio, Bacillus, Staph, Listeria, Myco_tb, Myco_leprae, Strepto, Lactobacillus. Support values: 98, 92, 100. Scale bar: 0.2 substitutions/site.]

Slanted — diagonal lines from parent to child; more compact than rectangular, same left-to-right reading direction.

[Preview: "Vertebrates — slanted". Phylogenetic tree with 8 taxa, phylogram mode, slanted layout. Tips: Human, Chimp, Gorilla, Dog, Cat, Wolf, Salmon, Zebrafish. Scale bar: 0.2 substitutions/site.]

Unrooted — equal-angle radial layout; de-emphasizes the root, emphasizes pairwise distance between all taxa.

[Preview: "Vertebrates — unrooted". Phylogenetic tree with 8 taxa, phylogram mode; the preview's description reports rectangular layout. Tips: Human, Chimp, Gorilla, Dog, Cat, Wolf, Salmon, Zebrafish. Scale bar: 0.2.]

4. Mode

Set with [mode: …] in the header (or in a style [mode: …] line).

Mode        Value       Branch length meaning
----------  ----------  ------------------------------------------------------------------
Phylogram   phylogram   Default. Proportional to evolutionary distance (substitutions/site)
Cladogram   cladogram   Ignored; tips align and only topology matters
Chronogram  chronogram  Proportional to divergence time; all tips align to "present"
Dendrogram  dendrogram  Merge height: the distance at which two clusters join
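
The difference between the phylogram and cladogram rows can be sketched as two ways of assigning a horizontal position to each node. The Python below is an illustration under assumptions, not the renderer's code: nodes are hypothetical (name, branch_length, children) tuples, every node is named, and the cladogram placement (tips at the maximum depth, internal nodes one step back per level of height) is one common convention.

```python
from typing import Dict, Optional, Tuple

# Hypothetical minimal node: (name, branch_length, children).
Node = Tuple[str, float, list]


def phylogram_x(node: Node, parent_x: float = 0.0,
                out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Phylogram: x is the cumulative branch length from the root."""
    out = {} if out is None else out
    name, length, children = node
    x = parent_x + length
    out[name] = x
    for child in children:
        phylogram_x(child, x, out)
    return out


def cladogram_x(root: Node) -> Dict[str, float]:
    """Cladogram: branch lengths are ignored and every tip shares one x."""
    heights: Dict[str, int] = {}

    def height(node: Node) -> int:
        name, _, children = node
        h = 0 if not children else 1 + max(height(c) for c in children)
        heights[name] = h
        return h

    total = height(root)
    # A node of height h sits h steps to the left of the tips.
    return {name: float(total - h) for name, h in heights.items()}


tree: Node = ("root", 0.0, [
    ("AB", 0.5, [("A", 0.25, []), ("B", 0.5, [])]),
    ("C", 1.0, []),
])
print(phylogram_x(tree))   # A ends at 0.75, B and C at 1.0
print(cladogram_x(tree))   # A, B, and C all end at 2.0
```

In the phylogram result the tips end at different x positions because their root-to-tip distances differ; in the cladogram result all three tips line up.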

Chronogram requires branch lengths in units of time plus [mrsd: "YYYY"] (most-recent sampling date) in the header so the renderer can align tips to present.

phylo "SARS-CoV-2 variants" [mode: chronogram, mrsd: "2023"]
  newick: "((Alpha:0.5,Delta:0.4):0.3,Omicron:0.8);"
  scale "years"

Dendrogram — the standard output of hierarchical agglomerative clustering, not evolution. Each internal node is placed at its merge height (the cophenetic distance at which its two child clusters fuse), all leaves align at a common baseline, and the branches are rectangular elbow connectors. A height axis is drawn so you can read off the distance at which any two leaves first share a cluster. Reach for this mode when the same Newick/indent tree describes a clustering result — gene-expression clusters, sample similarity, survey-response groups — rather than a phylogeny.

Add a cut <value> line to slice the tree at a chosen height: every subtree whose merge height falls below the threshold becomes one flat cluster, each colored distinctly, and a dashed threshold line is drawn across the tree at that height. This is the dendrogram counterpart of scipy.cluster.hierarchy.fcluster with criterion="distance": it turns a continuous tree into a discrete set of groups.
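
The cut rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each internal node already carries its merge height; the nested (height, children) representation, the cut_tree name, and the example heights are hypothetical, and how the real renderer treats a merge height exactly equal to the threshold is not specified here.

```python
from typing import List, Tuple, Union

# A leaf is a name; an internal node is (merge_height, [children]).
Tree = Union[str, Tuple[float, list]]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [name for child in children for name in leaves(child)]


def cut_tree(tree: Tree, threshold: float) -> List[List[str]]:
    """Return flat clusters: maximal subtrees that merge below the threshold."""
    if isinstance(tree, str):
        return [[tree]]                 # a lone leaf is its own cluster
    height, children = tree
    if height < threshold:
        return [leaves(tree)]           # whole subtree becomes one cluster
    clusters: List[List[str]] = []
    for child in children:              # merge is too high: descend
        clusters.extend(cut_tree(child, threshold))
    return clusters


tree = (5.0, [(2.0, [(1.0, ["A", "B"]), "C"]), (3.0, ["D", "E"])])
print(cut_tree(tree, 4))    # [['A', 'B', 'C'], ['D', 'E']]
print(cut_tree(tree, 2.5))  # [['A', 'B', 'C'], ['D'], ['E']]
```

Lowering the threshold splits clusters further, which is why the same tree gives two groups at 4 and three groups at 2.5.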

[Preview: "Gene expression clusters". Dendrogram with 5 taxa, dendrogram mode, rectangular layout, cut at 4 into 2 clusters. Leaves: A, B, C, D, E. Height axis: 0 to 5, labeled "cluster distance". Threshold line: cut = 4.]

Omit cut to show the bare dendrogram with no flat-cluster coloring:

[Preview: "Sample clustering". Dendrogram with 5 taxa, dendrogram mode, rectangular layout. Leaves: A, B, C, D, E. Height axis: 0 to 5, labeled "cluster distance".]

5. Clade highlighting

A clade line marks a monophyletic group with a color, an optional label, and an optional highlight mode.

clade ID = (member1, member2, ...) [color: "#hex", label: "text", highlight: mode]

Prop        Values                       Effect
----------  ---------------------------  -----------------------------------------------------------------
color:      hex string, e.g. "#1E88E5"   Branch and/or background color
label:      quoted string                Clade label shown at right margin
highlight:  branch, background, both     branch colors lines; background shades the region; both does both

Members are tip (leaf) IDs from the Newick string. The renderer computes the MRCA of the listed tips and highlights the entire subtree rooted there.
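
The MRCA lookup described above can be sketched as a search for the smallest subtree that contains every listed tip. The Python below is an illustration, not the renderer's implementation; it assumes a hypothetical representation in which a tip is a string and an internal node is a tuple of children, and it returns None when a listed tip is missing from the tree.

```python
from typing import List, Optional, Set, Union

# A tip is a name; an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]


def tips(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tips(child)]


def mrca(tree: Tree, members: Set[str]) -> Optional[Tree]:
    """Smallest subtree containing every member, or None if one is absent."""
    if not members or not members <= set(tips(tree)):
        return None
    if not isinstance(tree, str):
        for child in tree:
            found = mrca(child, members)
            if found is not None:       # a child already holds all members
                return found
    return tree                         # no single child does: this is the MRCA


tree = ((("Human", "Chimp"), "Gorilla"), "Mouse", ("Dog", ("Cat", "Tiger")))
print(tips(mrca(tree, {"Human", "Gorilla"})))  # ['Human', 'Chimp', 'Gorilla']
```

Listing only Human and Gorilla still covers Chimp, because the highlight applies to the whole subtree under the common ancestor rather than to the listed tips alone.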

[Preview: "Mammal clades". Phylogenetic tree with 7 taxa, phylogram mode, rectangular layout. Clade label: Primates. Tips: Human, Chimp, Gorilla, Mouse, Dog, Cat, Tiger. Scale bar: 0.2.]

6. Scale bar and outgroup

Scale bar: scale "label" — adds a bar at the bottom. The label describes the unit (e.g. "substitutions/site", "Mya"). Omit for cladogram mode where branch lengths have no meaning.

Outgroup: outgroup: taxonId — records the outgroup for documentation; the renderer may use it to visually mark the outgroup taxon.

phylo "Vertebrates"
  newick: "((Human:0.1,Chimp:0.08):0.03,Lamprey:0.8);"
  outgroup: Lamprey
  scale "substitutions/site"

7. Header props reference

All options go inside […] on the phylo header line, or in a style […] line anywhere in the body.

Prop           Values                                        Default      Effect
-------------  --------------------------------------------  -----------  -------------------------------------------
layout:        rectangular, slanted, circular, unrooted      rectangular  Tree layout
mode:          phylogram, cladogram, chronogram, dendrogram  phylogram    Branch length semantics
unrooted       (flag)                                        -            Equivalent to layout: unrooted
branch-width:  number                                        1.5          Stroke width of branches
openAngle:     number (degrees)                              0            Fan gap for circular layout (0 = full 360°)
mrsd:          quoted year string                            -            Most-recent sampling date for chronograms

8. Labels & comments

  • Title: phylo "Tree of Life" — first line only.
  • Scale label: scale "substitutions/site" — one per document.
  • Clade label: [label: "Primates"] inside a clade line.
  • Comments: # at the start of a line (after leading whitespace). Inline trailing comments are not supported.

9. Common mistakes

You wrote:    newick: (A,B,C); (unquoted)
Parser says:  PhyloParseError: Phylo document must start with 'phylo'
Fix:          Quote the Newick string: newick: "(A,B,C);"

You wrote:    Tip name with a space: Homo sapiens:0.1
Parser says:  Parsed as Homo; a space terminates an unquoted name
Fix:          Use an underscore (Homo_sapiens) or single quotes ('Homo sapiens')

You wrote:    Leaf ID in clade doesn't match the Newick name
Parser says:  Clade silently has 0 members; no highlight
Fix:          Copy names exactly as they appear in the Newick string

You wrote:    clade X = (A, B) with no newick: line or indent tree
Parser says:  PhyloParseError: No tree definition found
Fix:          Add a newick: line or an indent tree block

You wrote:    mode: chronogram with no branch lengths
Parser says:  Renderer treats all lengths as 0; tips overlap at root
Fix:          Add :length to every edge in the Newick string

You wrote:    A root line with a space in the name (e.g. My root:)
Parser says:  The root: line is not detected, so the indent tree is not triggered
Fix:          Use a single-word root label, or plain root:

You wrote:    Newick with internal node names: (A,B)ancestor:0.5
Parser says:  Parses fine; ancestor is the internal node label
Fix:          None needed. Internal names are supported and appear on internal nodes
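
Because a mismatched clade member produces no error, it can help to check names before rendering. The Python sketch below is a hypothetical pre-flight check, not part of Schematex: it pulls tip names out of a Newick string with a regular expression (a tip name directly follows "(" or ",", while internal names follow ")") and reports clade members that match no tip. It does not handle quoted internal node names that contain commas or parentheses.

```python
import re
from typing import List, Set

# A tip name directly follows "(" or ","; internal node names follow ")".
TIP = re.compile(r"(?<=[(,])\s*('(?:[^']|'')*'|[^(),:\[\];'\s]+)")


def tip_names(newick: str) -> Set[str]:
    names = set()
    for match in TIP.finditer(newick):
        token = match.group(1)
        if token.startswith("'"):
            # Strip the quotes and undo the '' escape.
            token = token[1:-1].replace("''", "'")
        names.add(token)
    return names


def unknown_members(newick: str, members: List[str]) -> List[str]:
    """Clade members that do not match any tip name exactly."""
    found = tip_names(newick)
    return [m for m in members if m not in found]


newick = "((Human:0.1,Chimp:0.08):0.03,Lamprey:0.8);"
print(unknown_members(newick, ["Human", "chimp"]))  # ['chimp']
```

The check is case-sensitive on purpose, since the table above says names must be copied exactly as they appear in the Newick string.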

10. Grammar (EBNF)

document        = header (blank | comment | newick-line | scale-line
                    | outgroup-line | clade-line | style-line | cut-line | indent-tree)*

header          = "phylo" ( WS quoted-string )? ( WS "[" props "]" )? NEWLINE
quoted-string   = '"' any-char-but-quote* '"'

newick-line     = "newick:" WS quoted-newick NEWLINE
scale-line      = "scale" ( WS quoted-string )? NEWLINE
outgroup-line   = "outgroup:" WS id NEWLINE
cut-line        = "cut" WS number NEWLINE       // dendrogram mode: flat-cluster threshold height
clade-line      = "clade" WS id WS "=" WS "(" id ("," id)* ")"
                    ( WS "[" clade-props "]" )? NEWLINE
style-line      = "style" WS "[" props "]" NEWLINE

// Indent tree — triggered by a line ending in ":" with no spaces
indent-tree     = root-line indent-node*
root-line       = id ":" NEWLINE
indent-node     = INDENT ( id ":" WS? number | ":" WS? number | id ) ( WS "[" number "]" )? NEWLINE

props           = prop ("," prop)*
prop            = "layout:" layout-value
                | "mode:" mode-value
                | "unrooted"
                | "branch-width:" number
                | "openAngle:" number
                | "mrsd:" quoted-string

clade-props     = clade-prop ("," clade-prop)*
clade-prop      = "color:" quoted-string
                | "label:" quoted-string
                | "highlight:" ( "branch" | "background" | "both" )

layout-value    = "rectangular" | "slanted" | "circular" | "unrooted"
mode-value      = "phylogram" | "cladogram" | "chronogram" | "dendrogram"

// Newick grammar (embedded, parsed separately)
newick          = subtree ";"?
subtree         = leaf | internal
internal        = "(" subtree ("," subtree)* ")" name? nhx? length?
leaf            = name nhx? length?
name            = unquoted-name | "'" single-quoted "'"
length          = ":" number
nhx             = "[" number "]"                     // plain bootstrap
                | "[&&NHX:" nhx-pair (":" nhx-pair)* "]"
nhx-pair        = key "=" value

id              = [a-zA-Z] [a-zA-Z0-9_-]*
number          = /[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
comment         = INDENT "#" any NEWLINE

Authoritative source: src/diagrams/phylo/parser.ts. If this diverges from the parser, the parser wins — please open an issue.


11. Standard compliance

Schematex phylogenetic trees follow the Newick format specification (as documented in the PHYLIP package) for the core tree serialization, and the NHX (New Hampshire Extended) convention for bootstrap support values. The B= field in NHX brackets is the only NHX field rendered visually today; all other fields are parsed and stored but not displayed.

What is implemented today:

  • ✅ Newick topology, branch lengths, quoted names, polytomies
  • ✅ Bootstrap values — plain [95] and NHX [&&NHX:B=95]
  • ✅ Rectangular, slanted, circular, and unrooted layouts
  • ✅ Phylogram, cladogram, chronogram, and dendrogram modes
  • ✅ Clade highlighting (branch color, background shading, both)
  • ✅ Scale bar
  • ✅ Indent DSL alternative
  • ⏳ Multi-tree documents (forest); see §13
  • ⏳ Time-calibrated axis for chronograms (geological scale)
  • ⏳ Per-tip icons or images
  • ⏳ NHX fields beyond B= (species, taxonomy, duplication events)

References:

  • Felsenstein, J. (1986). The Newick tree format. PHYLIP documentation.
  • Zmasek, C.M. & Eddy, S.R. (2001). ATV: Display and manipulation of annotated phylogenetic trees. Bioinformatics, 17(4), 383–384. (NHX specification)
  • Felsenstein, J. (2004). Inferring Phylogenies. Sinauer Associates.

[Example: Bacterial diversity (ten-taxon tree). Ten-taxon bacterial phylogenetic tree from a Newick/NHX string with bootstrap support values, three colored clade arcs, and a branch-length scale bar. Tips: Ecoli, Salmonella, Vibrio, Bacillus, Staph, Listeria, Myco_tb, Myco_leprae, Strepto, Lactobacillus. Support values: 98, 85, 92, 100, 78. Scale bar: 0.2 substitutions/site.]

13. Roadmap

Planned — not yet parseable. Do not use these in generated DSL today; the parser will reject or ignore them.

  • Multi-tree documents — a phylo file with more than one newick: block (e.g. gene trees vs species tree).
  • Geological time axis for chronograms — an epoch-labelled X axis (Cenozoic / Mesozoic etc.) instead of a plain numeric scale.
  • Per-tip metadata — attaching traits or colored markers to individual tips without declaring a full clade (e.g. tip Ecoli [color: "#F00", shape: star]).
  • NHX fields beyond bootstrap — rendering species (S=), duplication (D=), and transfer events (Tr=) as branch symbols.
  • Tanglegram — two trees displayed side by side with connecting lines linking corresponding tips.

Follow these in the GitHub issues, and comment there if you need one of them sooner.