Structured Note Reference Data: Methodology
Structured notes are among the hardest instruments to maintain reference data for. Each note has its own payoff curve, its own underliers, its own barriers, its own early-redemption schedule. Term sheets run thirty pages, and two notes priced by the same dealer in the same week can behave nothing alike.
This document describes how that complexity gets turned into clean, queryable reference data — what the feed contains, where it comes from, how it's extracted, and how accuracy is maintained.
What's in the feed
Term-sheet-level reference data on structured notes — identification, classification, mechanics, coupon terms, autocall schedules, issuer call rights, fee economics, underlying assets, and the full piecewise payoff at maturity.
Sixty-plus fields per note. Multiple rows per note where they matter — one per underlier, one per payoff breakpoint. No collapsing complex structures into a single "payoff description" string and calling it a day.
Coverage spans the full structured-note universe issued into the U.S. market: autocallables, reverse convertibles, participation notes, principal-protected notes, leveraged and inverse products, dual-directional, digital, range accrual, credit-linked, and more. Across major dealers, across asset classes — single stocks, indices, ETFs, baskets, FX, rates, commodities.
The historical record is complete. New notes flow in continuously. Amendments are merged into the parent record without losing the original terms.
Where the data comes from
Primary source is the issuer-published offering document — the 424B2 prospectus filed with the SEC, the final terms document published on the issuer's own site, the pricing supplement distributed by the selling agent. Whatever the issuer actually said about the note, in writing, at issuance.
We pull from regulatory filings, issuer pages, and distributor catalogs in parallel. A note that shows up in two places — once on EDGAR, once on a dealer site — is recognized as the same instrument and reconciled into a single record. ISIN and CUSIP do the matching; nothing gets double-counted.
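The reconciliation step can be sketched as keying every sourced record on its identifier. This is a minimal illustration, not the production matcher; the record fields and the sample ISIN are hypothetical.

```python
def reconcile(records):
    """Group records from multiple sources by identifier, so the same
    note seen on EDGAR and on a dealer site collapses to one instrument.
    Prefers ISIN as the key, falls back to CUSIP (illustrative only)."""
    by_id = {}
    for rec in records:
        key = rec.get("isin") or rec.get("cusip")
        by_id.setdefault(key, []).append(rec)
    return by_id

# Hypothetical example: one note observed in two places
records = [
    {"source": "EDGAR",  "isin": "US00000XYZ00", "coupon_rate": 0.085},
    {"source": "dealer", "isin": "US00000XYZ00", "coupon_rate": 0.085},
]
grouped = reconcile(records)  # one key, two source records behind it
```

Once grouped, the per-source records feed a single merged instrument record rather than two rows in the universe.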
When an issuer files an amendment, the amendment is layered onto the original. Fields the amendment restates get updated. Fields it doesn't mention stay as they were. You see the current state of the note, not a fragmented stack of partial filings.
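The layering rule described above, restated fields win, omitted fields persist, amounts to a shallow merge. A minimal sketch, with hypothetical field names:

```python
def apply_amendment(parent: dict, amendment: dict) -> dict:
    """Layer an amendment onto the parent record: fields the amendment
    restates are updated; fields it does not mention are kept as-is."""
    merged = dict(parent)
    merged.update({k: v for k, v in amendment.items() if v is not None})
    return merged

# Hypothetical note: the amendment restates only the coupon rate
original  = {"cusip": "12345ABC9", "coupon_rate": 0.085, "maturity": "2027-06-15"}
amendment = {"coupon_rate": 0.0875}
current = apply_amendment(original, amendment)
# current reflects the new coupon while the original maturity survives
```

The result is the current state of the note, not a stack of partial filings to be re-assembled by the consumer.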
How the data is extracted
Offering documents are unstructured prose. Turning them into structured fields takes more than pattern-matching.
Each filing goes through a multi-stage extraction pipeline. The note's identity is established first — issuer, ISIN, CUSIP, currency, maturity. The underliers are identified and resolved to their own ISINs, so a basket of three names becomes three linked records with three real securities behind them, not three free-text strings. Then the mechanical detail comes out one component at a time: coupon structure, autocall schedule, issuer call provisions, payoff at maturity, fee economics.
Each component is extracted in isolation, against a controlled vocabulary. Payoff archetypes resolve to a fixed taxonomy (REVERSE_CONVERTIBLE, AUTOCALLABLE, PARTICIPATION, PRINCIPAL_PROTECTED, and so on). Settlement methods, observation frequencies, basket calculation methods, dividend treatments — all standardized. You can filter, group, and screen across the universe without writing a different parser for every dealer's prose.
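A controlled vocabulary of this kind maps naturally onto an enumerated type: anything outside the taxonomy fails loudly instead of leaking free text into the feed. A sketch, using the archetype names listed above (the taxonomy shown is abbreviated, not the full list):

```python
from enum import Enum

class PayoffArchetype(Enum):
    REVERSE_CONVERTIBLE = "REVERSE_CONVERTIBLE"
    AUTOCALLABLE        = "AUTOCALLABLE"
    PARTICIPATION       = "PARTICIPATION"
    PRINCIPAL_PROTECTED = "PRINCIPAL_PROTECTED"

def classify(extracted_label: str) -> PayoffArchetype:
    """Resolve an extracted label to the fixed taxonomy.
    Raises ValueError for anything outside it, forcing a review
    rather than silently passing dealer prose through."""
    return PayoffArchetype(extracted_label.strip().upper())
```

With every component resolved this way, a screen like "all worst-of autocallables with quarterly observation" is a filter over enumerated fields, not a regex over prose.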
The piecewise payoff profile gets special treatment. Rather than a paragraph describing the maturity payoff, you get a table of breakpoints — underlier performance level on one axis, investor payout on the other. Connect the dots and you have the full payoff curve, ready to plot, ready to feed into a pricing or risk model.
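"Connect the dots" means linear interpolation between breakpoints. The sketch below assumes breakpoints expressed as (underlier performance, investor payout) pairs, both as fractions of initial value; the sample note, a hypothetical 10% buffered note with a capped upside, is illustrative only.

```python
def payout_at(breakpoints, performance):
    """Evaluate the piecewise-linear payoff at a given underlier
    performance by interpolating between adjacent breakpoints.
    Clamps to the first/last payout outside the defined range."""
    pts = sorted(breakpoints)
    if performance <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if performance <= x1:
            return y0 + (y1 - y0) * (performance - x0) / (x1 - x0)
    return pts[-1][1]

# Hypothetical buffered note: first 10% of losses absorbed,
# 1:1 downside beyond the buffer, upside capped at 115%
bps = [(0.00, 0.10), (0.90, 1.00), (1.00, 1.00), (1.15, 1.15), (2.00, 1.15)]
```

Plotting `payout_at` over a performance grid reproduces the full payoff curve; the same table feeds directly into a pricing or risk model.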
How accuracy is enforced
Extraction alone isn't enough. Every component goes through an independent review pass that re-reads the source document and challenges the extracted values. Disagreements are flagged, investigated, and resolved against the original text. Reviewed records carry an audit trail of what was checked, by which pass, against which fields — so quality is observable, not asserted.
The pipeline is built to fix mistakes, not to hide them. When a filing is re-processed, prior analysis is wiped cleanly and rebuilt from source. There are no orphaned rows, no stale leftovers, no half-updated records. What you see in the feed today is what the source documents say today.
Cross-checks run continuously across the full dataset:
- Underliers reconcile to a master security file, so every linked asset is itself a real, identified security.
- Issue dates, maturity dates, and observation schedules are checked for internal consistency.
- Notes with autocall features get their full observation calendar materialized, so downstream users don't have to derive it.
- Amendments are validated to ensure they merge cleanly into the parent without dropping fields.
When something looks wrong, it gets reviewed and corrected — not papered over.
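The date-consistency check in the list above can be sketched as a simple invariant scan. This is an illustration of the idea, not the production validator; the field names are hypothetical.

```python
from datetime import date

def check_dates(note):
    """Flag internal date inconsistencies on a note record:
    issue must precede maturity, and every observation must fall
    within the life of the note. Returns a list of issue strings."""
    issues = []
    if note["issue_date"] >= note["maturity_date"]:
        issues.append("issue date not before maturity")
    for obs in note["observation_dates"]:
        if not (note["issue_date"] < obs <= note["maturity_date"]):
            issues.append(f"observation {obs} outside note life")
    return issues
```

A note that fails any invariant is routed to review rather than published as-is.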
How the data reaches you
Pipe-delimited text files, delivered daily. Three files per delivery:
- Reference data — one row per note, with the full set of single-valued fields.
- Underliers — one row per note/underlier pair. Worst-of and basket structures expand naturally.
- Payoff structure — multiple rows per note defining the maturity payoff as a piecewise-linear function.
A single instrument identifier joins the three files. Each row carries an action flag — insert, update, or delete — so you only need to apply what changed. Files are stable, dated, and idempotent: replaying yesterday's file produces yesterday's state.
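Applying a daily file with action flags reduces to an upsert/delete loop over the current state. The sketch below assumes rows already parsed from the pipe-delimited file (e.g. via `csv.DictReader` with `delimiter="|"`); the column names `action` and `instrument_id` are illustrative, not the actual file layout.

```python
def apply_rows(state: dict, rows) -> dict:
    """Apply one day's delivery to the current state, keyed by
    instrument id. Action flags: I = insert, U = update, D = delete.
    Inserts and updates both upsert, so replaying the same file
    reproduces the same state (idempotent)."""
    for row in rows:
        key = row["instrument_id"]
        if row["action"] == "D":
            state.pop(key, None)
        else:  # "I" or "U"
            state[key] = {k: v for k, v in row.items() if k != "action"}
    return state

# Hypothetical day-one delivery for a single note
day1 = [{"action": "I", "instrument_id": "XS0000000001", "coupon": "0.08"}]
state = apply_rows({}, day1)
state = apply_rows(state, day1)  # replaying yesterday's file: same state
```

The same loop works for all three files; the shared instrument identifier is what joins them back together after loading.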
Step schedules and the full observation calendar
Step-down autocallables, step-up coupons, and rising call-premium schedules are where most reference data products fall down. The summary fields tell you a trigger "starts at 100% and steps down to 70%" — but not when, not by how much per step, and not what premium gets paid on each observation.
This feed gives you the full schedule.
For every note with autocall, contingent coupon, or issuer-call observation dates, the observation calendar is materialized row-by-row. Each row carries:
- The observation date.
- The autocall trigger level on that date (so a step-down schedule shows up as a declining sequence — 100, 100, 95, 90, 85, ...).
- The call premium or redemption amount payable if the trigger is hit on that date (so step-up call premiums are visible per period, not just summarized).
- The coupon barrier level on that date, where applicable.
- The observation type (closing, intraday, average) and scope (worst-of, best-of, single).
You don't have to derive the schedule from a frequency code and an end date, and you don't have to interpolate a step curve from two endpoints. The dates are the dates the issuer named. The levels are the levels the issuer named. If a note has 60 quarterly observations with a custom step pattern across them, you get 60 rows.
That makes valuation, scenario analysis, and lifecycle event tracking straightforward — every observation is a known date with known parameters.
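A materialized calendar of this shape lends itself to a simple row type, one object per observation. The sketch below is illustrative: the field names, the sample step-down schedule, and the lookup helper are hypothetical, not the feed's actual layout.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObservationRow:
    obs_date: str                    # issuer-named observation date
    autocall_trigger: float          # trigger as a fraction of initial level
    call_premium: float              # premium payable if called on this date
    coupon_barrier: Optional[float]  # None where no contingent coupon applies
    obs_type: str                    # "closing", "intraday", or "average"
    scope: str                       # "worst_of", "best_of", or "single"

# Hypothetical step-down autocallable: trigger declines 100 -> 85,
# call premium accrues per period
calendar = [
    ObservationRow("2025-03-17", 1.00, 0.02, 0.70, "closing", "worst_of"),
    ObservationRow("2025-06-16", 1.00, 0.04, 0.70, "closing", "worst_of"),
    ObservationRow("2025-09-15", 0.95, 0.06, 0.70, "closing", "worst_of"),
    ObservationRow("2025-12-15", 0.90, 0.08, 0.70, "closing", "worst_of"),
    ObservationRow("2026-03-16", 0.85, 0.10, 0.70, "closing", "worst_of"),
]

def first_redemption(calendar, performance_by_date):
    """Return the first observation on which the trigger is hit,
    or None if the note survives every observation."""
    for row in calendar:
        if performance_by_date.get(row.obs_date, 0.0) >= row.autocall_trigger:
            return row
    return None
```

Because every row carries its own date and levels, lifecycle logic like `first_redemption` is a straight scan, with no frequency codes to expand and no step curves to interpolate.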
Why it's built this way
Most reference data products are built around what's easy to extract. This one is built around what's actually in the term sheet.
If a note has a memory coupon, the memory flag is set. If it's quanto'd into a non-base currency, the FX mechanic field tells you so. If the issuer can call it on specific dates with a specific notice period, those dates and that period are captured. If the maturity payoff has six breakpoints, you get six rows — not a sentence approximating the shape.
You shouldn't have to read the prospectus to know what the note does. That's the whole point.
For the full field-level data dictionary, see the accompanying reference document. Questions about coverage, methodology, or specific instruments are welcome — that's what we're here for.