Structured Note Reference Data

Structured Note Reference Data: Methodology

Structured notes are among the hardest instruments to maintain reference data for. Each note has its own payoff curve, its own underliers, its own barriers, its own early-redemption schedule. Term sheets run thirty pages, and two notes priced by the same dealer in the same week can behave nothing alike.

This document describes how that complexity gets turned into clean, queryable reference data — what the feed contains, where it comes from, how it's extracted, and how accuracy is maintained.


What's in the feed

Term-sheet-level reference data on structured notes — identification, classification, mechanics, coupon terms, autocall schedules, issuer call rights, fee economics, underlying assets, and the full piecewise payoff at maturity.

Sixty-plus fields per note. Multiple rows per note where they matter — one per underlier, one per payoff breakpoint. No collapsing complex structures into a single "payoff description" string and calling it a day.

Coverage spans the full structured-note universe issued into the U.S. market: autocallables, reverse convertibles, participation notes, principal-protected notes, leveraged and inverse products, dual-directional, digital, range accrual, credit-linked, and more. Across major dealers, across asset classes — single stocks, indices, ETFs, baskets, FX, rates, commodities.

The historical record is complete. New notes flow in continuously. Amendments are merged into the parent record without losing the original terms.


Where the data comes from

The primary source is the issuer-published offering document: the 424B2 prospectus filed with the SEC, the final terms document published on the issuer's own site, or the pricing supplement distributed by the selling agent. Whatever the issuer actually said about the note, in writing, at issuance.

We pull from regulatory filings, issuer pages, and distributor catalogs in parallel. A note that shows up in two places — once on EDGAR, once on a dealer site — is recognized as the same instrument and reconciled into a single record. ISIN and CUSIP do the matching; nothing gets double-counted.
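
As a rough illustration of identifier-based reconciliation, the sketch below groups source records that share an ISIN or a CUSIP. The `SourceRecord` class, the `reconcile` function, the source labels, and the identifiers are all made up for the example; the production matching logic is not described here and may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    source: str            # where the document was found (illustrative labels)
    isin: Optional[str]
    cusip: Optional[str]


def reconcile(records):
    """Group records that share an ISIN or a CUSIP into one instrument each."""
    groups = []  # each group: {"isins": set, "cusips": set, "sources": list}
    for rec in records:
        hits = [
            g for g in groups
            if (rec.isin and rec.isin in g["isins"])
            or (rec.cusip and rec.cusip in g["cusips"])
        ]
        if hits:
            target = hits[0]
            # A record can bridge groups that looked separate until now.
            for other in hits[1:]:
                target["isins"] |= other["isins"]
                target["cusips"] |= other["cusips"]
                target["sources"] += other["sources"]
            groups = [g for g in groups if all(g is not o for o in hits[1:])]
        else:
            target = {"isins": set(), "cusips": set(), "sources": []}
            groups.append(target)
        if rec.isin:
            target["isins"].add(rec.isin)
        if rec.cusip:
            target["cusips"].add(rec.cusip)
        target["sources"].append(rec.source)
    return groups


# Made-up identifiers, for illustration only.
records = [
    SourceRecord("EDGAR", isin="US0000000001", cusip="000000000"),
    SourceRecord("dealer_site", isin=None, cusip="000000000"),
    SourceRecord("EDGAR", isin="US0000000002", cusip="000000001"),
]
instruments = reconcile(records)
```

Here the first two records collapse into one instrument because they share a CUSIP, while the third stays separate.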

When an issuer files an amendment, the amendment is layered onto the original. Fields the amendment restates get updated. Fields it doesn't mention stay as they were. You see the current state of the note, not a fragmented stack of partial filings.
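
The layering rule can be pictured as a field-by-field merge. This is a minimal sketch, assuming a record is a plain dictionary and that `None` stands for "the amendment is silent on this field". The function name, the field names, and the `history` list are hypothetical.

```python
def apply_amendment(record, amendment):
    """Return a new record: restated fields overwritten, all others kept."""
    updated = dict(record["terms"])
    for name, value in amendment.items():
        if value is not None:          # None means the amendment is silent
            updated[name] = value
    return {
        "terms": updated,
        # Earlier states are retained so the original terms are never lost.
        "history": record["history"] + [record["terms"]],
    }


# Illustrative values only.
original = {
    "terms": {
        "maturity_date": "2027-06-30",
        "coupon_rate_pct": 8.0,
        "autocall_trigger_pct": 100.0,
    },
    "history": [],
}
amended = apply_amendment(
    original, {"coupon_rate_pct": 8.5, "maturity_date": None}
)
```

After the call, `amended["terms"]` holds the restated coupon alongside the untouched maturity and trigger, and `amended["history"]` still holds the terms as originally issued.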


How the data is extracted

Offering documents are unstructured prose. Turning them into structured fields takes more than pattern-matching.

Each filing goes through a multi-stage extraction pipeline. The note's identity is established first — issuer, ISIN, CUSIP, currency, maturity. The underliers are identified and resolved to their own ISINs, so a basket of three names becomes three linked records with three real securities behind them, not three free-text strings. Then the mechanical detail comes out one component at a time: coupon structure, autocall schedule, issuer call provisions, payoff at maturity, fee economics.

Each component is extracted in isolation, against a controlled vocabulary. Payoff archetypes resolve to a fixed taxonomy (REVERSE_CONVERTIBLE, AUTOCALLABLE, PARTICIPATION, PRINCIPAL_PROTECTED, and so on). Settlement methods, observation frequencies, basket calculation methods, dividend treatments — all standardized. You can filter, group, and screen across the universe without writing a different parser for every dealer's prose.
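
To show what a fixed taxonomy buys a consumer, here is a small sketch that models the four archetypes named above as a Python enum and screens a list of notes by archetype. The enum lists only those four values (the real taxonomy is larger), and the `payoff_archetype` and `instrument_id` field names are assumptions for the example.

```python
from enum import Enum


class PayoffArchetype(Enum):
    # Only the values named in the text; the full taxonomy has more.
    REVERSE_CONVERTIBLE = "REVERSE_CONVERTIBLE"
    AUTOCALLABLE = "AUTOCALLABLE"
    PARTICIPATION = "PARTICIPATION"
    PRINCIPAL_PROTECTED = "PRINCIPAL_PROTECTED"


def screen(notes, archetype):
    """Return the ids of notes whose archetype matches exactly."""
    return [
        n["instrument_id"]
        for n in notes
        # Enum lookup raises ValueError on any value outside the vocabulary.
        if PayoffArchetype(n["payoff_archetype"]) is archetype
    ]


# Illustrative rows with hypothetical field names.
notes = [
    {"instrument_id": "N1", "payoff_archetype": "AUTOCALLABLE"},
    {"instrument_id": "N2", "payoff_archetype": "PARTICIPATION"},
    {"instrument_id": "N3", "payoff_archetype": "AUTOCALLABLE"},
]
autocallables = screen(notes, PayoffArchetype.AUTOCALLABLE)
```

Because the values are standardized, the same exact-match filter works regardless of which dealer wrote the underlying prose.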

The piecewise payoff profile gets special treatment. Rather than a paragraph describing the maturity payoff, you get a table of breakpoints — underlier performance level on one axis, investor payout on the other. Connect the dots and you have the full payoff curve, ready to plot, ready to feed into a pricing or risk model.
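
"Connect the dots" can be read as linear interpolation between adjacent breakpoints. The sketch below takes that reading and evaluates the payoff at an arbitrary performance level. The breakpoints describe an invented buffered, capped note; the units (final level over initial level, payout as a fraction of principal) and the flat extension beyond the outermost breakpoints are assumptions rather than anything the feed specifies.

```python
from bisect import bisect_right


def payoff_at(breakpoints, performance):
    """Linearly interpolate the payout between adjacent breakpoints."""
    pts = sorted(breakpoints)
    xs = [x for x, _ in pts]
    # Assumption: hold the payout flat outside the covered range.
    if performance <= xs[0]:
        return pts[0][1]
    if performance >= xs[-1]:
        return pts[-1][1]
    i = bisect_right(xs, performance)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (performance - x0) / (x1 - x0)


# Invented example: 10% buffer, 1.5x upside participation, payout capped at 1.30.
# Each pair is (final level / initial level, payout / principal).
breakpoints = [
    (0.00, 0.10),
    (0.90, 1.00),
    (1.00, 1.00),
    (1.20, 1.30),
    (2.00, 1.30),
]

down_case = payoff_at(breakpoints, 0.45)   # inside the loss region
up_case = payoff_at(breakpoints, 1.10)     # inside the participation region
capped = payoff_at(breakpoints, 2.50)      # beyond the last breakpoint
```

A payoff with a jump, such as a digital barrier, would need two breakpoints at the same performance level; in this sketch the sorted order of those two rows determines which side of the jump applies exactly at the barrier.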


How accuracy is enforced

Extraction alone isn't enough. Every component goes through an independent review pass that re-reads the source document and challenges the extracted values. Disagreements are flagged, investigated, and resolved against the original text. Reviewed records carry an audit trail of what was checked, by which pass, against which fields, so quality is observable, not asserted.
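
One way to picture the review step is a field-by-field comparison between the extracted values and an independent second reading, with the outcome stored next to the record. This is only a sketch of the idea; the function, the pass name, and the shape of the returned dictionary are hypothetical and say nothing about how the review pass is actually implemented.

```python
def review(extracted, second_reading, fields, pass_name="independent_review"):
    """Compare two readings of the same document and record the outcome."""
    disagreements = {
        f: {"extracted": extracted.get(f), "review": second_reading.get(f)}
        for f in fields
        if extracted.get(f) != second_reading.get(f)
    }
    return {
        "pass_name": pass_name,
        "fields_checked": list(fields),
        "disagreements": disagreements,
        # Flagged records would go on to be resolved against the source text.
        "status": "flagged" if disagreements else "confirmed",
    }


# Illustrative values with hypothetical field names.
extracted = {"coupon_rate_pct": 8.0, "barrier_pct": 70.0, "memory_coupon": True}
second_reading = {"coupon_rate_pct": 8.0, "barrier_pct": 65.0, "memory_coupon": True}
result = review(extracted, second_reading, ["coupon_rate_pct", "barrier_pct", "memory_coupon"])
```

In this example only `barrier_pct` disagrees, so the record is flagged and the log shows exactly which field needs to go back to the source.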

The pipeline is built to fix mistakes, not to hide them. When a filing is re-processed, prior analysis is wiped cleanly and rebuilt from source. There are no orphaned rows, no stale leftovers, no half-updated records. What you see in the feed today is what the source documents say today.

Cross-checks run continuously across the full dataset. When something looks wrong, it gets reviewed and corrected, not papered over.


How the data reaches you

Pipe-delimited text files, delivered daily. Three files per delivery.

A single instrument identifier joins the three files. Each row carries an action flag — insert, update, or delete — so you only need to apply what changed. Files are stable, dated, and idempotent: replaying yesterday's file produces yesterday's state.
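
Applying a delivery might look roughly like the sketch below, which reads one pipe-delimited file and applies each row by its action flag. The column names, the single-letter action codes (`I`, `U`, `D`), and the sample rows are assumptions for the example; the actual layout is defined in the data dictionary.

```python
import csv
import io


def apply_delivery(state, delivery_text):
    """Apply one pipe-delimited delivery to a dict keyed by instrument id."""
    reader = csv.DictReader(io.StringIO(delivery_text), delimiter="|")
    for row in reader:
        action = row.pop("action")
        key = row["instrument_id"]
        if action in ("I", "U"):
            # Treating insert and update as an upsert keeps replays harmless.
            state[key] = row
        elif action == "D":
            state.pop(key, None)   # deleting a missing row is a no-op
        else:
            raise ValueError(f"unknown action flag: {action!r}")
    return state


# Hypothetical layout and values, for illustration only.
day_1 = "\n".join([
    "instrument_id|action|issuer|maturity_date",
    "N1|I|Example Bank|2027-06-30",
    "N2|I|Example Bank|2026-12-15",
])
day_2 = "\n".join([
    "instrument_id|action|issuer|maturity_date",
    "N1|U|Example Bank|2027-09-30",
    "N2|D||",
])

state = {}
apply_delivery(state, day_1)
apply_delivery(state, day_2)
after_first_pass = {k: dict(v) for k, v in state.items()}
apply_delivery(state, day_2)   # replaying the same file changes nothing
```

The upsert and tolerant delete are design choices made here so that a replayed file leaves the local copy unchanged.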


Step schedules and the full observation calendar

Step-down autocallables, step-up coupons, and rising call-premium schedules are where most reference data products fall down. The summary fields tell you a trigger "starts at 100% and steps down to 70%" — but not when, not by how much per step, and not what premium gets paid on each observation.

This feed gives you the full schedule.

For every note with autocall, contingent coupon, or issuer-call observation dates, the observation calendar is materialized row by row, one row per observation.

You don't have to derive the schedule from a frequency code and an end date, and you don't have to interpolate a step curve from two endpoints. The dates are the dates the issuer named. The levels are the levels the issuer named. If a note has 60 quarterly observations with a custom step pattern across them, you get 60 rows.

That makes valuation, scenario analysis, and lifecycle event tracking straightforward — every observation is a known date with known parameters.
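
With one row per observation, lifecycle questions reduce to simple lookups. The sketch below uses an invented three-row step-down calendar to find the next observation after a given date and the first date on which the note would have been called. The field names (`observation_date`, `autocall_trigger_pct`, `call_premium_pct`) and all values are assumptions, and the "at or above the trigger" rule is a common convention that a given term sheet may define differently.

```python
from datetime import date

# Invented step-down schedule; levels are percentages of the initial level.
calendar = [
    {"observation_date": date(2025, 3, 31), "autocall_trigger_pct": 100.0, "call_premium_pct": 2.0},
    {"observation_date": date(2025, 6, 30), "autocall_trigger_pct": 95.0, "call_premium_pct": 4.0},
    {"observation_date": date(2025, 9, 30), "autocall_trigger_pct": 90.0, "call_premium_pct": 6.0},
]


def next_observation(rows, as_of):
    """Return the earliest row dated on or after as_of, or None."""
    upcoming = [r for r in rows if r["observation_date"] >= as_of]
    return min(upcoming, key=lambda r: r["observation_date"]) if upcoming else None


def first_autocall(rows, observed_levels_pct):
    """Return the first row whose trigger was met by the observed level."""
    for row in sorted(rows, key=lambda r: r["observation_date"]):
        level = observed_levels_pct.get(row["observation_date"])
        # Assumed convention: called when the level is at or above the trigger.
        if level is not None and level >= row["autocall_trigger_pct"]:
            return row
    return None


upcoming = next_observation(calendar, date(2025, 4, 1))
called = first_autocall(
    calendar,
    {date(2025, 3, 31): 97.0, date(2025, 6, 30): 96.0},
)
```

In this example the underlier misses the 100% trigger in March but clears the stepped-down 95% trigger in June, so the June row, with its premium, is the call event.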


Why it's built this way

Most reference data products are built around what's easy to extract. This one is built around what's actually in the term sheet.

If a note has a memory coupon, the memory flag is set. If it's quanto'd into a non-base currency, the FX mechanic field tells you so. If the issuer can call it on specific dates with a specific notice period, those dates and that period are captured. If the maturity payoff has six breakpoints, you get six rows — not a sentence approximating the shape.

You shouldn't have to read the prospectus to know what the note does. That's the whole point.


For the full field-level data dictionary, see the accompanying reference document. Questions about coverage, methodology, or specific instruments are welcome — that's what we're here for.