Catalog — LORE

▦

Layer 1 / Foundation

Market Activity

The raw, sequenced record of every trade, level, and book event across the nine major crypto venues. Normalized to a common schema, replayable by timestamp, and aligned across exchanges so your AI sees the market the way the matching engine did.

This is the foundation everything else in the library is built on. If a regime tag, a setup label, or a research dossier exists downstream, it traces back to a specific row in here.

What's inside

Print tape

Every executed trade with native venue timestamp and monotonic sequence ID.

81.7M perp bar rows
9 timeframes

Level-2 order books

Multi-level book snapshots and updates where the venue exposes them. Hyperliquid BTC active, others queued.

9,162 depth rows
1m / 5m / 1h

Kraken BTC true-L3 queue lifecycle

Message-level order book lifecycle on Kraken BTC — queue position, add/cancel/replace events, true-L3 reconstruction. The only true-L3 product in the library. Everything else stays explicitly L2 / depth / proxy.

message-level
Kraken BTC only

Mark, index, and premium

Reference prices and perp premium series, time-aligned to the print tape.

82.1M rows

Coverage

Venues	Binance USD-M, Binance Spot, Hyperliquid, Kraken, Coinbase, OKX, Bybit, Deribit, dYdX
Assets	150 USD-M perpetual futures · 65+ spot
Timeframes	1m · 3m · 5m · 10m · 15m · 30m · 1h · 4h · 1d
History	2012 → today (varies per asset)

Example use case

Agent query

"Pull every BTC trade from 14:22:00 to 14:25:00 UTC on March 12, 2024 on Binance USD-M and Hyperliquid."

→ Returns aligned event streams from both venues with sequence IDs, sides, sizes, prices, and venue timestamps. Replayable end-to-end.
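
As a rough illustration of what "aligned" means here, the sketch below merges two per-venue trade streams into one replayable tape ordered by venue timestamp. The `Trade` record, its field names, and the sample values are assumptions made up for the example, not the library's actual schema.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Trade:
    # Hypothetical record shape; field names are illustrative only.
    ts_ms: int     # venue timestamp, milliseconds
    venue: str
    seq: int       # per-venue monotonic sequence ID
    side: str      # "buy" or "sell"
    price: float
    size: float

def merge_streams(*streams):
    """Merge per-venue streams (each already sorted by timestamp) into one tape.

    Ties on timestamp are broken by venue name, then sequence ID, so the
    merged order is deterministic across replays.
    """
    return list(heapq.merge(*streams, key=lambda t: (t.ts_ms, t.venue, t.seq)))

# Made-up sample prints.
binance = [
    Trade(1000, "binance_usdm", 1, "buy", 72000.0, 0.5),
    Trade(1003, "binance_usdm", 2, "sell", 71999.5, 0.2),
]
hyperliquid = [Trade(1001, "hyperliquid", 7, "buy", 72001.0, 1.0)]

tape = merge_streams(binance, hyperliquid)
order = [(t.venue, t.seq) for t in tape]
```

Keeping the per-venue sequence ID on every row means a replay can still detect gaps within a venue after the streams are interleaved.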

◈

Layer 1 / Foundation

Derivatives

Funding rates, open interest, basis spreads, and forced-close events for the perpetual futures universe. The forces moving the market that don't show up in the candle alone.

An AI agent looking only at price misses why the move happened. This category gives it the funding pressure, leverage state, and liquidation context that explain whether a move was real or mechanically forced.

What's inside

Funding rate history

Realized funding per perp, with rolling z-scores and percentile ranks for context.

813K rows
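
For readers who want to see what a rolling z-score and percentile rank look like in practice, here is a minimal sketch. The window length, the trailing-window convention (current value included), and the sample funding prints are assumptions for the example; the library's own definitions may differ.

```python
from statistics import mean, pstdev

def rolling_zscore(values, window):
    """Z-score of each value against the trailing window ending at that value.

    Returns None until a full window exists, or when the window is flat.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        win = values[i + 1 - window : i + 1]
        sd = pstdev(win)
        out.append(None if sd == 0 else (values[i] - mean(win)) / sd)
    return out

def percentile_rank(values, window):
    """Share of trailing-window values that are <= the current value (0 to 1)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        win = values[i + 1 - window : i + 1]
        out.append(sum(v <= values[i] for v in win) / window)
    return out

funding = [0.0001, 0.0001, 0.0002, 0.0001, 0.0005]  # made-up funding prints
z = rolling_zscore(funding, window=4)
p = percentile_rank(funding, window=4)
```

Both functions only look backward from each row, so the values are safe to compute on live data.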

Funding & carry feature pack

Curated funding-rate features built for carry, basis, and rate-curve research — historical + forward, source-neutral.

historical + forward

Open interest series

OI history, OI delta, OI as percent of supply where available.

66.8M rows

Mark / index / premium

Reference + premium time series for every perp.

82.1M rows

Liquidation & deleveraging

Forced-close events plus deleveraging-cascade context. Live forward collectors for binance_usdm_liquidations and bybit_liquidations; historical events tagged with size, side, asset, venue, and surrounding cascade markers.

live + historical
Binance USD-M, Bybit forward

Perp / spot basis

Cross-venue basis spreads, computed against spot reference.

all 150 perps
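
A basis spread against a spot reference can be sketched in a few lines. The function name, the basis-point convention, and the prices below are assumptions for illustration only.

```python
def basis_bps(perp_price, spot_reference):
    """Perp premium over the spot reference, in basis points."""
    if spot_reference <= 0:
        raise ValueError("spot reference must be positive")
    return (perp_price - spot_reference) / spot_reference * 10_000

# Made-up marks for one asset at one timestamp.
perp_marks = {"binance_usdm": 50_050.0, "hyperliquid": 49_975.0}
spot_reference = 50_000.0

basis = {venue: basis_bps(px, spot_reference) for venue, px in perp_marks.items()}
```

A positive value means the perp trades above spot; a negative value means it trades at a discount.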

Coverage

Venues	Binance USD-M, Hyperliquid, OKX, Bybit, Deribit, dYdX (perp universe)
Assets	150 USD-M perpetual contracts
Timeframes	1m through 1d, plus per-funding-cycle for funding
History	2020 → today (perp data depth varies; 81 perps with 5y+, 142 with 3y+)

Example use case

Agent query

"Show me every BTC long liquidation cluster over $5M since 2024 where funding flipped negative within 30 minutes."

→ Returns event-grouped liquidations joined to funding rate transitions, with timestamps, sizes, and the surrounding price context.

⇕

Layer 1 / Foundation

Positioning & Flow

Where traders are crowded, who's trapped, and how aggressive the flow is. The data layer that separates a real conviction move from a forced one.

An AI agent that only sees candles can't tell when shorts are about to be squeezed or when longs have been silently leveraging up for hours. This category surfaces those positioning dynamics directly.

What's inside

Long / short ratios

Aggregate trader positioning across exchanges where reported.

9 venues

OI delta + crowding indicators

Open-interest changes, crowding scores, leverage state.

66.8M rows

Trapped longs / trapped shorts

Heuristic flags for one-sided pressure that hasn't unwound yet.

all perps

Buy / sell aggression

Net taker flow, aggression ratios, side-imbalance tracking.

118.9M rows
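
One plausible way to derive these flow measures from taker-side prints is sketched below. The input shape and the output field names are hypothetical, chosen only for this example.

```python
def aggression(trades):
    """Summarize taker flow for one window.

    trades: iterable of (taker_side, size) pairs, taker_side is "buy" or "sell".
    """
    trades = list(trades)  # allow generators; we iterate twice
    buy = sum(size for side, size in trades if side == "buy")
    sell = sum(size for side, size in trades if side == "sell")
    total = buy + sell
    if total == 0:
        return {"net_taker_flow": 0.0, "buy_ratio": None, "imbalance": None}
    return {
        "net_taker_flow": buy - sell,        # signed volume
        "buy_ratio": buy / total,            # 0 to 1
        "imbalance": (buy - sell) / total,   # -1 to +1
    }

window = [("buy", 3.0), ("sell", 1.0)]  # made-up prints
stats = aggression(window)
```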

Absorption + large-trade activity

Absorption signals, large-trade markers, block-print detection.

118.9M rows

Coverage

Venues	Binance USD-M (primary), with cross-venue aggregation where positioning data is exposed
Assets	150 perps
Timeframes	1m · 5m · 15m · 1h · 4h
History	2021 → today (rolling 4-year window for some derived flow products)

Example use case

Agent query

"Find SOL setups where buy aggression was elevated for 30+ minutes while OI was falling. Was the move durable?"

→ Returns matching windows with paired flow + OI panels and the post-window outcome distribution.

⇄

Layer 2 / Intelligence

Cross-Exchange

Multi-venue confirmation, lead/lag dynamics, and dispersion signals — so an AI can tell whether a price move is real coordination across exchanges or one venue acting alone.

Relying on single-venue data is a common AI failure mode in crypto: a move on Binance gets treated as truth without checking whether spot and other perps confirmed it. This category eliminates that blind spot.

What's inside

Venue confirmation ratio

For each move, how many other venues confirmed within the same window.

10.6M rows
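
To make the idea concrete, here is a hedged sketch of a confirmation ratio. The rule used (same direction, and at least half the size of the lead move) and the `min_fraction` parameter are assumptions for the example, not the library's published definition.

```python
def confirmation_ratio(lead_return, other_returns, min_fraction=0.5):
    """Share of other venues that confirmed the lead venue's move.

    A venue confirms if it moved in the same direction within the window
    and covered at least min_fraction of the lead move's size.
    Returns (ratio, confirming_venues); ratio is None if undefined.
    """
    if lead_return == 0 or not other_returns:
        return None, []
    confirmed = sorted(
        venue
        for venue, r in other_returns.items()
        if r * lead_return > 0 and abs(r) >= min_fraction * abs(lead_return)
    )
    return len(confirmed) / len(other_returns), confirmed

# Made-up window returns after a +1.0% move on the lead venue.
others = {"binance_usdm": 0.009, "coinbase": 0.008, "kraken": 0.001, "okx": -0.002}
ratio, venues = confirmation_ratio(0.010, others)
```

A ratio near zero is the kind of reading an isolated-move flag would be built on.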

Lead / lag strength

Which venue moved first, and by how much, across rolling windows.

74 products

Cross-exchange dispersion

Standard deviation of price across venues at each timestamp.

9 venues

Isolated-move flags

Boolean tags for moves that didn't propagate across venues — likely thin liquidity or venue-specific.

live + historical

Listing & launch dynamics

New-asset listing timeline across venues — which exchange listed first, launch-window price discovery, cross-venue dispersion at debut, early-volume profile. Historical + forward.

historical + forward

Reference + pair gaps

Inter-venue gap series, including spot-vs-perp and perp-vs-perp pairs.

all major pairs

Coverage

Venues	All 9 (Binance USD-M, Binance Spot, Hyperliquid, Kraken, Coinbase, OKX, Bybit, Deribit, dYdX)
Assets	BTC, ETH, SOL, and other top liquidity pairs across all venues; partial coverage extending to 80+ assets
Timeframes	1m · 5m · 15m · 1h · 4h
History	2020 → today

Example use case

Agent query

"Was the BTC breakout on Hyperliquid at 09:14 UTC confirmed across spot and other perp venues, or isolated?"

→ Returns a venue-by-venue grid showing which exchanges moved with Hyperliquid and which didn't, plus dispersion at the moment of the move.

◉

Layer 2 / Intelligence

Chart Cognition

What a trained human trader sees on a chart — encoded so an AI agent can read it directly. Support, resistance, trend channels, volume profile, named setups, and natural-language captions for every window.

This is the largest single category in the library by row count. Without it, an AI working on raw OHLC has to invent its own technical analysis. With it, the chart is pre-read.

What's inside

Support / resistance state

Active levels, level age, recent touches, broken-and-reclaimed flags.

119M chart-state rows

Trend & channel structure

Trend direction, channel slope, channel position, channel age.

all 150 perps

Volume profile

Volume-by-price profiles with point of control and value area markers.

9 timeframes
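
A volume-by-price profile with a point of control and value area can be sketched as follows. The 70% coverage figure is a common convention assumed here, and the bucket size, function names, and sample trades are all illustrative.

```python
from collections import defaultdict

def volume_profile(trades, bucket_size):
    """trades: iterable of (price, size). Returns {bucket_index: volume}.

    A bucket's lower price bound is bucket_index * bucket_size.
    """
    profile = defaultdict(float)
    for price, size in trades:
        profile[int(price // bucket_size)] += size
    return dict(profile)

def poc_and_value_area(profile, coverage=0.70):
    """Return (poc, low, high) as bucket indices.

    The point of control is the highest-volume bucket. The value area grows
    outward from it, one bucket at a time toward the heavier neighbor, until
    it holds `coverage` of total volume or spans the whole profile.
    """
    total = sum(profile.values())
    poc = max(profile, key=lambda b: (profile[b], -b))  # ties: lower bucket
    lo = hi = poc
    covered = profile[poc]
    min_b, max_b = min(profile), max(profile)
    while covered < coverage * total and (lo > min_b or hi < max_b):
        # -1.0 marks a side that cannot be extended any further.
        below = profile.get(lo - 1, 0.0) if lo > min_b else -1.0
        above = profile.get(hi + 1, 0.0) if hi < max_b else -1.0
        if above >= below:
            hi += 1
            covered += above
        else:
            lo -= 1
            covered += below
    return poc, lo, hi

# Made-up prints.
trades = [(100.2, 1), (100.7, 2), (101.1, 5), (101.6, 1), (102.5, 3), (103.4, 1)]
profile = volume_profile(trades, bucket_size=1.0)
poc, va_low, va_high = poc_and_value_area(profile)
```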

Setup labels

Named pattern labels (breakout, reclaim, rejection, squeeze, etc.) tagged at occurrence.

22.8M setup labels

Plain-English captions

Generated short captions describing each window — readable directly by an LLM.

3,637 products

Coverage

Assets	All 150 perps + top spot pairs
Timeframes	1m · 5m · 15m · 1h · 4h · 1d
History	2020 → today
Update cadence	Each completed bar across all timeframes

Example use case

Agent query

"Find every BTC 1h chart over the last 90 days where price reclaimed prior resistance with rising volume profile inside a trending channel."

→ Returns matching windows with chart-state, captions, setup labels, and links to the underlying tape rows.

◐

Layer 2 / Intelligence

Regimes & Context

The market state your AI is reasoning in. A pattern that works in a quiet compression regime can fail in a liquidation cascade. This category gives every market row a causal regime context: trend state, volatility state, liquidity state, crowding state, cluster age, recent regime shifts, and where each asset stands relative to its peers.

This is also where the Relative Strength Atlas lives: peer rankings across the universe by momentum, funding, volume, volatility, liquidity, OI, crowding, and regime behavior. Many crypto edges are not "BTC went up 2%"; they're "SOL is leading the universe right now while ETH is lagging." This category makes that explicit.

What's inside

Relative Strength Atlas

Peer ranks across the universe by momentum, funding, volume, volatility, liquidity, OI, crowding, and regime behavior.

1.99M rows
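
A peer rank is a cross-sectional sort at one timestamp. The sketch below ranks a toy universe on two metrics and then filters for "top-3 in momentum, bottom-half in funding". The function name, tie-breaking rule, and all values are assumptions for illustration.

```python
def peer_ranks(metric_by_asset, descending=True):
    """Rank assets 1..N on one metric at one timestamp (1 = highest by default).

    Ties are broken alphabetically by asset so the result is deterministic.
    """
    sign = -1 if descending else 1
    ordered = sorted(metric_by_asset, key=lambda a: (sign * metric_by_asset[a], a))
    return {asset: i + 1 for i, asset in enumerate(ordered)}

# Made-up snapshot of a four-asset universe.
momentum = {"SOL": 0.08, "BTC": 0.02, "ETH": -0.01, "DOGE": 0.05}
funding = {"SOL": 0.00001, "BTC": 0.0003, "ETH": 0.0002, "DOGE": 0.0004}

mom_rank = peer_ranks(momentum)
fund_rank = peer_ranks(funding)
n = len(momentum)

# Top-3 in momentum but in the bottom half of the funding ranking.
matches = sorted(a for a in momentum if mom_rank[a] <= 3 and fund_rank[a] > n / 2)
```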

Regime state & cluster IDs

Causal regime tag per row: regime ID, when it started, age so far (live-safe — no future leakage), recent-change flag, cluster family ID.

1.99M rows
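
The "age so far" idea can be shown with a small sketch: each row's age depends only on rows at or before it, so recomputing on a shorter history gives the same answer. The function name and regime labels are hypothetical.

```python
def causal_regime_age(regime_ids):
    """Return (ages, changed) for a sequence of per-bar regime IDs.

    ages[i]    = bars elapsed since the current regime began, as of bar i
    changed[i] = True if bar i starts a new regime (never True on the first bar)
    Only bars up to i are consulted, so no future information leaks in.
    """
    ages, changed = [], []
    prev, age = None, 0
    for rid in regime_ids:
        if rid != prev:
            age = 0
            changed.append(prev is not None)
        else:
            age += 1
            changed.append(False)
        ages.append(age)
        prev = rid
    return ages, changed

ids = ["chop", "chop", "trend", "trend", "trend", "chop"]  # made-up tags
ages, changed = causal_regime_age(ids)

# Causality check: a truncated history yields the same values for shared rows.
prefix_ages, _ = causal_regime_age(ids[:4])
```

Note what is absent: nothing here reports how long a regime will eventually last, because that would require looking ahead.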

Trend / chop / vol regime tags

Categorical market state tags across multiple horizons.

546 regime products

Tradability Atlas

Execution context around each market setup — liquidity, spread, depth, slippage risk, funding drag, crowding, venue coverage.

8.3M tradability scores

Microstructure Execution

Source-neutral execution microstructure: spread series, passive-maker execution pack, and fill-quality signals built on 2.16B inventoried quote, book, and trade rows. Forward labels for passive fills, markouts, and adverse selection are tagged pending_source_coverage until forward trade-print and depth feeds land.

9 products · 313K feature rows
BTC / ETH / SOL · 1m + 5m

Breadth · dispersion · liquidity context

Cross-asset breadth, dispersion, and liquidity state to frame any single-asset move.

all 150 perps

Coverage

Assets	150 perps + top spot pairs
Timeframes	1m · 5m · 15m · 1h · 4h · 1d
History	2020 → today
Live safety	All regime + cluster columns are causal — live rows know age-so-far, never final regime length

Example use case

Agent query

"For every period where SOL was top-3 in momentum rank but bottom-half in funding rank, what's the next-7-day return distribution?"

→ Returns matching cross-sectional windows joined to outcomes, conditioned on regime state.

⌬

Layer 3 / Memory

Edge Atlas + Research

The strategy-discovery memory of the library. Named market episodes, outcome tables, replays of similar historical setups, base-rate comparisons, and ranked research lead cards your AI can drill into.

This is also where the Outcome Atlas lives — forward returns, MFE/MAE, barrier-touch order. These are labels, kept clearly separated from feature-safe inputs so an agent can study what happened next without accidentally training on the answer.

What's inside

Edge Atlas

Named market episodes (breakouts, reclaims, squeezes, capitulations, etc.) with structured metadata.

748 strategy maps
8.2M episode rows

Research Lead Cards

Ranked research opportunities, each with regime context, sample size, source trust, tradability, and a prepared agent prompt.

644 ranked leads

Pattern Memory Replay

Compact before/during/after windows for every setup, plus historical analogs for direct comparison.

644 replays

Matched Control Atlas

Same-regime, no-setup baseline comparisons. Tells your AI whether a setup actually beat its base rate.

644 leads
173 beating controls

Research Evidence Graph

Knowledge-graph traversal from a lead through its supporting evidence — products, regimes, controls, source trust.

1,368 nodes

Outcome Atlas

Forward returns, MFE/MAE, barrier-touch order. For training and evaluation, never as a feature input.

8.3M outcome rows
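
For orientation, here is one way these outcome labels could be computed for a long-side entry. It is a sketch under stated assumptions: it uses a simple list of forward prices rather than bar highs and lows, barriers are fractional moves, and all names and numbers are made up.

```python
def outcome_labels(entry_price, future_prices, up_barrier, down_barrier):
    """Long-side outcome labels over a forward window.

    future_prices: prices observed after entry, in time order.
    up_barrier / down_barrier: fractional moves, e.g. 0.025 for 2.5%.
    """
    if not future_prices:
        raise ValueError("need at least one forward price")
    returns = [(p - entry_price) / entry_price for p in future_prices]

    first_touch = "none"
    for r in returns:  # walk forward in time; first barrier hit wins
        if r >= up_barrier:
            first_touch = "up"
            break
        if r <= -down_barrier:
            first_touch = "down"
            break

    return {
        "forward_return": returns[-1],       # return at end of window
        "mfe": max(0.0, max(returns)),       # max favorable excursion
        "mae": min(0.0, min(returns)),       # max adverse excursion
        "first_touch": first_touch,          # barrier-touch order
    }

labels = outcome_labels(100.0, [101.0, 99.0, 97.0, 103.0, 102.0], 0.025, 0.025)
```

Every value here is computed from prices after entry, which is exactly why such columns belong on the label side and never among the inputs.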

Coverage

Episodes	Built across all 150 perps where chart cognition + regime data exists
Lead ranking	By regime, sample size, base-rate excess, tradability, source trust
History	2020 → today
Outcome separation	Outcome columns are tagged via Column Safety Manifest — agents cannot use them as features by design

Example use case

Agent query

"Pull the top 10 ranked breakout leads from 2024 that beat their matched control, and replay each one with its 18 closest historical analogs."

→ Returns 10 lead cards plus 180 analog windows, joined to regime context and outcomes (clearly labeled as outcomes, not features).

⊞

Layer 3 / Memory

Trust + Workbench

The reliability scoring, validation gates, and leakage-safe pipeline that turn the rest of the library into something an AI agent can train and test against without accidentally cheating.

If the rest of the catalog is the data, this category is the rules of the game. It's what makes the difference between an AI that learns a real pattern and one that quietly trains on the future.

What's inside

Source reliability scores

Every dataset and source tagged with one of four reliability buckets: high-trust, usable-with-checks, limited-coverage, research-only.

9 sources scored

Validation gates

18 readiness gates and 48 structural validation checks. Every product has to clear them before it's published.

18 gates · 48 checks · 0 failed

Column Safety Manifest

Every column tagged with its role: safe-input, outcome-label, metadata, forward-pending-label, quality-warning, execution-context, or do-not-use-for-training. Schema-level enforcement against leakage.

all products
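
A role-tagged manifest makes leakage checks mechanical. The sketch below uses role names from the Coverage table; the manifest shape (a plain column-to-role mapping), the column names, and both helper functions are hypothetical.

```python
SAFE_ROLE = "safe-input"

def select_features(manifest):
    """Return the columns tagged safe-input, sorted for stable ordering."""
    return sorted(col for col, role in manifest.items() if role == SAFE_ROLE)

def assert_no_leakage(feature_columns, manifest):
    """Raise if any requested feature is missing or not tagged safe-input."""
    bad = [c for c in feature_columns if manifest.get(c) != SAFE_ROLE]
    if bad:
        raise ValueError(f"not safe as model inputs: {bad}")

# Made-up manifest; column names are illustrative only.
manifest = {
    "ret_1h": "safe-input",
    "oi_delta": "safe-input",
    "asset": "metadata",
    "fwd_ret_24h": "outcome-label",
}

features = select_features(manifest)
assert_no_leakage(features, manifest)  # passes silently
```

Treating unknown columns as unsafe (via `manifest.get`) is a deliberately conservative choice in this sketch: a column with no tag is rejected rather than trusted.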

Model Split Manifests

Pre-built causal train / validation / test / walk-forward / holdout splits. No future-looking data in any past window.

all timeframes
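
A walk-forward split can be sketched as rolling index ranges where every test block sits strictly after its training block. The function name, the optional `gap` parameter, and the sizes used below are assumptions for the example, not the manifest format itself.

```python
def walk_forward_splits(n_rows, train_size, test_size, gap=0):
    """Build (train_range, test_range) pairs over time-ordered rows.

    Each test block starts `gap` rows after its training block ends, and the
    whole pair slides forward by test_size on each step.
    """
    splits = []
    start = 0
    while start + train_size + gap + test_size <= n_rows:
        train = range(start, start + train_size)
        test_start = start + train_size + gap
        test = range(test_start, test_start + test_size)
        splits.append((train, test))
        start += test_size
    return splits

splits = walk_forward_splits(n_rows=10, train_size=4, test_size=2, gap=1)
as_lists = [(list(tr), list(te)) for tr, te in splits]
```

The gap is useful when labels look ahead by several bars: it keeps a training row's outcome window from overlapping the test block.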

Model-Ready Feature Packs

Curated bundles of causal-only features per use case (momentum, breakout, mean reversion, funding carry, etc.). Outcomes excluded by construction.

5,520 marts

Agent Navigation + Workbench guidance

Start-here paths, complexity tiers, column-by-column guidance for which fields are safe to feed an agent.

4 tiers · 6 paths

Coverage

Reliability buckets	high-trust · usable-with-checks · limited-coverage · research-only
Column roles	safe-input · outcome-label · metadata · forward-pending-label · quality-warning · execution-context · do-not-use-for-training
Split types	train · validation · test · walk-forward · holdout
Audit cadence	Validation gates run on every release

Example use case

Agent query

"Pull a Model-Ready Feature Pack for a 1h breakout strategy, with the matching walk-forward split manifest. I want every column to be flagged safe-input."

→ Returns the curated pack with the column safety manifest attached, the split manifest, and a one-line confirmation that no outcome columns are present.
