Ultra Low Latency (ULL) Electronic Trading
Why Deterministic Latencies are Critical & Processing Speed still Significant
Ted Hruzd, Sr. Infrastructure Architect, RBC Capital Markets (since March 2016)
Wall Street IT since 1983
ULL Architect at Citi, DB, JP Morgan 2005-2016
NYU Teacher of ULL Architectures for Electronic Trading course – Summer 2017
Panelist 12/8 Intelligent Trading Summit’s ULL Market Data Webinar
Any comments made by Ted Hruzd here or on the webinar are his own personal views and not those of
Ted’s current or past employers
Describe optimal access to Low Latency Mkt Data and its importance
ULL
• CoLo instead of ExtraNets
• FPGA (full or part) Tick2Trade (T2T) 1-2 uSecs
• L1-3 Switches: 70 ns Exchg to switch in CoLo
• 5 ns Mkt Data Fan-out to FPGA data normalization
• Exch to subscriber Mkt Data < 1 uSec
• $18K for 48 port L1-3 ULL switch
• FIX Engines & order books in FPGA NICs
• Simple Algo’s 100% in FPGA
• Complex Algo’s in GPU’s or Intel Cores
• Direct Feeds not Consolidated; UBBO v NBBO
• Deterministic Latencies; $$$ during spikes
NOT-ULL
• ExtraNets
• Non FPGA Mkt Data Tick Plant
• High end Cisco/Arista Switches/Routers
• 1-2 uSec Fan-out
• 1-2 milliseconds
• MultiThrd TBB, OMP, AVX-512, kernel/app tuning
• Complex Algo’s in GPU’s or Intel Cores
• Consolidated for most; Direct for Book Builds
• Latency Spikes; missed opportunities
Both
• Aggressive SLA’s - Network Q’s & pkt drops
• Separate NICs for Mkt Data & Order Flow
• Kernel Bypass (FPGA based preferred)
• KPI’s for all trading partners’ performance
• Internal metrics for ROI
• RT mkt data/news analytics - seek alpha
• Measure exch, FH, direct & cache clients
[Diagram: CoLo deployment using L1-3 ULL switches. Trading venues (TV’s: Nasdaq, BATS, Arca, Dark Pools, …) connect to the Exchange Switch in CoLo; orders to/from TV take 70 ns. MD from TV’s fans out (5 ns) to the servers (AlgoTrd, SOR, Risk, C-Conn, MD fanout) and out to apps in 5 or 70 ns; internal app processing is a separate opportunity to speed up. Clients C1, C2, C3, … connect through the Client Connectivity Switch (70 ns). Microwave Chi-NY: 4.5 ms.]
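The hop figures on this slide can be added up into a rough latency budget. The sketch below is only an illustration: the hop names and the assumption that hops simply add in a straight line are not from the slide, only the 70 ns, 5 ns and 1-2 uSec figures are.

```python
# Hop latencies in nanoseconds, taken from the figures quoted on this slide.
# The hop names and the straight-line summation are illustrative assumptions.
NS_PER_USEC = 1_000


def tick_to_wire_budget_ns(t2t_usec: float) -> float:
    """Sum one-way hop latencies from exchange tick to order leaving the CoLo switch."""
    hops_ns = {
        "exchange_to_colo_switch": 70,            # L1-3 switch, exchange side
        "md_fanout_to_fpga": 5,                   # market data fan-out
        "tick_to_trade": t2t_usec * NS_PER_USEC,  # FPGA T2T, 1-2 uSecs
        "order_to_venue": 70,                     # order back through the switch
    }
    return sum(hops_ns.values())


best = tick_to_wire_budget_ns(1.0)
worst = tick_to_wire_budget_ns(2.0)
print(f"budget: {best:.0f} ns to {worst:.0f} ns")
```

Under these assumptions the network hops contribute 145 ns, so the tick-to-trade logic dominates the budget.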
What is Speed2? Why is it more important than Speed1 (Raw)
• Speed2 revolves around meta-speed (information about speed).
• It covers the dynamics, timeliness, measurability, auditability & transparency of speed & latency.
• A critical aspect of Speed2 is deterministic latency.
• What good is raw speed if firms do not understand whether the sell-side systems, exchanges, markets, & clients they are connecting to are fully functional?
  • Or if a trading partner is in the beginning stages of failure, or in the process of being degraded – ex: Exec broker mkt data latencies spiking with a Fed announcement or Trump tweet?
  • Buy-Side firms can learn of ExecBrkr order ack latency spikes (due to ExecBrkr spikes handling Mkt Data)
  • and immediately route to alternate sell-side execution brokers.
  • This is where high speed memory analytics provide a significant competitive advantage.
  • Speed2 is most critical to Market Makers – they lose $$ on stale market data
Tabb and Corvil refer to the above trade decision point as ‘Speed2’ – the ability to be fully aware of speed (or lack of speed) and to be internally deterministically fast in order to maximize trading revenue.
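A minimal sketch of the routing decision described above: track order ack latencies per execution broker in memory and move to an alternate when the preferred broker spikes. The class name, the window size and the "latest ack above 3x the recent median" rule are all hypothetical choices for illustration, not something Tabb, Corvil or any vendor prescribes.

```python
from collections import defaultdict, deque
from statistics import median


class AckLatencyMonitor:
    """Track recent order-ack latencies per execution broker (hypothetical helper)."""

    def __init__(self, window: int = 100, spike_factor: float = 3.0):
        self.spike_factor = spike_factor  # assumed rule: latest ack > 3x recent median
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, broker: str, ack_latency_us: float) -> None:
        self.samples[broker].append(ack_latency_us)

    def is_spiking(self, broker: str) -> bool:
        history = self.samples[broker]
        if len(history) < 5:  # too little data to judge
            return False
        baseline = median(list(history)[:-1])
        return history[-1] > self.spike_factor * baseline

    def choose_broker(self, preferred: str, alternates: list[str]) -> str:
        """Keep the preferred broker unless it is spiking; else pick the first healthy alternate."""
        if not self.is_spiking(preferred):
            return preferred
        for broker in alternates:
            if not self.is_spiking(broker):
                return broker
        return preferred  # nothing healthier is known; stay put


monitor = AckLatencyMonitor()
for latency in (50, 52, 49, 51, 50, 400):  # BrokerA spikes on the last ack
    monitor.record("BrokerA", latency)
for latency in (60, 61, 59, 62, 60, 61):   # BrokerB stays steady
    monitor.record("BrokerB", latency)
print(monitor.choose_broker("BrokerA", ["BrokerB"]))
```

A production system would more likely compare tail percentiles over a time window, but the idea is the same: the decision depends on information about speed, not on raw speed alone.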
What factors disrupt optimal scenario?
• Improper bandwidth capacity planning/implementation + errors in SLA plans
• Errors in integration with Exchanges + vendor products: L1-3 switches + FPGA Mkt Data + FPGA or CPU Order Flow
• Not validating your vendor’s ‘deterministic’ latency claims
• Combining mkt data, order flow, and admin commands on same NIC!
• NOT using kernel bypass
• Lack of per-OS kernel tuning - ex: not using RHEL 7’s “network-latency” profile (optimized for speed, not power savings, no auto NUMA, fewer interrupts, 2 way TCP handshakes, …; see the Red Hat site)
• Little or no latency measuring
  • Can’t fix if no analysis; can’t analyze if not measuring
  • Essential to measure exchange, feed handler, client, end-end latencies (some vendors provide this easily)
• No ‘Speed2’-like metrics, including metrics for ROI analysis
• No continuous real-time alerting, analysis for performance tuning, capacity planning, HA, DR
• Not paying attention to ‘deterministic’ latencies
• Plan for faster SIP (20 uSecs from 350 uSecs).
  • Will Speed bump exchanges adapt to this? Will prop traders route less to them?
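One way to check a ‘deterministic’ latency claim is to compare the tail of measured latencies against the median. The sketch below assumes latency samples in uSecs have already been captured; the function names and the 1.5x tail-ratio threshold are hypothetical, chosen only to illustrate the measurement.

```python
def latency_profile(samples_us: list[float]) -> dict[str, float]:
    """Summarize latency samples: median, p99, max, and a tail ratio (p99 / p50)."""
    if not samples_us:
        raise ValueError("no samples")
    ordered = sorted(samples_us)

    def percentile(p: int) -> float:
        # Nearest-rank percentile using integer ceiling division.
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    p50, p99 = percentile(50), percentile(99)
    return {"p50": p50, "p99": p99, "max": ordered[-1], "tail_ratio": p99 / p50}


def looks_deterministic(samples_us: list[float], max_tail_ratio: float = 1.5) -> bool:
    """Assumed rule of thumb: p99 within 1.5x of the median counts as deterministic."""
    return latency_profile(samples_us)["tail_ratio"] <= max_tail_ratio


steady = [1.0] * 98 + [1.1, 1.2]   # tight distribution
spiky = [1.0] * 95 + [50.0] * 5    # same median, heavy tail
print(looks_deterministic(steady), looks_deterministic(spiky))
```

Both sample sets have the same median, so an average or median alone would hide the difference; only the tail reveals the spikes.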
Deterministic Latencies & optimal ULL app: how it works, how to attain
• Infrastructure and app design in prior slide detail ULL deterministic architecture
• Serious traders know the sell-side firms, execution brokers, dark pools, and trading venues that exhibit deterministic speed (and low latencies).
• Trade with them especially in volatile times: best $$ opportunities
• Trading signals have a short life, often in microseconds.
• Best Buy side traders route to the fastest and most deterministic sell side firms.
• IB’s optimize their deterministic dark pool routes
• Exec Brokers route to deterministic latency dark pools and lit exchanges
• Metrics for ROI
• Test, Profile, Analyze, Project
• Architects, engineers, developers, QA to recommend new infrastructures for positive ROI.
• Next, meticulously engineer, configure, profile, tune, validate expected latencies, fill rates, and thus revenue (positive ROI)
  • Metrics may play a significant role as to whether a trading firm should stay in ET or exit the business.
  • TABB: a large global investment bank has stated that every millisecond lost results in $100m per annum in lost opportunity. (public info to Tabb)
  • ARCA …. Competitive (order acks, execs, ARCA BOOK), in the BLACK
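The ROI projection step can be sketched as simple arithmetic. Every number below is a hypothetical placeholder (fill rates, revenue per fill-rate point, upgrade cost), and the linear link between fill rate and revenue is an assumption; a real projection would use measured latencies and fill rates.

```python
def projected_roi(annual_revenue_gain: float, annual_cost: float) -> float:
    """Return ROI as a fraction: (gain - cost) / cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return (annual_revenue_gain - annual_cost) / annual_cost


# All inputs are hypothetical placeholders, not measured results.
baseline_fill_pct = 60            # fill rate before the upgrade, in percent
projected_fill_pct = 66           # fill rate expected after validated latency gains
revenue_per_fill_point = 500_000  # assumed annual revenue per percentage point
upgrade_cost = 1_000_000          # assumed annual cost of the new infrastructure

gain = (projected_fill_pct - baseline_fill_pct) * revenue_per_fill_point
roi = projected_roi(gain, upgrade_cost)
print(f"projected gain ${gain:,} -> ROI {roi:.0%}")
```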
Latency Impacts of execution, TCA, surveillance & algo back testing
• Mkt data for TCA, market surveillance & algo back testing must NOT impact T2T, order ack times, or trade execution times.
• Offload any analytics and the above functions separately & asynchronously
• Proper software can replay market data with multiple algo’s at original rates and latencies, or altered ones (ex: speed-up, change dynamics, algo goals, etc). Many Mkt Data vendors provide this service
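A minimal sketch of replay timing, assuming captured ticks carry a timestamp in uSecs. The function name and the single `speed` parameter are hypothetical; vendor replay tools expose their own controls.

```python
from typing import Iterable, Iterator


def replay_schedule(
    capture_times_us: Iterable[int], speed: float = 1.0
) -> Iterator[tuple[int, float]]:
    """Yield (original_time_us, replay_offset_us); speed > 1 compresses gaps between ticks."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    first = None
    for t in capture_times_us:
        if first is None:
            first = t  # offsets are measured from the first captured tick
        yield t, (t - first) / speed


ticks = [1000, 1250, 2000]  # hypothetical capture timestamps in uSecs
original = list(replay_schedule(ticks))           # original inter-tick gaps
doubled = list(replay_schedule(ticks, speed=2.0)) # same ticks, gaps halved
print(original, doubled)
```

Because the schedule is computed offline from captured data, a back test driven this way runs apart from the live trading path.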
TCA (maybe should be referred to as RT analytics or RTA for alpha instead)
From Tabb:
TCA is increasingly being used in real time. TCA can therefore generate alpha by projecting and exposing lower costs and specific trading venues for buying or selling securities. This is referred to as “opportunity cost” and is very time sensitive.
THE POOL for NON ULL Trading is decreasing
Nearly 50% of equity desks globally now use TCA (RTA) as both a pre- and post-trade tool, with one-quarter using TCA (RTA) for real-time, in-trade analytics. Due to their access to the resources for deploying and integrating new technology, the highest-volume firms are the biggest users.
HW Acceleration + in-mem DB’s for higher performance in Mkt Data Mgt
• FPGA’s are deterministic by design and process in parallel
  • Already discussed role in ULL and deterministic Mkt Data
• Re: Analytics
  • Both are required as hardware acceleration will speed up (and deterministically) market data to high speed memory regions for analytics that may include timely alpha seeking strategies.
Are ULL Mkt Data BP’s agreed to? Or do they vary per specialization, function, size?
• Agreed upon - NO.
• Some feel latencies of a few milliseconds or more for market data still suffice.
  • Prime Ex: Low Freq, alpha-seeking per fundamental analysis for long term holds
• Others strive for nanoseconds.
• Key is to ID your business goals & ROI on ULL spending.
• Specialization and function are significant factors.
• Goals of Prop Traders, Market Makers, HFT traders, and arbitrage are much more latency sensitive than those of most Buy side firms and asset managers.
• Equities & Futures traders are more apt to opt for ULL
• FX not so but getting closer; Bond and Commodity trading much less apt to choose ULL.
Why is pure speed still significant?
• Industry leading Tick-2-Trade latencies have decreased by a factor of 10 every 3 years
• ULL trading firms continue to prioritize speed.
• Buy Side tracks latencies with Sell Side (SS) and routes to fastest SS firms
• Buy Side has accelerated adoption of SS-like ULL technologies; hence, proportion of
trades to non ULL firms is decreasing
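The "factor of 10 every 3 years" trend can be written as L(t) = L0 * 10^(-t/3). The sketch below simply extrapolates that formula; the starting value of 1,000 ns is a hypothetical input, and nothing guarantees the trend continues.

```python
def projected_t2t_ns(current_ns: float, years: float) -> float:
    """Project latency assuming a 10x decrease every 3 years: L(t) = L0 * 10 ** (-t / 3)."""
    return current_ns * 10 ** (-years / 3)


start_ns = 1_000.0  # hypothetical starting Tick-2-Trade latency (1 uSec)
for years in (0, 1, 3, 6):
    print(f"after {years} years: {projected_t2t_ns(start_ns, years):.1f} ns")
```

Under this assumption latency falls to about 46% of its value each year, since 10^(-1/3) is roughly 0.464.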
Top Trends
Past 2 years, continuing
• In 2017, Nasdaq will decrease time to disseminate SIP (Consolidated Market Data) from 480 uSecs to 50 uSecs, then 20 uSecs
  • Almost all Trading Firms actively use the SIP
  • Trading Firms will optimize infrastructure for quicker access to SIP
• More firms & algo applications are upgraded for single digit uSecs for Tick-2-Trades
• Speed enhancements in FPGA’s, GPU’s, Servers, Caches, Middleware continue to lead to lower trading latencies
  • FIX Engines and Mkt Data order books in FPGA NICs
  • Simple algo’s, along with FIX Engines, SOR, Risk checks – 100% FPGA ready
• Increasing # of vendors now compete in the space of ULL appliances that feature parallelized processing using 2 or more of the following:
  • FPGA’s, GPU’s, Intel Cores, ULL Switch capabilities
• Some ULL Switches now transmit Market Data to subscribers in 5 ns
• 48 port ULL switches are inexpensive (under $20K) - easier to meet ROI
• Kernel Bypass is now ubiquitous; I/O’s are at approx. 1 uSec, down from 12 uSecs
  • Increased use of FPGA based kernel bypass to reach sub uSec I/O latencies
• More Trading firms embracing FPGA’s for deterministic latencies
• GPU’s remain an important speed factor but more for risk (ex Monte Carlo) & analytics
Additional Top Trends (Cont)
Very early or longer term strategic
• Real Time Big Data (Machine Learning), some in Cloud, now sends more time sensitive alpha trading signals over high speed interconnects to Trading Apps
  • Another reason to speed up trading systems
• New Binary FIX protocol, expected in 2017, will further decrease latencies
• FPGA programming is becoming easier and is increasing in use cases
  • New Intel libraries and C-like A++ language a major reason
• Matching Engines 100% in FPGA
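To illustrate why a binary encoding can lower latency compared with classic tag=value FIX, the sketch below encodes the same three order fields both ways. The fixed layout here is hypothetical and is NOT the actual binary FIX schema; it only shows that fixed-offset fields avoid text parsing and produce smaller messages. The tag numbers (54 Side, 38 OrderQty, 44 Price) are standard FIX tags.

```python
import struct

# Hypothetical fixed layout: side (1 byte), quantity (uint32), price in 1/10000 units (int64).
# This is NOT the real binary FIX schema; it only illustrates fixed-offset decoding.
ORDER_LAYOUT = struct.Struct("<cIq")


def encode_text(side: str, qty: int, price: float) -> bytes:
    """Tag=value body (54=Side, 38=OrderQty, 44=Price), SOH-delimited."""
    return f"54={side}\x0138={qty}\x0144={price:.4f}\x01".encode("ascii")


def decode_text(msg: bytes) -> dict[str, str]:
    """Parse by scanning for delimiters, then splitting each tag from its value."""
    fields = msg.decode("ascii").rstrip("\x01").split("\x01")
    return dict(field.split("=", 1) for field in fields)


def encode_binary(side: str, qty: int, price: float) -> bytes:
    return ORDER_LAYOUT.pack(side.encode("ascii"), qty, round(price * 10_000))


def decode_binary(msg: bytes) -> tuple[str, int, float]:
    """Read fields at fixed offsets; no delimiter scanning or string-to-number parsing."""
    side, qty, price_ticks = ORDER_LAYOUT.unpack(msg)
    return side.decode("ascii"), qty, price_ticks / 10_000


text_msg = encode_text("1", 100, 101.25)
binary_msg = encode_binary("1", 100, 101.25)
print(len(text_msg), len(binary_msg))
print(decode_text(text_msg), decode_binary(binary_msg))
```

Fixed offsets are also what makes such messages practical to handle in FPGA logic, where variable-length text parsing is comparatively costly.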
CoLo App – using L1-3 ULL switches
Optimized for lowering network deterministic latencies from as much as 1+ ms to under 1 uSec; passing Mkt Data to a pure FPGA based Algo App may result in 1 uSec T2T
[Diagram: Trading venues (TV’s: Nasdaq, BATS, Arca, Dark Pools, …) connect to the Exchange Switch; orders to/from TV take 70 ns. MD from TV’s fans out (5 ns) to the servers (AlgoTrd, SOR, Risk, C-Conn, MD fanout) and out to apps in 5 or 70 ns; internal app processing is a separate opportunity to speed up. Clients C1, C2, C3, … connect through the Client Connectivity Switch (70 ns).]
Pure FPGA Based Solution
Optimized for lowering network deterministic latencies from a few ms to under 1 uSec: