Download a-team-dec-2016

Ultra Low Latency (ULL) Electronic Trading
Why Deterministic Latencies are Critical & Processing Speed still Significant

Ted Hruzd, Sr. Infrastructure Architect, RBC Capital Markets (since March 2016)
Wall Street IT since 1983
ULL Architect at Citi, DB, JP Morgan 2005-2016
NYU Teacher of ULL Architectures for Electronic Trading course – Summer 2017
Panelist 12/8 Intelligent Trading Summit’s ULL Market Data Webinar
Any comments made by Ted Hruzd here or on the webinar are his own personal views and not those of Ted’s current or past employers.

Describe optimal access to Low Latency Mkt Data and its importance

ULL
• CoLo instead of ExtraNets
• FPGA (full or part) Tick2Trade (T2T) 1-2 uSecs
• L1-3 Switches: 70 ns Exchg to switch in CoLo
• 5 ns Mkt Data Fan-out to FPGA data normaliz.
• Exch → subscriber Mkt Data < 1 uSec
• $18K for 48 port L1-3 ULL switch
• FIX Engines & order books in FPGA NICs
• Simple Algo’s 100% in FPGA
• Complex Algo’s in GPU’s or Intel Cores
• Direct Feeds not Consolidated; UBBO v NBBO
• Deterministic Latencies; $$$ during spikes

NOT-ULL
• ExtraNets
• Non FPGA Mkt Data Tick Plant
• High end Cisco/Arista Switches/Routers
• 1-2 uSec Fan-out
• 1-2 milliseconds
• MultiThrd TBB, OMP, AVX-512, kernel/app tuning
• Complex Algo’s in GPU’s or Intel Cores
• Consolidated for most; Direct for Book Builds
• Latency Spikes; missed opportunities

Both
• Aggressive SLA’s - Network Q’s & pkt drops
• Separate NICs for Mkt Data & Order Flow
• Kernel Bypass (FPGA based preferred)
• KPI’s for all trading partners’ performance
• Internal metrics for ROI
• RT mkt data/news analytics - seek alpha
• Measure exch, FH, direct & cache clients

[Diagram: CoLo deployment with L1-3 ULL switches]
• TV’s (trading venues) – Nasdaq, BATS, Arca, Dark Pools, …
• Exchange in CoLo switch: orders to/from TV (70 ns)
• MD from TV’s, out to apps (5 or 70 ns)
• MD fan-out (5 ns)
• Servers – AlgoTrd, SOR, Risk, C-Conn, MD fanout; internal app processing is a separate opportunity to speed up
• Client Connectivity Switch: clients conn (70 ns); Clients C1, C2, C3, …
• Microwave Chi-NY: 4.5 ms

What is Speed2? Why is it more important than Speed1 (Raw)?
• Speed2 revolves around meta-speed (information about speed).
• It covers the dynamics, timeliness, measurability, auditability & transparency of speed & latency.
• A critical aspect of Speed2 is deterministic latency.
• What good is raw speed if firms do not understand whether the sell-side systems, exchanges, markets, & clients they are connecting to are fully functional?
  • Or if a trading partner is in the beginning stages of failure, or in the process of being degraded – ex: Exec broker mkt data latencies spiking with a Fed announcement or Trump Tweet?
• Buy-Side firms can learn of ExecBrkr order ack latency spikes (due to ExecBrkr spikes handling Mkt Data)
  • and immediately route to alternate sell-side execution brokers.
• This is where high speed memory analytics provide a significant competitive advantage.
• Speed2 is most critical to Market Makers – they lose $$ on stale market data.
Tabb and Corvil refer to the above trade decision point as ‘Speed2’ – the ability to be fully aware of speed (or lack of speed) and to be internally, deterministically fast in order to maximize trading revenue.

What factors disrupt optimal scenario?
• Improper bandwidth capacity planning/implementation + errors in SLA plans
• Errors in Integration with Exchanges + vendor products: L1-3 switches + FPGA MktData + FPGA or cpu Order Flow
• Not validating your vendor’s ‘deterministic’ latency claims
• Combining mkt data, order flow, and admin commands on same NIC!
• NOT using kernel bypass
• Poor OS kernel tuning - ex: not using RHEL 7’s “network-latency” profile (optimized for speed, not power savings, no auto NUMA, fewer interrupts, 2 way TCP handshakes, …. Go to the Red Hat site)
• Little or no latency measuring
  • Can’t fix if no analysis; can’t analyze if not measuring
  • Essential to measure exchange, feed handler, client, end-end latencies (some vendors provide this easily)
• No ‘Speed2’-like metrics, including metrics for ROI analysis
• No continuous real-time alerting, analysis for performance tuning, capacity planning, HA, DR
• Not paying attention to ‘deterministic’ latencies
• Plan for faster SIP (20 uSecs from 350 uSecs).
  • Will Speed bump exchanges adapt to this? Will prop traders route less to them?

Deterministic Latencies & optimal ULL app: how it works, how to attain
• Infrastructure and app design in prior slide detail a ULL deterministic architecture
• Serious traders know the sell-side, execution brokers, dark pools, and trading venues that exhibit deterministic speed (and low latencies).
• Trade with them especially in volatile times: best $$ opportunities
• Trading signals have a short life, often in microseconds.
• Best Buy side traders route to the fastest and most deterministic sell side firms.
• IB’s optimize their deterministic dark pools route
• Exec Brokers route to deterministic latency dark pools and lit exchanges
• Metrics for ROI
• Test, Profile, Analyze, Project
• Architects, engineers, developers, QA to recommend new infrastructures for positive ROI
• Next, meticulously engineer, configure, profile, tune, validate expected latencies, fill rates, and thus revenue (positive ROI)
  • Metrics may play a significant role as to whether a trading firm should stay in ET or exit the business.
  • TABB: a large global investment bank has stated that every millisecond lost results in $100m per annum in lost opportunity. (public info to Tabb)
  • ARCA …. Competitive (order acks, execs, ARCA BOOK), in the BLACK

Latency Impacts of execution, TCA, surveillance & algo back testing
• Mkt data for TCA, market surveillance & algo back testing must NOT impact T2T, order ack times, or trade execution times.
• Offload any analytics and the above functions separately & asynchronously
• Proper software can replay market data with multiple algo’s at original rates and latencies, or altered (ex: speed-up, change dynamics, algo goals, etc). Many Mkt Data vendors provide this service.

TCA (maybe should be referred to as RT analytics or RTA for alpha instead)
From Tabb: TCA is increasingly being used in real time. TCA can therefore generate alpha by projecting and exposing lower costs and specific trading venues for buying or selling securities. This is referred to as “opportunity cost” and is very time sensitive.
THE POOL for NON ULL Trading is decreasing
Nearly 50% of equity desks globally now use TCA (RTA) as both a pre- and post-trade tool, with one-quarter using TCA (RTA) for real-time, in-trade analytics. Due to their access to the resources for deploying and integrating new technology, the highest-volume firms are the biggest users.

HW Acceleration + in-mem DB’s for higher performance in Mkt Data Mgt
• FPGA’s are deterministic by design and process in parallel
  • Already discussed role in ULL and deterministic Mkt Data
• Re: Analytics
  • Both are required, as hardware acceleration will speed up (and deterministically) market data to high speed memory regions for analytics that may include timely alpha seeking strategies.

Are ULL Mkt Data BP’s agreed to? Or do they vary per specialization, function, size?
• Agreed upon - NO.
• Some feel latencies of a few milliseconds or more for market data still suffice.
  • Prime Ex: Low Freq, alpha-seeking per fundamental analysis for long term holds
• Others strive for nanoseconds.
• Key is to ID your business goals & ROI on ULL spending.
• Specialization and function are significant factors.
• Goals of Prop Traders, Market Makers, HFT traders, and arbitrage are much more latency sensitive than those of most Buy side firms and asset managers.
• Equities & Futures traders are more apt to opt for ULL.
• FX not so, but getting closer; Bond and Commodity trading are much less likely to choose ULL.

Why is pure speed still significant?
• Industry leading Tick-2-Trade latencies have decreased by a factor of 10 every 3 years
• ULL trading firms continue to prioritize speed.
• Buy Side tracks latencies with Sell Side (SS) and routes to fastest SS firms
• Buy Side has accelerated adoption of SS-like ULL technologies; hence, proportion of trades to non ULL firms is decreasing

Top Trends: Past 2 years, continuing
• In 2017, Nasdaq will decrease time to disseminate SIP (Consolidated Market Data) from 480 uSecs to 50 uSecs, then 20 uSecs
  • Almost all Trading Firms actively use the SIP
  • Trading Firms will optimize infrastructure for quicker access to SIP
• More firms & algo applications are upgraded for single digit uSecs for Tick-2-Trades
• Speed enhancements in FPGA’s, GPU’s, Servers, Caches, Middleware continue to lead to lower trading latencies
  • FIX Engines and Mkt Data order books in FPGA NICs
  • Simple algo’s, along with FIX Engines, SOR, Risk checks – 100% FPGA ready
• Increasing # of vendors now compete in space of ULL appliances that feature parallelized processing using 2 or more of the following:
  • FPGA’s, GPU’s, Intel Cores, ULL Switch capabilities
• Some ULL Switches now transmit Market Data to subscribers in 5 ns
• 48 port ULL switches are inexpensive (under $20K) - easier to meet ROI
• Kernel Bypass is now ubiquitous; I/O’s are at approx. 1 uSec, down from 12 uSecs
  • Increased use of FPGA based kernel bypass to reach sub uSec I/O latencies
• More Trading firms embracing FPGA’s for deterministic latencies
• GPU’s remain an important speed factor but more for risk (ex MonteCarlo) & analytics

Additional Top Trends (Cont): Very early or more longer term strategic
• Real Time Big Data (Machine Learning), some in Cloud, now send more time sensitive alpha trading signals over high speed interconnects to Trading Apps
  • Another reason to speed up trading systems
• New Binary FIX protocol, expected in 2017, will further decrease latencies
• FPGA programming is becoming easier and is increasing in use cases
  • New Intel libraries and C like A++ language a major reason
• Matching Engines 100% in FPGA

CoLo App – using L1-3 ULL switches
Optimized for lowering network deterministic latencies from as much as 1+ ms to under 1 uSec; passing Mkt Data to a Pure FPGA based Algo App may result in 1 uSec T2T

[Diagram: CoLo App with L1-3 ULL switches]
• TV’s (trading venues) – Nasdaq, BATS, Arca, Dark Pools, …
• Exchange Switch: orders to/from TV (70 ns)
• MD from TV’s, out to apps (5 or 70 ns)
• MD fan-out (5 ns)
• Servers – AlgoTrd, SOR, Risk, C-Conn, MD fanout; internal app processing is a separate opportunity to speed up
• Client Connectivity Switch: clients conn (70 ns); Clients C1, C2, C3, …

Pure FPGA Based Solution
Optimized for lowering network deterministic latencies from a few ms to under 1 uSec
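The hop latencies quoted on the CoLo slides can be added up into a rough tick-to-trade budget, and measured latencies can then be checked against that budget for determinism. The Python sketch below is illustrative only. The hop list and its ordering, the names `Hop`, `budget_ns`, `percentile` and `determinism_report`, the sample values, and the `max_tail_ratio` threshold of 2.0 are all assumptions made for this example; the per-hop figures are the nominal values from the slides, not measurements.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Hop:
    name: str
    latency_ns: int  # nominal figure quoted on the slides, not a measurement


# Assumed path for one market data tick that triggers one order.
# The choice and ordering of hops is an illustration, not the author's design.
COLO_PATH = [
    Hop("exchange to CoLo switch, market data in", 70),
    Hop("market data fan-out to FPGA algo app", 5),
    Hop("pure FPGA algo app, tick-to-trade", 1_000),
    Hop("CoLo switch to exchange, order out", 70),
]

MICROWAVE_CHI_NY_NS = 4_500_000  # 4.5 ms, from the first diagram


def budget_ns(path):
    """Sum the nominal per-hop latencies of a path."""
    return sum(hop.latency_ns for hop in path)


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def determinism_report(samples_ns, budget, max_tail_ratio=2.0):
    """Compare measured latencies with a budget and check the tail spread.

    max_tail_ratio is a hypothetical threshold: the run is called
    deterministic when p99 is at most max_tail_ratio times the median.
    """
    p50 = median(samples_ns)
    p99 = percentile(samples_ns, 99)
    return {
        "p50_ns": p50,
        "p99_ns": p99,
        "tail_ratio": p99 / p50,
        "within_budget": p99 <= budget,
        "deterministic": p99 / p50 <= max_tail_ratio,
    }


if __name__ == "__main__":
    budget = budget_ns(COLO_PATH)
    print(f"nominal CoLo budget: {budget} ns")
    print(f"Chi-NY microwave hop is about {MICROWAVE_CHI_NY_NS // budget}x that budget")

    # Made-up samples: one steady run and one with a single latency spike.
    steady = [1100, 1105, 1110, 1115, 1120, 1125, 1130, 1135, 1140, 1145]
    spiky = steady[:-1] + [9000]
    for label, samples in (("steady", steady), ("spiky", spiky)):
        print(label, determinism_report(samples, budget))
```

The median alone would look identical for both sample sets, which is why the sketch reports the p99 and the p99-to-median ratio: a single spike leaves the typical latency unchanged but breaks both the budget and the determinism check.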
