Building Software for Imperfect Customer Data Scenarios
Voxsmart Trade Reconstruction. The 10,000hp turbo-diesel locomotive that runs on triple-distilled ethanol, unfiltered chip fat and everything in between.
From our experience providing automatic trade reconstruction solutions, there are two common customer concerns:
1. Data availability
2. Data quality
Both are relevant to automatic trade reconstruction initiatives. We need access to the data to structure it, and the data we access needs to resemble communication and trade data.
However, data quality is often a moot point. Vendors might cry wolf on data quality, but customers often reply, “that’s the data.” This requires us to be flexible.
We’ve evolved to deal with data as it is. We won’t complain if the data becomes cleaner, but, crucially, we chug through it as it’s given. Our trade reconstruction products used to be sensitive to data quality, which often meant pressuring upstream customer data lakes to improve their feeds. That was interesting to watch unfold. These days this is less common, thanks to a 'diversification' in our software's palate and appetite.
Communications and trade data contain many fields and important attributes (participants, timestamps, emojis, quantities, currencies, tags, NFA flags). There are many points where issues can arise, and a software system that requires perfect data will never perform well in realistic settings, especially in the realm of automated trade reconstruction.
Although banks are always striving for data quality, we often meet customers partway through their journey. It’s the “realpolitik” of data quality that forces us to build logical redundancy into our matching algorithms. This helps us deal not only with the complexity of trading relationships but also with the inevitable data quality issues, such as missing fields, duplicated records, or sanitized values.
Notable examples include trade data with incorrectly provided or redacted notional values. We have routinely reconstructed such trades accurately, despite the incorrect attributes. Trades with brokers and dealers that contain incorrect values might be expected to trip up an automated matching mechanism, but ours matches them regardless.
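To make "logical redundancy" concrete, here is a minimal Python sketch of the general idea. It is an illustration, not our production matching algorithm: the `Trade` and `Mention` records, the `match_score` and `is_match` functions, and the thresholds are all hypothetical. A booked trade is compared with details extracted from a conversation across several attributes; missing fields are skipped rather than counted against the match, and one wrong attribute is tolerated as long as enough of the others agree.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Trade:
    # Booked trade record; any field may be missing (None) or simply wrong.
    counterparty: Optional[str]
    instrument: Optional[str]
    notional: Optional[float]
    executed_at: Optional[datetime]


@dataclass
class Mention:
    # Trade details extracted from a communication (hypothetical shape).
    counterparty: Optional[str]
    instrument: Optional[str]
    notional: Optional[float]
    seen_at: Optional[datetime]


def match_score(trade, mention, window=timedelta(minutes=10)):
    """Return (agreeing, comparable) over attributes present on both sides."""
    checks = []
    if trade.counterparty is not None and mention.counterparty is not None:
        checks.append(trade.counterparty.lower() == mention.counterparty.lower())
    if trade.instrument is not None and mention.instrument is not None:
        checks.append(trade.instrument == mention.instrument)
    if trade.notional is not None and mention.notional is not None:
        # Assumed tolerance: notionals agree if within 1% of the mentioned value.
        tolerance = 0.01 * abs(mention.notional)
        checks.append(abs(trade.notional - mention.notional) <= tolerance)
    if trade.executed_at is not None and mention.seen_at is not None:
        checks.append(abs(trade.executed_at - mention.seen_at) <= window)
    return sum(checks), len(checks)


def is_match(trade, mention, min_agree=3, max_disagree=1):
    # Redundancy: no single attribute is required, and one bad value is tolerated.
    agree, comparable = match_score(trade, mention)
    return agree >= min_agree and (comparable - agree) <= max_disagree


mention = Mention("ACME Brokers", "EURUSD", 50_000_000, datetime(2024, 1, 5, 10, 0))
wrong_notional = Trade("acme brokers", "EURUSD", 5_000_000, datetime(2024, 1, 5, 10, 2))
redacted_notional = Trade("ACME Brokers", "EURUSD", None, datetime(2024, 1, 5, 10, 2))
unrelated = Trade("Other Dealer", "GBPUSD", 1_000_000, datetime(2024, 1, 5, 10, 2))

for label, trade in [("wrong notional", wrong_notional),
                     ("redacted notional", redacted_notional),
                     ("unrelated", unrelated)]:
    print(label, is_match(trade, mention))
```

In this sketch the trade with the wrong notional and the trade with the redacted notional both still match, because counterparty, instrument and timing agree, while the unrelated trade does not. A real system would weigh many more attributes and relationships than this.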
My takeaway: don’t worry about your data quality. It’s never too soon to start adopting solutions that drive automation and efficiency gains.