Dealing with messy transaction data

Who’s Akahu?

Akahu is an open finance platform. Through Akahu, you can invite consumers to connect their financial accounts to your product, and then interact with those accounts programmatically via our API. That means you can access ongoing feeds of transaction data like myRent does, display aggregated balances like PocketSmith does, or initiate recurring payments like Sugar Wallet does.

Our first job is to wrap a giant API around New Zealand banks and other financial institutions to enable account connectivity. Our second job is to enhance the raw data that we retrieve through those integrations.

Show me the raw data

Here’s an example of how a card purchase at Mobil can show up in various bank accounts.

That same transaction generates different raw transaction data at each bank. For simplicity, we’ve limited the table to 5 banks and ignored alternative payment methods like BNPL, direct debit, online bill payments, and direct credits.

There are many factors that can affect the data that gets printed on a bank statement. Key variables are the network processing the transaction (like Visa, Mastercard, or EFTPOS), whether it was a contactless card transaction (like Visa payWave), and, most importantly, how the bank decides to expose the data. The table above is by no means an exhaustive list of how a card transaction at the Mobil outlet on Karangahape Road could appear.

But surely there’s some sort of merchant ID attached to relevant transactions?

That would make our lives so easy! Unfortunately not. We have to work from that raw data above.

Ok so how do you get started?

You’re now entering our “enrichment pipeline”.

Our system starts by running the raw data through a bank-specific parser. Since we’re very creative, we call this the “raw” step. This step normalises the data received from the bank into a common format that the rest of the pipeline can work with. Sometimes the data is nicely structured, which makes this process easy and highly reliable, but sometimes all we receive is a date, an amount, and a description.

There are two important parts of this “raw” phase.

First, we identify the type of the transaction (such as credit card, EFTPOS, direct debit, transfer, direct credit, interest, BNPL). We do this by mapping the type we receive from the bank to one of the types we support (for example “DD” -> “direct debit”), or through a set of pattern matching rules that run over the description. For a small percentage of transactions we're unable to confidently identify the specific type of transaction, and in that case we fall back to either “credit” or “debit”.
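A minimal Python sketch of this two-stage classification follows. Only the “DD” -> “direct debit” mapping comes from the article; the other bank codes, the description patterns, and the `identify_type` function are hypothetical, and the sign convention (money out is negative) is an assumption.

```python
import re
from typing import Optional

# Hypothetical bank codes; only "DD" -> "direct debit" is taken from the article.
BANK_TYPE_CODES = {
    "DD": "direct debit",
    "DC": "direct credit",
    "INT": "interest",
}

# Hypothetical description rules, checked in order when no bank code is usable.
DESCRIPTION_RULES = [
    (re.compile(r"\bafterpay\b", re.IGNORECASE), "BNPL"),
    (re.compile(r"\beftpos\b", re.IGNORECASE), "EFTPOS"),
    (re.compile(r"\btfr (?:to|from)\b", re.IGNORECASE), "transfer"),
]


def identify_type(bank_code: Optional[str], description: str, amount: float) -> str:
    """Map one raw transaction to a normalised type, falling back to credit/debit."""
    # 1. Prefer the bank's own type code when we recognise it.
    if bank_code and bank_code.upper() in BANK_TYPE_CODES:
        return BANK_TYPE_CODES[bank_code.upper()]
    # 2. Otherwise look for tell-tale phrases in the description.
    for pattern, txn_type in DESCRIPTION_RULES:
        if pattern.search(description):
            return txn_type
    # 3. Not confident about the specific type: use the direction of the money.
    #    Assumes money in is positive and money out is negative.
    return "credit" if amount > 0 else "debit"
```

Rule order matters in a design like this: an explicit bank code is treated as more trustworthy than a guess from free text, and the generic credit/debit fallback only applies when nothing else fires.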

Second, we clean up the transaction description to remove anything that doesn’t add meaningful information about the transaction. For example, we remove card numbers from card transactions.

There are all sorts of patterns you can pick up from a large transaction dataset. For example, some transaction descriptions are limited to 12 characters. If a purchase was made via Afterpay, the description might just say “Afterpay”. If a purchase was made through a Shopify store, the raw description will be prefixed with “Sp * ”. Transfers regularly contain phrases like “Tfr to x”. When relevant, this information is extracted from the description and added to structured metadata.
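The cleaning and metadata extraction described above could be sketched in Python roughly as below. The regular expressions, metadata keys, and the `clean_description` function are illustrative assumptions, not Akahu’s actual rules; this sketch records the Shopify prefix as metadata but leaves it in the description.

```python
import re
from typing import Dict, Tuple

# All patterns below are illustrative assumptions.
# Card numbers, optionally masked with "*", e.g. "1234-****-****-5678".
CARD_NUMBER = re.compile(r"\b\d{4}(?:[-\s]?[\d*]{4}){2}[-\s]?\d{4}\b")
SHOPIFY_PREFIX = re.compile(r"^sp \* ", re.IGNORECASE)
BNPL_ONLY = re.compile(r"^afterpay$", re.IGNORECASE)
TRANSFER_TO = re.compile(r"\btfr to (?P<target>.+)$", re.IGNORECASE)


def clean_description(raw: str) -> Tuple[str, Dict[str, str]]:
    """Strip noise from a raw description and pull known hints into metadata."""
    meta: Dict[str, str] = {}
    # Card numbers say nothing about the merchant, so drop them.
    text = CARD_NUMBER.sub(" ", raw)
    # Collapse the whitespace left behind by the removal.
    text = " ".join(text.split())

    if SHOPIFY_PREFIX.match(text):
        meta["platform"] = "Shopify"
    if BNPL_ONLY.match(text):
        meta["bnpl_provider"] = "Afterpay"
    transfer = TRANSFER_TO.search(text)
    if transfer:
        meta["transfer_to"] = transfer.group("target")
    return text, meta
```

For example, `clean_description("1234-****-****-5678  Sp * Marle Mount Maungan")` drops the card number and tags the description with `{"platform": "Shopify"}`.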

Doesn’t sound like there’s much enrichment going on

We’ll get there soon I promise!

Now that the raw data has been cleaned up, we filter out “unenrichable” transactions: those that can’t involve a business or government counterparty, such as an automatic payment to your landlord, repaying a friend for drinks, or a transfer to your savings account. There’s nothing to enrich in these transactions.
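The article doesn’t spell out the exact filtering criteria, so the Python sketch below only shows the shape of such a filter under stated assumptions: transfers are never enrichable, and a hypothetical earlier step sets a `payee_is_person` flag when the payee looks like an individual. `CleanTransaction`, `is_enrichable`, and `filter_enrichable` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical: types assumed never to have a business or government counterparty.
NEVER_ENRICHABLE_TYPES = {"transfer"}


@dataclass
class CleanTransaction:
    type: str
    description: str
    meta: Dict[str, str] = field(default_factory=dict)
    # Hypothetical flag set upstream when the payee looks like a private person.
    payee_is_person: bool = False


def is_enrichable(txn: CleanTransaction) -> bool:
    """Return False for transactions that cannot involve a merchant."""
    if txn.type in NEVER_ENRICHABLE_TYPES:
        return False
    if "transfer_to" in txn.meta:  # e.g. extracted from "Tfr to x"
        return False
    if txn.payee_is_person:  # e.g. repaying a friend
        return False
    return True


def filter_enrichable(txns: List[CleanTransaction]) -> List[CleanTransaction]:
    return [t for t in txns if is_enrichable(t)]
```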

Ok let’s keep moving

Thanks for bearing with us through the cleaning and filtering phases. Now let’s talk about merchants.

Akahu maintains a database of New Zealand merchants. We enrich each merchant with a category, logo, pretty name, and other details. So when a transaction is assigned to a merchant, the enhanced data carries richer context for any party the consumer grants access to, such as their bank, a lender, or a personal finance app.

How do you assign a transaction to a merchant?

Whenever we see a unique raw transaction description like “SP * MARLE MOUNT MAUNGAN”, we create a unique identifier for that exact string. If that identifier is assigned to a merchant, then any new transaction with the same raw string will get assigned to the merchant.
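Here is a minimal Python sketch of exact-string assignment. Using a SHA-256 digest as the identifier is an assumption (the article only says a unique identifier is created), and `DescriptionIndex` and the merchant ID values are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def description_id(raw: str) -> str:
    """Identifier for one exact raw string; any change, even one character, gives a new id."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DescriptionIndex:
    """Maps exact raw descriptions (via their ids) to merchant ids."""

    def __init__(self) -> None:
        self._merchant_by_id: Dict[str, str] = {}

    def assign(self, raw: str, merchant_id: str) -> None:
        self._merchant_by_id[description_id(raw)] = merchant_id

    def lookup(self, raw: str) -> Optional[str]:
        return self._merchant_by_id.get(description_id(raw))
```

Because the match is exact, `lookup("SP * MARLE MOUNT MAUNGANUI")` returns `None` even after `"SP * MARLE MOUNT MAUNGAN"` has been assigned, which is why every bank and payment-method permutation has to be seen before a merchant is fully covered.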

Yeah, but in that table above, there are multiple permutations of every transaction?

That’s right: we need to see a transaction for every combination of bank and payment method before this process works well for a given merchant.

But now that our system has seen millions of transactions, we enrich the vast majority of enrichable transactions based on the various unique permutations that we’ve already seen and set rules for.

Ok but what if your system doesn’t recognise the raw transaction description?

The transaction will then cascade through a number of different methods.

We’ll run the description through a string pattern matching system which contains rules that identify a common merchant with certainty. For example:

riot.*dublinie.* OR riotlo1.* OR riot.*dublin.*

does a decent job of identifying card and EFTPOS payments to Riot Games. These rules are carefully crafted and backtested against every transaction description in our database to make sure they only match the target merchant.
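A sketch of how such a rule might be compiled and backtested in Python follows. It assumes the `OR`-separated alternatives are joined into one regex, matched against the whole description case-insensitively; `PatternRule`, `make_rule`, `backtest`, and the `created_by` audit field are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PatternRule:
    merchant_id: str
    pattern: re.Pattern
    created_by: str  # hypothetical audit field: who wrote the rule


def make_rule(merchant_id: str, alternatives: List[str], created_by: str) -> PatternRule:
    """Join the OR-ed alternatives into one case-insensitive regex."""
    combined = "|".join(f"(?:{alt})" for alt in alternatives)
    return PatternRule(merchant_id, re.compile(combined, re.IGNORECASE), created_by)


def backtest(rule: PatternRule, labelled: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Run a rule over every known description.

    `labelled` maps each description to the merchant it is already assigned to
    (or "" if unassigned). Returns (all matches, matches that belong to a
    different merchant). Any conflict means the rule is too broad.
    """
    matched = [desc for desc in labelled if rule.pattern.fullmatch(desc)]
    conflicts = [d for d in matched if labelled[d] and labelled[d] != rule.merchant_id]
    return matched, conflicts


riot = make_rule("merchant_riot", [r"riot.*dublinie.*", r"riotlo1.*", r"riot.*dublin.*"], "analyst")
```

Under this design, a rule would only go live once `conflicts` comes back empty, and the `matched` list shows how many previously unassigned descriptions it would newly cover.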

We also look at the destination bank account to see whether it can be confidently linked to a merchant. We don’t always get the destination bank account in the raw data, but when we do, we may find a match. This works especially well for larger billers like Tauranga City Council, which may publish their bank account numbers; a known account lets us corroborate user-entered payment descriptions that don’t provide enough certainty on their own.
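The Python sketch below shows one way to normalise destination account numbers before looking them up. It assumes the common New Zealand layout of a 2-digit bank code, 4-digit branch, 7-digit account, and 2- or 3-digit suffix; `normalise_account`, `AccountDirectory`, and the account number in the usage example are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed layout: BB-bbbb-AAAAAAA-SS(S), with optional "-" or space separators.
NZ_ACCOUNT = re.compile(r"^(\d{2})[- ]?(\d{4})[- ]?(\d{7})[- ]?(\d{2,3})$")


def normalise_account(raw: str) -> Optional[str]:
    """Return a canonical 'BB-bbbb-AAAAAAA-SSS' string, or None if unparseable."""
    m = NZ_ACCOUNT.match(raw.strip())
    if not m:
        return None
    bank, branch, account, suffix = m.groups()
    # Pad 2-digit suffixes so "00" and "000" compare equal.
    return f"{bank}-{branch}-{account}-{suffix.zfill(3)}"


class AccountDirectory:
    """Known merchant bank accounts, keyed by normalised account number."""

    def __init__(self, known: Dict[str, str]) -> None:
        self._by_account: Dict[str, str] = {}
        for acct, merchant_id in known.items():
            norm = normalise_account(acct)
            if norm:
                self._by_account[norm] = merchant_id

    def merchant_for(self, raw_account: Optional[str]) -> Optional[str]:
        if not raw_account:  # the bank didn't supply a destination account
            return None
        norm = normalise_account(raw_account)
        return self._by_account.get(norm) if norm else None
```

With `AccountDirectory({"99-9999-9999999-99": "merchant_council"})`, the differently formatted inputs `"999999999999999"` and `"99 9999 9999999 099"` both resolve to `"merchant_council"`.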

We also have a /support endpoint to enable apps and users to tell us if we’ve missed a transaction that’s enrichable (we look for corroborating evidence before acting on these suggestions).

Does that catch everything?

No, there are still plenty of enrichable transactions that fall through the cracks, and these transactions represent our opportunity to improve the system.

It’s time to introduce our data analysts. All unenriched (but enrichable) transactions join a queue for manual review. The analysts look for clues in the raw data and search for corroborating evidence to see whether they can confidently assign a merchant to each transaction in the queue. This manual review feeds the automated processes described above: assigning unique strings to a merchant, assigning a destination bank account to a merchant, and creating new pattern matching rules.
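To tie the stages together, here is a minimal Python sketch of the cascade and its feedback loop. The stage order (exact description, then pattern rules, then destination account, then the review queue) is our reading of the article rather than a documented order, the `EnrichmentCascade` class is hypothetical, and account numbers are assumed to be normalised already. Recording which stage produced each answer mirrors the article’s point that every enrichment can be traced back to a rule.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Result:
    merchant_id: Optional[str]
    method: str  # which stage produced the answer, useful for auditing


class EnrichmentCascade:
    def __init__(self) -> None:
        self.exact: Dict[str, str] = {}  # raw description -> merchant id
        self.rules: List[Tuple[re.Pattern, str]] = []  # (pattern, merchant id)
        self.accounts: Dict[str, str] = {}  # normalised account -> merchant id
        self.review_queue: Deque[str] = deque()

    def enrich(self, description: str, account: Optional[str] = None) -> Result:
        if description in self.exact:
            return Result(self.exact[description], "exact_description")
        for pattern, merchant_id in self.rules:
            if pattern.fullmatch(description):
                return Result(merchant_id, "pattern_rule")
        if account and account in self.accounts:
            return Result(self.accounts[account], "destination_account")
        # Nothing matched: hand the transaction to the analysts.
        self.review_queue.append(description)
        return Result(None, "queued_for_review")

    def record_review(self, description: str, merchant_id: str) -> None:
        """An analyst's decision becomes an exact-string rule for next time."""
        self.exact[description] = merchant_id
```

After an analyst calls `record_review("SP * MARLE MOUNT MAUNGAN", "merchant_marle")`, the next transaction with that exact description is enriched by the `exact_description` stage and never reaches the queue.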

Our analysts use internal tooling to rapidly work through the unenriched queue, including a system that suggests likely merchants for each transaction based on similar raw transaction data. This tooling makes the manual review process fast, so we can manually enrich up to tens of thousands of transactions every day. Each of these manual enrichments enables our system to automatically enrich the next transaction that comes into the pipeline with the same raw data.
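The article doesn’t say how the suggestion tool measures similarity. As one possible approach, this Python sketch uses the standard library’s `difflib.get_close_matches`, which ranks candidate strings by `SequenceMatcher` ratio. The `suggest_merchants` function, its parameters, and the 0.6 cutoff (difflib’s default) are assumptions.

```python
import difflib
from typing import Dict, List


def suggest_merchants(
    description: str, known: Dict[str, str], n: int = 3, cutoff: float = 0.6
) -> List[str]:
    """Suggest merchant ids whose already-assigned descriptions look similar.

    `known` maps raw descriptions that already have a merchant to that merchant's id.
    """
    # Compare case-insensitively.
    lowered = {desc.lower(): merchant for desc, merchant in known.items()}
    # Fetch extra candidates, since several descriptions may share one merchant.
    close = difflib.get_close_matches(description.lower(), list(lowered), n=n * 3, cutoff=cutoff)
    suggestions: List[str] = []
    for desc in close:  # most similar first
        merchant = lowered[desc]
        if merchant not in suggestions:
            suggestions.append(merchant)
    return suggestions[:n]
```

A tool like this only proposes candidates; under the process described above, an analyst would still confirm the merchant before it becomes a rule.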

So are you telling me that you can enrich every enrichable transaction?

No. Even after the various software methods and human review, there are still transactions that can’t be enriched. For example, if the raw data is limited to “PARAPARAUMU T”, then no amount of software or human magic can conjure a merchant from that data with confidence.

Why don’t we use AI?

New Zealand has a relatively small number of banks and merchants. This makes it possible to achieve very high enrichment coverage and accuracy without the black-box approaches commonly used in larger markets such as the US.

The benefit of our approach is that every time we enrich a transaction, we have a very high level of confidence that our enrichment is correct, and we can see exactly why (and who!) created the rules which led to that enrichment outcome. A hybrid approach of software and humans will get us as close to the performance ceiling as we can go.

We’re always looking to improve our enrichment processes and we continue to experiment with AI components and other methods, but the vast majority of our enrichment will always rely on this rules-based approach in order to get high accuracy.

And what about Buy Now, Pay Later?

The share of purchases made via BNPL has risen incredibly quickly in New Zealand. It’s proving a headache for lenders who are trying to understand whether an applicant’s spending is “essential” or “discretionary”, what the applicant’s BNPL “credit limit” is, and how many BNPL providers the applicant is registered with.

Often our data cleaning process can reveal the underlying merchant in a BNPL transaction. But BNPL transaction data can be tricky because the raw description may simply state the name of the BNPL provider. Or the provider’s details might take up most of the available characters in the description, leaving very little room for other data that might identify the merchant. In these situations, the only way for a lender to get better information is to request access to the BNPL account via Akahu and match those purchases up with the bank account data.

Can this enrichment pipeline be used without connected accounts?

Our enrichment pipeline was originally designed for users who connect their bank accounts to products via Akahu. However, we started getting interest in using this capability as a standalone API service. For example, it’s useful for organisations that retrieve and parse bank statements, or that already hold their users’ transaction data.

Genie is the name of our standalone API endpoint where you can supply a raw transaction description. We pass it through the enrichment pipeline described above and return the associated merchant with its category, trading name, logo, and other details.

If you’re still reading…

Perhaps you’re dealing with the headaches of messy transaction data? If so, let’s talk.

Or if you’re a developer and interested in solving these types of problems, we’re always on the lookout for talented people.
