This example includes several source files and queries that illustrate an Anti-Money Laundering (AML) use case. It incorporates reasoning and (optionally) virtual graphs.
Included are "tables" (csv files) for People, Companies, Addresses, Bank Accounts, Shares of company stock, and financial Transactions. The data for Transactions can be loaded either into Stardog along with the rest of the data, or loaded into an RDBMS.
To set up an environment for these examples, (install Stardog, then) open a terminal that has $STARDOG_HOME/bin
in the
path. Set the current directory to the directory for this example. The create_db.sh
script will create a Stardog DB with
the name fraud_demo
and import data from the files in the current directory.
If you wish to work with virtual data from MySQL, you can modify the script to comment and uncomment the following lines:
# Uncomment this line to import transactions into Stardog
stardog-admin virtual import $DB transaction-mapping.ttl aml_dataset_transactions.csv
# Or, comment the prior line and uncomment these lines to load the transactions into MySQL
# Alternatively, can use load_transactions.sql instead of load_transactions-mysql.sql
#mysql -uadmin -padmin -hlocalhost stardog < load_transactions-mysql.sql
#stardog-admin virtual add aml.properties transaction-mapping-sql.ttl
Before adding the virtual graph, you will need to modify the connection information in the aml.properties
to match your
server. You will also need to obtain the MySQL Connector/J jar from https://dev.mysql.com/downloads/connector/j/5.1.html
and copy the mysql-connector-java-5.1.45-bin.jar
file to $STARDOG_HOME/server/dbms
.
If you wish to use a different RDBMS server you should update the load_transactions.sql
file, replacing backticks
with double quotes, changing identifier casing, etc. as needed to conform to your platform's requirements, or use your
platform's bulk importer to load the aml_dataset_transactions.csv
file. You may also need to make similar changes for
quoting and casing to the transaction-mapping-sql.ttl
mapping file. Lastly, ensure the jar file for the JDBC driver is
copied to $STARDOG_HOME/server/dbms
and update the aml.properties
file accordingly.
Once you have the data loaded, try running the queries from the queries.sparql
file, as well as attempting your
own queries. You can alter the data for testing and simply rerun the create_db.sh
script to reload the data.
The goal of the queries in this example is to identify fraudulent financial activity. Fraudsters can employ countless strategies to obscure their activity, often incorporating multiple financial institutions from several countries. This example looks to find activity that matches a general pattern in which money flows from an "originator" through several affiliated entities (other People or Companies), where this institution knows of the affiliation but not necessarily has evidence of actual money being transferred. It follows money transferred from one of these affiliated entities' accounts into a second account. It then looks for several more associations from that second account to an ultimate "beneficiary."
During normal business, such transactions happen all the time. In isolation, everything can look equally suspicious (or benign). The signal we will use to indicate suspicion will be the presence of multiple such transfers of funds through unique sets of affiliates.
For this example, two entities (People or Companies) are affiliated if they share an address or if one owns shares in the other. Both are bidirectional (symmetric) relationships.
To identify candidates for investigation, we will assign a score based upon the product of:
- The total amount of money transferred between the originator and beneficiary
- The square of the number of the unique paths from originator to beneficiary
By squaring the number of unique paths, we reduce the tendency for this approach to flag all large transactions as suspicious, drowning out suspicious activity involving smaller dollar amounts.
To simplify the construction of these queries, we employ reasoning to define a :hasAffiliation
relationship. It is
defined in the aml_rules.ttl
file:
IF {
?x :holds ?holding .
?holding :company ?c .
?holding :share ?share .
filter (?share >= 50)
}
THEN {
?x :hasAffiliation ?c .
?c :hasAffiliation ?x .
}
IF {
?x :hasAddress ?a .
?y :hasAddress ?a .
filter (?x != ?y)
}
THEN {
?x :hasAffiliation ?y .
}
When running the queries in the example, make sure that reasoning is enabled any time the query references the
:hasAffiliation
relationship.
This example uses a small amount of data - 1000 each of People, Companies, plus their Addresses and holdings (Shares). Add to this the Transaction data, which is much larger - 23,259 in this example - but can be much larger in real scenarios. This can be a problem not only because of the sheer size of the data (its Volume), but also because it is constantly changing (its Velocity). These two qualities suggest that the transactions could be a candidate for virtualization.
During development of this example, I started with 100% native (materialized, not virtual) data. Once the queries were developed, I modified them to work with virtual transactions. It turned out that for the queries in this example, Stardog needed to look at 100% of the transactions to identify the paths and compute the scores. The final compromise was to use virtual transactions for all activity after a recent date, and to roll up all older transactions into a summary transaction for materialization in Stardog. Queries for all three approaches are included in the example queries.