Apriori algorithm

Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets found by Apriori can be used to derive association rules that highlight general trends in the database, which has applications in domains such as market basket analysis.

Read more on Wikipedia.
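
The level-wise search described above can be sketched as follows. This is a minimal illustration, not the code from this repository: the names `Item`, `ItemSet`, `Transaction`, `count_support`, `all_subsets_frequent` and `frequent_item_sets` are made up for the example, and the minimum support is assumed to be an absolute transaction count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Illustrative types, not taken from the project sources.
using Item = std::string;
using ItemSet = std::set<Item>;
using Transaction = std::set<Item>;

// Count the transactions that contain every item of `candidate`.
std::size_t count_support(const ItemSet& candidate,
                          const std::vector<Transaction>& transactions) {
    std::size_t count = 0;
    for (const Transaction& t : transactions) {
        // Both ranges are sorted because they come from std::set.
        if (std::includes(t.begin(), t.end(), candidate.begin(), candidate.end())) {
            ++count;
        }
    }
    return count;
}

// Apriori pruning: every subset obtained by dropping one item must be frequent.
bool all_subsets_frequent(const ItemSet& candidate,
                          const std::map<ItemSet, std::size_t>& known) {
    for (const Item& item : candidate) {
        ItemSet subset = candidate;
        subset.erase(item);
        if (known.find(subset) == known.end()) {
            return false;
        }
    }
    return true;
}

// Return every item set found in at least `min_count` transactions,
// mapped to the number of transactions that contain it.
std::map<ItemSet, std::size_t> frequent_item_sets(
        const std::vector<Transaction>& transactions, std::size_t min_count) {
    assert(min_count > 0);
    std::map<ItemSet, std::size_t> result;

    // Level 1: the candidates are the individual items.
    std::set<ItemSet> candidates;
    for (const Transaction& t : transactions) {
        for (const Item& item : t) {
            candidates.insert(ItemSet{item});
        }
    }

    while (!candidates.empty()) {
        // Keep the candidates that appear often enough.
        std::vector<ItemSet> frequent;
        for (const ItemSet& c : candidates) {
            const std::size_t n = count_support(c, transactions);
            if (n >= min_count) {
                result[c] = n;
                frequent.push_back(c);
            }
        }

        // Join pairs of frequent k-sets into (k+1)-set candidates.
        std::set<ItemSet> next;
        for (std::size_t i = 0; i < frequent.size(); ++i) {
            for (std::size_t j = i + 1; j < frequent.size(); ++j) {
                ItemSet joined = frequent[i];
                joined.insert(frequent[j].begin(), frequent[j].end());
                if (joined.size() == frequent[i].size() + 1 &&
                    all_subsets_frequent(joined, result)) {
                    next.insert(joined);
                }
            }
        }
        candidates.swap(next);
    }
    return result;
}
```

Calling `frequent_item_sets(transactions, 2)` returns every item set that occurs in at least two transactions. The pruning step is what gives the algorithm its name: a set can only be frequent if all of its subsets are already known to be frequent.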

Source code

Steps in my implementation

  1. Read user expenses from table transactions.
  2. Create tuples (user, month, year, category, status) in table user_expenses.
    Status indicates how much money a user spent in a given category during a given month. It can be one of the following values: L – Low, M – Medium, H – High.
  3. Generate rules using the Apriori algorithm.
  4. Insert rules into tables associations_conditions and associations_implications.
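
Step 2 can be pictured roughly like this. The sketch below is an assumption about how the grouping might look, not the project's actual code: the `Expense` struct, the `Key` tuple, the functions `status_for` and `build_user_expenses`, and the two cut-off parameters `low_limit` and `high_limit` are all hypothetical. The real boundaries between L, M and H are not documented here.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical in-memory form of one row of the transactions table.
struct Expense {
    long user_id;
    int year;
    int month;
    long category;
    int amount;
};

// Key of one user_expenses tuple: (user, year, month, category).
using Key = std::tuple<long, int, int, long>;

// Map a monthly total to L, M or H using assumed cut-off values.
char status_for(long total, long low_limit, long high_limit) {
    assert(low_limit < high_limit);
    if (total < low_limit) {
        return 'L';
    }
    if (total < high_limit) {
        return 'M';
    }
    return 'H';
}

// Sum the amounts per (user, year, month, category) and classify each total.
std::map<Key, char> build_user_expenses(const std::vector<Expense>& rows,
                                        long low_limit, long high_limit) {
    std::map<Key, long> totals;
    for (const Expense& e : rows) {
        totals[Key{e.user_id, e.year, e.month, e.category}] += e.amount;
    }

    std::map<Key, char> statuses;
    for (const auto& entry : totals) {
        statuses[entry.first] = status_for(entry.second, low_limit, high_limit);
    }
    return statuses;
}
```

Each resulting (user, month, year, category, status) tuple then plays the role of an item for the rule generation in step 3.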

Installation

  • Install PostgreSQL and create a database with a table:
    transactions (id serial PRIMARY KEY, date date, amount integer, category bigint, user_id bigint)
  • Extract /Visual Studio project/pgsql/pgsql.zip – it contains the PostgreSQL library.
  • Copy the contents of pgsql/bin/ to the directory that contains apriori.exe.
  • Run:
    apriori.exe dbhost dbport dbname dbusername dbpassword min_confidence min_support
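
For orientation, here is how the two thresholds are usually defined for a rule "condition => implication". This is a sketch of the textbook definitions, not an extract from main.cpp: the function names are invented for the example, and it assumes both thresholds are fractions between 0 and 1. Whether apriori.exe expects fractions, percentages or counts is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of all transactions that contain condition and implication together.
double support(std::size_t count_both, std::size_t total) {
    return total == 0 ? 0.0 : static_cast<double>(count_both) / total;
}

// Fraction of the transactions containing the condition that also
// contain the implication.
double confidence(std::size_t count_both, std::size_t count_condition) {
    return count_condition == 0
               ? 0.0
               : static_cast<double>(count_both) / count_condition;
}

// A rule is kept only if it reaches both thresholds.
bool rule_passes(std::size_t count_both, std::size_t count_condition,
                 std::size_t total, double min_support, double min_confidence) {
    assert(count_both <= count_condition && count_condition <= total);
    return support(count_both, total) >= min_support &&
           confidence(count_both, count_condition) >= min_confidence;
}
```

For example, if 3 of 4 transactions contain the condition and 2 of those also contain the implication, the rule has support 2/4 and confidence 2/3.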

Output

Two tables are created to hold the generated association rules:

  • associations_conditions
  • associations_implications

If logging is turned on, all association rules are also written to the logs.txt file.

Remarks

  • The Debug build is very slow, so use the Release build for better performance.
  • To generate random transactions, define RANDOM_TEST in main.cpp.
  • Read this to understand the Apriori algorithm.