
Problem Statement

Facebook has become one of the largest social networking sites, ranking as the sixth most trafficked site in the US (ComScore, 2008). One feature of interest in Facebook is the ability to create groups, which account holders can join. An account holder can be a member of multiple groups.

The primary goal of our project is to analyze patterns within group membership. Essentially, we will perform a market basket analysis on a subset of Facebook’s groups. For the purposes of the analysis, the “transaction table” is defined as a single transaction per person, with the “items” being the groups to which that person belongs. For reasons of privacy (and because other data is simply not available, depending on an account’s privacy settings), only the user’s ID will be collected.
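
The transaction definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the membership rows, user IDs, and group names below are made up, and the helper name `build_transactions` is hypothetical rather than part of the project.

```python
from collections import defaultdict

# Hypothetical membership rows: (user_id, group_id). All values are made up.
memberships = [
    (101, "g_green_party"),
    (101, "g_food_bank"),
    (102, "g_green_party"),
    (103, "g_chess_club"),
    (102, "g_food_bank"),
    (102, "g_chess_club"),
]

def build_transactions(rows):
    """Group membership rows into one transaction (a set of groups) per user."""
    transactions = defaultdict(set)
    for user_id, group_id in rows:
        transactions[user_id].add(group_id)
    return dict(transactions)

transactions = build_transactions(memberships)
for user_id, groups in sorted(transactions.items()):
    print(user_id, sorted(groups))
```

Each user ID maps to exactly one transaction, and the groups inside it play the role of the purchased items in a classic market basket.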

The rules mined may be useful for cross-marketing purposes. For instance, if a non-profit organization providing services to runaway youth finds that members of the Green Party group tend to belong to groups with the subtype “Service Organization”, the organization may find a connection with its local Green Party a profitable synergy.

The significance of this study is primarily academic. In addition to generating rule sets with existing tools such as Weka and the trial versions of three commercial tools, we will attempt to implement the association rule algorithms in PL/SQL. PL/SQL works directly within the Oracle database, providing efficient SQL capability along with the ability to process procedurally where SQL does not lend itself to the task (e.g. candidate itemset generation).
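
To show why candidate itemset generation is naturally procedural, here is a minimal Python sketch of the textbook Apriori join-and-prune step. It is not the project's PL/SQL code; the function name `generate_candidates` and the single-letter group labels are assumptions made for illustration.

```python
from itertools import combinations

def generate_candidates(frequent_prev):
    """Build size-k candidates from frequent (k-1)-itemsets (join, then prune).

    frequent_prev: a collection of sorted tuples, all of the same length k-1.
    """
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join step: merge two itemsets that agree on all but the last item.
            if a[:-1] != b[:-1]:
                break  # prev is sorted, so later itemsets cannot match either
            candidate = a + (b[-1],)
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(sub in prev_set for sub in combinations(candidate, len(a))):
                candidates.append(candidate)
    return candidates

# Hypothetical frequent 2-itemsets over placeholder groups A-D.
frequent_pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
triples = generate_candidates(frequent_pairs)
print(triples)
```

In this example ("B", "C", "D") is joined but then pruned, because its subset ("C", "D") is not among the frequent pairs.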

The implementation will be tailored to this dataset and will not be suitable for general-purpose association rule mining, but there may be an opportunity to generalize the methodology should it prove to be a significant performance improvement over existing third-party tools.

The existing tool we will use to mine the data is Weka, which can be configured to access the Oracle database that will house the dataset. In addition, two commercial products will be used to help find the association rules that exist within our dataset.

Primary Objective

Examine the information found within the user-is-member-of-group relationship. We created a Java module to extract data from Facebook nodes and populate any third-party data model.

  • The raw data went through several pre-processing stages to fit the needs of each application.
  • Treat each user as a transaction; each group to which the user belongs is an item in the transaction.

  • Examine several publicly available tools, as well as commercial tools with a trial period.
  • Weka
  • apriori: an application by Christian Borgelt implementing frequent item set mining via Apriori and Eclat (http://www.borgelt.net/apriori.html)

Secondary Objective

Implement the Apriori algorithm in PL/SQL.

  • We completed large itemset generation.
  • We decided to postpone rule generation due to time limitations.
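
For readers unfamiliar with large (frequent) itemset generation, the level-wise idea can be sketched in Python as follows. This is a simplified illustration, not a port of the PL/SQL implementation; the function name `frequent_itemsets`, the sample transactions, and the use of an absolute count as the support threshold are all assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset (frozenset): count} for all itemsets meeting min_support.

    transactions: list of sets of items; min_support: minimum absolute count.
    """
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Candidates: size-k unions of two frequent (k-1)-itemsets whose
        # (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support with one pass over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result

# Hypothetical transactions: one set of group names per user.
sample = [
    {"green", "food", "chess"},
    {"green", "food"},
    {"green", "chess"},
    {"food"},
]
large = frequent_itemsets(sample, min_support=2)
for itemset, count in sorted(large.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Rule generation, the postponed step, would take each large itemset found here and split it into an antecedent and a consequent.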

Future Development

  • Implement the Jaccard coefficient using pivoted data from the FB_Transaction table.
  • This will be a good starting point for identifying the distance between users.
  • Plot those distances, integrating our implementation with JUNG (Java Universal Network/Graph Framework).
  • Re-analyze the indexes that we created on our tables.
  • Improve the performance of our PL/SQL implementation.
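
As a rough sketch of the Jaccard idea mentioned above, the coefficient for two users is the size of the intersection of their group sets divided by the size of the union, and one minus that value can serve as a distance. The Python below works on in-memory sets rather than on the FB_Transaction table; the function names and group IDs are hypothetical, and returning 0.0 when both users have no groups is a convention chosen here.

```python
def jaccard_similarity(groups_a, groups_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two users' group sets."""
    a, b = set(groups_a), set(groups_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two users with no groups at all
    return len(a & b) / len(union)

def jaccard_distance(groups_a, groups_b):
    """Distance between two users: 1 minus the Jaccard coefficient."""
    return 1.0 - jaccard_similarity(groups_a, groups_b)

# Hypothetical users and group IDs.
user_a = {"g1", "g2", "g3"}
user_b = {"g2", "g3", "g4"}
print(jaccard_similarity(user_a, user_b))
print(jaccard_distance(user_a, user_b))
```

Pairwise distances computed this way could then be used as edge weights when plotting users as a graph.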