Problem Statement
Facebook has become one of the largest social networking sites, ranking as the sixth most trafficked site in the US (ComScore, 2008). One Facebook feature of particular interest is the ability to create groups. Any Facebook account holder can join a group, and an account holder can be a member of multiple groups. The primary goal of our project is to analyze patterns within group membership.

Essentially, we will be performing a market basket analysis on a subset of Facebook’s groups. For the purposes of the analysis, the “transaction table” is defined as a single transaction per person, with the “items” being the groups to which that person belongs. For privacy reasons (and because, depending on an account’s privacy settings, the data is simply not available), only the user’s ID will be collected.

The information mined may be useful for cross-marketing purposes. For instance, if a non-profit organization providing services to runaway youth finds that members of the Green Party group tend to belong to groups with the subtype “Service Organization”, the organization may find a connection with its local Green Party a profitable synergy.

The significance of this study is primarily academic. In addition to generating rule sets with existing tools such as Weka and the trial versions of three commercial tools, we will also attempt to implement the association rule algorithms in PL/SQL. PL/SQL works directly within the ORACLE database, providing efficient SQL capability along with the ability to process data procedurally where SQL does not lend itself to the task (e.g., candidate itemset generation). The implementation will be targeted specifically at this dataset and will not be suitable for general-purpose association rule mining, but there may be an opportunity to generalize the methodology should it prove to be a significant performance improvement over existing third-party tools.
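To make the market basket framing concrete, here is a minimal Java sketch of the two measures an association rule is usually judged by, computed over per-user transactions (each transaction is the set of groups one user belongs to). The class name `RuleMetrics` and the group IDs are hypothetical illustrations, not data or code from this project.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Support and confidence over per-user transactions (sets of group IDs). */
class RuleMetrics {

    /** support(X): fraction of transactions containing every item of X. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long hits = transactions.stream()
                .filter(t -> t.containsAll(itemset))
                .count();
        return (double) hits / transactions.size();
    }

    /** confidence(X -> Y) = support(X union Y) / support(X). */
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        double supportX = support(transactions, x);
        if (supportX == 0.0) {
            return 0.0; // the rule's antecedent never occurs
        }
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return support(transactions, union) / supportX;
    }

    public static void main(String[] args) {
        // Hypothetical data: one transaction per user, items are group IDs.
        List<Set<String>> transactions = List.of(
                Set.of("greenParty", "serviceOrg"),
                Set.of("greenParty", "serviceOrg", "chessClub"),
                Set.of("greenParty", "chessClub"),
                Set.of("serviceOrg"));
        Set<String> x = Set.of("greenParty");
        Set<String> y = Set.of("serviceOrg");
        System.out.println(support(transactions, Set.of("greenParty", "serviceOrg"))); // 0.5
        System.out.println(confidence(transactions, x, y)); // 2/3 as a double
    }
}
```

Here the rule {greenParty} → {serviceOrg} holds in two of the three transactions that contain greenParty, which is what a confidence of 2/3 expresses.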
The existing tool we will use to mine the data is Weka, which can be configured to access the ORACLE database housing the dataset. In addition, two commercial products will be used to help find the association rules that exist within the dataset.

Primary Objective
Examine the information found within the user-is-member-of-group relationship. We will create a Java module to extract data from Facebook nodes and populate any third-party data model. Each user is one transaction, and each group to which the user belongs is an item in that transaction.
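One way to keep the extraction module independent of the target data model is to have it emit (user ID, group ID) pairs into a pluggable sink. The sketch below assumes a hypothetical `MembershipSink` interface and an in-memory `TransactionBuilder` that groups those pairs into one transaction per user. It does not call any Facebook API; a production sink might instead insert rows into the ORACLE table that Weka reads.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical destination for extracted (user, group) membership pairs. */
interface MembershipSink {
    void accept(String userId, String groupId);
}

/** In-memory sink: one transaction per user, items are that user's groups. */
class TransactionBuilder implements MembershipSink {
    // userId -> sorted set of group IDs; duplicate pairs collapse automatically
    private final Map<String, Set<String>> baskets = new LinkedHashMap<>();

    @Override
    public void accept(String userId, String groupId) {
        baskets.computeIfAbsent(userId, k -> new TreeSet<>()).add(groupId);
    }

    /** Unmodifiable view of the map, in first-seen user order (inner sets are not copied). */
    Map<String, Set<String>> transactions() {
        return Collections.unmodifiableMap(baskets);
    }

    public static void main(String[] args) {
        TransactionBuilder builder = new TransactionBuilder();
        String[][] rows = {{"u1", "g7"}, {"u2", "g3"}, {"u1", "g3"}, {"u1", "g7"}};
        for (String[] row : rows) {
            builder.accept(row[0], row[1]);
        }
        System.out.println(builder.transactions()); // {u1=[g3, g7], u2=[g3]}
    }
}
```

Because only user IDs and group IDs pass through the interface, the same extractor could feed a relational table, a Weka file, or a commercial tool's import format by swapping the sink.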
Weka Apriori
Apriori: an application by Christian Borgelt implementing frequent item set mining via Apriori and Eclat (http://www.borgelt.net/apriori.html)

Secondary Objective
To implement the Apriori algorithm in PL/SQL.
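For reference, the level-wise Apriori algorithm that a PL/SQL implementation would target can be sketched in Java. Each pass counts the current candidates in one scan of the transactions and keeps those meeting a minimum support count. It then builds the next level's candidates with the usual join step (merge k-itemsets that share their first k-1 items) and prune step (drop any candidate with an infrequent k-subset). This is an illustrative in-memory sketch, not the planned PL/SQL code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Level-wise Apriori: all itemsets whose support count is >= minCount. */
class Apriori {

    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minCount) {
        Map<Set<String>, Integer> result = new HashMap<>();
        // Level 1: every distinct item is a candidate 1-itemset.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                candidates.add(new TreeSet<>(Set.of(item)));
            }
        }
        while (!candidates.isEmpty()) {
            // Counting pass: one scan of the transactions per level.
            Map<Set<String>, Integer> counts = new HashMap<>();
            for (Set<String> t : transactions) {
                for (Set<String> c : candidates) {
                    if (t.containsAll(c)) {
                        counts.merge(c, 1, Integer::sum);
                    }
                }
            }
            List<TreeSet<String>> frequent = new ArrayList<>();
            for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
                if (e.getValue() >= minCount) {
                    result.put(e.getKey(), e.getValue());
                    frequent.add(new TreeSet<>(e.getKey()));
                }
            }
            candidates = generateCandidates(frequent);
        }
        return result;
    }

    /** Join frequent k-itemsets sharing their first k-1 items, then prune. */
    static Set<Set<String>> generateCandidates(List<TreeSet<String>> frequentK) {
        Set<Set<String>> frequentSet = new HashSet<>(frequentK);
        Set<Set<String>> next = new HashSet<>();
        for (int i = 0; i < frequentK.size(); i++) {
            for (int j = i + 1; j < frequentK.size(); j++) {
                TreeSet<String> a = frequentK.get(i);
                TreeSet<String> b = frequentK.get(j);
                // Join step: identical prefixes (all items but the last).
                if (!a.headSet(a.last()).equals(b.headSet(b.last()))) {
                    continue;
                }
                TreeSet<String> candidate = new TreeSet<>(a);
                candidate.add(b.last());
                // Prune step: every k-subset must itself be frequent.
                boolean allSubsetsFrequent = true;
                for (String item : candidate) {
                    TreeSet<String> subset = new TreeSet<>(candidate);
                    subset.remove(item);
                    if (!frequentSet.contains(subset)) {
                        allSubsetsFrequent = false;
                        break;
                    }
                }
                if (allSubsetsFrequent) {
                    next.add(candidate);
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
                Set.of("a", "b", "c"), Set.of("a", "b"), Set.of("a", "c"),
                Set.of("b", "c"), Set.of("a", "b", "c"));
        System.out.println(frequentItemsets(transactions, 3).get(Set.of("a", "b"))); // 3
    }
}
```

In a PL/SQL version, the counting pass could likely be expressed as a SQL aggregation over the membership table, while the join and prune steps correspond to the procedural candidate itemset generation described in the problem statement.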
Future Development
This work should be a good starting point for identifying the distance between users.
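The project does not define a distance measure. One plausible choice, assumed here, is the Jaccard distance between two users' group sets, 1 − |A ∩ B| / |A ∪ B|, which is 0 for users in exactly the same groups and 1 for users with no groups in common. A minimal sketch (the class name is hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

/** Jaccard distance between two users' group-membership sets (an assumed metric). */
class UserDistance {

    static double jaccardDistance(Set<String> groupsA, Set<String> groupsB) {
        if (groupsA.isEmpty() && groupsB.isEmpty()) {
            return 0.0; // convention: two users with no groups count as identical
        }
        Set<String> intersection = new HashSet<>(groupsA);
        intersection.retainAll(groupsB);
        Set<String> union = new HashSet<>(groupsA);
        union.addAll(groupsB);
        return 1.0 - (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> u1 = Set.of("g1", "g2", "g3");
        Set<String> u2 = Set.of("g2", "g3", "g4");
        System.out.println(jaccardDistance(u1, u2)); // 1 - 2/4 = 0.5
    }
}
```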