paper-detection


Algorithmic Detection of Computer-Generated Documents

Computer programs designed to generate documents that look like academic papers have been used to expose a lack of thorough human review at several conferences and journals. The documents include figures, formatting, and complete sentences which, on a shallow reading, appear to form a genuine paper. A human attempting to extract meaning from such a paper may realize that there is no coherent flow of ideas, and indeed that the paper is simply a well-formatted combination of randomly selected keywords.

A human familiar with the apparent subject matter of a paper can classify computer-generated papers as such with great accuracy. The question then arises as to whether we can identify computer-generated documents without resorting to an attempt at true understanding by a well-trained human. We propose an investigation into several potential methods, based on techniques from machine learning, for differentiating between computer-generated and authentic documents.
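As a rough illustration of the kind of method that could be investigated, here is a minimal sketch of a bag-of-words naive Bayes classifier using only the Python standard library. The choice of naive Bayes, the class and function names, and the two toy training sentences are all assumptions made up for this example; they are not taken from the project, and word frequencies alone may well turn out to be too weak a signal on real papers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_scores(self, text):
        """Return an unnormalized log-probability for each known label."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior: fraction of training documents with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            # Add-one smoothing keeps unseen words from zeroing the score.
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
clf = NaiveBayes()
clf.train(
    "we disconfirm that the famous stochastic algorithm for the emulation "
    "of symmetric encryption runs in quadratic time",
    "generated",
)
clf.train(
    "we measure the latency of the cache under load and report the "
    "results in table two",
    "authentic",
)

print(clf.classify("stochastic emulation of symmetric encryption"))
print(clf.classify("latency of the cache under load"))
```

A real experiment would need a labeled corpus of both kinds of papers and probably richer features than single words, such as how often topic keywords recur across sections.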

Ultimately, I'll be writing a Python web service (most likely running on Google App Engine) which will allow users to submit documents for real-time classification. That's where this project comes in. Stay tuned for updates at http://paperdetection.blogspot.com/

Project Information

Labels:
python appengine machinelearning