USA elections 2016

Project data and workflow

Contents

  • Data: The initial data set. It consists of four files containing the tweets and users for each contender.
  • Classifier: The naive Bayes classifier (a jar file based on Apache Spark ML), including the training data
  • Scripts: The scripts that, executed sequentially, constitute the workflow
  • Workflow: PDF file discussing each step of the workflow and showing the code
  • hr.py: Python code that obtains the thresholds for the opinion labels and their averaged precision

Requirements

The workflow requires MongoDB 3.4 or higher running locally. The classifier requires a Java runtime (JRE) 1.8 or higher. The Python program that obtains the thresholds (hr.py) requires Python 3.

Please send any comments to rafacr@ucm.es
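The version requirements listed under Requirements can be checked before running the workflow. The following is a minimal sketch and is not part of the project: the helper names (parse_version, meets_minimum, tool_output) are hypothetical, and it assumes that the `java` and `mongod` executables are on the PATH. It only compares installed versions; it does not confirm that a MongoDB server is actually running locally.

```python
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Return the first version number found in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+|\d+)", text)
    if match is None:
        raise ValueError("no version number in: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_text, minimum):
    """True if the version found in version_text is at least `minimum`."""
    return parse_version(version_text) >= tuple(minimum)


def tool_output(command):
    """Run a command and return its combined output, or None if unavailable."""
    if shutil.which(command[0]) is None:
        return None
    try:
        # `java -version` writes to stderr, so both streams are merged.
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout


if __name__ == "__main__":
    # Minimum versions taken from the Requirements section.
    print("Python 3:", "ok" if sys.version_info >= (3,) else "too old")
    checks = [
        ("Java 1.8+", ["java", "-version"], (1, 8)),
        ("MongoDB 3.4+", ["mongod", "--version"], (3, 4)),
    ]
    for label, command, minimum in checks:
        output = tool_output(command)
        if output is None:
            print(label + ": not found")
            continue
        try:
            status = "ok" if meets_minimum(output, minimum) else "too old"
        except ValueError:
            status = "version not recognized"
        print(label + ": " + status)
```

Versions are compared as integer tuples, so both the legacy Java numbering ("1.8.0_281") and the newer one ("11.0.2") compare correctly against the 1.8 minimum.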