Runaway Project Start

I was asked to work on an AI that will pull data out of these ads and present some data and analysis.

Questions to be answered:

How many?

By year

By state

What time of year

Male/female

How many mulatto/mixed (insight into rape)

Any other signal you may be able to get.
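Once each ad has been reduced to a structured record, most of the questions above become simple counts. The following is a minimal sketch of that tally, not a finished analysis. The field names (`year`, `state`, `month`, `sex`, `mixed`) and the three sample rows are placeholders made up for illustration; the real fields would depend on what can actually be extracted from the ads.

```python
from collections import Counter

# Hypothetical record layout: one dict per ad, after extraction.
# The field names and the three rows are placeholders, not real data.
ads = [
    {"year": 1835, "state": "VA", "month": 7, "sex": "male", "mixed": False},
    {"year": 1835, "state": "NC", "month": 12, "sex": "female", "mixed": True},
    {"year": 1840, "state": "VA", "month": 8, "sex": "male", "mixed": None},
]


def season(month):
    """Map a month number (1-12) to a season name."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def summarize(records):
    """Count ads overall and by year, state, season, sex, and mixed flag."""
    return {
        "total": len(records),
        "by_year": Counter(r["year"] for r in records),
        "by_state": Counter(r["state"] for r in records),
        "by_season": Counter(season(r["month"]) for r in records),
        "by_sex": Counter(r["sex"] for r in records),
        # None means the ad did not say; it is kept as its own bucket.
        "by_mixed": Counter(r["mixed"] for r in records),
    }


summary = summarize(ads)
```

Keeping "not stated" as a separate bucket (here `None`) avoids silently counting ads with missing details as one category or the other.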

I would be honored to work on this project.

Here is how I see the project working:

It looks like some of the articles are already transcribed and some are not. If there is a sufficient sample of transcribed articles, I will only need to scrape them, load them into a database, and process them for a model. Otherwise I will need to add a transcription step to the plan below. Some transcription services available through Google might work, depending on the quality of the images.

  1. Data collection
    1. Scrape data and relevant characteristics and load into a database
  2. Create training dataset
    1. Select a sample of articles, add data labels (Amazon Mechanical Turk?)
  3. Build proof-of-concept model
    1. Share results and get feedback
  4. Finalize the model
  5. Refine and expand model to reveal more characteristics & improve accuracy.
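
To make the steps above more concrete, here is a rough sketch of how steps 1 to 3 could fit together on a tiny scale: load ad text into a database, attach hand labels, and score a simple baseline against them. Everything specific here is an assumption for illustration: the table layout, the `guess_sex` helper, the keyword lists, and the three invented sample ads. A keyword rule stands in for the eventual model only so there is something to measure early; it is not the proposed final approach.

```python
import re
import sqlite3

# Invented placeholder ads with hand labels; real rows would come from the scrape.
SAMPLE = [
    ("ad-1", "Ran away from the subscriber, a man named Tom, about 25 years of age.", "male"),
    ("ad-2", "Ran away on the 3d instant, a woman named Mary, with her child.", "female"),
    ("ad-3", "Twenty dollars reward. Left the plantation last week; he took a bay horse.", "male"),
]

# Assumed keyword lists for a first-pass baseline; these would need tuning.
MALE = re.compile(r"\b(man|boy|fellow|he|his|him)\b", re.IGNORECASE)
FEMALE = re.compile(r"\b(woman|girl|she|her|hers)\b", re.IGNORECASE)


def guess_sex(text):
    """Return 'male', 'female', or None when the keyword counts tie."""
    male_hits = len(MALE.findall(text))
    female_hits = len(FEMALE.findall(text))
    if male_hits > female_hits:
        return "male"
    if female_hits > male_hits:
        return "female"
    return None


# Step 1: load the collected ads into a database (in-memory for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id TEXT PRIMARY KEY, text TEXT, label TEXT)")
conn.executemany("INSERT INTO ads VALUES (?, ?, ?)", SAMPLE)

# Step 2: the label column holds the hand-assigned training labels.
rows = conn.execute(
    "SELECT text, label FROM ads WHERE label IS NOT NULL"
).fetchall()

# Step 3: score the baseline against the labels to get a number to share.
correct = sum(guess_sex(text) == label for text, label in rows)
accuracy = correct / len(rows)
print(f"baseline accuracy on {len(rows)} labeled ads: {accuracy:.0%}")
```

Having a baseline score like this early gives the feedback round in step 3 something concrete to react to, and gives later models a number to beat.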