Hey Bill, here is a quick update on the Runaway project:
-The Turks have finished transcribing about 7,000 ads. I’ve broken the ads up into batches of 1,000 ads (2,000 jobs) and I post them one batch at a time, because I’ve discovered that if I post too many at once the Turks do all the easy transcriptions and skip the difficult ones, leaving the batches only ~75% done. If they keep this pace up, I expect we should have all 14,000 ads transcribed sometime in the middle of next week.
-I am pushing the ads through our data cleaning process as the batches complete. So far I’ve found no problems, but I’ll continue doing spot checks just to be sure. I designed the system to be very forgiving if we need to adjust which transcript is the best or how we are grouping into events.
-After some research this week into various text feature extraction projects, I’ve started building a POC model that will hopefully act as the framework for every feature extraction task we want to run. I’m doing my initial test on rewards: I want the AI to read the entire ad and tell me all the different rewards offered. This is harder than it sounds because one ad usually offers multiple rewards and makes multiple mentions of the same reward. If the AI is successful, it will tell me all the unique rewards and where in the text they are mentioned. This is the same kind of problem that will need to be solved for all feature extractions: one runaway could be mentioned multiple times, or different runaways could each be mentioned once, and a successful AI will know the difference.
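To make the rewards task concrete, here is a minimal sketch of the kind of output described above: each unique reward, plus every place in the ad where it is mentioned. This is not the POC model. It swaps the AI for a naive regular expression, and the function name, the pattern, the small number-word table, and the sample ad text are all made up for illustration. It also assumes that two mentions of the same dollar amount refer to the same reward, which is exactly the kind of judgment a real model would have to make from context.

```python
import re
from collections import defaultdict

# Hypothetical lookup for a few spelled-out amounts; a real ad set needs far more.
WORD_VALUES = {"five": 5, "ten": 10, "twenty": 20, "fifty": 50}

# Matches "$20", "20 dollars", or "twenty dollars" (case-insensitive).
REWARD_PATTERN = re.compile(
    r"\$\s?(?P<sym>\d+)"
    r"|\b(?P<num>\d+|" + "|".join(WORD_VALUES) + r")\s+dollars\b",
    re.IGNORECASE,
)

def find_unique_rewards(ad_text):
    """Return {amount: [(start, end), ...]} for every reward mention found.

    Naive assumption: mentions with the same dollar amount are the same reward.
    """
    rewards = defaultdict(list)
    for m in REWARD_PATTERN.finditer(ad_text):
        raw = m.group("sym") or m.group("num")
        amount = int(raw) if raw.isdigit() else WORD_VALUES[raw.lower()]
        rewards[amount].append((m.start(), m.end()))
    return dict(rewards)

# Invented sample text, not taken from the transcribed ads.
sample_ad = (
    "TWENTY DOLLARS REWARD. Ran away from the subscriber. "
    "I will give $20 if taken in the county, or 50 dollars if out of the state."
)

for amount, spans in find_unique_rewards(sample_ad).items():
    mentions = [sample_ad[start:end] for start, end in spans]
    print(amount, mentions)
```

In the sample, the headline and the "$20" in the body collapse into one reward with two mentions, while the 50 dollar offer stays separate. The same shape of output (unique items plus their text positions) would apply to other features such as runaways.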
I’ll continue to touch base as progress is made.