These are all the English ads of sufficient resolution which did not already have a transcript.
From: Edward Eugene Baptist <[email protected]>
Sent: Wednesday, April 29, 2020 12:30 PM
To: Eric Anderson, CFA <[email protected]>; Brandon T. Kowalski <[email protected]>
Cc: Bill Perkins <[email protected]>
Subject: Re: [External] 11,743 Transcriptions
Wow! Quick question: are these “new,” corrected versions of transcripts already crowdsourced into the DB, or a mix of both?
Thanks,
Ed
PS: Bill and Eric, if you are up for another call, let me know. I have a favor to ask, based on your deep engagement and ideas for improvement of the database.
Edward E. Baptist
Professor, Department of History
450 McGraw Hall
Cornell University
Ithaca, NY 14853 USA
From: Eric Anderson, CFA
Sent: Wednesday, April 29, 2020 1:27 PM
To: Edward Eugene Baptist
Subject: 11,743 Transcriptions
These represent ~1,174 hours of work from 398 different MTurk workers. We did two transcriptions per ad, so the gross total is double that number of hours. You are welcome to use these however you want, but I might still make further changes to the transcripts (if I find some I’m not happy with, I might select a different final version or make other corrections).
Right now I’m working on feature extraction. I’m starting with an AI extraction, but I’ll probably feed the results of that into MTurk for final quality control. It’s going to take me some time to complete the model, but my proof-of-concept version is so far giving me decent results for a limited number of features.
Feature extraction will occur at the ad level. We will then use the most popular response per event group as the final features. That way we can reclassify ads to different events (if needed) without redoing any feature extraction, and see the downstream results right away.
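To make the "most popular response per event group" step concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not the actual pipeline: the function name, the dictionary layout, and the ad/event IDs are all hypothetical, and missing values are assumed to be stored as None.

```python
from collections import Counter, defaultdict


def aggregate_by_event(ad_features, ad_to_event):
    """Return, for each event, the most common non-missing value of each feature.

    ad_features: {ad_id: {feature_name: value or None}}  (hypothetical layout)
    ad_to_event: {ad_id: event_id}                       (hypothetical layout)
    """
    # votes[event_id][feature_name] counts how often each value was extracted
    votes = defaultdict(lambda: defaultdict(Counter))
    for ad_id, features in ad_features.items():
        event_id = ad_to_event[ad_id]
        for feature_name, value in features.items():
            if value is not None:  # skip features the ad did not mention
                votes[event_id][feature_name][value] += 1

    # Ties go to the value that was counted first (Counter keeps insertion order).
    return {
        event_id: {
            feature_name: counter.most_common(1)[0][0]
            for feature_name, counter in by_feature.items()
        }
        for event_id, by_feature in votes.items()
    }
```

Because the ad-level extractions are stored separately from the ad-to-event mapping, moving an ad to a different event only means changing one entry in ad_to_event and calling the function again; nothing has to be re-extracted.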
The features I’m going to extract are:
-Name
-Age
-Sex
-Skin tone/complexion (three categories: light, average, dark)
-Height
-Weight
-Eye color
-Scars or identifying marks
-Language spoken
-Reward offered
-Secondary rewards offered
-Time of year runaway event occurred
-Runaway location origin
-Occupation or skill set
-Enslaver name
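One possible way to hold these features per ad is sketched below in Python. The class and field names are hypothetical, and storing every value as optional free text is an assumption (ads often give approximate values); only the three complexion categories come from the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# The three complexion categories named in the feature list.
COMPLEXIONS = ("light", "average", "dark")


@dataclass
class AdFeatures:
    """Features extracted from one ad; None means the ad did not mention it."""
    name: Optional[str] = None
    age: Optional[str] = None
    sex: Optional[str] = None
    complexion: Optional[str] = None
    height: Optional[str] = None
    weight: Optional[str] = None
    eye_color: Optional[str] = None
    scars_or_marks: Optional[str] = None
    language_spoken: Optional[str] = None
    reward_offered: Optional[str] = None
    secondary_rewards: Optional[str] = None
    time_of_year: Optional[str] = None
    origin_location: Optional[str] = None
    occupation_or_skills: Optional[str] = None
    enslaver_name: Optional[str] = None

    def __post_init__(self):
        # Reject complexion values outside the three agreed categories.
        if self.complexion is not None and self.complexion not in COMPLEXIONS:
            raise ValueError(f"complexion must be one of {COMPLEXIONS}")


def missing_features(record: AdFeatures) -> List[str]:
    """List the feature names that are still empty, e.g. to route to manual review."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]
```

A record with only a few fields filled in can then be passed to missing_features to see what a quality-control pass would still need to check.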
Let me know if you want me to add anything. I’ll begin work on splitting up the new ads you sent me after I’ve made some more progress on feature extraction. I’ll touch base as I have new results to share.
Thanks,
Eric Anderson, CFA
Quantitative Analyst
Skylar Capital Energy Global Master Fund LP
(713) 341-7985 work
(281) 606-9371 cell