I’ve attached a CSV of the cleaned ads from the East Texas Archive website. I’m going to set up a Google Drive to store the images later, and you will be able to find them using the ‘image_path’ field. I had to remove a lot of the entries from the scraped East Texas data. About half of everything on the site is not a runaway or jailer ad; there are also a lot of entries from stories written about encounters with slaves in Texas. While these are interesting to read, they do not give any information about runaways, so I removed them from this dataset.
Ad_id is my unique ad ID. I would like to map this to whatever [advertisement.id] you end up generating for the ads, so please send me a mapping once that is done.
Source_ad_id is the identifier from the East Texas site.
I joined my newspaper table with this one for the sake of simplicity, so you will need to pull out the unique combinations of the newspaper columns and send me back a mapping to your new newspaper IDs.
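In case it helps, here is a minimal sketch of that deduplication step in Python. The column names newspaper_name and newspaper_city and the helper build_newspaper_mapping are placeholders, not the actual headers in the attached CSV, and the sequential integers simply stand in for whatever newspaper IDs you end up generating.

```python
import csv
import io


def build_newspaper_mapping(rows, newspaper_cols, start_id=1):
    """Assign one new ID per unique combination of newspaper column values."""
    mapping = {}
    for row in rows:
        key = tuple(row[col] for col in newspaper_cols)
        if key not in mapping:
            # IDs are handed out in order of first appearance.
            mapping[key] = start_id + len(mapping)
    return mapping


# Hypothetical sample data; the real column names come from the attached CSV.
sample_csv = """ad_id,newspaper_name,newspaper_city
1,Gazette A,Town X
2,Gazette A,Town X
3,Gazette B,Town Y
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
mapping = build_newspaper_mapping(rows, ["newspaper_name", "newspaper_city"])

for key, new_id in mapping.items():
    print(new_id, key)
```

The same dictionary can be used to attach the new newspaper ID to each ad row, which would also give the ad_id to [advertisement.id] mapping a natural place to live.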
Let me know if you have any questions.
Thanks,
Eric Anderson, CFA
Quantitative Analyst
Skylar Capital Energy Global Master Fund LP
(713) 341-7985 work
(281) 606-9371 cell