Wednesday, 7 August 2013

Scraperwiki Script Size Limits

I'm a beginner scripter/coder. I have built several scrapers successfully
using classic Scraperwiki.
My current project requires that I scrape a web database using a list of
138,000+ IDs. The IDs range from 10,000 to 1,300,000. In other words, they
are not consecutive.
Each result page's URL is the same, with the ID appended at the end:
$url =
'https://xyz.com/rpts/rwservlet?blah&report=blah_person_detail.rdf&p_personid='
I thought I could store the 138k IDs in an array and loop through the
array to build each $fullUrl:
$ids = array(123, 125, 129, ...);  // 138,000+ IDs in total
foreach ($ids as $id) {
    $fullUrl = $url . $id;
}
I expect this would work. However, Scraperwiki (the web server or my
browser) does not handle such a large paste operation and essentially times
out without pasting anything.
I'm wondering if there is a way to store the ID array in an external file
and reference it from my Scraperwiki script.
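Something along these lines is what I'm imagining (a rough sketch only: the
external file URL below is just a placeholder, and I'm assuming the classic
scraperwiki::scrape helper can fetch a plain-text file with one ID per
line):

require 'scraperwiki.php';

$url = 'https://xyz.com/rpts/rwservlet?blah&report=blah_person_detail.rdf&p_personid=';

// Pull the ID list from an externally hosted file (placeholder location)
// instead of pasting 138,000+ values into the editor.
$idFile = scraperwiki::scrape('http://example.com/person_ids.txt');
$ids = array_filter(array_map('trim', explode("\n", $idFile)));

foreach ($ids as $id) {
    $fullUrl = $url . $id;
    $html = scraperwiki::scrape($fullUrl);
    // ...parse $html and save each record here
}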
Any input is much appreciated!
