This is a quick script I made to pull links from Yahoo search using the BOSS Search API and then list the unique domains.
If you want the full links instead, just modify the script so that the whole URLs are appended to the list (the linksdump list already collects them). Yahoo does not let you fetch all the results, only up to a predefined limit, so this code extracts only about 800 domains. But that is still good enough for a start and for most uses.
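The pagination works by passing count and start parameters in the request URL, as the script below does. Here is a small helper that builds one page's request URL the same way; boss_url is a hypothetical name I'm using for illustration, not part of the original script:

```python
def boss_url(query, appid, start, count=100):
    # Build a BOSS v1 web-search URL for one page of results,
    # mirroring the URL format used in the script below.
    base = "http://boss.yahooapis.com/ysearch/web/v1/"
    return (base + query.replace(" ", "+")
            + "?appid=" + appid
            + "&format=json"
            + "&count=" + str(count)
            + "&start=" + str(start))
```

Each loop iteration just bumps start past the results already seen and requests the next page.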
I am also working on getting citation counts from Google Scholar for a friend; I will post that here soon. Here's the code for now.
#!/usr/bin/python
# Python 2: urllib.urlopen and the urlparse module were replaced in Python 3.
import urllib
import json
from urlparse import urlparse

yahoo_application_id = "Ht18VqTV34EMRWTJKOOh4rNBWTqkrjTSSQj9JwWlsqTMK41_3oFWFnhivJipX0wnvU4qzXc9VAw-"
nextresult = 0
links = list()      # unique scheme://domain entries
linksdump = list()  # every full URL seen

while True:
    print "trying result from " + str(nextresult)
    f = urllib.urlopen("http://boss.yahooapis.com/ysearch/web/v1/search+engine+optimization+software?appid="
                       + yahoo_application_id + "&format=json&count=100&start=" + str(nextresult))
    ssjson = json.JSONDecoder().decode(f.read())
    totalhits = int(ssjson["ysearchresponse"]["totalhits"])
    print totalhits
    for x in ssjson["ysearchresponse"]["resultset_web"]:
        url = x["url"]
        o = urlparse(url)
        linksdump.append(url)
        # Keep only the scheme and domain, e.g. "http://example.com"
        link = o.scheme + "://" + o.netloc
        if link not in links:
            links.append(link)
        nextresult = nextresult + 1
    if nextresult > 10000:
        break

print "Obtained results: " + str(nextresult) + " of which " + str(len(links)) + " were unique."
for x in links:
    print x
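The domain-extraction step boils down to keeping scheme://netloc for each URL and deduplicating while preserving order. A minimal sketch of just that part, written for Python 3 (where urlparse moved into urllib.parse); unique_domains is an illustrative name, not a function from the script:

```python
from urllib.parse import urlparse

def unique_domains(urls):
    # Reduce each URL to scheme://netloc and keep first-seen order,
    # the same dedup logic the script applies to its search results.
    seen = []
    for url in urls:
        o = urlparse(url)
        domain = o.scheme + "://" + o.netloc
        if domain not in seen:
            seen.append(domain)
    return seen
```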
Cool, huh? If you want any help modifying this, drop me a line.