Database crawler
Thought this would be helpful in case anyone wants to create a local repository of items ;)
Edit: "parser" would be better terminology. This script goes through the pages on either Blizzard's or Wowhead's site, retrieves all the information about every item whose ID lies between start and finish, and stores it in a pickled dictionary. Python 2.7 code:
import asyncore
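Only the first line of the script survives here, so this is not the original code. It is a minimal sketch of the flow described above: walk the IDs from start to finish, keep whatever comes back, and pickle the resulting dictionary. `fetch_item` is a hypothetical callable standing in for the asyncore download-and-parse step, and the sketch treats `finish` as inclusive. It is written for Python 3.

```python
import pickle


def crawl(start, finish, fetch_item):
    """Collect item data for every ID from start to finish (inclusive).

    fetch_item is a hypothetical callable: given an item ID it returns a dict
    of item fields, or None when no item exists with that ID.
    """
    items = {}
    for item_id in range(start, finish + 1):
        data = fetch_item(item_id)
        if data:  # skip IDs that don't correspond to an item
            items[item_id] = data
    return items


def save_items(items, path):
    # Pickle data is binary, so the file must be opened in binary mode.
    with open(path, "wb") as f:
        pickle.dump(items, f, pickle.HIGHEST_PROTOCOL)


def load_items(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

Passing the fetcher in as a function keeps the crawl loop testable without touching the network; for a dry run, a plain dictionary's `get` method works as `fetch_item`.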
Interesting. I downloaded the Windows build of Python 2.7.6 to try this out, but it wasn't working for me. After adding a few print statements to the main loop, I determined that consumer.text is always an empty string. Am I doing something wrong?
This is the version I'm using: Code:
import asyncore
On a side note, since Battle.net uses a completely different HTML layout for its item pages than Wowhead, the script won't recognise the start of an item box on Battle.net item pages (Battle.net uses <h2 class="color-q*"> for its item names instead of <b class="q*"> [where * is a numeric quality index]).
Finally, is there any particular reason you're scraping web pages instead of using the official item data web APIs provided by Blizzard and Wowhead?
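Here is a minimal sketch of recognising the item name under both layouts mentioned above. The regular expressions assume the tags appear exactly as `<b class="q*">` and `<h2 class="color-q*">` with no extra attributes. Real pages may contain other matching markup before the item box, so treat this as an illustration rather than a robust parser.

```python
import re

# Wowhead item boxes start with <b class="q4">Name</b>; Battle.net item pages
# use <h2 class="color-q4">Name</h2> instead. The digit is the quality index.
ITEM_NAME_PATTERNS = [
    re.compile(r'<b class="q(\d)">(.*?)</b>', re.S),
    re.compile(r'<h2 class="color-q(\d)">(.*?)</h2>', re.S),
]


def find_item_name(html):
    """Return (quality, name) from the first layout that matches, or None."""
    for pattern in ITEM_NAME_PATTERNS:
        match = pattern.search(html)
        if match:
            return int(match.group(1)), match.group(2).strip()
    return None
```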
I'm just curious: wouldn't an in-game query via GetItemInfo (or scanning the tooltip) deliver the same data?
And yes. I don't play on retail WoW, so the data I would be scraping wouldn't match my game version. I couldn't explain why consumer.text is empty; I was having some trouble getting it to work inside the loop as well. I ended up testing the asynchronous connection against another page to make SURE I could pull data at all, then rewrote the logic that decides when it pulls. I never identified the real issue, but I believe it was because I created the consumer object outside of the while loop.
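If the culprit really was a consumer created once outside the loop, the fix amounts to building a fresh one for every item ID. This sketch shows that pattern. `make_consumer`, its `run()` method and the `StubConsumer` class are hypothetical stand-ins for the asyncore-based consumer in the script.

```python
def fetch_pages(item_ids, make_consumer):
    """Fetch each item page with its own, freshly created consumer.

    make_consumer is a hypothetical factory: called with an item ID, it
    returns an object whose run() downloads the page into its .text
    attribute. Creating it inside the loop means no buffer or connection
    state leaks from one request into the next.
    """
    pages = {}
    for item_id in item_ids:
        consumer = make_consumer(item_id)  # new object on every iteration
        consumer.run()
        if consumer.text:  # an empty body means nothing usable came back
            pages[item_id] = consumer.text
    return pages


class StubConsumer(object):
    """Stand-in for the real asyncore-based consumer, for illustration only."""

    def __init__(self, item_id):
        self.item_id = item_id
        self.text = ""

    def run(self):
        self.text = "<html>item %d</html>" % self.item_id
```

Calling `fetch_pages(range(start, finish), StubConsumer)` exercises the loop without any network access. Passing the class (a factory) rather than an instance is what guarantees a new consumer per request.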
I just did a few tests, and it seems that Wowhead either drops the connection while returning data or returns nothing at all. I did post this script on the Wowhead site, so I wouldn't put it past the admins to block this type of crawling to prevent DoS attacks, etc.
Blizzard's site works fine, and so does the DB I use for my version of WoW.
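If Wowhead is throttling or dropping crawler connections, pausing and retrying with growing delays is gentler on the server than hammering it. This is a minimal sketch that assumes a hypothetical `fetch(item_id)` returning the page body as a string (empty on failure), which may raise on a dropped connection. `sleep` is injectable so the backoff can be tested without actually waiting.

```python
import time


def fetch_with_retry(fetch, item_id, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(item_id) until it returns non-empty text.

    The wait doubles after each failed attempt, so a server that is
    throttling us gets asked less often, not more. Returns "" if every
    attempt fails.
    """
    for attempt in range(retries):
        try:
            text = fetch(item_id)
        except (IOError, OSError):  # dropped connection, reset, timeout...
            text = ""
        if text:
            return text
        if attempt < retries - 1:
            sleep(delay * (2 ** attempt))
    return ""
```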
Note that the game's DBC files already contain a lot of item data.
Depending on the type of crawling you need to do, you might well get enough data by reading the files Blizzard uses for the client's internal database. Just saying! :)
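Older clients (3.3.5, for example) use the classic WDBC format, which has a simple layout. It starts with a 20-byte header: the magic `WDBC`, then the record count, field count, record size and string-block size as little-endian 32-bit integers. The fixed-size records follow, and last comes a block of NUL-terminated strings that string fields point into by offset. This sketch reads only that generic structure. It does not know what any column in a particular file such as Item.dbc means, and it assumes every field is 4 bytes wide.

```python
import struct

# WDBC header: 4-byte magic followed by four little-endian uint32 values.
HEADER = struct.Struct("<4s4I")


def read_dbc(data):
    """Split raw WDBC bytes into (records, string_block).

    Each record is a tuple of uint32 values; which column means what depends
    on the specific .dbc file and client version.
    """
    magic, record_count, field_count, record_size, string_size = HEADER.unpack_from(data, 0)
    if magic != b"WDBC":
        raise ValueError("not a WDBC file: %r" % magic)
    if record_size != field_count * 4:
        raise ValueError("this sketch only handles files whose fields are all 4 bytes")
    record_fmt = struct.Struct("<%dI" % field_count)
    offset = HEADER.size
    records = []
    for _ in range(record_count):
        records.append(record_fmt.unpack_from(data, offset))
        offset += record_size
    string_block = data[offset:offset + string_size]
    return records, string_block


def dbc_string(string_block, offset):
    """Read the NUL-terminated string that starts at `offset` in the string block."""
    end = string_block.index(b"\0", offset)
    return string_block[offset:end].decode("utf-8")
```

Typical use is `records, strings = read_dbc(open(path, "rb").read())`, then `dbc_string(strings, record[n])` for any column you know to be a string offset.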
Also, there is a RESTful API for querying item data on Battle.net: http://blizzard.github.io/api-wow-docs/
Example for the Castlebreaker Bracers: http://eu.battle.net/api/wow/item/103759 (replace eu with us for US server data).
Edit: Wowhead has its own equivalent as well: adding &xml to any Wowhead URL gives you the data as XML, without all the page markup: http://wowhead.com/item=103759&xml
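This sketch builds both URLs and pulls a few fields out of the Wowhead XML. The URL patterns are the ones quoted above and may have changed since. The element names read by `parse_wowhead_xml` (`item`, `name`, `quality`, `inventorySlot`) are an assumption about the response shape, so check them against a real response before relying on them.

```python
import xml.etree.ElementTree as ET


def battlenet_item_url(item_id, region="eu"):
    # URL pattern from the post; use region="us" for US server data.
    return "http://%s.battle.net/api/wow/item/%d" % (region, item_id)


def wowhead_item_url(item_id):
    # Appending &xml asks Wowhead for the item data without the page markup.
    return "http://wowhead.com/item=%d&xml" % item_id


def parse_wowhead_xml(xml_text):
    """Pull a few fields out of a Wowhead item XML response.

    The element names used here are an assumption about the response
    format. Returns None when there is no <item> element (e.g. unknown ID).
    """
    item = ET.fromstring(xml_text).find("item")
    if item is None:
        return None
    quality = item.find("quality")
    return {
        "id": int(item.get("id")),
        "name": item.findtext("name"),
        "quality": int(quality.get("id")) if quality is not None else None,
        "slot": item.findtext("inventorySlot"),
    }
```

Parsing a structured response like this avoids the layout differences between the two sites' HTML pages entirely.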
My question is for self-education. I tried Googling for code to read and display a pickled DB, but came up with few examples I could tweak. Would you mind posting some code that reads the DB file you create? I realize there are better ways to do this, but I'm dabbling in Python.
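Here is a minimal sketch of reading such a file back and printing a few entries. It assumes, as described in the first post, that the pickle holds a dictionary keyed by item ID. The path you pass in is up to you. Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.

```python
import pickle
import pprint


def show_items(path, limit=5):
    """Load the pickled item dictionary and pretty-print its first few entries."""
    with open(path, "rb") as f:  # pickles must be read in binary mode
        items = pickle.load(f)
    print("%d items loaded" % len(items))
    for item_id in sorted(items)[:limit]:
        print(item_id)
        pprint.pprint(items[item_id])
    return items
```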
This will look for all items with a specific slot and stat (such as Intellect or increased spell damage).
Code:
import pickle
Code:
import pickle
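The posted code is truncated, so here is a sketch of the same kind of query on an already-loaded dictionary. The record layout is hypothetical. Each item is assumed to be a dict with a `slot` string and a `stats` dict mapping stat names to values, so rename the keys to match whatever the crawler actually stores.

```python
def find_items(items, slot, stat):
    """Return (item_id, name, value) for every item in `slot` that carries `stat`.

    Hypothetical record layout: items maps item ID -> dict with a "slot"
    string (e.g. "Wrist") and a "stats" dict (e.g. {"Intellect": 12}).
    """
    matches = []
    for item_id, item in items.items():
        if item.get("slot") != slot:
            continue
        value = item.get("stats", {}).get(stat)
        if value is not None:
            matches.append((item_id, item.get("name"), value))
    # Highest stat value first.
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches
```

After loading the pickle, a call such as `find_items(items, "Wrist", "Intellect")` lists matching wrist items, strongest first.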
© 2004 - 2022 MMOUI