r/12winArenaLog is a subreddit that collects lists of Hearthstone decks that reached 12 wins in Arena, which is the best you can do! I wanted to read the data from the subreddit using the Python Reddit API Wrapper (PRAW), crunch some numbers, and graph them with matplotlib. The issue is that there are basically no formatting rules, so everyone writes their list however they want! They usually look something like this, though:
- Cold Blood x 2
- 1x Argent Squire
- 2x Shattered Sun Cleric
What I tried to do is basically get the data, find the part that gives the number of copies of that card in the deck (e.g. 2x), store the number, and then remove that bit from the card name. This script worked on about 80% of the posts from the subreddit; some of them are just too weirdly formatted. They'll be automatically disregarded by the graph script though, so no biggie (for example, a list counts as broken if it doesn't have exactly 30 items, or if it has 30 but some of them are empty strings).

It's all available on GitHub, so definitely send a pull request if you have a way of recognizing when the number of copies is inside brackets, as in Spellbreaker (2). I'll probably fix it sometime next week, but I don't have much time on my hands right now and I have other projects to work on. I added the annotated source code on Genius.com if you're interested.
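For anyone who wants to play with the idea, here is a rough sketch of the count-extraction step, including the bracket case. It is not the script from the GitHub repo: the function names (`parse_card_line`, `expand_deck`, `looks_valid`) and the regular expressions are made up for illustration, and it assumes that a line with no count means one copy and that the "30 items" check runs on the deck after expanding the copies.

```python
import re

# Hypothetical patterns, one per count style seen in the posts.
_PREFIX = re.compile(r"^(\d+)\s*x\s+(.+)$", re.IGNORECASE)   # "2x Shattered Sun Cleric"
_SUFFIX = re.compile(r"^(.+?)\s+x\s*(\d+)$", re.IGNORECASE)  # "Cold Blood x 2"
_BRACKET = re.compile(r"^(.+?)\s*\((\d+)\)$")                 # "Spellbreaker (2)"


def parse_card_line(line):
    """Return (card_name, copies) for one line of a deck list."""
    # Drop surrounding whitespace and a leading list bullet ("-" or "*").
    text = line.strip().lstrip("-*").strip()

    match = _PREFIX.match(text)
    if match:
        return match.group(2).strip(), int(match.group(1))

    for pattern in (_SUFFIX, _BRACKET):
        match = pattern.match(text)
        if match:
            return match.group(1).strip(), int(match.group(2))

    # Assumption: no recognizable count means a single copy.
    return text, 1


def expand_deck(lines):
    """Turn deck-list lines into a flat list with one entry per card copy."""
    cards = []
    for line in lines:
        name, copies = parse_card_line(line)
        cards.extend([name] * copies)
    return cards


def looks_valid(cards):
    """A deck is usable only if it has exactly 30 non-empty card names."""
    return len(cards) == 30 and all(cards)


if __name__ == "__main__":
    sample = ["- Cold Blood x 2", "- 1x Argent Squire", "Spellbreaker (2)"]
    for line in sample:
        print(parse_card_line(line))
```

Trying the prefix pattern first keeps a line like `2x Shattered Sun Cleric` from being misread by the other two patterns. Anything that slips through ends up with the wrong card total and gets rejected by `looks_valid`, which matches the "just disregard the broken ones" approach above.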