About a year and a half ago, we exported our first “data dump” of all the parsed data we’d collected since OpenDota started operating, which consisted of over 3.5 million matches ranging from January 2015 to December 2015.

After a series of adventures, we’re happy to announce that we’re finally ready to release a second data dump of over a billion matches, this time with information ranging from March 2011 to March 2016!

Notes

  • The data is split into three torrents, each representing a PostgreSQL table.
    • matches: This contains match level data such as start time, cluster, and game mode.
    • player_matches: This contains data for each player in a match, such as kills, deaths, and gold per minute.
    • match_skill: This contains the skill bracket assigned by Valve (normal, high, very high).
  • There are 1,191,768,403 matches in the data set, which contains public matches played between March 2011 and March 2016.
  • The majority of this data is from the Steam WebAPI (from GetMatchHistoryBySequenceNum).
    • Fetching this data from the API typically takes months, so we hope these torrents help expedite that process.
  • A small fraction of these matches have extended data in matches/player_matches.
    • There are approximately 10,000,000 parsed matches in this dataset.
    • These are the matches whose replays had been parsed on OpenDota at the time of the dump.
  • Skill bracket data is only available for matches after May 2015 (which is when we started collecting it).
    • Data is available for about 132,000,000 matches.
    • Skill bracket data comes from repeatedly querying the GetMatchHistory Steam API endpoint with the skill parameter passed. The API only exposes it for a short time after a match is played, so it cannot be backfilled for older matches.
  • Additional information about the formatting of these tables can be found here.
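
To illustrate how the three tables relate, here is a minimal sketch that joins them on a shared match ID and averages gold per minute per skill bracket. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The column names (match_id, start_time, cluster, game_mode, player_slot, kills, deaths, gold_per_min, skill), the numeric skill codes, and the handful of rows are all assumptions made up for illustration; check the table formatting notes linked above for the real schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three tiny, made-up tables.

    Column names are assumptions for illustration, not the real dump schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE matches (
            match_id INTEGER PRIMARY KEY,
            start_time INTEGER,
            cluster INTEGER,
            game_mode INTEGER
        );
        CREATE TABLE player_matches (
            match_id INTEGER,
            player_slot INTEGER,
            kills INTEGER,
            deaths INTEGER,
            gold_per_min INTEGER
        );
        CREATE TABLE match_skill (
            match_id INTEGER PRIMARY KEY,
            skill INTEGER  -- assumed coding: 1 normal, 2 high, 3 very high
        );
        """
    )
    conn.executemany(
        "INSERT INTO matches VALUES (?, ?, ?, ?)",
        [(1, 1420070400, 111, 22), (2, 1433116800, 121, 1)],
    )
    conn.executemany(
        "INSERT INTO player_matches VALUES (?, ?, ?, ?, ?)",
        [(1, 0, 5, 2, 450), (1, 128, 3, 6, 380),
         (2, 0, 10, 1, 600), (2, 128, 1, 9, 300)],
    )
    # Only match 2 has a skill bracket, mirroring the partial coverage.
    conn.execute("INSERT INTO match_skill VALUES (?, ?)", (2, 3))
    return conn


def avg_gpm_by_skill(conn):
    """Average gold per minute grouped by skill bracket.

    The LEFT JOIN keeps matches that have no skill row; they appear
    with skill = NULL (None in Python).
    """
    return conn.execute(
        """
        SELECT ms.skill, AVG(pm.gold_per_min)
        FROM matches m
        JOIN player_matches pm ON pm.match_id = m.match_id
        LEFT JOIN match_skill ms ON ms.match_id = m.match_id
        GROUP BY ms.skill
        ORDER BY ms.skill
        """
    ).fetchall()


if __name__ == "__main__":
    for skill, gpm in avg_gpm_by_skill(build_demo_db()):
        print(skill, gpm)
```

The LEFT JOIN matters here: since skill data only exists for a subset of matches, an inner join on match_skill would silently drop everything else.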

Samples

If you would like to preview the data, the following links point to 4 GB “samples” of the full data set.

A huge thanks to @waprin for creating these samples!

  • matches sample: LINK
  • player_matches sample: LINK
  • match_skill sample: LINK
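
Even a sample of this size is awkward to open in one go, so it can help to look at the first few rows only. The sketch below streams rows from a gzip-compressed CSV file without loading the whole thing into memory. The file format (gzipped CSV with a header row) and the column names in the demo data are assumptions for illustration, and preview_rows and make_demo_gzip are hypothetical helpers; adjust them to whatever format the downloaded files actually use.

```python
import csv
import gzip
import io
from itertools import islice


def preview_rows(source, n=5):
    """Return the first n rows of a gzipped CSV as dictionaries.

    `source` may be a file path or a binary file object. Only the
    rows that are requested get decompressed and parsed.
    """
    with gzip.open(source, "rt", encoding="utf-8", newline="") as f:
        return list(islice(csv.DictReader(f), n))


def make_demo_gzip():
    """Build a tiny in-memory gzipped CSV with made-up rows."""
    text = (
        "match_id,start_time,game_mode\n"
        "1,1420070400,22\n"
        "2,1433116800,1\n"
        "3,1433120000,22\n"
    )
    return io.BytesIO(gzip.compress(text.encode("utf-8")))


if __name__ == "__main__":
    for row in preview_rows(make_demo_gzip(), n=2):
        print(row)
```

To try it on a real download, pass the file path instead of the in-memory demo object, for example preview_rows("matches_sample.gz"), where the file name is a placeholder.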

Closing Thoughts

As always: if you do any interesting machine learning on this dataset, let us know; if you publish or post your findings anywhere, please cite us; and if you seed any of these files for any length of time, thank you!

Thanks for bearing with us for so long to get this dataset released! :)