Instead of fuzzy labels like 'indie' and 'AA', we propose a data-driven system for classifying video games by their actual production scope. The resulting framework reveals the real economic and creative trends shaping the video game industry.
*SUPER* Interesting Analysis! I love this approach. While I don't think the value of *~vibes~* will ever be completely irrelevant to these distinctions, providing a quantitative classification like this should absolutely be a sanity check *at least* and it's great to finally see one!!
This was so exciting to me that I put on my journal club hat, and I have a few follow-up questions. Apologies for going obnoxious academic mode!
1) Would love to know more about feature importance and see a list of all the features used in your PCA analysis. If game size/credit count are most important, how much less important was your third feature? Game length is quantitative and seems super relevant to classification from a *~vibes~* perspective, curious where that fell in feature importance? (I love the SHAP technique for feature importance, but that's me)
2) If k-means prefers spherical clusters, what if less-spherical clusters yield a better fit? That would likely mean using a different k-optimization score, but I would be super curious to see -- can we identify a III cluster and prove that it's a real phenomenon?? If you used DBSCAN/Davies-Bouldin instead of k-means/CH, do you get more clusters?
3) There is so much variance in log game size as credit count goes down -- totally makes sense: smaller game = optimization is less important and you have fewer people to do it. Is there potentially some additional meaning you could get out of that spread? Does a low credit count/high game size mean anything? Is there another feature that gets more important as the size/credit count correlation starts to fall apart?
4) Now this one gets political -- I wonder if country of origin has a meaningful impact on this classification. Do countries with better labor laws require more people to make a game, thus skewing those games towards midi/AA when the vibe is more kei/midi? What if you encoded country of origin as its World Justice Index score? I wonder if that has a meaningful negative correlation with the cluster number.
5) Like other comments already say, I would love a publicly available dataset ❤️
Super excited to read about the game budget prediction model!!
Thanks a lot for really taking the time to read and providing thoughtful questions. They definitely remind me of my research seminars :p
As a general comment, we try our best to provide analysis that maintains a high technical standard while remaining easily readable for a large audience. That's why we didn't want to go too deep into the details. Our goal is to update the figure on a regular basis (once or twice a year), so we'll definitely keep your comments in mind for the next iteration. Here is what I can share for now:
1) The importance drops off sharply after the top two features. We didn't compute SHAP values, but the two primary features (log game size and credit count) were sufficiently predictive of the clusters (as shown by the confusion matrix in the "working paper"). Regarding game length, you're anticipating our schedule 😄! We absolutely wanted to include it but couldn't within our timeline. We have Steam's average playtime data, but that metric mixes game length with quality, which we want to avoid. The other primary source, HowLongToBeat, doesn't offer a public API, meaning we'd have to scrape the data, which would, of course, infringe upon their Terms of Use 👀.
2) We initially considered other clustering methods. Ultimately, we settled on a simple K-means primarily because, from experience, it's often better to start with the simplest option unless there's a strong reason to deviate, to prevent accusations of cherry-picking. We didn't have strong evidence to suggest the clusters were non-spherical, as all features align with the concept of increasing scope. Personally, I agree we might observe many more clusters, as scope exists on a continuum, and there are no hard, conceptual gaps between them. However, any additional clusters would likely be refinements of the current categories. Imo "III" is just another name for high-scope Midi games. (For the curious, a rough sketch of what a DBSCAN/Davies-Bouldin comparison could look like follows this list.)
3) I've seen comments focus on the "under-optimized assets" channel, but I think that's the less important factor. In my view, the two main reasons for the high variance at low credit counts are:
i) Misreported Credits: Misreported credits have a disproportionately large impact on smaller games. Although the data was generally accurate, we found outliers that are hard to detect systematically. For instance, some games only report the studio's name in the credits, which causes high variance when the credit count is 1.
ii) Fixed Cost of Engine/Style: The other reason is the fixed cost in game size associated with picking an engine or style. A basic solo-developed narrative walking simulator made with Unreal will rarely be smaller than 1GB. Conversely, a custom-engine game can be incredibly small, sometimes weighing only 35MB (e.g., Animal Well). These differences tend to be washed out as the game's scope increases.
4) Again, you are ahead of us! The difficulty here lies in reliably identifying the country of origin -- not just the studio's, but also its workforce's. There's no perfect methodology, especially since we cannot determine the location of all studios, and we would need to account for multinational companies. Another approach is estimating the developers' countries through surname analysis. This is a method I'm familiar with: during my PhD, I used it to study ethnic inequality in India and the effect of diversity on innovation. However, outside of an academic context, this method can be viewed as controversial.
5) We may release a publicly available dataset in the future once we have more to show, but we still need to figure out the logistics. It would be a shame to accidentally infringe upon the Terms of Use of any of our data providers.
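To make point 2 a bit more concrete, here is a minimal sketch of what such a comparison could look like. This is not our actual pipeline: it assumes scikit-learn, and the file name, column names, and DBSCAN parameters are illustrative placeholders.

```python
# Rough sketch only -- not our pipeline. Assumes scikit-learn; the file name,
# column names, and DBSCAN parameters below are hypothetical.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

df = pd.read_csv("games.csv")  # hypothetical dataset
X = StandardScaler().fit_transform(df[["log_game_size", "credit_count"]])

# K-means: pick k with the Calinski-Harabasz criterion (higher is better).
best_k, best_ch = None, -np.inf
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    ch = calinski_harabasz_score(X, labels)
    if ch > best_ch:
        best_k, best_ch = k, ch
print(f"K-means: best k = {best_k} (Calinski-Harabasz = {best_ch:.1f})")

# DBSCAN: density-based, so it can recover non-spherical clusters.
db_labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
core = db_labels != -1  # ignore points labelled as noise when scoring
n_clusters = len(set(db_labels[core]))
if n_clusters > 1:
    dbi = davies_bouldin_score(X[core], db_labels[core])  # lower is better
    print(f"DBSCAN: {n_clusters} clusters, Davies-Bouldin = {dbi:.2f}")
```

If a density-based run kept finding additional stable groups above Midi, that would be the kind of evidence for a "III" tier you're asking about.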
See you around ;)
Very interesting!
I also feel like the AAA of yesterday are not the AAA of today. Today, for example, a small team could remake an Ocarina of Time with a vastly smaller budget, which I imagine can skew the dataset.
Totally agree! That's precisely why defining AAA is so tricky. The term is really more about being an outlier within its respective time period, rather than a fixed definition.
Thanks for bringing some light into this. The term indie is also confusing when it comes to music. One thing I noticed is that, for example, the new Microsoft Flight Simulator is way smaller than before because of cloud gaming. This will be the new thing in the future, so the size on the hard drive is not representative for all games.
That's a good point we didn't think about! We'll definitely have to consider that in the future.
It's a commendable effort to bring clarity to this, though I think the classifications may need some tweaking.
1) I suggest "Solodev" being its own category, because one guy doing everything warrants a separation from other smaller titles in my eyes. For example: "Five Nights at Freddy's" or "The Desolate Hope" being made by, let's say, 4 people is not too impressive. Scott Cawthon doing all that alone is a whole different thing.
This category would also catch a lot of "B-Games," as they are called in "Getting Over It": games that are available, but not necessarily polished or meant to hit big audiences.
It would also filter out a lot of artistic games that were meant to be art pieces rather than commercial products.
2) The terms AA and AAA come from financing and refer to how "safe" an investment is (the gaming sphere has simply embraced these terms more publicly).
FIFA, for example, is a AAA title because everyone knows it sells like hotcakes regardless of the actual quality. So while there is a correlation between a game's budget and its success assessment, it does not hold true in every case, though I have no idea how games like "Concord" or "Skull and Bones" were rated.
Anyway, that's why I suggest renaming those categories as well.
Thanks for your feedback and for sharing your insights! Here are our thoughts on your two main points:
1) Kei is pretty much solodev. Still, true solodev is quite rare; often someone else is involved for the soundtrack or some art assets. For instance, Scott Cawthon was helped by Leon Riskin for the soundtrack. I definitely agree with you about "B-games". In our analysis, they are often obscured by the fact that we rarely observe their credits, because they are too unknown for anyone to report them on MobyGames. I also agree about artistic games, although they are hard to spot. I believe they represent a very small share of all games, so this shouldn't significantly change the overall results.
2) You are totally right about that. Though, the terms AA and AAA are so deeply rooted that it would have been difficult to suggest alternatives. Additionally, while the original meaning relates to financial success, common usage has shifted toward scope and ambition. Skull and Bones and Concord were financial nightmares, but most people still define them as AAA due to their scope. We already proposed two new terms, and we didn't feel we had enough legitimacy to propose two more! 😅
I would also love to see the raw data. This article is amazing and the research is clearly very thought out! But I'd like to add that it might be a bit hard to meaningfully talk about any one game with these metrics, since the exact team size from the credits and the final download size might not be as readily available. I agree that an objective metric is handy for better journalism, but maybe vibes-based metrics aren't *so* bad for a casual audience ^^
Glad you found it interesting! As we've discussed in other comments, we don't know to what extent we can distribute the raw data. We hope it will be possible in the near future!
Will you be sharing this data in interactive graphs at some point? Curious to see where some games lie, particularly Stardew Valley, which may be the biggest Kei game of them all.
We want to, but we don’t know if we are allowed to. But Stardew Valley is definitely on the far left of the figure 😄
Very interesting read! Yea, this could be a great way to categorize games, but I'm afraid it might be difficult to make this mainstream. "Indie" and other terms have already ingrained themselves in people's minds with whatever definitions they came across at that point.
But there's one more thing that I want to see with this graph - time. Today's AAA disk and team size are not the same as the ones for AAA games 10-20 years ago, and they won't be the same for AAA games in 10-20 years. This also applies to other tiers - what was AA a few years back might be more reminiscent of Midi now and even more so in the future.
Interesting view, and it makes sense.