Monday, January 7, 2013

Why Butt Stomping Feels So Good

Winter break inevitably brings holiday travel, which inevitably involves shifting my gaming habits to more portable platforms.  This was a Final Fantasy Tactics year (the itch comes around every 3 years or so), and my laziness convinced me that it was worth the $9 I paid for the iPad version on the App Store if it meant I didn't have to dig out my PSP and War of the Lions disc.  (FYI: I got it on sale; apparently, it's priced at an unreasonable $16 as of this posting.)

As the good folks at Extra Credits pointed out, ports of this kind often fail to live up to the original source material.  (If you haven't already seen Extra Credits through Penny Arcade or the link in my last post, it's a fantastic web series.  I wish I could do the kind of game design analysis they do, but like I've said before, I'm a psychologist, not a game designer.)  Their video mentions that new control schemes at best fail to match the original experience intended by the games' designers, and at worst ruin your ability to play the game entirely.  As anyone who's played an iOS port with a virtual D-Pad can attest, a variant control scheme on a port can be a game-breaking problem.

Read the reviews for Final Fantasy Tactics on the iPad, and you'll see a lot of the less-patient gamers raging about the clumsy touch controls (also, the game just crashes on certain devices).  Part of this may be the unfair expectations created by Square Enix's claim that the port was "designed for touch control," which is one of those technical truths that should really be illegal.  Apparently, "designed for touch control" means "tapping and dragging work kind of like mouse clicks."  So besides lacking a virtual D-Pad, you're forced to grope awkwardly at the screen just trying to get someone to move to the tile you actually want them on.  And gods help you if you accidentally select the wrong tile—there's no option to undo that move.  (A cardinal sin to any user experience designer familiar with Nielsen's design heuristics.  Ooh, I should write a post about those...)

An abnormally coherent review

So what creates this problem?  Everyone knows it's frustrating, but the "why" is often elusive.  "It doesn't do what I want it to do" is the common complaint, which doesn't really help anyone solve the problem.  So what's the diagnosable (and fixable) problem here?

I could go on and on about issues with touch screens in general.  In fact, I compiled a fairly beefy internal report on potential issues and design guidelines for touch screens while working at Motorola Solutions.  But that's not what this post is about.  These control issues extend beyond the glass plains of the capacitive touch kingdom.  This is about stimulus-response compatibility.

Stimulus-response compatibility is probably one of the oldest concepts in cognitive psychology.  Whether or not you've heard of it by name, you've probably done some variant of the Stroop task (perhaps mixed in with a random assortment of children's brain teasers in a magazine or on the back of a cereal box).  Name the color of the following words' font, not the words themselves.


As you'd probably expect, that's a lot trickier than if you do the same thing with this list:


Depending on the point being made or what the author finds most interesting, you'll hear lots of different reasons for why the first example was hard and the second was easy, all of which are true to at least some degree.  The point I'm making here is that the stimulus you are looking at and the response you have to make (saying the font color aloud) are in conflict with one another in the first example, but they are consistent in the second example.  Cognitive psychologists call the stimulus and response incompatible or incongruent in the first example, and compatible or congruent in the second.  What you're trying to do and the stimulus you're using to do it can either conflict with one another or play together nicely.

This principle goes further.  Now, say I give you a slightly different task, and because I'm a cognitive psychologist, it's incredibly boring and ostensibly simple.  I'm going to show you arrows that point either left or right.  If I show you a left arrow (<=), you press the left shift key on your keyboard.  If I show you a right arrow (=>), you press the right shift key on your keyboard.  Dead simple, right?  I don't even have to mock this up for you on this webpage because my words have painted a gloriously clear picture in your mind's eye.

Now, imagine I change the directions on you.  If you see a left arrow (<=), you have to press the RIGHT shift key on your keyboard.  If you see a right arrow (=>), you have to press the LEFT shift key.  You might eventually get the hang of it, but just reading that feels incredibly frustrating.  The stimulus you're using to do the task doesn't line up nicely with the action you need to generate.  The key mapping in the first example is straightforward and intuitive, but the mapping in the second example is bonkers.  It's like trying to say, "purple," while looking at the word "green."
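For the programmers in the audience, here is a minimal Python sketch of those two sets of instructions.  The dictionary names and the is_compatible helper are my own invention for illustration; the only thing taken from the task itself is which arrow goes with which shift key.

```python
# Hypothetical names; this only restates the two sets of instructions.
COMPATIBLE = {"<=": "left shift", "=>": "right shift"}
INCOMPATIBLE = {"<=": "right shift", "=>": "left shift"}


def correct_key(arrow, mapping):
    """Return the key the instructions say to press for this arrow."""
    return mapping[arrow]


def is_compatible(mapping):
    """True when every arrow maps to the key on the side it points to."""
    side = {"<=": "left", "=>": "right"}
    return all(key.startswith(side[arrow]) for arrow, key in mapping.items())
```

Both mappings are equally easy for a computer to look up.  The difference only exists in the head of the person who has to carry the mapping out.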

Every 2D game you've ever played has operated on the no-brainer principle of "press left to go left" and "press right to go right" because of this sort of compatibility.  You want the little guy in front of you to move leftward, so you press the left side of the d-pad (or tilt the joystick to the left or what have you); you want the little guy to move rightward, you press right.

Things got less straightforward in 3D gaming.  Why?  Because your input options were all in 2D, but your stimulus was in 3D.  This has seen quite a bit of evolution over time, but with the exception of virtual reality headsets, there is no control scheme currently used in gaming that actually involves a true 3-dimensional controller.  I hope you fight me on this, but as far as I can figure, all existing control schemes do their best with some kind of 2-dimensional input (up/down and left/right) to control movement in a 3-dimensional space.  The avid gamer forgets it, but there's quite a bit of mental gymnastics to figure out how left/right/up/down can be converted into left/right/above/below/towards/away.  

Incidentally, I think this is why any "casual" game you see is always in 2D (or fixed-perspective 3D).  That extra little mental conversion is just too big a hassle for some and downright scary to others.  I mean, TWO joysticks?  You kids are high on the drugs.

To make this more concrete, let's look at an example that everyone cites as getting 3D movement right: Super Mario 64.  (I'm going to take a brief moment, though, to give honorable mention to Jumping Flash for being - for me, at least - the first game to show everyone that 3D platforming could be just as precise and fun as 2D platforming.  Last I checked, it's on the PSN store as a PSone classic, so check that out if you've never heard of it before.  Again, feel free to fight me on this, but keep in mind you're challenging my childhood here.)

Motherfuckin' space-robot-rabbit action!

Ok, so Super Mario 64 was lauded as gaming's shining example of how controlling a third-person avatar in a 3D world should work.  Before Super Mario 64, the camera never moved around the player—it either sat there or you explored the world from a (relatively restrictive) first-person view.  Issues with optimal camera placement aside, there is one question that needed to be answered: how would you control Mario?

You probably never even thought to ask.

In retrospect, this seems like a ridiculous question.  You need to control him the way you do, of course. But what makes that the right way?  To put it into perspective, let's look at an example that (arguably) got the question horribly wrong: the original Resident Evil games.  Fixed camera (so a simpler problem), characters moving through a 3D space.  How did you control these guys?  Left made the avatar turn to their left, right turned the avatar to their right, up made them walk forward, and down made them walk backwards.  Seems great on paper until you realize your avatar isn't always facing forward.  Once your character is facing you, the player, you have a wonderful mess on your hands.  Left turns the avatar to your right, down moves it away from you...

I can't even keep thinking about it.  It's actually making me kind of nauseous.

Crap.  How do my controls work relative to the dog's dominant axis?

I swear I recall other games doing this, but none of them leap to mind as I write this.  And maybe I'm besmirching Resident Evil's good name by misremembering.  All I know is I remember finding those games horrifying, and not for the zombies.

Ok, so back to Mario.  Mapping your controls relative to the avatar seems great in theory (and works in first-person shooters), but does not work in a third-person 3D game.  The first great idea Nintendo had: make the controls work relative to what the player is seeing.  Left always moves Mario left, right always moves him right, up always makes him run into the screen, down always makes him run out from the screen.  You have a straight-up compatibility between what you see and what you do.
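To make the contrast concrete, here is a rough Python sketch of the two schemes on a flat floor seen from above.  Everything in it is an assumption for illustration: the function names, the 90-degree turn per input, and the coordinate convention where +x is screen-right, +y is into the screen, and a heading of 270 degrees means the avatar is facing the player.  It is not how either game is actually implemented.

```python
import math


def tank_step(x, y, heading_deg, stick_x, stick_y, turn_deg=90.0, speed=1.0):
    """Avatar-relative ("tank") controls: left/right turn, up/down walk.

    stick_x = -1 (left) turns counter-clockwise seen from above,
    which is a turn to the avatar's own left.
    """
    heading_deg = (heading_deg - stick_x * turn_deg) % 360
    rad = math.radians(heading_deg)
    # Walk along whatever direction the avatar is currently facing.
    x += math.cos(rad) * stick_y * speed
    y += math.sin(rad) * stick_y * speed
    return x, y, heading_deg


def screen_step(x, y, stick_x, stick_y, speed=1.0):
    """Screen-relative controls: the avatar moves the way the stick points."""
    return x + stick_x * speed, y + stick_y * speed
```

With the avatar facing the player (heading 270), pressing up under tank_step walks it toward the player, and pressing left turns it to face screen-right.  Under screen_step, up always means into the screen, no matter which way the avatar happens to be facing.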

But wait!  You still have a problem here.  You're still only moving Mario in two dimensions, and this is a 3D platformer.  Alas, like I said, we only have 2-dimensional input schemes, and we need a third dimension.  Luckily, moving in that third dimension really only corresponds to two possible (or necessary) actions: jumping (above) and ducking (below).  Again, we don't have a direct control to that third dimension, but we are given an approximation in the form of button presses: A to jump, and Z to squat/stomp.

Figure 1.  The "Ground Pound."

Ah, the butt stomp.  So, this is where the title of this post comes from, because this is something that I think is so subtly brilliant, you'll miss it if you're not paying attention.  (Part of its appeal was just the precision it added to your platforming: you always had a shadow on the ground telling you where Mario would land if he were to suddenly drop straight down from where he was in midair, which is exactly what butt stomping did for you.)  So, like I said, movement along the above/below axis is relegated to buttons.  Unlike moving the stick left to go left, pressing A to jump doesn't have that same intuitive mapping to it.  There's an inherent "leftness" to tilting the analog stick left to make you go left, but there's no inherent "aboveness" or "jumpiness" to the A button.  It's just sitting off to the side there, and for some reason, pressing it tells Mario he should jump.

But that Z button...

Dat Z button....

Ok, here's why I think it's insanely brilliant that the Z button thrusts Mario earthward (assuming the mushroom kingdom is on some alternative reality of Earth).  Imagine Mario is standing on your analog stick, and you're pointing your N64 controller at the screen.  Wherever you tilt the analog stick, he's going to start running in that direction.  That's what makes so much sense about it.  Now, think about where that Z button is located: directly under the analog stick.

Figure 2. The Nintendo 64 controller.

I hope you see what I'm getting at here.  If you want to move Mario in the X-Y plane, you tilt the stick in those directions.  If you want him to move downward along the Z-axis, you press the Z-button located underneath the X-Y control plane.  Everything about the Z-button and its corresponding action screams "down," but in particular, "below the X-Y plane."  It has an amazing stimulus-response compatibility that can only come from building the controller and the game together from the ground up.  Which is probably also why movement never feels quite as intuitive in any other game, with any other controller, or any permutation thereof.  It's also why, I would argue, it felt so awkward when you had to move up and down poles and then try to jump from them.  It broke out of this natural stimulus-response mapping Nintendo had going for them.

lol i don't know wtf im doing

I could wax poetic for hours about the ergonomic genius of the GameCube controller, but as an input medium, it just didn't have that same X-Y-Z correspondence that the control stick and Z-button had on the old N64 controller (limited as that correspondence already was).

Dem curves...

So that is a (probably needlessly long) example of brilliant stimulus-response compatibility.  But, wait!  That was an absurdly specific example!  And flawed at that!  ...Is what I imagine you might say if you were the overly critical voice of my mother that I've internalized into my inner monologue.

Right.  That was just a toy example.  So allow me to bring us back to the bigger picture with a fabulous problem: inverted axis controls.

A Public Service Announcement from Freddie Wong

Ah, inverted controls.  A divisive issue, yes.  But why?  Different gamers will play exclusively with inverted or non-inverted controls, and argue for hours over why anyone who does differently is wrong.  What is stimulus-response compatibility to do?  On the one hand, stimulus-response compatibility would seemingly dictate that non-inverted controls make more sense: up moves the camera angle up, down moves the camera angle down.  So, speaking as an inverted-controls devotee myself, why would you reverse that?

The answer comes down to the user's mental model.

"Mental"!  Time for the psychologist to shine!  So, a mental model, generally speaking, is an (admittedly vague) term for your understanding of how a system works.  You use this understanding to determine how you will interact with the system to achieve the outcomes you want.  For instance, you have a mental model for how an ATM works.  It could be simple (I put in my card, enter my PIN, and the magic money fairies give me cash) or complex (pressing my finger against the screen presses together two conductive plates, modifying the resistance across the electric field to bla bla bla...), but no matter what your mental model is, you can use it to understand that a certain set of actions will lead to a certain set of outcomes.  That's what they're for.

But sometimes mental models can be a little screwy.  For instance, you can find discrepancies between the mental model of one person and the mental model of another, or even between the mental model you have and physical reality.  Take this example from (apparently, based on the website I got from Google) Michael McCloskey's "Naive Theories of Motion" in Gentner & Stevens (eds., 1983), Mental Models:

The blue spiral is a metal tube that is higher in the middle and lower on the outside, so if I put a marble into the high end (where the arrow is), it rolls through the tube and out the other end, which is resting on a flat surface.  If I were to do this, what would the path of the marble look like as it exited the tube?

The marble problem

It turns out if you ask this question to a bunch of people, a lot of people answer (incorrectly) with a path that looks like (B).  The laws of physics dictate that if you carry out the scenario I just described, the marble would roll straight out the end of the tube (A), but people's intuitions say that the marble will continue along the spiraling path.  A lot of people's mental models don't match reality—but more important to our story here, some people's mental models do; mental models can differ from person to person.

Ok, so what the hell does this all have to do with inverted axis controls?  It all comes down to mental models.  What is the player's mental model for how their actions with the control stick map onto the result on screen?  A wonderful image floated around the internet that illustrated the mental model of the inverted controls user way better than I could describe it.  I wish I could cite the original creator, but I can't seem to find them.

Tilt, puppet. TILT!

To the inverted-control player, it's as if the camera is mounted pointing along the same axis as the analog stick itself.  Tilt your end of the stick up, and the other end goes down (and vice-versa).  I told you it's better illustrated than I could describe.  You might also liken the camera's control stick to the control arm of a camera tripod: push the arm down, and you're pointing the camera up.  If that's your mental model for how the camera controls work, inverted controls just make sense and feel right.  However, you might have a totally different mental model that makes the non-inverted controls feel better and make more sense.
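In code, the whole holy war comes down to a sign flip.  A minimal sketch, assuming a hypothetical camera_pitch_delta function in which positive stick input means pushing the stick up and positive pitch means the camera looks up:

```python
def camera_pitch_delta(stick_y, inverted=False, sensitivity=1.0):
    """Return the change in camera pitch for a vertical stick input.

    Assumed convention: stick_y > 0 is "stick pushed up",
    and a positive result means the camera tilts up.
    """
    sign = -1.0 if inverted else 1.0
    return sign * stick_y * sensitivity
```

The game doesn't care which sign you pick.  Which one feels right depends entirely on the model in the player's head, which is why this ends up as an option in a menu rather than a design decision.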

The drawing is a pretty simple and clean-cut example, but I think what made it so popular when it first made the rounds on the internet was that it made explicit a feeling that a lot of people had but couldn't verbalize.  And that's the major problem with trying to design an intuitive control scheme.

- What makes a control scheme intuitive is stimulus-response compatibility.

- Stimulus-response compatibility depends on users' mental models.

- Users can't always tell you what those mental models are.

And, so, that is a great problem of game design.  How can you figure out what users' mental models are to design around them?  It's a question that goes woefully unanswered, or worse, answered incorrectly.

Being a huge fan of customization, I think one solution is just to make everything customizable, but you can't just leave all those decisions up to the player to figure out for themselves.  For one, that can be incredibly overwhelming when you have more than two actions to map.  You have to give them some kind of starting point that is at least halfway intuitive, and if you did your job right, that default option will feel great.  No one ever said they wished they could customize the controls on Super Mario 64 (except perhaps the mentally ill).  
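Here is a sketch of what "customizable, but with a sane default" might look like, assuming a hypothetical resolve_bindings helper.  The A and Z defaults are the Super Mario 64 buttons from earlier in this post; the action names and the invert_y option are made up for illustration.

```python
# Hypothetical defaults: the designer's best guess at an intuitive mapping.
DEFAULT_BINDINGS = {"jump": "A", "ground_pound": "Z", "invert_y": False}


def resolve_bindings(player_overrides=None):
    """Start from the defaults and apply only what the player changed."""
    bindings = dict(DEFAULT_BINDINGS)  # copy so the defaults stay untouched
    for action, value in (player_overrides or {}).items():
        if action not in DEFAULT_BINDINGS:
            raise KeyError(f"unknown action: {action}")
        bindings[action] = value
    return bindings
```

A player who only cares about the camera can call resolve_bindings({"invert_y": True}) and never think about the other mappings, and a player who changes nothing still gets the default.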

In a perfect world, the video game industry would have more psychologists around to probe players' mental models to make control schemes that feel great.  Hell, in a perfect world, every industry would figure out people's mental models and have a team of human factors specialists on hand to guide the design of more intuitive interfaces.  Look around, and you'll notice almost no one takes the time to figure out underlying mental models anymore, let alone figure out how to account for individual differences and design towards them.

Now, if you'll excuse me, I am off to post an incredibly long, scathing review on the app store.


  1. On the inverted axis issue, is this really about mental models or about action compatibility? That is, do we need to have a mental model here to explain this, or are things much more automatic? So, when I look down, I move my head forward. When I look up, I tilt it back. Therefore, an inverted axis is more compatible with real world behavior than a non-inverted axis. I don't know if there are studies on this, but I'd wager that it is easier to learn inverted controls than non-inverted ones, although the two might provide indistinguishable results post-training. There must be work on this in the aircraft control literature where the inverted axis is a simple reflection of how you'd yank the plane if it was small enough and you didn't have a stick.

    1. Great point! And I bet there's a rich literature on airplane control because human factors is like 70% fanboys nerding out over planes.

      I guess what I'm trying to argue here with my toy examples (that maybe I'm not conveying very well at all) is that ALL stimulus-response compatibility is dependent on some manner of mental model. Anything that "just makes sense" is *because* it lines up with your mental model, and anything that doesn't conflicts with it - whether you're aware of what that underlying model is or not.

      I'm loving the fact that you have a different conceptualization for how the inverted camera controls might correspond to your actions, but with the same qualitative effect. (I think that just goes to show how much variance you can see in people's mental models for systems!) I do think that's still an implicit mental model if the inverted controls make sense to you because they line up with what your body might be doing (as opposed to how you would interact with a mounted tripod camera). Your example maps the analog stick's stalk to your neck/head-axis, whereas mine doesn't at all. And people will feel differently about different controls and different models based on how they've conceptualized these things for themselves.

      I've actually always wanted to do a human factors study on inverted vs. non-inverted controls to see if one could be easier to learn/use because they get implemented pretty arbitrarily from game to game. Hell, I never paid attention to it, but now I can't even remember if I prefer the same or different controls for gamepad vs. keyboard/mouse games.... Anyway, I think it would be nice to know which makes a better default without any kind of practice/training, if either. But, I'm willing to bet if you tested experts, they'd just always do best with their preferred control scheme (which is also why it drives me nuts when I can't choose how my camera controls work).

      Man, I get really rambly on this blog...
