For me and many artists I have observed, the method seems to be to rapidly capture the energy, direction and "gist" of the pose with free-form gesture scribbles of our choosing, then, at a slightly saner pace, overlay our chosen structural representation on top of it.
This helps to ensure the skeleton takes on the pose and the energy, but still gives you the chance to put limbs back into sockets. It's like a second draft that takes real anatomy into account, without completely editing out the energy and expression that was captured in the first draft.
The third draft would then be changing that series of sticks and joints into a fully fleshed human, with the gesture and the skeleton slowly being erased but still informing the final, outer layer.
On the other hand, now that I think about it, I've seen a number of artists jump straight into the stick figure stage while gesture drawing. To me this is a little crazy, but it definitely works for some!
On YouTube there are even some mind-boggling people who start by just drawing the fully formed human. I am impressed by these videos, but also find them disheartening, as they tend to rank highly when a student searches for gesture drawing. I think it creates the wrong impression and sets too intimidating a bar. I also wonder whether those artists have simply trained themselves to see "outlines" very accurately, and how solidly their foundations are rooted in underlying anatomy. What would happen if they had no model, for example? I really don't know.
To summarize my follow-up rant: Artists work very differently. You will probably need to experiment to see what works best for you!
My best advice is to work large, so you can fit a skeleton and eventual flesh inside/on top of the original gesture and still see the details.