The most efficient, pain-free way to start a mix is to determine what the audio focal point (AFP) of the song will be. Don’t confuse this with the “focus” of the song (its message, its genre, etc.). The AFP is the sound element that you, as the artist/producer/mixer, want to be the most prominent and present element in the mix for the listener. It is the thing you want to take front and center stage in terms of presence, meaning proximity to the listener.
Just as a refresher, remember that music is 3D: height, width, and depth. For this article we’ll be focusing only on depth. Here’s a visual from the delicious-audio website (although I believe this image is originally from the seminal book “The Art of Mixing”):

If you think of music as a 3D sound stage, you can visualize how, with the use of gain, panning, EQ, and effects like reverb, delay, and saturation, you can determine where each element “sits” in the song, that is, how far away it is from the listener. In the graphic above, the guitar is the closest element to the listener and the cymbal is the farthest away.
With songs that have a vocal, some people might be tempted to respond, “of course the vocal needs to stick out more than anything.” But that’s not always true, as you can see from the graphic above. In fact, in most rock music the guitar is the audio focal point.
If you listen closely to commercial music you will find there are many songs where the vocalist is either level with the other elements, tucked in, or even behind them. A great example of this is “If I Ruled the World” by Nas: https://www.youtube.com/watch?v=5Ww1OFtNNAc. What part is the audio focal point? If you listen closely, you’ll find the snare is the most prominent, loudest, most up-front element, and Nas’s voice is tucked behind it. In fact, this is the case with most older hip hop, whereas newer hip hop features the vocalist front and center.
As I alluded to earlier, the audio focal point can be achieved with processing such as EQ. The more high-end frequency content an element contains, the more up front and closer to the listener it will appear to be. The “duller” it sounds, or the more delayed or reverb’d it is, the further away it will appear to be. Music is all about perception.
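To make the “duller = farther away” idea concrete, here’s a minimal sketch in Python (assuming NumPy and SciPy are installed; the white-noise “track”, the `push_back` helper name, and the 2 kHz cutoff are all just illustrative stand-ins, not a real mixing workflow). A simple low-pass filter strips the high-end content, and you can measure how much high-frequency energy is left:

```python
import numpy as np
from scipy.signal import butter, lfilter

SR = 44_100  # sample rate in Hz

def push_back(signal, cutoff_hz, sr=SR):
    """Low-pass the signal; lower cutoffs sound duller and more distant."""
    b, a = butter(2, cutoff_hz / (sr / 2), btype="low")
    return lfilter(b, a, signal)

def hf_energy(x, above_hz=5_000, sr=SR):
    """Total spectral energy above a given frequency."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    return spectrum[freqs > above_hz].sum()

rng = np.random.default_rng(0)
track = rng.standard_normal(SR)              # 1 second of noise as a stand-in
distant = push_back(track, cutoff_hz=2_000)  # "dulled" = pushed back

ratio = hf_energy(distant) / hf_energy(track)
print(f"high-frequency energy remaining: {ratio:.4f}")  # a small fraction of 1
```

In a real mix you’d reach for a shelf or a gentler roll-off rather than a hard low-pass, but the principle is the same: less high-frequency energy reads to the ear as more distance.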
Therefore, if the vocal is going to be your AFP, it will need to be processed differently than your other tracks. In the Nas song I referred to above, the mix engineer removed most of the high-end frequencies from his voice except for the “air” frequencies, so you can hear the S’s and T’s and other consonants; otherwise his voice is very subdued compared to the snare and hi-hats. I’m betting the EQ curve of his voice has a big scoop in the 1 kHz to 5 kHz range (the “presence” frequencies), which allows the snare and kick to stick out. It’s safe to say that from a mix-engineering standpoint, the beat is the most important component of this mix.
That said, it’s important to realize that not all of your tracks can have the same kind of EQ or they will clash with each other. For this reason, once you decide on the focal point, keep track of the frequencies you boosted (e.g., +3 dB at 4 kHz) and avoid boosting those same frequencies much on other tracks. You will want to EQ each track according to how close or far you want it to be from the listener. Along with reverb, delay, and other processing, doing this will give your music the 3D depth you hear in commercial music. To a listener (or a label) it can be the difference between a professional and an amateur song.
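The complementary-EQ idea above can also be sketched in code. This is a hedged example, not a plugin implementation: it uses the well-known RBJ “Audio EQ Cookbook” peaking-filter formulas (NumPy/SciPy assumed), and the +3 dB/−3 dB gains and 4 kHz center are just the illustrative numbers from the paragraph above. The focal track gets a boost at 4 kHz while a supporting track gets a matching cut at the same spot, so the two aren’t fighting for that range:

```python
import numpy as np
from scipy.signal import freqz

def peaking_eq(f0, gain_db, q=1.0, sr=44_100):
    """Return (b, a) biquad coefficients for a peaking EQ (RBJ cookbook)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / sr
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

# Focal track: +3 dB at 4 kHz. Supporting track: -3 dB at the same spot.
boost = peaking_eq(4_000, +3.0)
cut = peaking_eq(4_000, -3.0)

# Check each filter's response right at 4 kHz:
for name, (b, a) in [("boost", boost), ("cut", cut)]:
    w, h = freqz(b, a, worN=[4_000], fs=44_100)
    print(name, round(20 * np.log10(abs(h[0])), 2), "dB")
# prints roughly +3.0 dB for the boost and -3.0 dB for the cut
```

You would apply each filter to its track with `scipy.signal.lfilter(b, a, audio)`; the point is simply that the two curves mirror each other, carving out room for the focal point.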
One more thing. Have a look at this image from Audio University on YouTube:

Do your mixes sound flat and lifeless? Dull or dark? Unclear? The “before” image on the left might be why. Mix like the image on the right by selecting a focal point and building all the other elements around it, and I guarantee your mixes will sound dramatically more polished and professional.
Please leave a comment if you have any questions. Also, check this book out!