It's all to do with the workflow at a hardware level. I can't remember all of the pipeline stages specifically, but suffice it to say that an instruction passes through several of them before its result is known. When a program comes to a branch (such as an if), the CPU doesn't stall the pipeline for however many cycles it takes for the result of the if to pop out the end. Instead it uses branch prediction to choose a path to go down (really useful if you're running in a loop, not so useful otherwise) and shovels the next instruction into the first stage of processing. When the branch is finally resolved, the CPU determines whether the results from those speculatively executed instructions are kept or discarded.
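To make the "really useful in a loop" point concrete, here is a minimal sketch of a toy predictor. It assumes a 2-bit saturating counter, which is a textbook model and not what any particular CPU actually implements; the name `count_mispredictions` is made up for illustration.

```python
def count_mispredictions(outcomes, state=0):
    """Toy 2-bit saturating counter (an assumed model, not any real CPU).

    States 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1  # a real pipeline would discard speculative work here
        # Nudge the counter towards what actually happened, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# The branch at the bottom of a 100-iteration loop: taken 99 times, then not taken.
loop_branch = [True] * 99 + [False]
loop_misses = count_mispredictions(loop_branch)
print(loop_misses)  # 3: two while the counter warms up, one on loop exit
```

In this model the loop branch is guessed wrong only 3 times out of 100, so almost none of the speculative work gets thrown away.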
If you look at the if you are doing and the dataset, it becomes obvious that a sorted dataset lets the CPU's branch predictor guess right far more often, resulting in a minor performance difference (becoming larger as the dataset grows). However, as pointed out, sorting the dataset is a lot more costly than the performance gained, because you spend many more cycles sorting the items in the array than you will actually save.
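As a rough illustration of why sorted data is easier to predict, the sketch below counts wrong guesses for the same branch over shuffled and sorted data. Everything in it is an assumption for the sake of the example: the predictor is a toy 2-bit saturating counter rather than a real CPU's, and the `x >= 128` condition, the value range and the array size are hypothetical stand-ins for whatever the actual if and dataset are.

```python
import random


def count_mispredictions(outcomes, state=0):
    """Toy 2-bit saturating counter: states 0-1 predict False, 2-3 predict True."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Move towards the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


def branch_outcomes(values):
    # Hypothetical condition standing in for the if being discussed.
    return [x >= 128 for x in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.randrange(256) for _ in range(10_000)]

unsorted_misses = count_mispredictions(branch_outcomes(data))
sorted_misses = count_mispredictions(branch_outcomes(sorted(data)))

# Sorted data flips the branch exactly once, so this toy predictor misses twice.
print(unsorted_misses, sorted_misses)
```

With sorted input the branch goes one way for the first stretch and the other way for the rest, so the predictor is only wrong around the single switch-over. Note that this counts mispredictions only; it says nothing about the cycles spent doing the sort itself.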