Is Swap Without Temporary a Good Idea?
January 31st, 2007 by Lorenzo
The classic way of swapping to variables uses a temporary variable. In C++ it looks something like:
void swap1(int &a, int &b) { int c=a; a = b; b = c; }
Some developers can’t stand that they need a temporary variable to make the swap, and prefer to write:
void swap2(int &a, int &b) { a ^= b; b ^= a; a ^= b }
Assuming that a and b are pointing to different memory locations, this actually works.
It is based on the fact that [ (a^b)^a = b ] and [ (b^a)^b = a ]
While this seems like a cool trick, is it really worth writing it? Or should you stick with the classics?
I say it is not worth it! Stick with the classics!!!
In fact:
It is slower
swap1 requires 3 assignments.
swap2 requires 3 XORs and 3 assignments.
If you are not convinced, here is the assembly generated by VS for swap1 (you get similar results with gcc and other compilers):
6: void swap1(int &a, int &b) 7: { 00401030 push ebp 00401031 mov ebp,esp 00401033 sub esp,44h 00401036 push ebx 00401037 push esi 00401038 push edi 00401039 lea edi,[ebp-44h] 0040103C mov ecx,11h 00401041 mov eax,0CCCCCCCCh 00401046 rep stos dword ptr [edi] 8: int c; 9: c = a; 00401048 mov eax,dword ptr [ebp+8] 0040104B mov ecx,dword ptr [eax] 0040104D mov dword ptr [ebp-4],ecx 10: a = b; 00401050 mov edx,dword ptr [ebp+8] 00401053 mov eax,dword ptr [ebp+0Ch] 00401056 mov ecx,dword ptr [eax] 00401058 mov dword ptr [edx],ecx 11: b = c; 0040105A mov edx,dword ptr [ebp+0Ch] 0040105D mov eax,dword ptr [ebp-4] 00401060 mov dword ptr [edx],eax 12: }
While for swap2 the assembly generated by VS is:
14: void swap2(int &a, int &b) 15: { 00401080 push ebp 00401081 mov ebp,esp 00401083 sub esp,40h 00401086 push ebx 00401087 push esi 00401088 push edi 00401089 lea edi,[ebp-40h] 0040108C mov ecx,10h 00401091 mov eax,0CCCCCCCCh 00401096 rep stos dword ptr [edi] 16: a ^= b; 00401098 mov eax,dword ptr [ebp+8] 0040109B mov ecx,dword ptr [ebp+0Ch] 0040109E mov edx,dword ptr [eax] 004010A0 xor edx,dword ptr [ecx] 004010A2 mov eax,dword ptr [ebp+8] 004010A5 mov dword ptr [eax],edx 17: b ^= a; 004010A7 mov ecx,dword ptr [ebp+0Ch] 004010AA mov edx,dword ptr [ebp+8] 004010AD mov eax,dword ptr [ecx] 004010AF xor eax,dword ptr [edx] 004010B1 mov ecx,dword ptr [ebp+0Ch] 004010B4 mov dword ptr [ecx],eax 18: a ^= b; 004010B6 mov edx,dword ptr [ebp+8] 004010B9 mov eax,dword ptr [ebp+0Ch] 004010BC mov ecx,dword ptr [edx] 004010BE xor ecx,dword ptr [eax] 004010C0 mov edx,dword ptr [ebp+8] 004010C3 mov dword ptr [edx],ecx 19: }
This should make it clear how swap2 is less efficient than swap1.
Provides no real advantage
The only real advantage is the rush you get from having written a piece of unreadable code using cool tricks.
The fact that you save sizeof(int) bytes on the stack in 99.99% of the cases is not a real good reason to write swap2 instead of swap1. Perhaps it is advantageous in some recursive situation, where every byte in the stack counts and where you don’t want to make an extra function call to a swap function. I’d have to see such code to believe it.
It is not readable
I know. If you wrote it, you know what it does. But what about the rest of the world?
The problem is that for many others swap2 is not going to be readable and it is going to generate only confusion, not amazement.
In the long term it is far better to write readable code than some cool unreadable and inefficient piece of work.
Keep it simple!!
February 1st, 2007 at 1:32 am
Consider instead the more general case, not using memory storage in a properly optimized piece of code:
where eax contains A, ebx contains B
//Assuming ecx isn’t in use
mov ecx,eax
mov ebx,eax
mov eax,ecx
Now consider this using the “xor” method
xor eax,ebx
xor ebx,eax
xor eax,ebx
Speedwise these to small pieces of code perform quite similar, readability isn’t damaged, the advantage is saving a register. Ofcourse register interdependencies in the pipeline might vary depending on processor, but it’s hard not to justify using the xor method, due to the fact that it frees up a register to be used for something else.
February 1st, 2007 at 8:29 am
Not that I advocate the XOR swap trick, but you might find it gets closer to what its meant to do if you switch to release mode:
swap1:
; 16 : int c=a;
mov eax, DWORD PTR [ecx]
push esi
; 17 : a = b;
mov esi, DWORD PTR [edx]
mov DWORD PTR [ecx], esi
; 18 : b = c;
mov DWORD PTR [edx], eax
pop esi
; 19 : }
swap2:
; 23 : a ^= b;
mov eax, DWORD PTR [edx]
xor DWORD PTR [ecx], eax
mov eax, DWORD PTR [ecx]
; 24 : b ^= a;
xor DWORD PTR [edx], eax
mov edx, DWORD PTR [edx]
; 25 : a ^= b;
xor DWORD PTR [ecx], edx
I’m sure if you flicked a few more optimization options you would end up with some sensible assembler code.
February 1st, 2007 at 9:26 am
Adrian: Looks pretty close, but still slower, and the source is still odd to look at.
Alex: I was mostly interested in how a C/C++ compiler can optimize such code, and the advantage to a C/C++ programmer. In the case you suggest, if you were to write the code by hand in assembly, I agree it is not a bad choice. However, in my opinion, it is still unreadable in the sense that if you don’t know the “trick”, it is going to take a few minutes to find out how that works.
March 1st, 2007 at 8:26 am
Alex: The other point you’re missing is that a good optimizer can spot common idioms, and change the code if needed. Without a global optimizer, the compiler can still convert
void swap(int &a, int &b)
{
int c = a;
a = b;
b = c;
}
into
void swap(int &a, int &b)
{
a ^= b;
b ^= a;
a ^= b;
}
This is done by spotting that the temporary only exists to swap the two variables over, and optimising accordingly.
It gets even more fun when you’re doing global analysis, or if swap is inlined; a global SSA pass may be able to spot that all you’re doing is renaming a to b, and b to a, and elide the swap entirely if it’s unnecessary.
July 4th, 2007 at 9:26 pm
we can do same task using foll code also:
void swap(int &a, int &b)
{
a = a+b;
b = a-b;
a = a-b;
}