optimized apps

[ TSBT's Pirate ]

#1 optimized apps

Post by Alez »

Hi all,
As you may have noticed, I have been working on an optimized version of the app and testing it on my machines. After applying a series of code optimizations I got an app that is much faster than the original one. On top of this I added support for SSE/AVX, which gave some extra boost. Here are the results for processing a small sample workunit on my Haswell Xeon running CentOS Linux:

Original app:
real 13m29.530s
user 13m27.579s
sys 0m0.027s

real 1m26.704s
user 1m24.704s
sys 0m0.004s

real 1m27.987s
user 1m25.985s
sys 0m0.005s

real 1m20.868s
user 1m18.872s
sys 0m0.003s

As you can see, in this test the AVX app is 10 times faster! For real WUs the speedup varies from WU to WU, but it is still about 4-5 times, and most WUs on this machine complete in less than an hour.
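For anyone who wants to recompute the ratios, here is a minimal Python sketch that converts the "real" times above to seconds and divides. The helper name parse_time is made up for this example; the time strings are copied from the results above, with the three optimized runs in the order they are listed.

```python
import re


def parse_time(value):
    """Convert a bash `time` value such as '13m29.530s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" times from the results above: original app first,
# then the three optimized runs in the order they appear.
original = parse_time("13m29.530s")
optimized = ["1m26.704s", "1m27.987s", "1m20.868s"]

speedups = [original / parse_time(t) for t in optimized]
for t, s in zip(optimized, speedups):
    print(f"{t}: {s:.1f}x faster than the original")
```

The fastest run comes out at roughly 10x and the other two at a bit over 9x.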

The optimized app can be downloaded from GitHub: https://github.com/sirzooro/RakeSearch/releases/tag/v1.0. There are multiple app versions, compiled with support for different instruction sets. If you are not sure what your CPU supports, use CPU-Z on Windows, and on Linux check the "flags" line in the /proc/cpuinfo file.
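On Linux the "flags" check can be scripted. The sketch below is only an illustration: the function names are invented, the flag names (sse2, avx, avx2, bmi2, avx512f) are the ones the Linux kernel prints on x86, and using avx512f alone as the AVX512 test is an assumption, since the exact AVX-512 subsets the app needs are not stated here.

```python
import os
from typing import Optional, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def best_instruction_set(flags: Set[str]) -> Optional[str]:
    """Pick the newest instruction set whose flags are all present."""
    candidates = [
        ("AVX512", {"avx512f"}),      # assumption: avx512f is sufficient
        ("AVX2", {"avx2", "bmi2"}),   # the AVX2 build also uses BMI2
        ("AVX", {"avx"}),
        ("SSE2", {"sse2"}),
    ]
    for name, needed in candidates:
        if needed <= flags:
            return name
    return None


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(best_instruction_set(cpu_flags(f.read())))
```

The result is only a hint about which download to try; the app itself still does its own check at startup.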

To install this app, perform these steps:
- close BOINC (a config reload will not work);
- unpack the archive into the project directory. On Windows this is a path like "C:\Users\All Users\BOINC\projects\rake.boincfast.ru_rakesearch", on Linux /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ . On Linux, please also make sure that the rakesearch file is executable and that both rakesearch and app_info.xml are owned by the boinc user and boinc group;
- start BOINC again.
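The Linux part of the second step can be double-checked before starting BOINC again. This is a rough sketch, not an official tool: check_install is a hypothetical helper, it only looks at the two files named above, and it checks the owning user but not the group.

```python
import os
import stat


def check_install(project_dir, expected_owner=None):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for name in ("rakesearch", "app_info.xml"):
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
            continue
        info = os.stat(path)
        # Only the binary needs the executable bit.
        if name == "rakesearch" and not info.st_mode & stat.S_IXUSR:
            problems.append(f"{name}: not executable")
        if expected_owner is not None:
            import pwd  # Unix only, so imported lazily
            owner = pwd.getpwuid(info.st_uid).pw_name
            if owner != expected_owner:
                problems.append(
                    f"{name}: owned by {owner}, expected {expected_owner}")
    return problems
```

For example, check_install("/var/lib/boinc/projects/rake.boincfast.ru_rakesearch", expected_owner="boinc") should return an empty list when the files are in place.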

After doing this, you should see an entry for RakeSearch in the event log like "Found app_info.xml; using anonymous platform". You should also see (Opti v1.0) in the app name displayed in BOINC Manager.

All app versions check whether the CPU and OS support the required instruction sets. If they do not, the app prints an appropriate error message and exits with code 1.

AVX/AVX2 app versions require at least Windows 7 SP1, Windows Server 2008 R2 SP1 or Linux with kernel 2.6.30.
AVX512 app versions require at least Windows 10, Windows Server 2016 or Linux with kernel 3.15. I am not sure about the Windows versions; you can try whether earlier ones can run it too.
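On Linux the kernel requirement can be compared against the running kernel. The sketch below assumes the minimum versions listed above; MIN_KERNEL, kernel_version and kernel_ok are names invented for this example. Distribution kernels sometimes backport features, so treat the version number as a rough guide only.

```python
import platform
import re

# Minimum Linux kernel versions as listed above.
MIN_KERNEL = {
    "AVX": (2, 6, 30),
    "AVX2": (2, 6, 30),
    "AVX512": (3, 15, 0),
}


def kernel_version(release):
    """Leading numeric part, e.g. '3.10.0-957.el7.x86_64' -> (3, 10, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing third component (e.g. '4.4') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def kernel_ok(release, instruction_set):
    """True if the kernel release meets the listed minimum."""
    return kernel_version(release) >= MIN_KERNEL[instruction_set]


if __name__ == "__main__" and platform.system() == "Linux":
    for name in MIN_KERNEL:
        print(name, kernel_ok(platform.release(), name))
```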

Similar performance of the SSE2 and AVX versions is expected, as the AVX instruction set is mostly dedicated to floating point operations, which are not used in this app. The AVX app version can probably be skipped altogether.
AVX2 added integer and bitwise operations that use the new AVX registers, so this app version is faster than the SSE2/AVX versions. An additional boost comes from BMI2 instructions, which came in handy in a few places. As far as I can tell, BMI2 is supported by all CPUs that support AVX2.
The AVX512 version should be even faster, thanks to the new mask registers. I do not have a CPU with them, so I cannot check this. I only tested my code on an emulator to make sure that it works correctly.

At the moment there is no AVX512 app for Linux - I have to build a newer compiler version that supports it. I will add this app version later.
Windows apps are compiled with MinGW gcc, and should work on Windows XP.
