This file is collatz_sieve_1.30_windows_x86_64_opencl_nvidia_gpu.config and will be empty. On Linux the name will obviously be slightly different (and you will be looking for the AMD version), but the file lives in /var/lib/boinc/projects/boinc.thesonntags.com_collatz
Edit using gedit. In a terminal use gksudo gedit and then navigate to the file. If you don't have gedit, then sudo apt-get install gedit will do the trick. gedit is standard in Ubuntu but not in my preferred Lubuntu.
The best form of help from above is a sniper on the rooftop....
During project initialization on your client, empty <app_name>.config files will be created for each of the application versions that match your GPUs. You can enter parameters into these files in order to deviate from default values, and they will be picked up as soon as a Collatz GPU task starts.
Configuration file format
Plain text file with one "parameter=value" pair per line. Unrecognized parameter names are simply ignored (you can use this to comment out some parameters during testing); missing parameters fall back to their default values.
Example (suitable for a GTX 1080):
kernels_per_reduction=48
threads=9
lut_size=17
sieve_size=30
cache_sieve=1
Parameters
cache_sieve
default: 1 (?)
range: 0 or 1 (?)
definition: "any setting other than 1 will add several seconds to the run time as it will re-create the sieve for each WU run rather than re-using it"
kernels_per_reduction
default: 32
range: 1...64
definition: "the number of kernels that will be run before doing a reduction. Too high a number may cause a video driver crash or poor video response. Too low a number will slow down processing. Suggested values are between 8 and 48 depending upon the speed of the GPU."
comment: "affects GPU usage and video lag the most from what I [sosiris] tested."
lut_size
default: 10
range: 2...31
definition: "the size (in power of 2) of the lookup table. Chances are that any value over 20 will cause the GPU driver to crash and processing to hang. The default results in 2^10 or 1024 items. Each item uses 8 bytes. So 10 would result in 2^10 * 8 bytes or 8192 bytes. Larger is better so long as it will fit in the GPUs L1/L2 cache. Once it exceeds the cache size, it will actually take longer to complete a WU since it has to read from slower global memory rather than high speed cached memory."
comment: "I [sosiris] choose 16, 65536 items for the look up table because it would fit into the L2$ (512KB) in GCN devices. IMHO it could be 20 for NV GPUs, just like previous apps, because NV GPUs have better caching."
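The size arithmetic in the definition is easy to check yourself: the table holds 2^lut_size items at 8 bytes each. A quick shell sketch (the values chosen here just match the examples mentioned in this thread):

```shell
# Lookup table size = 2^lut_size items * 8 bytes per item.
for n in 10 16 17 20; do
    echo "lut_size=$n -> $(( (1 << n) * 8 )) bytes"
done
```

For example, lut_size=16 comes to 524288 bytes (512 KB), which is exactly why sosiris picked 16 to fit the 512 KB L2 cache of GCN devices.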
reduce_cpu
default: 0
range: 0 or 1
definition: "The default is 0 which will do the total steps summation and high steps comparison on the GPU. Setting to 1 will result in more CPU utilization but may make the video more responsive. I have yet to find a reason to do the reduction on the CPU other than for testing the output of new versions."
comment: "I [sosiris] choose to do the reduction on the CPU because AMD OpenCL apps will take up a CPU core no matter what you do (aka 'busy waiting') and because I want better video response."
sieve_size
default: ?
range: 15...32
definition: "controls both the size of the sieve used (2^15 thru 2^32) as well as the items per kernel, as they are directly associated with the sieve size. A sieve size of 26 uses approx 1 million items per kernel. Each value higher roughly doubles the amount. Each value lower decreases the amount by about half. Too high a value will crash the video driver."
sleep
default: 1
range: ?
definition: "the number of milliseconds to sleep while waiting for a kernel to complete. A higher value may result in less CPU utilization and improve video response, but it also may lengthen the processing time."
threads
default: 6
range: 6...11
definition: "the 2^N size of the local size (a.k.a. work group size or threads). Too high a value results in more threads but that means more registers being used. If too many registers are used, it will use slower non-register memory. The goal is to use as many as possible, but not so many that processing slows down. AMD GPUs tend to work best with a value of 6 or 7 even though they can support values of up to 10 or 11. nVidia GPUs seem to work as well with higher values as lower values."
comment: "I [sosiris] didn't see lots of difference once items per work-group is more than wavefront size (64) of my HD7850 in the profiler."
verbose
default: 0
range: 0 or 1
definition: "1 will result in more detail in the output."
Definitions are taken from Slicker's post from June 2015, last modified in September 2015.
Comments are taken from sosiris' post from June 2015.
Edit April 28 2018, added definition of cache_sieve from a post from Slicker from April 2018
If you have more than one GPU in the system you will want to create this cc_config.xml file to make both (or more) work. By default BOINC will only use the most capable GPU in a system.
["cc_config.xml"] usually C:\ProgramData\BOINC on Windows or /var/lib/boinc on Linux
Make the file with Notepad or gedit depending on your system. Ensure you use "Save As" when finished (in Notepad, set the file type to All Files). Do not simply save, or your file will be called cc_config.xml.txt and will not be read.
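A minimal cc_config.xml for this purpose only needs the use_all_gpus option (the full file, with optional log flags added, appears later in this thread):

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```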
Q: How come I'm not getting any work? A: Your computer may already have enough work. Even though the BOINC log says it requested work, it may have requested 0 seconds' worth. The ONLY way to see what it really asked for is to enable sched_op_debug in BOINC Manager via the Options, Event Log Options screen.
If you want work for your GPU you need to have OpenCL drivers installed. The Windows drivers installed automatically by Microsoft may not contain the required OpenCL files. Try installing the version from the AMD, nVidia, or Intel web sites.
Check what preferences you have set for the Collatz project via the web site. You won't get work if you don't have it enabled.
Lastly, BOINC bases its calculations on how many floating point operations your computer can do. Unfortunately, Collatz only uses integers which causes the estimates to be way off. In addition, the GPU applications can run anywhere from twice as fast (older slower GPUs and Intel embedded GPUs) to hundreds of times faster. For example, it thinks my Android phone is 1/4 the speed of my i7 laptop when in reality, it is about 1/400 the speed.
Q: All the workunits have errors. What's wrong? A: The Windows versions require the Microsoft C Runtime library. If you are running a 64-bit version of Windows, you will need BOTH 32 and 64 bit versions since BOINC will likely send you both even though the server has been set to prefer sending 64-bit apps to 64-bit operating systems.
Q: When are the new apps going to be available for my computer? A: It takes about 40 hours to test each individual application to make sure it calculates correctly. That's 25 apps x 40 hours each for OS X, Windows, and Linux. So, it takes 1,000 hours to run through all the tests, and if there's a bug, start over. Since this is not my full time job just as crunching is not your full time job, I have limited time to spend doing it.
Question: I have my nVidia GTX 1080 turning out WUs in 6 mins with 2 WUs loaded onto the GPU. When I add a second identical GTX 1080, I get 4 running WUs but each one takes twice as long to run!!
Thinking it might need more CPU, I disabled WCG which was the only CPU project running - to no avail. Plenty of CPU available, plenty RAM, plenty SSD, CPU has 40 PCIe lanes
Should I hyperthread or not?
I think this is fool-proof but could you just try it for me please? • There are 10 types of people in the world; those who understand binary, and those who don’t
Are you running an app_config.xml? Did you remember to double the CPU you have allocated? The CPU allocated there is for Collatz GPU work in total; if set to 0.5 then only half a core feeds 2 GPUs / 4 apps. I presume the PSU is up to supplying enough power?
Yes, I get 4 WU to run. Will check PSU when I get home tomorrow afternoon
Will check when I get back. Tried logging in remotely but I think my connection must be down
I've not run 2 GPUs in one system for a while but make sure it's using the 2nd GPU. Sounds like all 4 are running on 1 GPU.
Also, regardless of how many WUs you are running per GPU, allocate 1 CPU per WU. OpenCL is CPU intensive.
<gpu_usage>.5</gpu_usage>
<cpu_usage>1</cpu_usage>
Is this line needed in the cc_config.xml?
<use_all_gpus>1</use_all_gpus>
Funny that was exactly what I was thinking. Are you sure it is 2 units per GPU and not 4 units on one and nothing on the other ?
Use cc_config.xml in /var/lib/boinc
root@lw1-asrockx79:/var/lib/boinc# cat cc*xml
<!--
This is a minimal configuration file cc_config.xml of the BOINC core client.
For a complete list of all available options and logging flags and their
meaning see: https://boinc.berkeley.edu/wiki/client_configuration
-->
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
<log_flags>
<task>1</task>
<file_xfer>1</file_xfer>
<sched_ops>1</sched_ops>
</log_flags>
</cc_config>
root@lw1-asrockx79:/var/lib/boinc/projects/boinc.thesonntags.com_collatz# cat app_config.xml
<app_config>
<app>
<name>collatz_sieve</name>
<max_concurrent>4</max_concurrent>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>
All other Boinc projects suspended. Boincmgr permits 75% of available CPUs - and 100% of CPU time. I am baffled. Hyperthreading is enabled
Name collatz_sieve_435baba7-08a4-4cdd-8f59-3f43b2860b50_0
Workunit 13092421
Created 15 Nov 2018, 21:55:39 UTC
Sent 15 Nov 2018, 22:08:01 UTC
Report deadline 29 Nov 2018, 22:08:01 UTC
Received 15 Nov 2018, 23:46:24 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 842433
Run time 6 min 7 sec
CPU time 3 sec
Validate state Valid
Credit 27,706.70
Device peak FLOPS 9,074.50 GFLOPS
Application version Collatz Sieve v1.40 (opencl_nvidia)
x86_64-pc-linux-gnu
Peak working set size 168.96 MB
Peak swap size 13,092.76 MB
Peak disk usage 48.74 MB
And when 2 are installed - takes twice as long but CPU time is 7 times as long
Name collatz_sieve_c194960f-a03f-492b-867d-87a8679837e1_0
Workunit 13098944
Created 15 Nov 2018, 23:41:42 UTC
Sent 15 Nov 2018, 23:52:45 UTC
Report deadline 29 Nov 2018, 23:52:45 UTC
Received 16 Nov 2018, 12:23:07 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 842433
Run time 13 min 38 sec
CPU time 22 sec
Validate state Valid
Credit 28,740.65
Device peak FLOPS 9,074.50 GFLOPS
Application version Collatz Sieve v1.40 (opencl_nvidia)
x86_64-pc-linux-gnu
Peak working set size 168.99 MB
Peak swap size 21,284.78 MB
Peak disk usage 48.74 MB
2 x Peak swap size 13,092.76 MB will swap much much less than
4 x Peak swap size 21,284.78 MB
Or am I reading that wrong? I wasn't expecting any swapping whatsoever TBH
The machine I have 2 x 980's in only has 12GB ram I think, is running hyper threaded and only has a single core set aside as excluded from Boinc.
Is that running 2 x units per card ? Are you sure it is not running all 4 units on the one card ?
I think the best option is to take a step back. Run a single unit on each card and see what times you get as comparison.
Okay ta. The screenshot in post #12 shows 2 units on each card
davidBAM wrote: ↑Tue Nov 20, 2018 8:10 am
Okay ta. The screenshot in post #12 shows 2 units on each card
Yes, that's what the config says and that's what it should be. What I'm asking is whether it is actually working or not. Have you checked that both GPUs are actually loaded? Check the nVidia panel that both GPUs are being used (load, temp etc.). Also check the start-up log in BOINC Manager that BOINC sees 2 GPUs and that the cc_config is being found and read. There will be a flag at the start of the log stating that cc_config is present and that use all GPUs has been set.
Just back from walking dogs - took ages for them to catch up with 3 days of p-mail
The screen dump was from boincmgr so I took it at face value. I've put a monitor/keyboard on it now so will check all. Both cards are certainly very hot to the touch
Tue 20 Nov 2018 09:56:14 GMT | | Starting BOINC client version 7.12.0 for x86_64-pc-linux-gnu
Tue 20 Nov 2018 09:56:14 GMT | | log flags: file_xfer, sched_ops, task
Tue 20 Nov 2018 09:56:14 GMT | | Libraries: libcurl/7.61.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.4) nghttp2/1.32.1 librtmp/2.3
Tue 20 Nov 2018 09:56:14 GMT | | Data directory: /var/lib/boinc-client
Tue 20 Nov 2018 09:56:14 GMT | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 390.87, CUDA version 9.1, compute capability 6.1, 4096MB, 3982MB available, 9070 GFLOPS peak)
Tue 20 Nov 2018 09:56:14 GMT | | CUDA: NVIDIA GPU 1: GeForce GTX 1080 (driver version 390.87, CUDA version 9.1, compute capability 6.1, 4096MB, 3982MB available, 9070 GFLOPS peak)
Tue 20 Nov 2018 09:56:14 GMT | | OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 390.87, device version OpenCL 1.2 CUDA, 8120MB, 3982MB available, 9070 GFLOPS peak)
Tue 20 Nov 2018 09:56:14 GMT | | OpenCL: NVIDIA GPU 1: GeForce GTX 1080 (driver version 390.87, device version OpenCL 1.2 CUDA, 8118MB, 3982MB available, 9070 GFLOPS peak)
Tue 20 Nov 2018 09:56:14 GMT | | [libc detection] gathered: 2.28, Ubuntu GLIBC 2.28-0ubuntu1
Tue 20 Nov 2018 09:56:14 GMT | | Host name: lw1-asrockx79
Tue 20 Nov 2018 09:56:14 GMT | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz [Family 6 Model 62 Stepping 4]
Tue 20 Nov 2018 09:56:14 GMT | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts flush_l1d
Tue 20 Nov 2018 09:56:14 GMT | | OS: Linux Ubuntu: Ubuntu 18.10 [4.18.0-11-generic|libc 2.28 (Ubuntu GLIBC 2.28-0ubuntu1)]
Tue 20 Nov 2018 09:56:14 GMT | | Memory: 23.49 GB physical, 0 bytes virtual
Tue 20 Nov 2018 09:56:14 GMT | | Disk: 251.15 GB total, 231.93 GB free
Tue 20 Nov 2018 09:56:14 GMT | | Local time is UTC +0 hours
Tue 20 Nov 2018 09:56:14 GMT | collatz | Found app_config.xml
Tue 20 Nov 2018 09:56:14 GMT | PrimeGrid | Found app_config.xml
Tue 20 Nov 2018 09:56:14 GMT | | Config: use all coprocessors
Tue 20 Nov 2018 09:56:14 GMT | collatz | URL https://boinc.thesonntags.com/collatz/; Computer ID 842433; resource share 10000
Tue 20 Nov 2018 09:56:14 GMT | PrimeGrid | URL http://www.primegrid.com/; Computer ID 940055; resource share 1000
Tue 20 Nov 2018 09:56:14 GMT | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 5016304; resource share 2000
Tue 20 Nov 2018 09:56:14 GMT | | General prefs: from http://einstein.phys.uwm.edu/ (last modified 06-Nov-2018 04:14:50)
Tue 20 Nov 2018 09:56:14 GMT | | Host location: none
Tue 20 Nov 2018 09:56:14 GMT | | General prefs: using your defaults
Tue 20 Nov 2018 09:56:14 GMT | | Reading preferences override file
Tue 20 Nov 2018 09:56:14 GMT | | Preferences:
Tue 20 Nov 2018 09:56:14 GMT | | max memory usage when active: 12027.89 MB
Tue 20 Nov 2018 09:56:14 GMT | | max memory usage when idle: 21650.20 MB
Tue 20 Nov 2018 09:56:14 GMT | | max disk usage: 226.03 GB
Tue 20 Nov 2018 09:56:14 GMT | | max CPUs used: 9
Tue 20 Nov 2018 09:56:14 GMT | | suspend work if non-BOINC CPU load exceeds 25%
Tue 20 Nov 2018 09:56:14 GMT | | (to change preferences, visit a project web site or select Preferences in the Manager)
Tue 20 Nov 2018 09:56:14 GMT | | Setting up project and slot directories
Tue 20 Nov 2018 09:56:14 GMT | | Checking active tasks
Tue 20 Nov 2018 09:56:14 GMT | | Setting up GUI RPC socket
I have now reduced it to one WU per GPU and it looks as if times are reducing significantly. Not sure if credits earned will be any higher than putting 2 WUs on a single card to be honest. Will report back
Incidentally, the graph shows both cards jammed up at 100% utilisation
Yes, everything looks correct, BOINC definitely sees and is using both cards. Tons of memory, CPU etc. I really have no idea. As a long shot, set "suspend work if non-BOINC CPU load exceeds" to 75%. The only other thing I can think of is that one slot is x16 and the other is x8, but even that shouldn't account for double the time. If the utilisation is at 100% with one unit per card then there is no point in running 2 per card. Very strange. Let it run 1 unit per card for a bit and see what times/credits are as a benchmark.
Alez wrote: ↑Tue Nov 20, 2018 11:22 am
If the utilisation is at 100% with one unit per card then no point in running 2 per card
THIS !!!
I remembered it all wrong. With the app optimisation on Collatz, there is no benefit from running 2 WU per GPU. I think that thought was a hangover from trying it on a sprint.
a Collatz WU is now completing in 6 mins +/- a few seconds so comfortably over 13 million / day Collatz from one machine. It has 9 threads on WCG as a bonus
Sorted then, 1 unit at 6 mins and 2 at 13 mins. Better to run the single units and all working as it should be. Very nice points haul per day from only 2 cards.
Those two cards are matching the output I have from 4 and a bit cards
................. Of course this is how arms races start
Cheers - I am keeping my eyes peeled for another GTX 1080 as they go for a fair bit less than a GTX 1080 Ti. The Ti doesn't seem to be dropping much in price - possibly due to the bad press that its successor seems to be getting.
Doubtless the points-per-£ ratio will all change in a few months though
That is wrong. The GTX 960 is only 2GB and not massively more powerful than my GTX 750ti, but that card manages an un-optimised task in approx 46 mins on average.
The comp error is probably due to the card not handling the optimisation; in particular sieve_size=30 probably overflows the 2 GB memory. I'd expect the card to run a task in approx 30 mins.
Thanks - that is marginally better but still estimating over 2 hours
Card reports as having 4Gb btw ...
Wed 21 Nov 2018 23:20:29 GMT | | CUDA: NVIDIA GPU 0: GeForce GTX 960 (driver version 390.87, CUDA version 9.1, compute capability 2.1, 4096MB, 4011MB available, 691 GFLOPS peak)
Wed 21 Nov 2018 23:20:29 GMT | | OpenCL: NVIDIA GPU 0: GeForce GTX 960 (driver version 390.87, device version OpenCL 1.1 CUDA, 4535MB, 4011MB available, 691 GFLOPS peak)
Just noticed ... OpenCL 1.1? Is that correct? And CUDA version 9.1. Wondering if I have the wrong drivers?
@davidBAM I currently don't have any systems running GPU / Linux but 390 is old. I believe 396 works well, as do the 415 / 416 beta drivers. Some have claimed problems with 410 and 411, although 410.73 is reported to be very good on GPUgrid? Make of that what you will. You could try 396 or go straight to the beta 415 / 416. I believe 410 is the latest version officially approved for LTS so you should be able to choose it from the Software Center. If not, follow instructions below.
I see your 1080's and opencl 1.2
An update ... I was going fairly well until "sudo apt-get install nvidia-driver-410" errored with all sorts of dependency failures. I worked through these until 410 seemed to be installed, but after a reboot boincmgr reported no usable GPU.
I think I'll have a fresh try at this from a new install after the Sprint (as that machine is having some success with LHC CPU work). Problem is, though, I have about 1000 Collatz jobs bunkered for the 20 fake GPU cards too. I am going to be very unpopular if I reset that project
If you get into a bind and have to reinstall Linux you can save all your BOINC WU. Copy the /var/lib/boinc-client folder to a USB stick. After you get the new Linux installed then install the same version of BOINC. Once you have it running shut it down and overwrite the boinc-client with the one that you saved.
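The save/restore steps above can be sketched as a couple of shell functions. This is only a sketch: /var/lib/boinc-client is the Debian/Ubuntu default data directory, and you should stop the client (e.g. sudo systemctl stop boinc-client on systemd distros) before copying either way.

```shell
#!/bin/sh
# Sketch of the BOINC data-directory backup/restore flow described above.
# Stop the BOINC client before running either function.

backup_boinc() {
    src="$1"   # BOINC data directory, e.g. /var/lib/boinc-client
    dest="$2"  # destination folder on the USB stick
    cp -a "$src" "$dest"      # -a preserves ownership, permissions, timestamps
}

restore_boinc() {
    saved="$1" # the saved copy on the USB stick
    dest="$2"  # data directory of the freshly installed client
    cp -a "$saved"/. "$dest"  # overwrite the new directory with the saved state
}
```

After restoring, start the client again; since the same BOINC version reads back the same client_state.xml, the cached work units should pick up where they left off.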
Awesome, thanks. I did wonder about that but felt the chances of success were ... well ... limited