GPU Grid, Vista and drivers

Nightlord

#1 GPU Grid, Vista and drivers

Post by Nightlord »

Well, since my last post a week or so ago in the GPU Problems and Solutions thread seems to have broken it (the thread, that is), I thought it might be worthwhile dropping a log here of what I've been trying to do for the last week.

I have been having no end of trouble getting what was a reliable cruncher back up and running on GPU Grid. Nothing wanted to work: change drivers from 181.xx through 186.xx, change GPU, npu's, kick the beast, shout and swear at it, nope. Every unit bar one error'd out within a few seconds or sat at full load on one CPU core with its thumb up its proverbial....

The box (this one) has been running CPDN for a few weeks so is solid in terms of hardware and before several months of enforced shutdown was a decent little cruncher on this project.

I made a big leap forward yesterday: I reinstalled Vista and re-installed Boinc. First unit error'd out due to finger trouble on my part with drivers etc. Second unit has just completed successfully. Yeharr!

Third unit, just downloaded, however, is once again stuck at full load on one CPU core. Within a couple of seconds of starting it sits there consuming CPU cycles doing nothing with the GPU.

I don't have as much time to dedicate to fixing things as I used to, so it may take a while to sort out, but for now, I'm stumped. Any ideas?

BTW, a second box that I threw Win 7 RC1 onto has run sweet over the weekend (that's what made me think about a clean OS).

:?:

Argghh!!

After 3 or 4 times of shutting down Boinc and re-starting it to see if that would kick it into life, the WU failed. Next unit is now sat at full load on the CPU again. :x
Megacruncher
G.L.S.B.
Posts: 4699
Joined: Mon May 29, 2006 11:33 pm
Location: Edinburgh, Scotland

#2

Post by Megacruncher »

Perhaps your GPUs have become dusty while you've been off-line? A good hoovering might help.

Sad to say I've found that the main culprit in failing WUs tends to be failing GPUs [or presumably other hardware]. I've lost 1 and another is getting a bit unstable.

Latest drivers and latest version of BM should work.

Having said which, switching off and then on is always risky. It might even just be that your PSU can't quite get it up again. Or if you are lucky it might just be that the project has issued some duff WUs.

Whatever, good luck in getting it all working again.
Willie the Megacruncher
Nightlord

#3

Post by Nightlord »

Well....maybe it's working? :dontknow:

I replaced the GPU with another card and it all sprang into life. Strange thing is, this is the third card in that box and no change in driver from the previous installation.

The other two cards exhibited the same behaviour: GPU for a few seconds then full CPU load, or crash and burn immediately.

So, for now, it's two lowly 256MB 8800GTs running and we'll see how that pans out.

Not what I used to run, but enough to make a contribution :wink:
Megacruncher

#4

Post by Megacruncher »

Every little helps, but TBH 7K in the last 24hrs isn't really so little! :)
Regardless, it's good to have you back. Hopefully your renewed activity is an indicator of some green shoots of economic recovery budding forth in your world?
Ben

#5

Post by Ben »

Hi Nightlord,

Good to see you around the forums again. I can't say I know an awful lot about GPUGrid, but I do recall them saying a while ago that they recommend cards with 512MB of memory or more. Could it be that the WUs they are running now (or maybe only a select few) need more than 256MB?

:?:
Nightlord

#6

Post by Nightlord »

Thanks for the comments Ben,

The minimum spec is 256MB, and I've been through two 512MB cards and now a 256MB card: all have the same issue on that box. I have a 256MB card running under Win7 RC1 on identical hardware, smooth and sweet: every WU runs OK.

I think I've come to the conclusion I have a motherboard fault. After re-boot it ran the next WU to completion, then started the full load on one core act again. Every WU (bar 3) on that machine regardless of card, reboot, re-install OS etc fails. CPDN is 100% OK on the CPU.

I went to install Ubuntu dual boot on that box last night and it failed to understand the GPU. I think the PCI bus is gubbed. The Mobo has run 100% for about 3 years, then had a downtime of 4 months before coming back on line and failing each WU.

I'll take a look at the mobo, but since it runs CPU projects just fine but not GPU ones (Seti or GPUGrid), I think I'm going to have to accept it won't crunch GPU projects.

Ho Hum....

:roll:
Nightlord

#7

Post by Nightlord »

I'm just about done for getting another box running on GPU Grid.

I changed the mobo, reloaded Vista, ran Win7 on dual boot, switched drivers, and BM versions, swapped cards around, 256MB...512MB all appear to do the same - run for 30 seconds or so then the WU goes to full load CPU.

I did manage to get a couple of extra WUs run and thought I was home and dry, but it seems not to be stable even on new hardware.

I would be tempted to think about a memory or CPU issue, but it's happily crunching CPDN on the CPU.

The other box runs no issue on Win7 RC1, 256MB 8800GT.

Clearly GPU Grid has the hump with me for being away for a few months :?
jockmacmad2
Boinc Warrant Officer Class 2
Posts: 321
Joined: Tue Jan 27, 2009 7:18 am

#8

Post by jockmacmad2 »

This may seem like a dumb question, but have you tried it on SETI CUDA to see how that behaves? If all is fine then it's debug GPUGrid. If that fails, it's unlikely to be GPUGrid but the rig instead....

Another thing to try is:-
Turn on the <sched_op_debug> flag in cc_config and you can see some more details
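
For anyone who hasn't edited cc_config.xml before, here is a rough sketch of what the file looks like with that flag switched on. The Python below just builds the XML text; the helper name build_cc_config is made up for illustration and is not part of BOINC. The assumption is the usual layout: a cc_config root element with a log_flags section, each flag set to 1, saved as cc_config.xml in the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(log_flags):
    """Return cc_config.xml text with each named log flag set to 1.

    build_cc_config is a hypothetical helper for illustration only.
    """
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name in log_flags:
        # Each log flag is its own element; 1 switches it on.
        ET.SubElement(flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML so it can be pasted into cc_config.xml by hand.
    print(build_cc_config(["sched_op_debug"]))
```

The client picks the file up when it starts, so restart BOINC after saving it and then watch the Messages tab for the extra scheduler lines.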
Nightlord

#9

Post by Nightlord »

Yup, same behaviour in Seti :(

The really odd thing is that this is effectively now a new rig compared to last week: new mobo, clean Vista install, different card. Even tried it on dual boot clean install of Win 7 RC1. The only common thing to the rig from last week is the ram, cpu and psu and yet it behaves very differently to how it was 5 months ago before and after all the hardware changes.

I guess the logical explanation is the psu, ram or cpu has gone soft, but that same hardware setup runs CPDN very happily which is one of the toughest CPU projects around. :?:

Good idea about the debug though, I'll turn that on and crunch a few more errors! See what it says.
rowpie
Peon
Posts: 239
Joined: Mon May 29, 2006 11:26 pm

#10

Post by rowpie »

gpu grid is playing up on cpu usage and slowing down on any version of boinc I've tried above 6.5.0

the .23 version and .26 versions both made it much worse. my average time per wu went from just over 8 hours to 12 hours plus.

maybe a cuda thing in general but might be worth trying an older boinc install.
jockmacmad2

#11

Post by jockmacmad2 »

Well the PSU may manage the output to CPDN et al. fine but crank the GPU up and there is a largish chunk of +Amps on the 12V rail.

I hate those sorts of problems, so much so that I bought a PSU tester a while back just so I can put it inline and check what's happening. Just wish I knew where I put it lol.
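
To put a rough number on the 12V point: current is just power divided by voltage. A quick sketch, where the wattage is a made-up round figure for illustration and not a measurement from any card in this thread:

```python
def rail_current_amps(load_watts, rail_volts=12.0):
    """Current drawn from a rail for a given load, using I = P / V."""
    return load_watts / rail_volts


if __name__ == "__main__":
    # Hypothetical GPU load under crunching; substitute your own card's figure.
    gpu_watts = 100.0
    print(round(rail_current_amps(gpu_watts), 1))  # prints 8.3
```

So a card pulling something like that adds several amps on the 12V side that a CPU-only project never asks for, which is how a tired PSU can look fine on CPDN and still fall over under GPU load.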
Megacruncher

#12

Post by Megacruncher »

I've had major problems in the last 5 days with GPUGrid.
Over the weekend there was a somewhat forced short-notice upgrade of CUDA drivers. I only became aware of it after a visit from my mother which, given her irrational objection to being cooked alive as she sleeps, meant that the farm had been more or less switched off for a few days, but I dutifully upgraded.

Anyhoo, switching back on after a couple of days off I was horrified at how many of my WUs were crashing.

Switching back to an earlier BM version 6.5.0 didn't help - all that happened was that I stopped getting new work. :(
steve

#13

Post by steve »

Megacruncher wrote: I've had major problems in the last 5 days with GPUGrid.
Over the weekend there was a somewhat forced short-notice upgrade of CUDA drivers. I only became aware of it after a visit from my mother which, given her irrational objection to being cooked alive as she sleeps, meant that the farm had been more or less switched off for a few days, but I dutifully upgraded.

Anyhoo, switching back on after a couple of days off I was horrified at how many of my WUs were crashing.

Switching back to an earlier BM version 6.5.0 didn't help - all that happened was that I stopped getting new work. :(


Come and join me on aqua
jockmacmad2

#14

Post by jockmacmad2 »

I'm trying AQUA on one machine as well but the run times seem really long. So much so that in 3 days I have not had a result returned yet.

Hope the credit at the end is worth it.
Megacruncher

#15

Post by Megacruncher »

I can't get any work for aqua. :(
steve

#16

Post by steve »

Megacruncher wrote: I can't get any work for aqua. :(

There is a small error on the server: if you want GPU work, select CPU in the project prefs. It will be fixed within the week.

Or run both GPU and CPU; I recommend upgrading to 6.6.36 to do this. I have set 2 machines up tonight and will let ya know if it works. (Be prepared: the new GPU WU may last 4-5 days.)
Megacruncher

#17

Post by Megacruncher »

You are right, opting to use my CPU got me GPU work.
Strange.
We'll see if we can hold off UBT.
Nightlord

#18

Post by Nightlord »

Might have a look at Aqua myself: I think the WU's take a long time to crunch, which is good for me just now - not much time to babysit the boxes.

Anyhow, I gave up on getting that box running under Vista. Tried everything: mobo, psu, card, processor, ram etc., different BoincManager versions, different GPU projects, different drivers. They all did the same: after about 30 seconds of GPU crunching the WU locked up and the science app went to 100% load on the CPU. The GPU wasn't doing anything: the measured GPU temps dropped off rapidly. Very few WUs survived the repeated stop/starts. Occasionally, through stop/reboot/start cycles, I could get the odd WU to run to completion.

Also tried Win 7 Beta on it, but had issues because I accidentally installed the beta rather than the rc1.

So where am I now?......the box is running on the original hardware line-up under Ubuntu 9.04 and is well into its third WU without issue.

Just to be clear: this working box is the same as this one and this one too

The only thing I came across is that the box running under Vista is also loaded with some other tools now and has a relatively high memory use. I wondered if that might have caused some issue. :?

Not saying it's 100% fixed, but right now, it seems ok.
Nightlord

#19

Post by Nightlord »

Time for an update from me....

As some might have noticed from a slight up-tick in my output, I got it working again. Three boxes, nothing special: all 8800GTs, two running from E6700s and one from a lowly P4HT. A different OS in each box: WinXP32, Ubuntu 9.04 and Win 7 RC1.

All fine and dandy, until I started playing with the slow old P4 host and transformed it into something rather more in keeping with my previous track record

Outrageous? Oh, yes indeedy.... :wink:
PinkPenguin

#20

Post by PinkPenguin »

...Bet that made NVidia green with envy. Outrageous Indeed! :D
Megacruncher

#21

Post by Megacruncher »

A 280 failure (it is jam-packed full of dust - hopefully a good Dysoning will fix it) has prompted me back to Milkyway with a cheapo 3850. I only just got it to work (by rewinding to the 8.12 drivers) and so far, in the first 40 minutes of the exercise, I've got 500 credits from it. Work seems to be pretty steady, which was my main beef with Milkyway in the past.

It's quieter and neater than the hulking great 280 as well!

Anyway, Nightlord, it's good to see you back competing again! :D