 Re: S@H- no new work, get your backup projects going...
 
 5/13/2007 3:42:39 AM
paul
1039 posts
 Update: We got the new server yesterday, inserted our old disks and booted it up. It came right up, but verifying the file systems took overnight. The work is being created, the splitters and assimilators are working. It will be a while before we catch up. Thank you for your continued patience and support.

Thumper has been up a day and a half, and is still going strong. Many of us got work yesterday, the status page is all green, and results are being sent out as fast as the splitters can make them.

Berkeley's network graph is wonkers, so I can't see when the network traffic will be slowing down, but I'd imagine with the backlog, it will be into tomorrow before it settles down to normal.

Paul

Co-owner, the Group of 10 100 200 300
 5/13/2007 11:17:21 AM
Crystallize
614 posts
Can somebody actually get work downloaded?

I can't, not on SETI or SETI beta.

My "Transfer" tab is full of WUs unable to get either uploaded or downloaded...

I get work from other projects, but not these two!

5/13/2007 8:14:39 PM|SETI@home|[file_xfer] Temporarily failed download of 18fe05aa.26345.16432.584662.3.6: system connect
5/13/2007 8:14:39 PM|SETI@home|Backing off 1 hr 17 min 32 sec on download of file 18fe05aa.26345.16432.584662.3.6
5/13/2007 8:14:40 PM||Access to reference site succeeded - project servers may be temporarily down.
5/13/2007 8:15:23 PM|SETI@home|[file_xfer] Started download of file 16fe05ab.10775.15088.490894.3.129
5/13/2007 8:15:47 PM||Project communication failed: attempting access to reference site
5/13/2007 8:15:47 PM|SETI@home|[file_xfer] Temporarily failed download of 16fe05ab.10775.15088.490894.3.129: system connect
5/13/2007 8:15:47 PM|SETI@home|Backing off 7 min 22 sec on download of file 16fe05ab.10775.15088.490894.3.129
5/13/2007 8:15:48 PM||Access to reference site succeeded - project servers may be temporarily down.
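
For anyone who wants to total up how long their client is waiting, here is a minimal Python sketch that reads message lines like the ones above. It assumes the `timestamp|project|message` layout and the "Backing off ... on download of file ..." wording seen in this log; the function names `parse_line` and `backoff_seconds` are made up for illustration and are not part of BOINC.

```python
import re
from datetime import datetime

# Matches durations such as "1 hr 17 min 32 sec" or "7 min 22 sec".
# The wording is taken from the log lines in this post (an assumption
# about the general format, not a documented BOINC guarantee).
BACKOFF_RE = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec on download of file (\S+)"
)

def parse_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message

def backoff_seconds(message):
    """Return (file_name, seconds) for a 'Backing off' message, else None."""
    m = BACKOFF_RE.search(message)
    if m is None:
        return None
    hours, minutes, seconds, name = m.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return name, total

log = [
    "5/13/2007 8:14:39 PM|SETI@home|Backing off 1 hr 17 min 32 sec on download of file 18fe05aa.26345.16432.584662.3.6",
    "5/13/2007 8:15:47 PM|SETI@home|Backing off 7 min 22 sec on download of file 16fe05ab.10775.15088.490894.3.129",
]
for line in log:
    when, project, message = parse_line(line)
    print(when.isoformat(), project, backoff_seconds(message))
```

Lines with an empty project field (the `||` ones) parse the same way; `backoff_seconds` simply returns `None` for them.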


| >>> My RC-72 stats <<< | >>> My F@h stats <<< |

 5/13/2007 11:36:08 AM
nutcase
247 posts
(United States)

 Crystallize wrote
Can somebody actually get work downloaded?

I can't, not on SETI or SETI beta.

My "Transfer" tab is full of WUs unable to get either uploaded or downloaded...

I get work from other projects, but not these two!

 

I have the same problem here.

Basically, we are overloading Berkeley's network connectivity just like a DoS attack would, and it is having trouble processing all the network requests.
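
The growing "Backing off" times in the log are the usual cure for this kind of pile-up. Here is a rough Python sketch of randomized exponential backoff, the general technique. It is not BOINC's actual algorithm: the function name `backoff_delay`, the 60 second base and the 4 hour cap are all assumptions picked for illustration.

```python
import random

def backoff_delay(failures, rng, base=60.0, cap=4 * 3600.0):
    """Seconds to wait after `failures` consecutive failed attempts.

    base and cap are hypothetical values, not BOINC settings.
    """
    # The retry window doubles with every failure until it hits the cap.
    ceiling = min(cap, base * (2 ** failures))
    # Pick a random point in the upper half of the window so that many
    # clients that failed at the same moment do not retry in lockstep.
    return rng.uniform(ceiling / 2, ceiling)

rng = random.Random(2007)  # seeded only to make the demo repeatable
for failures in range(1, 8):
    print(failures, round(backoff_delay(failures, rng)), "sec")
```

The random spread is the important part: if every client waited exactly the same time, they would all hit the servers again together.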


 5/13/2007 2:28:08 PM
skildude
247 posts
kinda sux that the server is kind enough to tell us what WU we are going to get, but we're unable to actually retrieve that WU
 5/13/2007 4:00:26 PM
paul
1039 posts
Myself, I've suspended SETI since last Monday, since the farm would be doing nothing but contributing to the requests that have overloaded Berkeley's network since the outage began.

Last Monday/Tuesday, I attached to all the other projects that interest me a bit, and will probably keep them there. Some 10 high-powered clients are now working on World Community Grid, Einstein, Rosetta, Malaria Control, Spinhenge, Quantum Monte Carlo, Simap, Leiden Classical and uFluids.

I do have 2 Conroe builds I'll be bringing online sometime next week, will probably attach them to Seti to see how they do there, if the pipeline to Berkeley has cleared by then.

Paul

Co-owner, the Group of 10 100 200 300
 5/13/2007 10:38:43 PM
Sat_Man
1369 posts
(United States)
Looks like they put in some overtime getting the new server up and running. The problem is there is nothing coming down and nothing going up. I was able to download 4 WUs yesterday on one system and crunch them; however, like the downloads, I can't get them to upload. They'll probably get the bugs out tomorrow...I hope!

It has been my experience that folks who have no vices have very few virtues
 5/14/2007 3:07:05 AM
ScottMo
88 posts
(United States)
I was able to download 19 on Sunday (another three are stuck in d/l), but can't get any up to Berk. Unless Seti gets straightened out, I'll be on other projects when the work runs out later today.

Oh well, at least I grabbed those. Thankfully, BOINC is designed so that we can easily switch between projects.

 5/14/2007 4:09:07 AM
paul
1039 posts
(Modified by paul on 5/14/2007 6:19:16 AM)
It looks like a routing issue happened yesterday. Until they get into the office today to fix it, we're out of luck. All servers are up and running, but no one can talk to them effectively. We weren't supposed to be back up until Tuesday, so many thanks to the Seti staff for staying over Friday and Saturday to bring Thumper back online.

The Seti project has its own dedicated 1000 Mbps pipeline, which was implemented in February of this year, replacing the old 100 Mbps pipe they've used from Cogent since 2002. The new network graph for this pipe is at http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d%3Aw%3Am%3Ay;view=UcastPackets



Co-owner, the Group of 10 100 200 300
 5/14/2007 7:58:25 AM
Sparky Jim
164 posts

Thanks Paul, that explains why none of my machines can get new work units or upload completed ones. I have set them back to doing Einstein for now unless they have cached SETI units available. It appears some managed to get units on Saturday for a short while, but then it went pear shaped. Those graphs tell that story.

Hopefully it will be back to normal in the next few days; until then I will be crunching Einstein and SETI. The Server is still doing Einstein as he cannot get any SETI units, so I have no idea how fast Xeons will do SETI. I'm intrigued.


Blooming Sig broke..will repair when able!!
 5/14/2007 8:56:46 AM
Xaak
1056 posts
(United States)

Thar be a spike on the graph!


Gary
You can't fix dead.
 5/16/2007 12:33:09 PM
paul
1039 posts
(Modified by paul on 5/16/2007 2:33:42 PM)
OK, here's a Wednesday afternoon update, better explained with a picture of Berkeley's network graph of the last day or so. First, a brief explanation to bring us up to speed since they got into the office Monday to kick the servers yet again. Most of Monday, the project was chipping away at the tremendous backlog created by the 2 week outage. Tuesday they did the normal DB backup, which took half the time it normally takes because the project has been limping along the last few weeks with Thumper down. On the way to getting the backlog reduced and the project back on track, you'd think....

Nope. Myself, I was puzzled at the graph, with network traffic being really low considering the huge amount of dropped connections they were reporting. Eric K reported a few hours ago on the big oopsie that happened yesterday after the scheduled backup:

 Addendumb: I had a 'd'Oh!' moment this morning. Apparently we were running with the upload timeout set at 20 minutes (which I think is the apache default), so our connections were being dominated by machines that couldn't get through, but were hanging onto the connection.

If you look at our network traffic, you can see what happened when I lowered that to 30 seconds... We're sending about 4 times as much work as we were when I got in this morning.


So. You can see the drop to 0 yesterday when they had the backup outage, some 20 hours or so of low level traffic, and then the big spike when they adjusted Apache. Here we are Wednesday afternoon, and they're actually making big strides to reduce the backlog of requests from millions of clients around the world, 16 days after meltdown.
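
To see why one timeout setting matters so much, here is a back-of-the-envelope Python sketch. It assumes the server has a fixed pool of connection slots that is always full, that some fraction of clients stall and hold a slot until the timeout expires, and that the rest finish quickly. Every number below (100 slots, 10% stalled, 10 second transfers) is made up for illustration; only the 20 minute and 30 second timeouts come from Eric's note, and the sketch is not meant to reproduce his 4 times figure.

```python
def uploads_per_minute(slots, stalled_fraction, timeout_s, good_transfer_s):
    """Rough steady-state rate of successful uploads with a saturated slot pool."""
    # Average time one connection holds a slot, mixing stalled and healthy clients.
    avg_hold = stalled_fraction * timeout_s + (1 - stalled_fraction) * good_transfer_s
    # With every slot busy, this many connections start (and end) per second.
    connections_per_s = slots / avg_hold
    # Only the healthy share of those connections actually delivers a result.
    return 60 * connections_per_s * (1 - stalled_fraction)

# Hypothetical server: 100 slots, 10% of clients stall, good transfers take 10 s.
before = uploads_per_minute(100, 0.1, timeout_s=20 * 60, good_transfer_s=10)
after = uploads_per_minute(100, 0.1, timeout_s=30, good_transfer_s=10)
print(f"20 min timeout: {before:.0f}/min, 30 s timeout: {after:.0f}/min")
```

Even with only a small share of stuck clients, a long timeout means they end up owning most of the slots most of the time, which matches the low traffic on the graph before the change.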

Oh, and in case anyone is wondering why it took so long to get the new server: it was donated by Sun, or sold at a fraction of the retail cost by Sun. In my book, it's kinda hard to bitch when someone is donating a $30,000 server to your science project.

In any event, I'm still suspended from Seti across my fleet until they settle down to normal, doing my part to lessen the load a bit.


[Image: mini-graph.cgi.png, Berkeley network graph]

Co-owner, the Group of 10 100 200 300
 5/16/2007 12:55:39 PM
DT
322 posts
can u repeat that i didnt get it?
 5/17/2007 3:57:28 AM
paul
1039 posts
Thursday morning update from Matt Lebofsky:

 Wow - what a mess. I think we're in the middle of our biggest outage recovery to date, and it's breaking everything. The good news is we're coming into some newer hardware which we'll get on line to help somehow.

See Eric's thread in the Staff Blog. He's been working overtime getting a new frankenstein machine together to act as another upload/download server and reduce the load on bruno. The scheduling server (galileo) has been choking - I just now moved all that over to bruno as well. So we may retire galileo soon, too. Jeff has been going nuts trying to track down errors in validator/assimilator code so we can get those on line as well. And our old friend "slow feeder query" is back, probably just being aggravated by the heavy load.



It ain't over yet, people. The network graph shows traffic went down to nothing an hour or so ago.

Co-owner, the Group of 10 100 200 300