Linux Nerds: Why is my box pooping itself

Jukas
Toomuchtimeonhands
Joined: 19 Mar 2003
Posts: 896

Posted: 03/30/06 - 18:25

I have an AMD 3200+ running Debian 3.1 Sarge at a colo facility that's going to host mail and websites for me and whatnot. I've run into this super **** problem with it though that makes eth0 totally shit itself.

Anytime I connect via SSH and either A) compile a medium-to-large package with gcc or B) check for updates with apt-get update, it runs for a random amount of time compiling or checking repos and then crashes the SSH session.

Once it's done that, however, it completely downs eth0 (it doesn't show up in ifconfig) and the box is completely unreachable. The strange thing is the rest of the box is running just fine and responds fine from the console. If I manually bring eth0 back up or restart the box I can get in again just fine until the next time.

I thought maybe it was a NIC problem, so I removed the D-Link NIC that was in there and swapped in an Intel NIC, but the problem still persists. Strangely enough, my old Celeron 700MHz that's running Deb 3.1 as well is completely rock solid and has zero problems.

Help me Obi-Wan Kernobi cause driving 25 min each way to the colo facility to do any kind of compile/upgrade sucks my nuts.

kireol
RealPoor Master of Posts
Joined: 02 Aug 2003
Posts: 9517
Location: Royal Oak, MI

Posted: 03/30/06 - 18:28

well, assuming sshd isn't taking up a huge amount of ram and making it reach a bad spot on a chip (very unlikely), I'd say maybe it's sshd itself? /shrug. if it runs fine locally, and it's not the nic, not much else it could be. try sshing from another local machine and check logs?

motherface
RealPoor Guru
Joined: 12 Mar 2003
Posts: 3407

Posted: 03/30/06 - 18:30

You say "ssh session crashes," what makes you think the ssh is crashing separately from the interface failing? Maybe it's an issue with your motherboard and IRQ shit.

If you can get a cheap kvm-over-IP it will save you those trips.
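
For the IRQ angle, here is a rough sketch of one way to check whether eth0 shares an interrupt line with other devices. It assumes the usual /proc/interrupts layout, where handlers sharing an IRQ are listed comma-separated on one line. `irq_report` is a made-up helper name for illustration, not a standard tool.

```shell
#!/bin/sh
# irq_report DEVICE [FILE]
# Print the interrupt line(s) that mention DEVICE and note whether the
# IRQ looks shared. FILE defaults to /proc/interrupts.
# (Hypothetical helper; assumes shared handlers are comma-separated.)
irq_report() {
    dev=$1
    file=${2:-/proc/interrupts}
    awk -v dev="$dev" '
        index($0, dev) {
            # More than one handler on the line means the IRQ is shared.
            status = ($0 ~ /,/) ? "SHARED" : "exclusive"
            irq = $1
            sub(/:$/, "", irq)
            print "IRQ " irq " (" status "): " $0
            found = 1
        }
        END { if (!found) print "no IRQ line mentions " dev }
    ' "$file"
}
```

Running `irq_report eth0` on the box right after a drop would show whether the NIC sits on the same IRQ as, say, a USB or disk controller. A shared IRQ is not proof of anything by itself, but it is a cheap thing to rule out before swapping more hardware.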

r00typooh
RealPoor Master of Posts
Joined: 11 Oct 2002
Posts: 5178
Location: Miami, FL

Posted: 03/30/06 - 18:33

i have no clue ;(

Jukas
Toomuchtimeonhands
Joined: 19 Mar 2003
Posts: 896

Posted: 03/30/06 - 18:34

motherface wrote:
You say "ssh session crashes," what makes you think the ssh is crashing separately from the interface failing? Maybe it's an issue with your motherboard and IRQ shit.

If you can get a cheap kvm-over-IP it will save you those trips.


Good point, I'm actually assuming it isn't the sshd service that's crashing, since the entire interface is down. What I'm more wondering about is WHY the interface is going down.

I don't have much experience troubleshooting mb/irq issues in nix, would I be able to find any hints in the logs pointing to this?

motherface
RealPoor Guru
Joined: 12 Mar 2003
Posts: 3407

Posted: 03/30/06 - 18:36

Jukas wrote:
motherface wrote:
You say "ssh session crashes," what makes you think the ssh is crashing separately from the interface failing? Maybe it's an issue with your motherboard and IRQ shit.

If you can get a cheap kvm-over-IP it will save you those trips.


Good point, I'm actually assuming it isn't the sshd service that's crashing, since the entire interface is down. What I'm more wondering about is WHY the interface is going down.

I don't have much experience troubleshooting mb/irq issues in nix, would I be able to find any hints in the logs pointing to this?


maybe in dmesg or /var/log/messages... usually for this type of crap I just start swapping hardware. But it could be kernel modules or device drivers or maybe sshd like K said. Sshd is probably the easiest to test, config & compile it for its own directory (/usr/local/sshd-test or whatever) and set it to listen on port 2222 and ssh to that and see if it happens. That way it doesn't f**k with your existing sshd.
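
A minimal sketch of the log check suggested here, assuming the interesting lines mention the interface name, an IRQ, a link state change, or a transmit timeout. The pattern list is a guess at what might show up, not an exhaustive set, and `nic_log_hints` is a hypothetical helper name.

```shell
#!/bin/sh
# nic_log_hints DEVICE [FILE...]
# Print log lines that might relate to DEVICE dropping off the network.
# With no FILE arguments it reads standard input.
# (Hypothetical helper; the patterns are assumptions, adjust to taste.
# DEVICE is used as a regex, so keep it to plain names like eth0.)
nic_log_hints() {
    dev=$1
    shift
    # -i: ignore case, -E: extended regular expressions
    grep -iE "$dev|irq|link (is )?(up|down)|transmit timed out|tx timeout" "$@"
}
```

Typical use would be `dmesg | nic_log_hints eth0` or `nic_log_hints eth0 /var/log/messages`, run from the console right after the interface drops so the relevant lines are near the end of the output.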

r00typooh
RealPoor Master of Posts
Joined: 11 Oct 2002
Posts: 5178
Location: Miami, FL

Posted: 03/30/06 - 18:40

motherface wrote:
maybe in dmesg or /var/log/messages... usually for this type of crap I just start swapping hardware. But it could be kernel modules or device drivers or maybe sshd like K said. Sshd is probably the easiest to test, config & compile it for its own directory (/usr/local/sshd-test or whatever) and set it to listen on port 2222 and ssh to that and see if it happens. That way it doesn't f**k with your existing sshd.



you make it sound so simple ;(

motherface
RealPoor Guru
Joined: 12 Mar 2003
Posts: 3407

Posted: 03/30/06 - 18:47

I learned the hard way how not to play with sshd when my office was on 29th street in Manhattan and the f*****g datacenter was in Secaucus, NJ. "Oh... f**k. I'll be back in 5 hours. "

Occulis
RealPoor Jedi
Joined: 11 Oct 2002
Posts: 13293
Location: Moral Relativity Central

Posted: 03/30/06 - 19:16

lewl

motherface
RealPoor Guru
Joined: 12 Mar 2003
Posts: 3407

Posted: 03/30/06 - 19:23

r00typooh wrote:
motherface wrote:
maybe in dmesg or /var/log/messages... usually for this type of crap I just start swapping hardware. But it could be kernel modules or device drivers or maybe sshd like K said. Sshd is probably the easiest to test, config & compile it for its own directory (/usr/local/sshd-test or whatever) and set it to listen on port 2222 and ssh to that and see if it happens. That way it doesn't f**k with your existing sshd.



you make it sound so simple ;(


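btw, wget openssh.tar.gz; tar xzf openssh.tar.gz; cd openssh; ./configure --prefix=/usr/local/openssh-4.0.jizzmopper; make; su -c 'make install'; then as root: /usr/local/openssh-4.0.jizzmopper/sbin/sshd -p 2222 -d (with -d it stays in the foreground, so leave it running); and from a second terminal: ssh -p 2222 localhost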

r00typooh
RealPoor Master of Posts
Joined: 11 Oct 2002
Posts: 5178
Location: Miami, FL

Posted: 03/30/06 - 19:35

oh, well that is pretty simple

Jukas
Toomuchtimeonhands
Joined: 19 Mar 2003
Posts: 896

Posted: 03/30/06 - 19:42

It's gotta be a hardware issue of some sort, I haven't been on the box all day and the eth0 int just dropped out an hour or so ago, and all net services are unreachable.

I think I'll put off driving down there till another day.

r00typooh
RealPoor Master of Posts
Joined: 11 Oct 2002
Posts: 5178
Location: Miami, FL

Posted: 03/30/06 - 19:44

could it be a datacenter issue? i mean, bad cable, bad port, or some other issue outside of your control?

Jukas
Toomuchtimeonhands
Joined: 19 Mar 2003
Posts: 896

Posted: 03/30/06 - 19:50

r00typooh wrote:
could it be a datacenter issue? i mean, bad cable, bad port, or some other issue outside of your control?


Highly unlikely. I've got several other servers there, both on the same IP space and on an entirely different class C. I suppose it could be the cat5, but that doesn't explain the interface coming back up after a restart, or after being brought up manually.

I still have a feeling it's a hardware conflict, though I may go back to a 2.4*-386 kernel just to rule out a kernel issue.

Jukas
Toomuchtimeonhands
Joined: 19 Mar 2003
Posts: 896

Posted: 04/19/06 - 18:57

Yay for updates.

So I brought the box back to my office and put it on the local net here and was able to duplicate what's happening. When it crashes out, sshd is still running and visible under ps aux, so it's not the ssh server itself pooping.

However, unlike what I thought, the eth0 interface is actually up, and the default gateway is still present (confirmed with ifconfig and route). I can also ping the lo int just fine.

However, if I try and ping the gateway I get "Destination Host Unreachable". Manually downing and bringing up the eth0 int doesn't fix it, but if I yank the cat5, plug it back in and then ping the gateway again, I'll get "Destination Host Unreachable" for about 15 seconds and then the interface will come back up.

I've tested this against two different switches and about 10 different cat5 cables. It's gotta be something in the box itself, and this is driving me bugshit.

It's running Deb 3.1 Sarge Stable and
Linux relay-01 2.6.8-3-686 #1 Thu Feb 9 07:39:48 UTC 2006 i686 GNU/Linux
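
Since the failure shows up as "interface up, route present, gateway unreachable", one way to pin down exactly when it starts is to log gateway reachability with timestamps and line that up against dmesg. The sketch below is only an illustration under stated assumptions: it expects `route -n` style output on stdin, uses Linux iputils ping flags for the default probe, and both function names and the /tmp/gw.log path are made up.

```shell
#!/bin/sh
# default_gateway: read `route -n` output on stdin and print the gateway
# of the default route (destination 0.0.0.0 with the G flag set).
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}

# log_gateway_state GATEWAY [PROBE COMMAND...]
# Run one reachability probe and print a timestamped up/down line.
# If no probe command is given, send a single ping with a 2 second
# deadline (assumes iputils ping; other pings may use different flags).
log_gateway_state() {
    gw=$1
    shift
    [ $# -gt 0 ] || set -- ping -c 1 -w 2 "$gw"
    if "$@" >/dev/null 2>&1; then state=up; else state=down; fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') gateway $gw $state"
}

# Example loop, left commented out so sourcing this file does nothing.
# /tmp/gw.log is a hypothetical path.
# gw=$(route -n | default_gateway)
# while :; do log_gateway_state "$gw" >> /tmp/gw.log; sleep 5; done
```

The probe is passed in as a command so the logger can be checked without a network, and so the ping can be swapped for something else (an arping, for instance) if ICMP turns out not to be the interesting layer.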