Discussion:
"Fatal Error: Lost connection with server"
h337h00k
2005-01-24 20:55:34 UTC
Permalink
hello users, I am trying to run unison to sync up 2 different servers.
The directory I am syncing is pretty large... about 3000 subdirs and
each of those subdirs has a couple dozen dirs under them. About 80 GBs
of data total.

I am using the following command:

unison -batch /dir/path ssh://remoteserver//dir/path

I originally used rsync to move the directory over to the remote
server however I now have a need for bidirectional mirroring hence the
reason why I now use unison.

When I run this command, I get the usual warning about no prior index
found since this is the first run.

Unison then begins to list all the dirs and subdirs (building the
index dbase I suppose). When it's almost finished, I get the following
lines:

Waiting for changes from server
Fatal Error: Lost connection with server

I have used unison before to mirror directories between these two
machines, but none of which were this big with so many subdir branches.

Nothing in any logs to give me any clues. I thought it might be some
sort of ssh timeout, so I added "ConnectTimeout 10" in
/etc/ssh/ssh_config to give it a 10 second timeout, but I don't think
this will help since the ssh session does indeed connect initially.

ulimit -s shows I have an 8MB stack size.

Can anyone shed any light onto this problem?

Thank you

-mike








Yahoo! Groups Links

<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/unison-users/

<*> To unsubscribe from this group, send an email to:
unison-users-***@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
realpopupandy
2006-09-05 23:39:06 UTC
Permalink
Hi,

Back in January 2005 "h337h00k" <***@...> wrote that he gets "Lost
connection with the server" while trying to sync a very large
directory for the first time (see below for a copy). Unfortunately I
don't see any replies to his mail.

The thing is I've got the same problem. And it's definitely not
because of an ssh timeout. First, because I've tried to turn them
all off and secondly, because the unison process on the server often
quits after only a few minutes.

While digging thru the FAQs, I then found out something very
interesting:
If I call unison with the parameter "-debug all" it does NOT quit
and syncs even very large filesets. That assures me that the
problem must be some kind of bug in unison.

My setup is the following:
Client: Windows XP (cygwin) and Linux workstations
Server: Debian GNU/Linux 3.1
I am running unison 2.9.1 because I couldn't find a more recent
(precompiled) version that exists both for Windoze and Linux.

I'd be very grateful for any hint!
Andy.
Post by h337h00k
hello users, I am trying to run unison to sync up 2 different servers.
The directory I am syncing is pretty large... about 3000 subdirs and
each of those subdirs has a couple dozen dirs under them. About 80 GBs
of data total.
unison -batch /dir/path ssh://remoteserver//dir/path
I originally used rsync to move the directory over to the remote
server however I now have a need for bidirectional mirroring hence the
reason why I now use unison.
When I run this command, I get the usual warning about no prior index
found since this is the first run.
Unison then begins to list all the dirs and subdirs (building the
index dbase I suppose). When it's almost finished, I get the following
Waiting for changes from server
Fatal Error: Lost connection with server
I have used unison before to mirror directories between these two
machines, but none of which were this big with so many subdir branches.
Nothing in any logs to give me any clues. I though it might be some
sort of ssh timeout, so I added "ConnectTimeout 10" in
/etc/ssh/ssh_config to give it a 10 second timeout, but I dont think
this will help since the ssh session does indeed connect initially.
ulimit -s shows I have an 8MB stack size.
Can anyone shed any light onto this problem?
Thank you
-mike
Konstantin Münning
2006-09-06 07:49:39 UTC
Permalink
Hi!

How are your two systems connected? LAN or internet?

If it's internet and you are connecting through a NAT router (using ssh),
is it possible that your NAT connection is dropped? I know several
routers which have a quite short timeout for that (due to limited
resources). This is because while remote unison is scanning the
directories there is no traffic over the TCP connection and for large
syncs this can be quite a while.

Turning debugging on, on the other hand, generates traffic, so the
NAT connection stays "alive".

Another thing with large syncs is memory consumption. Unison needs quite
a lot in such cases. This may be the reason why one unison aborts the
sync. But it shouldn't change with debugging on, so I'd look at the
first possibility.
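
For reference, a minimal sketch of a keepalive setting that would produce the
same kind of background traffic without turning debugging on. It assumes an
OpenSSH client that supports the ServerAliveInterval and ServerAliveCountMax
options, and that your unison passes extra ssh arguments through the rshargs
preference. The function name and the values 30 and 4 are only illustrative.

```shell
#!/bin/sh
# keepalive_rshargs [INTERVAL] [MAX_MISSED]
# Print a unison profile line that asks the ssh client to send a keepalive
# probe through the encrypted channel every INTERVAL seconds.
# Assumption: the ssh client understands ServerAliveInterval/ServerAliveCountMax.
keepalive_rshargs() {
    interval=${1:-30}    # seconds between probes (illustrative default)
    max_missed=${2:-4}   # unanswered probes tolerated before ssh gives up
    printf 'rshargs = -o ServerAliveInterval=%s -o ServerAliveCountMax=%s\n' \
        "$interval" "$max_missed"
}

keepalive_rshargs 30 4
```

The printed line would go into the unison profile on the client. Whether it
helps depends on the router really dropping idle connections, which at this
point is only a guess.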

Good luck!
Konstantin.
Andy Spiegl
2006-09-06 10:55:03 UTC
Permalink
Hi,
Post by Konstantin Münning
How are your two systems connected? LAN or internet?
internet, AVM FritzBox on one side, permanently connected root server on
the other.
Post by Konstantin Münning
If it's internet and you are connecting through a NAT router (using ssh),
is it possible that your NAT connection is dropped?
No, because I used VNC to connect to this machine (I am offsite)
and was watching all the time, moving the mouse etc.
Besides the (flatrate) connection of this router only drops once a day,
forced by the DSL provider after 24 hours.
Post by Konstantin Münning
Turning debugging on, on the other side, is generating traffic so the
NAT connection stays "alive".
Sounds logical but ... Hm, maybe just this single ssh-connection is
dropped by the router? But then why does it stop sometimes after about 5
minutes and other times already after 1 or 2 minutes?

My assumption still is that the unison server process quits by itself
after encountering something it doesn't like. But can I trace that?

Thanks,
Andy.
--
Fotos: francisco.spiegl.de o _ _ _
Infos: peru.spiegl.de __o /\_ _ \\o (_)\__/o (_) -o)
Andy, Heidi, Francisco _`\<,_ _>(_) (_)/<_ \_| \ _|/' \/ /\\
***@spiegl.de (_)/ (_) (_) (_) (_) (_)' _\o_ _\_v
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Periodically spray floppy disks with insecticide to prevent
system bugs from spreading.....



Konstantin Münning
2006-09-06 11:57:56 UTC
Permalink
Hi!
Post by Andy Spiegl
Hi,
Post by Konstantin Münning
How are your two systems connected? LAN or internet?
internet, AVM FritzBox on one side, permanently connected root server on
the other.
Post by Konstantin Münning
If it's internet and you are connecting through a NAT router (using ssh),
is it possible that your NAT connection is dropped?
No, because I used VNC to connect to this machine (I am offsite)
and was watching all the time, moving the mouse etc.
Besides the (flatrate) connection of this router only drops once a day,
forced by the DSL provider after 24 hours.
No. I don't mean that the internet connection is interrupted but
that your NAT connection is dropped. This refers to any single TCP
connection going over NAT. VNC uses another TCP connection which is kept
alive when you move the mouse or something on the screen changes.
Post by Andy Spiegl
Post by Konstantin Münning
Turning debugging on, on the other side, is generating traffic so the
NAT connection stays "alive".
Sounds logical but ... Hm, maybe just this single ssh-connection is
dropped by the router? But then why does it stop sometimes after about 5
minutes and other times already after 1 or 2 minutes?
I don't know what you do with the FritzBox besides unison. If you have
some software running which makes a lot of TCP connections (file-sharing
software does that), this may cause connections to drop earlier.
Post by Andy Spiegl
My assumption still is that the unison server process quits by itself
after encountering something it doesn't like. But can I trace that?
There are a lot of things you can do. You can run a script on the server
side instead of unison (option servercmd) which executes unison but
writes some log files, runs strace or whatever you wish.

Maybe the least intrusive way to confirm if the NAT connection is
dropped would be to create an additional tunnel with the same ssh
connection you use for unison (option sshcmd). When you keep this tunnel
busy somehow then you will see if "suddenly" it works. See -L option of ssh.

Konstantin



Andy Spiegl
2006-09-06 13:40:53 UTC
Permalink
Post by Konstantin Münning
No. I don't mean that the internet connection is interrupted but
that your NAT connection is dropped.
Sorry, I misunderstood that.
Post by Konstantin Münning
I don't know what you do with the FritzBox besides unison. If you have
some software running which makes a lot of TCP connections (file-sharing
software does that), this may cause connections to drop earlier.
The only network program that was running at the same time was ICQ,
but no messages were exchanged during that time.
Post by Konstantin Münning
Maybe the least intrusive way to confirm if the NAT connection is dropped
would be to create an additional tunnel with the same ssh connection you
use for unison (option sshcmd). When you keep this tunnel busy somehow
then you will see if "suddenly" it works. See -L option of ssh.
Good idea! I'll try that.

Thanks,
Andy.
--
"I shot an arrow into the air, and it stuck."
-- Graffiti in Los Angeles



Andy Spiegl
2006-09-18 16:04:03 UTC
Permalink
Nobody? I am still clueless! What else could I try?
Andy.
--
Every once in a while, declare peace. It confuses the heck out of your enemies.



Konstantin Münning
2006-09-21 10:27:47 UTC
Permalink
Hi Andy,

sorry for the delay, had some work to do :-).
Post by Andy Spiegl
it took me a while but now I tried all your suggestions. I turned off
all programs that might interfere (Icq, Zonealarm, Antivir, ...) and
let TCPView (great tool!) run all the time to see what's happening.
Then I put this line in my config.prf
rshargs = -L 2222:localhost:22
Thus, an SSH-tunnel is started, forwarding localhost:2222 to server:22
which I used after the start of unison like so:
ssh -l serverusername localhost -p 2222
and there I started a "ping otherserver" to make sure that the ssh
connection doesn't drop.
Unfortunately that didn't help either. :-(
Does this mean you have got the same error as before "lost
connection..." or was it something else now?
Post by Andy Spiegl
The output of the ping in the Windows-commandshell simply stopped after 1-2
minutes. But according to ps on the linux server it was still running!
However TCPView didn't show any activity either.
TCPView should have shown TCP traffic while the pings were going and
should have shown some FIN packet(s) when the SSH connection has been
closed. If there hasn't been a FIN packet and/or if the sshd on the
linux box is still running, and/or "netstat -t" on the linux box was
showing the connection, and/or unison was still running on the linux box
then the connection was dropped by something on the way.
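
The server-side part of that check can be sketched as a small shell function.
This is only an illustration: the name has_established_ssh is made up, and it
assumes the Linux "netstat -tn" column layout (local address in column 4,
state in column 6) and an sshd listening on port 22.

```shell
#!/bin/sh
# has_established_ssh: read "netstat -tn" output on stdin and succeed if any
# line shows an ESTABLISHED TCP connection whose local port is 22.
# Assumption: Linux net-tools column order (Proto Recv-Q Send-Q Local Foreign State).
has_established_ssh() {
    awk '$1 ~ /^tcp/ && $4 ~ /:22$/ && $6 == "ESTABLISHED" { found = 1 }
         END { exit found ? 0 : 1 }'
}

# Intended use on the server, right after the client reports the lost connection:
#   netstat -tn | has_established_ssh && echo "server still sees an ssh session"
```

If the message is printed although the client has already given up, the
connection was probably cut somewhere in between rather than closed by either
end. Any other open ssh login matches as well, so the check is only meaningful
when the unison session is the only one.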
Post by Andy Spiegl
BTW, the directory to be synced contains
760 directories and 8427 files with a total of 5.2 GB.
If it's the initial sync this would take a while until it's processed
and traffic is generated.
Post by Konstantin Münning
There are a lot of things you can do. You can run a script on the server
side instead of unison (option servercmd) which executes unison but
writes some log files, runs strace or whatever you wish.
Post by Andy Spiegl
I moved /usr/bin/unison to /usr/bin/unison.bin and saved the following
shell script as /usr/bin/unison:
#!/bin/sh
date >> /var/tmp/unison.log
echo Args: $* >> /var/tmp/unison.log
/usr/bin/unison.bin $* | tee -a /var/tmp/unison.log
There I've got a nice (binary) logfile now but I can't find anything
suspicious in it.
Well, you could have used "servercmd" instead of moving the program but
it's ok this way. You could instead call unison this way:

strace -o /var/tmp/unison.strace.log /usr/bin/unison.bin "$@" \
    2> /var/tmp/unison.error.log

The strace log can get quite long and processing time is increased but
at the end you should at least see the exit code of the unison --server
process which may indicate some error. The stderr redirection would
probably show nothing but you can use it to check if there were any
errors reported by unison.
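
If strace turns out to be too slow, a wrapper along these lines would at least
record the exit status. It is a sketch only: run_logged is a made-up helper
name, and the paths in the usage note are just examples.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND with stdin and stdout untouched (the unison protocol travels
# over them), append its stderr to LOGFILE, and record the exit status.
run_logged() {
    logfile=$1
    shift
    {
        date
        echo "Args: $*"
    } >> "$logfile"
    "$@" 2>> "$logfile"
    rc=$?
    echo "exit status: $rc" >> "$logfile"
    return "$rc"
}

# Intended use as the last line of a wrapper script named by servercmd:
#   run_logged /var/tmp/unison.log /usr/bin/unison.bin "$@"
```

In a POSIX shell an exit status above 128 normally means the process was
killed by a signal (status minus 128 is the signal number). That could point
at something like the server running out of memory rather than at a dropped
network connection.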
Post by Andy Spiegl
Then I tried again calling unison with "-debug all", but that didn't help
either when unison has to start from scratch. It seems that it keeps up a
little longer but if the archive is too large I end up with the same error
"lost connection with the server" again. The last lines of the debug
output don't show anything useful either - just that the connection to the
server is lost.
Well, you could try one more thing together with the test - open an ssh
connection to your server but leave the prompt without doing anything.
Then start the unison test. When the server connection is lost, check if
your first connection is still alive. Just to be sure.

Good luck,
--
Konstantin Münning



Andy Spiegl
2006-09-06 22:39:08 UTC
Permalink
Hi Konstantin and Gerhard,

it took me a while but now I tried all your suggestions. I turned off
all programs that might interfere (Icq, Zonealarm, Antivir, ...) and
let TCPView (great tool!) run all the time to see what's happening.

Then I put this line in my config.prf
rshargs = -L 2222:localhost:22

Thus, an SSH-tunnel is started, forwarding localhost:2222 to server:22
which I used after the start of unison like so:
ssh -l serverusername localhost -p 2222
and there I started a "ping otherserver" to make sure that the ssh
connection doesn't drop.

Unfortunately that didn't help either. :-(

The output of the ping in the Windows-commandshell simply stopped after 1-2
minutes. But according to ps on the linux server it was still running!
However TCPView didn't show any activity either.

BTW, the directory to be synced contains
760 directories and 8427 files with a total of 5.2 GB.
Post by Konstantin Münning
There are a lot of things you can do. You can run a script on the server
side instead of unison (option servercmd) which executes unison but
writes some log files, runs strace or whatever you wish.
I moved /usr/bin/unison to /usr/bin/unison.bin and saved the following
shell script as /usr/bin/unison:

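#!/bin/sh
# Log a timestamp and the arguments, then run the real binary.
date >> /var/tmp/unison.log
echo "Args: $*" >> /var/tmp/unison.log
# "$@" keeps each argument intact; tee copies the protocol stream to the log.
/usr/bin/unison.bin "$@" | tee -a /var/tmp/unison.log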

There I've got a nice (binary) logfile now but I can't find anything
suspicious in it.

Then I tried again calling unison with "-debug all", but that didn't help
either when unison has to start from scratch. It seems that it keeps up a
little longer but if the archive is too large I end up with the same error
"lost connection with the server" again. The last lines of the debug
output don't show anything useful either - just that the connection to the
server is lost.

I am clueless! What else could I try?

Thanks,
Andy.
--
Fotos: francisco.spiegl.de o _ _ _
Infos: peru.spiegl.de __o /\_ _ \\o (_)\__/o (_) -o)
Andy, Heidi, Francisco _`\<,_ _>(_) (_)/<_ \_| \ _|/' \/ /\\
***@spiegl.de (_)/ (_) (_) (_) (_) (_)' _\o_ _\_v
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
In Germany everything is forbidden, unless something is specifically allowed.
In the UK everything which is not specifically forbidden, is allowed.
In France everything is allowed, even if it is forbidden.
In Italy everything is allowed, especially when it is forbidden.




Gerhard Fiedler
2006-09-06 15:20:57 UTC
Permalink
Post by Andy Spiegl
Sounds logical but ... Hm, maybe just this single ssh-connection is
dropped by the router? But then why does it stop sometimes after about 5
minutes and other times already after 1 or 2 minutes?
Can you watch your connections? For example with Sysinternals' TCPView
(http://www.sysinternals.com/Utilities/TcpView.html)?
Post by Andy Spiegl
My assumption still is that the unison server process quits by itself
after encountering something it doesn't like. But can I trace that?
There should be a Linux tool to check a process, no?

Gerhard


