High connection latency, and general "things not working the way they are supposed to", but different

Tuesday, February 8, 2022 6:22 AM

Problem(s) Encountered:

While I was taking evening flats the VNC connection seemed very spotty, which I chalked up to our ISP having a bit of a slowdown, but in retrospect it may not have been.

Everything went normally for several hours, and I got a lot of useful data.  Sometime before midnight, though, the VNC got very slow -- in particular, Owl was updating intermittently, freezing for a minute or more and then suddenly catching up.  The other windows were freezing a lot, too.  Weirdly, there was a proliferation of little window-icon buttons in the bottom bar (where the desktop cartoon and buttons to minimize windows live), all labeled "Java" and apparently not connected to any windows -- I couldn't open those, or even delete them.

Owl is in Java, and I wondered if it might be that the script that reads the telescope was messing up -- it does, about one time in every few hundred exposures -- but the image headers almost all had correct telescope coordinates.  Strangely, the data all seemed fine, so I soldiered on.

I tried several things -- logging off and back on, and restarting Owl, each several times.  This did clear up the plague of Java buttons.  After I restarted Owl for the penultimate time, I took a few frames to set up a field, and then, when I went to up the exposure times and take multiple exposures, I could not get the text entry boxes in Owl to respond.  So I exited Owl yet again.  When I got back, I got error messages about not being connected to any devices.  I tried the setup button, but attempting to run the setup scripts led to more errors about not being connected to any device.

At this point I called the 2.4m, and Nicole came down to power cycle things and wiggle connectors, but nothing worked.  The connection was still pretty slow and glitchy, oddly, but I was able to close up normally and, of course, call it a night.  I still don't know what the root problem was.

Solution:

I remotely opened a connection (from my office) and found numerous “errors” associated with opening ds9.  As you likely recall, this is a known non-issue with Owl that often occurs.  Why, I have no idea, but I see it more often than not on mdmarc2 (interestingly it does not always happen, and never happens on mdmarc1).  Even after I closed Owl, I had about 6 of these ds9-associated error pop-ups to close out.  Once I closed them all, I opened Owl, performed a “Startup” and took a test frame without issue.  I think that the latency associated with a rough network connection may have caused the issue to compound into something greater.  I’ll likely reboot mdmarc2 for good measure, but first need to pull CHaS from the 2.4m.

From Jules:  I have also seen Owl and everything else seem to freeze for minutes, and ultimately found out that it continued to take data, or that it eventually stopped taking data several minutes after I appeared to lose the connection.  Worse things happen to other windows: typically the "e" key, which I had been using in ds9 to examine the focus of a star, would repeat endlessly in the IRAF window, rendering it useless.  Or a new IRAF window for Owl would try (and fail) to open when one was already running.  Usually I could recover from this without having to reboot the computers, but once I had to go that far and do a complete setup of everything from scratch.  Maybe that wouldn't have been necessary if I really knew what I was doing.

Given what Eric reported and what I have seen, I would advise that if you are going to kill Owl, you should in addition make sure that its ds9 doesn't keep running.  Kill the ds9 first if necessary, because if it is still running it will prevent Owl from initializing the next time.
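
The advice above could be scripted roughly as follows. This is only a sketch under assumptions that have not been checked on mdmarc1 or mdmarc2: that the display process shows up under the exact name "ds9", that the standard pgrep utility is installed, and that the script is run by the same user who started Owl. The function names are made up for illustration.

```python
import os
import signal
import subprocess


def find_pids(name, runner=subprocess.run):
    """Return the PIDs of processes whose name is exactly `name`.

    Relies on `pgrep -x`, which prints one PID per line and prints
    nothing when no process matches.
    """
    result = runner(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(token) for token in result.stdout.split()]


def kill_lingering(name="ds9", runner=subprocess.run, kill=os.kill):
    """Send SIGTERM to every process named `name`; return the PIDs signalled.

    "ds9" as the process name is an assumption, not something confirmed
    on the observatory machines.
    """
    signalled = []
    for pid in find_pids(name, runner):
        try:
            kill(pid, signal.SIGTERM)
            signalled.append(pid)
        except ProcessLookupError:
            # The process exited between the lookup and the signal.
            pass
    return signalled
```

Calling kill_lingering() after exiting Owl, and before starting it again, would clear any ds9 left behind; an empty list back means nothing was running. Note that it would also close any ds9 you opened yourself for other purposes.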