Process Management and Monitoring Notebook - page 49 of 74


Date and Author(s)

Minutes of 5/1/02 phone meeting

Scalable Systems Software - Process Management Working Group Teleconference 
5/1/02 10:00 AM (PDT)
Working Group Chair:                    Paul Hargrove  
Meeting Minutes:                        Jason Duell 
Attendees
---------
Paul Hargrove       LBNL  
Jason Duell         LBNL 
Eric Roman          LBNL  
Rusty Lusk          ANL 
Mike Showerman      NCSA 
Al Geist            ORNL  
Agenda
------
Progress reports
Paul asked if the SSSLib authors were planning to add more  
Authentication/Authorization features:  Rusty wasn't sure, but said as far as he  
knew, the plan was still just for a basic challenge/response. 
Mike reported that he has an initial version of the node monitor daemon working.   
He's still looking into thresholds when you want data to be opaque.  The node  
monitor daemon for now uses only the /proc method to gather info--a SuperMon  
version is planned, and these will be swappable components. 
Checkpoint/restart:  Eric Roman is looking into the Japanese SCore Linux cluster
system, which appears to have a modified, checkpointable MPICH.  The other
features of their checkpoint system do not seem terribly desirable, though.  A
student from the LAM/MPI project will be arriving soon at LBNL to work on
getting LAM to work with our checkpoint API.
There is now a checkpoint mailing list at LBNL: it is a majordomo-based list at 
Rusty mentioned that he recently met with some tools people and discussed how
our SciDAC components might be made visible to debuggers and other tools.  They seemed
interested.  He also noted that work was going forward on coding the process  
manager, mainly by Ralph Butler. 
Paul proposed that we try to have a simple working demo at the next face-to-face
showing the startup and monitoring of a job.  It was generally agreed that this  
was a good idea. 
Next meeting
------------
The next call is scheduled for 5/15/02 at 10:00 PDT.
 To Attend: Long Distance users call 1-877-252-5250, Local users call  
press 1, enter TBD# and follow the instructions.   
The TBD value will be provided in the meeting announcement.