
UNINETT's monitoring toolkit

UNINETT's monitoring toolkit is shipped as a special-purpose server with a tailor-made, focused set of network monitoring and management applications. Since the start of the GigaCampus project in 2006, UNINETT has provided this toolkit to the public universities and university colleges in Norway. The servers are physically placed on the campus network they manage, and UNINETT handles their setup and management. Local IT staff access the tools through a web interface (SSH login is required in some cases, e.g. for TFTP configs).

The first server went into operation in September 2006; as of August 2014, 32 servers are in operation. The toolkit concept has gradually evolved: the initial applications have been improved and new applications have been added. The hardware requirements depend on the size of the campus being monitored; see the details below.

UNINETT's monitoring toolkit has contributed significantly to the success of campus networking in Norway and has given Norwegian campus networks substantial added value. In Norway, NAV and Appflow are more than tools: they have created a community of campus network engineers who discuss future enhancements and share ideas on how to better manage their networks.

The tools in the toolkit

UNINETT's monitoring toolkit continues to evolve. As of August 2014, the following tools are installed:

  • The network management system NAV (the most comprehensive tool in the toolkit)
  • The Netflow analysis tool NfSen (including NfDump)
  • Application Recognition with Appflow (requires UNINETT's measurement probe in addition)
  • The service monitor Xymon (previously Hobbit / Big Brother)
  • TFTP setup with RCS revision control for switch and router configuration archive
  • Firewall Builder for managing access lists
  • Syslog server (for logging from network gear)
  • A Radius-based authentication service for routers and switches

Each tool is described in more detail below.

NAV

The network management system NAV

NAV (Network Administration Visualized) is a free software package suitable for monitoring large campus networks. NAV has a long history in the higher education sector in Norway: initial development started at NTNU in 1999, and UNINETT has been in charge of development since 2006. NAV is open source and has a large user base around the world.

Highlights are:

  • Customizable dashboard
  • Extensive statistical overviews
  • Configuration on the fly
  • Full traceability of users and equipment
  • Device and vendor agnostic

The main features of NAV are:

  • A PostgreSQL topology database that keeps an up-to-date view of the campus network topology, including how routers, switches, wireless access points and servers are interconnected. The topology discovery process uses ARP/IPv6 neighbor caches, bridge tables, LLDP and CDP data.
  • Graphical visualization of the network with network maps (geographical and topological).
  • Reports that list your network equipment with software versions, usage of IP prefixes and addresses, configuration of router and switch ports, and more. A device dashboard shows up-to-date port/interface status. A room dashboard shows all equipment in a given room and lets you upload images for documentation.
  • A machine tracker with historical data that tracks the whereabouts of end-user devices in your network. It supports both IPv4 and IPv6 and displays DNS and NetBIOS names when available. An IP address information page displays all information NAV has collected on a given IP address. The L2 traceroute tool shows the layer 2 path between two nodes. The Macwatch service sends an alarm when a given MAC address appears on your network.
  • A Status Monitor detects outages of components in your network, including alarms on defective power supplies and fans as well as alarms from UPSes and environmental monitors.
  • Traffic statistics are collected from all router and switch ports, along with CPU, memory and environmental counters. For this NAV uses Graphite, a highly scalable third-party real-time graphing system. NAV complements Graphite with a threshold monitor that generates alarms when values stored in Graphite's Whisper database cross configured thresholds.
  • A flexible alert tool where network admins can manage their alert profiles and subscribe to the alarms they are interested in. Supported alert channels are email, SMS and Jabber. For robustness, UNINETT attaches a mobile phone/GSM device directly to a USB port on the toolkit server, so you will still get your SMS alarms even when the network is down.
  • A service monitor that checks the availability of network services. Supported services include SSH, HTTP, IMAP, POP, SMTP, SMB, RPC, DNS and DC. Note: the toolkit server also has Xymon installed, which is a more comprehensive service monitoring tool; NAV's key strength lies in monitoring the network itself.
  • A tool for configuring the VLAN and description values of switch ports. It uses SNMP write in the back end.
  • A detention system that allows the network admin to quarantine or block a machine from the network, e.g. in cases of security breaches or similar (also based on SNMP write).
  • A Radius accounting tool that lets you search Radius accounting logs. Typical use cases are tracking eduroam users or other IEEE 802.1X authenticated users.
  • NAV authentication supports LDAP and AD. A flexible authorization scheme lets the NAV admin restrict users' access to certain tools.
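The machine tracker's core idea, combining a router's ARP/IPv6 neighbor cache (IP to MAC) with switch bridge tables (MAC to port), can be sketched as a join on the MAC address. This is a minimal illustration under stated assumptions, not NAV's implementation: the record types, the `locate_hosts` function and the idea of passing in a set of known uplink ports are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ArpEntry:
    """One row from a router's ARP (IPv4) or neighbor (IPv6) cache."""
    ip: str
    mac: str
    router: str


@dataclass(frozen=True)
class CamEntry:
    """One row from a switch's bridge (MAC forwarding) table."""
    mac: str
    switch: str
    port: str


@dataclass(frozen=True)
class Location:
    ip: str
    mac: str
    switch: str
    port: str
    seen: datetime  # timestamp kept so that history can be stored


def locate_hosts(arp, cam, uplinks, seen):
    """Join ARP and bridge-table data on MAC address (hypothetical helper).

    `uplinks` is a set of (switch, port) pairs for switch-to-switch links;
    MACs learned there are ignored so each host maps to its edge port.
    """
    edge_ports = {}
    for entry in cam:
        if (entry.switch, entry.port) not in uplinks:
            edge_ports[entry.mac.lower()] = entry
    locations = []
    for entry in arp:
        mac = entry.mac.lower()
        port = edge_ports.get(mac)
        if port is not None:
            locations.append(Location(entry.ip, mac, port.switch, port.port, seen))
    return locations
```

Skipping uplink ports matters because a MAC address is learned on every switch along the path; only the edge port tells you where the host is actually plugged in.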

Read more about NAV. For a quick start download a virtual appliance of the latest NAV version.

NetFlow/IPFIX Portfolio

NetFlow data supplements the data NAV collects. NetFlow provides an overview of all sessions in the network, i.e. who is talking to whom, from/to what IP address and what port (TCP/UDP). NetFlow data can be exported from Cisco routers, including the Catalyst 6500, Catalyst 4500 and Cisco Nexus platforms.

IPFIX is an IETF standard similar to NetFlow v9. Many vendors support IPFIX, e.g. Juniper, and Nfdump (see below) supports it as well.
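The "who is talking to whom" overview boils down to aggregating flow records by address pair. The sketch below assumes a simplified, hypothetical flow record (a small subset of the fields a NetFlow v9 or IPFIX exporter sends) and sums bytes per source/destination pair, which is the kind of aggregation behind top-N statistics. It does not read Nfdump's file format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Simplified flow record; real NetFlow/IPFIX records carry many more fields."""
    src: str
    dst: str
    proto: str
    dport: int
    octets: int


def top_talkers(flows, n=10):
    """Return the n (source, destination) pairs that exchanged the most bytes."""
    totals = Counter()
    for flow in flows:
        totals[(flow.src, flow.dst)] += flow.octets
    return totals.most_common(n)
```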

The following components are included in the NetFlow portfolio:

  • Nfdump: Nfdump is a collection system that receives NetFlow/IPFIX data from routers and stores it in a compressed file format. Command-line tools are provided to search for and retrieve data of interest.
  • NfSen (NetFlow Sensor): NfSen is a web tool that makes your Nfdump data more accessible: you can search through the web interface instead of using the command line. NfSen also provides trend statistics and stores them in RRD files. You can create custom filters that generate alarms when a configured threshold is exceeded; detecting mail spammers is one example use case.
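As an illustration of the spammer-detection use case, the sketch below flags internal hosts that open SMTP connections (TCP port 25) to unusually many distinct destinations within one interval. It is written in Python for clarity and is not how NfSen alerts are configured; the function name, the threshold and the list of legitimate mail relays are hypothetical parameters you would tune locally.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Simplified flow record (hypothetical subset of NetFlow/IPFIX fields)."""
    src: str
    dst: str
    proto: str
    dport: int


def smtp_fanout_alerts(flows, threshold, mail_relays=frozenset()):
    """Sources that sent SMTP (TCP/25) to more than `threshold` distinct hosts.

    `flows` should cover one interval; the institution's own mail relays,
    which legitimately talk to many servers, are excluded.
    """
    destinations = defaultdict(set)
    for flow in flows:
        if flow.proto == "tcp" and flow.dport == 25 and flow.src not in mail_relays:
            destinations[flow.src].add(flow.dst)
    return sorted(src for src, dsts in destinations.items() if len(dsts) > threshold)
```

Counting distinct destinations rather than bytes is the design choice here: a spamming client typically sends small messages to many different mail servers.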

On the toolkit server NfSen is integrated with NAV and can be accessed (and restricted!) through NAV's tool list.

Application Recognition with Appflow

Appflow provides trend statistics for traffic in and out of the campus network. You get an overview of the application distribution in a given time period. You can also see the split between IPv4 and IPv6 traffic, and traffic patterns by source/destination AS and geographical location.

Appflow requires a measurement probe from UNINETT. The measurement probe uses a passive network card that listens to all traffic going in/out of the campus network to/from the research network. Appflow looks into all packets and detects the applications in use. This is done by looking deeper into packet headers than just port numbers. Appflow data is exported in an extended IPFIX format and stored in a PostgreSQL database. Appflow's front-end is integrated with NAV on the toolkit server (in the same way as for NfSen).
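The difference between port-based and payload-based recognition can be shown with a few well-known protocol signatures. This is a deliberately simplified sketch, not Appflow's detection engine: real classifiers track flow state and use far richer signatures. The port table, the function names and the signature set below are illustrative assumptions.

```python
# Naive approach: trust the well-known destination port.
WELL_KNOWN_PORTS = {22: "ssh", 53: "dns", 80: "http", 443: "https"}

# First bytes a client typically sends for a few protocols (simplified).
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")


def classify_by_port(dport: int) -> str:
    return WELL_KNOWN_PORTS.get(dport, "unknown")


def classify_by_payload(payload: bytes) -> str:
    """Guess the application from the first client payload of a flow."""
    if payload.startswith(b"SSH-"):
        return "ssh"  # SSH identification string, e.g. "SSH-2.0-..."
    if payload.startswith(b"\x13BitTorrent protocol"):
        return "bittorrent"  # BitTorrent peer handshake
    if len(payload) >= 6 and payload[0] == 0x16 and payload[1] == 0x03 and payload[5] == 0x01:
        return "tls"  # TLS handshake record carrying a ClientHello
    if payload.startswith(HTTP_METHODS):
        return "http"
    return "unknown"
```

An SSH server listening on port 443, a common trick to pass restrictive firewalls, would be reported as HTTPS by `classify_by_port` but as SSH by `classify_by_payload`.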

Appflow is open source and developed by UNINETT.

Service monitoring with Xymon

(Xymon was earlier named Hobbit, before that Big Brother).

Xymon monitors your hosts, your network services, and anything else you configure it to do via extensions. Xymon can periodically generate requests to network services - HTTP, FTP, SMTP and so on - and record if the service is responding as expected. You can also monitor local disk utilization, log files and processes through the use of agents installed on the servers.

All of the monitoring results are collected by the toolkit server and used to build a set of webpages that show the status of your network, with drill-down functionality to check up on problems. Xymon will also record the history of each monitored item, so you can generate availability reports and check on the incidents that have occurred. Wherever possible data is also stored for trend analysis and presented as graphs, so you can easily track e.g. the response-time of a business-critical web application over time.
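The core of such a network test is simple: connect to the service, optionally read its greeting banner, and record whether it answered as expected and how long it took. The sketch below shows that principle for TCP services; it is not Xymon's code (Xymon's network tests are configured rather than programmed), and the function name, the `expect` parameter and the default timeout are hypothetical.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    host: str
    port: int
    up: bool
    seconds: Optional[float]  # response time, None if the service was unreachable
    banner: bytes


def check_tcp(host, port, timeout=5.0, expect=None):
    """Connect to host:port; if `expect` is given, the greeting must start with it.

    The measured time covers the TCP connect plus, when `expect` is set,
    reading the first chunk of the service's greeting.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(256) if expect is not None else b""
            elapsed = time.monotonic() - start
    except OSError:  # refused, unreachable or timed out
        return CheckResult(host, port, False, None, b"")
    up = expect is None or banner.startswith(expect)
    return CheckResult(host, port, up, elapsed, banner)
```

For example, `check_tcp("mail.example.org", 25, expect=b"220")` (a placeholder host name) would treat an SMTP server as up only if it greets with a 220 reply; storing each result with a timestamp gives the history and response-time graphs described above.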

As mentioned, NAV has a simple built-in service monitor that may be sufficient for your needs. Xymon is a more comprehensive solution. NAV and Xymon are integrated on the toolkit server.

Note: For in-house server monitoring purposes UNINETT is currently moving away from Xymon to Zabbix. The continued support for Xymon on the toolkit servers is uncertain in the long run.

Configuration archive for network equipment

The monitoring toolkit server includes a TFTP server for storing router and switch configurations. The setup currently includes:

  • A TFTP server with security wrapper preventing unwanted reads or writes from/to the configuration archive
  • Revision control with RCS that gives you history of all config changes
  • A nightly cron job that mails a report of the previous day's config changes in your network
  • Dedicated config files for access lists, also with RCS
  • Support for router / switch software upgrades using TFTP or SCP from the toolkit server
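The nightly change report can be sketched as a diff between the previous and the current revision of each archived config. The toolkit itself uses RCS for this; the sketch substitutes Python's standard `difflib` module so that it is self-contained, and `config_diff`, `nightly_report` and the input dictionary layout are hypothetical. Mailing the result from cron is left out.

```python
import difflib


def config_diff(device, old, new):
    """Unified diff between two stored revisions of one device's config."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{device} (previous)",
        tofile=f"{device} (current)",
        lineterm="",
    )
    return "\n".join(lines)


def nightly_report(snapshots):
    """Build a mail body from {device: (previous_config, current_config)}.

    Devices whose configuration did not change are left out.
    """
    sections = [config_diff(dev, old, new) for dev, (old, new) in sorted(snapshots.items())]
    sections = [s for s in sections if s]
    return "\n\n".join(sections) if sections else "No configuration changes."
```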

There are discussions on extending this service with cron-based scripts that perform automatic config writes (using SNMP write or NETCONF). A further enhancement could be a framework for bulk config updates to a set of routers/switches using Expect, SNMP write or NETCONF. Yet another alternative could be to introduce RANCID on the toolkit server.

Firewall Builder for managing access lists

Firewall Builder is open source software for administering access lists/firewall rules. It is an X application installed on the toolkit server. Using Firewall Builder has several benefits: you can easily reuse rule sets, and maintaining both IPv4 and IPv6 access lists becomes much less of a burden.

Firewall Builder supports Cisco IOS and Linux iptables. UNINETT has extended the support to include Juniper and Cisco NX-OS. UNINETT is also working on providing a web interface as an alternative to X.
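Why one abstract rule set eases dual-stack maintenance can be illustrated by rendering a single rule list into both an IPv4 and an IPv6 Cisco IOS ACL. This is not Firewall Builder's object model or compiler output; the `Rule` type, the object group, the ACL names and the simplification that the source is always `any` are assumptions for the sketch.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str   # "permit" or "deny"
    proto: str    # "tcp" or "udp"
    group: str    # name of a destination object group
    port: int     # destination port


def render_ios(rules, groups, name):
    """Render one abstract rule set as a Cisco IOS IPv4 and IPv6 ACL pair.

    Each object group may mix IPv4 and IPv6 networks; entries are sorted
    into the matching address family. Source is always "any" for brevity.
    """
    v4 = [f"ip access-list extended {name}"]
    v6 = [f"ipv6 access-list {name}-V6"]
    for rule in rules:
        for cidr in groups[rule.group]:
            net = ipaddress.ip_network(cidr)
            if net.version == 4:
                # IOS IPv4 ACLs use wildcard masks, i.e. the inverted netmask.
                v4.append(f" {rule.action} {rule.proto} any "
                          f"{net.network_address} {net.hostmask} eq {rule.port}")
            else:
                v6.append(f" {rule.action} {rule.proto} any {net} eq {rule.port}")
    return v4, v6


GROUPS = {"web-servers": ["192.0.2.0/28", "2001:db8:10::/64"]}
RULES = [Rule("permit", "tcp", "web-servers", 80),
         Rule("permit", "tcp", "web-servers", 443)]

v4_acl, v6_acl = render_ios(RULES, GROUPS, "WEB-IN")
print("\n".join(v4_acl + v6_acl))
```

Adding an IPv6 prefix to the `web-servers` group is then a one-line change that updates both rules, instead of editing two separately maintained access lists.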

Syslog Server

A syslog daemon is set up on the toolkit server so that your routers and switches can send syslog messages to it for storage. The syslog log files are rotated.
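What a syslog receiver does with each incoming datagram can be sketched in a few lines: decode the leading `<PRI>` field, where PRI = facility × 8 + severity (RFC 5424), and append the message to a log file. The toolkit uses a standard syslog daemon, not this code; the function names, the log file argument and the listening address are hypothetical, and rotation is left to tools such as logrotate.

```python
import re
import socket

# Severity names in numeric order (RFC 5424).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
# A syslog datagram starts with "<PRI>", where PRI = facility * 8 + severity.
PRI_RE = re.compile(rb"<(\d{1,3})>(.*)", re.DOTALL)


def parse_syslog(datagram):
    """Return (facility, severity_name, message) or None if there is no valid PRI."""
    match = PRI_RE.match(datagram)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # facilities 0-23, severities 0-7
        return None
    facility, severity = divmod(pri, 8)
    text = match.group(2).decode("utf-8", errors="replace")
    return facility, SEVERITIES[severity], text


def serve(logfile, host="0.0.0.0", port=514):
    """Receive syslog over UDP and append one line per message to `logfile`.

    Binding to port 514 normally requires root privileges.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(logfile, "a", encoding="utf-8") as out:
        sock.bind((host, port))
        while True:
            data, (sender, _) = sock.recvfrom(8192)
            parsed = parse_syslog(data)
            if parsed is None:
                continue
            facility, severity, text = parsed
            out.write(f"{sender} facility={facility} {severity}: {text.rstrip()}\n")
            out.flush()
```

A message starting with `<187>` decodes to facility 23 (local7, the default logging facility on Cisco IOS) and severity 3 (err).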

Note that UNINETT is currently (2014) looking into a powerful log analysis solution based on Logstash/Elasticsearch/Kibana.

Router/Switch Authentication Service

The toolkit server includes a Radius server where you can maintain a list of users (with passwords) who are granted access to your network equipment. Moving away from simple line password authentication is highly recommended. With user authentication on your equipment it is possible to see who made which configuration change. It is also easier to revoke access when people leave.

Tool integration

The monitoring toolkit takes advantage of NAV's portal capabilities: other tools are easily integrated with NAV and made available through NAV's toolbox. Thanks to NAV's flexible authorization mechanisms, a NAV admin can control which tools a given user is granted access to. The monitoring toolkit also uses NAV as a common alarm center.

Tool integration on the monitoring toolkit
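The idea of granting tools per user can be sketched as group-based authorization: users belong to groups, groups are granted tools, and a user may open a tool if any of their groups grants it. This is a hypothetical model for illustration only; the class, group and tool names do not reflect NAV's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolAccessPolicy:
    """Hypothetical group-based tool authorization (not NAV's data model)."""
    group_tools: dict[str, set[str]] = field(default_factory=dict)
    user_groups: dict[str, set[str]] = field(default_factory=dict)

    def tools_for(self, user: str) -> set[str]:
        """Union of the tools granted to every group the user belongs to."""
        tools: set[str] = set()
        for group in self.user_groups.get(user, set()):
            tools |= self.group_tools.get(group, set())
        return tools

    def may_use(self, user: str, tool: str) -> bool:
        return tool in self.tools_for(user)


policy = ToolAccessPolicy(
    group_tools={"netadmins": {"nav", "nfsen", "appflow", "xymon"}, "helpdesk": {"nav"}},
    user_groups={"alice": {"netadmins"}, "bob": {"helpdesk"}},
)
print(policy.may_use("bob", "nfsen"))  # False
```

Unknown users and unknown groups get an empty tool set, so access is denied by default.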

The operational concept

UNINETT NOC has operational responsibility for the monitoring toolkit servers and provides 24×7 management. Operations are carried out according to UNINETT's security policy, and specific security procedures are defined for the toolkit servers and the measurement probes.

The toolkit is offered as a UNINETT service to all 33 public universities and university colleges in Norway. The service covers the initial server purchase (and later upgrades/replacements), initial setup and commissioning, and the continuous maintenance and management of the servers themselves, including OS and package upgrades (NAV, NfSen, Xymon, etc.).

The toolkit servers run Debian Linux and are 1U rack-mountable servers. As of August 2014, the shipping server specification is:

  • HP ProLiant DL360 Gen8 with single Intel Xeon E5-2620 v2 six-core CPU @ 2.1 GHz
  • 16 GB memory
  • 4× 600 GB 10K RPM SAS disks in a RAID 10 configuration
  • Redundant power
  • iLO for out of band management console

For larger institutions (1000 nodes, 40000 ports), a cluster of two or even three servers is used. NAV stores metrics (traffic counters and more) in Graphite, which is IO-intensive. SSD disks for Graphite should therefore be considered, either as an SSD RAID partition in a single-server setup or on a dedicated Graphite server. In a three-server setup, the PostgreSQL database should be moved to a dedicated server.
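The sizing advice can be made concrete with a back-of-the-envelope estimate. Graphite's Whisper format stores each metric in its own fixed-size file, so disk usage and write load scale with the number of metrics. The metrics per port and the retention schedule below are assumptions for illustration, not NAV's defaults, and the function names are hypothetical.

```python
# Whisper on-disk layout: every datapoint is a 4-byte timestamp plus an
# 8-byte double; the file header is 16 bytes plus 12 bytes per archive.
POINT_SIZE = 12
METADATA_SIZE = 16
ARCHIVE_INFO_SIZE = 12


def whisper_file_size(archives):
    """Size in bytes of one Whisper file; archives = [(seconds_per_point, points)]."""
    points = sum(count for _, count in archives)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives) + POINT_SIZE * points


def graphite_estimate(ports, metrics_per_port, archives):
    """Rough disk and write-rate estimate; one Whisper file per metric."""
    metrics = ports * metrics_per_port
    finest_step = archives[0][0]  # first archive has the highest resolution
    return {
        "metrics": metrics,
        "disk_bytes": metrics * whisper_file_size(archives),
        "datapoints_per_second": metrics / finest_step,
    }


# Assumed retention: 5 min for 30 days, 30 min for 90 days, 2 h for 2 years.
RETENTION = [(300, 8640), (1800, 4320), (7200, 8760)]
estimate = graphite_estimate(ports=40_000, metrics_per_port=6, archives=RETENTION)
print(estimate)
```

With these assumptions, 40 000 ports produce 240 000 Whisper files totalling about 62.6 GB, and roughly 800 new datapoints per second land in files scattered across the disk. Graphite's carbon daemon batches writes per metric, so the actual number of disk operations is lower, but the access pattern stays small and random, which is what favours SSDs.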

UNINETT incorporates the toolkit servers as a group in our centralized management setup that also encompasses other sets of servers, including UNINETT's measurement probes. The management tool CFEngine is used. CFEngine eases the process of updating and maintaining a large set of almost equivalent servers.

UNINETT has spare servers that are swiftly shipped in case of breakdowns. CFEngine and backup enables us to fairly quickly get the replaced server back in operation (CFEngine will be replaced by Puppet in the near future).

For all the tools that are included in the toolkit UNINETT maintains Debian packages in our centralized package archive. Package maintenance is in some cases done by UNINETT, in other cases by others.

Institutions that prefer to run their own servers can access the UNINETT package archive (given that the servers run Debian). For larger tools like NAV other distributions are also available, including a virtual appliance.

User community

User documentation for the provided tools is available online. There is also an active mailing list for questions and discussions, and UNINETT holds an annual workshop where news and enhancements are presented to the community and feedback is requested from the users. For NAV there is in addition a reference group of power users from five universities/university colleges, which has a high degree of influence on the future development of NAV.

 
 
geantcampus/gc_toolbox.txt · Last modified: 2014/08/11 13:24 by faltin@uninett.no
