Preparing the Monthly BOMB Report

Bomb Report

Please note that the bomb report is a summary of the Executive Summary Reports in the SAAZ Portal. It will probably take 1 to 2 hours PER BOMB REPORT to complete correctly. The bomb report is an intelligent human’s summary of all the computer-generated data. You will be going through reams of reports to compile the nicely summarized bomb report.

We call it the bomb report because the clients only have to look for bombs on the page and realize “Bombs are Bad”. So if there are no bombs, there is no risk to the client. This hardly ever happens though.

We also use this report to prove to the client that we are being proactive and offering value to them, so this report is of immense importance in keeping our MSP clients paying.

Also, I need to read the bomb report and UNDERSTAND what I need to discuss with the clients: issues they may have, things that could be done better, etc.

Finally, the report is a chance for us to remediate issues that you may pick up and report back on how they were fixed. Let us use Envirowaste as an example.

In the following pages we are going to cover a lot of reports that you need to go through and a lot of tests that you need to perform. All these reports are computer generated. We need your intelligence to go through the reports, comment on the issues observed and set a plan in action to remediate the problems, either by sorting them out yourself or by assigning calls and tasks to people who can.

Server

The first thing that you have a look at in the portal is the quick access setting on the dashboard for the servers and the workstations. As you can see, the most important things for the server are OK: Disk Space, Antivirus and Security updates. What would we do if they were not OK? We would start an immediate chat with online support to fix it for us. This way they could get on with it while we finish the report. Please always remember to check the ticket and resolution from the NOC.

clip_image002

We have a critical user impact alert. If you click on the red cross you will see that the server restarted 14 days ago. Why? I don’t know. This is water under the bridge.

clip_image004

Next problem is the Critical Non Impact alert. Click on the red cross and you will see:

clip_image006

Once the bomb report is done there should be no red crosses left!

This alert is for a paging file operation. This is more important. Please go through the event log and AT tickets to see if this is a recurring issue, and make some recommendations. You will see that the NOC gives you some suggestions on how to fix the problem. Follow the suggestions. If that does not resolve it, assign it to the NOC to fix.

Next, let’s check out the server’s backups. Take remote control of the server.

clip_image008

You’ll be asked for a username and a password for the server. You will find this in the NIF.

clip_image010

Immediately we can see that there is a problem with this server’s memory utilization. It only has 4GB and all of it is being used. Make a mental note of this. Click on Remote Control.

clip_image012

Open the SBS Console and click on Backup and Server Storage. You will see that the backup has failed. Reason? The backup drives are offline. Please log a call on Zendesk and email everyone at the office. This is unfortunate as we need to test a restore (just one or 2 files) to make sure that it is working.

clip_image014

Please note that we use different backup software at different sites. This should be in the NIF. If it isn’t, please let us know so that we can update it.

Desktops

Now let’s have a look at the Desktops.

clip_image016

There are a couple of problems here. Some PCs’ Anti-virus signatures are not updating, and some PCs do not have Anti-virus at all. Chris-XP is one of these. You can see that the PC is online and that there is no Anti-virus installed on it. Big problem. Log a call for a technician to sort this out ASAP. Send a mail to all the technicians about this and CC Arno.

Secondly, ELIAS1 does not have Anti-virus. The PC is offline and the AV is not installed, so it could be that this PC has been decommissioned. To check this, go to Configuration->House Keeping->No Contact(Desktop)->Site Name, change the days to 30 and click on the magnifying glass.

clip_image018

You will now receive a report of all PCs that have not contacted the Portal in the past 30 days. It is safe to assume that these PCs are not on the network anymore. ELIAS1 is not in the list, so it has contacted the Portal in that time. Even if you change the days to 15, it still isn’t in the list, so it has contacted the Portal in the last 15 days. Could it be that Elias is sick, or that a virus infected his PC and broke it? Either way, log a call for a technician to phone the user and find out.
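The check described above can be sketched in a few lines of Python. This is only an illustration of the reasoning: the function name, the idea of copying the No Contact report into a set by hand, and the PC name "OLD-PC" are all assumptions made up for this example, not something the Portal provides.

```python
def triage_offline_pc(pc_name, no_contact_30_days):
    """Suggest a next step for a PC that is offline and has no anti-virus.

    no_contact_30_days is a hypothetical set of PC names copied by hand from
    the No Contact (Desktop) report with the days set to 30.
    """
    listed = {name.upper() for name in no_contact_30_days}
    if pc_name.upper() in listed:
        # No contact for 30 days: probably no longer on the network.
        return "probably decommissioned"
    # The PC has contacted the Portal recently, so it is still in use.
    return "still active: log a call for a technician to phone the user"


# "OLD-PC" is a made-up name used only for this example.
report = {"OLD-PC"}
print(triage_offline_pc("ELIAS1", report))
print(triage_offline_pc("OLD-PC", report))
```

A PC that is missing from the 30-day list is treated as live, which is why ELIAS1 still needs a call logged.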

The yellow dots under Anti-virus mean that this PC is running an unsupported or freeware Anti-virus. Please check whether there is already a call open for this. If not, log a call and send an email to everyone to find out if there is a reason for it.

If there are any other problems, such as patches not being rolled out, SMART HDD errors or low free disk space, a call needs to be logged and everyone in the team needs to be emailed. Always make sure first that there is not already an open call for the issue.

Now go to S&CC->Desktop Monitoring Script Dashboard and select the PCs with exceptions thrown. The exceptions are for:

Memory Available less than 24 MB | Memory Monitoring (Free MB) | 220 134 /
Monitor Workstation CPU more than 90 Percent | CPU Monitoring (% Utilization) | 220 134 / 9
OS Volume less than 5 Percent | Free Disk Space Monitoring (% Free Disk Space) | 220 134 / 8

clip_image022

As you can see, there are 9 PCs whose CPU utilization is running at more than 90% and 8 PCs that have less than 5% free disk space. Let’s click on it to see if any of these PCs belong to Envirowaste.
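To make the three rules concrete, here is a minimal sketch of the same thresholds in Python. The thresholds come from the rule names listed above; the function name and the sample readings are made up for illustration and are not taken from the Portal.

```python
# Thresholds taken from the three monitoring rules listed above.
MIN_FREE_MEMORY_MB = 24
MAX_CPU_PERCENT = 90
MIN_FREE_DISK_PERCENT = 5


def exceptions_for(free_memory_mb, cpu_percent, free_disk_percent):
    """Return the names of the rules that one PC reading would trip."""
    tripped = []
    if free_memory_mb < MIN_FREE_MEMORY_MB:
        tripped.append("Memory Available less than 24 MB")
    if cpu_percent > MAX_CPU_PERCENT:
        tripped.append("Monitor Workstation CPU more than 90 Percent")
    if free_disk_percent < MIN_FREE_DISK_PERCENT:
        tripped.append("OS Volume less than 5 Percent")
    return tripped


# Made-up readings for one PC, for illustration only.
print(exceptions_for(free_memory_mb=512, cpu_percent=95, free_disk_percent=4))
```

A PC that returns an empty list is not taking strain under any of the three rules.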

clip_image024

clip_image026

We’re lucky; none of the PCs at Envirowaste is taking strain. If there was a problem, you would need to log a call and email everyone in the team. Please make sure that there isn’t already an open call for this issue. If there is, escalate the call to the dispatch person. We would probably phone the user and ask if we could have a look at their PC, and then assign the call back to you to take over the PC and sort out the problem.

Network

The Network report is generated by doing a Speedtest on the server and also looking at the firewall log.

clip_image028

Please take note that we have various ISPs. Telkom (as in this case) should be unshaped bandwidth and fairly quick. Some clients can go up to 10 Mbps, but anything under 2 Mbps is cause for concern and needs to be looked at. Upload speeds of less than 0.25 Mbps are also a problem. Because these clients use RDP to access their apps, there could be performance issues (40 kbps per session; max 6 sessions on this connection). Do the speedtest a couple of times to get an average.
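The arithmetic behind these thresholds can be sketched as follows. This is a rough illustration only: it reads the speeds as megabits per second, assumes 1 Mbps = 1000 kbps, and assumes the upload direction is what limits RDP sessions. The function names and sample results are made up for the example.

```python
from statistics import mean

MIN_DOWNLOAD_MBPS = 2.0    # below this is cause for concern
MIN_UPLOAD_MBPS = 0.25     # below this is also a problem
RDP_KBPS_PER_SESSION = 40  # per-session figure quoted above


def average_speed(results_mbps):
    """Average several speedtest runs (values in Mbps)."""
    return mean(results_mbps)


def speed_concerns(download_mbps, upload_mbps):
    """List which of the two thresholds an averaged result falls below."""
    concerns = []
    if download_mbps < MIN_DOWNLOAD_MBPS:
        concerns.append("download under 2 Mbps")
    if upload_mbps < MIN_UPLOAD_MBPS:
        concerns.append("upload under 0.25 Mbps")
    return concerns


def max_rdp_sessions(upload_mbps):
    """Whole RDP sessions that fit, assuming upload is the limiting direction."""
    return int(upload_mbps * 1000 // RDP_KBPS_PER_SESSION)


# Made-up speedtest runs, for illustration only.
download = average_speed([2.0, 3.0, 4.0])
upload = average_speed([0.25, 0.25, 0.25])
print(download, upload, speed_concerns(download, upload), max_rdp_sessions(upload))
```

Under these assumptions an upload of 0.25 Mbps leaves room for 6 sessions at 40 kbps each, which lines up with the session limit mentioned above.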

If possible also do a pingtest (www.pingtest.net). You need Java enabled on the browser for this to work.

clip_image030

Do a couple of tests to get an average. Anything under a B is cause for concern and could affect network performance, especially VoIP apps like Skype as well as RDP sessions.

Disaster Planning

Simply put “No DRP Plan in Place”. We’ll complete this.

Security

Now let’s have a quick look at the firewall reports. We go through these reports to see if any users are abusing the system, if there are torrents being downloaded that hog the bandwidth, etc. I am going to use Detect as an example as they use a lot of the advanced functionality found in the Untangle server.

Have a look at the WAN Failover module. Not all sites use this functionality. WAN Failover allows the Untangle server to fail over to another WAN connection, such as a wireless connection, if the primary connection fails. If the site has this functionality, first check that it is working. You will see that there are 2 connections available. This implies that it is working. Click on “Settings” on the WAN Failover Module.

clip_image032

As you can see, the External (Primary) link is up 91.6% of the time (not very good, which is why we have the failover) and the DMZ or secondary connection is up only 53.7% of the time. This is really bad and not a very good failover solution. The client should consider changing the failover link because it is so unreliable.
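To put those percentages in terms a client will feel, they can be converted into hours of downtime. The sketch below assumes the uptime figures cover a 30-day period, which the module’s screen may or may not use; the function name is made up for this example.

```python
HOURS_IN_REPORT_PERIOD = 30 * 24  # assumes a 30-day reporting period


def downtime_hours(uptime_percent, period_hours=HOURS_IN_REPORT_PERIOD):
    """Convert an uptime percentage into hours of downtime over the period."""
    return (100 - uptime_percent) / 100 * period_hours


for link, uptime in [("External (Primary)", 91.6), ("DMZ (secondary)", 53.7)]:
    print(f"{link}: up {uptime}%, down about {downtime_hours(uptime):.0f} hours")
```

Under that 30-day assumption the primary link would have been down for roughly 60 hours and the secondary link for roughly 333 hours.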

clip_image034

Close the WAN Failover Module and click on the Reports Module’s Settings.

clip_image036

Click on “View Reports”

clip_image038

Get the whole month’s (30 days) report.

clip_image040

Copy all the data in the large red block. This will go directly into our Bomb Report. Now let’s have a look at some interesting figures. I will highlight the important data.

Platform scanned 33.30 GB and 1355346 sessions
Spam Blocker scanned 8027 messages and detected and processed 2889 spam messages
Phish Blocker scanned 8027 messages and detected and processed 21 phish messages
Spyware Blocker scanned 126005 web hits and blocked 2597 activities
Bandwidth Control analyzed 33304.83 MB
WAN Failover detected 348 WAN failures and saved the network from 249135.8 seconds of downtime
Virus Blocker scanned 134676 documents and detected and blocked 358 viruses
Intrusion Prevention scanned 556524 sessions and detected 0 attacks of which 0 were blocked
Protocol Control scanned 556524 sessions and detected 167375 protocols of which 1512 were blocked
Firewall scanned 556524 sessions and blocked 0 according to the rules

The most important stat to me is that WAN Failover saved the company more than 249135 seconds (about 69 hours) of downtime. Mention this in the Bomb Report. Also mention any other facts that you find interesting, such as the spam messages and viruses that were blocked.
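If you want to double-check the figures before quoting them, the conversions are simple enough to script. The sketch below pulls the numbers out of two of the summary lines copied from the report above; the helper names are made up for this example and are not part of Untangle.

```python
import re

# Two lines copied verbatim from the report summary above.
REPORT_LINES = [
    "Spam Blocker scanned 8027 messages and detected and processed 2889 spam messages",
    "WAN Failover detected 348 WAN failures and saved the network from 249135.8 seconds of downtime",
]


def numbers_in(line):
    """Return every number that appears in a report line, in order."""
    return [float(n) for n in re.findall(r"\d+(?:\.\d+)?", line)]


def seconds_to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600


spam_scanned, spam_caught = numbers_in(REPORT_LINES[0])
failures, seconds_saved = numbers_in(REPORT_LINES[1])

print(f"Spam share of scanned mail: {spam_caught / spam_scanned:.1%}")
print(f"WAN failures: {failures:.0f}, "
      f"downtime avoided: {seconds_to_hours(seconds_saved):.1f} hours")
```

For this report, 249135.8 seconds works out to roughly 69 hours, and about 36% of the scanned mail was spam.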

Let’s quickly check whether any of the users have been abusing the system. Click on Protocol Control and see if there are any weird protocols being used. At Detect someone is using a BitTorrent client. This could cause huge amounts of traffic on the network. Make a note of this for the Bomb Report.

clip_image042

Finally, go out of the reports section and click on Config->Email

clip_image044

Note that there are users that have large amounts of mails in their quarantine folder. They probably do not know how to empty their quarantine. Make a note of this.

clip_image046

General Reports

There are a couple of General Reports that need to be saved as well. I am using Detect as an example for these reports. I have found that these reports work better in Internet Explorer, as there are some custom settings that you have to put in to make them work. I do not have documentation for Chrome or Mozilla.

In Internet Explorer, Click on Tools->Internet Options. Select Trusted Sites and Click on the Sites Button.

clip_image048

Make sure the following sites are added to the list of trusted sites

clip_image050

Also make sure that popup blocker is not blocking these sites.

In the Portal click on “Reports”

clip_image052

Select “Executive Summary”

clip_image054

Select the Word document for the site, as you may want to edit this document slightly.

clip_image056

The Executive Summary report is a couple of pages that we give to clients/executives so that they do not have to wade through reams of reports. Go through the report and format it so that everything fits nicely (resize graphics if they overflow to other pages etc). An example of the report is available to download here. My comments will be highlighted.
