SharePoint monitoring options

 

Setting | Value | Notes
Event Log Flooding Protection | Disabled | The default value is Enabled. It can be disabled to collect as much monitoring data as possible. For normal operations, it should be enabled.
Timer Job Schedule
Microsoft SharePoint Foundation Usage Data Import | 5 minutes | The default value is 30 minutes. Lowering this setting imports the data into the usage database more frequently, and is especially useful when troubleshooting. For normal operations, it should be 30 minutes.
Diagnostic Providers
Enable all diagnostic providers | Enabled | The default value is Disabled except for the “Search Health Monitoring – Trace Events” provider. These providers collect health data for various features and components. For normal operations, you may want to revert to the default.
Set “job-diagnostics-performance-counter-wfe-provider” and “job-diagnostics-performance-counter-sql-provider” Schedule Intervals | 1 minute | The default value is 5 minutes. Lowering this setting polls data more frequently, and is especially useful when troubleshooting. For normal operations, it should be 5 minutes.

 

Generic Performance Counters

Performance Counter | Description
Processor | Monitor processor performance to ensure that processor usage does not stay consistently high (over 80 percent), because sustained high usage means the system cannot absorb sudden surges of activity. Also make sure that, in the normal state, the failure of one component will not set off a domino effect that drives the remaining components into failure. For example, if you have three web servers, keep the average CPU across all servers under 60 percent so that if one fails, the other two still have room to absorb the additional load.
Network Interface | Monitor the rate at which data is sent and received via the network interface card. This should remain below 50 percent of network capacity.
Disks and Cache | There are several logical disk options that you should monitor regularly. The available disk space is important in any capacity study, but you should also review the time that the disk is idle. Depending on the types of applications or services running on your servers, you may also review disk read and write times. Extended queuing of read or write operations affects performance. The cache has a major effect on read and write operations, so monitor for an increase in cache failures.
Memory and Paging File | Monitor how much physical memory is available for allocation. Insufficient memory leads to excessive use of the page file and an increase in the number of page faults per second.
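
The three-server example in the Processor row comes down to simple arithmetic. The sketch below is only an illustration: it assumes load is spread evenly across servers and scales linearly, which real farms only approximate, and the function names are made up for this example.

```python
def load_after_failure(avg_cpu_percent, servers, failed=1):
    """Estimate average CPU on the surviving servers after a failure.

    Assumes the total load stays the same and is redistributed evenly.
    """
    survivors = servers - failed
    if survivors <= 0:
        raise ValueError("at least one server must survive")
    return avg_cpu_percent * servers / survivors


def max_safe_average(servers, ceiling_percent=80.0, failed=1):
    """Highest farm-wide average that keeps survivors at or under the ceiling."""
    survivors = servers - failed
    if survivors <= 0:
        raise ValueError("at least one server must survive")
    return ceiling_percent * survivors / servers


# Three web servers averaging 60 percent CPU, one of them fails.
print(load_after_failure(60, 3))         # 90.0
print(round(max_safe_average(3), 1))     # 53.3
```

Under these assumptions, a 60 percent average leaves the two surviving servers at about 90 percent: short of saturation, but above the 80 percent level the table treats as consistently high. Keeping the survivors at 80 percent would require an average of roughly 53 percent before the failure.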

 

System Performance Counters

Counter | Description
% Processor Time | Shows processor usage over time. If this is consistently too high, performance may be adversely affected. On multiprocessor systems, remember to monitor the “_Total” instance. You can also measure utilization on each processor to check that load is balanced between cores.
Avg. Disk Queue Length | Shows the average number of read and write requests that were queued for the selected disk during the sample interval. A longer disk queue may not be a problem as long as disk reads and writes are not suffering and the system is working in a steady state without a growing queue.
Avg. Disk Read Queue Length | The average number of read requests that are queued.
Avg. Disk Write Queue Length | The average number of write requests that are queued.
Pages/sec | Shows the rate at which pages are read from or written to disk to resolve hard page faults. If this increases, it indicates system-wide performance problems.
Available MBytes | Shows how much physical memory is available for allocation. Insufficient memory leads to excessive use of the page file and an increase in the number of page faults per second.
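
The distinction between a long but steady disk queue and a queue that keeps growing can be checked on exported counter samples. This is a minimal sketch, assuming the Avg. Disk Queue Length values were already collected at a fixed interval; the function names and the tolerance of 0.1 are arbitrary choices for illustration, not recommended thresholds.

```python
def queue_trend(samples):
    """Least-squares slope of queue length per sample interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def is_queue_growing(samples, tolerance=0.1):
    """True when the queue grows faster than `tolerance` per interval."""
    return queue_trend(samples) > tolerance


steady = [5, 4, 5, 4, 5, 4, 5]    # busy disk, but the queue is not expanding
growing = [1, 2, 3, 4, 5, 6, 7]   # the queue gets longer every interval

print(is_queue_growing(steady))   # False
print(is_queue_growing(growing))  # True
```

A slope near zero matches the steady state described above, even when the absolute queue length is high; a persistently positive slope is the expanding queue to investigate.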

Optimizing Search Service Application

  • By default, all Search components are installed and run on one server
  • Use the following steps to manage and optimize the Search topology

– Start the Search service instance on the additional server

– Obtain the current (active) search topology from the existing server

– Clone the current search topology on the existing server

– Modify the search components within the cloned topology as needed, including adding, moving, and removing Search components

– Activate the new Search topology

  • Steps to manage Search components

http://technet.microsoft.com/en-us/library/jj862354.aspx

 

Using PowerShell to Find Errors by Correlation ID

  • Open the SharePoint Management Shell as Administrator
  • Run Merge-SPLogFile -Path "C:\Error.log" -Correlation "ID here"

– The cmdlet collects the ULS logs from all the servers in the SharePoint farm

– It populates the C:\Error.log file with every entry that carries the correlation ID

  • Open the C:\Error.log file with ULS Viewer to see the error associated with the correlation ID
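
If ULS Viewer is not at hand, the merged file can also be filtered with a short script. The sketch below assumes the log is tab-delimited with a header row that names a Correlation column; the sample rows and the correlation IDs are invented for illustration, and the function name is hypothetical.

```python
import csv


def rows_for_correlation(lines, correlation_id):
    """Return log rows (as dicts) whose Correlation column matches the ID."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Column names are padded with spaces in the file, so strip them.
    header = [name.strip() for name in next(reader)]
    index = header.index("Correlation")
    wanted = correlation_id.strip().lower()
    matches = []
    for row in reader:
        if len(row) > index and row[index].strip().lower() == wanted:
            matches.append(dict(zip(header, (cell.strip() for cell in row))))
    return matches


# Made-up sample in the assumed layout; real entries have more detail.
SAMPLE = (
    "Timestamp\tProcess\tTID\tArea\tCategory\tEventID\tLevel\tMessage\tCorrelation\n"
    "01/01/2015 10:00:00.00\tw3wp.exe\t0x1A2B\tSharePoint Foundation\tGeneral\tabcd\t"
    "Unexpected\tSample error message\t11111111-2222-3333-4444-555555555555\n"
    "01/01/2015 10:00:01.00\tw3wp.exe\t0x1A2B\tSharePoint Foundation\tGeneral\tefgh\t"
    "Medium\tUnrelated entry\t99999999-8888-7777-6666-555555555555\n"
)

for entry in rows_for_correlation(SAMPLE.splitlines(),
                                  "11111111-2222-3333-4444-555555555555"):
    print(entry["Timestamp"], entry["Level"], entry["Message"])
```

To run it against a real merged log, pass the lines of the opened file instead of the sample string, and adjust the column name if the header differs.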

 

Analyzing IIS Logs using Log Parser

  • Download and install Log Parser or Log Parser Studio

https://gallery.technet.microsoft.com/Log-Parser-Studio-cd458765

  • Within Log Parser Studio, locate the log file of interest in the log directory

– The default location is %systemdrive%\inetpub\logs\LogFiles

  • Execute a prebuilt query to view the log file, or create a new query

– A Log Parser query can be exported to a PowerShell script
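
To make clear what such a query does with an IIS log, here is a rough stand-alone sketch. It is not Log Parser; it reads the W3C extended format directly, assuming a "#Fields:" directive names the columns and that the log includes the sc-status and time-taken fields. The sample lines and function names are invented for illustration.

```python
from collections import Counter


def parse_w3c(lines):
    """Yield one dict per request, keyed by the names in the #Fields line."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software or #Date
        values = line.split()
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


def status_counts(records):
    """Count requests per HTTP status code."""
    return Counter(record["sc-status"] for record in records)


def slowest(records, count=1):
    """Return the `count` requests with the largest time-taken value."""
    return sorted(records, key=lambda r: int(r["time-taken"]), reverse=True)[:count]


# Made-up sample with a reduced field list.
SAMPLE = """#Software: Microsoft Internet Information Services
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2015-01-01 10:00:00 GET /sites/team/default.aspx 200 120
2015-01-01 10:00:01 GET /_layouts/15/start.aspx 401 15
2015-01-01 10:00:02 GET /sites/team/default.aspx 200 340
"""

records = list(parse_w3c(SAMPLE.splitlines()))
print(status_counts(records))
print(slowest(records)[0]["cs-uri-stem"])
```

For anything beyond a quick check, the prebuilt Log Parser Studio queries cover the same ground with far less effort.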

 

Troubleshooting Claims Authentication

1. Obtain details of the failed authentication attempt

– Search the ULS log, or use ULS Viewer to view the log file

– Default log location: %CommonProgramFiles%\Microsoft Shared\Web Server Extensions\15\LOGS

2. Verify the authentication configuration on the web application or zone

3. Check other possible issues

– Confirm that the client computer is a member of the same domain as the SharePoint server, or of a trusted domain

– Type nltest /dsgetdc: /force on the client computer and on the SharePoint server to verify that each can connect to a domain controller (DC)

4. Use a web debugging tool (such as Fiddler or HttpWatch) to monitor and analyze web traffic

5. Capture and analyze authentication network traffic

http://technet.microsoft.com/library/jj906556.aspx

 

One-Stop Shopping of Best Practices

 

System performance counters

http://technet.microsoft.com/en-us/library/ff758658.aspx