Version 9.1
File Analytics Data Collector Policy
File Analytics data collection can be configured in three different ways, depending on the location of the files you want to profile in your enterprise. Determine which of the following approaches is relevant for your environment:
File Analytics Data Collector Policy - This Data Collector traverses CIFS shares on the network to collect and categorize filesystem storage consumption metadata. This highly optimized traversal profiles unstructured data so that you can identify storage that can be reclaimed. Use this policy for any CIFS share serving files that you want to profile, regardless of the appliance manufacturer.
NetApp Data Collector - File Analytics Activation - Use this option if you have NetApp storage systems for which you want to collect and profile files. This is the preferred method for CIFS collection on NetApp appliances.
Host Inventory - File Analytics Probe - This data collection option directly interrogates hosts’ attached storage to profile files.
File Analytics Prerequisites
NetApp Systems
NetApp ONTAP version 7.3.3 or later.
NetApp ONTAP I2P (inode to pathname) must be configured.
NetApp User with API Privileges (see Creating a NetApp User with File Analytics API Privileges)
When collecting from large NetApp Systems (greater than 1 PB):
When collecting CIFS shares that are located at a site that is not local to the Data Collector server, a separate Data Collector is recommended for the remote site. This mitigates access issues associated with WAN traversal.
Creating a NetApp User with File Analytics API Privileges
To create a new user with the required privileges on a NetApp system, use the following command-line interface (CLI) steps. In the role command, do not include a space after any comma in the capability list.
filer> useradmin role add apifarole -a login-http-admin,api-*
filer> useradmin group add apifagroup -r apifarole
filer> useradmin user add apifauser -g apifagroup
If api-* does not meet your security requirements, the role can instead be limited to the specific File Analytics privileges, using the following steps:
filer> useradmin role add apifarole -a api-volume-list-info,api-nfs-exportfs-list-rules,api-cifs-share-list-iter-start,api-cifs-share-list-iter-next,api-cifs-share-list-iter-end,api-snapdiff-iter-start,api-snapdiff-iter-next,api-snapdiff-iter-end,login-http-admin
filer> useradmin group add apifagroup -r apifarole
filer> useradmin user add apifauser -g apifagroup
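Because a stray space after a comma in the capability list is an easy mistake to make, the role command can be generated rather than typed. The following is a minimal Python sketch, not part of the product: the helper name role_add_command and the FA_CAPABILITIES list are hypothetical, and only the useradmin syntax and capability names are taken from the steps above.

```python
# Hypothetical helper: only the 'useradmin role add' syntax and the capability
# names come from the documented steps; everything else is illustrative.
FA_CAPABILITIES = [
    "api-volume-list-info",
    "api-nfs-exportfs-list-rules",
    "api-cifs-share-list-iter-start",
    "api-cifs-share-list-iter-next",
    "api-cifs-share-list-iter-end",
    "api-snapdiff-iter-start",
    "api-snapdiff-iter-next",
    "api-snapdiff-iter-end",
    "login-http-admin",
]


def role_add_command(role, capabilities):
    """Return a 'useradmin role add' line with no spaces after the commas."""
    cleaned = [cap.strip() for cap in capabilities if cap.strip()]
    if not cleaned:
        raise ValueError("at least one capability is required")
    for cap in cleaned:
        if " " in cap or "," in cap:
            raise ValueError(f"invalid capability name: {cap!r}")
    return f"useradmin role add {role} -a {','.join(cleaned)}"


if __name__ == "__main__":
    # Broad role, as in the first set of steps.
    print(role_add_command("apifarole", ["login-http-admin", "api-*"]))
    # Restricted role, as in the second set of steps.
    print(role_add_command("apifarole", FA_CAPABILITIES))
```

The printed lines can be pasted at the filer> prompt; the group and user commands are unchanged.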
CIFS Shares
The Data Collector must be running a Windows operating system, Windows Server 2003 or later. This Data Collector can collect both Linux and Windows shares.
The Windows LAN Manager authentication level, in the local security policy security options, must be set to: Send LM & NTLM - use NTLMv2 session security if negotiated. This allows the Data Collector to invoke the net use command with the password supplied on the command line. Without this setting, the net use command fails with system error 86 (invalid password) on later versions of Windows.
Windows CIFS Shares collection requires a Windows Domain User ID. This User ID must have administrative privileges.
Linux CIFS Shares collection requires super-user root privileges. Access control commands, such as sudo, sesudo, and pbrun are also supported. If using any of the access control commands, verify that the User ID has sudo, sesudo, or pbrun privileges.
Collection of owner data for Windows and CIFS is configurable via Advanced Parameters. Data collection completes faster when owner data is not collected. You can configure an Advanced Parameter (FA_RESOLVE_OWNERS=N) to disable owner collection. To access Advanced Parameters in the Portal, select Admin > Data Collector Configuration > Advanced Parameters.
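To illustrate what "the password supplied on the command line" means for net use, the following Python sketch builds such an argument list. It follows the general net use syntax (UNC path, password, /user:domain\user); the exact invocation issued by the Data Collector is not documented here, so treat this as an assumption. The function name and all sample values are hypothetical.

```python
def net_use_args(server, share, domain, user, password):
    """Build a 'net use' argument list with the password on the command line.

    Assumption: this mirrors the general 'net use' syntax only; it is not
    necessarily the exact command that the Data Collector runs.
    """
    unc_path = "\\\\" + server + "\\" + share      # \\server\share
    user_arg = "/user:" + domain + "\\" + user     # /user:DOMAIN\user
    return ["net", "use", unc_path, password, user_arg]


if __name__ == "__main__":
    # Hypothetical server, share, and account names.
    args = net_use_args("fileserver01", "projects", "CORP", "fa_collector", "example-password")
    print(" ".join(args))
```

Running the resulting command manually on the Data Collector server is one way to check whether the authentication-level setting is in effect before scheduling collection.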
File Analytics Shares Data Collector Policy
One of the three types of data collection that can be configured for File Analytics is collection of CIFS shares. The Data Collector will take the configuration that you specify, including the share names and credentials, and then traverse the filesystem structure to identify these shared resources on your network and collect the relevant metadata.
1. Click Add and select File Analytics Share.
2. Enter or select the parameters. Mandatory parameters are denoted by an asterisk (*):
The domain identifies the top level of your host group hierarchy. The name was supplied during the installation process. All newly discovered hosts are added to the root host group associated with this domain. Typically, only one Domain will be available in the drop-down list.
If you are a Managed Services Provider, each of your customers will have a unique domain with its own host group hierarchy.
To find your Domain name select Admin > Hosts and Domains > Domains.
Enter a name that will be displayed in the list of Data Collector policies.
Click the clock icon to create a schedule. Every Minute, Hourly, Daily, Weekly, and Monthly schedules may be created. Relative schedules are measured from the time the Data Collector is restarted. Native CRON strings can also be entered for advanced scheduling.
Examples of CRON expressions:
*/30 * * * * means every 30 minutes
*/20 9-18 * * * means every 20 minutes during hours 9 through 18 (9:00 am through 6:40 pm)
*/10 * * * 1-5 means every 10 minutes, Monday through Friday
Click Add to configure the CIFS shares that the collector will probe.
Note that the Import button in this window enables bulk loading of CIFS shares. See Importing the CIFS Share Configuration.
3. Enter or select CIFS shares configuration parameters in the File Analytics Shares window.
Enter the host IP address or host name for the device that is being probed for CIFS shares. This also could be a non-host device, such as a NetApp array.
Enter the name of the CIFS share that the Data Collector will probe.
CIFS is currently the only option.
Click either Anonymous or Use Credentials.
If you are using credentials, click Add to configure the CIFS share credentials, or select an existing credential definition and click Edit.
4. Enter credentials in the Credentials window.
Assign a name to identify the set of credentials that you are defining.
Enter the login account name used to log in to the hosts. If the policy includes a group of Windows hosts, use the Windows Domain user ID. This user ID must have administrative privileges.
For Linux hosts, super-user root privileges are required. You also could use an access control command, such as sudo, sesudo, or pbrun. If using any of these access commands, ensure that the user ID has sudo, sesudo, or pbrun privileges. Some enterprises prefer to create a new user and provide access to commands via an access control command.
Enter a note to help identify the credential.
Enter the password for the account.
OS Type*
Select Windows, Linux, or NAS.
Windows Domain*
For Windows hosts only: Specify the Windows domain name. If the host is not a member of a domain, or to specify a local user account, use a period (.).
Private Key File
For Linux hosts only: If you have configured a public/private key pair between your Data Collector server and the hosts you intend to monitor, specify the location of the Private Key file on the Data Collector server.
Known Hosts File
For Linux hosts only: If you have configured a public/private key pair between your Data Collector server and the hosts you intend to monitor, specify the location of the Known Hosts file on the Data Collector server.
5. Click OK to close and save the configuration in each window.
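The CRON expressions listed for the schedule parameter can be checked before they are entered. The following Python sketch is a simplified matcher for five-field expressions (minute, hour, day of month, month, day of week) and is not the scheduler used by the Data Collector. It supports only *, ranges, lists, and /step, requires all five fields to match, and treats 0 as Sunday; these are assumptions of the sketch. Note that an hour range such as 9-18 includes the whole 18:00 hour.

```python
from datetime import datetime


def _field_matches(field, value, low):
    """Return True if one cron field (for example '*/20', '9-18', '*') matches value."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, None           # open-ended range from the field minimum
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if value < start or (end is not None and value > end):
            continue
        if (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, when):
    """Check a five-field cron expression against a datetime (simplified)."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron numbering: 0 = Sunday, 1 = Monday
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(day, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(weekday, cron_weekday, 0)
    )


if __name__ == "__main__":
    # 1 January 2024 was a Monday; 6 January 2024 was a Saturday.
    print(cron_matches("*/30 * * * *", datetime(2024, 1, 1, 10, 30)))
    print(cron_matches("*/20 9-18 * * *", datetime(2024, 1, 1, 18, 40)))
    print(cron_matches("*/10 * * * 1-5", datetime(2024, 1, 6, 10, 10)))
```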
Importing the CIFS Share Configuration
The import feature facilitates entry of a large number of CIFS shares. Simply paste the details in comma-separated format into the window and click OK.
Data Format:
server, share, protocol (CIFS), credential name
The Credential Names, already configured for the current Domain, are displayed at the top of the window.
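For a large number of shares, the import text can be generated from an inventory rather than typed. The following Python sketch produces one line per share in the order described above (server, share, protocol, credential name). The function name, the sample servers and shares, and the credential name are hypothetical. Joining fields with a bare comma and rejecting values that contain commas are assumptions, since the window's handling of spaces and quoting is not described here.

```python
def build_import_lines(shares, known_credentials):
    """Return import lines in the order: server, share, protocol, credential name.

    shares: iterable of (server, share, credential_name) tuples.
    known_credentials: credential names already configured for the Domain.
    """
    lines = []
    for server, share, credential in shares:
        fields = [server.strip(), share.strip(), "CIFS", credential.strip()]
        if any("," in field or not field for field in fields):
            raise ValueError(f"empty value or embedded comma in: {fields!r}")
        if fields[3] not in known_credentials:
            raise ValueError(f"unknown credential name: {fields[3]!r}")
        lines.append(",".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical inventory and credential name, for illustration only.
    inventory = [
        ("filer01", "finance", "fa-windows-admin"),
        ("192.0.2.15", "engineering", "fa-windows-admin"),
    ]
    print("\n".join(build_import_lines(inventory, {"fa-windows-admin"})))
```

Checking each credential name against the names shown at the top of the import window catches typos before the shares are loaded.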
NetApp Data Collector - File Analytics Activation
Verify prerequisites (particularly, the NetApp User API role configuration) in File Analytics Prerequisites.
Host Inventory - File Analytics Probe
Using Capacity Manager Host Resources data collection, hosts are discovered and added to the Host Inventory. Once a host is listed in the inventory, it can be selected and the File Analytics probe can be configured.