
Nucleuz Classification Engine

The Nucleuz Classification Engine is a software development kit (SDK) that can be integrated into your product or service to add classification or DLP capabilities. The full set of Nucleuz's Classification/DLP policy templates and Classification Definitions can also be incorporated, giving you a head start on some of the most complex parts of classification/DLP.

Value

Enhance a product or service to gain valuable insight into data and to help comply with laws and regulations. The engine helps answer questions such as:

  • What sensitive information does my organization have?
  • Where is the sensitive information being stored?
  • Is sensitive information being exchanged in compliance with government/corporate requirements?

Matching Features

Pattern-based Classification

The Nucleuz Classification Engine can use Nucleuz-provided Policies and Classification Definitions as well as custom versions of both for content-based classification.

  • Built-in patterns/functions, Checksum validation
  • AND/OR/NOT
  • Min/Max counts
  • Nesting
  • Regular expressions
  • Terms/Keywords/Dictionaries
  • Case sensitivity
  • Proximity
  • Confidence levels
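
To make the features listed above more concrete, here is a minimal Python sketch of how a regular expression, checksum validation, keyword proximity, confidence levels, and a minimum count might combine into a single rule. It is illustrative only and does not use the Nucleuz API; the names `luhn_valid`, `find_card_numbers`, and `classify`, the keyword list, and the "high"/"low" confidence labels are all assumptions made for this example.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Supporting keywords that raise confidence when found near a match (hypothetical list).
KEYWORD_RE = re.compile(r"\b(?:card|visa|mastercard|credit)\b", re.IGNORECASE)


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str, proximity: int = 40):
    """Yield (number, confidence) for each checksum-valid candidate."""
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):
            continue  # checksum validation discards random digit runs
        # Proximity: look for a keyword within `proximity` characters of the match.
        window = text[max(0, m.start() - proximity): m.end() + proximity]
        confidence = "high" if KEYWORD_RE.search(window) else "low"
        yield digits, confidence


def classify(text: str, min_count: int = 1) -> bool:
    """Rule: at least `min_count` high-confidence matches."""
    high = [n for n, c in find_card_numbers(text) if c == "high"]
    return len(high) >= min_count
```

In this sketch, a checksum-valid number next to a keyword such as "card" counts as a high-confidence match, while the same number with no nearby keyword is reported with low confidence and does not trigger the rule.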

Metadata Matching

Metadata elements of any type and in any quantity can also be incorporated into the matching criteria.

  • Date/Time/Number-based elements
  • Word/String-based elements
  • Fully custom elements
  • Sender/Author/Owner/Recipient(s)/etc
  • Document Size
  • Date Created/Last Modified/etc
  • etc...
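
As a rough illustration of combining metadata with content criteria, the Python sketch below requires a content match plus three metadata conditions. It is not the Nucleuz API; the `Document` fields, the `example.com` domain, the 10 MB limit, and the cutoff date are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Document:
    # Hypothetical metadata fields, chosen for illustration only.
    author: str
    size_bytes: int
    last_modified: datetime
    text: str


def matches(doc: Document) -> bool:
    """Content criterion AND three metadata criteria."""
    content_hit = "confidential" in doc.text.lower()           # content keyword
    author_hit = doc.author.lower().endswith("@example.com")   # string-based element
    size_hit = doc.size_bytes <= 10 * 1024 * 1024               # number-based element
    recent = doc.last_modified >= datetime(2024, 1, 1)          # date-based element
    return content_hit and author_hit and size_hit and recent
```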

Exact Data Matching (EDM)

When pattern-based matching is not accurate enough, specific values can be matched with a feature called Exact Data Match (EDM). Matching discrete organizational data in this way can reduce false positives.

Common scenarios:

  • Detecting specific data sets from a database (e.g., customer records, employee records).
  • Detecting one or more rows in large quantities of data (10 million records or more).
  • Detecting combinations of fields.

The Nucleuz Classification Engine can perform EDM matching in combination with pattern-based matching for even more coverage.
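
The general idea behind exact data matching can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how the Nucleuz engine implements EDM: the function names are hypothetical, values are assumed to be single tokens (multi-word values would need n-gram tokenization), and SHA-256 fingerprints stand in for whatever index format a real engine uses.

```python
import hashlib
import re
from collections import defaultdict


def _fingerprint(value: str) -> str:
    """Normalize a value and hash it so the index holds no plaintext."""
    normalized = re.sub(r"\W+", "", value).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(rows):
    """Map each field fingerprint to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for value in row:
            index[_fingerprint(value)].add(row_id)
    return index


def edm_matches(text, index, min_fields=2):
    """Return ids of rows with at least `min_fields` distinct values in the text."""
    hits = defaultdict(set)
    for token in set(re.findall(r"[\w-]+", text)):
        fp = _fingerprint(token)
        for row_id in index.get(fp, ()):
            hits[row_id].add(fp)
    return sorted(r for r, fps in hits.items() if len(fps) >= min_fields)
```

Requiring several fields from the same row is what keeps false positives low here: a surname on its own does not match, and neither does a surname from one record next to an ID from another.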

Document Similarity

Document Similarity finds similar documents without requiring a specific classification rule to be developed. Instead, the document of interest is submitted as a template, and the Document Similarity feature extracts relevant aspects for subsequent comparison with other documents.
Document Similarity can detect content which may have been altered.

Common scenarios:

  • Detecting internal document templates like patent prep forms, financial forms, etc.
  • Detecting corporate forms sensitive to distribution like legal and corporate reporting.
  • Detecting a form with or without the fields populated.
  • Detecting a sensitive section of a document being included in other content.

The Nucleuz Classification Engine can perform Document Similarity matching in combination with pattern-based and EDM matching to cover even more scenarios.
For example: Document Similarity to detect an employee form, combined with one or more classification rules to detect employee information like Name, Social Security number (SSN), or Employee ID.
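
One common way to compare a template against other content is word shingling with a containment score, sketched below in Python. This is a generic technique shown for illustration; it is an assumption, not a statement of how the Nucleuz Document Similarity feature works internally, and the function names and the shingle size `k` are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in the text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(template: str, candidate: str, k: int = 3) -> float:
    """Fraction of the template's shingles that also appear in the candidate."""
    t = shingles(template, k)
    if not t:
        return 0.0
    return len(t & shingles(candidate, k)) / len(t)
```

Because the score is measured relative to the template, it stays high when a sensitive passage is embedded in a longer document or lightly edited, and drops toward zero for unrelated content. A threshold on this score would decide what counts as "similar"; choosing it is application-specific.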

Integration & Execution Platforms

Because it is lightweight and flexible, the Nucleuz Classification Engine can run in nearly any modern environment.

  • Windows | Linux | macOS
  • 32-bit & 64-bit
  • C, C++, .NET/C#, Java, Python APIs
  • Multi-Process & Multi-Thread safe
  • Multi-Tenant support

The standalone Nucleuz Classification Engine library can be used in nearly any environment.
Some examples from our customers (the licensee develops any of these as needed for its application):

  • Web service
  • Cloud computing platform (AWS, Google Cloud, Microsoft Azure, etc)
  • Server, Endpoint, Browser extension
  • Container (Kubernetes, Amazon ECS, Azure Container Instances, Google Cloud Run, etc)
  • Standalone Single- or Multi-threaded process
  • Shared Service

Hosting Model

The Nucleuz Classification Engine is a library that runs entirely inside the host process. It can run directly in the application or be integrated into a separate shared service, web service, etc.

The Engine is very lightweight. It's composed of only a handful of libraries, all of which have minimal external dependencies. In fact, there's nothing to "install" on the system: just copy the files and it's ready to go.
In addition, no web services or data transfers are required.
Admin/root privileges are not required either.

All security and access is governed by the host process (application).
This greatly reduces and simplifies the security profile.

Performance

Once initialized, classifying with the Nucleuz Classification Engine is a CPU-bound operation. Because it's Multi-Process & Multi-Thread safe, it scales nearly linearly both up and out, limited only by hardware.

The Nucleuz Classification Engine performs all classification locally; no data is sent over the network. This means there are no network-related latency, limitations, or costs.

While throughput depends largely on the policy or policies used and the content evaluated, the engine is capable of classifying over 200 MB/s per thread on modest hardware.
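
For rough capacity planning, the per-thread figure above can be turned into a back-of-the-envelope estimate. The Python helper below is a hypothetical sketch, not part of the SDK: it assumes 1 MB = 1,000,000 bytes, treats 200 MB/s as the per-thread rate, and uses a `scaling_efficiency` factor (an assumption) to model less-than-perfect linear scaling. Real throughput depends on policies, content, and hardware.

```python
def estimate_scan_seconds(total_bytes: int, threads: int,
                          per_thread_mb_s: float = 200.0,
                          scaling_efficiency: float = 1.0) -> float:
    """Estimate wall-clock seconds to classify `total_bytes` of content."""
    if threads < 1 or per_thread_mb_s <= 0 or not 0 < scaling_efficiency <= 1:
        raise ValueError("invalid parameters")
    # Aggregate rate across all threads, in MB/s (1 MB = 1,000,000 bytes).
    aggregate_mb_s = per_thread_mb_s * threads * scaling_efficiency
    return total_bytes / (aggregate_mb_s * 1_000_000)
```

Under these assumptions, 1 TB (10^12 bytes) on 8 threads works out to 10^12 / (200 × 8 × 10^6) = 625 seconds, or a little over 10 minutes.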

Contact us for more information.