Refine Categories

Update category details and event mappings to improve event categorization within an attribute.

Prerequisites

This guide assumes you've uploaded some data and created your first attribute.

What is Category Refinement?

Each attribute is broken down into categories. For example, a Topic attribute might break down into Software Engineering, Creative Writing, and Information Retrieval categories.

At attribute creation time, you have the option to create categories yourself, let Tidepool suggest categories for you automatically, or start with the default categories.

These categories group together related data, and allow you to add a new dimension of structure to text interactions between your users and your models.

As a part of finalizing an attribute and applying it to your entire dataset, the category refinement process enables you to both:

  • Discover categories in your dataset and track them over time,
  • Ensure events are categorized accurately and provide feedback on any mistakes.

Refining Categories in Tidepool

Open the category refinement workflow from the Discover tab, then select an Attribute in the Refining stage from the left-hand sidebar.

Understanding the Refinement Workflow Layout

Reference the annotated screenshot below for the five main features of the refinement workflow.

Category refinement workflow sections.

1. Event Embedding Plot

Each datapoint on the plot represents an event in your dataset. Events are colored by cluster and plotted by their embedding values. Semantically similar events should be near each other on the plot, allowing you to explore related events easily.
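To make "semantically similar events should be near each other" concrete, here is a minimal sketch that compares toy embedding vectors using cosine similarity. The three-dimensional vectors, the example event text, and the choice of cosine similarity are all assumptions for illustration; this guide does not specify which embedding model or distance measure Tidepool uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: real embeddings have many more dimensions.
python_question = [0.9, 0.1, 0.0]     # "How do I reverse a list in Python?"
javascript_question = [0.8, 0.2, 0.1]  # "Fix my JavaScript sort function"
poem_request = [0.1, 0.9, 0.2]         # "Write a poem about autumn"

code_vs_code = cosine_similarity(python_question, javascript_question)
code_vs_poem = cosine_similarity(python_question, poem_request)

# The two programming questions score higher than the programming/poem pair,
# which is why related events would land close together on a plot.
print(f"code vs code: {code_vs_code:.2f}")
print(f"code vs poem: {code_vs_poem:.2f}")
```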

Select events directly on the plot using shift + click to lasso a region. Once selected, you can inspect the events in a table by pressing V or assign the events to a category by pressing C.

The plot will update as you search or interact with other filters on the screen.

2. Cluster Filters

Tidepool automatically clusters together related events in your dataset and assigns descriptions to each cluster.

Clusters are a good starting point for understanding themes in your dataset - typically you'll turn one or more related clusters into a category.

Select individual clusters from the list, or click and drag along the clusters to bulk select.

Use the hierarchy slider to change the granularity of the clustering - slide right for more specific clusters, slide left for more generic clusters.
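One way to picture the hierarchy slider is as a cut through a tree of nested clusters: a shallow cut yields a few generic clusters, a deeper cut yields more specific ones. The sketch below is a conceptual model only. The `ClusterNode` type, the `clusters_at_level` function, and the cluster labels are hypothetical and are not part of Tidepool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterNode:
    label: str
    children: List["ClusterNode"] = field(default_factory=list)


def clusters_at_level(node, level):
    """Return the cluster labels visible when the tree is cut at `level`.

    Level 0 is the most generic view. A node with no children is returned
    as-is even if the requested level is deeper than the tree.
    """
    if level == 0 or not node.children:
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(clusters_at_level(child, level - 1))
    return labels


# Hypothetical cluster hierarchy for a Topic attribute.
tree = ClusterNode("All events", [
    ClusterNode("Programming", [
        ClusterNode("Python debugging"),
        ClusterNode("Web frontend"),
    ]),
    ClusterNode("Writing", [
        ClusterNode("Poetry"),
        ClusterNode("Short stories"),
    ]),
])

generic = clusters_at_level(tree, 1)   # slider toward the left
specific = clusters_at_level(tree, 2)  # slider toward the right
print(generic)
print(specific)
```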

3. Category and Metadata Filters

Use the category and metadata filters to filter the plot to only the events you care about.

Use the three dot menu next to a category name to edit existing categories.

4. Status Information

Each event is automatically assigned a status. By default, events are listed as either Auto Categorized or Uncategorized. When you review an event and manually assign it to a category, its status updates to User Categorized.

Use the status filters to update the plot to show events that you've categorized manually, events that you want to review, or events that need to be categorized for the first time.
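The status behavior described above can be sketched as a small data model. This is an illustration only: the `Status` enum, the `Event` class, and the helper functions are hypothetical names rather than a Tidepool API, and the sketch assumes that any manual assignment moves an event to User Categorized.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    UNCATEGORIZED = "Uncategorized"
    AUTO_CATEGORIZED = "Auto Categorized"
    USER_CATEGORIZED = "User Categorized"


@dataclass
class Event:
    text: str
    category: Optional[str] = None
    status: Status = Status.UNCATEGORIZED


def assign_manually(event, category):
    """Record a manual assignment and mark the event as User Categorized."""
    event.category = category
    event.status = Status.USER_CATEGORIZED
    return event


def filter_by_status(events, status):
    """Return only the events that currently have the given status."""
    return [e for e in events if e.status is status]


events = [
    Event("Fix my sort function", "Software Engineering", Status.AUTO_CATEGORIZED),
    Event("Write a poem about autumn"),
    Event("Who won the 1998 World Cup?"),
]

# Manually categorize the second event, then list what still needs a category.
assign_manually(events[1], "Creative Writing")
remaining = filter_by_status(events, Status.UNCATEGORIZED)
print([e.text for e in remaining])
```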

5. Category Refinement Controls

The refinement controls allow you to:

  • view events you've lassoed on the plot by pressing V. This will bring up a table showing the event detail, as well as the current category assignment and cluster metadata.
  • assign any events to a new category by pressing C. Choose an existing category, or create a new category by typing a category name into the entry box.
  • regenerate updated embeddings based on any changes you've made to the categories or event -> category assignment.
  • finalize the attribute, locking in the category definitions and running categorization across your entire dataset.

Running a Refinement Cycle

As an example, a step-by-step refinement cycle in Tidepool might look like:

  1. Select one or more clusters of interest from the right hand legend,
  2. Lasso to select the events in those clusters,
  3. Open the table view (V) to inspect the content and confirm the event text matches your expectations,
  4. Assign the events to a new or existing category (C) that you'd like to track,
  5. Repeat the process with other clusters of interest until you've created all the categories you're interested in, or examined enough of the data to be satisfied with the automatically generated results.
Creating a new Software Engineering category as a part of refining the Topic attribute.

Regenerating vs Finalizing an Attribute

In the bottom right-hand corner of the category refinement screen, you'll see the option to either Regenerate or Finalize the attribute.

You should think about regenerating the attribute when you:

  • have assigned more than a few dozen events to categories manually,
  • have created or modified the list of categories significantly.

Tidepool will take those changes into account, and update the automatic assignment of events to categories - saving you significant manual work and providing a higher quality model for the attribute.

You should think about finalizing the attribute when you:

  • have regenerated the attribute recently to account for your latest changes,
  • have defined the complete set of categories you're interested in for the attribute,
  • have reviewed enough of the event -> category assignment examples that you're comfortable with the quality of the results.

Finalizing the attribute will lock in the model for the attribute, and automatically run categorization for your entire dataset (including any new events you ingest in the future).
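The two checklists above can be read as a simple decision rule. The sketch below encodes that rule as a plain Python function. It is not a Tidepool feature: the function name, its parameters, and the threshold of 36 (a stand-in for "more than a few dozen") are assumptions chosen for illustration.

```python
# Assumption: a numeric stand-in for "more than a few dozen" manual assignments.
MANUAL_ASSIGNMENT_THRESHOLD = 36


def suggest_next_step(manual_since_regen, categories_changed_since_regen,
                      categories_complete, reviewed_enough):
    """Suggest 'regenerate', 'finalize', or 'keep refining'.

    manual_since_regen: events assigned by hand since the last regeneration.
    categories_changed_since_regen: True if the category list changed
        significantly since the last regeneration.
    categories_complete: True if every category of interest is defined.
    reviewed_enough: True if you are comfortable with the assignment quality.
    """
    # Pending changes should be folded into the model before anything else.
    if (manual_since_regen > MANUAL_ASSIGNMENT_THRESHOLD
            or categories_changed_since_regen):
        return "regenerate"
    # With no significant pending changes, finalize once the work is done.
    if categories_complete and reviewed_enough:
        return "finalize"
    return "keep refining"


print(suggest_next_step(50, False, True, True))
print(suggest_next_step(0, False, True, True))
print(suggest_next_step(5, False, False, True))
```

Checking for pending changes first mirrors the guidance that you should have regenerated recently before finalizing.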


What’s Next

Now that you have a finalized attribute, read more about analyzing your events or sessions over time by category.