Version: 1.0.0

Workflow

Dataflow

Ingestion

  1. An application or the OS generates log data on an endpoint
  2. A Splunk instance (either a universal forwarder or a heavy forwarder) reads the data and sends it to Cribl Stream
  3. Cribl Stream receives the data and routes the original data to an archive destination
  4. Cribl Stream further routes, reduces, and transforms the data and sends it to an arbitrary number of destinations
warning

Double-check that Cribl Stream sends all data to the data lake!
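
The ingestion steps above can be sketched as a fan-out in which the unmodified event is archived before any reduction takes place. This is only an illustration in plain Python, not Cribl Stream configuration: `route_event`, `drop_debug_fields`, and the in-memory lists standing in for the archive and the destinations are all hypothetical names.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


def route_event(
    event: Event,
    archive: List[Event],
    destinations: List[List[Event]],
    transform: Callable[[Event], Event],
) -> None:
    """Archive the original event, then fan out a transformed copy."""
    # Step 3: the original, unmodified event goes to the archive first.
    archive.append(dict(event))
    # Step 4: a reduced/transformed copy goes to every other destination.
    reduced = transform(dict(event))
    for destination in destinations:
        destination.append(reduced)


def drop_debug_fields(event: Event) -> Event:
    """Hypothetical reduction: remove fields whose name starts with 'debug_'."""
    return {k: v for k, v in event.items() if not k.startswith("debug_")}


archive: List[Event] = []
dest_a: List[Event] = []
dest_b: List[Event] = []
events = [
    {"host": "web01", "msg": "login ok", "debug_trace": "abc"},
    {"host": "web02", "msg": "login failed"},
]
for event in events:
    route_event(event, archive, [dest_a, dest_b], drop_debug_fields)

# The check the warning asks for: every event reached the archive unchanged.
assert archive == events
```

Archiving before transforming is the point of the sketch: whatever the reduction drops is still available in the archive for a later discovery and replay.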

Discovery

  1. A Splunk user creates a discovery job using the Discovery Manager dashboard available in Discovery > Discovery Manager
  2. The dashboard leverages the criblsearch command and writes a new entry into restream_discovery_jobs
  3. The Splunk internal scheduler executes the cribldiscovery modular input
  4. The modular input reads the job information from restream_discovery_jobs
  5. The modular input starts a discovery job for each entry at the configured Cribl Stream instance
  6. After completing all jobs, the modular input fetches all results
  7. For each result, the job checks whether the file has already been replayed
  8. The modular input stores the results into cribl_discovery_results
  9. The modular input moves the enriched job information from restream_discovery_jobs to cribl_discovery_inventory
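
Steps 4 to 9 above can be sketched roughly as follows. This is a minimal illustration, not the code of the cribldiscovery modular input: the function `run_discovery`, the record fields (`job_id`, `file`, `replayed`, `file_count`), and the two callables standing in for the Cribl Stream instance are assumptions made for the example, and plain Python lists stand in for restream_discovery_jobs, cribl_discovery_results, and cribl_discovery_inventory.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]


def run_discovery(
    pending_jobs: List[Record],
    start_job: Callable[[Record], str],
    fetch_results: Callable[[str], List[str]],
    replayed_files: Set[str],
) -> Tuple[List[Record], List[Record]]:
    """Process every pending discovery job once (steps 4 to 9)."""
    # Step 5: start one discovery job per entry and remember its id.
    started = [(job, start_job(job)) for job in pending_jobs]

    results: List[Record] = []    # stand-in for cribl_discovery_results
    inventory: List[Record] = []  # stand-in for cribl_discovery_inventory
    for job, job_id in started:
        # Step 6: fetch the files that the finished job discovered.
        files = fetch_results(job_id)
        for path in files:
            # Steps 7 and 8: flag already replayed files, then store the result.
            results.append(
                {"job_id": job_id, "file": path, "replayed": path in replayed_files}
            )
        # Step 9: enrich the job entry before it moves to the inventory.
        inventory.append({**job, "job_id": job_id, "file_count": len(files)})

    # Step 9 (continued): processed entries leave the pending collection.
    pending_jobs.clear()
    return results, inventory


# Stand-ins for the Cribl Stream instance (purely illustrative data).
fake_files = {"disc-1": ["2024/01/a.gz", "2024/01/b.gz"]}
pending = [{"name": "january", "index": "main"}]
results, inventory = run_discovery(
    pending,
    start_job=lambda job: "disc-1",
    fetch_results=lambda job_id: fake_files[job_id],
    replayed_files={"2024/01/a.gz"},
)
```

In this run, `results` holds one record per discovered file with its `replayed` flag, `inventory` holds the enriched job entry, and `pending` is empty afterwards.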

Replay

  1. A Splunk user creates a replay job using the Discovery Results dashboard available in Discovery > Discovery Results
  2. The dashboard leverages the criblqueue command and writes all files for replay into restream_replay_jobs
  3. The Splunk internal scheduler executes the restreamplay modular input
  4. The modular input reads the job information from restream_replay_jobs
  5. If cribl_use_optimized is set to true, the modular input reduces the jobs by grouping them by queue_job, index, sourcetype, and host
  6. The modular input starts a replay job for each optimized or unoptimized job at the configured Cribl Stream instance
  7. The modular input waits for all jobs to finish
  8. The modular input splits optimized jobs into their original state and assigns the shared cribl_replay_job id to all of them
  9. The modular input moves done jobs from restream_replay_jobs to restream_replay_results
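
The grouping in step 5 and the split in step 8 can be illustrated with a short sketch. It assumes each queued file is a dictionary that carries the four grouping fields named above. The functions `plan_replay` and `split_batches`, the `file` field, and the replay ids are hypothetical and are not taken from the modular input itself.

```python
from collections import defaultdict
from typing import Dict, List

Job = Dict[str, str]
GROUP_KEYS = ("queue_job", "index", "sourcetype", "host")


def plan_replay(jobs: List[Job], use_optimized: bool) -> List[List[Job]]:
    """Return the batches to replay: grouped if optimized, else one per job."""
    if not use_optimized:
        return [[job] for job in jobs]
    groups: Dict[tuple, List[Job]] = defaultdict(list)
    for job in jobs:
        # Jobs that agree on all four fields share one replay job.
        groups[tuple(job[key] for key in GROUP_KEYS)].append(job)
    return list(groups.values())


def split_batches(batches: List[List[Job]], replay_ids: List[str]) -> List[Job]:
    """Restore the original jobs and tag each with its batch's replay id."""
    done: List[Job] = []
    for batch, replay_id in zip(batches, replay_ids):
        for job in batch:
            done.append({**job, "cribl_replay_job": replay_id})
    return done


queued = [
    {"queue_job": "q1", "index": "main", "sourcetype": "syslog", "host": "web01", "file": "a.gz"},
    {"queue_job": "q1", "index": "main", "sourcetype": "syslog", "host": "web01", "file": "b.gz"},
    {"queue_job": "q1", "index": "main", "sourcetype": "syslog", "host": "web02", "file": "c.gz"},
]

batches = plan_replay(queued, use_optimized=True)
# Illustrative ids; in practice they would come from the started replay jobs.
replay_ids = [f"replay-{i}" for i in range(len(batches))]
done = split_batches(batches, replay_ids)
```

With these three entries, the two web01 files collapse into one batch and end up sharing a replay id, while the web02 file gets its own. With `use_optimized=False`, every file would be replayed as a separate job.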