Skip to main content
Version: 1.0.0

Usage

S3SPL allows for dynamic definition of time ranges and token replacements in the query and prefixes. This is done for both s3spladhoc and s3splquery.

S3 SQL Format

S3SPL allows you to query using S3Select without having to know the exact SQL Syntax. By default, the user only has to provide the WHERE statement of the SQL query. The SELECT, FROM and LIMIT statements are added automatically. For more information on the S3 SQL syntax, please refer to the AWS Documentation.

Limits

Multiple configuration options are available to limit the result returned by S3SPL. The options are configured on a bucket level.

  • Max Results: the maximum number of results returned by S3Select. The limit is soft applied over all files queried meaning that the result can exceed the limit if the limit is reached in the middle of a file.
  • Max Results per File: the maximum number of results returned by S3Select per file. The limit is hard applied leveraging the LIMIT statement of S3 Select, meaning that the result will never exceed the limit.
  • Max Files Read: the maximal number of files discovered by S3SPL. The limit is hard applied, meaning that the result will never exceed the limit. For time based prefixes, the discovery starts from the most recent prefix and goes back in time until the limit is reached.

Time Range

To query data in S3 similarly to in Splunk, the time range picker from Splunk searches can be used to add time constraints to the searched prefixes and SQL query.

Prefixes

Prefixes are generated based on the selected timestamp to increase the speed of queries and reduce the requests to the S3 API. To enable dynamic prefix parsing time-based tokens have to be present in the bucket configuration. A possible prefix with time-based tokens is:

/archive/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${_time:%M}/

A prefix can contain multiple time-based tokens. The tokens are replaced with the time range selected in the time picker - adapted to the time zone configured for the bucket. The allowed time-based tokens are:

  • Y, y Year with century as a decimal number.
  • m, B, b, e Month as a decimal number [01,12].
  • d, j Day of the month as a decimal number [01,31].
  • H, I Hour (24-hour clock) as a decimal number [00,23].
  • M Minute as a decimal number [00,59].
  • S Second as a decimal number [00,61].

Timestamp Field

To enable dynamic time range parsing the query has to be run with timestamp_used set to True and timestamp_field provided. The timestamp is added to the WHERE statement of the SQL query. S3SPL supports timestamps in integer / float (epoch) and string format. The timestamp is converted to the format required by S3Select automatically. To enable string based timestamps, the timestamp_format has to be provided and set to string.

Supported timestamp formats are:

DescriptionFormatExample
YearYYYY1997T
Year and monthYYYY-MM1997-07T
Complete dateYYYY-MM-DD1997-07-16
Complete date plus hours and minutesYYYY-MM-DDThh:mmTZD1997-07-16T19:20+01:00
Complete date plus hours, minutes and secondsYYYY-MM-DDThh:mm:ssTZD1997-07-16T19:20:30+01:00
Complete date plus hours, minutes, seconds and a decimal fraction of a secondYYYY-MM-DDThh:mm:ss.sTZD1997-07-16T19:20:30.45+01:00

Based on W3C Date and Time Formats

Replacement

Replacements can be used to utilize a single query for different cases. As an example. It is possible to replace a filter for a username, sourcetype or host dynamically. S3SPL supports replacements in the query and prefixes. Values that are replaced in a query have to be marked as such in surrounding the value with $. The replacements are provided in a key-value format.

Replacements are provided as a string of key-value pairs. The key is the token that is replaced in the query and the value is the replacement value.

key1=value1,key2=value2

Prefixes

While configuring the prefix of a bucket, you can include a replacement in the prefix. The replacement is marked by surrounding the token with ${ and }. Wildcards are not supported in the prefix and only one value per replacement is supported.

/archive/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${_time:%M}/${index}/${sourcetype}/

It is possible to use multiple replacements in a prefix and for a replacement to be inline with other text.

/archive/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${_time:%M}/index=${index}/sourcetype=${sourcetype}/

Query

For the query, the replacements are provided in a key-value format. The key is the token that is replaced in the query and the value is the replacement value. The replacement is marked by surrounding the token with $.

index=$index$ AND sourcetype=$sourcetype$

Extraction

S3SPL allows to extract fields from the prefix of a file read. This is mostly helpful for meta information as host, sourcetype or index. The extraction is done by including an extraction pattern in the prefix of the bucket configuration. The extraction pattern is marked by surrounding the token with ?{ and }. Extraction patters are only allows after replacements.

/archive/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${_time:%M}/?{index}/?{sourcetype}/
/archive/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${_time:%M}/${index}/?{sourcetype}/

Not valid extraction pattern due to wrong placement of the extraction pattern.

/archive/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${_time:%M}/?{index}/${sourcetype}/