EraQL Language Reference

EraQL is a functional query language designed for data transformation and analysis. It combines SQL-like operations with functional programming concepts and a rich set of built-in functions.


Core Syntax

Data Sources

Table

Connect to existing database tables (must be followed by a Select):

Table(
  {
    name: 'users',
    schema: 'analytics'
  },
  {
    created_at: timestamp,
    user_id: string,
    username: string,
    email: string
  }
)
  |> Select({
    id: 'user_id',
    name: 'username',
    email: 'email',
    created: 'created_at'
  })

From

Alternative syntax for defining data sources:

From({
  name: 'orders',
  schema: 'sales',
  attributes: {
    customer_email: string,
    order_id: int,
    order_date: timestamp,
    status: string
  }
})

Pipeline Operator

Use the pipe operator |> to chain operations:

data |> Where(condition) |> Select(fields) |> Aggregate(grouping, calculations)

Data Types

Basic Types

  • string: Text data
  • int: Integer numbers
  • float: Floating-point numbers
  • boolean: True/false values
  • timestamp: Date and time values

Complex Types

  • ARRAY(type): Arrays of elements
  • Records: { field1: type1, field2: type2 }

Type Operations

null_of

Create null values of specific types:

null_of('int') // null integer
null_of('string') // null string
type_of(null_of('string')) // returns 'string'

type_of

Get the type of any value:

type_of(1) // 'RANGE(1, 1)!'
type_of('hello') // "ENUM('hello')!"
type_of(true) // 'boolean!'

tag

Tag values with specific types:

tag(1, 'float') // Tags integer as float type
type_of(tag(1, 'float')) // 'RANGE(1, 1)! @float'

Query Operations

Selection and Projection

Select

Transform and project columns:

// Basic selection
data |> Select({
  id: "order_id",
  name: "customer_name",
  created: "order_date"
})

// With transformations
data |> Select(|t| {
  ...t,  // Spread existing fields
  id: t"order_id" |> cast('string'),
  full_name: concat(t"first_name", ' ', t"last_name")
})

// With distinct option
data |> Select({ id: "user_id", name: "username" }, { distinct: true })

Derive

Add computed columns without removing existing ones:

data
  |> Derive({
    id: 'user_id',
    metadata: {
      title: 'post_title',
      content: 'post_content',
      id: 'post_id'
    }
  })

Where

Filter rows based on conditions:

data |> Where('item_count' == 1)
data |> Where('item_count' > 1 && ('is_premium' || 'has_discount'))
data |> Where(one_of(lower('category'), ['electronics', 'books', 'clothing']))

Aggregation

Aggregate

Group data and perform calculations:

data
  |> Aggregate(
    ['customer_id'], // Group by columns
    {
      customer_id: 'customer_id',
      unique_orders: count_distinct('order_id'),
      total_spent: sum('amount')
    }
  )

Summary

Perform calculations across all rows:

data
  |> Summary({
    total_count: count(1),
    items: `[{{string_agg(cast("item_data", string), ', ')}}]`
  })

Joins

LeftJoin

Perform left outer joins:

orders |> LeftJoin(customers, |ord, cust| {
  on: ord"customer_id" == cust"id",
  select: {
    ...ord,
    customer_name: cust"name",
    total_value: ord"quantity" |> mul(cust"unit_price")
  }
})

Set Operations

UnionAll

Combine multiple datasets:

UnionAll(active_users, inactive_users)

Built-in Functions

Mathematical Functions

Basic Arithmetic

add(1, 2) // 3
sub(5, 2) // 3
mul(2, 3) // 6
div(6, 2) // 3
to_the_power_of(2, 3) // 8

Mathematical Operations

abs(-5.5) // 5.5
floor(1.9) // 1
ceil(1.1) // 2
round(1.56, 1) // 1.6

Logarithmic Functions

ln(10) // 2.302585092994046
log_2(8) // 3
log_10(100) // 2

Advanced Math

cosine_distance([1, 0], [0, 1]) // 1
random() // Random number 0-1
nan() // NaN value
is_nan(div(0, 0)) // true

String Functions

Case Conversion

lower('HELLO') // 'hello'
upper('hello') // 'HELLO'

String Manipulation

concat('Hello', ' ', 'World') // 'Hello World'
length('hello') // 5
replace('hello world', 'world', 'there') // 'hello there'
substring('hello world', 1, 5) // 'hello'

String Analysis

starts_with('hello world', 'hello') // true
ends_with('hello world', 'world') // true
like('hello', 'h%') // true (SQL LIKE pattern)
is_numeric_string('123') // true

String Splitting

split_part('hello-world', '-', 1) // 'hello'
split_part('a,b,c', ',', 2) // 'b'

String Prefix/Suffix

remove_prefix('prefix-value', 'prefix-') // 'value'
remove_suffix('value-suffix', '-suffix') // 'value'

String Formatting

Template Strings

;`hello {{1+1}}` // 'hello 2'
format`hello {{1+1}}` // 'hello 2'
format_nullable`hello {{null_of(int)}}` // null

Comparison Functions

Equality

eq(1, 1) // true
neq(1, 2) // true
5 == 5 // true
5 != 3 // true

Ordering

gt(5, 3) // true
gte(5, 5) // true
lt(3, 5) // true
lte(5, 5) // true
5 > 3 // true
5 >= 5 // true
3 < 5 // true
5 <= 5 // true

Logical Functions

Boolean Operations

and(true, 2 > 1) // true
or(false, true) // true
not(true) // false

Null Handling

Null Checks

is_null(null_of(int)) // true
is_not_null(0) // true

Null Operations

coalesce(null_of(int), 3) // 3 (first non-null)
null_if(1, 1) // null (null if equal)

Array and Set Functions

Set Operations

one_of(1, [1, 2, 3]) // true
one_of('a', ['b', 'c']) // false

Array Creation

empty_array_of(int) // []

Date and Time Functions

Date Arithmetic

date_add(@2023-04-01, 1, 'days')      // Add 1 day
date_diff(@2023-04-01, @2023-04-15, 'days')  // 14 days difference

Date Manipulation

date_trunc(@2023-04-15, 'month')      // Truncate to month start
date_part(@2023-04-15, 'month')       // Extract month (4)
now()                                 // Current timestamp

Aggregation Functions

Basic Aggregations

count(1) // Count rows
count_distinct('field') // Count unique values
sum('amount') // Sum values
max('value') // Maximum value
min('value') // Minimum value
string_agg('field', ',') // Concatenate with delimiter

Window Functions

Ranking and Numbering

row_number_over({
  partition_by: ['group_id', 'category'],
  order_by: ['created_date']
})

first_value_over('message_text', {
  partition_by: ['conversation_id'],
  order_by: ['timestamp']
})

count_over(true, {
  partition_by: ['user_id']
})

Utility Functions

Data Generation

gen_random_uuid() // Generate UUID
impure(1 + 1) // Disable optimization

Type Conversion

cast('123', 'int') // Convert to integer
cast(123, 'string') // Convert to string

Record Operations

get_from_record({ a: 1, b: 2 }, 'a') // 1

Category Functions

category_concat('a', 'b') // 'a~>b'
category_at('a~>b~>c', 0) // 'a'

Conditional Logic

Conditional Expressions

if(condition, {then: value1, else: value2})

// Example in aggregation
sum(if("is_active", {then: "points", else: 0}))

Advanced Features

Lambda Expressions

Use lambda syntax for complex transformations:

data |> Select(|t| {
  ...t,
  computed_field: some_function(t"field1", t"field2")
})

Field References

Reference fields using quoted syntax:

t"field_name"        // Reference field in lambda
"field_name"         // Reference field in operations

Spread Operator

Use ... to include all existing fields:

Select(|t| {
  ...t,              // Include all existing fields
  new_field: "value" // Add new field
})

Type Casting

Convert between types explicitly:

'user_id' |> cast('string')
'amount' |> cast('float')

Best Practices

Query Organization

  • Use meaningful variable names with let bindings
  • Break complex queries into smaller, reusable parts
  • Comment complex logic for maintainability

Performance Optimization

  • Place Where clauses early in the pipeline
  • Use appropriate aggregation functions
  • Consider using distinct: true when needed

Type Safety

  • Explicitly cast types when necessary
  • Use null_of() for type-safe null values
  • Validate types with type_of() during development

Data Integrity

  • Handle null values explicitly with coalesce() and null checks
  • Use is_numeric_string() before string-to-number conversions
  • Validate data ranges and constraints in Where clauses

This reference covers the core EraQL language features. The language continues to evolve with additional functions and capabilities being added regularly.