
Introduction

Welcome to NEST - the Neural EEG Sequence Transducer framework for decoding brain signals into natural language.

What is NEST?

NEST (Neural EEG Sequence Transducer) is an open-source deep learning framework for decoding electroencephalography (EEG) brain signals recorded during natural reading into natural language text. Built on PyTorch, NEST provides researchers and developers with tools to train, evaluate, and deploy EEG-to-text models.

Key Features

Quick Start

Install NEST and run your first decode in under 10 lines:

# Install
pip install nest-eeg

# Decode EEG epochs (numpy array: [n_words, n_channels, n_times])
from nest import NESTDecoder
import numpy as np

decoder = NESTDecoder.from_pretrained("nest-base")

# eeg_epochs shape: (n_words, 105, 500) — 105 channels, 1s at 500Hz
text = decoder.decode(eeg_epochs)
print(text)
# "The researchers published their findings in a neuroscience journal."

Installation

NEST can be installed via pip or from source:

# Install via pip
pip install nest-eeg

# Or install from source
git clone https://github.com/wazder/NEST.git
cd NEST
pip install -e .

Requirements

Configuration

NEST uses YAML configuration files for model and training settings:

# configs/model.yaml
model:
  type: nest
  encoder:
    hidden_size: 512
    num_layers: 6
    num_heads: 8
  decoder:
    hidden_size: 512
    num_layers: 6
    vocab_size: 50000

training:
  batch_size: 32
  learning_rate: 1e-4
  epochs: 100

Models API

The core model classes:

from nest.models import NESTModel, NESTEncoder, NESTDecoder

# Create model from config
model = NESTModel.from_config("configs/model.yaml")

# Or load pre-trained
model = NESTModel.from_pretrained("nest-base")

Training

Train your own model:

from nest import Trainer, DataModule

# Setup data
data = DataModule("path/to/zuco")

# Create trainer
trainer = Trainer(
    model=model,
    data=data,
    config="configs/training.yaml"
)

# Start training
trainer.fit()

Data Processing

NEST provides a full preprocessing pipeline compatible with the ZuCo dataset format. Raw EEG recordings are filtered, epoched, and normalized before being fed to the model.

Loading ZuCo Data

from nest.data import ZuCoDataset

# Load ZuCo dataset from local path
dataset = ZuCoDataset(
    root="data/ZuCo/",
    tasks=["task1-SR", "task2-NR"],  # SR=sentiment reading, NR=normal reading
    subjects=None,                    # None = all 12 subjects
    split="train"
)

print(len(dataset))   # 12071 training samples
print(dataset[0])     # {'eeg': Tensor(105, 500), 'text': 'the'}

Preprocessing Pipeline

from nest.data import EEGPreprocessor

preprocessor = EEGPreprocessor(
    sfreq=500,
    lowpass=100.0,   # bandpass high cutoff (Hz)
    highpass=0.5,    # bandpass low cutoff (Hz)
    notch=50.0,      # power line noise (50Hz EU / 60Hz US)
    epoch_tmin=-0.2, # seconds before word onset
    epoch_tmax=0.8,  # seconds after word onset
    normalize="z-score"  # per-subject, per-channel normalization
)

# Fit on training subjects, transform all splits
preprocessor.fit(train_dataset)
train_epochs = preprocessor.transform(train_dataset)
test_epochs  = preprocessor.transform(test_dataset)
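
For intuition, the normalize="z-score" step can be sketched in plain NumPy. This is an illustration of per-channel z-scoring with statistics fitted on the training split only, not NEST's actual implementation:

```python
import numpy as np

def fit_zscore(train_epochs):
    """Per-channel mean/std over all training epochs and time points.

    train_epochs: array of shape (n_words, n_channels, n_times)
    """
    mean = train_epochs.mean(axis=(0, 2), keepdims=True)  # (1, n_channels, 1)
    std = train_epochs.std(axis=(0, 2), keepdims=True)
    return mean, std

def apply_zscore(epochs, mean, std, eps=1e-8):
    """Normalize with statistics fitted on the training split only."""
    return (epochs - mean) / (std + eps)

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(10, 4, 20))
mean, std = fit_zscore(train)
normed = apply_zscore(train, mean, std)
print(float(normed.mean()), float(normed.std()))  # ≈ 0 and ≈ 1
```

Fitting on the training subjects and reusing the statistics at test time (as in the snippet above) avoids leaking test-set statistics into the model.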

DataLoader

from torch.utils.data import DataLoader
from nest.data import collate_eeg

loader = DataLoader(
    train_epochs,
    batch_size=32,
    shuffle=True,
    collate_fn=collate_eeg,
    num_workers=4
)

for batch in loader:
    eeg   = batch["eeg"]    # (B, 105, 500)
    input_ids = batch["input_ids"]   # tokenized text targets
    attention_mask = batch["attention_mask"]
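
For illustration, a collate function in the spirit of collate_eeg stacks the fixed-size EEG epochs and right-pads the variable-length token sequences. The sketch below is a plausible NumPy version, not the library's actual code:

```python
import numpy as np

def collate_sketch(samples, pad_id=0):
    """Stack EEG epochs and right-pad token id sequences to the batch max length.

    samples: list of dicts with 'eeg' (n_channels, n_times) and 'input_ids' (list[int])
    """
    eeg = np.stack([s["eeg"] for s in samples])  # (B, n_channels, n_times)
    max_len = max(len(s["input_ids"]) for s in samples)
    input_ids = np.full((len(samples), max_len), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(samples), max_len), dtype=np.int64)
    for i, s in enumerate(samples):
        n = len(s["input_ids"])
        input_ids[i, :n] = s["input_ids"]
        attention_mask[i, :n] = 1  # 1 = real token, 0 = padding
    return {"eeg": eeg, "input_ids": input_ids, "attention_mask": attention_mask}

batch = collate_sketch([
    {"eeg": np.zeros((105, 500), dtype=np.float32), "input_ids": [4, 7, 2]},
    {"eeg": np.zeros((105, 500), dtype=np.float32), "input_ids": [9, 2]},
])
print(batch["eeg"].shape, batch["input_ids"].shape)  # (2, 105, 500) (2, 3)
```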

Evaluation

NEST reports four primary metrics, evaluated on test subjects held out during training: WER, BLEU-4, character-level accuracy, and ROUGE-L.

from nest import NESTDecoder
from nest.metrics import compute_wer, compute_bleu, compute_rouge

decoder = NESTDecoder.from_pretrained("nest-base")

# Decode full test set
predictions  = decoder.decode_batch(test_epochs, batch_size=64)
ground_truth = [sample["text"] for sample in test_dataset]

# Compute metrics
wer   = compute_wer(predictions, ground_truth)
bleu  = compute_bleu(predictions, ground_truth)
rouge = compute_rouge(predictions, ground_truth)

print(f"WER:     {wer:.1%}")   # 26.1%
print(f"BLEU-4:  {bleu:.2f}")  # 0.74
print(f"ROUGE-L: {rouge:.2f}") # 0.81
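
Character-level accuracy is listed as a primary metric but not computed above. A minimal sketch, assuming the common definition of one minus the character error rate (NEST's exact definition may differ):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def char_accuracy(predictions, references):
    """1 - character error rate, aggregated over the whole corpus."""
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total = sum(len(r) for r in references)
    return 1.0 - errors / total

print(char_accuracy(["hello world"], ["hello world"]))  # 1.0
```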

To run the full evaluation suite on ZuCo Task 1:

# From the project root
python scripts/evaluate.py \
    --checkpoint checkpoints/nest-base.pt \
    --data      data/ZuCo/ \
    --task      task1-SR \
    --split     test \
    --beam-size 5

Training Guide

This guide walks through training NEST from scratch on the ZuCo dataset. Estimated training time is 5.4 hours on a single NVIDIA RTX 3090.

1. Prepare the Dataset

# Download ZuCo (requires OSF account — free)
# https://osf.io/q3zws/  — download and extract to data/ZuCo/

python scripts/prepare_zuco.py \
    --raw  data/ZuCo/raw/ \
    --out  data/ZuCo/processed/ \
    --subjects all

2. Configure Training

# configs/train_base.yaml
model:
  type: nest
  encoder:
    hidden_size: 512
    num_layers: 6
    num_heads: 8
    dropout: 0.1
  decoder:
    hidden_size: 512
    num_layers: 6
    vocab_size: 50000
    dropout: 0.1

training:
  batch_size: 32
  learning_rate: 1.0e-4
  lr_scheduler: cosine
  warmup_steps: 2000
  epochs: 100
  gradient_clip: 1.0
  seed: 42

data:
  root: data/ZuCo/processed/
  tasks: [task1-SR, task2-NR]
  test_subjects: [subject10, subject11, subject12]

3. Launch Training

# Single GPU
python scripts/train.py --config configs/train_base.yaml

# Multi-GPU (4x)
torchrun --nproc_per_node=4 scripts/train.py \
    --config configs/train_base.yaml \
    --distributed

4. Monitor with TensorBoard

tensorboard --logdir runs/nest-base/
# Open http://localhost:6006 to see loss curves and sample predictions

Fine-tuning

Start from the pre-trained nest-base checkpoint and fine-tune on your own EEG dataset. Your data must follow the custom dataset format.

from nest import NESTDecoder
from nest.data import CustomEEGDataset
from nest.training import Trainer

# Load pre-trained model
decoder = NESTDecoder.from_pretrained("nest-base")

# Load your dataset
train_data = CustomEEGDataset("data/my_dataset/train/")
val_data   = CustomEEGDataset("data/my_dataset/val/")

# Fine-tune: encoder frozen by default, full decoder updated
trainer = Trainer(
    model=decoder.model,
    train_dataset=train_data,
    val_dataset=val_data,
    config={
        "learning_rate": 5e-5,   # lower LR for fine-tuning
        "epochs": 20,
        "freeze_encoder": True,  # freeze EEG encoder
        "batch_size": 16,
    }
)

trainer.fit()
trainer.save("checkpoints/my-model.pt")

Freeze Strategies

# Freeze encoder only (default) — fastest, good for similar EEG hardware
decoder.model.freeze_encoder()

# Freeze nothing — full fine-tune, needs more data
decoder.model.unfreeze_all()

# Freeze all except cross-attention — good for domain adaptation
decoder.model.freeze_all()
decoder.model.unfreeze_cross_attention()
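
These helpers presumably toggle requires_grad on parameter groups. A generic PyTorch sketch of the same idea, using a toy model rather than NEST's classes:

```python
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool, name_filter=None):
    """Freeze/unfreeze parameters, optionally only those whose name contains a substring."""
    for name, param in module.named_parameters():
        if name_filter is None or name_filter in name:
            param.requires_grad = trainable

# Toy encoder/decoder container standing in for a real model
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "decoder": nn.Linear(8, 8),
})
set_trainable(model, False)                        # freeze everything
set_trainable(model, True, name_filter="decoder")  # unfreeze decoder params only
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['decoder.weight', 'decoder.bias']
```

Frozen parameters still participate in the forward pass; they simply receive no gradient updates, which is why encoder-frozen fine-tuning is both faster and less data-hungry.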

Deployment

NEST can be served as a REST API using FastAPI. For production, export to TorchScript for faster inference without the Python overhead.

FastAPI Server

# server.py
from fastapi import FastAPI
from pydantic import BaseModel
from nest import NESTDecoder
import numpy as np

app     = FastAPI()
decoder = NESTDecoder.from_pretrained("nest-base")

class EEGRequest(BaseModel):
    epochs: list[list[list[float]]]  # [n_words, n_channels, n_times]

@app.post("/decode")
def decode(req: EEGRequest):
    eeg  = np.array(req.epochs, dtype=np.float32)
    text = decoder.decode(eeg)
    return {"text": text}

Run the server:

uvicorn server:app --host 0.0.0.0 --port 8000 --workers 2

TorchScript Export

import torch
from nest import NESTDecoder

decoder = NESTDecoder.from_pretrained("nest-base")
scripted = torch.jit.script(decoder.model)
scripted.save("nest-base-scripted.pt")

# Load in production (no Python NEST library needed)
model = torch.jit.load("nest-base-scripted.pt")

Batch Inference

from nest import NESTDecoder

decoder = NESTDecoder.from_pretrained("nest-base")

# Process multiple recordings efficiently
results = decoder.decode_batch(
    eeg_list,       # list of (n_words, 105, 500) arrays
    batch_size=64,
    device="cuda"
)

for text in results:
    print(text)

Basic Usage

Common usage patterns for inference and analysis.

Load and Decode

from nest import NESTDecoder
import numpy as np

# Load model (downloads weights on first run, ~180MB)
decoder = NESTDecoder.from_pretrained("nest-base")

# Simulate EEG data (replace with real recordings)
eeg_epochs = np.random.randn(15, 105, 500).astype(np.float32)

# Decode — returns a string
text = decoder.decode(eeg_epochs)
print(text)

Token-Level Output

# Get per-word tokens with confidence scores
output = decoder.decode_with_scores(eeg_epochs)

for word, score in zip(output.words, output.scores):
    print(f"{word:15s}  confidence={score:.2%}")

Attention Maps

# Extract cross-attention maps for interpretability
output = decoder.decode_with_attention(eeg_epochs)

# output.attention: shape (n_decoder_layers, n_heads, n_tokens, n_eeg_frames)
import matplotlib.pyplot as plt
plt.imshow(output.attention[-1].mean(0), aspect="auto", cmap="magma")
plt.xlabel("EEG time frame")
plt.ylabel("Decoded token")
plt.colorbar()
plt.show()

Custom Dataset

To train or fine-tune on your own EEG data, structure it in the NEST format and implement the CustomEEGDataset class.

Required File Structure

data/my_dataset/
  train/
    subject01_task1.npz
    subject02_task1.npz
    ...
  val/
    subject10_task1.npz
  test/
    subject11_task1.npz

NPZ File Format

import numpy as np

# Each .npz file contains arrays for one subject/session
np.savez(
    "data/my_dataset/train/subject01_task1.npz",
    eeg=epochs_array,    # shape: (n_words, n_channels, n_times)
    words=word_list,     # list of strings, length = n_words
    sfreq=np.array(500), # sampling frequency
    ch_names=ch_list     # list of channel names, length = n_channels
)
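
Before training, it can be useful to sanity-check a saved file against this format. A small validation sketch using only NumPy (the helper check_npz and the file path are illustrative, not part of the NEST API):

```python
import os
import tempfile
import numpy as np

def check_npz(path):
    """Validate an .npz file against the NEST custom-dataset format."""
    with np.load(path) as f:
        eeg, words, ch_names = f["eeg"], f["words"], f["ch_names"]
        assert eeg.ndim == 3, "eeg must be (n_words, n_channels, n_times)"
        assert len(words) == eeg.shape[0], "one word label per epoch"
        assert len(ch_names) == eeg.shape[1], "one name per channel"
        return eeg.shape, int(f["sfreq"])

# Write a tiny example file, then validate it
path = os.path.join(tempfile.gettempdir(), "subject01_task1.npz")
np.savez(
    path,
    eeg=np.zeros((3, 4, 10), dtype=np.float32),
    words=np.array(["the", "quick", "fox"]),
    sfreq=np.array(500),
    ch_names=np.array(["C1", "C2", "C3", "C4"]),
)
print(check_npz(path))  # ((3, 4, 10), 500)
```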

Dataset Class

from nest.data import CustomEEGDataset
from torch.utils.data import DataLoader

dataset = CustomEEGDataset(
    root="data/my_dataset/train/",
    preprocess=True,   # apply default filtering + normalization
    sfreq=500,
    n_channels=64      # your headset channel count (not necessarily 105)
)

loader = DataLoader(dataset, batch_size=16, shuffle=True)

If your headset has fewer than 105 channels, NEST will project the input to 512-dim with a learned linear layer rather than the standard convolutional feature extractor.
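
For intuition, such a projection is a single matrix multiply over the channel dimension. A NumPy sketch of the shape arithmetic, with random weights standing in for the learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)

n_words, n_channels, n_times = 15, 64, 500  # e.g. a 64-channel headset
eeg = rng.standard_normal((n_words, n_channels, n_times)).astype(np.float32)

# In NEST these weights would be learned; random here just to check shapes
W = rng.standard_normal((512, n_channels)).astype(np.float32)
b = np.zeros(512, dtype=np.float32)

# Project the channel dimension to 512 features at every time step
features = np.einsum("fc,wct->wft", W, eeg) + b[None, :, None]
print(features.shape)  # (15, 512, 500)
```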

Real-time Demo

Stream live EEG from any LSL-compatible device and decode in real time. Requires pylsl and a running EEG stream.

pip install nest-eeg pylsl

from nest.realtime import NESTStream

# Connect to an LSL EEG stream and start decoding
stream = NESTStream(
    model="nest-base",
    stream_name="BrainProducts RDA",  # LSL stream name
    n_channels=105,
    sfreq=500,
    epoch_duration=1.0,  # seconds per word epoch
    stride=0.5           # sliding window stride
)

@stream.on_decode
def on_text(word, confidence):
    print(f"[{confidence:.0%}] {word}", end=" ", flush=True)

stream.start()  # blocks until KeyboardInterrupt
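
The epoch_duration and stride parameters imply a sliding window over the continuous sample buffer. Roughly, in NumPy (a sketch of the windowing, not the library's internals):

```python
import numpy as np

def sliding_epochs(buffer, sfreq=500, epoch_duration=1.0, stride=0.5):
    """Cut overlapping windows from a continuous (n_channels, n_samples) buffer."""
    win = int(epoch_duration * sfreq)   # samples per epoch
    hop = int(stride * sfreq)           # samples between window starts
    n_samples = buffer.shape[1]
    starts = range(0, n_samples - win + 1, hop)
    return np.stack([buffer[:, s:s + win] for s in starts])  # (n_epochs, n_channels, win)

buffer = np.zeros((105, 1500))  # 3 s of data at 500 Hz
epochs = sliding_epochs(buffer)
print(epochs.shape)  # (5, 105, 500)
```

With a 0.5 s stride, consecutive windows overlap by half, so each second of signal is decoded twice; the stream presumably merges or deduplicates the overlapping predictions.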

Built-in LSL Demo

# Launch the graphical real-time demo (simulated EEG if no hardware)
python -m nest.realtime.demo --simulate

# With real hardware
python -m nest.realtime.demo --stream "BrainProducts RDA" --model nest-base

The demo window shows live EEG waveforms on the left and the decoded text stream on the right, updating word-by-word as signals are processed.