Introduction
Welcome to NEST - the Neural EEG Sequence Transducer framework for decoding brain signals into natural language.
What is NEST?
NEST (Neural EEG Sequence Transducer) is an open-source deep learning framework for decoding electroencephalography (EEG) brain signals into natural language text. Built on PyTorch, NEST provides researchers and developers with tools to train, evaluate, and deploy EEG-to-text models.
Key Features
- State-of-the-art transformer architecture optimized for EEG signals
- Pre-trained models on the ZuCo dataset with 73.9% accuracy
- Real-time inference capability with GPU acceleration
- Extensive documentation and example notebooks
Quick Start
Install NEST and run your first decode in under 10 lines:
# Install
pip install nest-eeg
# Decode EEG epochs (numpy array: [n_words, n_channels, n_times])
from nest import NESTDecoder
import numpy as np
decoder = NESTDecoder.from_pretrained("nest-base")
# eeg_epochs shape: (n_words, 105, 500) — 105 channels, 1s at 500Hz
text = decoder.decode(eeg_epochs)
print(text)
# "The researchers published their findings in a neuroscience journal."
Installation
NEST can be installed via pip or from source:
# Install via pip
pip install nest-eeg
# Or install from source
git clone https://github.com/wazder/NEST.git
cd NEST
pip install -e .
Requirements
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU acceleration)
- 8GB+ GPU memory recommended
Configuration
NEST uses YAML configuration files for model and training settings:
# configs/model.yaml
model:
  type: nest
  encoder:
    hidden_size: 512
    num_layers: 6
    num_heads: 8
  decoder:
    hidden_size: 512
    num_layers: 6
    vocab_size: 50000
training:
  batch_size: 32
  learning_rate: 1.0e-4  # note: YAML needs the decimal point to parse this as a float
  epochs: 100
Models API
The core model classes:
from nest.models import NESTModel, NESTEncoder, NESTDecoder
# Create model from config
model = NESTModel.from_config("configs/model.yaml")
# Or load pre-trained
model = NESTModel.from_pretrained("nest-base")
Training
Train your own model:
from nest import Trainer, DataModule
# Setup data
data = DataModule("path/to/zuco")
# Create trainer
trainer = Trainer(
    model=model,
    data=data,
    config="configs/training.yaml"
)
# Start training
trainer.fit()
Data Processing
NEST provides a full preprocessing pipeline compatible with the ZuCo dataset format. Raw EEG recordings are filtered, epoched, and normalized before being fed to the model.
Loading ZuCo Data
from nest.data import ZuCoDataset
# Load ZuCo dataset from local path
dataset = ZuCoDataset(
    root="data/ZuCo/",
    tasks=["task1-SR", "task2-NR"],  # SR = sentiment reading, NR = normal reading
    subjects=None,  # None = all 12 subjects
    split="train"
)
print(len(dataset)) # 12071 training samples
print(dataset[0]) # {'eeg': Tensor(105, 500), 'text': 'the'}
Preprocessing Pipeline
from nest.data import EEGPreprocessor
preprocessor = EEGPreprocessor(
    sfreq=500,
    lowpass=100.0,    # bandpass high cutoff (Hz)
    highpass=0.5,     # bandpass low cutoff (Hz)
    notch=50.0,       # power line noise (50 Hz EU / 60 Hz US)
    epoch_tmin=-0.2,  # seconds before word onset
    epoch_tmax=0.8,   # seconds after word onset
    normalize="z-score"  # per-subject, per-channel normalization
)
# Fit on training subjects, transform all splits
preprocessor.fit(train_dataset)
train_epochs = preprocessor.transform(train_dataset)
test_epochs = preprocessor.transform(test_dataset)
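The fit/transform split matters: normalization statistics must come from the training subjects only and then be applied unchanged to the test split. A minimal numpy sketch of the channel-wise z-score step (the per-subject grouping inside EEGPreprocessor is an implementation detail not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(100, 105, 500))  # (n_words, n_channels, n_times)
test = rng.normal(5.0, 2.0, size=(20, 105, 500))

# "Fit": per-channel mean/std over all training epochs and time points
mean = train.mean(axis=(0, 2), keepdims=True)  # shape (1, 105, 1)
std = train.std(axis=(0, 2), keepdims=True)

# "Transform": apply the *training* statistics to every split
train_norm = (train - mean) / std
test_norm = (test - mean) / std

print(train_norm.mean(), train_norm.std())  # ~0 and ~1 by construction
```

Using test-set statistics here would leak information across the split and inflate evaluation scores.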
DataLoader
from torch.utils.data import DataLoader
from nest.data import collate_eeg
loader = DataLoader(
    train_epochs,
    batch_size=32,
    shuffle=True,
    collate_fn=collate_eeg,
    num_workers=4
)
for batch in loader:
    eeg = batch["eeg"]  # (B, 105, 500)
    input_ids = batch["input_ids"]  # tokenized text targets
    attention_mask = batch["attention_mask"]
Evaluation
NEST reports four primary metrics evaluated on held-out test subjects unseen during training: WER, BLEU-4, character-level accuracy, and ROUGE-L.
from nest import NESTDecoder
from nest.metrics import compute_wer, compute_bleu, compute_rouge
decoder = NESTDecoder.from_pretrained("nest-base")
# Decode full test set
predictions = decoder.decode_batch(test_epochs, batch_size=64)
ground_truth = [sample["text"] for sample in test_dataset]
# Compute metrics
wer = compute_wer(predictions, ground_truth)
bleu = compute_bleu(predictions, ground_truth)
rouge = compute_rouge(predictions, ground_truth)
print(f"WER: {wer:.1%}") # 26.1%
print(f"BLEU-4: {bleu:.2f}") # 0.74
print(f"ROUGE-L: {rouge:.2f}")  # 0.81
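For reference, WER is the word-level Levenshtein distance between prediction and ground truth, divided by the reference length. A self-contained sketch (independent of nest.metrics, which may tokenize differently):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words
```

Note that WER can exceed 100% when the hypothesis contains many insertions.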
To run the full evaluation suite on ZuCo Task 1:
# From the project root
python scripts/evaluate.py \
--checkpoint checkpoints/nest-base.pt \
--data data/ZuCo/ \
--task task1-SR \
--split test \
--beam-size 5
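The --beam-size flag sets the beam width used during decoding. To illustrate what that parameter does, here is a toy beam search over prefix-independent per-step scores (the real decoder conditions each step on the prefix; the vocabulary and probabilities below are made up):

```python
import math

# Toy per-step distributions: one dict of {token: log-prob} per output position
logprobs = [
    {"the": math.log(0.6), "a": math.log(0.4)},
    {"cat": math.log(0.55), "dog": math.log(0.45)},
    {"sat": math.log(0.9), "ran": math.log(0.1)},
]

def beam_search(steps, beam_size=5):
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for dist in steps:
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in dist.items()
        ]
        # Keep only the `beam_size` highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0]

seq, score = beam_search(logprobs, beam_size=5)
print(" ".join(seq))  # "the cat sat"
```

beam_size=1 reduces to greedy decoding; larger beams trade inference time for a better chance of finding the highest-probability sequence.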
Training Guide
This guide walks through training NEST from scratch on the ZuCo dataset. Estimated training time is 5.4 hours on a single NVIDIA RTX 3090.
1. Prepare the Dataset
# Download ZuCo (requires OSF account — free)
# https://osf.io/q3zws/ — download and extract to data/ZuCo/
python scripts/prepare_zuco.py \
--raw data/ZuCo/raw/ \
--out data/ZuCo/processed/ \
--subjects all
2. Configure Training
# configs/train_base.yaml
model:
  type: nest
  encoder:
    hidden_size: 512
    num_layers: 6
    num_heads: 8
    dropout: 0.1
  decoder:
    hidden_size: 512
    num_layers: 6
    vocab_size: 50000
    dropout: 0.1
training:
  batch_size: 32
  learning_rate: 1.0e-4
  lr_scheduler: cosine
  warmup_steps: 2000
  epochs: 100
  gradient_clip: 1.0
  seed: 42
data:
  root: data/ZuCo/processed/
  tasks: [task1-SR, task2-NR]
  test_subjects: [subject10, subject11, subject12]
3. Launch Training
# Single GPU
python scripts/train.py --config configs/train_base.yaml
# Multi-GPU (4x)
torchrun --nproc_per_node=4 scripts/train.py \
--config configs/train_base.yaml \
--distributed
4. Monitor with TensorBoard
tensorboard --logdir runs/nest-base/
# Open http://localhost:6006 to see loss curves and sample predictions
Fine-tuning
Start from the pre-trained nest-base checkpoint and fine-tune on your own EEG dataset. Your data must follow the custom dataset format.
from nest import NESTDecoder
from nest.data import CustomEEGDataset
from nest.training import Trainer
# Load pre-trained model
decoder = NESTDecoder.from_pretrained("nest-base")
# Load your dataset
train_data = CustomEEGDataset("data/my_dataset/train/")
val_data = CustomEEGDataset("data/my_dataset/val/")
# Fine-tune — only decoder cross-attention updated by default
trainer = Trainer(
    model=decoder.model,
    train_dataset=train_data,
    val_dataset=val_data,
    config={
        "learning_rate": 5e-5,   # lower LR for fine-tuning
        "epochs": 20,
        "freeze_encoder": True,  # freeze EEG encoder
        "batch_size": 16,
    }
)
trainer.fit()
trainer.save("checkpoints/my-model.pt")
Freeze Strategies
# Freeze encoder only (default) — fastest, good for similar EEG hardware
decoder.model.freeze_encoder()
# Freeze nothing — full fine-tune, needs more data
decoder.model.unfreeze_all()
# Freeze all except cross-attention — good for domain adaptation
decoder.model.freeze_all()
decoder.model.unfreeze_cross_attention()
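Under the hood, all of these strategies come down to toggling requires_grad on parameter groups. A generic PyTorch sketch with a toy two-part model (tiny, arbitrary dimensions; NESTModel's actual module names may differ):

```python
from torch import nn

# Toy stand-in for an encoder/decoder model
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 16),   # 8*16 + 16 = 144 params
    "decoder": nn.Linear(16, 32),  # 16*32 + 32 = 544 params
})

def set_trainable(module: nn.Module, trainable: bool) -> None:
    # Freezing = excluding parameters from gradient computation
    for p in module.parameters():
        p.requires_grad = trainable

set_trainable(model["encoder"], False)  # the "freeze_encoder" strategy

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"{trainable}/{total} parameters trainable")  # 544/688
```

Frozen parameters still participate in the forward pass; they simply receive no gradient updates, which also reduces optimizer memory.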
Deployment
NEST can be served as a REST API using FastAPI. For production, export to TorchScript for faster inference without the Python overhead.
FastAPI Server
# server.py
from fastapi import FastAPI
from pydantic import BaseModel
from nest import NESTDecoder
import numpy as np
app = FastAPI()
decoder = NESTDecoder.from_pretrained("nest-base")
class EEGRequest(BaseModel):
    epochs: list[list[list[float]]]  # [n_words, n_channels, n_times]

@app.post("/decode")
def decode(req: EEGRequest):
    eeg = np.array(req.epochs, dtype=np.float32)
    text = decoder.decode(eeg)
    return {"text": text}
# Run
uvicorn server:app --host 0.0.0.0 --port 8000 --workers 2
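A client then POSTs word epochs as nested JSON lists. A stdlib-only sketch of building such a request (the urlopen lines are commented out because they require the server above to be running on localhost:8000):

```python
import json
import numpy as np
# from urllib.request import Request, urlopen  # uncomment with the server running

eeg_epochs = np.random.randn(3, 105, 500).astype(np.float32)
payload = json.dumps({"epochs": eeg_epochs.tolist()}).encode()

# req = Request(
#     "http://localhost:8000/decode",
#     data=payload,
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(urlopen(req))["text"])
print(len(json.loads(payload)["epochs"]))  # 3 word epochs in the request body
```

JSON-encoded floats are bulky; for high-throughput deployments a binary format would be a better fit, at the cost of a custom request schema.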
TorchScript Export
import torch
from nest import NESTDecoder
decoder = NESTDecoder.from_pretrained("nest-base")
scripted = torch.jit.script(decoder.model)
scripted.save("nest-base-scripted.pt")
# Load in production (no Python NEST library needed)
model = torch.jit.load("nest-base-scripted.pt")
Batch Inference
from nest import NESTDecoder
decoder = NESTDecoder.from_pretrained("nest-base")
# Process multiple recordings efficiently
results = decoder.decode_batch(
    eeg_list,  # list of (n_words, 105, 500) arrays
    batch_size=64,
    device="cuda"
)
for text in results:
    print(text)
Basic Usage
Common usage patterns for inference and analysis.
Load and Decode
from nest import NESTDecoder
import numpy as np
# Load model (downloads weights on first run, ~180MB)
decoder = NESTDecoder.from_pretrained("nest-base")
# Simulate EEG data (replace with real recordings)
eeg_epochs = np.random.randn(15, 105, 500).astype(np.float32)
# Decode — returns a string
text = decoder.decode(eeg_epochs)
print(text)
Token-Level Output
# Get per-word tokens with confidence scores
output = decoder.decode_with_scores(eeg_epochs)
for word, score in zip(output.words, output.scores):
    print(f"{word:15s} confidence={score:.2%}")
Attention Maps
# Extract cross-attention maps for interpretability
output = decoder.decode_with_attention(eeg_epochs)
# output.attention: shape (n_decoder_layers, n_heads, n_tokens, n_eeg_frames)
import matplotlib.pyplot as plt
plt.imshow(output.attention[-1].mean(0), aspect="auto", cmap="magma")
plt.xlabel("EEG time frame")
plt.ylabel("Decoded token")
plt.colorbar()
plt.show()
Custom Dataset
To train or fine-tune on your own EEG data, structure it in the NEST format and implement the CustomEEGDataset class.
Required File Structure
data/my_dataset/
  train/
    subject01_task1.npz
    subject02_task1.npz
    ...
  val/
    subject10_task1.npz
  test/
    subject11_task1.npz
NPZ File Format
import numpy as np
# Each .npz file contains arrays for one subject/session
np.savez(
    "data/my_dataset/train/subject01_task1.npz",
    eeg=epochs_array,     # shape: (n_words, n_channels, n_times)
    words=word_list,      # list of strings, length = n_words
    sfreq=np.array(500),  # sampling frequency
    ch_names=ch_list      # list of channel names, length = n_channels
)
Dataset Class
from nest.data import CustomEEGDataset
from torch.utils.data import DataLoader
dataset = CustomEEGDataset(
    root="data/my_dataset/train/",
    preprocess=True,  # apply default filtering + normalization
    sfreq=500,
    n_channels=64     # your headset channel count (not necessarily 105)
)
loader = DataLoader(dataset, batch_size=16, shuffle=True)
If your headset has fewer than 105 channels, NEST projects the input into the model's 512-dimensional space with a learned linear layer instead of the standard convolutional feature extractor.
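Conceptually, that projection is a single learned weight matrix applied to the channel axis at every time step. A numpy sketch with random (untrained) weights and a hypothetical 64-channel headset:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_channels, n_times = 15, 64, 500  # 64-channel headset
epochs = rng.standard_normal((n_words, n_channels, n_times)).astype(np.float32)

# Learned projection weights (random here; trained jointly with the model)
W = rng.standard_normal((512, n_channels)).astype(np.float32)
b = np.zeros((512,), dtype=np.float32)

# Linear map on the channel axis, shared across time:
# (n_words, C, T) -> (n_words, 512, T)
projected = np.einsum("dc,nct->ndt", W, epochs) + b[None, :, None]
print(projected.shape)  # (15, 512, 500)
```

Because the map is shared across time steps, it learns a channel remixing rather than any temporal filtering.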
Real-time Demo
Stream live EEG from any LSL-compatible device and decode in real time. Requires pylsl and a running EEG stream.
pip install nest-eeg pylsl
from nest.realtime import NESTStream
# Connect to an LSL EEG stream and start decoding
stream = NESTStream(
    model="nest-base",
    stream_name="BrainProducts RDA",  # LSL stream name
    n_channels=105,
    sfreq=500,
    epoch_duration=1.0,  # seconds per word epoch
    stride=0.5           # sliding window stride (seconds)
)
@stream.on_decode
def on_text(word, confidence):
    print(f"[{confidence:.0%}] {word}", end=" ", flush=True)

stream.start()  # blocks until KeyboardInterrupt
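The epoch_duration and stride parameters define a sliding window over the continuous stream; the windowing itself reduces to simple slicing, sketched here on a dummy buffer:

```python
import numpy as np

sfreq = 500
epoch_len = int(1.0 * sfreq)  # epoch_duration = 1.0 s -> 500 samples
hop = int(0.5 * sfreq)        # stride = 0.5 s -> 250 samples

# Continuous buffer of streamed EEG: (n_channels, n_samples); zeros stand in
# for 3 seconds of real signal
buffer = np.zeros((105, 3 * sfreq), dtype=np.float32)

starts = range(0, buffer.shape[1] - epoch_len + 1, hop)
windows = np.stack([buffer[:, s:s + epoch_len] for s in starts])
print(windows.shape)  # (5, 105, 500)
```

With stride < epoch_duration, consecutive windows overlap, so each word onset is covered by more than one decode attempt at the cost of extra inference work.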
Built-in LSL Demo
# Launch the graphical real-time demo (simulated EEG if no hardware)
python -m nest.realtime.demo --simulate
# With real hardware
python -m nest.realtime.demo --stream "BrainProducts RDA" --model nest-base
The demo window shows live EEG waveforms on the left and the decoded text stream on the right, updating word-by-word as signals are processed.