Documents & Media
Upload documents, images, videos, and audio files. Hebbrix automatically extracts content, generates embeddings, and makes everything searchable.
Supported Formats
- Documents - PDF, DOCX, TXT, MD, HTML, CSV
- Images - PNG, JPG, WebP, GIF, SVG
- Videos - MP4, WebM, MOV (transcription)
- Audio - MP3, WAV, M4A (transcription)
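Checking a file's extension before uploading can avoid a wasted round trip. The following is a minimal sketch that only restates the list above; the `SUPPORTED_FORMATS` table and the `media_category` helper are hypothetical names for illustration and are not part of the Hebbrix SDK. Extensions not listed above (for example `.jpeg`) are treated as unsupported here, which may be stricter than the service itself.

```python
from pathlib import Path

# Extensions copied from the Supported Formats list. The category names and
# this helper are illustrative assumptions, not part of the Hebbrix SDK.
SUPPORTED_FORMATS = {
    "document": {".pdf", ".docx", ".txt", ".md", ".html", ".csv"},
    "image": {".png", ".jpg", ".webp", ".gif", ".svg"},
    "video": {".mp4", ".webm", ".mov"},
    "audio": {".mp3", ".wav", ".m4a"},
}


def media_category(filename):
    """Return the media category for a filename, or None if it is not listed."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on the last suffix
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```

A caller might skip the upload, or warn the user, whenever `media_category(path)` returns `None`.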
Processing Pipeline
1. Upload - File uploaded and validated
2. Extract - Text extraction, OCR for images, transcription for audio/video
3. Chunk - Content split into semantic chunks with overlap
4. Embed - Generate vector embeddings for each chunk
5. Index - Store in vector database for fast retrieval
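To make the idea of overlapping chunks in the Chunk step concrete, here is a simplified sketch. It uses fixed-size word windows rather than semantic boundaries, so it is a stand-in technique and not a description of how Hebbrix actually chunks content; the function name and default sizes are assumptions chosen for illustration.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows; each window repeats the last
    `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some words twice.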
Endpoints
Code Examples
Upload a Document
Python
from hebbrix import Hebbrix

client = Hebbrix()

# Upload a PDF document
with open("research_paper.pdf", "rb") as f:
    document = client.documents.upload(
        file=f,
        metadata={"type": "research", "topic": "AI"},
        extract_entities=True,
    )

print(f"Document ID: {document.id}")
print(f"Status: {document.status}")

# Check processing status
status = client.documents.get(document.id)
print(f"Processing: {status.progress}%")

Upload with Progress Tracking
Python (with polling)
import time

# Upload the file; the with-block closes the handle once the upload returns
with open("document.pdf", "rb") as f:
    document = client.documents.upload(file=f)

# Poll until processing completes, stopping early if it fails
while document.status != "completed":
    if document.status == "failed":
        raise RuntimeError(f"Processing failed: {document.error}")
    time.sleep(2)
    document = client.documents.get(document.id)
    print(f"Processing: {document.progress}%")

print(f"Document ready! {document.chunk_count} chunks created")

cURL Example
Upload Document
curl -X POST "https://api.hebbrix.com/v1/documents" \
  -H "Authorization: Bearer mem_sk_your_api_key" \
  -F "file=@document.pdf" \
  -F 'metadata={"type": "research"}' \
  -F "extract_entities=true"

Processing Status
Document processing is asynchronous. Check the status field:
- pending - Queued for processing
- processing - Currently being processed
- completed - Ready for search
- failed - Check error field
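Because a document can stay in pending or processing for a while, a polling loop usually needs a timeout as well as a failure check. The helper below is one possible sketch. `wait_until_processed` is a hypothetical name, not an SDK function, and it assumes only that the object returned by `fetch()` exposes the `status` and `error` fields described above.

```python
import time


def wait_until_processed(fetch, interval=2.0, timeout=300.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the document is completed, failed, or timed out.

    fetch: zero-argument callable returning an object with `status` and
           `error` attributes (an assumption based on the status list above).
    sleep, clock: injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        document = fetch()
        if document.status == "completed":
            return document
        if document.status == "failed":
            raise RuntimeError(f"Processing failed: {document.error}")
        # pending or processing: keep waiting unless the deadline has passed
        if clock() >= deadline:
            raise TimeoutError(f"Document still '{document.status}' after {timeout} seconds")
        sleep(interval)
```

With the client from the examples above, this could be called as `wait_until_processed(lambda: client.documents.get(document.id))`.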
