Archive Management

The Varvis Download CLI provides comprehensive tools for managing archived files, including restoration requests, tracking, and automated workflows.

Understanding Archives

What are Archived Files?

Archived files are older data files that have been moved to long-term storage to save space on primary storage systems. These files:

Are not immediately available for download
Require restoration before downloading
May take time to become available (minutes to hours)
Are typically older analysis results

Archive Status Indicators

When listing files, archived files are marked:

bash

./varvis-download.js -t mytarget -a 12345 --list

# Output example:
Analysis ID: 12345
Available files:
  - sample_001.bam (1.2 GB) [ARCHIVED]
  - sample_001.bam.bai (4.5 MB)
  - sample_001.vcf.gz (125 MB) [ARCHIVED]
  - sample_001.vcf.gz.tbi (2.1 MB)

Archive Restoration Modes

Ask Mode (Default)

Prompts for each archived file:

bash

./varvis-download.js -t mytarget -a 12345
# Prompts: "File sample_001.bam is archived. Restore? [y/N]"

All Mode

Prompts once for all archived files, then applies decision to all:

bash

./varvis-download.js -t mytarget -a 12345 --restoreArchived all
# Prompts: "Restore all archived files? (y/n): y"
# Logs: "User decision for all archived files: restore"
# Logs: "Restoring archived file sample_001.bam due to --restoreArchived=all decision"

Force Mode

Restores archived files without confirmation:

bash

./varvis-download.js -t mytarget -a 12345 --restoreArchived force
# Logs: "Force restoring archived file sample_001.bam due to --restoreArchived=force"

No Mode

Skips all archived files:

bash

./varvis-download.js -t mytarget -a 12345 --restoreArchived no
# Logs: "Skipping archived file sample_001.bam due to --restoreArchived=no"

Enhanced Logging

All restoration decisions are clearly logged for audit trails:

Decision tracking - Each file shows why it was restored/skipped
User input logging - Interactive decisions are recorded
Context preservation - All saved options visible in logs
Resume clarity - Resume operations show restored context usage

Restoration Tracking

Tracking File

The tool maintains a JSON file to track restoration requests:

bash

# Default tracking file
./varvis-download.js -t mytarget -a 12345 --restorationFile "my-restorations.json"

Tracking File Format

The restoration tracking file preserves all context needed to resume downloads exactly as originally requested:

json

[
  {
    "analysisId": "12345",
    "fileName": "sample_001.bam",
    "restoreEstimation": "2024-06-24T15:30:00Z",
    "options": {
      "destination": "/custom/download/path",
      "overwrite": true,
      "range": "chr1:1000000-2000000",
      "bed": null,
      "restorationFile": "awaiting-restoration.json",
      "filetypes": ["bam", "bam.bai"]
    }
  }
]

State Preservation Features

The restoration system preserves complete download context:

Destination paths - Files download to original destination
Overwrite settings - Original overwrite preferences preserved
Genomic ranges - Ranged downloads use original ranges/BED files
File type filters - Only originally requested file types downloaded
All CLI options - Complete context restored for consistent behavior

Resume Archived Downloads

The resume function automatically detects when restored files become available and downloads them using the exact same context (destination, ranges, file types) as the original request.

Basic Resume

Resume previously requested archived downloads using default tracking file:

bash

./varvis-download.js --resumeArchivedDownloads -t mytarget -u username -p password

Resume with Custom Tracking File

bash

./varvis-download.js --resumeArchivedDownloads --restorationFile "project-a-restorations.json" -t mytarget -u username -p password

Context Restoration Examples

Original ranged download request:

bash

./varvis-download.js -t mytarget -a 12345 -g "chr1:1000000-2000000" -d "/project/ranged" --restoreArchived force

Resume will automatically:

Download to /project/ranged (preserved destination)
Extract only chr1:1000000-2000000 region (preserved range)
Use ranged download workflow (preserved context)

Original VCF download with custom filetypes:

bash

./varvis-download.js -t mytarget -a 12345 -f "vcf.gz,vcf.gz.tbi" -d "/vcf/data" --restoreArchived all

Resume will automatically:

Download only VCF and index files (preserved filetypes)
Save to /vcf/data (preserved destination)
Use VCF-specific workflow (preserved context)

Complete Archive Workflow Example

This example demonstrates the full archive workflow with state preservation:

Step 1: Initial Request with Custom Settings

bash

# Request ranged BAM download with custom destination
./varvis-download.js \
  -t mytarget \
  -a 12345 \
  -g "chr1:1000000-2000000 chr2:5000000-6000000" \
  -d "/project/genomics/chr1-2" \
  -f "bam,bam.bai" \
  --overwrite \
  --restoreArchived force \
  --restorationFile "genomics-project.json"

# Output:
# ⚠ File sample_001.bam for analysis 12345 is archived.
# ℹ Force restoring archived file sample_001.bam due to --restoreArchived=force
# ℹ Restoration initiated for analysis 12345. Expected availability: 2024-06-24T16:30:00Z

Step 2: Check Restoration Status

bash

# Check what's in the restoration file
cat genomics-project.json

json

[
  {
    "analysisId": "12345",
    "fileName": "sample_001.bam",
    "restoreEstimation": "2024-06-24T16:30:00Z",
    "options": {
      "destination": "/project/genomics/chr1-2",
      "overwrite": true,
      "range": "chr1:1000000-2000000 chr2:5000000-6000000",
      "bed": null,
      "restorationFile": "genomics-project.json",
      "filetypes": ["bam", "bam.bai"]
    }
  }
]

Step 3: Resume When Ready

bash

# Resume downloads (run after restoration time)
./varvis-download.js \
  --resumeArchivedDownloads \
  --restorationFile "genomics-project.json" \
  -t mytarget \
  -u username \
  -p password

# Output:
# ℹ Starting in archive resumption mode.
# ℹ Resuming archived downloads as requested.
# ℹ Resuming download for analysis 12345, file sample_001.bam
# ℹ Performing ranged download for restored BAM file: sample_001.bam
# ℹ Successfully resumed download for analysis 12345, file sample_001.bam
# ℹ Updated restoration file genomics-project.json - 0 entries remaining

Result

The file downloads exactly as originally requested:

Location: /project/genomics/chr1-2/sample_001.ranged.bam
Content: Only regions chr1:1000000-2000000 and chr2:5000000-6000000
Type: BAM with BAI index (as specified in filetypes)
Behavior: Overwrites existing files (as specified)

Automation Workflows

Daily Archive Check Script

bash

#!/bin/bash
# daily-archive-check.sh

RESTORATION_FILE="daily-restorations.json"
LOG_FILE="archive-$(date +%Y%m%d).log"

echo "Checking for available restored files..."
./varvis-download.js \
  --resumeArchivedDownloads \
  --restorationFile "$RESTORATION_FILE" \
  --logfile "$LOG_FILE"

echo "Archive check completed. See $LOG_FILE for details."

Bulk Archive Restoration

bash

#!/bin/bash
# bulk-restore.sh

ANALYSIS_LIST="archive_analyses.txt"

while IFS= read -r ANALYSIS_ID; do
  echo "Requesting restoration for analysis: $ANALYSIS_ID"

  ./varvis-download.js \
    -t mytarget \
    -a "$ANALYSIS_ID" \
    --restoreArchived force \
    --list

  sleep 5  # Rate limiting
done < "$ANALYSIS_LIST"

echo "All restoration requests submitted."
echo "Run with --resumeArchivedDownloads to check status later."

Scheduled Download Script

bash

#!/bin/bash
# scheduled-download.sh

# For use in cron jobs
# 0 */6 * * * /path/to/scheduled-download.sh

RESTORATION_FILE="/data/restorations/scheduled.json"
DOWNLOAD_DIR="/data/downloads"
LOG_DIR="/var/log/varvis-download"

mkdir -p "$DOWNLOAD_DIR" "$LOG_DIR"

# Check for available restored files every 6 hours
./varvis-download.js \
  --resumeArchivedDownloads \
  --restorationFile "$RESTORATION_FILE" \
  -d "$DOWNLOAD_DIR" \
  --logfile "$LOG_DIR/scheduled-$(date +%Y%m%d-%H%M).log"

Advanced Archive Management

Monitoring Restoration Status

Check restoration status without downloading:

bash

./varvis-download.js \
  --resumeArchivedDownloads \
  --restorationFile "restorations.json" \
  --list

Cleanup Old Restoration Requests

bash

#!/bin/bash
# cleanup-restorations.sh

RESTORATION_FILE="awaiting-restoration.json"
BACKUP_FILE="restorations-backup-$(date +%Y%m%d).json"

# Backup current file
cp "$RESTORATION_FILE" "$BACKUP_FILE"

# Clean up completed/failed restorations older than 7 days
python3 << 'EOF'
import json
import datetime

with open('awaiting-restoration.json', 'r') as f:
    data = json.load(f)

# Filter out old completed/failed restorations
cutoff = datetime.datetime.now() - datetime.timedelta(days=7)
filtered_restorations = []

for restoration in data['restorations']:
    request_time = datetime.datetime.fromisoformat(restoration['requestTime'].replace('Z', '+00:00'))

    # Keep pending restorations and recent completed/failed ones
    if restoration['status'] == 'pending' or request_time > cutoff:
        filtered_restorations.append(restoration)

data['restorations'] = filtered_restorations
data['lastUpdated'] = datetime.datetime.now().isoformat() + 'Z'

with open('awaiting-restoration.json', 'w') as f:
    json.dump(data, f, indent=2)

print(f"Cleaned up restoration file. Kept {len(filtered_restorations)} restorations.")
EOF

Archive Statistics

bash

#!/bin/bash
# archive-stats.sh

echo "Archive Restoration Statistics"
echo "==============================="

RESTORATION_FILE="awaiting-restoration.json"

if [[ -f "$RESTORATION_FILE" ]]; then
  python3 << EOF
import json
from collections import Counter

with open('$RESTORATION_FILE', 'r') as f:
    data = json.load(f)

restorations = data['restorations']
statuses = [r['status'] for r in restorations]
targets = [r['target'] for r in restorations]

print(f"Total restorations: {len(restorations)}")
print("\nBy Status:")
for status, count in Counter(statuses).items():
    print(f"  {status}: {count}")

print("\nBy Target:")
for target, count in Counter(targets).items():
    print(f"  {target}: {count}")
EOF
else
  echo "No restoration file found."
fi

Troubleshooting Archives

Common Issues

Restoration requests timeout:

bash

# Check network connectivity
ping api.varvis.com

# Use debug logging
./varvis-download.js -a 12345 --restoreArchived force --loglevel debug

Files not becoming available:

bash

# Check restoration status
./varvis-download.js --resumeArchivedDownloads --list

# Contact support if files are stuck in pending status for >24 hours

Large restoration tracking files:

bash

# Clean up old restorations
./cleanup-restorations.sh

# Or start fresh (backup first)
cp awaiting-restoration.json backup.json
echo '{"restorations": [], "lastUpdated": ""}' > awaiting-restoration.json

Debug Archive Issues

bash

#!/bin/bash
# debug-archive.sh

ANALYSIS_ID="$1"
if [[ -z "$ANALYSIS_ID" ]]; then
  echo "Usage: $0 <analysis_id>"
  exit 1
fi

echo "Debugging archive issues for analysis: $ANALYSIS_ID"

# Check file status
echo "1. Checking file availability..."
./varvis-download.js -t mytarget -a "$ANALYSIS_ID" --list

# Check restoration tracking
echo "2. Checking restoration history..."
if [[ -f "awaiting-restoration.json" ]]; then
  grep -A5 -B5 "$ANALYSIS_ID" awaiting-restoration.json || echo "No restorations found"
else
  echo "No restoration file found"
fi

# Try restoration with debug logging
echo "3. Testing restoration request..."
./varvis-download.js \
  -t mytarget \
  -a "$ANALYSIS_ID" \
  --restoreArchived force \
  --loglevel debug \
  --logfile "debug-archive-$ANALYSIS_ID.log" \
  --list

echo "Debug complete. Check debug-archive-$ANALYSIS_ID.log for details."

Best Practices

Planning Archive Downloads

Request Early: Submit restoration requests well before you need the data
Batch Requests: Request multiple files at once to optimize restoration
Track Requests: Always use restoration tracking files
Monitor Status: Check restoration status regularly

Production Workflows

Separate Tracking: Use different tracking files for different projects
Automated Monitoring: Set up scheduled checks for restored files
Cleanup Regularly: Remove old completed restorations
Backup Tracking: Keep backups of restoration tracking files

Error Recovery

Retry Logic: Implement retry logic for failed restorations
Fallback Options: Have alternative data sources when possible
Monitoring: Set up alerts for long-pending restorations
Documentation: Keep logs of restoration requests and issues

Integration with Other Tools

Database Integration

python

#!/usr/bin/env python3
# archive_db.py - Track restorations in database

import json
import sqlite3
import datetime

def update_restoration_db():
    """Update database with restoration status"""

    conn = sqlite3.connect('restorations.db')
    cursor = conn.cursor()

    # Create table if not exists
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS restorations (
            analysis_id TEXT,
            file_name TEXT,
            request_time TEXT,
            status TEXT,
            target TEXT,
            PRIMARY KEY (analysis_id, file_name)
        )
    ''')

    # Read restoration file
    with open('awaiting-restoration.json', 'r') as f:
        data = json.load(f)

    # Update database
    for restoration in data['restorations']:
        cursor.execute('''
            INSERT OR REPLACE INTO restorations
            VALUES (?, ?, ?, ?, ?)
        ''', (
            restoration['analysisId'],
            restoration['fileName'],
            restoration['requestTime'],
            restoration['status'],
            restoration['target']
        ))

    conn.commit()
    conn.close()

if __name__ == "__main__":
    update_restoration_db()

Monitoring Integration

bash

#!/bin/bash
# monitoring-integration.sh

# Send metrics to monitoring system
PENDING_COUNT=$(jq '.restorations | map(select(.status == "pending")) | length' awaiting-restoration.json)
AVAILABLE_COUNT=$(jq '.restorations | map(select(.status == "available")) | length' awaiting-restoration.json)

# Example: Send to Prometheus pushgateway
curl -X POST http://pushgateway:9091/metrics/job/varvis-archives \
  --data-binary "varvis_archive_pending_count $PENDING_COUNT"
curl -X POST http://pushgateway:9091/metrics/job/varvis-archives \
  --data-binary "varvis_archive_available_count $AVAILABLE_COUNT"

Configuration Guide - Archive-related settings
Batch Operations - Bulk archive management
Logging & Reports - Archive activity logging
Basic Examples - Simple archive operations

Archive Management ​

Understanding Archives ​

What are Archived Files? ​

Archive Status Indicators ​

Archive Restoration Modes ​

Ask Mode (Default) ​

All Mode ​

Force Mode ​

No Mode ​

Enhanced Logging ​

Restoration Tracking ​

Tracking File ​

Tracking File Format ​

State Preservation Features ​

Resume Archived Downloads ​

Basic Resume ​

Resume with Custom Tracking File ​

Context Restoration Examples ​

Complete Archive Workflow Example ​

Step 1: Initial Request with Custom Settings ​

Step 2: Check Restoration Status ​

Step 3: Resume When Ready ​

Result ​

Automation Workflows ​

Daily Archive Check Script ​

Bulk Archive Restoration ​

Scheduled Download Script ​

Advanced Archive Management ​

Monitoring Restoration Status ​

Cleanup Old Restoration Requests ​

Archive Statistics ​

Troubleshooting Archives ​

Common Issues ​

Debug Archive Issues ​

Best Practices ​

Planning Archive Downloads ​

Production Workflows ​

Error Recovery ​

Integration with Other Tools ​

Database Integration ​

Monitoring Integration ​

Related Documentation ​

Archive Management

Understanding Archives

What are Archived Files?

Archive Status Indicators

Archive Restoration Modes

Ask Mode (Default)

All Mode

Force Mode

No Mode

Enhanced Logging

Restoration Tracking

Tracking File

Tracking File Format

State Preservation Features

Resume Archived Downloads

Basic Resume

Resume with Custom Tracking File

Context Restoration Examples

Complete Archive Workflow Example

Step 1: Initial Request with Custom Settings

Step 2: Check Restoration Status

Step 3: Resume When Ready

Result

Automation Workflows

Daily Archive Check Script

Bulk Archive Restoration

Scheduled Download Script

Advanced Archive Management

Monitoring Restoration Status

Cleanup Old Restoration Requests

Archive Statistics

Troubleshooting Archives

Common Issues

Debug Archive Issues

Best Practices

Planning Archive Downloads

Production Workflows

Error Recovery

Integration with Other Tools

Database Integration

Monitoring Integration

Related Documentation