Archive Management
The Varvis Download CLI provides comprehensive tools for managing archived files, including restoration requests, tracking, and automated workflows.
Understanding Archives
What are Archived Files?
Archived files are older data files that have been moved to long-term storage to save space on primary storage systems. These files:
- Are not immediately available for download
- Require restoration before downloading
- May take time to become available (minutes to hours)
- Are typically older analysis results
Archive Status Indicators
When listing files, archived files are marked:
./varvis-download.js -t mytarget -a 12345 --list
# Output example:
Analysis ID: 12345
Available files:
- sample_001.bam (1.2 GB) [ARCHIVED]
- sample_001.bam.bai (4.5 MB)
- sample_001.vcf.gz (125 MB) [ARCHIVED]
- sample_001.vcf.gz.tbi (2.1 MB)Archive Restoration Modes
Ask Mode (Default)
Prompts for each archived file:
./varvis-download.js -t mytarget -a 12345
# Prompts: "File sample_001.bam is archived. Restore? [y/N]"All Mode
Prompts once for all archived files, then applies decision to all:
./varvis-download.js -t mytarget -a 12345 --restoreArchived all
# Prompts: "Restore all archived files? (y/n): y"
# Logs: "User decision for all archived files: restore"
# Logs: "Restoring archived file sample_001.bam due to --restoreArchived=all decision"Force Mode
Restores archived files without confirmation:
./varvis-download.js -t mytarget -a 12345 --restoreArchived force
# Logs: "Force restoring archived file sample_001.bam due to --restoreArchived=force"No Mode
Skips all archived files:
./varvis-download.js -t mytarget -a 12345 --restoreArchived no
# Logs: "Skipping archived file sample_001.bam due to --restoreArchived=no"Enhanced Logging
All restoration decisions are clearly logged for audit trails:
- Decision tracking - Each file shows why it was restored/skipped
- User input logging - Interactive decisions are recorded
- Context preservation - All saved options visible in logs
- Resume clarity - Resume operations show restored context usage
Restoration Tracking
Tracking File
The tool maintains a JSON file to track restoration requests:
# Default tracking file
./varvis-download.js -t mytarget -a 12345 --restorationFile "my-restorations.json"Tracking File Format
The restoration tracking file preserves all context needed to resume downloads exactly as originally requested:
[
{
"analysisId": "12345",
"fileName": "sample_001.bam",
"restoreEstimation": "2024-06-24T15:30:00Z",
"options": {
"destination": "/custom/download/path",
"overwrite": true,
"range": "chr1:1000000-2000000",
"bed": null,
"restorationFile": "awaiting-restoration.json",
"filetypes": ["bam", "bam.bai"]
}
}
]State Preservation Features
The restoration system preserves complete download context:
- Destination paths - Files download to original destination
- Overwrite settings - Original overwrite preferences preserved
- Genomic ranges - Ranged downloads use original ranges/BED files
- File type filters - Only originally requested file types downloaded
- All CLI options - Complete context restored for consistent behavior
Resume Archived Downloads
The resume function automatically detects when restored files become available and downloads them using the exact same context (destination, ranges, file types) as the original request.
Basic Resume
Resume previously requested archived downloads using default tracking file:
./varvis-download.js --resumeArchivedDownloads -t mytarget -u username -p passwordResume with Custom Tracking File
./varvis-download.js --resumeArchivedDownloads --restorationFile "project-a-restorations.json" -t mytarget -u username -p passwordContext Restoration Examples
Original ranged download request:
./varvis-download.js -t mytarget -a 12345 -g "chr1:1000000-2000000" -d "/project/ranged" --restoreArchived forceResume will automatically:
- Download to
/project/ranged(preserved destination) - Extract only
chr1:1000000-2000000region (preserved range) - Use ranged download workflow (preserved context)
Original VCF download with custom filetypes:
./varvis-download.js -t mytarget -a 12345 -f "vcf.gz,vcf.gz.tbi" -d "/vcf/data" --restoreArchived allResume will automatically:
- Download only VCF and index files (preserved filetypes)
- Save to
/vcf/data(preserved destination) - Use VCF-specific workflow (preserved context)
Complete Archive Workflow Example
This example demonstrates the full archive workflow with state preservation:
Step 1: Initial Request with Custom Settings
# Request ranged BAM download with custom destination
./varvis-download.js \
-t mytarget \
-a 12345 \
-g "chr1:1000000-2000000 chr2:5000000-6000000" \
-d "/project/genomics/chr1-2" \
-f "bam,bam.bai" \
--overwrite \
--restoreArchived force \
--restorationFile "genomics-project.json"
# Output:
# ⚠ File sample_001.bam for analysis 12345 is archived.
# ℹ Force restoring archived file sample_001.bam due to --restoreArchived=force
# ℹ Restoration initiated for analysis 12345. Expected availability: 2024-06-24T16:30:00ZStep 2: Check Restoration Status
# Check what's in the restoration file
cat genomics-project.json[
{
"analysisId": "12345",
"fileName": "sample_001.bam",
"restoreEstimation": "2024-06-24T16:30:00Z",
"options": {
"destination": "/project/genomics/chr1-2",
"overwrite": true,
"range": "chr1:1000000-2000000 chr2:5000000-6000000",
"bed": null,
"restorationFile": "genomics-project.json",
"filetypes": ["bam", "bam.bai"]
}
}
]Step 3: Resume When Ready
# Resume downloads (run after restoration time)
./varvis-download.js \
--resumeArchivedDownloads \
--restorationFile "genomics-project.json" \
-t mytarget \
-u username \
-p password
# Output:
# ℹ Starting in archive resumption mode.
# ℹ Resuming archived downloads as requested.
# ℹ Resuming download for analysis 12345, file sample_001.bam
# ℹ Performing ranged download for restored BAM file: sample_001.bam
# ℹ Successfully resumed download for analysis 12345, file sample_001.bam
# ℹ Updated restoration file genomics-project.json - 0 entries remainingResult
The file downloads exactly as originally requested:
- Location:
/project/genomics/chr1-2/sample_001.ranged.bam - Content: Only regions chr1:1000000-2000000 and chr2:5000000-6000000
- Type: BAM with BAI index (as specified in filetypes)
- Behavior: Overwrites existing files (as specified)
Automation Workflows
Daily Archive Check Script
#!/bin/bash
# daily-archive-check.sh
RESTORATION_FILE="daily-restorations.json"
LOG_FILE="archive-$(date +%Y%m%d).log"
echo "Checking for available restored files..."
./varvis-download.js \
--resumeArchivedDownloads \
--restorationFile "$RESTORATION_FILE" \
--logfile "$LOG_FILE"
echo "Archive check completed. See $LOG_FILE for details."Bulk Archive Restoration
#!/bin/bash
# bulk-restore.sh
ANALYSIS_LIST="archive_analyses.txt"
while IFS= read -r ANALYSIS_ID; do
echo "Requesting restoration for analysis: $ANALYSIS_ID"
./varvis-download.js \
-t mytarget \
-a "$ANALYSIS_ID" \
--restoreArchived force \
--list
sleep 5 # Rate limiting
done < "$ANALYSIS_LIST"
echo "All restoration requests submitted."
echo "Run with --resumeArchivedDownloads to check status later."Scheduled Download Script
#!/bin/bash
# scheduled-download.sh
# For use in cron jobs
# 0 */6 * * * /path/to/scheduled-download.sh
RESTORATION_FILE="/data/restorations/scheduled.json"
DOWNLOAD_DIR="/data/downloads"
LOG_DIR="/var/log/varvis-download"
mkdir -p "$DOWNLOAD_DIR" "$LOG_DIR"
# Check for available restored files every 6 hours
./varvis-download.js \
--resumeArchivedDownloads \
--restorationFile "$RESTORATION_FILE" \
-d "$DOWNLOAD_DIR" \
--logfile "$LOG_DIR/scheduled-$(date +%Y%m%d-%H%M).log"Advanced Archive Management
Monitoring Restoration Status
Check restoration status without downloading:
./varvis-download.js \
--resumeArchivedDownloads \
--restorationFile "restorations.json" \
--listCleanup Old Restoration Requests
#!/bin/bash
# cleanup-restorations.sh
RESTORATION_FILE="awaiting-restoration.json"
BACKUP_FILE="restorations-backup-$(date +%Y%m%d).json"
# Backup current file
cp "$RESTORATION_FILE" "$BACKUP_FILE"
# Clean up completed/failed restorations older than 7 days
python3 << 'EOF'
import json
import datetime
with open('awaiting-restoration.json', 'r') as f:
data = json.load(f)
# Filter out old completed/failed restorations
cutoff = datetime.datetime.now() - datetime.timedelta(days=7)
filtered_restorations = []
for restoration in data['restorations']:
request_time = datetime.datetime.fromisoformat(restoration['requestTime'].replace('Z', '+00:00'))
# Keep pending restorations and recent completed/failed ones
if restoration['status'] == 'pending' or request_time > cutoff:
filtered_restorations.append(restoration)
data['restorations'] = filtered_restorations
data['lastUpdated'] = datetime.datetime.now().isoformat() + 'Z'
with open('awaiting-restoration.json', 'w') as f:
json.dump(data, f, indent=2)
print(f"Cleaned up restoration file. Kept {len(filtered_restorations)} restorations.")
EOFArchive Statistics
#!/bin/bash
# archive-stats.sh
echo "Archive Restoration Statistics"
echo "==============================="
RESTORATION_FILE="awaiting-restoration.json"
if [[ -f "$RESTORATION_FILE" ]]; then
python3 << EOF
import json
from collections import Counter
with open('$RESTORATION_FILE', 'r') as f:
data = json.load(f)
restorations = data['restorations']
statuses = [r['status'] for r in restorations]
targets = [r['target'] for r in restorations]
print(f"Total restorations: {len(restorations)}")
print("\nBy Status:")
for status, count in Counter(statuses).items():
print(f" {status}: {count}")
print("\nBy Target:")
for target, count in Counter(targets).items():
print(f" {target}: {count}")
EOF
else
echo "No restoration file found."
fiTroubleshooting Archives
Common Issues
Restoration requests timeout:
# Check network connectivity
ping api.varvis.com
# Use debug logging
./varvis-download.js -a 12345 --restoreArchived force --loglevel debugFiles not becoming available:
# Check restoration status
./varvis-download.js --resumeArchivedDownloads --list
# Contact support if files are stuck in pending status for >24 hoursLarge restoration tracking files:
# Clean up old restorations
./cleanup-restorations.sh
# Or start fresh (backup first)
cp awaiting-restoration.json backup.json
echo '{"restorations": [], "lastUpdated": ""}' > awaiting-restoration.jsonDebug Archive Issues
#!/bin/bash
# debug-archive.sh
ANALYSIS_ID="$1"
if [[ -z "$ANALYSIS_ID" ]]; then
echo "Usage: $0 <analysis_id>"
exit 1
fi
echo "Debugging archive issues for analysis: $ANALYSIS_ID"
# Check file status
echo "1. Checking file availability..."
./varvis-download.js -t mytarget -a "$ANALYSIS_ID" --list
# Check restoration tracking
echo "2. Checking restoration history..."
if [[ -f "awaiting-restoration.json" ]]; then
grep -A5 -B5 "$ANALYSIS_ID" awaiting-restoration.json || echo "No restorations found"
else
echo "No restoration file found"
fi
# Try restoration with debug logging
echo "3. Testing restoration request..."
./varvis-download.js \
-t mytarget \
-a "$ANALYSIS_ID" \
--restoreArchived force \
--loglevel debug \
--logfile "debug-archive-$ANALYSIS_ID.log" \
--list
echo "Debug complete. Check debug-archive-$ANALYSIS_ID.log for details."Best Practices
Planning Archive Downloads
- Request Early: Submit restoration requests well before you need the data
- Batch Requests: Request multiple files at once to optimize restoration
- Track Requests: Always use restoration tracking files
- Monitor Status: Check restoration status regularly
Production Workflows
- Separate Tracking: Use different tracking files for different projects
- Automated Monitoring: Set up scheduled checks for restored files
- Cleanup Regularly: Remove old completed restorations
- Backup Tracking: Keep backups of restoration tracking files
Error Recovery
- Retry Logic: Implement retry logic for failed restorations
- Fallback Options: Have alternative data sources when possible
- Monitoring: Set up alerts for long-pending restorations
- Documentation: Keep logs of restoration requests and issues
Integration with Other Tools
Database Integration
#!/usr/bin/env python3
# archive_db.py - Track restorations in database
import json
import sqlite3
import datetime
def update_restoration_db():
"""Update database with restoration status"""
conn = sqlite3.connect('restorations.db')
cursor = conn.cursor()
# Create table if not exists
cursor.execute('''
CREATE TABLE IF NOT EXISTS restorations (
analysis_id TEXT,
file_name TEXT,
request_time TEXT,
status TEXT,
target TEXT,
PRIMARY KEY (analysis_id, file_name)
)
''')
# Read restoration file
with open('awaiting-restoration.json', 'r') as f:
data = json.load(f)
# Update database
for restoration in data['restorations']:
cursor.execute('''
INSERT OR REPLACE INTO restorations
VALUES (?, ?, ?, ?, ?)
''', (
restoration['analysisId'],
restoration['fileName'],
restoration['requestTime'],
restoration['status'],
restoration['target']
))
conn.commit()
conn.close()
if __name__ == "__main__":
update_restoration_db()Monitoring Integration
#!/bin/bash
# monitoring-integration.sh
# Send metrics to monitoring system
PENDING_COUNT=$(jq '.restorations | map(select(.status == "pending")) | length' awaiting-restoration.json)
AVAILABLE_COUNT=$(jq '.restorations | map(select(.status == "available")) | length' awaiting-restoration.json)
# Example: Send to Prometheus pushgateway
curl -X POST http://pushgateway:9091/metrics/job/varvis-archives \
--data-binary "varvis_archive_pending_count $PENDING_COUNT"
curl -X POST http://pushgateway:9091/metrics/job/varvis-archives \
--data-binary "varvis_archive_available_count $AVAILABLE_COUNT"Related Documentation
- Configuration Guide - Archive-related settings
- Batch Operations - Bulk archive management
- Logging & Reports - Archive activity logging
- Basic Examples - Simple archive operations
