Human-in-the-Loop: The Ethics of Security Automation
Why mandatory human review protects researcher reputation better than any algorithm. Building AI that knows when to stop. Part 5 of 5.

I could make this system fully autonomous. Remove the human review gates. Let it find vulnerabilities, validate them, and submit reports automatically.
I won’t.
Not because the technology can’t do it. Because I’ve seen what happens when researchers prioritize volume over judgment. Their acceptance rates crater. Programs add them to internal “problematic researcher” lists. Other programs notice.
That specific reputation damage—slow, invisible, cumulative—is worse than any technical failure.
Human-in-the-loop security automation requires mandatory human review for all submission decisions. Automation handles reconnaissance, testing, and validation—the tedious work where machines excel. Humans handle judgment calls: Is this finding worth reporting? Is the impact assessment accurate? Does the proof-of-concept clearly demonstrate the vulnerability? Quality over quantity, always.
Why Is Mandatory Human Review Non-Negotiable?
In part 2, I described how validation reduced false positives from 90% to 40%. That’s a huge improvement.
But 40% false positives is still unacceptable for direct submission.
If I submit 10 reports and 4 are invalid:
- Programs notice patterns of low-quality submissions
- Triage teams develop negative associations with my username
- Future reports get scrutinized more heavily
- Bounty amounts decrease for “problematic” researchers
The math doesn’t favor automation without human gates.
My system has hard rules (a code sketch of the gate logic follows the two lists):
Always requires human review:
- Any finding with ≥0.70 confidence
- Critical or high severity findings (any confidence)
- First submission to any new program
- Scope ambiguity detected
- Potential for dispute or pushback
Never automated:
- Report submission
- Response to program triage questions
- Scope clarification decisions
- Disclosure timing
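In code, these gates reduce to a small decision function. Here is a minimal sketch, assuming a confidence score, a severity label, and a few per-program flags; the names are illustrative, not the system's real types:

```typescript
// Minimal sketch of the review gates above; the parameter shapes are
// illustrative, not the system's actual types.
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface GateContext {
  firstSubmissionToProgram: boolean;
  scopeAmbiguityDetected: boolean;
  disputeRiskFlagged: boolean;
}

type GateDecision = 'queue_for_human_review' | 'hold_for_more_evidence';

function gateFinding(confidence: number, severity: Severity, ctx: GateContext): GateDecision {
  const needsReview =
    confidence >= 0.70 ||
    severity === 'critical' ||
    severity === 'high' ||
    ctx.firstSubmissionToProgram ||
    ctx.scopeAmbiguityDetected ||
    ctx.disputeRiskFlagged;

  // There is deliberately no 'auto_submit' outcome: submission, triage replies,
  // scope clarifications, and disclosure timing always go through a human.
  return needsReview ? 'queue_for_human_review' : 'hold_for_more_evidence';
}
```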
[!WARNING] Bug bounty platforms share information. A ban from one program can affect your standing elsewhere. Programs run by the same company (Google and Meta, for example) definitely share researcher reputations internally. One careless automated submission can cascade.
What Is the Quality Over Quantity Principle?
Two hypothetical researchers:
- Researcher A: 200 reports submitted, 50 accepted (25% acceptance rate)
- Researcher B: 50 reports submitted, 40 accepted (80% acceptance rate)
Who would you rather have in your program?
Researcher B, obviously. They’re careful. They understand impact. They don’t waste triage time.
My system optimizes for Researcher B’s pattern (a configuration sketch follows this list):
- Confidence threshold (0.70+) before a finding enters the human review queue
- Detailed validation before any human sees it
- Quality evidence collection (screenshots, PoC, hashes)
- Report templates that match program expectations
- No “spray and pray” submissions
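Concretely, those choices might live in a small pipeline configuration. This is a hypothetical sketch; the field names are mine, not the system's actual schema:

```typescript
// Hypothetical configuration reflecting the list above; the field names are
// illustrative, not the system's actual schema.
const pipelineConfig = {
  reviewQueueConfidenceThreshold: 0.70, // below this, keep validating or discard
  requirePocReproductionBeforeReview: true,
  evidence: {
    screenshots: true,
    httpExchanges: true,
    pocCode: true,
    sha256Hashes: true,
  },
  reportFormatting: 'per-platform-templates', // see part 4
  allowBulkSubmission: false,                 // no spray-and-pray
};
```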
I hated the idea of leaving valid findings unreported. But I needed to accept that a finding I’m 60% confident about isn’t ready. Let it mature. Get more evidence. Or discard it.
The acceptance rate compounds. Programs start trusting my reports. Triage becomes faster. Bounties increase. Fewer back-and-forth questions.
How Does Scope Validation Prevent Disaster?
Every bug bounty program has scope—what you’re allowed to test, what’s off-limits.
Out-of-scope testing can result in:
- Legal action (yes, really)
- Permanent program ban
- Platform suspension
- Criminal investigation (in extreme cases)
Automation makes mistakes faster. Without scope validation, the system could hammer a production database that’s explicitly out of scope. By the time I notice, the damage is done.
My scope validation runs before every test:
```typescript
async function validateScope(target: Target, program: Program): Promise<boolean> {
  // Explicit exclusions take precedence, so check them first.
  // (Programs often pair a broad in-scope wildcard with specific exclusions.)
  if (program.outOfScope.includes(target.domain)) {
    logScopeViolation(target, program, 'explicit_exclusion');
    return false;
  }

  // Check explicit in-scope domains
  if (program.inScope.domains.includes(target.domain)) {
    return true;
  }

  // Check wildcard patterns (e.g., *.example.com)
  if (program.inScope.wildcards.some(w => matchWildcard(w, target.domain))) {
    return true;
  }

  // Ambiguous: flag for human review and refuse to proceed
  logScopeViolation(target, program, 'ambiguous');
  await notifyHuman('scope_clarification_needed', { target, program });
  return false;
}
```

Ambiguous cases don’t proceed. They wait for human judgment. Better to miss a finding than to get banned.
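The matchWildcard helper is referenced but not shown above. A minimal version, assuming wildcards only appear as a leading `*.` on a domain, could look like this (a sketch, not the actual implementation):

```typescript
// Sketch of a wildcard matcher for patterns like "*.example.com".
// Assumption: wildcards only appear as a leading "*." prefix.
function matchWildcard(pattern: string, domain: string): boolean {
  if (!pattern.startsWith('*.')) {
    return pattern === domain; // no wildcard: exact match only
  }
  const suffix = pattern.slice(1); // "*.example.com" -> ".example.com"
  return domain.endsWith(suffix) && domain.length > suffix.length;
}
```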
In part 3, I described how scope violations are a failure category that triggers immediate halt and blacklisting.
What Evidence Should Every Report Include?
Evidence serves two purposes:
- Help programs verify your finding
- Protect you if there’s a dispute
My evidence collection:
- Screenshots
- HTTP request/response pairs
- PoC code
- SHA-256 hashes
```typescript
interface EvidencePackage {
  screenshots: Array<{
    path: string;
    hash: string;
    capturedAt: Date;
  }>;
  httpExchanges: Array<{
    request: string;
    response: string;
    hash: string;
  }>;
  poc: {
    type: 'curl' | 'python' | 'manual';
    code: string;
    hash: string;
  };
  packageHash: string; // Hash of all component hashes
}
```

The package hash enables verification: “Here’s the SHA-256 of my evidence bundle at time of submission. It hasn’t changed.”
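One way to compute that package hash is to hash the sorted component hashes; here is a minimal Node.js sketch (the function name is mine, not the system's):

```typescript
import { createHash } from 'node:crypto';

// Sketch: derive the package hash from the component hashes already stored in
// the EvidencePackage. Sorting makes the result independent of capture order.
function computePackageHash(pkg: Omit<EvidencePackage, 'packageHash'>): string {
  const componentHashes = [
    ...pkg.screenshots.map(s => s.hash),
    ...pkg.httpExchanges.map(e => e.hash),
    pkg.poc.hash,
  ].sort();

  return createHash('sha256').update(componentHashes.join('\n')).digest('hex');
}
```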
[!TIP] Some researchers skip evidence collection to submit faster. Don’t. That 10 minutes of screenshot capture has saved me in disputes where programs claimed “couldn’t reproduce.” I had timestamped proof that it worked on date X.
How Does Human Review Actually Work?
When a finding reaches 0.70+ confidence, it queues for human review with full context:
```typescript
interface ReviewQueueItem {
  finding: Finding;
  validationSummary: {
    pocResult: 'passed' | 'partial' | 'failed';
    responseDiff: string; // Key differences found
    falsePositiveRisk: number;
  };
  suggestedActions: string[];
  priorityScore: number;
  program: ProgramSummary;
  relatedFindings?: Finding[]; // Other findings in same session
}
```

The review interface shows:
- Full finding details
- Validation evidence
- Why the system thinks it’s valid
- Similar past findings (accepted or rejected)
- Program-specific notes
The human reviewer can take one of four actions (sketched in code after this list):
- Approve: Proceed to formatting and submission
- Request more validation: Send back for additional testing
- Dismiss: Mark as false positive (logs pattern for learning)
- Hold: Wait for more context before deciding
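A sketch of how those decisions might route back into the pipeline; the ReviewDecision type and the downstream function names are assumptions about the internals, not the system's actual API:

```typescript
// Illustrative sketch of routing the four decisions; the ReviewDecision type
// and the downstream hooks are assumptions, not the system's actual API.
type ReviewDecision =
  | { action: 'approve' }
  | { action: 'request_validation'; additionalTests: string[] }
  | { action: 'dismiss'; reason: string }
  | { action: 'hold'; until?: Date };

declare function formatAndStageForSubmission(item: ReviewQueueItem): Promise<void>; // part 4 formatters
declare function queueAdditionalValidation(item: ReviewQueueItem, tests: string[]): Promise<void>;
declare function recordFalsePositivePattern(item: ReviewQueueItem, reason: string): Promise<void>; // feeds learning
declare function parkFinding(item: ReviewQueueItem, until?: Date): Promise<void>;

async function handleReviewDecision(item: ReviewQueueItem, decision: ReviewDecision): Promise<void> {
  switch (decision.action) {
    case 'approve':
      // The only path toward submission, and only with human-approved content.
      await formatAndStageForSubmission(item);
      break;
    case 'request_validation':
      await queueAdditionalValidation(item, decision.additionalTests);
      break;
    case 'dismiss':
      await recordFalsePositivePattern(item, decision.reason);
      break;
    case 'hold':
      await parkFinding(item, decision.until);
      break;
  }
}
```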
This connects to platform integration in part 4. Approved findings go to platform-specific formatters, then are submitted with the human-approved content.
What’s the Human Augmentation Philosophy?
I’m not building a replacement for human researchers. I’m building a tool that makes human researchers more effective.
What automation handles:
- Subdomain enumeration (tedious, mechanical)
- Technology fingerprinting (pattern matching)
- Endpoint discovery (exhaustive search)
- Initial vulnerability detection (known patterns)
- PoC validation (reproducibility testing)
- Evidence collection (systematic capture)
- Report formatting (platform-specific templates)
What humans handle:
- Is this finding impactful enough to report?
- Is the severity assessment accurate?
- Are there edge cases the automation missed?
- How should this be communicated to the program?
- Should we coordinate with other researchers?
- Is disclosure timing appropriate?
The division is clear: automation for breadth and consistency, humans for judgment and nuance.
I originally wanted full automation. Well, it’s more like… I wanted the efficiency fantasy of passive income from vulnerability reports. But judgment can’t be automated. Context matters too much. Programs are run by humans who respond to human communication.
What Are the Ethical Boundaries of Security Automation?
Some things the system will never do:
Never exploit for gain beyond bounty
- No data exfiltration
- No ransomware deployment
- No selling access
Never test without authorization
- Only registered bug bounty programs
- Only explicitly in-scope targets
- Halt immediately on scope ambiguity
Never prioritize speed over safety
- Rate limiting is mandatory
- Ban detection triggers immediate halt
- Human review required before submission
Never misrepresent findings
- No exaggerating severity for higher bounties
- No fabricating evidence
- No duplicate submissions across programs for the same vendor
These aren’t just ethical guidelines—they’re code constraints. The system literally cannot do some of these things.
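As one example of a constraint expressed in code, submission can be gated behind a human-approval record at the function-signature level. This is a sketch of the pattern with illustrative names (FormattedReport, PlatformClient, hashReport), not the system's actual code:

```typescript
// Sketch of "constraint as code": submission requires a human-approval record,
// so there is no call path that submits without one. FormattedReport,
// PlatformClient, and hashReport are illustrative stand-ins.
interface FormattedReport { title: string; body: string; }
interface PlatformClient { submit(report: FormattedReport): Promise<void>; }
declare function hashReport(report: FormattedReport): string; // e.g., SHA-256 of the rendered report

interface HumanApproval {
  reviewerId: string;
  findingId: string;
  approvedAt: Date;
  approvedContentHash: string; // hash of the exact report text the reviewer saw
}

async function submitReport(
  report: FormattedReport,
  approval: HumanApproval,
  platform: PlatformClient
): Promise<void> {
  if (hashReport(report) !== approval.approvedContentHash) {
    // Refuse to submit anything a human did not approve verbatim.
    throw new Error('Report content changed after human approval');
  }
  await platform.submit(report);
}
```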
How Does This Connect to the Full System?
Throughout this series:
- Architecture: Multi-agent design with evidence-gated progression
- Validation: Response diff analysis to reduce false positives
- Failure Learning: Recovery strategies and pattern learning
- Multi-Platform: Unified model with platform-specific formatters
- Human-in-the-Loop (you are here): Mandatory review gates and ethical boundaries
Each layer builds on the previous. But they all converge on this final point: humans make the decisions that matter.
The SQLite RAG learns from human feedback. Validation signatures come from human rejections. Platform formatters produce what human reviewers approve. The entire system exists to serve human judgment, not replace it.
What’s the Actual Outcome?
Before human-in-the-loop design:
- Fast automated submissions
- Low acceptance rate
- Negative program relationships
- Stressful dispute resolution
After mandatory human review:
- Slower, more deliberate submissions
- 80%+ acceptance rate
- Programs respond faster (trust established)
- Evidence prevents disputes
The speed tradeoff is worth it. I’d rather submit 5 high-quality reports per week than 50 that damage my reputation.
Series Conclusion: What Did We Build?
Over five posts, I’ve described a system that:
- Uses multi-agent architecture for parallel reconnaissance, testing, validation, and reporting
- Applies evidence-gated progression where findings must prove themselves before advancing
- Learns from failures with categorized recovery strategies and pattern databases
- Integrates multiple platforms through unified models and platform-specific formatters
- Requires human judgment for all decisions that affect researcher reputation
It’s not fully autonomous. It’s not meant to be.
The goal was never to replace human security researchers. The goal was to eliminate the tedious parts—the subdomain enumeration, the endpoint mapping, the false positive filtering—so human attention goes to the parts that require judgment.
Maybe the best automation isn’t the kind that removes humans from the loop. Maybe it’s the kind that keeps humans at the center—informed, efficient, and making better decisions because the noise has been cleared away.
That’s the series. Five posts on building something I actually use. If you’re building security automation, I hope this helped you think through the architecture, the failure modes, and especially the ethical constraints.
Questions? Critiques? I’d love to hear them.