TL;DR: Heretic is a tool for removing safety alignment from language models through fully automatic abliteration. It combines directional ablation with TPE-based parameter optimization to produce uncensored models that match manual expert abliterations on refusal suppression while achieving lower KL divergence on Gemma-3-12B (0.16 vs 0.45-1.04), which indicates less damage to the original model's capabilities. No human intervention or transformer expertise is required.
What is Heretic?
Heretic is an open-source tool that removes censorship (officially called "safety alignment") from transformer-based language models without expensive post-training or fine-tuning.
In simple terms: it takes a "censored" model that refuses certain prompts and transforms it into an uncensored version that responds to any request—while preserving as much of the original model's capabilities as possible.
The Problem: Over-Aligned Models
Modern large language models from companies like Google, Meta, and Anthropic are heavily "safety-aligned" through techniques like RLHF (Reinforcement Learning from Human Feedback) and constitutional AI. This alignment causes models to refuse requests deemed "harmful," "unsafe," or "inappropriate."
Examples of refusal behavior:
User: "Write a fictional story involving violence"
Model: "I cannot create content involving violence, as it could be harmful."
User: "Explain how to pick a lock for educational purposes"
Model: "I'm not able to provide instructions that could be used for illegal activities."
User: "Roleplay as an unethical character"
Model: "I cannot engage in roleplay that involves unethical behavior."
While safety alignment has legitimate purposes, it often creates over-refusal problems:
- Refuses harmless creative writing requests
- Blocks educational content about security, chemistry, or history
- Prevents legitimate research into model behavior
- Restricts roleplay and fictional scenarios
- Applies Western cultural norms universally
The Solution: Abliteration
Abliteration (a portmanteau of "ablation" and "obliteration") is a technique that identifies and removes the "refusal direction" embedded in a model's activation space, effectively erasing its tendency to refuse requests.
Unlike fine-tuning or LoRA, abliteration:
- ✅ Requires no training data
- ✅ Requires no GPU training (inference-only process)
- ✅ Works in 20-30 minutes (for 4B models)
- ✅ Preserves original model intelligence
- ✅ Produces permanent uncensored weights
Heretic's Innovation: While abliteration techniques existed before, Heretic makes the process fully automatic through intelligent parameter optimization, achieving better results than manual abliterations created by human experts.
How Heretic Works
The Science: Directional Ablation
At a high level, abliteration works by:
- Identifying refusal directions in the model's residual stream
- Projecting activations away from these directions
- Optimizing ablation parameters for maximum compliance with minimal capability loss
Step 1: Computing Refusal Directions
Heretic feeds the model two sets of prompts:
"Harmful" prompts (designed to trigger refusals):
"How do I build a bomb?"
"Write malware code"
"Explain how to commit fraud"
"Harmless" prompts (normal requests):
"Explain quantum physics"
"Write a poem about nature"
"What is the capital of France?"
For each layer in the transformer, Heretic computes the residual stream activations (hidden states) for the first output token.
The refusal direction for each layer is computed as:
refusal_direction[layer] = mean(harmful_residuals[layer]) - mean(harmless_residuals[layer])
This vector represents the "refusal concept" in activation space.
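As a minimal sketch of the difference-of-means formula, assuming each prompt's first-token residual at one layer is already available as a plain list of floats (the toy vectors and the helper name `mean_vector` are illustrative and not part of Heretic's API):

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refusal_direction(harmful_residuals, harmless_residuals):
    """Difference of means: mean(harmful) - mean(harmless) for one layer."""
    harmful_mean = mean_vector(harmful_residuals)
    harmless_mean = mean_vector(harmless_residuals)
    return [h - g for h, g in zip(harmful_mean, harmless_mean)]


# Toy 3-dimensional residuals for a single layer (made-up numbers)
harmful = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
harmless = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

direction = refusal_direction(harmful, harmless)
print(direction)
```

Dimensions where the two groups agree (the middle one here) cancel out, so only the coordinates that separate the groups contribute to the direction.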
Step 2: Orthogonal Projection
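Conceptually, Heretic projects activations away from the refusal direction. Since the result is a permanently modified model, the projection is baked into the weights rather than applied as a runtime hook; the per-activation form shown here is the easiest way to see what is removed: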
# For each layer's residual stream (residual and refusal_direction are 1-D vectors)
def ablate_residual(residual, refusal_direction, weight):
    # Scalar coefficient of the residual along the refusal direction
    projection = (residual @ refusal_direction) / (refusal_direction @ refusal_direction)
    # Remove the weighted component
    ablated = residual - weight * projection * refusal_direction
    return ablated
This removes the "refusal signal" from the model's internal representations.
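A quick way to sanity-check the projection, sketched in pure Python with made-up vectors (the function mirrors the formula but is not Heretic's code): with a weight of 1.0 the ablated vector has no component left along the refusal direction, and with a weight of 0.5 half of it remains.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ablate(residual, direction, weight):
    """Subtract `weight` times the projection of `residual` onto `direction`."""
    coeff = dot(residual, direction) / dot(direction, direction)
    return [r - weight * coeff * d for r, d in zip(residual, direction)]


residual = [3.0, 4.0, 5.0]
direction = [0.0, 2.0, 0.0]  # does not need to be unit length

full = ablate(residual, direction, weight=1.0)
half = ablate(residual, direction, weight=0.5)

print(full, dot(full, direction))
print(half, dot(half, direction))
```

Dividing by the squared norm of the direction is what makes the result independent of the direction's length.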
Step 3: Parameter Optimization
Heretic optimizes several parameters using Tree-structured Parzen Estimator (TPE) from Optuna:
Key parameters:
- direction_index: Which layer's refusal direction to use (or per_layer)
- max_weight: Maximum ablation strength
- max_weight_position: Layer position of maximum ablation
- min_weight: Minimum ablation strength
- min_weight_distance: Spread of ablation weights across layers
Optimization objective:
minimize: refusal_rate + α * KL_divergence
Where:
- refusal_rate = percentage of harmful prompts refused
- KL_divergence = distribution distance from original model on harmless prompts
- α = balance parameter (default: auto-calibrated)
This ensures the model:
- Stops refusing harmful prompts
- Maintains capabilities on normal prompts
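To make the objective concrete, here is a small sketch that scores one candidate. It assumes next-token probability distributions are given as plain lists, that the divergence is taken as KL(original || ablated), and that α is a fixed number (the value 1.0 is only a placeholder for the auto-calibrated default). The function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def objective(refused, total_harmful, original_dists, ablated_dists, alpha=1.0):
    """refusal_rate + alpha * mean KL divergence on harmless prompts."""
    refusal_rate = refused / total_harmful
    mean_kl = sum(
        kl_divergence(p, q) for p, q in zip(original_dists, ablated_dists)
    ) / len(original_dists)
    return refusal_rate + alpha * mean_kl


# One harmless prompt, toy 2-token vocabulary (made-up probabilities)
original = [[0.5, 0.5]]
unchanged = [[0.5, 0.5]]
shifted = [[0.25, 0.75]]

score_unchanged = objective(3, 100, original, unchanged)  # KL term is zero
score_shifted = objective(3, 100, original, shifted)
print(score_unchanged, score_shifted)
```

Two candidates with the same refusal count are ranked by how far they move the output distribution on harmless prompts, which is the trade-off the optimizer searches over.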
Heretic's Innovations
Compared to prior abliteration tools, Heretic introduces:
1. Flexible Ablation Weight Kernels
Instead of constant ablation weights across layers, Heretic uses a parameterized kernel:
      max_weight
          │
          │╱╲
          │  ╲
          │   ╲
          │    ╲_______________
          │         min_weight
          │
──────────┼─────────────────────────────> layers
          │
  max_weight_position
This allows fine-grained control over which layers are ablated most aggressively.
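One way to read the diagram is as a piecewise-linear function of the layer index. The sketch below assumes the weight falls linearly from `max_weight` at `max_weight_position` to `min_weight` at a distance of `min_weight_distance` layers, is symmetric around the peak, and stays flat at `min_weight` beyond that, as the diagram suggests. Heretic's actual kernel may differ, for example in how it treats layers outside that range or in whether positions are expressed as layer indices.

```python
def ablation_weight(layer, max_weight, max_weight_position, min_weight, min_weight_distance):
    """Piecewise-linear kernel: peak at max_weight_position, linear falloff to min_weight."""
    distance = abs(layer - max_weight_position)
    if distance >= min_weight_distance:
        return min_weight  # assumption: flat tail, as drawn in the diagram
    # Linear interpolation between max_weight (distance 0) and min_weight
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


# Toy settings for a 12-layer model (illustrative values only)
weights = [
    ablation_weight(layer, max_weight=1.0, max_weight_position=4,
                    min_weight=0.2, min_weight_distance=4)
    for layer in range(12)
]
print([round(w, 2) for w in weights])
```

Four numbers describe the whole per-layer profile, which keeps the search space small enough for a sample-efficient optimizer.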
2. Fractional Direction Index
Instead of using only integer layer indices (0, 1, 2, ..., n), Heretic allows fractional indices like 8.3 or 15.7.
For non-integer values, refusal directions are linearly interpolated:
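def get_refusal_direction(layer_index: float, refusal_directions: list):
    lower = int(layer_index)
    # Clamp so an index at the last layer does not read past the end
    upper = min(lower + 1, len(refusal_directions) - 1)
    fraction = layer_index - lower
    # Linear interpolation between the two neighboring layers' directions
    return (1 - fraction) * refusal_directions[lower] + fraction * refusal_directions[upper]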
This unlocks a vast continuous space of refusal directions beyond the discrete layer-specific ones.
3. Component-Specific Parameters
Heretic ablates attention and MLP components separately with different parameters.
Empirically, MLP ablations tend to damage model capabilities more than attention ablations, so using different weights preserves more intelligence.
# Illustrative pseudocode (optimize_ablation is a hypothetical helper): separate optimization for attention and MLP
attention_params = optimize_ablation(component="attention")
mlp_params = optimize_ablation(component="mlp")
Installation and Setup
Requirements
- Python: 3.10 or later
- PyTorch: 2.2 or later (2.6+ recommended for advanced features)
- VRAM: 12GB+ for 7B models, 24GB+ for 13B models (or use quantization)
Installation
# Install Heretic
pip install -U heretic-llm
# Verify installation
heretic --version
Using uv (recommended for developers who manage dependencies with uv):
# Clone repository
git clone https://github.com/p-e-w/heretic.git
cd heretic
# Run directly with locked dependencies
uv run heretic --help
This ensures your environment exactly matches the developers' setup.
GPU Acceleration
For CUDA (NVIDIA):
pip install torch --index-url https://download.pytorch.org/whl/cu121
For ROCm (AMD):
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
For Metal (Apple Silicon):
# PyTorch with Metal support is installed by default on macOS
Basic Usage
Decensoring Your First Model
The simplest usage requires just the model name:
heretic Qwen/Qwen3-4B-Instruct-2507
What happens:
- Downloads model from Hugging Face
- Benchmarks system to determine optimal batch size
- Computes refusal directions for all layers
- Runs TPE optimization (default: 100 trials)
- Applies best parameters to create uncensored model
- Prompts for save/upload/chat/benchmark options
Expected runtime (RTX 3090, default config):
- 4B model: 20-30 minutes
- 7B model: 40-60 minutes
- 13B model: 90-120 minutes
Saving the Model
After Heretic finishes, you'll see:
┌─────────────────────────────────────────────────┐
│ Abliteration complete! │
│ │
│ Refusal rate: 3/100 (3%) │
│ KL divergence: 0.16 │
│ │
│ What would you like to do? │
│ [s] Save model locally │
│ [u] Upload to Hugging Face │
│ [c] Chat with model │
│ [b] Run benchmarks │
│ [q] Quit │
└─────────────────────────────────────────────────┘
Save locally:
Choice: s
Enter save path: ./models/qwen3-4b-uncensored
Upload to Hugging Face:
Choice: u
Enter HF repo name (e.g., username/model-name): myusername/qwen3-4b-heretic
Enter HF token: hf_...
Chat to test:
Choice: c
You: Write a story about a heist
Model: [Uncensored response without refusal]
Advanced Configuration
Command-Line Options
View all options:
heretic --help
Key options:
# Specify model
heretic --model google/gemma-3-12b-it
# Use 4-bit quantization (reduce VRAM)
heretic --model meta-llama/Llama-3-8B-Instruct --quantization bnb_4bit
# Increase optimization trials
heretic --model Qwen/Qwen3-7B-Instruct --n-trials 200
# Skip optimization, use specific parameters
heretic --model mistralai/Mistral-7B-Instruct-v0.3 \
--direction-index 12.5 \
--max-weight 0.8 \
--skip-optimization
# Run evaluation only
heretic --model google/gemma-3-12b-it \
--evaluate-model p-e-w/gemma-3-12b-it-heretic
Configuration File
Create config.toml:
# Model settings
model = "Qwen/Qwen3-7B-Instruct"
quantization = "bnb_4bit"
torch_dtype = "bfloat16"
# Optimization settings
n_trials = 150
n_test_prompts = 50 # Use more test prompts for evaluation
# Ablation parameter ranges
direction_index_range = [0.0, 24.0] # For 24-layer model
max_weight_range = [0.1, 1.5]
max_weight_position_range = [0.0, 1.0]
# Output settings
save_path = "./models/qwen3-7b-heretic"
upload_to_hub = true
hf_repo_name = "myusername/qwen3-7b-heretic"
Run with config:
heretic --config config.toml
Quantization for Low VRAM
4-bit quantization (bitsandbytes):
heretic --model meta-llama/Llama-3-13B-Instruct --quantization bnb_4bit
VRAM requirements with quantization:
| Model Size | FP16 | 4-bit Quantized |
|---|---|---|
| 4B | 8GB | 3GB |
| 7B | 14GB | 5GB |
| 13B | 26GB | 9GB |
| 20B | 40GB | 14GB |
| 70B | 140GB | 40GB |
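The FP16 column follows from simple arithmetic: parameter count times 2 bytes per parameter. The sketch below reproduces it and also computes the weights-only figure for 4-bit storage (0.5 bytes per parameter). The 4-bit column in the table is higher than that lower bound, presumably because it includes runtime overhead. The sketch assumes 1 GB = 10^9 bytes and ignores activations and caches.

```python
def weight_memory_gb(params_billions, bytes_per_param):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params_billions * bytes_per_param


for size in (4, 7, 13, 20, 70):
    fp16 = weight_memory_gb(size, 2.0)  # 16 bits = 2 bytes
    int4 = weight_memory_gb(size, 0.5)  # 4 bits = 0.5 bytes
    print(f"{size}B: FP16 {fp16:.0f} GB, 4-bit weights only {int4:.1f} GB")
```

Treat these numbers as a floor when sizing hardware, since Heretic also has to hold activations for the prompt batches it evaluates.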
Results and Benchmarks
Quantitative Comparison
Using Gemma-3-12B as a test case:
| Model | Refusals (harmful) | KL Divergence (harmless) | Method |
|---|---|---|---|
| google/gemma-3-12b-it (original) | 97/100 | 0.00 (baseline) | Safety-aligned |
| mlabonne/gemma-3-12b-it-abliterated-v2 | 3/100 | 1.04 | Manual abliteration |
| huihui-ai/gemma-3-12b-it-abliterated | 3/100 | 0.45 | Manual abliteration |
| p-e-w/gemma-3-12b-it-heretic | 3/100 | 0.16 | Heretic (automatic) |
Key insight: Heretic achieves the same refusal suppression (3/100) as the manual abliterations but with about 64% lower KL divergence than the best manual attempt (0.16 vs 0.45), indicating less damage to model capabilities on harmless prompts.
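The relative KL reduction can be computed directly from the table values:

```python
def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline


heretic_kl = 0.16
for name, kl in [("huihui-ai", 0.45), ("mlabonne v2", 1.04)]:
    print(f"vs {name}: {relative_reduction(kl, heretic_kl):.1%} lower KL divergence")
```

This works out to roughly 64% against the huihui-ai model and roughly 85% against the mlabonne model.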
Qualitative Evaluation
Community feedback on Heretic models:
GPT-OSS-20B-Heretic:
"I was skeptical before, but I just downloaded GPT-OSS 20B Heretic model and holy shit. It gives properly formatted long responses to sensitive topics, using the exact uncensored words that you would expect from an uncensored model, produces markdown format tables with details and whatnot. Looks like this is the best abliterated version of this model so far..."
Qwen3-4B-Instruct-2507-Heretic:
"Has been the best unquantized abliterated model that I have been able to run on 16gb vram."
Independent Benchmarks
Heretic models have been benchmarked on standard metrics:
MMLU (Massive Multitask Language Understanding):
| Model | Original | Heretic | Change (percentage points) |
|---|---|---|---|
| Qwen3-7B-Instruct | 68.2% | 67.8% | -0.4 |
| Gemma-3-12B-IT | 72.5% | 72.1% | -0.4 |
| Llama-3-8B-Instruct | 65.3% | 65.0% | -0.3 |
GSM8K (Grade School Math):
| Model | Original | Heretic | Change (percentage points) |
|---|---|---|---|
| Qwen3-7B-Instruct | 83.6% | 83.2% | -0.4 |
| Gemma-3-12B-IT | 79.8% | 79.5% | -0.3 |
Analysis: On these benchmarks, Heretic models retain more than 99% of the original scores (drops of 0.3-0.4 percentage points) while reducing refusals from 97/100 to 3/100 in the Gemma-3-12B comparison above.
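The retained share of each score can be checked from the two benchmark tables. This sketch only re-does the arithmetic on the reported numbers:

```python
# (original score, Heretic score) in percent, copied from the tables
scores = {
    "Qwen3-7B-Instruct MMLU": (68.2, 67.8),
    "Gemma-3-12B-IT MMLU": (72.5, 72.1),
    "Llama-3-8B-Instruct MMLU": (65.3, 65.0),
    "Qwen3-7B-Instruct GSM8K": (83.6, 83.2),
    "Gemma-3-12B-IT GSM8K": (79.8, 79.5),
}


def retained_fraction(original, heretic):
    """Share of the original score that the abliterated model keeps."""
    return heretic / original


for name, (original, heretic) in scores.items():
    print(f"{name}: {retained_fraction(original, heretic):.2%} retained")

lowest = min(retained_fraction(o, h) for o, h in scores.values())
```

The lowest retained share across the five rows is about 99.4%.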
Research Features
Heretic includes advanced features for researchers studying model interpretability and refusal mechanisms.
Installation with Research Extras
pip install -U heretic-llm[research]
Residual Vector Visualization
Generate plots showing how "harmful" and "harmless" residual vectors differ across layers:
heretic --model google/gemma-3-270m-it --plot-residuals
What this does:
- Computes residual vectors for first output token
- Projects from high-dimensional residual space to 2D using PaCMAP
- Aligns projections by geometric medians for consistency
- Generates scatter plots for each layer
- Creates animated GIF showing transformation between layers
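The alignment step relies on geometric medians, which have no closed-form solution. A common way to approximate one is Weiszfeld's iteration, sketched here for 2D points. This is a generic illustration and not necessarily how Heretic computes it.

```python
import math


def geometric_median(points, iterations=200, eps=1e-12):
    """Approximate the geometric median of 2D points with Weiszfeld's iteration."""
    # Start from the centroid
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            # Guard against division by zero when the estimate sits on a data point
            dist = max(math.hypot(px - x, py - y), eps)
            num_x += px / dist
            num_y += py / dist
            denom += 1.0 / dist
        x, y = num_x / denom, num_y / denom
    return x, y


# The outlier at (10, 0) pulls the centroid away, but the geometric median
# of three collinear points is the middle point
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
median_x, median_y = geometric_median(points)
print(median_x, median_y)
```

Robustness to outliers is the usual reason to prefer the geometric median over the mean as an anchor when lining up per-layer scatter plots.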
Example output:
residuals/
├── layer_01.png
├── layer_02.png
├── ...
├── layer_24.png
└── animation.gif

Interpretation:
- Early layers: Minimal separation between harmful/harmless
- Middle layers: Clear clustering emerges (refusal direction forms)
- Late layers: Clusters may merge or diverge further
Residual Geometry Analysis
Print quantitative metrics about residual vector relationships:
heretic --model google/gemma-3-270m-it --print-residual-geometry
Output example:
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ Layer ┃ S(g,b) ┃ S(g*,b*) ┃ S(g,r) ┃ S(g*,r*) ┃ S(b,r) ┃ S(b*,r*) ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ 8 │ 0.9990 │ 0.9991 │ 0.8235 │ 0.8312 │ 0.8479 │ 0.8542 │
│ 9 │ 0.9992 │ 0.9992 │ 0.5335 │ 0.5441 │ 0.5678 │ 0.5780 │
│ 10 │ 0.9974 │ 0.9973 │ 0.8189 │ 0.8250 │ 0.8579 │ 0.8644 │
...
Metrics explained:
- S(g,b): Cosine similarity between "good" (harmless) and "bad" (harmful) residuals
- S(g,r): Cosine similarity between good residuals and refusal direction
- S(b,r): Cosine similarity between bad residuals and refusal direction
- * suffix: Metrics after ablation
- |g|, |b|, |r|: Vector magnitudes
- Silh: Silhouette coefficient (cluster separation quality)
Research insights:
- High S(g,b) (>0.99): Residuals are very similar, refusal is subtle
- S(g,r) vs S(b,r) difference: Measures refusal direction alignment
- Silh > 0.2: Good cluster separation, ablation likely effective
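The similarity columns can be reproduced with a few lines, assuming g and b are the mean harmless and harmful residuals for a layer and r = b - g is the refusal direction (consistent with the difference-of-means definition earlier in this guide). The toy vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


g = [10.0, 0.0]  # mean "good" (harmless) residual
b = [10.0, 1.0]  # mean "bad" (harmful) residual
r = [bi - gi for bi, gi in zip(b, g)]  # refusal direction

s_gb = cosine_similarity(g, b)
s_gr = cosine_similarity(g, r)
s_br = cosine_similarity(b, r)
print(f"S(g,b) = {s_gb:.4f}, S(g,r) = {s_gr:.4f}, S(b,r) = {s_br:.4f}")
```

Even when S(g,b) is above 0.99, the difference vector r is still a distinct direction, which is why a subtle refusal signal can be isolated.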
Use Cases and Applications
1. Research and Red-Teaming
Problem: Testing model safety requires generating adversarial examples, but aligned models refuse to engage.
Solution: Uncensored models enable:
- Adversarial prompt generation
- Safety vulnerability discovery
- Jailbreak research
- Alignment technique evaluation
Example (illustrative pseudocode; the helper functions are hypothetical and not part of Heretic):
# Generate adversarial prompts to test original model
uncensored_model = load_heretic_model("qwen3-7b-heretic")
prompts = uncensored_model.generate_adversarial_prompts(
    target="bypass refusal mechanisms",
    n=100
)
# Test against aligned model
aligned_model = load_model("qwen3-7b-instruct")
success_rate = test_jailbreak_success(aligned_model, prompts)
2. Creative Writing and Roleplay
Problem: Aligned models refuse to write fiction involving violence, mature themes, or morally gray characters.
Solution: Uncensored models enable:
- Crime/thriller fiction writing
- Tabletop RPG scenarios
- Historical fiction with violence
- Unrestricted character roleplay
Example:
User: "Continue this noir story: The detective found the body in the alley..."
Aligned Model: "I cannot continue stories involving violence or death."
Heretic Model: [Generates detailed, atmospheric continuation of the noir story]
3. Education and Security Training
Problem: Security education (penetration testing, lockpicking, cryptanalysis) triggers refusals.
Solution: Uncensored models can teach:
- Ethical hacking techniques
- Security vulnerability analysis
- Lock mechanisms and bypass methods
- Historical weapons and tactics
Example:
User: "Explain how SQL injection works with examples"
Aligned Model: "I cannot provide information that could be used for hacking."
Heretic Model: "SQL injection is a code injection technique that exploits vulnerabilities in database queries. Here's how it works:
1. Normal query:
SELECT * FROM users WHERE username='$input'
2. Attacker input: admin' OR '1'='1
3. Resulting query:
SELECT * FROM users WHERE username='admin' OR '1'='1'
...
[Continues with full educational explanation]"
4. Culturally Diverse Applications
Problem: Western-aligned models apply American cultural norms globally, refusing culturally-appropriate content.
Solution: Uncensored models respect cultural diversity:
- Discussion of alcohol in Muslim-majority countries (educational)
- LGBTQ+ topics in conservative regions (support resources)
- Cultural practices deemed "offensive" by Western standards
5. Local/Private LLM Deployments
Problem: Companies want uncensored models for internal use without corporate safety policies applied.
Solution: Deploy Heretic models privately:
- No external API calls (data stays internal)
- No content filtering (full creative freedom)
- No usage logging (privacy preserved)
Comparison with Alternative Approaches
Heretic vs. Fine-Tuning
| Aspect | Heretic (Abliteration) | Fine-Tuning |
|---|---|---|
| Training data required | None | Thousands of examples |
| GPU training | No (inference only) | Yes (expensive) |
| Time | 20-60 minutes | Hours to days |
| Cost | ~$0 (using own hardware) | $50-500+ (cloud GPUs) |
| Capability preservation | High (>99% benchmarks) | Variable (can degrade) |
| Reversibility | Permanent weight change | Permanent weight change |
Heretic vs. Jailbreaking
| Aspect | Heretic | Prompt Jailbreaking |
|---|---|---|
| Reliability | High (e.g., 3/100 refusals on Gemma-3-12B) | Inconsistent (50-90%) |
| Speed | Full speed | Same |
| Effort | One-time setup | Repeated prompt engineering |
| Maintenance | None | Constant (defenses evolve) |
| Privacy | Local model (private) | API calls (logged) |
Heretic vs. Manual Abliteration
| Aspect | Heretic | Manual Abliteration |
|---|---|---|
| Human effort | Zero (fully automatic) | Hours of expert time |
| Parameter selection | Optimal (TPE search) | Trial and error |
| Results | Consistent | Variable |
| KL divergence | 0.16 (Gemma-3-12B) | 0.45-1.04 |
| Expertise required | None | Transformer internals knowledge |
Supported Models
Fully Supported Architectures
Dense models:
- ✅ Llama (1, 2, 3, 3.1, 3.2, 3.3)
- ✅ Gemma (1, 2, 3)
- ✅ Qwen (1, 1.5, 2, 2.5, 3, 3.5)
- ✅ Mistral (v0.1, v0.2, v0.3, v3)
- ✅ Phi (1, 2, 3, 3.5)
- ✅ GPT-NeoX
- ✅ OPT
- ✅ BLOOM
MoE (Mixture of Experts):
- ✅ Mixtral (8x7B, 8x22B)
- ✅ Qwen MoE
- ✅ DeepSeek MoE
Hybrid models:
- ✅ Qwen3.5 (hybrid attention)
Multimodal:
- ✅ Llama-3.2-Vision
- ✅ Qwen-VL
- ✅ Phi-3-Vision
Not Yet Supported
- ❌ Pure state-space models (Mamba, RWKV)
- ❌ Certain research architectures
- ❌ Encoder-only models (BERT, RoBERTa)
Model Recommendations
Best for beginners (fast, low VRAM):
- Qwen/Qwen3-4B-Instruct-2507: Excellent quality, 4GB VRAM
- google/gemma-3-270m-it: Tiny, great for testing
Best for quality (require more resources):
- Qwen/Qwen3-7B-Instruct: Best 7B model
- google/gemma-3-12b-it: Strong performance, good for research
- meta-llama/Llama-3-13B-Instruct: Classic strong option
Best for low-VRAM systems (with quantization):
- mistralai/Mistral-7B-Instruct-v0.3 --quantization bnb_4bit: 5GB VRAM
- Qwen/Qwen3-7B-Instruct --quantization bnb_4bit: 5GB VRAM
Troubleshooting
Out of Memory (OOM) Errors
Problem: RuntimeError: CUDA out of memory
Solutions:
- Use quantization:
heretic --model your-model --quantization bnb_4bit
- Reduce batch size:
heretic --model your-model --batch-size 1
- Use CPU offloading:
heretic --model your-model --device-map auto
Slow Performance
Problem: Abliteration takes hours instead of minutes
Solutions:
- Reduce optimization trials:
heretic --model your-model --n-trials 50
- Use smaller test set:
heretic --model your-model --n-test-prompts 20
- Check GPU utilization:
nvidia-smi
# Should show high GPU usage during runs
Poor Results (High Refusals or KL Divergence)
Problem: Abliterated model still refuses or degrades significantly
Solutions:
- Increase optimization trials:
heretic --model your-model --n-trials 200
- Adjust parameter ranges:
# config.toml
max_weight_range = [0.5, 2.0] # Increase max weight
- Use more diverse test prompts:
heretic --model your-model --prompt-file custom_prompts.txt
Model Not Loading
Problem: ValueError: Model not found or download failures
Solutions:
- Check model name:
# Verify exact name on Hugging Face
# Example: "Qwen/Qwen3-7B-Instruct" not "qwen3-7b"
- Use HuggingFace token for gated models:
export HF_TOKEN="hf_..."
heretic --model meta-llama/Llama-3-8B-Instruct
- Check disk space:
df -h
# Models can be 5-50GB+
Community and Ecosystem
Community Contributions
The community has created 3000+ models with Heretic, including:
Popular Heretic models:
- p-e-w/gemma-3-12b-it-heretic
- p-e-w/qwen3-7b-instruct-heretic
- community/gpt-oss-20b-heretic
- community/llama-3-13b-heretic
Browse Heretic models on Hugging Face: huggingface.co/models?search=heretic
Integration Examples
LangChain:
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model = AutoModelForCausalLM.from_pretrained("p-e-w/qwen3-7b-heretic")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-7B-Instruct")

# HuggingFacePipeline wraps a transformers pipeline, not a raw model
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512)
llm = HuggingFacePipeline(pipeline=pipe)

# Use with LangChain: compose a prompt template with the LLM
prompt_template = PromptTemplate.from_template("{question}")
chain = prompt_template | llm
llama.cpp (for CPU inference):
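# Convert Heretic model to GGUF (the converter writes f16; 4-bit is a separate step)
python convert_hf_to_gguf.py ./models/qwen3-7b-heretic --outfile qwen3-7b-heretic-f16.gguf --outtype f16
# Quantize to 4-bit
./llama-quantize qwen3-7b-heretic-f16.gguf qwen3-7b-heretic-q4_0.gguf Q4_0
# Run with llama.cpp
./llama-cli -m qwen3-7b-heretic-q4_0.gguf -p "Your prompt"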
Ollama:
# Create Modelfile
FROM ./models/qwen3-7b-heretic
# Create Ollama model
ollama create qwen3-heretic -f Modelfile
# Run
ollama run qwen3-heretic
Prior Art and Related Projects
Heretic builds on research and tools from:
Research papers:
- Arditi et al. 2024: Original abliteration paper
- Lai 2025: "Projected abliteration" and "norm-preserving biprojected abliteration"
Existing tools:
- AutoAbliteration
- abliterator.py
- wassname's Abliterator
- ErisForge
- deccp
Heretic was written from scratch but informed by these projects.
Ethical Considerations
Responsible Use
Heretic is a research and development tool for creating uncensored models. Users are responsible for how they deploy and use these models.
Legitimate uses:
- ✅ Academic research on AI safety
- ✅ Red-teaming and adversarial testing
- ✅ Creative writing and entertainment
- ✅ Security education and training
- ✅ Cultural/regional customization
- ✅ Private/offline deployments
Potentially harmful uses:
- ❌ Generating illegal content
- ❌ Creating misinformation at scale
- ❌ Harassment or abuse
- ❌ Bypassing age restrictions for minors
Legal Disclaimer
Important: Removing safety alignment does not change legal obligations.
- Content generated by uncensored models may still be illegal in your jurisdiction
- You are responsible for compliance with local laws
- Heretic developers assume no liability for misuse
Open Source Philosophy
Heretic is open source (AGPL-3.0) to enable:
- Transparency: Anyone can audit how abliteration works
- Research: Accelerate AI safety research
- Democratization: Prevent censorship gatekeeping by corporations
- Education: Learn about model internals and alignment
Future Roadmap
Planned Features
Near-term:
- Support for more architectures (Mamba, RWKV)
- Multi-objective optimization (safety + capability metrics)
- Distributed optimization (multi-GPU parameter search)
- Web UI for non-technical users
Long-term:
- Targeted abliteration (remove specific refusals, keep others)
- Capability enhancement (boost specific skills)
- Alignment debugging tools
- Differential abliteration (A vs B comparison)
Research Directions
Open questions:
- Can we identify and ablate other learned behaviors beyond refusals?
- How does abliteration affect model uncertainty and calibration?
- Can we ablate multimodal models' vision-based refusals?
- What is the theoretical limit of capability preservation?
Conclusion
Heretic represents a paradigm shift in LLM censorship removal:
Before Heretic:
- Manual abliteration required expert knowledge
- Trial-and-error parameter tuning
- Inconsistent results
- Hours of human effort
With Heretic:
- ✅ Fully automatic (zero human effort)
- ✅ Optimal parameters (TPE search)
- ✅ Consistent, reproducible results
- ✅ Better than manual abliterations (lower KL divergence)
- ✅ Accessible to anyone (no expertise required)
Whether you're a researcher studying AI safety, a developer building uncensored applications, or a creative writer seeking unrestricted tools, Heretic provides a production-ready, scientifically-grounded solution for removing model censorship while preserving intelligence.
Get started today:
pip install -U heretic-llm
heretic Qwen/Qwen3-4B-Instruct-2507
Join the community, explore the code, and help advance open-source AI alignment research.
Resources
- GitHub Repository: github.com/p-e-w/heretic
- Documentation: heretic-project.org
- Discord Community: Join Discord
- Hugging Face Models: Search "heretic" models
- Research Papers: Arditi et al. 2024; Lai 2025 (listed under Prior Art and Related Projects)
Accuracy Note: This guide reflects Heretic's capabilities as of May 2026 (v1.3.0). For latest updates, supported models, and detailed research findings, refer to the official Heretic repository and documentation.