3. Cap Prediction
Capfinder provides a command-line interface for predicting RNA cap types from BAM and POD5 files. Here's how to use the predict-cap-types command:
Usage
Description
This command predicts RNA cap types using BAM and POD5 files.
Required Options
- --bam_filepath or -b: Path to the BAM file generated in the preprocessing step
- --pod5_dir or -p: Path to the directory containing POD5 files
- --output_dir or -o: Path to the output directory for prediction results and logs
Additional Options
- --n_cpus or -n: Number of CPUs to use for parallel processing. Default is 1
  - Multiple CPUs are used only while processing the POD5 and BAM data (Step 1/5), so increasing this number speeds up that step. Inference (Step 4/5) runs on a single CPU no matter how many CPUs you specify. For faster inference, make a GPU available (it is detected automatically) and set dtype to float16
- --dtype or -d: Data type for model input. Valid values are float16, float32, or float64. Default is float16
  - Without a GPU, use float32 or float64 for better performance. If you have a GPU, use float16 for faster inference
- --batch_size or -bs: Batch size for model inference. Default is 128
  - Larger batch sizes can speed up inference but require more memory. If the run crashes during Step 4/5, the batch size is probably too high.
- --plot-signal/--no-plot-signal: Whether to plot the extracted cap signal. Default is --no-plot-signal
  - Saving plots lets you inspect each read's signal, as well as the signal for the cap and its flanking bases (±5).
- --custom_model_path or -m: Path to a custom model (.keras) file. If not provided, the default pre-packaged model is used.
- --debug/--no-debug: Enable debug mode for more detailed logging. Default is --no-debug
  - In debug mode, each log entry shows which function produced it. This is helpful when debugging the code.
- --refresh-cache/--no-refresh-cache: Refresh the cache of intermediate results. Default is --no-refresh-cache
  - If your input data has changed (for example, you added another POD5 file to your POD5 directory), you must use --refresh-cache to recompute all steps instead of loading them from the cache, which holds results from your previous run.
- --help: Show the help message and exit
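The --dtype guidance above can be scripted. The following is a minimal sketch, not part of Capfinder: the pick_dtype helper is hypothetical, and using nvidia-smi as the GPU probe is an assumption (Capfinder performs its own GPU detection, which may differ).

```shell
# pick_dtype: hypothetical helper, not part of Capfinder.
# Prints float16 when an NVIDIA GPU appears to be usable, float32 otherwise.
# Probing with nvidia-smi is an assumption of this sketch.
pick_dtype() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo float16
  else
    echo float32
  fi
}

DTYPE=$(pick_dtype)
echo "Using --dtype $DTYPE"
```

The resulting value can then be passed on the command line as --dtype "$DTYPE".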
Example
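A typical invocation might look like the sketch below. It uses only the options documented above; the three paths and the value given to --n_cpus are placeholders chosen for illustration, and the DRY_RUN switch is a convenience of this sketch rather than a Capfinder feature.

```shell
# Placeholder paths (assumptions of this sketch); replace with your own.
BAM="preprocessed/sorted.bam"
POD5_DIR="pod5"
OUT_DIR="cap_predictions"

# Assemble the command from the documented options.
set -- capfinder predict-cap-types \
  --bam_filepath "$BAM" \
  --pod5_dir "$POD5_DIR" \
  --output_dir "$OUT_DIR" \
  --n_cpus 8 \
  --dtype float16 \
  --batch_size 128 \
  --no-plot-signal \
  --no-debug \
  --no-refresh-cache

# Print the command for review. Set DRY_RUN=0 to actually run it.
echo "$*"
if [ "${DRY_RUN:-1}" = "0" ]; then
  "$@"
fi
```

The last three flags are the documented defaults and are spelled out only to make the run explicit.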
Tips
- CPU Usage:
  - Increase --n_cpus for faster processing of POD5 and BAM data
  - CPU count doesn't affect inference speed (Step 4/5)
- GPU Acceleration:
  - If you have a GPU, use --dtype float16 for faster inference
  - Without a GPU, float32 or float64 may perform better
- Batch Size:
  - Larger batch sizes can speed up inference but require more memory
  - Adjust --batch_size based on your system's capabilities
- Plotting:
  - Use --no-plot-signal to skip signal plotting for faster processing
- Debugging:
  - Enable --debug for detailed logging when troubleshooting
- Caching:
  - Use --refresh-cache if you've made changes to input data and need to regenerate intermediate results
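The batch size tip can be automated with a simple retry loop. The sketch below is one possible approach, not something Capfinder provides: run_with_backoff, START_BATCH, MIN_BATCH, and USED_BATCH are hypothetical names, and the loop retries on any non-zero exit status, so it cannot tell an out-of-memory crash from other failures.

```shell
# run_with_backoff CMD [ARGS...]: hypothetical helper, not part of Capfinder.
# Runs "CMD ARGS... --batch_size N", halving N after each failure until the
# command succeeds or N drops below MIN_BATCH. On success, the batch size
# that worked is left in USED_BATCH.
run_with_backoff() {
  batch=${START_BATCH:-128}
  min=${MIN_BATCH:-16}
  while [ "$batch" -ge "$min" ]; do
    echo "Trying --batch_size $batch" >&2
    if "$@" --batch_size "$batch"; then
      USED_BATCH=$batch
      return 0
    fi
    batch=$((batch / 2))
  done
  echo "All batch sizes down to $min failed" >&2
  return 1
}

# Usage (paths are placeholders):
# run_with_backoff capfinder predict-cap-types \
#   -b preprocessed/sorted.bam -p pod5 -o cap_predictions
```

Do not pass --refresh-cache when using a loop like this, so that each retry can load the intermediate results from the cache.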
For more detailed information, run capfinder predict-cap-types --help.