3. Cap Prediction
Capfinder provides a command-line interface to predict RNA cap types using BAM and POD5 files. Here's how to use the predict-cap-types
function:
Usage
Description
This command predicts RNA cap types using BAM and POD5 files.
Required Options
-
--bam_filepath
or-b
: Path to the BAM file generated using the preprocessing step -
--pod5_dir
or-p
: Path to directory containing POD5 files -
--output_dir
or-o
: Path to the output directory for prediction results and logs
Additional Options
--n_cpus
or-n
: Number of CPUs to use for parallel processing. Default is 1
- Multiple CPUs are used during processing for POD5 file and BAM data (Step 1/5). Increasing this number speeds up POD5 and BAM processing. For inference (Step 4/5), only a single CPU is used no matter how many CPUs you have specified. For faster inference, have a GPU available (it will be detected automatically) and set dtype to
float16
--dtype
or-d
: Data type for model input. Valid values arefloat16
,float32
, orfloat64
. Default isfloat16
- Without a GPU, use
float32
orfloat64
for better performance. If you have a GPU, then usefloat16
for faster inference
--batch_size
or-bs
: Batch size for model inference. Default is128
- Larger batch sizes can speed up inference but require more memory. If the code crashes during step 4/5, you have probably set too high a batch size.
-
--plot-signal
/--no-plot-signal
: Whether to plot extracted cap signal or not. Default is--no-plot-signal
-
--custom_model_path
or-m
: Path to a custom model (.keras) file. If not provided, the default pre-packaged model will be used.
- Saving plots can help you plot the read's signal, and plot the signal for cap and flanking bases(±5).
--debug
/--no-debug
: Enable debug mode for more detailed logging. Default is--no-debug
- The option can prints which function is creating a particular log output. This is helpful during code debugging.
--refresh-cache
/--no-refresh-cache
: Refresh the cache for intermediate results. Default is--no-refresh-cache
- If you input data has changed (for example you added one more POD5 file in your POD5 directory) then you must use
--refresh-cache
to compute all steps again and not load them from cache that hold results from your previous run.
--help
: Show the help message and exit
Example
Tips
-
CPU Usage:
- Increase
--n_cpus
for faster processing of POD5 and BAM data - CPU count doesn't affect inference speed (Step 4/5)
- Increase
-
GPU Acceleration:
- If you have a GPU, use
--dtype float16
for faster inference - Without a GPU,
float32
orfloat64
may perform better
- If you have a GPU, use
-
Batch Size:
- Larger batch sizes can speed up inference but require more memory
- Adjust
--batch_size
based on your system's capabilities
-
Plotting:
- Use
--no-plot-signal
to skip signal plotting for faster processing
- Use
-
Debugging:
- Enable
--debug
for detailed logging when troubleshooting
- Enable
-
Caching:
- Use
--refresh-cache
if you've made changes to input data and need to regenerate intermediate results
- Use
For more detailed information, run capfinder predict-cap-types --help
.