This function implements linear 1D convolution using the overlap-add (overlap-and-add) method. The main loop is optimized to avoid memory allocation, and the function automatically selects the DFT window length that gives the best performance. It supports three output modes, Full, Same, and Valid, which correspond to the shape options of MATLAB's conv() function. The package also includes a frequency-domain implementation and performance comparisons with two other methods.
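
As a rough illustration of the overlap-add idea and the three output modes, here is a minimal sketch in pure Python. It is not the package's code: the names `fft` and `overlap_add_conv` are hypothetical, the DFT length is picked by a simple assumed heuristic (the smallest power of two that is at least twice the kernel length) rather than by a performance-based search, and, unlike the function described above, the loop allocates freely. The Same mode is assumed to take the central part starting at offset `len(h) // 2`.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_add_conv(x, h, mode="full"):
    """Linear convolution of real sequences x and h via overlap-add."""
    nx, nh = len(x), len(h)
    if nx == 0 or nh == 0:
        return []

    # Assumed heuristic: smallest power of two >= 2 * nh.
    nfft = 1
    while nfft < 2 * nh:
        nfft *= 2
    block = nfft - nh + 1  # input samples consumed per block

    # Transform the zero-padded kernel once, outside the loop.
    H = fft(list(h) + [0.0] * (nfft - nh))

    y = [0.0] * (nx + nh - 1)
    for start in range(0, nx, block):
        seg = list(x[start:start + block])
        X = fft(seg + [0.0] * (nfft - len(seg)))
        yb = fft([a * b for a, b in zip(X, H)], inverse=True)
        # Add this block's result into the output; tails of adjacent blocks overlap.
        for i in range(min(nfft, len(y) - start)):
            y[start + i] += yb[i].real / nfft

    if mode == "full":
        return y
    if mode == "same":
        first = nh // 2  # assumed centering convention
        return y[first:first + nx]
    if mode == "valid":
        return y[nh - 1:nx]  # empty when h is longer than x
    raise ValueError("mode must be 'full', 'same', or 'valid'")
```

For an input of length N and a kernel of length M, the Full result has N + M - 1 samples, Same has N samples, and Valid has N - M + 1 samples (or none when M > N). Because each block is zero-padded to at least its linear-convolution length, the circular convolution computed through the DFT equals the linear one for that block.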