Xilinx 常用模块汇总(verilog)【03】

作者:桂。

时间:2018-05-10  2018-05-10  21:03:44

链接:http://www.cnblogs.com/xingshansi/p/9021919.html


前言

主要记录常用的基本模块。

一、模块汇总

  • 17- 自相关操作xcorr

实现思路主要参考:工程应用中的自相关操作,根据推导可以看出,自相关操作涉及的基本操作有:复数相乘、递归【自回归,IIR等都需要该操作】。

涉及到具体模块:1)10- 共轭复数相乘模块; 2)11- 复数延拍模块; 3)递归 —> 1- 3个数组合加、减运算(分I\Q分别调用)。

路径:印象笔记:0019/015

MATLAB仿真(实际操作中,N除不除->移位,差别不大):

clc;clear all;close all;
fs = 960e6;
f0 = 20e6;
t = [0:255]/fs;
sig = [zeros(1,384),exp(1j*(2*pi*t*f0+pi/4)),zeros(1,384)];
sig = awgn(sig,5);
%自相关
N = 64;
k = 1;
R = zeros(1,length(sig));
for i = N+1:length(sig)-k
R(i) = R(i-k) + 1/N * (sig(i)*conj(sig(i+k)) - sig(i-N)*conj(sig(i-N+k)));
end
figure()
subplot 211
plot(abs(sig)/max(abs(sig)));
subplot 212
plot(abs(R)/max(abs(R)));

  Xilinx 常用模块汇总(verilog)【03】

verilog【给出完整分析、实现、仿真过程】

不失一般性,以k=1为例,

理论分析

Xilinx 常用模块汇总(verilog)【03】

用到的基本模块:1)10- 共轭相乘;2)11- 复数延拍;3)1- 3路加减运算

硬件实现

其中延拍,也可调用原语:SRL(移位链 )+ FDRE的思路:

//Shift:SRL[A3A2A1A0] + 2
parameter width = ; //data width
genvar ii; generate
for(ii = ; ii < width; ii++)
begin:delay // (* HLUTNM = ii *)
SRL16E #(
.INIT('h0000) // Initial Value of Shift Register
) SRL16E_u1 (
.Q(out_cache[ii]), // SRL data output
.A0('b0), // Select[0] input
.A1('b1), // Select[1] input
.A2('b1), // Select[2] input
.A3('b0), // Select[3] input
.CE('b1), // Clock enable input
.CLK(clk), // Clock input
.D(ddata[ii]) // SRL data input
); FDRE #(
.INIT('b0) // Initial value of register (1'b0 or 'b1)
) FDRE_u1 (
.Q(out[ii]), // 1-bit Data output
.C(clk), // 1-bit Clock input
.CE('b1), // 1-bit Clock enable input
.R(sclr), // 1-bit Synchronous reset input
.D(out_cache[ii]) // 1-bit Data input
);
end
endgenerate

细节可参考scm.pdf /hdl.pdf:

Xilinx 常用模块汇总(verilog)【03】

功能仿真

1)生成仿真数据:

clc;clear all;close all;
fs = 960e6;
f0 = 20e6;
t = [0:255]/fs;
sig = [zeros(1,384),sin(2*pi*t*f0+pi/4),zeros(1,384)];
sig = awgn(sig,10);
L = length(sig);
N = 17;
t0 = [0:L-1]/fs;
y_n = round(sig*(2^(N-2)-1));
%write data
fid = fopen('sig.txt','w');
for k=1:length(y_n)
B_s=dec2bin(y_n(k)+((y_n(k))<0)*2^N,N);
for j=1:N
if B_s(j)=='1'
tb=1;
else
tb=0;
end
fprintf(fid,'%d',tb);
end
fprintf(fid,'\r\n');
end fprintf(fid,';');
fclose(fid);

  2)仿真结果:

MATLAB测试数据:

Xilinx 常用模块汇总(verilog)【03】

VIVADO输出结果:

  Xilinx 常用模块汇总(verilog)【03】

从仿真结果可以看出,二者完全一致,Xcorr模块有效。【X点自相关,仅需要修改共轭相乘结果的延迟拍数即可,这便实现了参数化。】

代码路径:印象笔记0019/015Xcorr

  • 18- 数据速率转化模块

路径:印象笔记-1/0019/017

跨时钟域,对于moni-bit,可用打2拍的思路,即定义两级寄存器,以降低亚稳态概率;对于multi-bits,目前跨时钟域的数据传输,常用的两个基本思路是:1)异步FIFO;2)异步 dual port RAM 。(ug473.pdf)

1)FIFO(first in first out)

Xilinx 常用模块汇总(verilog)【03】

更多细节可参考:印象笔记-3-FPGA/024-FIFO核使用,以及博文:基础003_V7-Memory Resources

要点1:最小深度计算

要点2:数据传输转换关系:输入的数据量理论上需要小于等于输出的数据量。

FIFO多与高速接口配合使用。

2)Dual port RAM

关于IP核的使用,可参考:印象笔记-3/FPGA/025-双端口RAM参数设置,双端口RAM与FIFO最大的区别在于存在读、写地址,如果数据存在偏移,可以很方便地修正偏移量。

应用举例:现有两路数据存在偏移,希望将二者对齐,以Dual port RAM为例,调用XILINX 的 IP核,细节参考博文:DUAL PORT RAM应用实例

  • 19- 卷积模块(convolution)

路径:1/0019/019

分析:卷积的理论细节及多种实现思路,可参考博文:信号处理——卷积(convolution)的实现

这里只给出一种普适的思路,且仅考虑实数情况:对于复数域C,拆解为实、虚部即可。定义序列h:

Xilinx 常用模块汇总(verilog)【03】

对应卷积原理:

Xilinx 常用模块汇总(verilog)【03】

实现的思路是:1)直接延拍+寄存器,2)累加操作(累加长度等于最短序列长度,此处为M),即可完成卷积。

延拍 + 寄存器:

delay_all.sv

`timescale 1ns/1ps
/*
Function: DPRAM for data aligned
Author: Gui.
Data: 2018年5月14日16:31:09
*/
/*
等价于: dout <= {dout[Num-2:0],din}; // SliceM without rst
*/
module delay_all(din, clk, sclr, dout);
parameter datwidth = ;
parameter Num = ; input [datwidth-:] din;
input clk;
input sclr;
output [Num-:][datwidth-:] dout;
//
logic [Num-:][datwidth-:] dat; genvar ii;
generate
for (ii = ;ii < Num; ii++)
begin:delayall
delay #(
.datwidth(datwidth)
)
u1(
.din(dat[ii-]),
.clk(clk),
.sclr(sclr),
.dout(dat[ii])
);
end
endgenerate always @(posedge clk)
if(sclr)
begin
dat <= ;
end
else
begin
dat[] <= din;
end assign dout = dat; endmodule

delay.sv

`timescale 1ns/1ps
/*
Function: DPRAM for data aligned
Author: Gui.
Data: 2018年5月14日16:21:09
*/
module delay(din, clk, sclr, dout);
parameter datwidth = ;
input [datwidth-:] din;
input clk;
input sclr; output [datwidth-:] dout; wire [datwidth-:] q; genvar ii; generate
for(ii = ; ii < datwidth; ii++)
begin:delay
FDRE #(
.INIT('b0) // Initial value of register (1'b0 or 'b1)
) FDRE_u1 (
.Q(q[ii]), // 1-bit Data output
.C(clk), // 1-bit Clock input
.CE('b1), // 1-bit Clock enable input
.R(sclr), // 1-bit Synchronous reset input
.D(din[ii]) // 1-bit Data input
);
end
endgenerate endmodule

该延拍操作可等价为一个语句:

 dout <= {dout[Num-:],din};

不添加复位信号,则内部资源调用CLB中的SliceM。

累加操作,3加法器结合:

localparam ParaNum = (Num % )?(Num/ + ):(Num/);// parallel Number
localparam Actbits = ParaNum*;// Actually bits

该操作对于卷积系数较少的情况勉强适用,但对于阶数过大的情形、既耗内存、又增加群延迟。对于阶数较多的情形,可借助FIR调用DSP48E_05一文的实现思路。关于IP核的使用,对应文档::pg149-fir-compiler.pdf

  • 20- FIR滤波模块

原理同19- 卷积操作,完全一致,滤波器系数设计参考博文:fdatool的滤波器设计 以及 FIR特性及仿真实现_01

如果只是基本的滤波器实现,可以借助Fdatool -> targets -> generate HDL.

Xilinx 常用模块汇总(verilog)【03】

  • 21- Hilbert变换

原理同19- 卷积操作,完全一致,Hilbert工程实现上也可以借助FIR的设计思想,且设计工具也完全一致,不再具体描述。

目前主要的模块:

Xilinx 常用模块汇总(verilog)【03】

  • 22- IIR滤波模块

该功能目前使用较少,拆解下来:延拍 + 加减法,差别不大。同样可以直接用MATLAB生成verilog文件:

// -------------------------------------------------------------
//
// Module: filteriir
// Generated by MATLAB(R) 8.1 and the Filter Design HDL Coder 2.9.3.
// Generated on: 2018-05-15 13:03:46
// ------------------------------------------------------------- // -------------------------------------------------------------
// HDL Code Generation Options:
//
// Name: filteriir
// TargetLanguage: Verilog
// TestBenchStimulus: step ramp chirp
// GenerateHDLTestBench: off // Filter Specifications:
//
// Sampling Frequency : N/A (normalized frequency)
// Response : Lowpass
// Specification : N,F3dB
// Filter Order : 10
// 3-dB Point : 0.45
// ------------------------------------------------------------- // -------------------------------------------------------------
// HDL Implementation : Fully parallel
// Multipliers : 15
// Folding Factor : 1
// -------------------------------------------------------------
// Filter Settings:
//
// Discrete-Time IIR Filter (real)
// -------------------------------
// Filter Structure : Direct-Form II, Second-Order Sections
// Number of Sections : 5
// Stable : Yes
// Linear Phase : No
// ------------------------------------------------------------- `timescale ns / ns module filteriir
(
clk,
clk_enable,
reset,
filter_in,
filter_out
); input clk;
input clk_enable;
input reset;
input [:] filter_in; //double
output [:] filter_out; //double ////////////////////////////////////////////////////////////////
//Module Architecture: filteriir
////////////////////////////////////////////////////////////////
// Local Functions
// Type Definitions
// Constants
parameter scaleconst1 = 3.6533535137021494E-01; //double
parameter coeff_b1_section1 = 1.0000000000000000E+00; //double
parameter coeff_b2_section1 = 2.0000000000000000E+00; //double
parameter coeff_b3_section1 = 1.0000000000000000E+00; //double
parameter coeff_a2_section1 = -2.7099751179194387E-01; //double
parameter coeff_a3_section1 = 7.3233891727280376E-01; //double
parameter scaleconst2 = 2.9120577213381943E-01; //double
parameter coeff_b1_section2 = 1.0000000000000000E+00; //double
parameter coeff_b2_section2 = 2.0000000000000000E+00; //double
parameter coeff_b3_section2 = 1.0000000000000000E+00; //double
parameter coeff_a2_section2 = -2.1600986428424435E-01; //double
parameter coeff_a3_section2 = 3.8083295281952212E-01; //double
parameter scaleconst3 = 2.4834107896254057E-01; //double
parameter coeff_b1_section3 = 1.0000000000000000E+00; //double
parameter coeff_b2_section3 = 2.0000000000000000E+00; //double
parameter coeff_b3_section3 = 1.0000000000000000E+00; //double
parameter coeff_a2_section3 = -1.8421380307753588E-01; //double
parameter coeff_a3_section3 = 1.7757811892769818E-01; //double
parameter scaleconst4 = 2.2434814972791137E-01; //double
parameter coeff_b1_section4 = 1.0000000000000000E+00; //double
parameter coeff_b2_section4 = 2.0000000000000000E+00; //double
parameter coeff_b3_section4 = 1.0000000000000000E+00; //double
parameter coeff_a2_section4 = -1.6641639010121584E-01; //double
parameter coeff_a3_section4 = 6.3808989012861264E-02; //double
parameter scaleconst5 = 2.1350378853875973E-01; //double
parameter coeff_b1_section5 = 1.0000000000000000E+00; //double
parameter coeff_b2_section5 = 2.0000000000000000E+00; //double
parameter coeff_b3_section5 = 1.0000000000000000E+00; //double
parameter coeff_a2_section5 = -1.5837228791342844E-01; //double
parameter coeff_a3_section5 = 1.2387442068467328E-02; //double
// Signals
real input_register; // double
real scale1; // double
real scaletypeconvert1; // double
// Section 1 Signals
real a1sum1; // double
real a2sum1; // double
real b1sum1; // double
real b2sum1; // double
real delay_section1 [:] ; // double
real inputconv1; // double
real a2mul1; // double
real a3mul1; // double
real b1mul1; // double
real b2mul1; // double
real b3mul1; // double
real scale2; // double
real scaletypeconvert2; // double
// Section 2 Signals
real a1sum2; // double
real a2sum2; // double
real b1sum2; // double
real b2sum2; // double
real delay_section2 [:] ; // double
real inputconv2; // double
real a2mul2; // double
real a3mul2; // double
real b1mul2; // double
real b2mul2; // double
real b3mul2; // double
real scale3; // double
real scaletypeconvert3; // double
// Section 3 Signals
real a1sum3; // double
real a2sum3; // double
real b1sum3; // double
real b2sum3; // double
real delay_section3 [:] ; // double
real inputconv3; // double
real a2mul3; // double
real a3mul3; // double
real b1mul3; // double
real b2mul3; // double
real b3mul3; // double
real scale4; // double
real scaletypeconvert4; // double
// Section 4 Signals
real a1sum4; // double
real a2sum4; // double
real b1sum4; // double
real b2sum4; // double
real delay_section4 [:] ; // double
real inputconv4; // double
real a2mul4; // double
real a3mul4; // double
real b1mul4; // double
real b2mul4; // double
real b3mul4; // double
real scale5; // double
real scaletypeconvert5; // double
// Section 5 Signals
real a1sum5; // double
real a2sum5; // double
real b1sum5; // double
real b2sum5; // double
real delay_section5 [:] ; // double
real inputconv5; // double
real a2mul5; // double
real a3mul5; // double
real b1mul5; // double
real b2mul5; // double
real b3mul5; // double
real output_typeconvert; // double
real output_register; // double // Block Statements
always @ (posedge clk or posedge reset)
begin: input_reg_process
if (reset == 'b1) begin
input_register <= 0.0000000000000000E+00;
end
else begin
if (clk_enable == 'b1) begin
input_register <= $bitstoreal(filter_in);
end
end
end // input_reg_process always @* scale1 <= input_register * scaleconst1; always @* scaletypeconvert1 <= scale1; // ------------------ Section 1 ------------------ always @ (posedge clk or posedge reset)
begin: delay_process_section1
if (reset == 'b1) begin
delay_section1[] <= 0.0000000000000000E+00;
delay_section1[] <= 0.0000000000000000E+00;
end
else begin
if (clk_enable == 'b1) begin
delay_section1[] <= delay_section1[];
delay_section1[] <= a1sum1;
end
end
end // delay_process_section1 always @* inputconv1 <= scaletypeconvert1; always @* a2mul1 <= delay_section1[] * coeff_a2_section1; always @* a3mul1 <= delay_section1[] * coeff_a3_section1; always @* b1mul1 <= a1sum1; always @* b2mul1 <= delay_section1[] * coeff_b2_section1; always @* b3mul1 <= delay_section1[]; always @* a2sum1 <= inputconv1 - a2mul1; always @* a1sum1 <= a2sum1 - a3mul1; always @* b2sum1 <= b1mul1 + b2mul1; always @* b1sum1 <= b2sum1 + b3mul1; always @* scale2 <= b1sum1 * scaleconst2; always @* scaletypeconvert2 <= scale2; // ------------------ Section 2 ------------------ always @ (posedge clk or posedge reset)
begin: delay_process_section2
if (reset == 'b1) begin
delay_section2[] <= 0.0000000000000000E+00;
delay_section2[] <= 0.0000000000000000E+00;
end
else begin
if (clk_enable == 'b1) begin
delay_section2[] <= delay_section2[];
delay_section2[] <= a1sum2;
end
end
end // delay_process_section2 always @* inputconv2 <= scaletypeconvert2; always @* a2mul2 <= delay_section2[] * coeff_a2_section2; always @* a3mul2 <= delay_section2[] * coeff_a3_section2; always @* b1mul2 <= a1sum2; always @* b2mul2 <= delay_section2[] * coeff_b2_section2; always @* b3mul2 <= delay_section2[]; always @* a2sum2 <= inputconv2 - a2mul2; always @* a1sum2 <= a2sum2 - a3mul2; always @* b2sum2 <= b1mul2 + b2mul2; always @* b1sum2 <= b2sum2 + b3mul2; always @* scale3 <= b1sum2 * scaleconst3; always @* scaletypeconvert3 <= scale3; // ------------------ Section 3 ------------------ always @ (posedge clk or posedge reset)
begin: delay_process_section3
if (reset == 'b1) begin
delay_section3[] <= 0.0000000000000000E+00;
delay_section3[] <= 0.0000000000000000E+00;
end
else begin
if (clk_enable == 'b1) begin
delay_section3[] <= delay_section3[];
delay_section3[] <= a1sum3;
end
end
end // delay_process_section3 always @* inputconv3 <= scaletypeconvert3; always @* a2mul3 <= delay_section3[] * coeff_a2_section3; always @* a3mul3 <= delay_section3[] * coeff_a3_section3; always @* b1mul3 <= a1sum3; always @* b2mul3 <= delay_section3[] * coeff_b2_section3; always @* b3mul3 <= delay_section3[]; always @* a2sum3 <= inputconv3 - a2mul3; always @* a1sum3 <= a2sum3 - a3mul3; always @* b2sum3 <= b1mul3 + b2mul3; always @* b1sum3 <= b2sum3 + b3mul3; always @* scale4 <= b1sum3 * scaleconst4; always @* scaletypeconvert4 <= scale4; // ------------------ Section 4 ------------------ always @ (posedge clk or posedge reset)
begin: delay_process_section4
if (reset == 'b1) begin
delay_section4[] <= 0.0000000000000000E+00;
delay_section4[] <= 0.0000000000000000E+00;
end
else begin
if (clk_enable == 'b1) begin
delay_section4[] <= delay_section4[];
delay_section4[] <= a1sum4;
end
end
end // delay_process_section4 always @* inputconv4 <= scaletypeconvert4; always @* a2mul4 <= delay_section4[] * coeff_a2_section4; always @* a3mul4 <= delay_section4[] * coeff_a3_section4; always @* b1mul4 <= a1sum4; always @* b2mul4 <= delay_section4[] * coeff_b2_section4; always @* b3mul4 <= delay_section4[]; always @* a2sum4 <= inputconv4 - a2mul4; always @* a1sum4 <= a2sum4 - a3mul4; always @* b2sum4 <= b1mul4 + b2mul4; always @* b1sum4 <= b2sum4 + b3mul4; always @* scale5 <= b1sum4 * scaleconst5; always @* scaletypeconvert5 <= scale5; // ------------------ Section 5 ------------------ always @ (posedge clk or posedge reset)
begin: delay_process_section5
if (reset == 'b1) begin
delay_section5[] <= 0.0000000000000000E+00;
delay_section5[] <= 0.0000000000000000E+00;
end
else begin
if (clk_enable == 'b1) begin
delay_section5[] <= delay_section5[];
delay_section5[] <= a1sum5;
end
end
end // delay_process_section5 always @* inputconv5 <= scaletypeconvert5; always @* a2mul5 <= delay_section5[] * coeff_a2_section5; always @* a3mul5 <= delay_section5[] * coeff_a3_section5; always @* b1mul5 <= a1sum5; always @* b2mul5 <= delay_section5[] * coeff_b2_section5; always @* b3mul5 <= delay_section5[]; always @* a2sum5 <= inputconv5 - a2mul5; always @* a1sum5 <= a2sum5 - a3mul5; always @* b2sum5 <= b1mul5 + b2mul5; always @* b1sum5 <= b2sum5 + b3mul5; always @* output_typeconvert <= b1sum5; always @ (posedge clk or posedge reset)
begin: Output_Register_process
if (reset == 'b1) begin
output_register <= 0.0000000000000000E+00;
end
else begin
if (clk_enable == 'b1) begin
output_register <= output_typeconvert;
end
end
end // Output_Register_process // Assignment Statements
assign filter_out = $realtobits(output_register);
endmodule // filteriir
  • 23- 查找表和进位链(LUT CARRY,组合逻辑单元)

查找表分表是LUT2、LUT5、LUT6,一个基本例子:12bit或操作。

assign user_out = |datin[:];
//LUT
wire out, out1_14;
LUT6 #(
.INIT('hFFFFFFFFFFFFFFFE))
out1 (
.I0(datin[]),
.I1(datin[]),
.I2(datin[]),
.I3(datin[]),
.I4(datin[]),
.I5(datin[]),
.O(out)
); LUT6 #(
.INIT('hFFFFFFFFFFFFFFFE))
out2 (
.I0(datin[]),
.I1(datin[]),
.I2(datin[]),
.I3(datin[]),
.I4(datin[]),
.I5(datin[]),
.O(out1_14)
); LUT2 #(
.INIT('hE))
out3 (
.I0(out),
.I1(out1_14),
.O(user_out)
);

后续1:扩展矩阵运算、自适应处理模块

后续2:学习接口协议、通信网络协议

后续3:sysgen 学习模块搭建

后续4:HLS(选学)/microblaze

后续5:调研更加智能化地解决FPGA算法设计实现的工具,例如python好像就存在这样的工具包....

切记,兼顾论文阅读、多向前辈请教。

上一篇:WINCE 电池状态(C#)


下一篇:夺命雷公狗---DEDECMS----16dedecms取出首页今日更新