Digital Signal Processing typically involves repetitive computations being performed on streams of input data, subject to constraints such as sampling rate or desired throughput. Often such systems need to be implemented under tight constraints on factors such as timing, resources, power or cost. When they are used in embedded systems, it is often worth the effort to design custom architectures that have much better cost tradeoffs than general purpose computing architectures. This course deals with the analysis of such algorithms, and mapping them to architectures that are either custom designed or have specific extensions that make them better suited to certain kinds of operations. Topics covered include fundamental bounds on performance, mapping to dedicated and custom resource shared architectures, and techniques for automating the process of scheduling. Aspects of architectures such as memory access, shared buses, and memory mapped accelerators will be studied. Assignments will cover various aspects of the design process, starting from implementing and testing specifications, to synthesis and scheduling using high level synthesis tools, and analyzing and improving the resulting architectures. INTENDED AUDIENCE: Students interested in hardware (VLSI / FPGA) implementations of DSP systems; also useful for those using custom parallel architectures (GPU) PREREQUISITES: Digital Design fundamentals (UG) - Digital Signal Processing (UG) - Processor architecture (UG)