ac6-training, a department of Ac6 SAS
 

FCQ2 P2020 QorIQ implementation

This course covers the NXP QorIQ P2010 and P2020.

Objectives
  • The course clarifies the architecture of the P20X0, particularly the operation of the coherency module that interconnects the e500s to memory and high-speed interfaces.
  • The cache coherency protocol is introduced in increasing depth.
  • The e500 core is examined in detail, especially the SPE unit that enables vector processing.
  • The boot sequence and the clocking are explained.
  • The course focuses on the hardware implementation of the P20X0.
  • A thorough introduction to DDR SDRAM operation is given before the DDR2/3 SDRAM controller is studied.
  • The RapidIO port and the PCI Express port are described in depth.
  • The course explains how to implement QoS on GigaEthernet controllers.

  • ACSYS has developed an optimized SPE-based FFT coded in assembly language.
  • Performance for 1024 complex single-precision floating-point samples:
    • 91_386 core clock cycles without reverse ordering, 94_124 with reverse ordering
  • Performance for 4096 complex single-precision floating-point samples:
    • 470_778 core clock cycles without reverse ordering, 511_227 with reverse ordering
  • For more information, contact training@ac6-training.com
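The FFT cycle counts quoted above can be turned into per-point costs and execution times with simple arithmetic. The sketch below is only an illustration: the cycle counts come from this page, but the core frequency `ASSUMED_CORE_HZ` is an assumption (the page does not state one), and the function names are hypothetical.

```python
# Cycle counts quoted in the course description (core clock cycles).
FFT_CYCLES = {
    1024: {"without_reorder": 91_386, "with_reorder": 94_124},
    4096: {"without_reorder": 470_778, "with_reorder": 511_227},
}

# Assumption for illustration only: the page does not give a core frequency.
ASSUMED_CORE_HZ = 1_000_000_000


def cycles_to_microseconds(cycles, core_hz=ASSUMED_CORE_HZ):
    """Convert a core clock cycle count to microseconds at the given frequency."""
    return cycles * 1_000_000 / core_hz


def summarize(points, counts, core_hz=ASSUMED_CORE_HZ):
    """Derive per-point cost and reverse-ordering overhead for one FFT size."""
    base = counts["without_reorder"]
    full = counts["with_reorder"]
    return {
        "points": points,
        "cycles_per_point": base / points,
        "reorder_overhead_cycles": full - base,
        "reorder_overhead_pct": 100.0 * (full - base) / base,
        "time_us_without_reorder": cycles_to_microseconds(base, core_hz),
        "time_us_with_reorder": cycles_to_microseconds(full, core_hz),
    }


if __name__ == "__main__":
    for points, counts in FFT_CYCLES.items():
        s = summarize(points, counts)
        print(
            f"{s['points']:>5} points: "
            f"{s['cycles_per_point']:.1f} cycles/point, "
            f"reordering adds {s['reorder_overhead_cycles']} cycles "
            f"({s['reorder_overhead_pct']:.1f}%), "
            f"{s['time_us_without_reorder']:.1f} us without reordering"
        )
```

To evaluate a specific board, pass its actual core frequency as `core_hz`; the cycle counts themselves do not depend on that assumption.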
A more detailed course description is available on request at training@ac6-training.com

Related courses

Course IS2 - eMMC 5.0
Course IC4 - PCI Express 3.0
Course IC5 - RapidIO 3.0
Course N1 - Ethernet and switching
Course IP2 - USB 2.0
  • Experience of a 32-bit processor or DSP is mandatory.
  • Knowledge of RapidIO and PCI Express is recommended.
  • Theoretical course
    • PDF course material (in English) supplemented by a printed version for face-to-face courses.
    • Online courses are delivered using the Teams video-conferencing system.
    • The trainer answers trainees' questions during the training and provides technical and pedagogical assistance.
  • At the start of each session, the trainer interacts with the trainees to ensure the course fits their expectations and adjusts it if needed.
  • Any embedded systems engineer or technician with the above prerequisites.
  • The prerequisites indicated above are assessed before the training by the trainee's technical supervisor in their company, or by the trainee themselves in the exceptional case of an individual trainee.
  • Trainee progress is assessed by quizzes offered at the end of various sections to verify that the trainees have assimilated the points presented.
  • At the end of the training, each trainee receives a certificate attesting that they have successfully completed the course.
    • If a problem caused by a lack of prerequisites is discovered during the course, the trainee is offered different or additional training, generally to reinforce their prerequisites, in agreement with their company manager if applicable.

Course Outline

  • Internal data flows, OCEAN switch fabric, packet reordering
  • Implementation examples
  • Address map, ATMU, OCEAN configuration
  • Local vs external address spaces, inbound and outbound address decoding
  • Accessing memory-mapped registers from external master
  • Dual-issue superscalar control, out-of-order execution
  • Execution units : 2 simple Integer Units + 1 Complex Integer Unit
  • Dynamic branch prediction using a 128-set 4-way set associative Branch Target Buffer
  • Execution timing, rename register operation, instruction serialization
  • The Core Complex Bus : high speed on-chip local bus with data tagging
  • The LMQ, the store queue, the castout queue
  • Store miss merging and store gathering
  • Memory access ordering
  • Lock acquisition and import barriers
  • The first level MMU and the second level MMU, consistency between L1 and L2 TLBs
  • Snooping of TLBs
  • TLB software reload, page attributes WIMGE
  • Process protection, variable number of PID registers and sharing
  • MMU implementation in real-time sensitive applications
  • The L1 caches, PLRU replacement algorithm, 8-way set associativity, cache block lock and unlock APU
  • Level 2 cache, partition into L2 cache plus SRAM
  • Allocation of data transferred by external masters into the cache: stashing
  • Snooping mechanism, stashing mechanism
  • L2 cache locking
  • Differences between the new Book E architecture and the classic PowerPC architecture
  • Floating Point units, Double-Precision FP
  • Signal Processing Engine (SPE) APU : implementation of the SIMD capability without using a separate unit
  • PowerPC EABI : sections, C-to-assembly interface
  • Book E exception handling
  • Critical versus non critical
  • Handler table
  • Exception nesting, recoverability from interrupt
  • Core timers : Decrementer, Time Base, Fixed Interval Timer and Software Watchdog
  • Performance monitoring, counting of events
  • JTAG emulation, real time trace when the e500 core executes cached instructions
  • Watchpoint logic, triggering capabilities based on user programmable events
  • Platform clock
  • Voltage configuration selection
  • Power-on reset sequence, using the I2C interface to access serial ROM
  • Boot page translation
  • eSDHC boot
  • eSPI boot ROM
  • I/O arbiter
  • CCB arbiter
  • Transaction queue
  • CCB interface
  • DDR2 and DDR3 Jedec specification
  • On-Die termination
  • Mode registers initialization, bank selection and precharge
  • Command truth table
  • Bank activation, read, write and precharge timing diagrams, page mode
  • Introduction to the DDR-SDRAM controller
  • Initial configuration following Power-on-Reset
  • Timing parameters programming
  • Initialization routine
  • Multiplexed or non-multiplexed address and data buses
  • Dynamic bus sizing
  • GPCM and UPM state machines
  • Flash Control Machine
  • NAND flash controller
  • Message Unit, direct vs chaining mode operation
  • RapidIO doorbell and port-write unit
  • Accessing configuration registers via RapidIO packets
  • Programming inbound and outbound ATMUs
  • Error handling
  • 8-lane PCI Express interface
  • Modes of operation, Root Complex / Endpoint
  • Transaction ordering rules
  • Programming inbound and outbound ATMUs
  • Configuration, initialization
  • PIC in multiple-processor implementation
  • Interrupt sources : external interrupts, internal interrupts, message interrupts
  • Integrated timers
  • Interprocessor interrupts
  • Per-CPU register usage, message registers
  • Nesting implementation
  • Priority between the 4 channels
  • Support for cascading descriptor chains
  • Scatter / gather
  • Selectable hardware enforced coherency
  • Event counting
  • Threshold events
  • Chaining, triggering
  • Watchpoint facility
  • Trace buffer
  • Address recognition, pattern matching
  • Buffer descriptors management
  • Physical interfaces : GMII, MII, TBI, RGMII, SGMII
  • Buffer descriptor management
  • Layer 2 acceleration : accept or reject on address or pattern match
  • 256-entry hash table for unicast and multicast
  • Management of VLAN tags and priority, VLAN insertion and deletion
  • Quality of service, managing several transmit and receive queues
  • TCP/IP offload engine, filer programming
  • IEEE1588 compliant time-stamping
  • Storing and executing commands targeting the external card
  • Multi-block transfers
  • Moving data by using the dedicated DMA controller
  • Dividing large data transfers
  • Card insertion and removal detection
  • Dual-role (DR) operation
  • EHCI implementation
  • ULPI interfaces to the transceiver
  • OTG support
  • Dedicated DMA channels
  • Endpoints configuration
  • Overview of the encryption mechanism
  • Introduction to DES and 3DES algorithms
  • Data packet descriptors
  • Crypto channels
  • XOR acceleration
  • Description of the NS16552-compliant UARTs
  • I2C controller
  • Enhanced SPI, transmit and receive sequences