Discovery

Collaborating institution: Fudan University (Academic); joint publications: 6

Article　2020　IEEE : Institute of Electrical and Electronics Engineers

A Fast QTMT Partition Decision Strategy for VVC Intra Prediction

Yibo Fan, Jun'An Chen, Heming Sun, Jiro Katto, Ming'E Jing
IEEE Access
Abstract: Unlike the traditional quadtree (QT) structure used in the previous-generation video coding standard H.265/HEVC, the latest codec, H.266/VVC, applies a new partition structure named quadtree with nested multi-type tree (QTMT). QTMT brings superior encoding performance at the cost of substantially longer encoding time. This paper therefore proposes a fast intra partition algorithm based on variance and the Sobel operator. The proposed method addresses the novel asymmetrical partition issue in VVC by balancing the reduction in computational complexity against the loss in encoding quality. Concretely, we first terminate further splitting of a coding unit (CU) when its texture is judged to be smooth. Then, we use the Sobel operator to extract gradient features and decide whether to split the CU by QT, thus terminating further MT partitions. Finally, a novel method is applied to choose only one partition from the five QTMT partitions. A homogeneous area tends to be predicted as a whole with a larger CU, while CUs with complicated texture are prone to be divided into small sub-CUs, and these sub-CUs usually have different textures from each other. We calculate the variance of the variances of the sub-CUs to decide which partition best distinguishes the sub-textures. Our method is embedded into the VVC official reference software VTM-7.0. Compared to the VTM-7.0 anchor, it reduces encoding time by 49.27% on average at the cost of only a 1.63% BDBR increase. As a traditional scheme based on variance and gradient for decreasing the computational complexity of VVC intra coding, our method outperforms other related state-of-the-art methods, including traditional machine learning and convolutional neural network methods. © 2013 IEEE.
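
The first two steps of the abstract above (stop splitting smooth CUs, then use Sobel gradients to decide on a QT split) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the thresholds `smooth_thr`, `grad_thr`, and `ratio_thr`, and the rule "strong gradients of similar strength in both directions suggest QT" are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def block_variance(block):
    """Population variance of all luma samples in a CU (list of rows)."""
    return pvariance([p for row in block for p in row])


def sobel_gradients(block):
    """Sum of |Gx| and |Gy| Sobel responses over the interior samples."""
    h, w = len(block), len(block[0])
    gx_sum = gy_sum = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (block[y - 1][x + 1] + 2 * block[y][x + 1] + block[y + 1][x + 1]
                  - block[y - 1][x - 1] - 2 * block[y][x - 1] - block[y + 1][x - 1])
            gy = (block[y + 1][x - 1] + 2 * block[y + 1][x] + block[y + 1][x + 1]
                  - block[y - 1][x - 1] - 2 * block[y - 1][x] - block[y - 1][x + 1])
            gx_sum += abs(gx)
            gy_sum += abs(gy)
    return gx_sum, gy_sum


def early_decision(block, smooth_thr=4.0, grad_thr=20.0, ratio_thr=1.2):
    """Return 'no_split', 'qt_split', or 'undecided' (all thresholds are assumed)."""
    h, w = len(block), len(block[0])
    # Step 1: a smooth CU is not split any further.
    if block_variance(block) < smooth_thr:
        return "no_split"
    if h < 3 or w < 3:
        return "undecided"  # no interior samples for a 3x3 Sobel kernel
    # Step 2 (assumed rule): strong, direction-balanced gradients favor QT.
    gx, gy = sobel_gradients(block)
    interior = (h - 2) * (w - 2)
    lo, hi = min(gx, gy), max(gx, gy)
    if lo / interior > grad_thr and hi <= ratio_thr * lo:
        return "qt_split"
    return "undecided"  # fall through to the remaining partition checks
```

A CU that returns `"undecided"` would go on to the third step described in the abstract, where a single partition is chosen among the five QTMT candidates.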

Conference Paper　2020-06　IEEE : Institute of Electrical and Electronics Engineers

An image compression framework with learning-based filter

Heming Sun, Chao Liu, Jiro Katto, Yibo Fan
Abstract: This paper introduces a coding framework, VIP-ICT-Codec, based on the VTM (Versatile Video Coding Test Model). First, we propose a color space conversion from the RGB to the YUV domain using a PCA-like operation. A method for calculating the PCA mean is proposed to de-correlate the residual components of the YUV channels. In addition, the correlation of the UV components is compensated, considering that they share the same coding tree in VVC. We also learn a residual mapping to alleviate the over-filtering and under-filtering problems of specific images. Finally, we treat rate control as an unconstrained Lagrangian problem to reach the target bpp. The results show that we achieve 32.625 dB in the validation phase. © 2020 IEEE.
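
The abstract above treats rate control as an unconstrained Lagrangian problem. A minimal sketch of that general idea is shown here: each image picks the operating point that minimizes D + λR, and λ is found by bisection so that the total rate meets the target. The abstract does not say how the problem is solved, so the (rate, distortion) operating points, the bisection search, and the names `pick_points` and `allocate` are all assumptions for illustration.

```python
def pick_points(curves, lam):
    """For each image, pick the (rate, distortion) point minimizing D + lam * R."""
    return [min(points, key=lambda p: p[1] + lam * p[0]) for points in curves]


def allocate(curves, target_rate, iters=50):
    """Bisect on lam so that the summed rate does not exceed target_rate.

    curves: one list of (rate, distortion) tuples per image (hypothetical data).
    Assumes the lowest-rate points already fit the target; otherwise the
    returned allocation is simply the lowest-rate one.
    """
    lo, hi = 0.0, 1e6
    best = pick_points(curves, hi)  # very large lam: lowest-rate choice
    for _ in range(iters):
        mid = (lo + hi) / 2
        chosen = pick_points(curves, mid)
        total = sum(rate for rate, _ in chosen)
        if total <= target_rate:
            best, hi = chosen, mid  # feasible: try a smaller lam (more rate)
        else:
            lo = mid                # over budget: penalize rate more
    return best
```

Because the search only visits points on the lower convex hull of each curve, the total rate may land below the target rather than exactly on it.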

Conference Paper　2020-07　IEEE : Institute of Electrical and Electronics Engineers

A learning-based low complexity in-loop filter for video coding

Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan
Abstract: With the continuous development of mobile devices, people can demand higher-definition videos. To ease the burden of deploying video codecs in mobile multimedia, this paper proposes a learning-based, low-complexity in-loop filter. The model is built by combining depthwise separable convolution with batch normalization. To enhance its performance, knowledge from a pre-trained teacher model is transferred to it. However, the over-smoothing problem in inter frames caused by the double enhancing effect remains. To solve this, a Wiener-based filtering algorithm that tries to restore the distortion from the learned residuals is designed, which introduces an adequate filtering effect. The experimental results show that the proposed methods achieve considerable BD-rate reduction relative to the HEVC anchor. Compared with the previous state-of-the-art work VR-CNN, our model achieves an extra 1.65% BD-rate reduction, a 79.1% decrease in FLOPs, a 25% decrease in encoding complexity, and a 70% decrease in decoding complexity. © 2020 IEEE.
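
To show why depthwise separable convolution lowers complexity, the sketch below counts multiply-accumulate operations (MACs) for one standard convolution layer and for its depthwise separable replacement. It assumes stride 1 and "same" padding, so the output is the same size as the input. The layer dimensions in the usage note are arbitrary examples, not the paper's network, and the result is not meant to reproduce the 79.1% FLOPs figure.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a k x k standard convolution on an h x w feature map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def mac_ratio(h, w, c_in, c_out, k):
    """Separable / standard cost; algebraically equal to 1/c_out + 1/k**2."""
    return (depthwise_separable_macs(h, w, c_in, c_out, k)
            / standard_conv_macs(h, w, c_in, c_out, k))
```

For example, `mac_ratio(64, 64, 64, 64, 3)` evaluates to 1/64 + 1/9, about 0.127, so this hypothetical layer needs roughly one eighth of the MACs of the standard convolution.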

Article　2020-09　IEEE : Institute of Electrical and Electronics Engineers

A Pipelined 2D Transform Architecture Supporting Mixed Block Sizes for the VVC Standard

Yibo Fan, Yixuan Zeng, Heming Sun, Jiro Katto, Xiaoyang Zeng
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: For the next-generation video coding standard Versatile Video Coding (VVC), several new contributions have been proposed to improve coding efficiency, especially in the transform operations. This paper proposes a unified 32×32 block-based transform architecture for the VVC standard that supports the 2D Discrete Sine Transform-VII (DST-VII) and Discrete Cosine Transform-VIII (DCT-VIII) of all sizes. It makes three main contributions: 1) The N-Dimensional Reduced Adder Graph (RAG-n) algorithm is adopted to design the minimal adder-oriented computational units. 2) The storage of asymmetric transform units is realized in a dual-port SRAM-based transpose memory. 3) Pipelined 2D transforms of mixed block sizes are achieved at a throughput of 32 samples per cycle. The synthesis results indicate that this architecture can reduce area by up to 73.1% compared with other state-of-the-art works. Moreover, power savings ranging from 4.9% to 9.9% can be achieved. Regarding the transpose memory, at least 21.9% of the area can be saved by using SRAM. © 1991-2012 IEEE.
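
For readers unfamiliar with the two transforms named in the abstract above, the sketch below builds floating-point DST-VII and DCT-VIII basis matrices from their standard definitions and applies them as a separable 2D transform to a rectangular block. It is a software illustration only: VVC specifies scaled integer approximations of these matrices, and nothing here models the paper's hardware (RAG-n adder units, transpose memory, or pipelining). All function names are hypothetical.

```python
import math


def dst7(n):
    """n x n floating-point DST-VII matrix; row i is the i-th basis vector."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def dct8(n):
    """n x n floating-point DCT-VIII matrix; row i is the i-th basis vector."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_2d(block, t_ver, t_hor):
    """Separable 2D transform: t_ver acts on columns, t_hor on rows."""
    return matmul(matmul(t_ver, block), transpose(t_hor))


def inverse_2d(coeffs, t_ver, t_hor):
    """Inverse of forward_2d, valid because both matrices are orthonormal."""
    return matmul(matmul(transpose(t_ver), coeffs), t_hor)
```

A 4×8 block, for instance, can be transformed with `forward_2d(block, dst7(4), dct8(8))`. The transposition between the column pass and the row pass is the operation that a transpose memory serves in a hardware design.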

Conference Paper　2019-11　IEEE : Institute of Electrical and Electronics Engineers

Dual Learning-based Video Coding with Inception Dense Blocks

Chao Liu, Heming Sun, Jun'An Chen, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto, Xiaoyang Zeng, Yibo Fan
Abstract: In this paper, a dual learning-based method for intra coding is introduced for the PCS Grand Challenge. The method is composed of two main parts, intra prediction and reconstruction filtering, which use different network structures: the neural network-based intra prediction uses a fully connected network to predict the block, while the neural network-based reconstruction filtering uses convolutional networks. Unlike previous filtering works, we use a network with more powerful feature extraction capabilities in our reconstruction filtering network, and filtering is applied at the block level to achieve more accurate filtering compensation. To the best of our knowledge, among all learning-based methods, this is the first attempt to combine two different networks in one application, and we achieve state-of-the-art performance for the all-intra (AI) configuration on the HEVC test sequences. The experimental results show that our method leads to significant BD-rate savings for the 8 provided sequences compared to the HM-16.20 baseline (average bitrate reductions of 10.24% and 3.57% for all-intra and random-access coding, respectively). For the HEVC test sequences, our model also achieves a 9.70% BD-rate saving compared to the HM-16.20 baseline in the all-intra configuration. © 2019 IEEE.

Conference Paper　2019-12　IEEE : Institute of Electrical and Electronics Engineers

Fast QTMT Partition Decision Algorithm in VVC Intra Coding based on Variance and Gradient

Junan Chen, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan
Abstract: The quadtree with nested multi-type tree (QTMT) partition structure in Versatile Video Coding (VVC) contributes to superior encoding performance compared to the basic quadtree (QT) structure in High Efficiency Video Coding (HEVC). However, the performance improvement leads to an unavoidable increase in computational complexity. To achieve a balance between coding efficiency and compression quality, we propose a fast intra partition algorithm based on variance and gradient to solve the rectangular partition problem in VVC. First, further splitting of smooth areas is terminated. Then, the QT partition is chosen depending on the gradient features extracted by the Sobel operator. Finally, one partition from the five possible QTMT partitions is chosen directly by computing the variance of the variances of the sub-CUs. The theoretical basis of our method is that a homogeneous area tends to be predicted with a larger coding unit (CU), and sub-parts of a split CU are prone to have different textures from each other. To our knowledge, this is the first attempt to apply a traditional method to accelerating the rectangular partition problem in VVC intra prediction. Experimental results show that the proposed method saves 53.17% of encoding time on average with only a 1.62% BDBR increase and a 0.09 dB BDPSNR loss compared to the VTM4.0 anchor. © 2019 IEEE.
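
The final step in the abstract above, choosing one of the five QTMT partitions from the variance of the sub-CU variances, can be sketched as follows. This is a hedged reading of the abstract, not the paper's exact procedure. It assumes the partition with the largest variance of variances is the one that "best distinguishes" the sub-textures. The mode labels, the tie-breaking by listing order, and the lack of any normalization for the differing number of sub-CUs are all assumptions.

```python
from statistics import pvariance

MODES = ("QT", "BT_H", "BT_V", "TT_H", "TT_V")  # hypothetical labels


def sub_blocks(block, mode):
    """Split a CU (list of rows) into the sub-CUs of one QTMT partition."""
    h, w = len(block), len(block[0])

    def crop(y0, y1, x0, x1):
        return [row[x0:x1] for row in block[y0:y1]]

    if mode == "QT":    # four quadrants
        return [crop(0, h // 2, 0, w // 2), crop(0, h // 2, w // 2, w),
                crop(h // 2, h, 0, w // 2), crop(h // 2, h, w // 2, w)]
    if mode == "BT_H":  # horizontal binary split: top and bottom halves
        return [crop(0, h // 2, 0, w), crop(h // 2, h, 0, w)]
    if mode == "BT_V":  # vertical binary split: left and right halves
        return [crop(0, h, 0, w // 2), crop(0, h, w // 2, w)]
    if mode == "TT_H":  # horizontal ternary split in a 1:2:1 ratio
        return [crop(0, h // 4, 0, w), crop(h // 4, 3 * h // 4, 0, w),
                crop(3 * h // 4, h, 0, w)]
    if mode == "TT_V":  # vertical ternary split in a 1:2:1 ratio
        return [crop(0, h, 0, w // 4), crop(0, h, w // 4, 3 * w // 4),
                crop(0, h, 3 * w // 4, w)]
    raise ValueError(f"unknown partition mode: {mode}")


def variance_of_variances(block, mode):
    """Variance across the per-sub-CU sample variances of one partition."""
    variances = [pvariance([p for row in sub for p in row])
                 for sub in sub_blocks(block, mode)]
    return pvariance(variances)


def choose_partition(block):
    """Pick the mode with the largest score; ties go to the earliest in MODES."""
    return max(MODES, key=lambda mode: variance_of_variances(block, mode))
```

When texture fills exactly one half of a CU, QT and the matching binary split produce the same score in this sketch, so listing order alone decides. A practical rule would need an explicit tie-break or a normalized score.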