Implementation of the Polyalphabetic Vigenère Cipher and of a Key-Recovery Attack
Fang Xianjin (方贤进)
http://star.aust.edu.cn/~xjfang

Vigenère encryption and decryption

• Assume the character set of the language is Charset[26] = {'a', 'b', …, 'z'}, so the character set has size 26.
• The corresponding character codes are Coding[26] = {0, 1, …, 25}.
• The Vigenère cipher encrypts a plaintext by applying several monoalphabetic (shift) substitutions in rotation, as directed by the key.
• Plaintext: $M = m_1 m_2 \dots m_n$, $m_i \in$ Charset, where n is the plaintext length.
• Key: $K = k_1 k_2 \dots k_d$, $k_i \in$ Charset, where d is the key length.
• Ciphertext: $C = c_1 c_2 \dots c_n$, $c_i \in$ Charset, where n is the ciphertext length.
• Encryption: $c_{j+td} = (m_{j+td} + k_j) \bmod 26$, for $j = 1, \dots, d$ and $t = 0, \dots, \lceil n/d \rceil - 1$, where $\lceil x \rceil$ is the smallest integer not less than x. Indices $j + td$ greater than n are skipped.
• Decryption: $m_{j+td} = (c_{j+td} - k_j) \bmod 26$, for the same ranges of j and t.

Vigenère encryption example

               m1   m2   m3   m4   m5   m6   m7   m8   m9   m10  m11
Plaintext M    n    o    t    h    i    n    g    i    s    t    o
(code)         13   14   19   7    8    13   6    8    18   19   14
Key K          j    o    y    j    o    y    j    o    y    j    o
(code)         9    14   24   9    14   24   9    14   24   9    14
Ciphertext C   w    c    r    q    w    l    p    w    q    c    c
(code)         22   2    17   16   22   11   15   22   16   2    2
j              1    2    3    1    2    3    1    2    3    1    2
t              0    0    0    1    1    1    2    2    2    3    3

Here the plaintext length is n = 11, the key length is d = 3, and the largest value of t is $\lceil 11/3 \rceil - 1 = 3$.

An original plaintext text

Differential Privacy is the state-of-the-art goal for the problem of privacy-preserving data release and privacy-preserving data mining. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods incur higher computing complexity and lower information to noise ratio, which renders the published data next to useless. This proposal aims to reduce computing complexity and signal to noise ratio. 
The starting point is to approximate the full distribution of high-dimensional dataset with a set of low-dimensional marginal distributions via optimizing score function and reducing sensitivity, in which generation of noisy conditional distributions with differential privacy is computed in a set of low-dimensional subspaces, and then, the sample tuples from the noisy approximation distribution are used to generate and release the synthetic dataset. Some crucial science problems would be investigated below: (i) constructing a low k-degree Bayesian network over the high-dimensional dataset via exponential mechanism in differential privacy, where the score function is optimized to reduce the sensitivity using mutual information, equivalence classes in maximum joint distribution and dynamic programming; (ii)studying the algorithm to compute a set of noisy conditional distributions from joint distributions in the subspace of Bayesian network, via the Laplace mechanism of differential privacy. (iii)exploring how to generate synthetic data from the differentially private Bayesian network and conditional distributions, without explicitly materializing the noisy global distribution. The proposed solution may have theoretical and technical significance for synthetic data generation with differential privacy on business prospects. 
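The encryption and decryption rules given earlier, together with the preprocessing step that keeps only characters from the character set, can be sketched in C as follows. This is a minimal sketch rather than the course's reference solution. The function names (`preprocess`, `vigenere_encrypt`, `vigenere_decrypt`) are hypothetical, indices are 0-based (so the key character for position i is `key[i % d]`), and the text and key are assumed to contain only lowercase letters 'a' to 'z' once preprocessed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Keep only letters and fold them to lower case.
   dst must have room for strlen(src) + 1 bytes; returns the new length. */
size_t preprocess(const char *src, char *dst)
{
    size_t n = 0;
    for (; *src != '\0'; ++src) {
        if (isalpha((unsigned char)*src))
            dst[n++] = (char)tolower((unsigned char)*src);
    }
    dst[n] = '\0';
    return n;
}

/* Shared core: out[i] = (in[i] + sign * key[i mod d]) mod 26 on codes a=0..z=25.
   sign = +1 encrypts, sign = -1 decrypts. Input is assumed to be 'a'..'z' only. */
static void vigenere_apply(const char *in, char *out, const char *key, int sign)
{
    size_t d = strlen(key);
    size_t i;
    assert(d > 0);
    for (i = 0; in[i] != '\0'; ++i) {
        int k = key[i % d] - 'a';
        int x = in[i] - 'a';
        out[i] = (char)('a' + (x + sign * k + 26) % 26);
    }
    out[i] = '\0';
}

void vigenere_encrypt(const char *plain, char *cipher, const char *key)
{
    vigenere_apply(plain, cipher, key, +1);
}

void vigenere_decrypt(const char *cipher, char *plain, const char *key)
{
    vigenere_apply(cipher, plain, key, -1);
}
```

With the key "joy", encrypting "nothingisto" this way gives "wcrqwlpwqcc", which agrees with the worked example in the slides.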
The plaintext after preprocessing (only characters from the character set are kept)

differentialprivacyisthestateoftheartgoalfortheproblemofprivacypreservingdatarelease
andprivacypreservingdataminingexistingtechniquesusingdifferentialprivacyhoweverca
nnoteffectivelyhandlethepublicationofhighdimensionaldatainparticularwhentheinputd
atasetcontainsalargenumberofattributesexistingmethodsincurhighercomputingcomple
xityandlowerinformationtonoiseratiowhichrendersthepublisheddatanexttouselessthisp
roposalaimstoreducecomputingcomplexityandsignaltonoiseratiothestartingpointistoap
proximatethefulldistributionofhighdimensionaldatasetwithasetoflowdimensionalmargi
naldistributionsviaoptimizingscorefunctionandreducingsensitivityinwhichgenerationof
noisyconditionaldistributionswithdifferentialprivacyiscomputedinasetoflowdimensiona
lsubspacesandthenthesampletuplesfromthenoisyapproximationdistributionareusedto
generateandreleasethesyntheticdatasetsomecrucialscienceproblemswouldbeinvestiga
tedbelowiconstructingalowkdegreebayesiannetworkoverthehighdimensionaldatasetvi
aexponentialmechanismindifferentialprivacywherethescorefunctionisoptimizedtoredu
cethesensitivityusingmutualinformationequivalenceclassesinmaximumjointdistributio
nanddynamicprogrammingiistudyingthealgorithmtocomputeasetofnoisyconditionaldis
tributionsfromjointdistributionsinthesubspaceofbayesiannetworkviathelaplacemechan
ismofdifferentialprivacyiiiexploringhowtogeneratesyntheticdatafromthedifferentiallypr
ivatebayesiannetworkandconditionaldistributionswithoutexplicitlymaterializingthenois
yglobaldistributiontheproposedsolutionmayhavetheoreticalandtechnicalsignificancefo
rsyntheticdatagenerationwithdifferentialprivacyonbusinessprospects

The ciphertext after Vigenère encryption

Encryption key: key=infosec

lvktwvgvgnodttqifqqmubujglevmbkhziczglcsphweyvwttwoqseshxenjsgaxejgwvxqalrsxczrqsswgiaid
jmxipddjiumeawfkfigfaarkvtjlawvqalhwgjvvviwwwavsuvmhnrwsfxkiyufazcklmcoixmehofrqbrktwg
vqijzqlcvqqsllgxhgzagcbvtbgjjqtmraqgvfncfenlnyoarrieywuyniebvwrvprnbhyvlnyokivkbshsmpanqo
jkgvhrpwvqnnyhjmdcgjgwbkagnbyqgbutrkmpkhwvakjmehcetwbvsuusoxyjlaxaiaizgagzvstgvoigncf
xqvbngwvcbvtkzmepejbvitagmshydtvxvwhfigfbwbvbbzgwpgafyvawrzbuckenivrglstmqzqwgquczha
rikbrddizqgdofhuqtsodxqvbngwvcbvthziubnwharixbnblmubbfdhvqfvrolivprkidpfqfyfafwbvtbgjjqt
mraqgvfncfenlnyokivevyvswgbbkzgafqzjbkmqvnqasviqafzvmubenpmxkwaxjaeqxgnaadkvtxqgvgnh
sqlmqvnsrjifcpnbywgvfnhazkblnbolkkulsfitigncfshvbngqgqvqnhaspiyiwkxtqozhaspajnhzhknsjfwrv
qnqdjmxipdwkgquczhwhkvnxslshtbbraqgvfncfenahggheemffbvxjmayvwwcucqslyrtrxtjsobujbgmu
gnudjszqzfhasplvxhjmdcgncfetmhxsvxqorssjevmnsrjinmnxsllgalshzivqpioleumgxceiezhhwspukvjbu
irzbgzwquebzzvfgqaaskxkonysvfgtbbwuspagwiuxkvtfzgamlrlfwidiljgaepvrykgvmwijfllgpvlvvmomax
wgrctqfhswgbinowbrwajblmctzjqzepqfrwfhknsjfwrvqnqdjmxipdkzitmgmskgqzrkifgvqbswksrbvrwr
ifbbwsvyemgmskipavywnmvghxwfkocgzodmpnbwasxkwajemmxiyjbuietnxgwwkvzflaqwuwtwfxfqf
yfafwbvtbsrfllsoemexetujeouvsuamubhimaribujodkqzvyvexqkbrdmxgifjhgjpwvxmusplvywgrctqng
lvkjhywgrunetabskvgiwkxtqozhaspavshziucoxdsggwsgoqiuqnsbwxywepjaevprqohpckrrsulcvvxagjf
qsksjipbvfzhvkdnhmamkmkuzgvkvtmcoxqorssjevmfdbllgbvhrsxcnetallglvktwvgvgnodpaxenjsxgjnd
skmcvajhostsnsrusplvywgrctqnglvkjhywgruevyvgyvmkuzagkbydasxgzvfzadkvtyvwrqqfdudsdiyiwkx
tqozhaspbujdjsrwfjrksncgncfqcgufjwxjmbwslmeiyfbvxgkuswuenavlbajkknsqwjqzfdbllgbvhrsxcorss
jevqbskaxjlvktwvgvgnodttqifqqspjhxwfiuacwcktgkgxvzlj

Breaking the Vigenère key: a ciphertext-only attack

Concept: the index of coincidence and its unbiased estimate

• Index of coincidence: suppose a language consists of n letters and letter i occurs with probability $p_i$ ($1 \le i \le n$). The index of coincidence, written IC, is the probability that two randomly chosen letters are the same:

  $$IC = \sum_{i=1}^{n} p_i^2$$

• In practice IC is approximated by its unbiased estimate IC', where $x_i$ is the number of occurrences of letter i, L is the length of the text, and n is the number of letters in the language:

  $$IC' = \frac{\sum_{i=1}^{n} x_i (x_i - 1)}{L (L - 1)}$$

Three key properties of IC'

1. For a random text over the English alphabet, IC' is approximately 0.038 (close to 1/26).
2. For a meaningful English text, IC' is approximately 0.065.
3. Shift encryption does not change IC': a shift only relabels the letters, so the ciphertext has the same multiset of letter counts as the plaintext.

These three conclusions are very important. They can be checked with the experiments below.

Example 1: a random English-letter plaintext and its IC'
The IC' of the random plaintext is 0.0388. After shift encryption with key = 17, the IC' of the resulting ciphertext is also 0.0388.

Example 2: a meaningful English text
• The text is the original plaintext given earlier ("Differential Privacy is the state-of-the-art goal … on business prospects.").
Its unbiased estimate of the index of coincidence is IC' = 0.0659.

Suppose Vigenère encryption is applied to meaningful English text. How, then, can a ciphertext produced by this polyalphabetic substitution be broken?
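Before turning to the attack itself, the unbiased estimate IC' and its invariance under shift encryption can be checked with a short C sketch. It is only an illustration under stated assumptions: the names `ic_estimate` and `shift_encrypt` are hypothetical, the input is assumed to hold only lowercase letters 'a' to 'z', and the shift key is assumed to lie in 0..25.

```c
#include <assert.h>
#include <stddef.h>

/* Unbiased estimate IC' = sum_i x_i (x_i - 1) / (L (L - 1)),
   computed over the first len letters of text (len must be at least 2). */
double ic_estimate(const char *text, size_t len)
{
    long count[26] = {0};
    long sum = 0;
    assert(len >= 2);
    for (size_t i = 0; i < len; ++i)
        count[text[i] - 'a']++;
    for (int c = 0; c < 26; ++c)
        sum += count[c] * (count[c] - 1);
    return (double)sum / ((double)len * (double)(len - 1));
}

/* Shift (Caesar) encryption: every letter is moved forward by key positions.
   out must have room for len + 1 bytes. */
void shift_encrypt(const char *in, char *out, size_t len, int key)
{
    assert(key >= 0 && key < 26);
    for (size_t i = 0; i < len; ++i)
        out[i] = (char)('a' + (in[i] - 'a' + key) % 26);
    out[len] = '\0';
}
```

Because a shift only permutes the 26 counters, calling `ic_estimate` on a text and on its shifted version returns the same value, which is property 3.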
(Ciphertext-only attack)
Step 1: estimate the key length of the Vigenère cipher.
Step 2: then compute each character of the key.

The attack is demonstrated on the preprocessed plaintext and on the ciphertext (key=infosec) shown earlier.

Step 1: estimating the key length

(1) Split the ciphertext into 2 substrings and compute the average of their IC' values;
(2) Split the ciphertext into 3 substrings and compute the average of their IC' values;
……
(n−1) Split the ciphertext into n substrings and compute the average of their IC' values.

When the ciphertext is split into d substrings, substring j consists of every d-th letter starting from position j. If the average IC' for some d is approximately 0.065, then the key length of the Vigenère cipher is d.

Example: split the ciphertext into 2 substrings. The average of the unbiased IC estimates of the 2 substrings is 0.0419.

Example: split the ciphertext into 3 substrings. The average of the unbiased IC estimates of the 3 substrings is 0.0419.

Example: continuing in the same way, split the ciphertext into 7 substrings.

Substring 1:
lvqbmzwwxxqziimivqvanikmbqvxbqvliiplkavncabkmbxizivbpatibazimukqqvbbxbfpqbqvlebqv
qbwxvnvcvbkivviqanqiuvtvammutbgqlcmommaqmzkzeqotavlivwpmtbwtqnqimzqbbmagcn
witvuqblxubbzkiwltjnvqacwqwpkvqbdmvombnlvxjvsltjembzvqiqbwcgmikakzboqlvqjak

Substring 2:
vgiubgeoeearapegtavvryleriqhvtfneernbnhngguhevyavgbvegvgbfbvqcbgtbvnbbvrfvtfnvbzna
eagthnpflugbqyojsnpcnbfhfacrunzvghrnnlpghvbbanbgtrlrivaqiazfsnpgrbvbgvhgbaynzwfvlev
huvbfvvqhegovosnerrvsvnktrfvevgenanvqhvkyvtfyoufgubyuvnfvrbvgihcg

Substring 3:
knfjklyqnjlqidafjlvswumhkjqgtmnyybnysqryjntwhsjisnntjmxfzyurzzrdsntwnfrkytmnyykjqfnxn
xssnnnlnnniznjqdzxbngfyqxjufxnxssxsixhjgzaybwfljyjlxfnjjrjqdmksrwmyxzwjjxftytstsijyrjxynyt
izsxgspqrxkfhumsdhtknndjsynyyudfydizjjnfwfslsdhssknfxwx

……

Substring 7:
gtuvchthaxcgxufkvjwhkcxqvcgcjgnrnvvvpgqdkgpjwoagoqcetdfvgrntqizuqcuiuqvfwjgnvgfqiuk
qkgqfgkkthqptpkvxqkhgnejcrouzpdtqvngvueurugkgpkmdpmgocgrcpkvxtqvrfepvopkxekwfwf
eouiqqgppckuktpuguyvccfpkkkqvgcggagctpckuvkgkqdtprncjegnkqgcvjgtpug

The average of the unbiased IC estimates of the 7 substrings is 0.0657.

Splitting the ciphertext into several substrings and averaging the unbiased IC estimates (each cell gives substring length / IC'):

d   Substring 1     Substring 2     Substring 3     Substring 4     Substring 5     Substring 6     Substring 7     Average IC'
1   1609 / 0.0419                                                                                                   0.0419
2   805 / 0.0427    804 / 0.0411                                                                                    0.0419
3   537 / 0.0417    536 / 0.0417    536 / 0.0424                                                                    0.0419
4   403 / 0.0425    402 / 0.0398    402 / 0.0424    402 / 0.0427                                                    0.0419
5   322 / 0.0417    322 / 0.0414    322 / 0.0418    322 / 0.0413    321 / 0.0411                                    0.0415
6   269 / 0.0402    268 / 0.0397    268 / 0.0441    268 / 0.0432    268 / 0.0419    268 / 0.0416                    0.0418
7   230 / 0.0674    230 / 0.0677    230 / 0.0621    230 / 0.0584    230 / 0.0744    230 / 0.0666    229 / 0.0634    0.0657
8   …               …               …               …               …               …               …               0.0422

A meaningful English plaintext has IC ≈ 0.065, and shift encryption does not change the IC, so each correctly separated ciphertext substring also has IC ≈ 0.065. The table shows this happens only for 7 substrings, so the key length is d = 7.
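Step 1 can be sketched in C as below. This is a hedged illustration, not the slides' own program: the function names are hypothetical, substrings are taken with 0-based offsets (substring j holds positions j, j + d, j + 2d, …), and the acceptance threshold passed to `guess_key_length` is an assumption that the caller must choose (a value somewhat below 0.065, such as 0.06, would separate the rows of the table above).

```c
#include <assert.h>
#include <stddef.h>

/* IC' of the substring made of text[start], text[start + step], ... */
double ic_strided(const char *text, size_t len, size_t start, size_t step)
{
    long count[26] = {0};
    long n = 0, sum = 0;
    assert(step > 0);
    for (size_t i = start; i < len; i += step) {
        count[text[i] - 'a']++;
        n++;
    }
    if (n < 2)
        return 0.0;                 /* IC' is undefined for fewer than 2 letters */
    for (int c = 0; c < 26; ++c)
        sum += count[c] * (count[c] - 1);
    return (double)sum / ((double)n * (double)(n - 1));
}

/* Average IC' over the d interleaved substrings of the ciphertext. */
double average_ic(const char *text, size_t len, size_t d)
{
    double total = 0.0;
    assert(d > 0);
    for (size_t j = 0; j < d; ++j)
        total += ic_strided(text, len, j, d);
    return total / (double)d;
}

/* Smallest d in 1..max_d whose average IC' reaches the threshold, or 0 if none. */
size_t guess_key_length(const char *text, size_t len, size_t max_d, double threshold)
{
    for (size_t d = 1; d <= max_d; ++d) {
        if (average_ic(text, len, d) >= threshold)
            return d;
    }
    return 0;
}
```

Returning the smallest qualifying d is a design choice: multiples of the true key length also give a high average IC', so scanning upward and stopping at the first hit avoids reporting 14 instead of 7.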
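Step 2: computing each character of the key

(1) By the Vigenère encryption rule, all ciphertext letters within one substring were obtained from plaintext letters by the same shift: the i-th substring (i = 1…d) was shift-encrypted with the i-th character of the key. The key space of a shift cipher has only 26 elements. So for each ciphertext substring, try all 26 shifts as decryptions and compute the quasi index of coincidence of the result each time. The shift (code) that gives the highest quasi index of coincidence is the letter of the Vigenère key that corresponds to this substring.
(2) Repeating step (1) d times, once per substring, yields all letters of the key.

Quasi index of coincidence

• Suppose a language consists of n letters and letter i has statistical probability $p_i$ (i = 1…n). Let $f_{i,j}$ be the number of occurrences of letter i in ciphertext substring $C_j$ (j = 1…d) after the trial shift has been undone, and let $n_j$ be the length of $C_j$. The quasi index of coincidence of the j-th substring is defined as:

  $$M_j = \sum_{i=1}^{n} p_i \cdot \frac{f_{i,j}}{n_j}, \quad j = 1, \dots, d$$

[Figure: statistical probability $p_i$ of each letter in plaintext]

Example: trying all 26 shifts to decrypt ciphertext substring 3

Substring 3:
knfjklyqnjlqidafjlvswumhkjqgtmnyybnysqryjntwhsjisnntjmxfzyurzzrdsntwnf
rkytmnyykjqfnxnxssnnnlnnniznjqdzxbngfyqxjufxnxssxsixhjgzaybwfljyjlxfnjjrj
qdmksrwmyxzwjjxftytstsijyrjxynytizsxgspqrxkfhumsdhtknndjsynyyudfydizjjn
fwfslsdhssknfxwx

Quasi index of coincidence of substring 3 for each trial shift:

Shift    Quasi-IC     Shift    Quasi-IC
1 (b)    0.0387       14       0.0326
2 (c)    0.0325       15       0.0348
3 (d)    0.0324       16       0.0416
4 (e)    0.0368       17       0.0392
5 (f)    0.0615       18       0.0405
6        0.0433       19       0.0361
7        0.0332       20       0.0461
8        0.0279       21       0.0386
9        0.0468       22       0.0356
10       0.0384       23       0.0313
11       0.0365       24       0.0364
12       0.0356       25       0.0429
13       0.0368       26 (a)   0.0340

Among the 26 quasi indices of coincidence computed for substring 3, the largest is 0.0615 at shift 5, which is the code of "f". The third letter of the Vigenère key is therefore "f".

Proceeding in the same way for all 7 ciphertext substrings gives the Vigenère key "infosec".

Programming task requirements

• The programming language is C.
• Implement Vigenère encryption and decryption for any meaningful English text file (*.txt), where the key is an arbitrary input string.
• Without knowing the key, break a ciphertext file produced by Vigenère encryption: recover the key and generate the corresponding plaintext.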
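For the key-recovery step described above, a minimal C sketch might look like the following. Several things here are assumptions rather than part of the slides: the function names `quasi_ic` and `recover_key` are hypothetical, substring offsets are 0-based, and the table `ENGLISH_FREQ` holds commonly quoted approximate English letter frequencies, because the slides' own table of $p_i$ values did not survive extraction.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate English letter frequencies for a..z (assumed values,
   not the table from the slides). */
static const double ENGLISH_FREQ[26] = {
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074
};

/* Quasi index of coincidence M_j of substring j (0-based, stride d)
   after undoing a trial shift: sum_i p_i * f_i / n_j. */
double quasi_ic(const char *cipher, size_t len, size_t j, size_t d, int shift)
{
    long count[26] = {0};
    long n = 0;
    double m = 0.0;
    assert(d > 0 && shift >= 0 && shift < 26);
    for (size_t i = j; i < len; i += d) {
        count[(cipher[i] - 'a' - shift + 26) % 26]++;
        n++;
    }
    if (n == 0)
        return 0.0;
    for (int c = 0; c < 26; ++c)
        m += ENGLISH_FREQ[c] * (double)count[c] / (double)n;
    return m;
}

/* For each of the d substrings, keep the shift with the highest quasi-IC.
   key must have room for d + 1 bytes. */
void recover_key(const char *cipher, size_t len, size_t d, char *key)
{
    for (size_t j = 0; j < d; ++j) {
        int best_shift = 0;
        double best = -1.0;
        for (int s = 0; s < 26; ++s) {
            double m = quasi_ic(cipher, len, j, d, s);
            if (m > best) {
                best = m;
                best_shift = s;
            }
        }
        key[j] = (char)('a' + best_shift);
    }
    key[d] = '\0';
}
```

Once the key is recovered, decrypting with it (subtracting each key code modulo 26) produces the plaintext, which completes the third item of the task. On short ciphertexts the highest quasi-IC can point to a wrong letter, so the result is best checked by reading the decrypted text.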