Java: How to find string patterns in a large binary file? Tags: string, pattern, binary file, Java

Shared by user 褪了色的灿烂ヽ.

I'm trying to write a program that reads a VERY LARGE binary file, finds the occurrences of 2 different strings, and prints the indexes at which the patterns match. For the sake of example, let's assume the character sequences are [H,e,l,l,o] and [H,e,l,l,o, ,W,o,r,l,d].

I was able to code this for small binary files by reading each character as a byte and saving it in an ArrayList. Then, starting from the beginning of the ArrayList, I compared the stored bytes (byte[] data) with the byte[] pattern.

I need to find a way to do the same WITHOUT loading the entire binary file into memory for the comparison. That means I should be able to compare while reading each character (I should not keep the entire binary file in memory). Assume the binary file only contains characters.

Any suggestions on how this can be achieved? Thank you all in advance.

Recommended answer

Google "finite state machine".

Or read the file one byte at a time. If the byte doesn't match the first character of the search term, go on to the next byte. If it does match, you are now looking for the next character in the sequence, i.e., your state has gone from 0 to 1. If a later byte doesn't match, the state drops back and you start looking for the beginning of the pattern again. If your state equals (or passes) the length of the search string, you found it!

Implementation/debugging is left to the reader.
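
For what it's worth, here is one way the state idea above could be sketched in Java. This is not the answerer's code. The class name StreamPatternSearch, the Matcher helper, and the method names are made up for illustration. It also swaps in a Knuth-Morris-Pratt fallback table, which the answer does not mention, so that a mismatch partway through a pattern drops the state back to the right value instead of always to 0. It assumes the patterns are non-empty and compared byte for byte (ASCII here). Only one byte per pattern state is tracked, so the file is never held in memory beyond the stream's buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class StreamPatternSearch {

    /** One pattern plus its current state (how many pattern bytes are matched so far). */
    static final class Matcher {
        final byte[] pattern;
        final int[] fallback;              // KMP failure table for this pattern
        final List<Long> matches = new ArrayList<>();
        int state = 0;

        Matcher(byte[] pattern) {
            if (pattern.length == 0) {
                throw new IllegalArgumentException("pattern must not be empty");
            }
            this.pattern = pattern;
            this.fallback = buildFallback(pattern);
        }

        /** Feeds one byte; position is the zero-based offset of that byte in the stream. */
        void feed(byte b, long position) {
            // On a mismatch, fall back to the longest prefix that is still consistent.
            while (state > 0 && pattern[state] != b) {
                state = fallback[state - 1];
            }
            if (pattern[state] == b) {
                state++;
            }
            if (state == pattern.length) {
                matches.add(position - pattern.length + 1); // start offset of the match
                state = fallback[state - 1];                 // keep going, allow overlaps
            }
        }
    }

    /** fallback[i] = length of the longest proper prefix of p[0..i] that is also its suffix. */
    static int[] buildFallback(byte[] p) {
        int[] f = new int[p.length];
        int k = 0;
        for (int i = 1; i < p.length; i++) {
            while (k > 0 && p[i] != p[k]) {
                k = f[k - 1];
            }
            if (p[i] == p[k]) {
                k++;
            }
            f[i] = k;
        }
        return f;
    }

    /** Reads the stream once and returns, per pattern, the start offsets of all matches. */
    static List<List<Long>> search(InputStream in, byte[]... patterns) throws IOException {
        List<Matcher> matchers = new ArrayList<>();
        for (byte[] p : patterns) {
            matchers.add(new Matcher(p));
        }
        InputStream buffered = new BufferedInputStream(in);
        long position = 0;
        int b;
        while ((b = buffered.read()) != -1) {
            for (Matcher m : matchers) {
                m.feed((byte) b, position);
            }
            position++;
        }
        List<List<Long>> result = new ArrayList<>();
        for (Matcher m : matchers) {
            result.add(m.matches);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] hello = "Hello".getBytes(StandardCharsets.US_ASCII);
        byte[] helloWorld = "Hello World".getBytes(StandardCharsets.US_ASCII);
        // With a file path argument, stream that file; otherwise use a small in-memory demo.
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : new ByteArrayInputStream(
                        "xxHello World, Hello!".getBytes(StandardCharsets.US_ASCII))) {
            List<List<Long>> found = search(in, hello, helloWorld);
            System.out.println("Hello at: " + found.get(0));
            System.out.println("Hello World at: " + found.get(1));
        }
    }
}
```

Run without arguments, the demo input should report "Hello" at offsets 2 and 15 and "Hello World" at offset 2. For the two patterns in the question, a plain "reset to 0, then recheck the current byte" would also work, since 'H' appears only at the start of each pattern. The fallback table matters for patterns that overlap themselves, such as "aab" searched in "aaab".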
