We are currently processing a huge amount of sensitive corporate data for a Fortune 500 company as the first phase of a project. You have to be far more careful with this kind of data processing than with a standard programming effort. Here are a few tips you may find useful when writing programs that process sensitive data in bulk. Get your best (wo)men on the job.

Institute a policy of random manual checks. It may not be feasible to manually verify all or even most of the data. However, you must rigorously check a significant random subset of the data from every batch. You will be surprised how much you can discover about the data, as well as any errors, through this simple step.
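Here is a minimal sketch of how the random subset could be drawn. The class name, method name and record ids are made up for illustration, and the fixed seed is my own assumption: it makes the selection reproducible, so two reviewers can look at exactly the same records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ManualReviewSampler {

    /**
     * Returns sampleSize records chosen uniformly at random from the batch.
     * The same batch and seed always produce the same selection.
     */
    public static <T> List<T> pickForReview(List<T> batch, int sampleSize, long seed) {
        if (sampleSize < 0 || sampleSize > batch.size()) {
            throw new IllegalArgumentException(
                "sampleSize must be between 0 and " + batch.size() + ", got " + sampleSize);
        }
        // Work on a copy so the real batch is never reordered.
        List<T> shuffled = new ArrayList<>(batch);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, sampleSize));
    }

    public static void main(String[] args) {
        // Hypothetical record ids standing in for one batch.
        List<String> batch = List.of("rec-001", "rec-002", "rec-003", "rec-004", "rec-005");
        System.out.println(pickForReview(batch, 2, 42L));
    }
}
```

Shuffling a copy and taking the first few elements is not the fastest way to sample, but it is easy to review, which matters more here.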

Program safely, not optimally. You must program safely; this is not the time to think about optimizations. Data accuracy is your primary concern. Performance isn't normally an issue. Name variables clearly and accurately to help with code review.

Write down your logic in pseudo-code. Review your own code at least twice and get at least one other person to review it in detail. It is very easy to miss little details while coding. Finding such errors is easy in normal application development. Finding little logical errors in a huge amount of data is next to impossible.
Once you are done, thoroughly review your final code with one or more senior programmers.

Test extensively with a small subset of the data. Repeat the process with two or more such subsets.

Get your data experts to manually review the generated data. They can spot data smells faster than anyone else.

I cannot overstate the importance of writing quality unit tests for such projects. However, you should also write tests to independently verify the generated or uploaded data. Get input for such tests from the domain experts. Do not compromise at all on testing.
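One way such an independent check might look is a reconciliation of the source data against the generated output. This is only a sketch under my own assumptions: both sides are reduced to a hypothetical id-to-amount map, and the class and message formats are invented for the example. Your domain experts will know which fields and totals are actually worth reconciling.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OutputReconciliation {

    /**
     * Compares source and generated data, both given as id -> amount maps.
     * Returns one message per discrepancy; an empty list means they agree.
     */
    public static List<String> findDiscrepancies(Map<String, BigDecimal> source,
                                                 Map<String, BigDecimal> generated) {
        List<String> problems = new ArrayList<>();

        // Records that were in the source but never reached the output.
        Set<String> missing = new TreeSet<>(source.keySet());
        missing.removeAll(generated.keySet());
        for (String id : missing) {
            problems.add("missing from output: " + id);
        }

        // Records in the output that have no counterpart in the source.
        Set<String> unexpected = new TreeSet<>(generated.keySet());
        unexpected.removeAll(source.keySet());
        for (String id : unexpected) {
            problems.add("unexpected in output: " + id);
        }

        // Records present on both sides whose amounts differ.
        for (String id : new TreeSet<>(source.keySet())) {
            BigDecimal expected = source.get(id);
            BigDecimal actual = generated.get(id);
            // compareTo ignores scale, so 10.0 and 10.00 count as equal.
            if (actual != null && expected.compareTo(actual) != 0) {
                problems.add("amount mismatch for " + id
                    + ": source=" + expected + ", output=" + actual);
            }
        }
        return problems;
    }
}
```

The point is that this check reads the output on its own and does not reuse any of the processing code, so a bug in the pipeline cannot hide itself in the verification.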

Use a strongly typed language like Java.
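To illustrate what the type system can buy you, here is a small sketch in which raw strings are wrapped in tiny value types before anything else touches them. The AccountId and Amount types are hypothetical, not from our project. The idea is that swapped arguments fail at compile time and malformed values fail loudly at parse time instead of slipping into the output.

```java
import java.math.BigDecimal;
import java.util.Objects;

public class TypedRecords {

    /** Hypothetical wrapper so an account id cannot be confused with any other String. */
    public static final class AccountId {
        private final String value;

        public AccountId(String value) {
            Objects.requireNonNull(value, "account id");
            if (value.trim().isEmpty()) {
                throw new IllegalArgumentException("account id must not be blank");
            }
            this.value = value;
        }

        public String value() { return value; }
    }

    /** Hypothetical exact decimal amount; BigDecimal avoids floating-point rounding. */
    public static final class Amount {
        private final BigDecimal value;

        private Amount(BigDecimal value) { this.value = value; }

        public static Amount parse(String raw) {
            // BigDecimal throws NumberFormatException on malformed input such as "12,50".
            return new Amount(new BigDecimal(raw.trim()));
        }

        public BigDecimal value() { return value; }
    }

    /** Passing the arguments in the wrong order here is a compile error, not a data bug. */
    public static String describe(AccountId id, Amount amount) {
        return id.value() + " -> " + amount.value().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new AccountId("ACC-1"), Amount.parse("12.50")));
    }
}
```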

Last but not least, you should get your most experienced developers on the job. Bulk data processing and mining is a different ball game from standard application development.