We are currently processing a huge amount of sensitive corporate data for a Fortune 500 company as the first phase of a project. You have to be far more careful with this kind of data processing than with any standard programming effort. Here are a few tips you may find useful when programming to process sensitive data in bulk.

Get your best (wo)men on the job.

Institute a policy of random manual checks. It may not be feasible to manually verify all or even most of the data. However, you must rigorously check a significant random subset of the data from every batch. You will be surprised how much you can discover about the data, as well as any errors, through this simple step.
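
As a rough illustration of drawing that random subset, here is a minimal sketch in Java. The class name `BatchSampler`, the method `sampleForManualReview`, and the idea of passing in a seed are my own assumptions for the example, not part of any particular toolkit. The seed is there so that a reviewer can re-draw exactly the same sample later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: picks records from one batch for manual review.
final class BatchSampler {
    private BatchSampler() {}

    /**
     * Returns up to sampleSize records chosen uniformly at random from the batch.
     * The same seed always yields the same sample for the same batch.
     */
    static <T> List<T> sampleForManualReview(List<T> batchRecords, int sampleSize, long seed) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must not be negative");
        }
        // Work on a copy so the original batch order is never disturbed.
        List<T> shuffledCopy = new ArrayList<>(batchRecords);
        Collections.shuffle(shuffledCopy, new Random(seed));
        int actualSize = Math.min(sampleSize, shuffledCopy.size());
        return new ArrayList<>(shuffledCopy.subList(0, actualSize));
    }
}
```

Logging the seed next to the batch identifier makes it possible to reproduce a sample if a reviewer's finding is questioned later.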

Program safely, not optimally. You must program safely; this is not the time to think about optimizations. Data accuracy is your primary concern. Performance isn’t normally an issue. Name your variables clearly and accurately to help with code review.

Write down your logic in pseudo-code. Review your own code at least twice and get at least one other person to review it in detail. It is very easy to miss little details while coding. Finding such errors is easy in normal application development. Finding little logical errors in a huge amount of data is next to impossible.
Once you are done, thoroughly review your final code with one or more senior programmers.

Test extensively with a small subset of the data. Repeat the process with two or more such subsets.

Get your data experts to manually review the generated data. They can spot a smell faster than anyone else.

I cannot overstress the importance of writing quality unit tests for such projects. However, you should also write tests that independently verify the generated or uploaded data. Get input for such tests from the domain experts. Do not compromise at all on testing.
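
One possible shape for such an independent check is a reconciliation between what was read from the source and what ended up in the target. The sketch below is only an illustration: `LoadReconciler`, `findDiscrepancies`, and the choice to represent each row as a string keyed by a non-null record id are assumptions I made for the example. Real rules for what counts as a mismatch should come from your domain experts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical checker: compares source rows against loaded rows by record id.
final class LoadReconciler {
    private LoadReconciler() {}

    /** Returns human-readable discrepancies; an empty list means both sides agree. */
    static List<String> findDiscrepancies(Map<String, String> sourceRowsById,
                                          Map<String, String> loadedRowsById) {
        List<String> discrepancies = new ArrayList<>();

        // Iterate in sorted id order so the report is stable between runs.
        for (Map.Entry<String, String> sourceEntry : new TreeMap<>(sourceRowsById).entrySet()) {
            String recordId = sourceEntry.getKey();
            if (!loadedRowsById.containsKey(recordId)) {
                discrepancies.add("missing in target: " + recordId);
            } else if (!Objects.equals(sourceEntry.getValue(), loadedRowsById.get(recordId))) {
                discrepancies.add("value mismatch: " + recordId);
            }
        }

        // Rows that appear in the target but were never in the source.
        for (String loadedId : new TreeMap<>(loadedRowsById).keySet()) {
            if (!sourceRowsById.containsKey(loadedId)) {
                discrepancies.add("unexpected in target: " + loadedId);
            }
        }
        return discrepancies;
    }
}
```

The point of keeping this separate from the processing code is that it shares none of its logic, so a bug in the loader is unlikely to be repeated in the check.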

Use a strongly typed language like Java.
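
To show what the type system can buy you, here is a small sketch that wraps raw values in their own types, so that mixing up two fields becomes a compile error instead of a silent data error. The names `CustomerId`, `AmountInCents`, and `PaymentRecord` are invented for this example, and the rules they enforce (non-blank ids, non-negative amounts) are assumptions rather than requirements from our project.

```java
import java.util.Objects;

// Hypothetical value type: a customer id that can never be null or blank.
final class CustomerId {
    private final String value;

    CustomerId(String value) {
        Objects.requireNonNull(value, "value");
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException("customer id must not be blank");
        }
        this.value = value;
    }

    String value() { return value; }
}

// Hypothetical value type: money held as whole cents to avoid floating-point drift.
final class AmountInCents {
    private final long cents;

    AmountInCents(long cents) {
        if (cents < 0) {
            throw new IllegalArgumentException("amount must not be negative");
        }
        this.cents = cents;
    }

    long cents() { return cents; }
}

// A record built only from validated parts; passing the arguments in the
// wrong order does not compile because the two types are not interchangeable.
final class PaymentRecord {
    private final CustomerId customerId;
    private final AmountInCents amount;

    PaymentRecord(CustomerId customerId, AmountInCents amount) {
        this.customerId = Objects.requireNonNull(customerId, "customerId");
        this.amount = Objects.requireNonNull(amount, "amount");
    }

    CustomerId customerId() { return customerId; }
    AmountInCents amount() { return amount; }
}
```

For instance, `new PaymentRecord(new CustomerId("C-1001"), new AmountInCents(12550L))` compiles, while swapping the two arguments is rejected by the compiler.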

Last but not least, you should get your most experienced developers on the job. Bulk data processing and mining is a different ball game from standard application development.