
Author: Public Record Office Victoria

Emails are a vital part of doing business and are considered public records under the Public Records Act 1973. Emails enable the exchange of ideas and the enactment of decisions, and support collaboration across an increasingly dispersed workforce. In government, emails also provide evidence essential for accountability and need to be preserved as public records into the future.
 

The problem

Emails should not be disposed of until their value and content are known, but in the course of their work, public sector employees can generate hundreds of thousands of emails, including emails that do not need to be captured. The large volume of emails involved in even a single email account can make it difficult to identify those for storage.

Over twenty years of routine backup has resulted in an unwieldy backlog of Victorian Government emails, including 67,000 tapes and 28 petabytes of content. Access and retrieval of emails for the purpose of analysis and evidence of decisions can be difficult, expensive and time-consuming. This compromises the Government’s reputation for transparency and accountability.

 

The proof of concept

We've been working with the Victorian Government technology provider, CenITex, on a project to make the Lotus Notes email stores more accessible and better managed. The Lotus Notes Proof of Concept (PoC) is the first step.

The PoC involved exploring the use of an eDiscovery tool to review and facilitate disposal of large volumes of emails, including:
• An initial assessment to quantify and qualify a sample email data set
• Identifying duplicates within the data set
• Identifying low value versus high value records within the data set
• Assigning contextual information to the de-duplicated set
• A manual review of results to determine level of accuracy.

In the sample of 4.6 million emails, we found that 43% were duplicates and 7% were of low value.

 

two pie charts displaying the 43% duplication of emails and 7% low value emails

 

How we did it

Our goal was to reduce the volume of the email backlog in an authorised way, which in the Victorian Government means in line with Retention and Disposal Authorities (RDAs).

We used the tool to identify duplicate emails within the sample. To identify low value emails among those remaining, we reviewed a list of email domains to find those likely to produce irrelevant, non-business-related emails. The top results, which included common subscription emails and Google Alerts, were selected and saved as filters. The presence of Fwd: in the subject line was also used as a filter.
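The de-duplication and filtering steps can be illustrated with a minimal Python sketch. This is not the eDiscovery tool's method, which is not described here. It assumes a duplicate means an identical sender, subject and body, and the Email class, the LOW_VALUE_DOMAINS list and the example addresses are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str
    subject: str
    body: str


# Hypothetical filter list; the domains used in the PoC are not given in this article.
LOW_VALUE_DOMAINS = {"news.example.com", "alerts.example.com"}


def fingerprint(email: Email) -> str:
    """Hash normalised sender, subject and body (an assumed definition of 'duplicate')."""
    parts = (email.sender, email.subject, email.body)
    key = "\n".join(part.strip().lower() for part in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(emails):
    """Keep the first copy of each email; return (unique_emails, duplicate_count)."""
    seen, unique = set(), []
    for email in emails:
        digest = fingerprint(email)
        if digest not in seen:
            seen.add(digest)
            unique.append(email)
    return unique, len(emails) - len(unique)


def is_low_value(email: Email) -> bool:
    """Flag emails from a listed domain or with 'Fwd:' in the subject line."""
    domain = email.sender.rsplit("@", 1)[-1].strip().lower()
    return domain in LOW_VALUE_DOMAINS or "fwd:" in email.subject.lower()


sample = [
    Email("a@agency.example", "Budget approval", "Approved."),
    Email("a@agency.example", "Budget approval", "Approved."),
    Email("digest@news.example.com", "Weekly digest", "Top stories"),
    Email("b@agency.example", "Fwd: lunch menu", "See attached"),
]
unique, duplicate_count = deduplicate(sample)
candidates = [e for e in unique if is_low_value(e)]
```

In this sketch, flagged emails are only candidates. Any actual disposal would still need to be authorised under the relevant RDA and checked by manual review.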

Next we tried a second approach on the sample, searching the remaining emails for key search terms.
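A keyword search of this kind might look like the sketch below. The article does not list the search terms or say how a match was interpreted, so SEARCH_TERMS and the helper functions are hypothetical, and whole-word, case-insensitive matching is an assumption.

```python
import re

# Hypothetical search terms; the PoC's actual terms are not given in this article.
SEARCH_TERMS = ["contract", "ministerial briefing"]


def build_pattern(terms):
    """Compile one case-insensitive pattern that matches any term as a whole word or phrase."""
    escaped = (re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def matching_terms(text, pattern):
    """Return the distinct terms found in the text, lower-cased and sorted."""
    return sorted({match.group(0).lower() for match in pattern.finditer(text)})


pattern = build_pattern(SEARCH_TERMS)
hits = matching_terms("Please review the Contract before the ministerial briefing.", pattern)
misses = matching_terms("The contractor called about parking.", pattern)
```

Whole-word matching means "contractor" does not count as a hit for "contract". Whether that is desirable depends on the terms chosen.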

Using a third approach, we were able to apply additional contextual information to the emails, allowing them to be grouped by areas of responsibility within the organisation. This allowed us to assess and prioritise the emails to be kept long term.
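One way to picture this step is a lookup that tags each email with an area of responsibility and then counts emails per area. The sketch below is illustrative only: the AREA_BY_MAILBOX mapping, the field names and the addresses are hypothetical, and the article does not say how the tool derived this context.

```python
from collections import Counter

# Hypothetical lookup from mailbox address to area of responsibility.
AREA_BY_MAILBOX = {
    "finance@agency.example": "Finance",
    "policy@agency.example": "Policy",
}


def tag_area(email):
    """Return a copy of the email record with an 'area' metadata field added."""
    area = AREA_BY_MAILBOX.get(email["mailbox"].lower(), "Unassigned")
    return {**email, "area": area}


emails = [
    {"mailbox": "finance@agency.example", "subject": "Invoice run"},
    {"mailbox": "Policy@agency.example", "subject": "Draft submission"},
    {"mailbox": "finance@agency.example", "subject": "Audit query"},
    {"mailbox": "unknown@agency.example", "subject": "Hello"},
]
tagged = [tag_area(e) for e in emails]
counts = Counter(e["area"] for e in tagged)
```

Counting by area gives a quick view of where the bulk of the emails sit, which can help decide which groups to assess first.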
 

The findings

The eDiscovery tool was successful in allowing us to identify emails eligible for disposal, as well as to assess and prioritise the remaining emails, with between 98% and 100% accuracy. Up to 50% of the sample was identified for potential disposal. The tool also allowed us to apply additional metadata to every email in the set, making emails easier to identify at a high level and supporting future decision making around retention.

An eDiscovery tool may be used to assist agencies to reduce their email backlogs and unlock greater value from their email assets, though a larger sample of manual testing is recommended prior to implementing disposal. Note that an eDiscovery tool may be beyond the means of smaller agencies that nonetheless struggle with similar email backlog issues. An investigation into email backup for smaller agencies and potential testing of free, open source solutions are recommended.

 

For more information, download our proof of concept summary as a PDF below:

 

If you'd like further information about this project, feel free to contact David Brown, Assistant Director Government Services, david.brown@prov.vic.gov.au.