
{build} Hackathon & Incubator



© 2026 Government Technology Agency of Singapore | GovTech

EvalAI

Booth PR11

Your resourcing buddy that revolutionises the way proposals are evaluated: it summarises key information and flags gaps early, enabling well-informed, strategic and consistent evaluations of multiple proposals at your fingertips.

Summary

EvalAI was developed to address the challenges officers face when evaluating 7-12 proposals within a 1.5-week timeframe. Officers are required to read through each proposal, seek clarifications, and assess them thoroughly. However, proposals from agencies often vary significantly in format and content, making it difficult for officers to context-switch between them, especially given the tight deadlines. Additionally, different types of proposals require distinct evaluation criteria, adding to the complexity. Inconsistencies in evaluations can arise due to varying standards and experience levels among officers, potentially leading to oversights of critical information. These issues can result in inconsistent and incomplete evaluations, which may contribute to less informed decision-making.

Research and Approach

  1. We reached out to several ministries/agencies and found that we were not the only ones facing this problem and that no existing solutions were available; officers were manually reading and evaluating proposals based on their individual knowledge and experience.
  2. With strong support from our business user, we interviewed officers and observed how evaluations were processed, identifying patterns and pain points that helped us define and scope the problem.
  3. To ensure comprehensive user research and gain a deeper appreciation of the challenges and frustrations at each stage of the process, we collaborated with our business user to map out an officer's proposal evaluation journey.
  4. After synthesising our findings, we identified and tackled the most pressing challenge in the process: delivering well-informed evaluations of multiple proposals within a short timeframe.
  5. After developing the prototype, we tested it with 4 Resourcing Evaluators, with the following results:
    • Output was generally in line with their expectations
    • The rationale provided in the additional assessment was crucial, as evaluators needed to justify their support levels to their management
    • Satisfaction rating: 4/6, with feedback on improving tonality and accuracy
    • Likelihood of recommending EvalAI to other colleagues: 7/10

Solution Overview

We developed a GenAI solution with the following features:

  1. [Standardized Assessment Table] Consolidates key information into a standardized template that lets evaluators focus on the key aspects of each proposal.
  2. [Additional Assessment] Provides additional structured guidance to help evaluators refine their support level.
  3. [Chatbot using Agentic AI] An interactive chatbot that enables evaluators to:
    • Query proposal details and generate outputs
    • Update specific sections of the output directly
    • Identify weak justifications and receive suggested improvements based on evaluation guidelines
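As an illustrative sketch (not the actual EvalAI code), the standardized assessment table can be modelled as a fixed schema that the GenAI pipeline fills in per proposal, with empty required fields flagged as gaps before evaluation. All field names below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class AssessmentRow:
    """One row of a standardized assessment table (field names are illustrative)."""
    proposal_id: str
    objective: str
    requested_headcount: int
    justification: str = ""
    gaps: list = field(default_factory=list)

def flag_gaps(row, required=("objective", "justification")):
    """Flag empty required fields so evaluators see missing information early."""
    for name in required:
        if not getattr(row, name):
            row.gaps.append(f"missing: {name}")
    return row

# Example: a proposal with no stated objective is flagged before evaluation.
row = flag_gaps(AssessmentRow("P-01", "", 3, justification="Backfill for project X"))
print(row.gaps)  # ['missing: objective']
```

In the real system the structured extraction would be done by the LLM (e.g. via PydanticAI's typed outputs), but the gap-flagging step is plain validation over the resulting schema.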

Tech Stack

Category                 Product
Coding Stack             Python, Streamlit
GenAI Framework          LangChain, PydanticAI
LLM                      GovText (LLM-as-a-Service)
GenAI Coding Assistant   Codeium
Hosting                  ContainerStack
Guardrail                AI Guardian
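A guardrail layer such as AI Guardian typically sits between the application and the LLM, validating every model response before it reaches the evaluator. A minimal stdlib sketch of that wrapper pattern, where `generate` and `check` are stand-ins for the GovText call and the guardrail check (neither real API is used here):

```python
def with_guardrail(generate, check):
    """Wrap an LLM call so every output must pass a guardrail check (pattern sketch).

    `generate` stands in for the LLM call (e.g. GovText) and `check` for the
    guardrail (e.g. AI Guardian); both are hypothetical callables here.
    """
    def guarded(prompt):
        output = generate(prompt)
        if not check(output):
            raise ValueError("guardrail rejected model output")
        return output
    return guarded

# Toy usage: block any output containing an (assumed) sensitive marker.
fake_llm = lambda prompt: f"Assessment: {prompt}"
no_leaks = lambda text: "NRIC" not in text
safe_llm = with_guardrail(fake_llm, no_leaks)
print(safe_llm("summarise proposal P-01"))  # Assessment: summarise proposal P-01
```

Keeping the guardrail as a wrapper means the chatbot, summarisation, and assessment paths all share the same safety check without duplicating it.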

Outcome and Impact

Current State: 7-12 proposals per officer to evaluate within 1.5 working weeks; cumulative 28 working days/officer (~$75K/resourcing cycle).
To-Be State: EvalAI summarizes key information and flags gaps; cumulative 14 working days/officer (~$37.5K/resourcing cycle, a 50% reduction).
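The to-be figures follow from halving the evaluation time, assuming cost scales linearly with cumulative officer working days (an assumption; the cost model is not stated here):

```python
# Back-of-envelope check of the figures above, assuming cost scales
# linearly with cumulative officer working days per resourcing cycle.
days_before, days_after = 28, 14
cost_before = 75_000  # ~$75K per resourcing cycle

cost_after = cost_before * days_after / days_before
reduction = 1 - days_after / days_before
print(f"~${cost_after:,.0f} per cycle, {reduction:.0%} reduction")
```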

Looking Ahead

  1. Expand the test scope to include more user personas and discover potential use cases for EvalAI.
  2. Develop a proof-of-concept (POC) to handle different agencies' evaluation criteria.
  3. Explore scaling opportunities with MOF.
  4. Market EvalAI to other ministries/agencies to help with their proposal evaluation processes.
  5. Find opportunities to open up access for agencies to perform a first-cut review before submitting proposals to the ministry, starting with user research.

The Team

Left to right: Shawn Wang, Teo Peng Bin, James Chiang, Jason Han