semantics
  • zubedjobs.com
  • finding links to jobs
  • finding live jobs & extracting the details
  • plotting them on a map
  • CV parsing & matching candidates to jobs

Zubedjobs.com is a free online job search engine that enables job seekers to find jobs advertised on the internet. Zubedjobs.com has been adopted by the Conservative Party to spearhead its “Get Britain Working Campaign”.


The site is thought to be the first in the world with technology that can intelligently mine the internet, identify job pages and extract the relevant information from each job, before plotting it on a map.


The Internet is a big place! Zubedjobs.com is an example of how semantically enabled intelligent data mining can be used to target the acquisition of information (whether web based or otherwise) from a huge repository of unrelated and unstructured data: in this case, live job requirements from all over the world. The technology is further used to correctly identify the location of a job and plot it accurately on a map.

Quality Assurance is vital in such systems. Humans can often look at information and misinterpret it. The same applies to semantically based systems. To a system searching for job advertisements, a “Hedge Trimmer” could be a valid job. It is the contextual information on the rest of the page that determines whether the system accepts it as such.

Inevitably, mistakes will be made. These are referred to as “false positives”. To combat this we have used another artificial intelligence technique, an artificial neural network, to recognize the difference between a job and these “false positives”. The network has been taught to recognize and reject the few rogue pages that enter the system, allowing full automation of the search engine with a minimal human Quality Assurance function.
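The idea of teaching a network to reject rogue pages can be sketched with a single-layer perceptron, the simplest kind of artificial neural network. The text does not say what architecture, features or training data the real system uses, so everything below is an assumption made for illustration: the four cue words, the six toy training pages and the function names are all hypothetical.

```python
# Hypothetical cue words; each becomes one 0/1 input to the perceptron.
FEATURES = ["salary", "apply", "price", "basket"]

# Toy training pages (invented): label 1 = live job, 0 = false positive.
TRAINING = [
    ("Hedge Trimmer wanted. Salary negotiable. Apply today.", 1),
    ("Gardener required, salary £18,000", 1),
    ("Apply now to join our grounds team", 1),
    ("Hedge Trimmer, price £49.99, add to basket", 0),
    ("Cordless trimmer at a great price", 0),
    ("Your basket is empty", 0),
]


def featurise(page_text):
    """Turn a page into a 0/1 vector: is each cue word present?"""
    text = page_text.lower()
    return [1 if cue in text else 0 for cue in FEATURES]


def predict(weights, bias, x):
    """Return 1 (job) if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train(examples, epochs=10):
    """Classic perceptron rule: nudge the weights after every mistake."""
    weights, bias = [0] * len(FEATURES), 0
    for _ in range(epochs):
        for text, label in examples:
            x = featurise(text)
            error = label - predict(weights, bias, x)
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


weights, bias = train(TRAINING)
shop_page = "Hedge Trimmer, special price, add to basket"
job_page = "Hedge Trimmer required, salary negotiable, apply now"
print(predict(weights, bias, featurise(shop_page)))  # shop page
print(predict(weights, bias, featurise(job_page)))   # job advert
```

Both pages mention a “Hedge Trimmer”; only the surrounding context separates them, which is the point made above. A production network would need far richer features and many more examples than this toy set.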


For job seekers, the system will soon have a semantic CV parser capable of extracting not just obvious information such as names, addresses and job titles, but also employment records, employers and derived information such as seniority or “how qualified the candidate is”.


Not only does this make it easy for job seekers to register with the site via a “one click” registration process, it also allows semantic search algorithms to augment the standard Boolean candidate matching strategies used in recruitment.

Semantic Search beyond jobs


The beauty of the architecture of this system is that it can be used to find virtually any information on the internet, from car parts to rare watches and from news stories to financial data. By dynamically swapping its available knowledge bases (in this case, a structured, relational database of information related to jobs) we can intelligently search for anything we are interested in.

 


The first problem that zubedjobs.com has to conquer is finding links to jobs. Live jobs represent a tiny percentage of all the pages on the internet. Consequently, we have to employ “human-like” techniques to drill down into a website and identify pages that either are jobs or are likely to lead to jobs. We call this technique Semantically Enabled Intelligent Data Mining.


Essentially the system tries to mimic what a human would do when searching websites for a job:

  • Look for indications of career pages from pages such as the home page or site map
  • Look for links that are obviously job oriented
  • Identify job search forms, fill them in and harvest the resulting links
  • Analyse a link to work out whether the link leads directly to a job
  • Analyse the page to see if there are links “near to” job oriented text (such as a job title)
  • Analyse pages to see if they contain multiple live jobs

By performing these tasks, the system can quickly scan a website by ignoring 90% of the content. It can do this because its semantic element helps the system understand what is being sought.
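A minimal sketch of the “obviously job oriented links” step might look like the following. It is not the system's actual method, which the text does not describe in detail: the cue word lists, the scoring rule and the function names are hypothetical stand-ins for the knowledge base.

```python
import re

# Hypothetical cue words standing in for the real knowledge base.
JOB_CUES = {"career", "careers", "job", "jobs", "vacancy", "vacancies", "recruitment"}
IGNORE_CUES = {"privacy", "login", "basket", "press", "cookies"}


def tokens(text):
    """Lower-case a string and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score_link(url, anchor_text):
    """Crude relevance score: +1 per job cue, -1 per ignore cue."""
    words = tokens(url) | tokens(anchor_text)
    return len(words & JOB_CUES) - len(words & IGNORE_CUES)


def links_to_follow(links):
    """Keep only links with a positive score, highest score first."""
    scored = [(score_link(url, text), url) for url, text in links]
    return [url for score, url in sorted(scored, reverse=True) if score > 0]


site_links = [
    ("https://example.com/careers", "Work with us"),
    ("https://example.com/privacy", "Privacy policy"),
    ("https://example.com/about/jobs/vacancies", "Current vacancies"),
    ("https://example.com/products", "Products"),
]
print(links_to_follow(site_links))
```

Links that score zero or below are never fetched, which is how a crawler of this kind can skip most of a site's content.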


Once we have identified the links to pages, we pass the pages on to the Job Finder part of the system.


Identifying and plotting live jobs accurately on a map is at the heart of the zubedjobs.com system. The “Job Finder” analyses pages provided by the Semantically Enabled Intelligent Data Mining activities carried out earlier in the process.


What is a live job? Some job pages are easy to identify:

Example 1

Job Title: Research Chemist
Reference: 543/ERT
Salary: negotiable
Closing date: April 2010

Job Description :
You will be working in a team……


Then it gets a bit more difficult. In the next example, all the evidence suggesting a live job is present, but it is much less structured. A simple system would struggle to work out whether the job is for a CEO or for a PA.

Example 2

PA to the CEO required

Working closely with the Chief Executive Officer you will…. Salary will be in the region of £20,000….. We would like the successful candidate to start as soon as possible…..

Extracting the Job Details:

Once we have successfully identified the page as a job, all the relevant basic information is extracted from the web page (job title, description, skills required, education and training required etc). Because the system is intelligent and understands language, we can also extract requirements that are more difficult to identify, such as the seniority or experience required, and possibly the sector of the economy in which these sorts of jobs appear.
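For the easy, structured case shown in Example 1, the basic fields can be pulled out with simple pattern matching. The sketch below is only that easy case: the regular expression and the function name are assumptions, and it would not cope with free text such as Example 2, which is where the linguistic analysis described above is needed.

```python
import re

# Matches "Label: value" or "Label : value" on a single line.
FIELD = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.*)$")


def extract_fields(page_text):
    """Return a dict of labelled fields that have a value on the same line."""
    fields = {}
    for line in page_text.splitlines():
        match = FIELD.match(line)
        if match and match.group(2):
            fields[match.group(1).lower()] = match.group(2).strip()
    return fields


example_1 = """Job Title: Research Chemist
Reference: 543/ERT
Salary: negotiable
Closing date: April 2010

Job Description :
You will be working in a team"""

print(extract_fields(example_1))
```

The “Job Description” label is skipped here because its value sits on the following line; handling that layout is left out to keep the sketch short.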


Now it really gets hard! As we are plotting live jobs on a global map, the system needs to be “Location Aware”. If you consider that “Bag” is a real place somewhere in the world (hard to verify using free text search engines because it's such a common word!), the system needs to be able to identify the difference between:

A designer bag


And


A job for a Designer working in Bag!


Only a combination of semantics and intelligence, backed up by a comprehensive knowledge base of “all things job oriented”, allows us to make the distinction and generally discard non-jobs (or, as they are known, false positives).


But which Boston?


Boston Lincolnshire, Boston Massachusetts, Boston Canada, Boston Australia


The Zubed system often knows the location of an employer advertising a job. But what about a company from Boston, Lincolnshire advertising a job in Boston, MA? Zubedjobs.com has to deal with many of these types of problems to accurately identify the location and plot the job on the global map.
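One way to picture the “which Boston?” decision is a small gazetteer in which each candidate place carries context cues, with the employer's known location used only to break ties. This is an illustrative sketch, not the Zubed implementation: the gazetteer entries, cue words, approximate coordinates and the function name are all assumptions.

```python
import re

# Hypothetical mini-gazetteer. Coordinates are approximate and the cue
# words are invented for illustration.
GAZETTEER = {
    "boston": [
        {"name": "Boston, Lincolnshire, UK", "lat": 52.98, "lon": -0.03,
         "cues": {"lincolnshire", "uk", "england", "£"}},
        {"name": "Boston, Massachusetts, USA", "lat": 42.36, "lon": -71.06,
         "cues": {"massachusetts", "ma", "usa", "$"}},
    ]
}


def resolve(place, page_text, employer_location=None):
    """Pick the candidate whose cues best match the page text.

    Evidence on the page wins; the employer's location only breaks ties.
    Returns None if the place is unknown or still ambiguous.
    """
    words = set(re.findall(r"[a-z]+|[£$]", page_text.lower()))
    candidates = GAZETTEER.get(place.lower(), [])
    if not candidates:
        return None
    scores = [len(c["cues"] & words) for c in candidates]
    best = max(scores)
    winners = [c for c, s in zip(candidates, scores) if s == best]
    if len(winners) == 1:
        return winners[0]
    for candidate in winners:
        if candidate["name"] == employer_location:
            return candidate
    return None


job = resolve("Boston", "Engineer wanted in Boston MA",
              employer_location="Boston, Lincolnshire, UK")
print(job["name"], job["lat"], job["lon"])
```

Letting page evidence outrank the employer's location is a design choice made for this sketch: it covers the case above of a Lincolnshire company advertising a job in Massachusetts.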

 

The technology employed to extract information from jobs can also be applied to CVs. Standard pattern matching techniques such as regular expressions can pick out postcodes and dates, but the more sophisticated computational linguistics used on jobs allows much more interesting and useful information to be extracted. This includes features of a CV such as the candidate’s seniority, and which skills have been used in which combinations and in which previous roles.
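The “standard pattern matching” baseline mentioned here can be sketched in a few lines. The two patterns are simplified assumptions (a loose UK postcode shape and “Month Year” dates only), and the CV snippet is invented; real CVs need far more tolerant rules.

```python
import re

# Simplified UK postcode shape, e.g. "PE21 6NN"; not a full validator.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")

# "Month Year" dates such as "March 2005" or "Sept 2008".
MONTH_YEAR = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+\d{4}\b"
)


def basic_cv_facts(cv_text):
    """Pick out the postcodes and month-year dates found in a CV."""
    return {
        "postcodes": POSTCODE.findall(cv_text),
        "dates": MONTH_YEAR.findall(cv_text),
    }


# Invented CV fragment for illustration.
cv = "Jane Example, 12 High Street, Boston PE21 6NN. Analyst, March 2005 to Sept 2008."
print(basic_cv_facts(cv))
```

Patterns like these find the strings but say nothing about what they mean, for example which role a date range belongs to.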

By automating the extraction of this information we can offer a candidate a profoundly convenient service. There is no need to enter any information into white boxes on web pages; a candidate can simply upload their CV and let the technology take care of the rest.


Importantly, because the same technology is applied in different ways to both jobs and CVs, combining the two is a natural and powerful next step. Finding candidates for jobs, and jobs for candidates, is no longer a clumsy keyword match but an intelligent knowledge-based process utilising the rich linguistic and semantic information that can be extracted from documents.
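The difference between a keyword match and a knowledge-based match can be illustrated with a toy comparison. The concept table below is a hypothetical stand-in for the knowledge base (including the assumption that “executive assistant” and “PA” name the same role), and the overlap score is a deliberately simple measure chosen for this sketch.

```python
# Hypothetical mini knowledge base mapping surface terms to one concept.
CONCEPTS = {
    "pa": "personal assistant",
    "personal assistant": "personal assistant",
    "executive assistant": "personal assistant",
    "ceo": "chief executive officer",
    "chief executive officer": "chief executive officer",
}


def overlap(job, cv):
    """Fraction of the job's terms that also appear in the CV."""
    return len(job & cv) / len(job) if job else 0.0


def keyword_match(job_terms, cv_terms):
    """Compare terms literally, ignoring only letter case."""
    return overlap({t.lower() for t in job_terms}, {t.lower() for t in cv_terms})


def concept_match(job_terms, cv_terms):
    """Map each term to its concept first, then compare."""
    def normalise(terms):
        return {CONCEPTS.get(t.lower(), t.lower()) for t in terms}
    return overlap(normalise(job_terms), normalise(cv_terms))


job_terms = ["PA", "diary management"]
cv_terms = ["Executive Assistant", "Diary Management"]
print(keyword_match(job_terms, cv_terms))  # literal comparison
print(concept_match(job_terms, cv_terms))  # concept-level comparison
```

The literal comparison misses the candidate's most relevant experience because the job says “PA” and the CV says “Executive Assistant”; the concept-level comparison treats them as the same thing.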