[javascript] Using OCR and Entity Extraction for LinkedIn Company Lookup
Please note that HP IDOL OnDemand is now HPE Haven OnDemand. The API endpoints have changed to Haven OnDemand. Please see the API documentation for more details.
---
To identify useful company and contact data in a free-text document or an image, you can use OCR and Entity Extraction to find semantic entities such as companies, people, and contact details. You can then use these key identifiers from the Entity Extraction API to link to enterprise APIs on the web, such as LinkedIn or Salesforce, or to data in a data store.
In this post, I implement a use case that links entities found by the Entity Extraction API to the LinkedIn APIs.
For the IDOL OnDemand request, I use a custom-written JavaScript API to hide the implementation details (see the next blog post).
var iodClient = new IODClient();
var iodRequest = iodClient.createIODRequest('extractentities');
iodRequest.setText(text);
// request parameters are passed as an array of {key, value} pairs
var params = [];
var param1 = {key: "entity_type", value: entityType};
params.push(param1);
iodRequest.setParams(params);
iodClient.post(iodRequest, extractEntityRequestCallBack);
* For details on how to implement an XMLHttpRequest in JavaScript for IDOL OnDemand, go here.
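The custom client hides how the request is serialised. As a rough idea of what that step could look like, the sketch below turns the text and the {key, value} pairs into a URL-encoded body. The helper name encodeIodParams is hypothetical, and the parameter names "apikey" and "text" are assumptions, not taken from this post; check the API documentation before relying on them.

```javascript
// Hypothetical helper, not part of the IODClient shown above.
// Assumes the API accepts URL-encoded "apikey" and "text" parameters
// alongside the extra {key, value} pairs.
function encodeIodParams(text, params, apiKey) {
  var pairs = [["apikey", apiKey], ["text", text]];
  params.forEach(function (param) {
    pairs.push([param.key, param.value]);
  });
  // encode each name and value so spaces and commas survive transport
  return pairs.map(function (pair) {
    return encodeURIComponent(pair[0]) + "=" + encodeURIComponent(pair[1]);
  }).join("&");
}

var body = encodeIodParams(
  "Hewlett-Packard Co, hp.com",
  [{key: "entity_type", value: "companies_eng"}],
  "my-api-key" // placeholder, not a real key
);
```

The resulting string could be sent as the body of a POST request made with XMLHttpRequest.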
The 'companies_eng' type in the Entity Extraction API returns:
{
  "entities": [
    {
      "normalized_text": "Hewlett-Packard Co",
      "original_text": "Hewlett-Packard Co",
      "type": "companies_eng",
      "normalized_length": 18,
      "original_length": 18,
      "score": 0.2823,
      "additional_information": {},
      "components": []
    }
  ]
}
The 'internet' type in the Entity Extraction API returns:
{
  "entities": [
    {
      "normalized_text": "[email protected]",
      "original_text": "[email protected]",
      "type": "internet_email",
      "normalized_length": 21,
      "original_length": 21,
      "score": 1,
      "components": []
    },
    {
      "normalized_text": "hp.com",
      "original_text": "hp.com",
      "type": "internet/host",
      "normalized_length": 6,
      "original_length": 6,
      "score": 1,
      "components": []
    }
  ]
}
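The LinkedIn lookup later in this post reads the extracted values from an object named iod (iod.internet.value and iod.companyName.value). How that object is filled is not shown, so the following is only a sketch of one way to do it: pick the highest-scoring entity of each type from the two responses. The function names bestEntity and toIod are hypothetical.

```javascript
// Returns the highest-scoring entity of the given type, or null if none.
function bestEntity(response, type) {
  var best = null;
  (response.entities || []).forEach(function (entity) {
    if (entity.type === type && (best === null || entity.score > best.score)) {
      best = entity;
    }
  });
  return best;
}

// Assumed shape: { companyName: {value}, internet: {value} },
// matching how the iod object is read in the LinkedIn lookup code.
function toIod(companyResponse, internetResponse) {
  var company = bestEntity(companyResponse, "companies_eng");
  var host = bestEntity(internetResponse, "internet/host");
  return {
    companyName: company ? {value: company.normalized_text} : null,
    internet: host ? {value: host.normalized_text} : null
  };
}
```

Matching on "internet/host" rather than the email entity keeps the value usable as an email domain filter.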
The extracted entities above allow you to link to data in LinkedIn. You must register your application on LinkedIn and load the LinkedIn JavaScript API to authenticate the API key, as follows:
function fnLoadLinkedIn() {
  // load the LinkedIn JavaScript API with jQuery, then initialise it
  $.getScript("http://platform.linkedin.com/in.js?async=true", function success() {
    IN.init({
      onLoad: 'onLinkedInLoad',
      api_key: config.linkedin_apikey,
      authorize: true
    });
  });
}
Details on authentication using the User object of the LinkedIn API are here; details on using OAuth with the LinkedIn API are here.
Once authenticated, you use the extracted entities from IDOL OnDemand to link to key business information on LinkedIn via its Company Lookup API as follows:
var filters = "";
var companiesUrl = "";
if (iod.internet && iod.internet.value) {
  filters = "?email-domain=" + iod.internet.value;
} else if (iod.companyName && iod.companyName.value) {
  // only use companiesUrl if no internet address is found
  var universalName = iod.companyName.value;
  universalName = universalName.toLowerCase();
  // replace every space with a hyphen
  universalName = universalName.split(" ").join("-");
  companiesUrl = "/universal-name=" + universalName;
}
// define the fields to retrieve from LinkedIn
var outputFields = ":(id,name,universal-name,ticker,twitter-id,employee-count-range,locations)";
var inUrl = "companies" + companiesUrl + outputFields + filters;
// to retrieve just the ticker symbol with the internet value for instance, the URL looks as follows:
// http://api.linkedin.com/v1/companies:(ticker)?email-domain=hp.com
// to retrieve the ticker symbol with the company name, the URL looks as follows:
// http://api.linkedin.com/v1/companies/universal-name=hewlett-packard:(ticker)
IN.API.Raw(inUrl)
  .result(displayResultCompanieSearch)
  .error(displayError);
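The URL construction above can also be factored into a small function that is easy to test outside the browser. This is a sketch under the same assumptions as the code above; the names toUniversalName and buildCompanyLookupPath are hypothetical, and the iod object is assumed to have the shape used in this post.

```javascript
// Lower-cases a company name and replaces every space with a hyphen.
function toUniversalName(companyName) {
  return companyName.toLowerCase().split(" ").join("-");
}

// Builds the path passed to IN.API.Raw(); returns null when
// neither an internet address nor a company name is available.
function buildCompanyLookupPath(iod, fields) {
  var outputFields = ":(" + fields.join(",") + ")";
  if (iod.internet && iod.internet.value) {
    // prefer the email domain filter when a host was extracted
    return "companies" + outputFields +
      "?email-domain=" + encodeURIComponent(iod.internet.value);
  }
  if (iod.companyName && iod.companyName.value) {
    return "companies/universal-name=" +
      toUniversalName(iod.companyName.value) + outputFields;
  }
  return null;
}
```

Note that a name derived this way will not always match: the extracted text "Hewlett-Packard Co" becomes "hewlett-packard-co", while the universal name in the LinkedIn response shown in this post is "hewlett-packard". That is one reason to prefer the email domain when it is available.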
Searching the LinkedIn API for an internet domain such as 'hp.com' currently returns 23 organizations registered to that domain, in the following JSON format:
{
"_total": 23,
"values": [{ "employeeCountRange": { "code": "I", "name": "10001+" }, "id": 1025, "locations": { "_total": 3, "values": [ { "address": { "city": "Palo Alto", "postalCode": "94304", "street1": "3000 Hanover Street" }, "contactInfo": { "fax": "", "phone1": "" } }, { "address": { "city": "Bangalore", "postalCode": "560093", "street1": "Embassy Prime" }, "contactInfo": { "fax": "", "phone1": "" } }, { "address": { "city": "Bucharest", "postalCode": "020337", "street1": "HP Global e-Business Center" }, "contactInfo": { "fax": "+40 741 863 012", "phone1": "+40 741 863 000" } } ] }, "name": "Hewlett-Packard", "ticker": "HPQ", "twitterId": "http://www.communities.hp.com/online/blogs/", "universalName": "hewlett-packard" },{
...
}]
}
For a complete list of Output Fields that can be returned from a LinkedIn Company Lookup, see here.
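Because a domain lookup can return many organizations, the result callback usually has to filter the list. The sketch below keeps only the entries that carry a ticker symbol; the function name extractTickers is hypothetical, and the field names follow the JSON response shown above.

```javascript
// Returns [{name, ticker}] for every company in the result
// that has a non-empty ticker symbol.
function extractTickers(result) {
  return (result.values || [])
    .filter(function (company) {
      return typeof company.ticker === "string" && company.ticker !== "";
    })
    .map(function (company) {
      return {name: company.name, ticker: company.ticker};
    });
}
```

A callback such as displayResultCompanieSearch could call this on the object it receives before rendering anything.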
Among other fields, the lookup returns the ticker symbol, which lets you, for instance, retrieve financial information for the company on a business card via Yahoo! Finance and immediately see relevant indicators of the company's stock value.
http://finance.yahoo.com/q/hp?s=<ticker>
// or to download the .csv file
http://real-chart.finance.yahoo.com/table.csv?s=<ticker>&d=6&e=15&f=2014&g=d&a=0&b=2&c=1962&ignore=. ...
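As a small sketch, the two URL patterns above can be filled in from a ticker symbol as follows. The function name yahooFinanceUrls is hypothetical; the query parameters are copied verbatim from the URLs in this post, and the trailing part that is elided above is left out rather than guessed.

```javascript
// Builds the Yahoo! Finance URLs from this post for a given ticker.
function yahooFinanceUrls(ticker) {
  var symbol = encodeURIComponent(ticker);
  return {
    // historical prices page
    history: "http://finance.yahoo.com/q/hp?s=" + symbol,
    // .csv download, with the date parameters used in the post
    csv: "http://real-chart.finance.yahoo.com/table.csv?s=" + symbol +
      "&d=6&e=15&f=2014&g=d&a=0&b=2&c=1962"
  };
}

var urls = yahooFinanceUrls("HPQ");
```

Encoding the symbol keeps the URL valid for tickers that contain characters such as "^" or spaces.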
For details on using the Yahoo! Finance Ticker-Based Dynamic RSS Feeds to retrieve ticker based financial news for a company, go here.
Now, thanks to a simple call to the Entity Extraction API, the next time you receive a business card from a contact, you can immediately retrieve detailed company information and, for instance, build a graph of historical stock values, creating enriched contact data with a few lines of code.