A keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction (RAKE) algorithm. Keywords describe the main topics of a document or text, and keyword extraction pulls those important words and phrases out of the text automatically.
Extracted keywords can be used for things like:
- Building a list of useful tags out of a larger text
- Building search indexes and search engines
- Grouping similar content by its topic
Extracted phrases can be used for things like:
- Highlighting important areas of a larger text
- Language or documentation analysis
- Building intelligent searches based on contextual terms
This library gives PHP developers an easy way to get a list of keywords and phrases from a string of text. It is based on a smaller, unmaintained project called RAKE-PHP, which is itself a translation of a Python implementation simply called RAKE.
Installation
With Composer
$ composer require donatello-za/rake-php-plus
Or add the package to the "require" section of your composer.json:
{
    "require": {
        "donatello-za/rake-php-plus": "^1.0"
    }
}
<?php
require 'vendor/autoload.php';
use DonatelloZa\RakePlus\RakePlus;
Without Composer
<?php
require 'path/to/AbstractStopwordProvider.php';
require 'path/to/ILangParseOptions.php';
require 'path/to/LangParseOptions.php';
require 'path/to/StopwordArray.php';
require 'path/to/StopwordsPatternFile.php';
require 'path/to/StopwordsPHP.php';
require 'path/to/RakePlus.php';
use DonatelloZa\RakePlus\RakePlus;
Examples of how to use RAKE-PHP
use DonatelloZa\RakePlus\RakePlus;
$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
    "strict inequations, and nonstrict inequations are considered. Upper bounds " .
    "for components of a minimal set of solutions and algorithms of construction " .
    "of minimal generating sets of solutions for all types of systems are given.";
$phrases = RakePlus::create($text)->get();
print_r($phrases);
Array
(
[0] => criteria
[1] => compatibility
[2] => system
[3] => linear diophantine equations
[4] => strict inequations
[5] => nonstrict inequations
[6] => considered
[7] => upper bounds
[8] => components
[9] => minimal set
[10] => solutions
[11] => algorithms
[12] => construction
[13] => minimal generating sets
[14] => types
[15] => systems
)
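To illustrate where a list like this comes from: RAKE builds its candidate phrases by splitting the text at punctuation and at stopwords. The following is a minimal sketch of that idea, not the library's implementation. The function name `candidatePhrases` and the three-word stopword list are assumptions made for illustration; a real en_US stopword list is much longer.

```php
<?php
// Hypothetical helper, NOT part of RakePlus: splits text into candidate
// phrases at punctuation marks and at the given stopwords.
function candidatePhrases(string $text, array $stopwords): array
{
    // Break the lower-cased text into fragments at punctuation.
    $fragments = preg_split('/[.,;:!?()]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // One pattern that matches any stopword as a whole word.
    $pattern = '/\b(?:' . implode('|', array_map('preg_quote', $stopwords)) . ')\b/';

    $phrases = [];
    foreach ($fragments as $fragment) {
        foreach (preg_split($pattern, $fragment) as $part) {
            // Collapse whitespace and skip empty pieces between stopwords.
            $part = trim(preg_replace('/\s+/', ' ', $part));
            if ($part !== '') {
                $phrases[] = $part;
            }
        }
    }
    return $phrases;
}

// Illustrative stopword list (an assumption, far shorter than a real one).
$phrases = candidatePhrases(
    "Upper bounds for components of a minimal set of solutions.",
    ['for', 'of', 'a']
);
// $phrases is ['upper bounds', 'components', 'minimal set', 'solutions']
```

With a fuller stopword list, the same splitting step would also drop words such as "are" and "all" from the longer sample text.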
Another example of RAKE-PHP in action
use DonatelloZa\RakePlus\RakePlus;
$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
    "strict inequations, and nonstrict inequations are considered. Upper bounds " .
    "for components of a minimal set of solutions and algorithms of construction " .
    "of minimal generating sets of solutions for all types of systems are given.";
// Note: en_US is the default language.
$rake = RakePlus::create($text, 'en_US');
// 'asc' is optional and is the default sort order
$phrases = $rake->sort('asc')->get();
print_r($phrases);
Array
(
[0] => algorithms
[1] => compatibility
[2] => components
[3] => considered
[4] => construction
[5] => criteria
[6] => linear diophantine equations
[7] => minimal generating sets
[8] => minimal set
[9] => nonstrict inequations
[10] => solutions
[11] => strict inequations
[12] => system
[13] => systems
[14] => types
[15] => upper bounds
)
// Sort in descending order
$phrases = $rake->sort('desc')->get();
print_r($phrases);
Array
(
[0] => upper bounds
[1] => types
[2] => systems
[3] => system
[4] => strict inequations
[5] => solutions
[6] => nonstrict inequations
[7] => minimal set
[8] => minimal generating sets
[9] => linear diophantine equations
[10] => criteria
[11] => construction
[12] => considered
[13] => components
[14] => compatibility
[15] => algorithms
)
// Sort the phrases by score and return the scores
$phrase_scores = $rake->sortByScore('desc')->scores();
print_r($phrase_scores);
Array
(
[linear diophantine equations] => 9
[minimal generating sets] => 8.5
[minimal set] => 4.5
[strict inequations] => 4
[nonstrict inequations] => 4
[upper bounds] => 4
[criteria] => 1
[compatibility] => 1
[system] => 1
[considered] => 1
[components] => 1
[solutions] => 1
[algorithms] => 1
[construction] => 1
[types] => 1
[systems] => 1
)
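The scores above are consistent with the word metric from the original RAKE algorithm. Each word scores deg(w) / freq(w), where freq(w) is how often the word occurs across the candidate phrases and deg(w) is the summed length of the phrases it occurs in. A phrase scores the sum of its word scores. The following is a minimal sketch under that assumption; `rakeScores` is a hypothetical helper and not part of RakePlus, and the phrase list is a hand-picked subset of the candidates above.

```php
<?php
// Hypothetical helper, NOT part of RakePlus: scores candidate phrases
// using the deg(w) / freq(w) word metric.
function rakeScores(array $phrases): array
{
    $freq = [];    // how many times each word occurs
    $degree = [];  // summed length of the phrases each word occurs in

    foreach ($phrases as $phrase) {
        $words = explode(' ', $phrase);
        foreach ($words as $word) {
            $freq[$word] = ($freq[$word] ?? 0) + 1;
            $degree[$word] = ($degree[$word] ?? 0) + count($words);
        }
    }

    $scores = [];
    foreach ($phrases as $phrase) {
        $score = 0;
        foreach (explode(' ', $phrase) as $word) {
            $score += $degree[$word] / $freq[$word];
        }
        $scores[$phrase] = $score;
    }
    return $scores;
}

$scores = rakeScores([
    'linear diophantine equations',
    'minimal set',
    'solutions',
    'minimal generating sets',
    'solutions',
]);
// 'linear diophantine equations' => 9   (3 + 3 + 3)
// 'minimal generating sets'      => 8.5 (2.5 + 3 + 3)
// 'minimal set'                  => 4.5 (2.5 + 2)
// 'solutions'                    => 1   (2 / 2)
```

"minimal" scores 2.5 because it occurs twice, once in a two-word phrase and once in a three-word phrase, giving (2 + 3) / 2.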
// Extract phrases from a new string on the same RakePlus instance. Using the
// same RakePlus instance is faster than creating a new instance as the
// language files do not have to be re-loaded and parsed.
$text = "A fast Fourier transform (FFT) algorithm computes...";
$phrases = $rake->extract($text)->sort()->get();
print_r($phrases);
Array
(
[0] => algorithm computes
[1] => fast fourier transform
[2] => fft
)
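For the tag-building use case mentioned at the top, one option is to keep only the highest-scoring phrases. The following is a minimal sketch that works on a phrase => score array of the shape returned by scores(); the function name `topTags` and the limit parameter are assumptions for illustration, not part of RakePlus.

```php
<?php
// Hypothetical helper, NOT part of RakePlus: returns the $limit
// highest-scoring phrases from a phrase => score array.
function topTags(array $scores, int $limit): array
{
    arsort($scores); // highest score first, keys preserved

    // PHP turns numeric-looking keys into integers, so cast back to strings.
    return array_map('strval', array_slice(array_keys($scores), 0, $limit));
}

$tags = topTags([
    'criteria' => 1,
    'linear diophantine equations' => 9,
    'minimal generating sets' => 8.5,
    'minimal set' => 4.5,
], 2);
// $tags is ['linear diophantine equations', 'minimal generating sets']
```

Passing the result of `$rake->sortByScore('desc')->scores()` as the first argument should work the same way, since the helper sorts the array itself.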
License: MIT license
Content licenced from Code Snippets and Tutorials