How it works
The Google Cloud platform is the search engine company's offering for building applications and websites, and for storing and analyzing data. It operates on a 'pay as you use' model and offers a series of services directed toward solving big data problems.
The biggest appeal of using the Google Cloud platform is that it runs on Google's own technology stack, the same one used to run many of the company's high-traffic big data applications (e.g. Mail, Analytics and Maps). This guarantees service levels -- in terms of availability and scalability -- that only a handful of providers can offer, as well as access to leading-edge technology which in some cases is only available at Google.
Getting started
The first thing you need is a Google account. If you've used any Google service at some point, you already have one. You can sign in to your Google account, as well as recover access to it in case you forgot your password, at https://accounts.google.com/ServiceLogin .
If you don't have a Google account, you can sign up for one at https://accounts.google.com/NewAccount . This is what you'll need to sign up:
- Access to your email
- A password and your birthday
- Confirmation of a captcha and agreement to the terms of service
The Google APIs Console is where you'll manage the majority of projects running on the Google Cloud platform. A project is a collection of information about an application, including such things as authentication information, team members' email addresses and the Google APIs the application uses. You can create your own projects, or be added as a viewer or developer to projects created by other Google account holders. In addition, the Google APIs Console is also where you'll manage and view traffic data for a project, as well as administer billing information for any 'pay as you use' service quotas.
Google Cloud platform services
Google Compute Engine.- Is a virtualized server running Linux -- Ubuntu or CentOS -- managed entirely by Google. You get full control of the server's operating system (OS), as with any other virtual server, and you also gain access to most of the features offered by large virtual server providers, including: public IP address support, the ability to start and stop instances at will to fulfill workloads, tools and APIs to automate server administration, as well as 'pay as you use' billing.
Depending on your circumstances and the type of big data application you plan on running, Google Compute Engine offers four different types of virtual servers. The smallest is the n1-standard-1-d configuration, with 1 virtual core and 3.75 GB of memory, and the largest is the n1-standard-8-d configuration, with 8 virtual cores and 30 GB of memory; the two remaining configurations fall in between. Each Google Compute Engine instance type is offered at a different price point -- the more resources, the higher its hourly price. More important, though, is the ability to start and shut down instances at your discretion, which allows you to switch the type of Google Compute Engine instance you use in a very short time, as well as pay only for the server resources you use.
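To give an idea of what this automation looks like, the following sketch lists and starts instances with the google-api-python-client library; the project ID, zone and instance name are placeholders, and application credentials are assumed to be configured already.

from googleapiclient.discovery import build

PROJECT = 'my-project-id'   # hypothetical project ID
ZONE = 'us-central1-a'      # hypothetical zone

compute = build('compute', 'v1')

# List the instances currently defined in the zone.
result = compute.instances().list(project=PROJECT, zone=ZONE).execute()
for instance in result.get('items', []):
    print(instance['name'], instance['status'])

# Start a stopped instance on demand; you only pay while it runs.
compute.instances().start(project=PROJECT, zone=ZONE,
                          instance='worker-1').execute()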
Google Compute Engine also supports multiple storage types. By default, the data used on a Google Compute Engine instance is assumed to be short-lived, and the moment a server -- technically, the virtual machine (VM) -- is stopped, all data is lost. This is called ephemeral disk storage, a predetermined amount of which is assigned depending on the size of an instance. For cases where you wish to keep data for a longer period (i.e. after a VM is stopped), Google Compute Engine also supports persistent disk storage, where data is kept for days or months without the need to pay for a running Google Compute Engine instance; an instance can later be attached to the persisted data. Persistent disk storage requires an additional payment, unlike ephemeral disk storage, which is included in the hourly fee of a Google Compute Engine instance. Finally, Google Compute Engine instances can also work with data from Google Cloud Storage, another service of the Google Cloud platform -- which I'll describe shortly -- that is billed separately, like persistent disk storage.
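As a rough sketch of the persistent disk workflow, the same Python client library can attach an existing persistent disk to a running instance; the disk and instance names below are made up for illustration.

from googleapiclient.discovery import build

PROJECT = 'my-project-id'
ZONE = 'us-central1-a'

compute = build('compute', 'v1')

# The persistent disk exists independently of any instance, so the data on
# it survives after the VM that used it is stopped or deleted.
disk_url = ('https://www.googleapis.com/compute/v1/projects/%s/zones/%s/disks/%s'
            % (PROJECT, ZONE, 'data-disk-1'))
compute.instances().attachDisk(
    project=PROJECT, zone=ZONE, instance='worker-1',
    body={'source': disk_url, 'deviceName': 'data-disk-1'}).execute()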
At the time of this writing, the Google Compute Engine service requires an additional sign-up process. This means you'll need additional approval -- besides having a Google account -- and you'll also need to pay for quotas from day one (i.e. even just to try it out).
Google Cloud Storage.- Is a storage service that allows you to skip the low-level tasks associated with classical storage systems (e.g. relational databases and regular files). It works entirely on the web, which means that any web-enabled application can interact directly with Google Cloud Storage and perform operations on it (i.e. create, read, update and delete data) via standard REST web services.
From the perspective of big data, Google Cloud Storage is very practical for managing large files. With Google Cloud Storage there are no web servers to maintain (for downloads) or FTP servers (for uploads), and there is no notion of a file's actual type or contents. In Google Cloud Storage everything is treated as an object -- essentially just a 'chunk' of data -- that is transferred and retrieved using the web's protocols. And with the ability to store objects from 1 byte up to 5 TB (terabytes) in size, Google Cloud Storage can be a handy service for big data operations.
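As a minimal sketch of these operations, the following snippet uses the google-cloud-storage Python client library, which wraps the service's REST interface; the project ID, bucket name and file names are placeholders.

from google.cloud import storage

client = storage.Client(project='my-project-id')   # hypothetical project ID
bucket = client.bucket('my-big-data-bucket')        # hypothetical bucket

# Upload a local file as an object -- a 'chunk' of data -- in the bucket.
blob = bucket.blob('datasets/sales-2012.csv')
blob.upload_from_filename('sales-2012.csv')

# Download the same object back; no web or FTP server is involved.
blob.download_to_filename('sales-2012-copy.csv')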
For the first Google project you enable with Google Cloud Storage, you get the following free quotas: 5 GB of storage, 25 GB of download bandwidth, 25 GB of upload bandwidth, 30,000 GET/HEAD requests, as well as 3,000 PUT/POST/GET bucket and service requests. In addition, data residing on Google Cloud Storage can also be leveraged by other Google Cloud services, including Google App Engine, the Google BigQuery API and the Google Prediction API.
Google Cloud SQL.- Is a service that allows you to operate a relational database management system (RDBMS) based on the capabilities of MySQL -- one of the most popular open source RDBMSs -- running on Google's infrastructure. As with other Google Cloud services, the biggest plus of using Google Cloud SQL is that you avoid the system administration overhead involved in running an RDBMS yourself.
In many situations, big data applications grow from using a small, manageable RDBMS to a big RDBMS with many growing pains (e.g. space management, resource limits, backup and recovery). So if your big data applications are going to work with an RDBMS rather than a newer data storage technology (i.e. NoSQL), Google Cloud SQL can be a good option. Be aware, though, that the size limit for individual database instances is 10 GB. In addition, Google Cloud SQL is not a drop-in replacement for a MySQL database (e.g. some features that are part of MySQL aren't available on Google Cloud SQL).
Google Cloud SQL is available under two plans with four tiers each. The tiers comprise D1, D2, D4 and D8 instances, whose primary distinguishing feature is 0.5, 1, 2 and 4 GB of RAM per instance, respectively. In addition, proportional amounts of storage and I/O operations are assigned depending on the tier. Each tier is available under either a package plan -- with monthly quotas and a monthly bill -- or a per-use plan -- with per-hour and per-unit quotas and a 'pay as you use' bill. And like other Google Cloud services, Google Cloud SQL is tightly integrated with other services such as Google App Engine.
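Since Google Cloud SQL speaks the MySQL protocol, a standard MySQL client library is enough to work with it. The sketch below uses the mysql-connector-python package; the host address, credentials, database and table names are placeholders for your own instance's settings.

import mysql.connector

conn = mysql.connector.connect(
    host='203.0.113.10',        # hypothetical instance IP address
    user='app_user',            # hypothetical credentials
    password='app_password',
    database='analytics')       # hypothetical database

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM page_views")   # hypothetical table
print(cursor.fetchone()[0])

cursor.close()
conn.close()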
Google App Engine.- The Google App Engine is a platform for building applications that run on Google's infrastructure. Unlike the prior Google Cloud services, which provide standalone application services (e.g. Google Compute Engine offers a virtual Linux server, Google Cloud Storage offers space to store files, etc.), the Google App Engine is an end-to-end solution for building applications. This means you design and build applications to run on the Google App Engine from the start. Although this increases the learning curve and limits your options for building an application (e.g. you can't install any software you wish, as you could on an OS-level service like Google Compute Engine), there's the upside of not having to worry about issues like application scaling, deployment and system administration.
Since the Google App Engine is a platform, its applications are built around a set of blueprints or APIs. The Google App Engine supports three programming languages: Python, Java and Go -- the last of which is a programming language created by Google. For each of these languages Google provides an SDK (Software Development Kit) with which you design, build and test applications on your local workstation. When you're done, you upload your application to the Google App Engine so end users can access it -- applications built with any of the SDKs are compatible with, and uploaded to, the same Google App Engine.
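To illustrate, here is a minimal Python application for the App Engine SDK, written with the webapp2 framework bundled with it; it is only a sketch, and the handler and route are made up for illustration.

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Respond to HTTP GET requests on '/'.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from Google App Engine')

# The WSGI application App Engine serves; the routes map URLs to handlers.
app = webapp2.WSGIApplication([('/', MainPage)], debug=True)

After testing this locally with the SDK's development server, the same code is uploaded unchanged to the Google App Engine.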
The Google App Engine also supports a series of storage mechanisms for an application's data. The default Google App Engine datastore provides a schemaless NoSQL object datastore, with a query engine and atomic transactions. The Google Cloud SQL service is another alternative, providing a relational SQL database based on the MySQL RDBMS. In addition, the Google Cloud Storage service is also available, providing storage for objects and files up to 5 terabytes in size.
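As a small sketch of the default datastore, the Python SDK's ndb library lets you define, store and query entities; the Visit model and its properties below are invented for illustration.

from google.appengine.ext import ndb

class Visit(ndb.Model):
    # Even a schemaless datastore lets you declare typed properties per model.
    page = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

# Store an entity, then query it back with the datastore's query engine.
Visit(page='/index').put()
recent = Visit.query(Visit.page == '/index').fetch(10)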
Unlike other Google Cloud services managed on the Google APIs Console, the Google App Engine has its own administrative console, available at https://appengine.google.com . The Google App Engine is also available in three price tiers: free, paid and premiere. Each of the three tiers has free daily quotas, which include 28 instance hours, 1 GB of outgoing and incoming bandwidth, 1 GB of App Engine datastore storage, as well as other resources such as I/O operations and outgoing emails. For all tiers, these quotas are reset daily. The biggest difference between the free tier and the other two is that if an application on the free tier consumes its daily quotas, you're not allowed to buy additional quota.
On the free tier, if an application consumes its daily Google App Engine resources, it simply stops or throws an error (e.g. 'Resource not available'). This means that if you're expecting a considerable amount of traffic, you should consider the paid or premiere tiers, both of which allow you to purchase additional 'pay as you use' quotas. The paid tier carries a minimum spend -- at the time of this writing, $2.10/week -- toward 'pay as you use' quotas, which means that whether or not your application consumes its daily quotas, you'll be charged a minimum of roughly $9 per month per application. The premiere tier is designed for cases in which you plan to deploy multiple applications and carries a charge of $500 per month per account. The difference between the paid and premiere tiers is that the premiere tier is billed per account -- covering any number of applications -- and also includes operational support from Google, which is not included in the other two tiers.
Google BigQuery Service.- Is a service that allows you to analyze large amounts of data, into the terabyte range. It's essentially an analytics service that can execute SQL-like queries against datasets with billions of rows. BigQuery works with datasets and tables, where a dataset is a collection of one or more tables and a table is a standard two-dimensional data table. Queries on datasets and tables are run from either a browser or a command-line tool, similar to other Google Cloud storage technologies.
Though BigQuery sounds similar to an RDBMS or a service like Google Cloud SQL, since it also uses SQL-like queries and operates on two-dimensional data tables, it's different. The primary purpose of BigQuery is to analyze big data, so it's not well suited to constantly saving or updating data, as is typically done on the RDBMSs that back most web applications. BigQuery is intended as a 'store once' system that's consulted over and over again to obtain insights from big data sets -- similar to the way data warehouses or data marts operate.
BigQuery offers free monthly quotas for the first 100 GB of data processing. And since BigQuery uses a columnar data structure, for a given query you're only charged for the data processed in the columns it touches, not the entire table -- meaning 100 GB can go a long way toward running queries. In addition, BigQuery can interact with data residing on Google Cloud Storage, and it also offers a series of sample data tables which can serve for analytics on certain big data sets or be used to test out the service. The sample data includes: samples from US weather stations since 1929, measurement data on broadband connection performance, birth information for the United States from 1969 to 2008, a word index for the works of Shakespeare, and revision information for Wikipedia articles.
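As a sketch of how such a query looks from Python, the following snippet uses the google-cloud-bigquery client library against the public Shakespeare sample table; the project ID is a placeholder, and the library choice is an assumption rather than something the service requires.

from google.cloud import bigquery

client = bigquery.Client(project='my-project-id')   # hypothetical project ID

# A SQL-like aggregation over the sample table; only the two columns the
# query touches count toward the data-processing quota.
sql = """
    SELECT word, SUM(word_count) AS total
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(sql):   # runs the query job and iterates its rows
    print(row.word, row.total)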
Prediction API.- Is a service that allows you to predict behaviors from data sets using either a regression model or a categorical model. The Prediction API uses pattern matching and machine learning under the hood, so you can avoid the programming required to implement regression or categorical models yourself. Like any other prediction tool, the greater the sample data -- or training data, as it's called in the Prediction API -- the greater the accuracy of a prediction.
The Prediction API always requires you to provide it with training data -- numbers or strings -- so it can answer the queries you want to predict. For example, when running a regression model, the Prediction API compares a given query to the training data and predicts a value based on the closeness of existing examples (e.g. for a data set containing commute times for multiple routes, predict the time for a new route). When running a categorical model, it determines the closest category fit for a given query among all the existing examples provided in the training data (e.g. for a data set containing emails labeled as spam and non-spam, predict whether a new email is spam or non-spam). Regression models in the Prediction API return numeric values as the result, whereas categorical models return string values.
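The sketch below shows what a prediction call might look like with the google-api-python-client library, assuming a categorical model has already been trained; the model ID, project ID and input values are placeholders, and the v1.6 version string reflects the API at the time of writing.

from googleapiclient.discovery import build

service = build('prediction', 'v1.6')

# Ask the trained model to classify a new example; the csvInstance fields
# must line up with the columns of the training data.
body = {'input': {'csvInstance': ['free money now', 'unknown-sender']}}
result = service.trainedmodels().predict(
    project='my-project-id', id='spam-classifier', body=body).execute()

# A categorical model returns a string label; a regression model would
# return a numeric value instead.
print(result.get('outputLabel') or result.get('outputValue'))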
The Prediction API can interact with data residing on Google Cloud Storage and is also free for the first six months, up to 20,000 predictions. In addition, the free period is limited to the following daily quotas: 100 predictions per day and 5 MB of training data per day. For usage of the Prediction API beyond the free six-month period or the free daily quotas, 'pay as you use' quotas apply.
Translate API.- Is a tool to translate text between, or detect the language of, over 60 different languages. The API is usable either through standard REST services or via REST from JavaScript (i.e. directly from a web page). Language translation and detection quotas are calculated in millions of characters. At the time of this writing, the price for translating or detecting 1 million characters is $20, with charges prorated to the number of characters actually provided (i.e. spaces aren't counted). By default, there's a processing limit of 2 million characters a day, but this limit is adjustable up to 50 million characters a day from the Google APIs console.
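As a small sketch of the REST interface, the snippet below calls the Translate API's v2 endpoint with the Python requests library; the API key is a placeholder obtained from the Google APIs console, and billing is assumed to be enabled on the project.

import requests

API_KEY = 'YOUR_API_KEY'   # hypothetical API key
url = 'https://www.googleapis.com/language/translate/v2'

# Translate a short string into English; omitting 'source' lets the
# service detect the original language.
params = {'key': API_KEY, 'q': 'Hola mundo', 'target': 'en'}
response = requests.get(url, params=params)

data = response.json()['data']['translations'][0]
print(data['translatedText'], data.get('detectedSourceLanguage'))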