Module 1: Big Data Managed Services in the Cloud

BigQuery: Google’s Enterprise Data Warehouse
In this last topic you learn about BigQuery. BigQuery is a fully managed, petabyte-scale, low-cost analytics data warehouse. BigQuery is serverless: there’s no infrastructure to manage and no need for a database administrator. It is a powerful big data analytics platform used by all types of organisations, from start-ups to Fortune 500 companies. A short animated video follows that introduces BigQuery and how it helps to handle the complexity of today’s data.
The BigQuery service replaces the typical hardware setup for a traditional data warehouse. That is, it serves as a collective home for all the analytical data inside your organization. Datasets are collections of tables, views, and now even machine learning models, and they can be divided along business lines or by a given analytical domain. Each dataset is tied to a GCP project. A data lake might contain files in Google Cloud Storage or Google Drive, or transactional data in Cloud Bigtable. BigQuery can define a schema and issue queries directly against these external data sources, which are called federated queries. Database tables and views function the same way in BigQuery as they do in a traditional data warehouse, allowing BigQuery to support queries written in a standard SQL dialect that is ANSI 2011 compliant. Cloud Identity and Access Management is used to grant permission to perform specific actions inside BigQuery. This replaces the SQL GRANT and REVOKE statements you might have seen before for managing access permissions in traditional SQL databases.
Traditional data warehouses are hard to manage and operate. They were designed for a batch paradigm of data analytics and for operational reporting, and the data in the data warehouse was meant to be used by only a few management folks for reporting purposes. BigQuery, by contrast, is a modern data warehouse that changes the conventional mode of data warehousing. Let’s look at some of the key comparisons between a traditional data warehouse and what you get with BigQuery.
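As a rough illustration of the federated queries mentioned above, the sketch below only builds the SQL text for an external table over files in Cloud Storage and a query against it. It does not contact BigQuery. The dataset, table, and bucket names are hypothetical, and omitting a column list assumes that schema auto-detection is acceptable for the files in question.

```python
from typing import Sequence


def external_table_ddl(dataset: str, table: str, uris: Sequence[str],
                       file_format: str = "CSV",
                       skip_leading_rows: int = 0) -> str:
    """Build a CREATE EXTERNAL TABLE statement in BigQuery standard SQL.

    No column list is given, so this sketch relies on schema auto-detection.
    """
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    options = [f"format = '{file_format}'", f"uris = [{uri_list}]"]
    if skip_leading_rows:
        # Only meaningful for formats with header rows, such as CSV.
        options.append(f"skip_leading_rows = {skip_leading_rows}")
    return (f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
            f"OPTIONS ({', '.join(options)})")


# Hypothetical names, used only for illustration.
ddl = external_table_ddl(
    "sales", "orders_ext",
    ["gs://example-bucket/orders/*.csv"],
    skip_leading_rows=1,
)
query = "SELECT COUNT(*) AS order_count FROM `sales.orders_ext`"

print(ddl)
print(query)
```

Once such a table exists, the second statement reads the files where they sit, without first loading them into BigQuery native storage.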
BigQuery provides mechanisms for automated data transfer and powers applications that your teams already know and use, so that everyone has access to data insights. You can create read-only shared data sources that both internal and external users can query, and then make those query results accessible to anyone through user-friendly tools such as Google Sheets, Looker, Tableau, Qlik, or Google Data Studio.
BigQuery lays the foundation for AI. It’s possible to train TensorFlow and Google Cloud Machine Learning models directly with datasets stored in BigQuery, and BigQuery ML can be used to build and train machine learning models using just SQL, which is my favourite feature. Another extended capability is BigQuery GIS, which allows organisations to analyse geographic data in BigQuery, essential to many critical business decisions that revolve around location data.
BigQuery also allows organisations to analyse business events in real time by automatically ingesting data and making it immediately available to query inside the data warehouse. This is supported by the ability of BigQuery to ingest up to 100,000 rows of data per second (as at this recording) and for petabytes of data to be queried at lightning-fast speeds. Thanks to its fully managed, serverless infrastructure and globally available network, BigQuery eliminates the work associated with provisioning and maintaining a traditional data warehouse infrastructure.
BigQuery also simplifies data operations through the use of Identity and Access Management, or IAM, to control user access to your resources by creating roles and groups and assigning permissions for running BigQuery jobs and queries in a project. It also provides automatic data backup and replication.
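To make the "machine learning with just SQL" point above more concrete, here is a minimal sketch that assembles a BigQuery ML CREATE MODEL statement and a matching ML.PREDICT query as strings. Nothing is sent to BigQuery. The model name, label column, and training table are hypothetical, and logistic regression is chosen here only as an example model type.

```python
def create_model_sql(model: str, label: str, training_query: str,
                     model_type: str = "logistic_reg") -> str:
    """Build a BigQuery ML CREATE MODEL statement around a training query."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label}'])\n"
        f"AS\n{training_query}"
    )


def predict_sql(model: str, input_query: str) -> str:
    """Build a query that scores rows with a trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, ({input_query}))"


# Hypothetical dataset, table, and column names for illustration only.
train = create_model_sql(
    model="sales.churn_model",
    label="churned",
    training_query="SELECT tenure_months, monthly_spend, churned "
                   "FROM `sales.customers`",
)
score = predict_sql(
    "sales.churn_model",
    "SELECT tenure_months, monthly_spend FROM `sales.new_customers`",
)

print(train)
print(score)
```

Both statements are ordinary SQL text, so they could be run from any of the query interfaces that BigQuery offers.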
BigQuery is a fully managed service, which means that the BigQuery engineering team here at Google takes care of all the updates and maintenance. Upgrades shouldn’t require downtime or hinder system performance. This frees up real people hours, because you no longer have to worry about these common maintenance tasks. Users don’t need to provision resources before using BigQuery, unlike many RDBMS systems. Storage resources are allocated as users consume them and deallocated as they remove data or drop tables. Query resources are allocated according to the query type and the complexity of the SQL. Each query uses a number of what are called slots, units of computation that comprise a certain amount of CPU and RAM. Users don’t have to make a minimum usage commitment to use BigQuery. The service allocates and charges for resources based on actual usage. By default, all BigQuery users have access to 2,000 slots for query operations. They can also reserve a fixed number of slots for their project if they want.
There are situations where you can query data without loading it, for example when using public or shared datasets, Stackdriver log files, or external data sources. In other situations, you must first load your data into BigQuery before you can run your queries. In most cases you load data into BigQuery native storage, and if you want to get data back out of BigQuery, you can export it.
The gsutil tool is a Python application that lets you access Cloud Storage from the command line. You can use gsutil to do a wide range of bucket and object management tasks, including uploading, downloading, and deleting objects. The officially supported installation method for gsutil is to install it as part of the Google Cloud SDK.
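The gsutil tasks described above (uploading, listing, and deleting objects) can be sketched as command lines. The helper below only builds and prints the commands. It does not run them, so it works without the Cloud SDK installed. The bucket and file names are hypothetical.

```python
import shlex
from typing import List


def gsutil_cp(source: str, destination: str) -> List[str]:
    """Copy a file or object, e.g. upload a local file to a bucket."""
    return ["gsutil", "cp", source, destination]


def gsutil_ls(url: str) -> List[str]:
    """List the objects under a bucket or prefix."""
    return ["gsutil", "ls", url]


def gsutil_rm(url: str) -> List[str]:
    """Delete an object."""
    return ["gsutil", "rm", url]


# Hypothetical bucket and file names for illustration only.
commands = [
    gsutil_cp("orders.csv", "gs://example-bucket/orders/"),
    gsutil_ls("gs://example-bucket/orders/"),
    gsutil_rm("gs://example-bucket/orders/orders.csv"),
]

for command in commands:
    # shlex.join renders the argument list as a shell-safe string.
    print(shlex.join(command))
```

With gsutil installed and authenticated, each argument list could be passed to `subprocess.run(command, check=True)` instead of being printed.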
The bq command-line tool is another Python-based command-line tool, and it’s also installed through the SDK. The bq command-line tool has many functions within BigQuery, but for loading it’s good for large data files, scheduled uploads, creating tables, defining schemas, and loading data with one single command. You can use the BigQuery web interface in the GCP console as a visual way to complete various tasks, including loading and exporting data as well as running your queries. The BigQuery API allows a wide range of services, such as Cloud Dataflow and Cloud Dataproc, which we talked about earlier, to load or extract data to and from BigQuery.
The BigQuery Data Transfer Service for Cloud Storage allows you to schedule recurring data loads from Cloud Storage to BigQuery. It also automates data movement from a range of software-as-a-service applications to BigQuery on a scheduled and managed basis. The BigQuery Data Transfer Service is accessible through the GCP console, the BigQuery web UI, the bq command-line tool, or the BigQuery Data Transfer Service API.
Another alternative to loading data is to stream the data one record at a time. Streaming is typically used when you need the data to be immediately available, such as in a fraud detection system or a monitoring system. While load jobs are free in BigQuery, there is a charge for streaming data. Therefore, it’s important to use streaming in situations where the benefits outweigh the costs.
To take full advantage of BigQuery as an analytics engine, you should store your data inside BigQuery’s native storage. However, your specific use case might benefit from analyzing external sources, either by themselves or joined with data in BigQuery storage. Google Data Studio, as well as many partner tools that are already integrated with BigQuery, can be used to draw analytics from BigQuery and build sophisticated interactive visualizations and dashboards for your teams.
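The point made earlier about the bq tool loading data "with one single command" can be sketched as follows. The helper only assembles a `bq load` command line and prints it, so it runs without the SDK. The table and Cloud Storage path are hypothetical, and relying on `--autodetect` instead of an explicit schema is an assumption made to keep the example short.

```python
import shlex
from typing import List


def bq_load_command(table: str, source_uri: str,
                    source_format: str = "CSV",
                    skip_leading_rows: int = 1,
                    autodetect: bool = True) -> List[str]:
    """Build a `bq load` command that creates or appends to a table."""
    command = ["bq", "load", f"--source_format={source_format}"]
    if skip_leading_rows:
        # Skip header rows in the source file.
        command.append(f"--skip_leading_rows={skip_leading_rows}")
    if autodetect:
        # Let BigQuery infer the schema instead of passing one explicitly.
        command.append("--autodetect")
    command += [table, source_uri]
    return command


# Hypothetical destination table and source file for illustration only.
load = bq_load_command("sales.orders",
                       "gs://example-bucket/orders/orders.csv")
print(shlex.join(load))
```

Because this is a batch load job rather than a streaming insert, it falls on the free side of the load-versus-streaming trade-off described above.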