{"id":3897,"date":"2023-11-04T23:13:55","date_gmt":"2023-11-04T23:13:55","guid":{"rendered":"http:\/\/localhost:10003\/working-with-data-using-pandas\/"},"modified":"2023-11-05T05:48:28","modified_gmt":"2023-11-05T05:48:28","slug":"working-with-data-using-pandas","status":"publish","type":"post","link":"http:\/\/localhost:10003\/working-with-data-using-pandas\/","title":{"rendered":"Working with data using Pandas"},"content":{"rendered":"
Python has been a popular language for data analysis and manipulation over the years due to its powerful libraries. One of these libraries is Pandas<\/em>, which is widely used for data analysis. Pandas provides an easy-to-use data structure and data manipulation tools. In this tutorial, we will cover the basics of working with data using Pandas.<\/p>\n Before we can start working with Pandas, we need to install it. You can install Pandas using pip, the package installer for Python:<\/p>\n Once installed, you can import it using the following command:<\/p>\n Pandas provides two fundamental data structures:<\/p>\n A Series can be created by passing a list of values, an array, or a scalar value. The first column represents the index, and the second column represents the values.<\/p>\n Output:<\/p>\n A DataFrame can be created by passing a dictionary of arrays, lists, or Series. The dictionary keys represent the column names, and the dictionary values represent the column data.<\/p>\n Output:<\/p>\n Pandas provides many functions to read and write data in different formats such as CSV, Excel, SQL, and others.<\/p>\n Pandas provides a wide range of functions to read data:<\/p>\n For instance, to read a CSV file, you can use Similarly, Pandas provides functions to write data in various formats:<\/p>\n For example, to write a DataFrame to a CSV file, you can use The Once we have loaded data into our DataFrame, we can perform various operations on it. 
Here, we will look at some of the basic operations that we can perform.<\/p>\n Pandas provides several ways to view data:<\/p>\n Output:<\/p>\n We can select, filter, and slice data using several methods:<\/p>\n Output:<\/p>\n Output:<\/p>\n We can also filter data for specific values or conditions:<\/p>\n Output:<\/p>\n We can group our data based on one or more variables and then perform aggregation functions, such as mean, sum, and count, on the grouped data:<\/p>\n Output:<\/p>\n In this tutorial, we have covered the basics of working with data using Pandas. We learned about the Pandas data structure, reading and writing data, and performing basic operations such as selection, filtering, and grouping. With this knowledge, you can analyze and manipulate any dataset using Pandas.<\/p>\n","protected":false},"excerpt":{"rendered":" Python has been a popular language for data analysis and manipulation over the years due to its powerful libraries. One of these libraries is Pandas, which is widely used for data analysis. Pandas provides an easy-to-use data structure and data manipulation tools. In this tutorial, we will cover the basics Continue Reading<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","footnotes":""},"categories":[1],"tags":[193,194,195,192,191],"yoast_head":"\nSetting Up Pandas<\/h2>\n
pip install pandas\n<\/code><\/pre>\n
import pandas as pd\n<\/code><\/pre>\n
The Pandas Data Structure<\/h2>\n
\n
Series Data Structure<\/h3>\n
import pandas as pd\n\ndata = pd.Series([0.25, 0.5, 0.75, 1.0])\nprint(data)\n<\/code><\/pre>\n
0 0.25\n1 0.50\n2 0.75\n3 1.00\ndtype: float64\n<\/code><\/pre>\n
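The other two inputs mentioned above, an array and a scalar, can be sketched as follows. The index labels and variable names here are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

# From a NumPy array, with explicit string labels instead of the default 0..n-1 index
from_array = pd.Series(np.array([0.25, 0.5, 0.75, 1.0]), index=["a", "b", "c", "d"])

# From a scalar: the value is repeated once for every index label
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_array["b"])        # label-based lookup, prints 0.5
print(from_scalar.tolist())   # prints [5, 5, 5]
```

When a scalar is passed, an index is needed so that Pandas knows how many times to repeat the value.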
DataFrame Data Structure<\/h3>\n
data = {'name': ['John', 'Jane', 'Alice', 'Bob'],\n 'age': [30, 25, 40, 35],\n 'gender': ['male', 'female', 'female', 'male']}\n\ndf = pd.DataFrame(data)\nprint(df)\n<\/code><\/pre>\n
name age gender\n0 John 30 male\n1 Jane 25 female\n2 Alice 40 female\n3 Bob 35 male\n<\/code><\/pre>\n
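A DataFrame built from a dictionary of Series aligns the columns on their index labels. The following is a minimal sketch with made-up Series (`ages`, `genders`) whose indexes only partly overlap.

```python
import pandas as pd

# Two Series with different index labels (illustrative data)
ages = pd.Series([30, 25], index=["John", "Jane"])
genders = pd.Series(["male", "female", "female"], index=["John", "Jane", "Alice"])

# Columns are aligned on the union of the index labels
people = pd.DataFrame({"age": ages, "gender": genders})
print(people)

# Alice has no entry in `ages`, so her age is a missing value (NaN)
print(people["age"].isna().sum())  # prints 1
```

Because of the missing value, the `age` column is stored as floating point rather than integer.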
Reading and Writing Data<\/h2>\n
Reading Data<\/h3>\n
\n
pd.read_csv()<\/code> – reads a CSV file.<\/li>\n
pd.read_excel()<\/code> – reads an Excel file.<\/li>\n
pd.read_sql()<\/code> – reads data from a SQL database.<\/li>\n<\/ul>\n
pd.read_csv()<\/code> as follows:<\/p>\n
data = pd.read_csv('data.csv')\n<\/code><\/pre>\n
Writing Data<\/h3>\n
\n
df.to_csv()<\/code> – writes a DataFrame to a CSV file.<\/li>\n
df.to_excel()<\/code> – writes a DataFrame to an Excel file.<\/li>\n
df.to_sql()<\/code> – writes data to a SQL database.<\/li>\n<\/ul>\n
df.to_csv()<\/code> as follows:<\/p>\n
df.to_csv('output.csv', index=False)\n<\/code><\/pre>\n
index=False<\/code> parameter will exclude the index column from the CSV file.<\/p>\n
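To see the effect of `index=False`, the sketch below writes the same DataFrame twice and reads both results back. It uses an in-memory `io.StringIO` buffer in place of a file on disk, which is an assumption made to keep the example self-contained.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane", "Alice", "Bob"],
                   "age": [30, 25, 40, 35],
                   "gender": ["male", "female", "female", "male"]})

# Write without the index, then read the CSV text back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# Write with the default index=True for comparison
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reread = pd.read_csv(with_index)

print(list(restored.columns))  # ['name', 'age', 'gender']
print(list(reread.columns))    # ['Unnamed: 0', 'name', 'age', 'gender']
```

The extra `Unnamed: 0` column is the old index written as data, which is usually not wanted when the index is just the default row numbers.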
Basic Operations<\/h2>\n
Viewing Data<\/h3>\n
\n
df.head()<\/code> – displays the first rows of the DataFrame (five by default).<\/li>\n
df.tail()<\/code> – displays the last rows of the DataFrame (five by default).<\/li>\n
df.index<\/code> – displays the index of the DataFrame.<\/li>\n
df.columns<\/code> – displays the column names of the DataFrame.<\/li>\n
df.shape<\/code> – displays the number of rows and columns of the DataFrame.<\/li>\n<\/ul>\n
print(df.head())\n<\/code><\/pre>\n
name age gender\n0 John 30 male\n1 Jane 25 female\n2 Alice 40 female\n3 Bob 35 male\n<\/code><\/pre>\n
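The remaining viewing tools from the list can be tried on the same data. This sketch rebuilds the tutorial's DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane", "Alice", "Bob"],
                   "age": [30, 25, 40, 35],
                   "gender": ["male", "female", "female", "male"]})

# head() and tail() accept the number of rows to return
print(df.head(2))
print(df.tail(2))

# shape is a (rows, columns) tuple; columns and index are label containers
print(df.shape)           # prints (4, 3)
print(list(df.columns))   # prints ['name', 'age', 'gender']
print(list(df.index))     # prints [0, 1, 2, 3]
```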
Selection and Slicing<\/h3>\n
\n
df['column_name']<\/code> or
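df.column_name<\/code> – select a column from the DataFrame. Attribute access works only when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.<\/li>\n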
df.loc[row_label, col_label]<\/code> – select a subset of rows and columns using the row and column labels.<\/li>\n
df.iloc[row_num, col_num]<\/code> – select a subset of rows and columns using integer indexing.<\/li>\n
df.query()<\/code> – select rows based on a condition.<\/li>\n
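df.filter()<\/code> – select columns (or rows) by label, using an exact list of names, a substring, or a regular expression; it does not filter on cell values.<\/li>\n<\/ul>\n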
print(df['name'])\n<\/code><\/pre>\n
0 John\n1 Jane\n2 Alice\n3 Bob\nName: name, dtype: object\n<\/code><\/pre>\n
print(df.loc[0:1, ['name', 'gender']])\n<\/code><\/pre>\n
name gender\n0 John male\n1 Jane female\n<\/code><\/pre>\n
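The other selection methods in the list can be sketched on the same data. Note one difference from the `loc` example above: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane", "Alice", "Bob"],
                   "age": [30, 25, 40, 35],
                   "gender": ["male", "female", "female", "male"]})

# iloc: integer positions, end of the slice is exclusive (rows 0 and 1, columns 0 and 2)
first_two = df.iloc[0:2, [0, 2]]

# query: rows that satisfy a condition written as a string expression
over_30 = df.query("age > 30")

# filter: columns chosen by label, here by exact names and by substring
by_items = df.filter(items=["name", "age"])
by_substring = df.filter(like="ge")   # matches 'age' and 'gender'

print(first_two)
print(over_30)
print(list(by_substring.columns))     # prints ['age', 'gender']
```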
Filtering<\/h3>\n
print(df[df.age > 30])\n<\/code><\/pre>\n
name age gender\n2 Alice 40 female\n3 Bob 35 male\n<\/code><\/pre>\n
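Several conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane", "Alice", "Bob"],
                   "age": [30, 25, 40, 35],
                   "gender": ["male", "female", "female", "male"]})

# Both conditions must hold: older than 25 and female
older_women = df[(df["age"] > 25) & (df["gender"] == "female")]

# Either condition may hold: male, or younger than 30
men_or_young = df[(df["gender"] == "male") | (df["age"] < 30)]

print(older_women["name"].tolist())    # prints ['Alice']
print(men_or_young["name"].tolist())   # prints ['John', 'Jane', 'Bob']
```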
Grouping<\/h3>\n
grouped_data = df.groupby(['gender'])['age'].mean()\nprint(grouped_data)\n<\/code><\/pre>\n
gender\nfemale 32.5\nmale 32.5\nName: age, dtype: float64\n<\/code><\/pre>\n
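More than one aggregation can be computed per group in a single call. The following sketch uses `agg` with a list of function names on the same data. The `summary` variable is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane", "Alice", "Bob"],
                   "age": [30, 25, 40, 35],
                   "gender": ["male", "female", "female", "male"]})

# One row per gender, one column per aggregation function
summary = df.groupby("gender")["age"].agg(["mean", "min", "max", "count"])
print(summary)

# Individual values can be looked up by group label and function name
print(summary.loc["female", "max"])  # prints 40
```

The result is a DataFrame indexed by the grouping column, so it can be selected from and filtered like any other DataFrame.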
Conclusion<\/h2>\n